
Intelligent Complaint Triage using Fine-Tuned Qwen2.5-7B

The Problem

Financial institutions process thousands of customer complaints daily across mobile apps, websites, contact centres, email, and regulatory portals. These complaints arrive as free-form text and must be categorised manually before investigation and resolution can begin.

The result is predictable:

  • Complaints are routed to the wrong operational teams
  • Manual review effort increases
  • Resolution times become longer
  • Customer experience deteriorates
  • Regulatory complaint handling becomes more expensive

Traditional classifiers typically predict one label at a time and struggle with the nuanced language used in consumer finance complaints.

This project addresses the problem as a structured generation task. A single model call extracts all required complaint taxonomy fields simultaneously.


What the Model Does

Given a customer complaint narrative, the model generates:

{
  "product": "Checking or savings account",
  "sub_product": "Checking account",
  "issue": "Unauthorized transactions or other transaction problem",
  "sub_issue": "Debit card issue"
}

These fields map directly to the CFPB complaint taxonomy and can be consumed by routing systems, workflow engines, complaint management platforms, and analytics pipelines.
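
Downstream systems generally need to check the generated text before acting on it. The sketch below shows one way to do that; the `ComplaintLabels` dataclass and the `parse_labels` helper are hypothetical names introduced for illustration and are not part of the released model.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = ("product", "sub_product", "issue", "sub_issue")


@dataclass(frozen=True)
class ComplaintLabels:
    product: str
    sub_product: str
    issue: str
    sub_issue: str


def parse_labels(raw_output: str) -> ComplaintLabels:
    """Parse the model's JSON text and check that all four fields are present."""
    data = json.loads(raw_output)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return ComplaintLabels(**{key: str(data[key]) for key in REQUIRED_KEYS})


example_output = json.dumps({
    "product": "Checking or savings account",
    "sub_product": "Checking account",
    "issue": "Unauthorized transactions or other transaction problem",
    "sub_issue": "Debit card issue",
})
labels = parse_labels(example_output)
print(labels.product)
```

Rejecting incomplete output at this point keeps malformed generations out of routing and analytics systems, where they would be harder to trace.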


Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Fine-Tuning Method | LoRA (PEFT) |
| Training Hardware | AMD Instinct MI300X |
| Precision | bfloat16 |
| Task Type | Structured JSON Generation |
| Output Format | CFPB Taxonomy JSON |

Training Configuration

LoRA Adapter

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |

Only LoRA adapter weights were updated during training.
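
As a rough check on adapter size, these settings can be turned into a trainable-parameter count. A LoRA adapter adds r × (d_in + d_out) weights per adapted linear layer. The projection shapes below are assumptions taken from the published Qwen2.5-7B architecture (hidden size 3584, 28 layers, 4 key/value heads of dimension 128), not from this card. In PEFT, the settings correspond to the `r`, `lora_alpha`, `lora_dropout`, and `target_modules` arguments of `LoraConfig`.

```python
# LoRA settings from this model card.
LORA_RANK = 16
LORA_ALPHA = 32

# Assumed Qwen2.5-7B shapes (not stated in this card): (in_features, out_features).
HIDDEN = 3584
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128
NUM_LAYERS = 28
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted layer gains an A matrix (rank x in) and a B matrix (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(LORA_RANK, PROJECTION_SHAPES, NUM_LAYERS)
scaling = LORA_ALPHA / LORA_RANK  # LoRA updates are scaled by alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"update scaling factor: {scaling}")
```

Under these assumed shapes the adapter holds roughly ten million trainable weights, a small fraction of the base model.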

Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 5 |
| Per-Device Batch Size | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Precision | bf16 |
| Max Sequence Length | 1024 |
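
The effective batch size follows from the per-device batch size and gradient accumulation, and a linear scheduler decays the learning rate toward zero over training. The sketch below shows both. It assumes a single GPU, no warmup, and a hypothetical total of 1,200 optimizer steps; the warmup setting and total step count are not stated in this card.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, peak_lr: float = 1e-4) -> float:
    """Linear decay from peak_lr at step 0 to zero at total_steps (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / total_steps)


TOTAL_STEPS = 1200  # hypothetical value, for illustration only

print(effective_batch_size(per_device=8, grad_accum=4))
for step in (0, 600, 1200):
    print(step, linear_lr(step, TOTAL_STEPS))
```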

Training Convergence

| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.7377 | 1.7092 |
| 200 | 1.6316 | 1.6485 |
| 300 | 1.6508 | 1.6295 |
| 400 | 1.6078 | 1.6204 |
| 500 | 1.6090 | 1.6145 |
| 600 | 1.6191 | 1.6101 |
| 700 | 1.5926 | 1.6058 |
| 800 | 1.6128 | 1.6034 |
| 900 | 1.6076 | 1.6012 |
| 1000 | 1.5874 | 1.5997 |
| 1100 | 1.6001 | 1.5984 |

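Validation loss fell at every checkpoint, from 1.709 at step 100 to 1.598 at step 1100, while training loss fluctuated between 1.59 and 1.74. The steady decline indicates that the adapter kept fitting the CFPB complaint taxonomy, with no obvious overfitting over the logged steps.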
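
The trend can be checked directly from the reported numbers. The snippet below copies the validation losses from the table and confirms that each checkpoint improves on the previous one; it is a convenience check, not part of the training code.

```python
# Validation loss at steps 100, 200, ..., 1100, copied from the convergence table.
val_loss = [1.7092, 1.6485, 1.6295, 1.6204, 1.6145, 1.6101,
            1.6058, 1.6034, 1.6012, 1.5997, 1.5984]

# Improvement between consecutive checkpoints (positive means the loss fell).
deltas = [prev - cur for prev, cur in zip(val_loss, val_loss[1:])]

monotonic = all(delta > 0 for delta in deltas)
total_drop = val_loss[0] - val_loss[-1]

print(f"monotonically decreasing: {monotonic}")
print(f"total drop: {total_drop:.4f}")
print(f"largest gain: {max(deltas):.4f}, smallest gain: {min(deltas):.4f}")
```

The per-checkpoint gains shrink over time, which suggests diminishing returns toward the end of the logged steps.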


Dataset

Source: CFPB Consumer Complaint Database

The model was trained to predict four operational complaint fields:

  1. Product
  2. Sub-Product
  3. Issue
  4. Sub-Issue

The task is formulated as structured JSON generation rather than independent classification.
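
To make the formulation concrete, the sketch below turns one complaint record into a chat-style training example whose target is a single JSON string. The column names are assumed to follow the public CFPB CSV export, the system prompt is the one used in the inference example in this card, and `build_example` is a hypothetical helper. The preprocessing actually used for this model is not described here.

```python
import json

SYSTEM_PROMPT = (
    "You are a banking complaint classification assistant. "
    "Given a consumer complaint narrative, extract the CFPB ticket fields "
    "as a JSON object with keys: product, sub_product, issue, sub_issue."
)

# Assumed mapping from CFPB CSV column names to the JSON keys the model emits.
COLUMN_TO_KEY = {
    "Product": "product",
    "Sub-product": "sub_product",
    "Issue": "issue",
    "Sub-issue": "sub_issue",
}


def build_example(row: dict) -> list:
    """Build system/user/assistant messages; the assistant turn is the JSON target."""
    target = {key: row.get(column) or "" for column, key in COLUMN_TO_KEY.items()}
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": row["Consumer complaint narrative"].strip()},
        {"role": "assistant", "content": json.dumps(target)},
    ]


row = {
    "Consumer complaint narrative": "The bank reversed my provisional credit. ",
    "Product": "Checking or savings account",
    "Sub-product": "Checking account",
    "Issue": "Unauthorized transactions or other transaction problem",
    "Sub-issue": "Debit card issue",
}
messages = build_example(row)
print(messages[-1]["content"])
```

Emitting all four fields in one target string is what lets a single generation call replace four separate classifiers.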


Inference with Constrained Decoding

Inference uses a two-stage approach:

Stage 1

The fine-tuned model generates structured JSON.

Stage 2

Generated values are aligned to the nearest canonical CFPB label using TF-IDF similarity matching.

This improves robustness when the model generates labels that are semantically correct but differ slightly from official CFPB terminology.
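
A minimal sketch of the Stage 2 alignment follows. It is written in pure Python so it runs without extra packages; the released pipeline may well use scikit-learn instead, and the word-level tokenization, smoothed IDF, and the four-label list are all assumptions made for illustration. `TfidfLabelMatcher` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfLabelMatcher:
    """Map a generated label to the most similar canonical label by TF-IDF cosine."""

    def __init__(self, canonical_labels):
        self.labels = list(canonical_labels)
        docs = [tokenize(label) for label in self.labels]
        n_docs = len(docs)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
        self.idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for tok, df in doc_freq.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens) -> dict:
        # Tokens outside the canonical vocabulary are ignored.
        counts = Counter(tok for tok in tokens if tok in self.idf)
        vec = {tok: count * self.idf[tok] for tok, count in counts.items()}
        norm = math.sqrt(sum(weight * weight for weight in vec.values()))
        return {tok: weight / norm for tok, weight in vec.items()} if norm else {}

    def match(self, generated: str):
        """Return (best_label, cosine_similarity); similarity 0.0 means no overlap."""
        query = self._vectorize(tokenize(generated))
        scores = [sum(weight * vec.get(tok, 0.0) for tok, weight in query.items())
                  for vec in self.vectors]
        best = max(range(len(self.labels)), key=scores.__getitem__)
        return self.labels[best], scores[best]


# Illustrative subset of product labels, not the full taxonomy.
matcher = TfidfLabelMatcher([
    "Checking or savings account",
    "Credit card or prepaid card",
    "Mortgage",
    "Debt collection",
])
label, score = matcher.match("checking account")
print(label, round(score, 3))
```

A caller would typically apply a minimum similarity threshold and send low-scoring matches to manual review rather than accept the nearest label unconditionally.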


Evaluation Results

Evaluated on 250 held-out CFPB complaints.

Baseline refers to the original Qwen2.5-7B-Instruct model without fine-tuning.

Product Classification Performance

| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| Exact Match | 0.0100 | 0.9080 | +0.8980 |
| Precision | 0.5180 | 0.9082 | +0.3902 |
| Recall | 0.0100 | 0.9080 | +0.8980 |
| F1 Score | 0.0196 | 0.9068 | +0.8872 |
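
Recall equals exact match in this table, which is what support-weighted averaging across classes produces, so the sketch below assumes weighted precision, recall, and F1. That averaging choice is inferred from the numbers and is not stated in the card. The labels in the example are placeholders.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    total = len(y_true)
    precision_sum = recall_sum = f1_sum = 0.0
    for label, count in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        precision = true_pos / predicted[label] if predicted[label] else 0.0
        recall = true_pos / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / total
        precision_sum += weight * precision
        recall_sum += weight * recall
        f1_sum += weight * f1
    return precision_sum, recall_sum, f1_sum


y_true = ["A", "A", "B", "C"]  # placeholder class labels
y_pred = ["A", "B", "B", "C"]
exact_match = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = weighted_prf(y_true, y_pred)
print(exact_match, precision, recall, f1)
```

Because each class's recall is weighted by its share of the true labels, the weighted recall always collapses to overall accuracy, while weighted precision can differ.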

Sub-Product Semantic Similarity

| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0041 | 0.7122 | +0.7081 |
| ROUGE-2 | 0.0030 | 0.6452 | +0.6422 |
| ROUGE-L | 0.0041 | 0.7122 | +0.7081 |
| BLEU | 0.0000 | 0.5026 | +0.5026 |
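
For short label strings, ROUGE-1 reduces to an F-measure over shared words. The function below is a simplified pure-Python approximation for illustration only; the reported scores presumably come from the `rouge-score` package listed under Dependencies, whose tokenization and stemming options may differ.

```python
import re
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F-measure between a reference label and a generated label."""
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    overlap = sum((ref & cand).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f("Checking account", "Checking account"))
print(rouge1_f("Checking account", "Savings account"))
```

A partially correct label therefore earns partial credit, which is why these scores sit alongside exact match rather than replacing it.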

Issue Semantic Similarity

| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0018 | 0.4018 | +0.4000 |
| ROUGE-2 | 0.0000 | 0.3463 | +0.3463 |
| ROUGE-L | 0.0018 | 0.4013 | +0.3995 |
| BLEU | 0.0000 | 0.3368 | +0.3368 |

Sub-Issue Semantic Similarity

| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0004 | 0.5215 | +0.5211 |
| ROUGE-2 | 0.0000 | 0.4895 | +0.4895 |
| ROUGE-L | 0.0004 | 0.5207 | +0.5203 |
| BLEU | 0.0000 | 0.2283 | +0.2283 |

Final Results Summary

| Category | Base Qwen2.5-7B-Instruct | Fine-Tuned CFPB Model |
|---|---|---|
| Product Classification (Exact Match) | 1.0% | 90.8% |
| Product F1 Score | 1.96% | 90.7% |
| Sub-Product ROUGE-L | 0.004 | 0.712 |
| Issue ROUGE-L | 0.002 | 0.401 |
| Sub-Issue ROUGE-L | 0.000 | 0.521 |
| Output Structure | Inconsistent | Reliable CFPB JSON |
| Taxonomy Alignment | Poor | High |
| Training Time | N/A (not fine-tuned) | ~45 Minutes |
| Inference Latency | Baseline | Near Identical |
| Additional GPU Memory | Baseline | ~50 MB Adapter |

Run Inference

import torch


def categorise_complaint(complaint_text: str, model, tokenizer) -> str:
    """Return the raw JSON string the model generates for one complaint.

    `model` and `tokenizer` must already be loaded (base model plus LoRA adapter).
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a banking complaint classification assistant. "
                "Given a consumer complaint narrative, extract the CFPB ticket fields "
                "as a JSON object with keys: product, sub_product, issue, sub_issue."
            ),
        },
        {
            "role": "user",
            "content": complaint_text,
        },
    ]

    # Render the messages with the model's chat template and move tensors to its device.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding keeps the output deterministic.
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    generated = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
    return generated


complaint = """
I reported fraudulent transactions on my debit card and the bank reversed
my provisional credit without explaining the investigation outcome.
"""

result = categorise_complaint(complaint, model, tokenizer)
print(result)
# {"product": "Checking or savings account", "sub_product": "Checking account",
#  "issue": "Unauthorized transactions or other transaction problem",
#  "sub_issue": "Debit card issue"}
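
The call above assumes that `model` and `tokenizer` already exist. One plausible way to create them is to load the base model in bfloat16 and attach the adapter with PEFT, as sketched below. The adapter ID is an assumption taken from the repository name on this model's page, and `load_model` is a hypothetical helper.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER_ID = "aryachakraborty/arya-cfpb-qwen_2.5-7b-lora-V2"  # assumed repository ID


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model in bfloat16 and attach the LoRA adapter for inference."""
    # Imported lazily so this file can be imported without the GPU stack installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # relies on the accelerate package
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # disable dropout for inference
    return model, tokenizer


# Usage (downloads the base weights on first run):
# model, tokenizer = load_model()
```

Keeping the adapter separate from the base weights means the same base model can be shared with other adapters on one device.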

Dependencies

transformers==4.44.0
peft==0.12.0
accelerate==0.34.0
datasets==2.21.0
torch (ROCm-compatible build for AMD, or standard CUDA build)
scikit-learn
rouge-score
sacrebleu
nltk

Limitations

  • CFPB taxonomy only. The model is trained on and constrained to CFPB Consumer Complaint Database labels. It is not a general-purpose complaint classifier and should not be used with complaint taxonomies from other regulatory bodies or internal systems without retraining.
  • Issue field accuracy. The issue field (33.6% accuracy) is the weakest of the four fields. The CFPB issue taxonomy contains more than 80 canonical strings with overlapping phrasing. Expanding the training data and further tuning the constrained decoding stage (the TF-IDF label alignment) are the most direct paths to improvement.
  • English language only. All training data is in English. Performance on non-English complaints is untested and likely poor.
  • Context length. Complaints longer than 1024 tokens will be truncated. Most CFPB complaints are well within this limit, but very long narratives may lose relevant context.
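
A simple guard can flag narratives that are likely to be truncated. The sketch below assumes a Hugging Face-style tokenizer whose call returns a mapping containing `input_ids`. `exceeds_context` is a hypothetical helper, and the budget is applied to the narrative alone, ignoring the system prompt and the generated output.

```python
MAX_SEQUENCE_LENGTH = 1024  # training-time limit from the hyperparameter table


def exceeds_context(text: str, tokenizer, max_tokens: int = MAX_SEQUENCE_LENGTH) -> bool:
    """Return True if the narrative alone tokenizes to more than max_tokens."""
    token_count = len(tokenizer(text)["input_ids"])
    return token_count > max_tokens


class WhitespaceTokenizer:
    """Stand-in used only to make this example runnable without model files."""

    def __call__(self, text: str) -> dict:
        return {"input_ids": list(range(len(text.split())))}


stub = WhitespaceTokenizer()
print(exceeds_context("short complaint", stub))
print(exceeds_context("word " * 2000, stub))
```

With the real tokenizer, flagged complaints could be summarised or reviewed manually instead of being silently cut off.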

Intended Use

This model is intended for use by:

  • Banking operations teams automating first-touch complaint categorisation
  • Compliance teams processing regulatory complaint filings
  • Contact centre platforms routing incoming complaints before agent assignment
  • Research teams studying LLM adaptation for financial NLP tasks

It is not intended for consumer-facing deployment without human review of outputs. In jurisdictions where automated complaint classification decisions carry legal or regulatory implications, it should be used only with appropriate oversight.


Training Infrastructure

Trained on an AMD Instinct MI300X GPU (192 GB HBM3) running ROCm 7.2.4. The training stack is fully ROCm-native and does not use bitsandbytes, which has historically targeted CUDA. Model precision is bfloat16, which the CDNA3 architecture supports natively.


Citation

If you use this model in research or production, please cite the CFPB Consumer Complaint Database as the data source:

Consumer Financial Protection Bureau (CFPB)
Consumer Complaint Database
https://www.consumerfinance.gov/data-research/consumer-complaints/
