Intelligent Complaint Triage using Fine-Tuned Qwen2.5-7B
The Problem
Financial institutions process thousands of customer complaints daily across mobile apps, websites, contact centres, email, and regulatory portals. These complaints arrive as free-form text and must be manually categorized before investigation and resolution can begin.
The result is predictable:
- Complaints are routed to the wrong operational teams
- Manual review effort increases
- Resolution times become longer
- Customer experience deteriorates
- Regulatory complaint handling becomes more expensive
Traditional classifiers typically predict one label at a time and struggle with the nuanced language used in consumer finance complaints.
This project treats complaint triage as a structured generation task: a single model call extracts all four complaint taxonomy fields at once.
What the Model Does
Given a customer complaint narrative, the model generates:
    {
        "product": "Checking or savings account",
        "sub_product": "Checking account",
        "issue": "Unauthorized transactions or other transaction problem",
        "sub_issue": "Debit card issue"
    }
These fields map directly to the CFPB complaint taxonomy and can be consumed by routing systems, workflow engines, complaint management platforms, and analytics pipelines.
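Because the output is generated text, a consumer may want to validate it before passing it to a routing system. The following is a minimal sketch and not part of the original project; `parse_ticket` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration.

```python
import json

# The four taxonomy fields described above.
REQUIRED_KEYS = ("product", "sub_product", "issue", "sub_issue")


def parse_ticket(generated: str) -> dict:
    """Extract and validate the JSON object from raw model output."""
    # Tolerate stray text around the object by slicing between the
    # first "{" and the last "}".
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch a single exception type.
    ticket = json.loads(generated[start:end + 1])

    missing = [key for key in REQUIRED_KEYS if key not in ticket]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not all(isinstance(ticket[key], str) for key in REQUIRED_KEYS):
        raise ValueError("all taxonomy fields must be strings")

    # Drop any unexpected extra keys and trim whitespace.
    return {key: ticket[key].strip() for key in REQUIRED_KEYS}


raw = (
    '{"product": "Checking or savings account", '
    '"sub_product": "Checking account", '
    '"issue": "Unauthorized transactions or other transaction problem", '
    '"sub_issue": "Debit card issue"}'
)
ticket = parse_ticket(raw)
print(ticket["product"])
```

Raising on malformed output, rather than returning a partial record, lets the caller decide whether to retry generation or send the complaint to manual review.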
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Fine-Tuning Method | LoRA (PEFT) |
| Training Hardware | AMD Instinct MI300X |
| Precision | bfloat16 |
| Task Type | Structured JSON Generation |
| Output Format | CFPB Taxonomy JSON |
Training Configuration
LoRA Adapter
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
Only LoRA adapter weights were updated during training.
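To give a sense of how small the trained adapter is, the sketch below counts the LoRA parameters implied by the table above. The layer dimensions are assumptions taken from the published Qwen2.5-7B configuration (hidden size 3584, 28 layers, 4 key/value heads of dimension 128), not from this card, and the constant names are illustrative.

```python
# Assumed Qwen2.5-7B dimensions (from the published model config, not this card).
HIDDEN_SIZE = 3584
NUM_LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # grouped-query attention: 512

RANK = 16  # LoRA rank from the table above

# (in_features, out_features) of each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_DIM),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"trainable LoRA parameters: {total:,}")
print(f"approx. size in bf16: {total * 2 / 1e6:.1f} MB")
print(f"approx. size in fp32: {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the adapter holds roughly 10 million parameters, a small fraction of the 7B-parameter base model. The on-disk size depends on the dtype the adapter is saved in.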
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch Size / Device | 8 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Precision | bf16 |
| Max Sequence Length | 1024 |
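The two tables above map naturally onto keyword arguments for `peft.LoraConfig` and `transformers.TrainingArguments`. The sketch below only builds plain dictionaries so it runs without either library installed. The `task_type`, `optim` and `output_dir` values are assumptions (the card says only "AdamW" and does not show the training script), and a single GPU is assumed for the effective batch size.

```python
# Keyword arguments one could pass to peft.LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Keyword arguments one could pass to
# transformers.TrainingArguments(**training_kwargs).
training_kwargs = {
    "output_dir": "outputs/cfpb-lora",  # hypothetical path
    "num_train_epochs": 5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-4,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
    "lr_scheduler_type": "linear",
    "bf16": True,
}

# The sequence limit is applied when tokenizing, not in TrainingArguments.
MAX_SEQ_LENGTH = 1024
NUM_DEVICES = 1  # assumption: a single GPU

effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)
print(effective_batch_size)  # 8 * 4 * 1 = 32
```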
Training Convergence
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.7377 | 1.7092 |
| 200 | 1.6316 | 1.6485 |
| 300 | 1.6508 | 1.6295 |
| 400 | 1.6078 | 1.6204 |
| 500 | 1.6090 | 1.6145 |
| 600 | 1.6191 | 1.6101 |
| 700 | 1.5926 | 1.6058 |
| 800 | 1.6128 | 1.6034 |
| 900 | 1.6076 | 1.6012 |
| 1000 | 1.5874 | 1.5997 |
| 1100 | 1.6001 | 1.5984 |
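Validation loss decreased at every checkpoint, from 1.709 at step 100 to 1.598 at step 1100, and training loss stayed within about 0.03 of it throughout, which suggests the adapter fit the CFPB complaint taxonomy without obvious overfitting.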
Dataset
Source: CFPB Consumer Complaint Database
The model was trained to predict four operational complaint fields:
- Product
- Sub-Product
- Issue
- Sub-Issue
The task is formulated as structured JSON generation rather than independent classification.
Inference with Constrained Decoding
Inference uses a two-stage approach:
Stage 1
The fine-tuned model generates structured JSON.
Stage 2
Generated values are aligned to the nearest canonical CFPB label using TF-IDF similarity matching.
This improves robustness when the model generates labels that are semantically correct but differ slightly from official CFPB terminology.
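Stage 2 can be sketched as a nearest-neighbour lookup over TF-IDF vectors of the canonical labels. The card does not show the actual implementation, so the code below is an illustrative stand-in using only the standard library: `LabelAligner` is a hypothetical name, word-level tokens and smoothed IDF are assumptions, and the label list is a small illustrative subset rather than the full taxonomy.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def l2_normalize(vec):
    """Scale a sparse vector to unit length (empty if it has no weight)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class LabelAligner:
    """Map a generated string to the most similar canonical label."""

    def __init__(self, labels):
        self.labels = list(labels)
        if not self.labels:
            raise ValueError("at least one canonical label is required")
        counts = [Counter(tokenize(label)) for label in self.labels]
        n = len(self.labels)
        # Document frequency: number of labels containing each token.
        df = Counter(tok for c in counts for tok in c)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._vectorize(c) for c in counts]

    def _vectorize(self, counts):
        # Tokens never seen in any canonical label carry no weight.
        weighted = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        return l2_normalize(weighted)

    def align(self, value):
        """Return (best_label, cosine_similarity), or (None, 0.0) if no overlap."""
        query = self._vectorize(Counter(tokenize(value)))
        scores = [
            sum(w * vec.get(t, 0.0) for t, w in query.items())
            for vec in self.vectors
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] == 0.0:
            return None, 0.0
        return self.labels[best], scores[best]


# Illustrative subset of product labels.
PRODUCTS = [
    "Checking or savings account",
    "Credit card or prepaid card",
    "Mortgage",
    "Debt collection",
]
aligner = LabelAligner(PRODUCTS)
label, score = aligner.align("checking account")
print(label, round(score, 3))
```

Returning `None` when nothing overlaps makes it possible to flag a value for manual review instead of silently mapping it to an arbitrary label.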
Evaluation Results
Evaluated on 250 held-out CFPB complaints.
Baseline refers to the original Qwen2.5-7B-Instruct model without fine-tuning.
Product Classification Performance
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| Exact Match | 0.0100 | 0.9080 | +0.8980 |
| Precision | 0.5180 | 0.9082 | +0.3902 |
| Recall | 0.0100 | 0.9080 | +0.8980 |
| F1 Score | 0.0196 | 0.9068 | +0.8872 |
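The card does not state how precision, recall and F1 are averaged across product classes. Recall equal to exact match in both columns is what support-weighted averaging produces, since weighted recall reduces to overall accuracy. The sketch below computes the metrics under that assumption; `weighted_prf` is a hypothetical helper and the labels are toy data, not evaluation results.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the true classes."""
    n = len(y_true)
    support = Counter(y_true)        # true examples per class
    predicted = Counter(y_pred)      # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Toy example with three classes.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
precision, recall, f1 = weighted_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

In the toy example weighted recall and accuracy are both 0.75, mirroring the pattern in the table. A model that rarely emits a valid label can still show moderate precision, because precision only counts the few predictions that land on a real class.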
Sub-Product Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0041 | 0.7122 | +0.7081 |
| ROUGE-2 | 0.0030 | 0.6452 | +0.6422 |
| ROUGE-L | 0.0041 | 0.7122 | +0.7081 |
| BLEU | 0.0000 | 0.5026 | +0.5026 |
Issue Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0018 | 0.4018 | +0.4000 |
| ROUGE-2 | 0.0000 | 0.3463 | +0.3463 |
| ROUGE-L | 0.0018 | 0.4013 | +0.3995 |
| BLEU | 0.0000 | 0.3368 | +0.3368 |
Sub-Issue Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0004 | 0.5215 | +0.5211 |
| ROUGE-2 | 0.0000 | 0.4895 | +0.4895 |
| ROUGE-L | 0.0004 | 0.5207 | +0.5203 |
| BLEU | 0.0000 | 0.2283 | +0.2283 |
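ROUGE-L rewards partial matches, which is why a near-miss label still earns credit on the three finer-grained fields. The sketch below is a simplified standalone version of the ROUGE-L F-measure based on the longest common subsequence. It is an illustration, not the `rouge-score` package listed in the dependencies, and its tokenization is an assumption.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, prediction):
    """ROUGE-L F-measure between a reference label and a predicted label."""
    ref, pred = tokenize(reference), tokenize(prediction)
    if not ref or not pred:
        return 0.0
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("Checking account", "Checking account"))  # 1.0
print(rouge_l_f1("Checking account", "Savings account"))   # 0.5
print(rouge_l_f1("Checking account", "Mortgage"))          # 0.0
```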
Final Results Summary
| Category | Base Qwen2.5-7B | Fine-Tuned CFPB Model |
|---|---|---|
| Product Classification (Exact Match) | 1.0% | 90.8% |
| Product F1 Score | 1.96% | 90.7% |
| Sub-Product ROUGE-L | 0.004 | 0.712 |
| Issue ROUGE-L | 0.002 | 0.401 |
| Sub-Issue ROUGE-L | 0.000 | 0.521 |
| Output Structure | Inconsistent | Reliable CFPB JSON |
| Taxonomy Alignment | Poor | High |
| Training Time | N/A (no fine-tuning) | ~45 Minutes |
| Inference Latency | Baseline | Near Identical |
| Additional GPU Memory | Baseline | ~50 MB Adapter |
Run inference
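    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
    ADAPTER = "aryachakraborty/arya-cfpb-qwen_2.5-7b-lora-V2"

    # Assumption: the repository above holds a LoRA adapter that is applied
    # on top of the base model, and the base model's tokenizer is unchanged.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, ADAPTER)
    model.eval()


    def categorise_complaint(complaint_text: str, model, tokenizer) -> str:
        """Return the model's raw JSON string for one complaint narrative."""
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a banking complaint classification assistant. "
                    "Given a consumer complaint narrative, extract the CFPB ticket fields "
                    "as a JSON object with keys: product, sub_product, issue, sub_issue."
                ),
            },
            {
                "role": "user",
                "content": complaint_text,
            },
        ]
        # Render the chat messages into the prompt format the model expects.
        prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            # Greedy decoding keeps the output deterministic.
            output = model.generate(
                **inputs,
                max_new_tokens=128,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
            )
        # Decode only the newly generated tokens, not the echoed prompt.
        prompt_len = inputs["input_ids"].shape[1]
        generated = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
        return generated


    complaint = """
    I reported fraudulent transactions on my debit card and the bank reversed
    my provisional credit without explaining the investigation outcome.
    """
    result = categorise_complaint(complaint, model, tokenizer)
    print(result)
    # {"product": "Checking or savings account", "sub_product": "Checking account",
    #  "issue": "Unauthorized transactions or other transaction problem",
    #  "sub_issue": "Debit card issue"}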
Dependencies
transformers==4.44.0
peft==0.12.0
accelerate==0.34.0
datasets==2.21.0
torch (ROCm-compatible build for AMD, or standard CUDA build)
scikit-learn
rouge-score
sacrebleu
nltk
Limitations
- CFPB taxonomy only. The model is trained on and constrained to CFPB Consumer Complaint Database labels. It is not a general-purpose complaint classifier and should not be used with complaint taxonomies from other regulatory bodies or internal systems without retraining.
- Issue field accuracy. The `issue` field (33.6% accuracy) is the weakest link. The CFPB issue taxonomy contains 80+ canonical strings with overlapping phrasing. Expanding training data and further tuning the constrained decoder are the most direct paths to improvement.
- English language only. All training data is in English. Performance on non-English complaints is untested and likely poor.
- Context length. Complaints longer than 1024 tokens will be truncated. Most CFPB complaints are well within this limit, but very long narratives may lose relevant context.
Intended Use
This model is intended for use by:
- Banking operations teams automating first-touch complaint categorisation
- Compliance teams processing regulatory complaint filings
- Contact centre platforms routing incoming complaints before agent assignment
- Research teams studying LLM adaptation for financial NLP tasks
It is not intended for consumer-facing deployment without human review of outputs, or for use in jurisdictions where automated complaint classification decisions have legal or regulatory implications without appropriate oversight.
Training Infrastructure
Trained on an AMD Instinct MI300X GPU (192 GB HBM3 VRAM) running ROCm 7.2.4. The training stack is fully ROCm-native; bitsandbytes (CUDA-only) is not used. Model precision is bfloat16, which the CDNA3 architecture supports natively.
Citation
If you use this model in research or production, please cite the CFPB Consumer Complaint Database as the data source:
Consumer Financial Protection Bureau (CFPB)
Consumer Complaint Database
https://www.consumerfinance.gov/data-research/consumer-complaints/