---
language:
- en
tags:
- text-classification
- financial-sentiment
- knowledge-distillation
- patient-knowledge-distillation
- albert
- finbert
- finance
- nlp
- distillation
license: mit
datasets:
- financial_phrasebank
base_model: albert-base-v2
model-index:
- name: pkd-albert-student
  results:
  - task:
      type: text-classification
      name: Financial Sentiment Analysis
    dataset:
      name: Financial PhraseBank (100% agreement)
      type: financial_phrasebank
      split: test
    metrics:
    - type: accuracy
      value: 0.9735
      name: Test Accuracy
    - type: f1
      value: 0.965
      name: Macro F1
---
PKD-ALBERT: Lightweight Financial Sentiment Classifier via Patient Knowledge Distillation
Model Summary
PKD-ALBERT is a lightweight financial sentiment classifier distilled from
ProsusAI/finbert using a two-stage
Patient Knowledge Distillation (PKD) pipeline.
It classifies financial text (headlines, earnings excerpts, news snippets) into
positive, neutral, or negative sentiment, achieving 97.4% accuracy
and 0.965 Macro F1 on the Financial PhraseBank test set while using about 9.4× fewer
parameters than the teacher model (11.7M vs. 109.5M).
| Metric | Teacher (FinBERT) | Student (PKD-ALBERT) |
|---|---|---|
| Parameters | 109.5M | 11.7M |
| Model size | 417.7 MB | 44.6 MB |
| Test Accuracy | 97.6% | 97.4% |
| Macro F1 | 0.9696 | 0.9650 |
| Inference (ms/doc) | 1.49 ms | 2.02 ms |
| Accuracy drop | — | −0.2% |
> The student retains 99.8% of the teacher's accuracy while using 89% less disk space.
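The headline ratios can be re-derived from the teacher and student figures reported in this card. The snippet below is only a quick arithmetic check; the dictionary keys are illustrative names, not part of any released code.

```python
# Figures copied from the teacher/student comparison in this model card.
teacher = {"params_m": 109.5, "size_mb": 417.7, "accuracy": 97.6}
student = {"params_m": 11.7, "size_mb": 44.6, "accuracy": 97.4}

# Share of teacher accuracy kept by the student.
accuracy_retained = student["accuracy"] / teacher["accuracy"]
# Fraction of disk space saved relative to the teacher.
disk_saved = 1 - student["size_mb"] / teacher["size_mb"]
# How many times more parameters the teacher has.
param_ratio = teacher["params_m"] / student["params_m"]

print(f"Accuracy retained: {accuracy_retained:.1%}")  # 99.8%
print(f"Disk space saved:  {disk_saved:.1%}")         # 89.3%
print(f"Parameter ratio:   {param_ratio:.1f}x")       # 9.4x
```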
Try It Live
A live Gradio demo is available on Hugging Face Spaces. Paste any financial headline or sentence and receive a sentiment label with a confidence score.
Example inputs from the held-out test set:
| Input | Prediction | Confidence |
|---|---|---|
| "Charles Schwab price target raised to $121 from $119 at JPMorgan." | ✅ Positive | 93.1% |
| "Costco assumed with a Peer Perform at Wolfe Research." | ➖ Neutral | 92.2% |
| "IBM Explains How AI Models Are Making a Familiar Human Mistake." | ❌ Negative | 90.1% |
API Usage
The /predict endpoint accepts raw text and returns a label and confidence score.
import requests

API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"

response = requests.post(API_URL, json={
    "text": "Charles Schwab price target raised to $121 from $119 at JPMorgan."
})
print(response.json())
# {
#   "label": "positive",
#   "confidence": 0.93,
#   "latency_ms": 79.3
# }
Input: { "text": "..." } — any financial sentence or headline (max 128 tokens)
Output: { "label": str, "confidence": float, "latency_ms": float }
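For callers who prefer to avoid a third-party dependency, the same request can be sketched with the standard library and a small check against the documented output schema. The helper names `validate_prediction` and `predict` are hypothetical, and the checks assume that labels are lowercase and that `confidence` lies in [0, 1], as the example response suggests.

```python
import json
import urllib.request

API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"
# Assumption: labels are returned lowercase, as in the example response.
VALID_LABELS = {"positive", "neutral", "negative"}


def validate_prediction(data: dict) -> dict:
    """Check a response body against the documented output schema."""
    label = data.get("label")
    confidence = data.get("confidence")
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    # Assumption: confidence is a probability in [0, 1].
    if not is_number or not 0.0 <= confidence <= 1.0:
        raise ValueError(f"unexpected confidence: {confidence!r}")
    return data


def predict(text: str, timeout: float = 10.0) -> dict:
    """POST one sentence to /predict and return the validated JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return validate_prediction(json.load(response))
```

Calling `predict("Costco assumed with a Peer Perform at Wolfe Research.")` would send one request to the endpoint; the explicit timeout keeps a sleeping Space from blocking the caller indefinitely.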
Distillation Approach
This model was trained using a two-stage Patient Knowledge Distillation strategy.
Stage 1 — Distillation on Pseudo-Labeled Financial News
The student (ALBERT-base) was first trained on a large corpus of scraped financial
news pseudo-labeled by the FinBERT teacher, using a combined loss of soft KL
divergence targets and intermediate layer alignment.
| Parameter | Value |
|---|---|
| Dataset | Scraped financial news (pseudo-labeled by FinBERT) |
| Train / Val / Test split | 5,587 / 1,197 / 1,198 |
| Epochs | 3 |
| Batch size | 32 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| KD temperature sweep | [2, 5, 9] |
| Alpha (KD loss weight) | 0.3 |
| PKD beta | 0.02 |
| PKD student layers | [2, 4, 8, 12] |
Stage 2 — Fine-tuning on Financial PhraseBank
The distilled student was then fine-tuned on the high-quality
Financial PhraseBank (100% annotator agreement) subset using standard
cross-entropy loss to align the student with gold-label financial sentiment.
| Parameter | Value |
|---|---|
| Dataset | Financial PhraseBank (100% agreement) |
| Train / Val / Test split | 1,584 / 340 / 340 |
| Epochs | 1 |
| Loss | Cross-entropy |
Loss Function
The Stage 1 total loss combines:
- KL divergence between the teacher's softened output distribution and the student's (soft label transfer)
- Patient KD alignment between intermediate ALBERT and FinBERT hidden layers, scaled by the PKD beta (0.02)
- Hard-label cross-entropy (CE) loss, with alpha (0.3) setting the balance between this term and the soft KD loss
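As a rough illustration of how these terms might fit together, the sketch below follows the weighting used by Sun et al. (2019): `(1 - alpha) * CE + alpha * KD + beta * PKD`. This form is an assumption, since the card does not spell out the exact formula. The T² scaling of the KD term, the default temperature of 5 (one value from the sweep), and all function names are likewise assumptions. It works on plain Python lists for a single example, whereas the actual training loop uses PyTorch tensors.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2 (assumed convention)."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def ce_loss(student_logits, label):
    """Cross-entropy of the student against a hard label index."""
    return -log_softmax(student_logits)[label]


def normalize(vector):
    """L2-normalize a vector; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def pkd_loss(student_hidden, teacher_hidden):
    """Sum of squared distances between normalized hidden vectors, one pair per aligned layer."""
    total = 0.0
    for s, t in zip(student_hidden, teacher_hidden):
        total += sum((a - b) ** 2 for a, b in zip(normalize(s), normalize(t)))
    return total


def stage1_loss(student_logits, teacher_logits, label,
                student_hidden, teacher_hidden,
                temperature=5.0, alpha=0.3, beta=0.02):
    """Combined loss: (1 - alpha) * CE + alpha * KD + beta * PKD (assumed form)."""
    return ((1 - alpha) * ce_loss(student_logits, label)
            + alpha * kd_loss(student_logits, teacher_logits, temperature)
            + beta * pkd_loss(student_hidden, teacher_hidden))
```

When the student's logits and normalized hidden states match the teacher's exactly, the KD and PKD terms vanish and only the weighted cross-entropy remains.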
Full Performance Comparison
The table below compares all training strategies evaluated against the same
Financial PhraseBank test set (340 samples):
| Model | Params | Size | Test Acc | Macro F1 | KL (teacher→student) |
|---|---|---|---|---|---|
| Teacher FinBERT | 109.5M | 417.7 MB | 97.6% | 0.9696 | — |
| Fresh → FP (baseline) | 11.7M | 44.6 MB | 77.1% | 0.6126 | 0.359 |
| CE-scraped → FP | 11.7M | 44.6 MB | 95.9% | 0.9392 | 0.230 |
| KD-scraped → FP | 11.7M | 44.6 MB | 96.5% | 0.9514 | 0.156 |
| PKD-scraped → FP (ours) | 11.7M | 44.6 MB | 97.4% | 0.9650 | 0.188 |
Key takeaway: Patient KD achieves the highest Macro F1 among all student
variants (0.9650, versus 0.9514 for standard KD) and comes within 0.005 Macro F1
and 0.2 accuracy points of the teacher. This suggests that intermediate layer
alignment improves distillation quality beyond standard KD.
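The gap between accuracy and Macro F1 in the baseline row (77.1% vs. 0.6126) is typical of a neutral-heavy label distribution, because Macro F1 averages per-class F1 without weighting by class frequency. The sketch below shows the effect on a made-up toy sample; the labels and predictions are invented for illustration and are not drawn from any dataset in this card.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (invented): 8 neutral, 1 positive, 1 negative example,
# scored against a classifier that always predicts "neutral".
labels = ["positive", "neutral", "negative"]
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={toy_accuracy:.3f} macro_f1={toy_macro_f1:.3f}")
```

On this toy sample the always-neutral classifier reaches 0.800 accuracy but only about 0.296 Macro F1, which is why both metrics are reported side by side.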
Intended Use
Designed for
- Classifying sentiment in financial news headlines and short excerpts
- Lightweight inference in resource-constrained environments (edge, serverless)
- Research into compact NLP models for financial NLP tasks
- Downstream integration into financial analytics pipelines
Not designed for
- Long-form financial documents (more than 128 tokens per segment — chunk first)
- Non-financial general text (model is domain-specialized)
- High-stakes trading decisions without additional validation
Limitations
- Domain-specific: Trained exclusively on financial text. Performance on general-domain sentiment will degrade.
- Sequence length cap: Maximum 128 tokens per input. Longer documents should be chunked by sentence.
- Label distribution: Financial PhraseBank skews neutral-heavy. Rare strongly negative or positive samples may receive lower confidence.
- English only: Both training datasets are English. No multilingual support.
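One way to respect the 128-token cap is to pack whole sentences greedily into chunks before sending them to the model. The sketch below is a minimal illustration, not part of the released pipeline: the function names are hypothetical, the sentence splitter is naive (it will also split after abbreviations such as "Inc."), and the default token counter simply counts whitespace-separated words, which undercounts subword tokens. Passing a real tokenizer-based counter via `count_tokens` would be more accurate.

```python
import re

MAX_TOKENS = 128  # sequence cap stated in this model card


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_sentence(text, max_tokens=MAX_TOKENS, count_tokens=None):
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    # Assumption: whitespace word count as a rough proxy for token count.
    count = count_tokens or (lambda s: len(s.split()))
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and count(candidate) > max_tokens:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than the cap is kept as its own chunk and would still be truncated by the tokenizer. Each chunk can then be classified separately and the per-chunk labels aggregated in whatever way suits the downstream task.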
Training Framework
- PyTorch with a custom training loop for KD and PKD loss computation
- Hugging Face Transformers for model loading, tokenization, and checkpointing
- Teacher model frozen throughout distillation; only student weights updated
Citation
If you use this model or the distillation pipeline in your work, please cite:
@misc{pkd-albert-finbert,
  author    = {Ha Dang Vu},
  title     = {PKD-ALBERT: Lightweight Financial Sentiment via Patient Knowledge Distillation},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/hadangvu/pkd-albert-student}
}
Related Work
- ProsusAI/finbert — Teacher model
- Patient Knowledge Distillation (Sun et al., 2019) — PKD method
- Financial PhraseBank (Malo et al., 2014) — Evaluation dataset
Built as part of a financial NLP research project exploring efficient model compression for domain-specific sentiment analysis.