---
language:
- en
tags:
- text-classification
- financial-sentiment
- knowledge-distillation
- patient-knowledge-distillation
- albert
- finbert
- finance
- nlp
- distillation
license: mit
datasets:
- financial_phrasebank
base_model: albert-base-v2
model-index:
- name: pkd-albert-student
  results:
  - task:
      type: text-classification
      name: Financial Sentiment Analysis
    dataset:
      name: Financial PhraseBank (100% agreement)
      type: financial_phrasebank
      split: test
    metrics:
    - type: accuracy
      value: 0.9735
      name: Test Accuracy
    - type: f1
      value: 0.9650
      name: Macro F1
---
# PKD-ALBERT: Lightweight Financial Sentiment Classifier via Patient Knowledge Distillation
---
## Model Summary
**PKD-ALBERT** is a lightweight financial sentiment classifier distilled from
[ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) using a two-stage
**Patient Knowledge Distillation (PKD)** pipeline.
It classifies financial text (headlines, earnings excerpts, news snippets) into
**positive**, **neutral**, or **negative** sentiment, achieving **97.4% accuracy**
and **0.965 Macro-F1** on Financial PhraseBank while using **roughly 9× fewer parameters**
(11.7M vs. 109.5M) than the teacher model.

| Metric | Teacher (FinBERT) | Student (PKD-ALBERT) |
|---|---|---|
| Parameters | 109.5M | 11.7M |
| Model size | 417.7 MB | 44.6 MB |
| Test Accuracy | 97.6% | 97.4% |
| Macro F1 | 0.9696 | 0.9650 |
| Inference latency (ms/doc) | 1.49 | 2.02 |
| Accuracy drop | n/a | -0.2 pp |

> The student retains **99.8% of teacher accuracy** at **89% less disk space**.
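
The headline ratios can be re-derived from the table. The short check below is a sketch that uses only the numbers reported above; the variable names are illustrative. Note that the parameter ratio works out to slightly under 10×.

```python
# Values copied from the teacher/student comparison table.
teacher_params_m, student_params_m = 109.5, 11.7
teacher_size_mb, student_size_mb = 417.7, 44.6
teacher_acc, student_acc = 97.6, 97.4

param_ratio = teacher_params_m / student_params_m       # how many times fewer parameters
size_reduction = 1 - student_size_mb / teacher_size_mb  # fraction of disk space saved
acc_retained = student_acc / teacher_acc                # share of teacher accuracy kept

print(f"{param_ratio:.1f}x fewer parameters")            # 9.4x fewer parameters
print(f"{size_reduction:.1%} smaller on disk")           # 89.3% smaller on disk
print(f"{acc_retained:.1%} of teacher accuracy kept")    # 99.8% of teacher accuracy kept
```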
---
## Try It Live
A live Gradio demo is available on Hugging Face Spaces. Paste any financial
headline or sentence and receive a sentiment label with a confidence score.
**[Open Live Demo →](https://huggingface.co/spaces/hadangvu/pkd-sentiment-api)**

**Example inputs from the held-out test set:**

| Input | Prediction | Confidence |
|---|---|---|
| "Charles Schwab price target raised to $121 from $119 at JPMorgan." | Positive | 93.1% |
| "Costco assumed with a Peer Perform at Wolfe Research." | Neutral | 92.2% |
| "IBM Explains How AI Models Are Making a Familiar Human Mistake." | Negative | 90.1% |

---
## API Usage
The `/predict` endpoint accepts raw text and returns a label and confidence score.
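```python
import requests

API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"

# Send one sentence to the /predict endpoint as a JSON body.
response = requests.post(
    API_URL,
    json={"text": "Charles Schwab price target raised to $121 from $119 at JPMorgan."},
    timeout=30,  # avoid hanging indefinitely on a slow response
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())
# Example response:
# {
#   "label": "positive",
#   "confidence": 0.93,
#   "latency_ms": 79.3
# }
```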
**Input:** `{ "text": "..." }`, where `text` is any financial sentence or headline (max 128 tokens)

**Output:** `{ "label": str, "confidence": float, "latency_ms": float }`

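
When the endpoint is called from a pipeline, it can help to validate the response against the documented schema before using it. The helper below is a minimal sketch: `parse_prediction` and the `Prediction` dataclass are hypothetical names that are not part of the API, and the set of allowed labels assumes the endpoint returns lowercase versions of the three classes listed in the Model Summary.

```python
from dataclasses import dataclass

# Assumption: labels are the lowercase names of the three documented classes.
ALLOWED_LABELS = {"positive", "neutral", "negative"}


@dataclass(frozen=True)
class Prediction:
    label: str
    confidence: float
    latency_ms: float


def parse_prediction(payload: dict) -> Prediction:
    """Validate a /predict response body and convert it to a Prediction."""
    missing = {"label", "confidence", "latency_ms"} - set(payload)
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    label = str(payload["label"]).lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    confidence = float(payload["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Prediction(label, confidence, float(payload["latency_ms"]))


# Example using the response shown above (no network call needed).
result = parse_prediction({"label": "positive", "confidence": 0.93, "latency_ms": 79.3})
print(result.label, result.confidence)
```

In practice the argument would be `response.json()` from the request above.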
---
## Distillation Approach
This model was trained using a **two-stage Patient Knowledge Distillation** strategy.
### Stage 1: Distillation on Pseudo-Labeled Financial News
The student (ALBERT-base) was first trained on a corpus of scraped financial
news pseudo-labeled by the FinBERT teacher. The training loss combined KL
divergence against the teacher's softened output distribution with alignment of
intermediate hidden layers.

| Parameter | Value |
|---|---|
| Dataset | Scraped financial news (pseudo-labeled by FinBERT) |
| Train / Val / Test split | 5,587 / 1,197 / 1,198 |
| Epochs | 3 |
| Batch size | 32 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| KD temperature sweep | [2, 5, 9] |
| Alpha (KD loss weight) | 0.3 |
| PKD beta | 0.02 |
| PKD student layers | [2, 4, 8, 12] |

### Stage 2: Fine-tuning on Financial PhraseBank
The distilled student was then fine-tuned on the high-quality
**Financial PhraseBank (100% annotator agreement)** subset using standard
cross-entropy loss to align the student with gold-label financial sentiment.

| Parameter | Value |
|---|---|
| Dataset | Financial PhraseBank (100% agreement) |
| Train / Val / Test split | 1,584 / 340 / 340 |
| Epochs | 1 |
| Loss | Cross-entropy |

### Loss Function
The Stage 1 total loss combines:
- KL Divergence between teacher soft targets and student logits (soft label transfer)
- Patient KD alignment between intermediate ALBERT and FinBERT hidden layers
- Cross-entropy on the hard labels, with alpha controlling the balance between this term and the soft KD loss
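
The components listed above can be sketched for a single example as follows. This is an illustrative, framework-free version written so the arithmetic is visible; the actual training used a custom PyTorch loop. Several details are assumptions rather than facts from this card: the weighting `(1 - alpha) * CE + alpha * KD + beta * PT` follows the usual Patient KD formulation, the patient term compares L2-normalized hidden vectors, and the function names and toy inputs are hypothetical. The defaults `alpha=0.3` and `beta=0.02` come from the Stage 1 table, and `temperature=5` is one of the swept values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def patient_loss(student_hidden, teacher_hidden):
    """Squared distance between normalized hidden vectors, summed over aligned layers."""
    total = 0.0
    for hs, ht in zip(student_hidden, teacher_hidden):
        hs, ht = l2_normalize(hs), l2_normalize(ht)
        total += sum((a - b) ** 2 for a, b in zip(hs, ht))
    return total


def pkd_loss(student_logits, teacher_logits, hard_label,
             student_hidden, teacher_hidden,
             alpha=0.3, beta=0.02, temperature=5.0):
    """Assumed form: (1 - alpha) * CE + alpha * KD + beta * PT."""
    ce = -math.log(softmax(student_logits)[hard_label])
    kd = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    pt = patient_loss(student_hidden, teacher_hidden)
    return (1 - alpha) * ce + alpha * kd + beta * pt


# Toy example: 3 classes, two aligned layers with 4-dimensional hidden vectors.
loss = pkd_loss(
    student_logits=[1.0, 0.2, -0.5],
    teacher_logits=[2.0, 0.1, -1.0],
    hard_label=0,
    student_hidden=[[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0]],
    teacher_hidden=[[0.2, 0.2, -0.1, 0.6], [0.5, -0.2, 0.1, 0.1]],
)
print(f"total loss: {loss:.4f}")
```

When the student matches the teacher exactly, the KD and patient terms vanish and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this loss.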
---
## Full Performance Comparison
The table below compares all training strategies evaluated against the same
Financial PhraseBank test set (340 samples):

| Model | Params | Size | Test Acc | Macro F1 | KL (teacher→student) |
|---|---|---|---|---|---|
| Teacher FinBERT | 109.5M | 417.7 MB | 97.6% | 0.9696 | n/a |
| Fresh → FP (baseline) | 11.7M | 44.6 MB | 77.1% | 0.6126 | 0.359 |
| CE-scraped → FP | 11.7M | 44.6 MB | 95.9% | 0.9392 | 0.230 |
| KD-scraped → FP | 11.7M | 44.6 MB | 96.5% | 0.9514 | 0.156 |
| PKD-scraped → FP (ours) | 11.7M | 44.6 MB | 97.4% | 0.9650 | 0.188 |

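**Key takeaway:** Patient KD achieves the highest accuracy and Macro F1 among all
student variants and comes within 0.005 Macro F1 (0.2 accuracy points) of the
teacher, which suggests that intermediate layer alignment improves distillation
quality beyond standard KD on this test set. Standard KD does, however, reach the
lowest teacher-student KL (0.156 vs. 0.188).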
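
Macro F1 averages the per-class F1 scores with equal weight, so errors on a small class move it more than they move accuracy. The sketch below computes it from scratch; the labels are made-up toy data chosen to show that effect and are not drawn from the test set.

```python
def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, neutral-heavy data: one of the two positive examples is misclassified.
y_true = ["neutral"] * 6 + ["positive", "positive"] + ["negative", "negative"]
y_pred = ["neutral"] * 6 + ["positive", "neutral"] + ["negative", "negative"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.3f}")                   # accuracy: 0.900
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")   # macro F1: 0.863
```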
---
## Intended Use
### Designed for
- Classifying sentiment in financial news headlines and short excerpts
- Lightweight inference in resource-constrained environments (edge, serverless)
- Research into compact models for financial NLP tasks
- Downstream integration into financial analytics pipelines
### Not designed for
- Long-form financial documents (inputs are capped at 128 tokens per segment; chunk first)
- Non-financial general text (model is domain-specialized)
- High-stakes trading decisions without additional validation
---
## Limitations
- Domain-specific: Trained exclusively on financial text. Performance on general-domain sentiment will degrade.
- Sequence length cap: Maximum 128 tokens per input. Longer documents should be chunked by sentence.
- Label distribution: Financial PhraseBank skews neutral-heavy. Rare strongly negative or positive samples may receive lower confidence.
- English only: Both training datasets are English. No multilingual support.
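
One way to respect the 128-token cap is to split a long document into sentences and pack consecutive sentences into chunks that stay under the limit. The sketch below is illustrative only: `chunk_by_sentence` is a hypothetical helper, the regex sentence splitter is naive (it will split after abbreviations such as "Inc."), and whitespace word counts stand in for real token counts. A subword tokenizer usually produces more tokens than words, so a production version should pass a `count_tokens` function backed by the model's tokenizer.

```python
import re


def chunk_by_sentence(text, max_tokens=128, count_tokens=lambda s: len(s.split())):
    """Pack consecutive sentences into chunks of at most max_tokens each."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the cap.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)  # a single over-long sentence becomes its own chunk
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = (
    "Revenue rose 12% year over year. "
    "Operating margin fell slightly. "
    "Guidance was left unchanged."
)
# A small cap is used here only to make the packing visible.
chunks = chunk_by_sentence(doc, max_tokens=8)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be classified separately; how the per-chunk labels are combined into a document-level sentiment is left to the downstream application.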
---
## Training Framework
- PyTorch with a custom training loop for KD and PKD loss computation
- Hugging Face Transformers for model loading, tokenization, and checkpointing
- Teacher model frozen throughout distillation; only student weights updated
---
## Citation
If you use this model or the distillation pipeline in your work, please cite:
```bibtex
@misc{pkd-albert-finbert,
author = {Ha Dang Vu},
title = {PKD-ALBERT: Lightweight Financial Sentiment via Patient Knowledge Distillation},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/hadangvu/pkd-albert-student}
}
```
---
Built as part of a financial NLP research project exploring efficient model compression for domain-specific sentiment analysis.