Dataset: takala/financial_phrasebank
| Metric | Teacher (FinBERT) | Student (PKD-ALBERT) |
|---|---|---|
| Parameters | 109.5M | 11.7M |
| Model size | 417.7 MB | 44.6 MB |
| Test Accuracy | 97.6% | 97.4% |
| Macro F1 | 0.9696 | 0.9650 |
| Inference (ms/doc) | 1.49 ms | 2.02 ms |
| Accuracy drop (percentage points) | n/a | −0.2 |
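
The trade-off in this table can be made concrete with a little arithmetic. The sketch below only reuses the figures reported above; the dictionary keys and variable names are illustrative and not part of the released code.

```python
# Figures copied from the teacher/student comparison table.
teacher = {"params_m": 109.5, "size_mb": 417.7, "latency_ms": 1.49}
student = {"params_m": 11.7, "size_mb": 44.6, "latency_ms": 2.02}

# How many times smaller the student is, by parameter count and by file size.
param_ratio = teacher["params_m"] / student["params_m"]
size_ratio = teacher["size_mb"] / student["size_mb"]

# How the student's per-document latency compares to the teacher's.
latency_ratio = student["latency_ms"] / teacher["latency_ms"]

print(f"Parameter reduction: {param_ratio:.1f}x")
print(f"Size reduction: {size_ratio:.1f}x")
print(f"Student latency relative to teacher: {latency_ratio:.2f}x")
```

On these numbers the student is roughly 9.4x smaller but about 1.36x slower per document, so the gain reported here is in memory footprint rather than inference speed.
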

| Distillation setting | Value |
|---|---|
| Dataset | Scraped financial news (pseudo-labeled by FinBERT) |
| Train / Val / Test split | 5,587 / 1,197 / 1,198 |
| Epochs | 3 |
| Batch size | 32 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| KD temperature sweep | [2, 5, 9] |
| Alpha (KD loss weight) | 0.3 |
| PKD beta | 0.02 |
| PKD student layers | [2, 4, 8, 12] |
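
To show how the temperature, alpha, and beta settings above could fit together, here is a minimal dependency-free sketch of a Patient Knowledge Distillation style objective for a single example. It assumes the common formulation (1 − alpha) · CE + alpha · T² · KL + beta · patient loss, where the patient loss is the squared distance between L2-normalized hidden states of paired layers. The T² scaling, the layer pairing, and every function name here are assumptions for illustration; the model card does not specify them, nor which temperature from the sweep was selected.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]

def pkd_loss(student_logits, teacher_logits, label,
             student_hidden, teacher_hidden,
             temperature=5.0, alpha=0.3, beta=0.02):
    """Hypothetical PKD-style loss for one example.

    student_hidden / teacher_hidden are lists of vectors, one per paired
    layer (e.g. the student layers [2, 4, 8, 12] listed above, matched to
    teacher layers chosen by the implementer).
    """
    # Hard-label cross-entropy on the student's untempered prediction.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: teacher vs. student distributions at temperature T.
    kd = temperature ** 2 * kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    # Patient term: squared distance between normalized hidden states.
    pt = sum(
        sum((s - t) ** 2 for s, t in zip(l2_normalize(hs), l2_normalize(ht)))
        for hs, ht in zip(student_hidden, teacher_hidden)
    )
    return (1 - alpha) * ce + alpha * kd + beta * pt
```

When the student reproduces the teacher's logits and hidden states exactly, the second and third terms vanish and only the weighted cross-entropy remains, which is a quick sanity check for any real implementation.
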

| Fine-tuning setting | Value |
|---|---|
| Dataset | Financial PhraseBank (100% agreement) |
| Train / Val / Test split | 1,584 / 340 / 340 |
| Epochs | 1 |
| Loss | Cross-entropy |

| Model (FP = Financial PhraseBank) | Params | Size | Test accuracy | Macro F1 | KL divergence (teacher→student) |
|---|---|---|---|---|---|
| Teacher FinBERT | 109.5M | 417.7 MB | 97.6% | 0.9696 | — |
| Fresh → FP (baseline) | 11.7M | 44.6 MB | 77.1% | 0.6126 | 0.359 |
| CE-scraped → FP | 11.7M | 44.6 MB | 95.9% | 0.9392 | 0.230 |
| KD-scraped → FP | 11.7M | 44.6 MB | 96.5% | 0.9514 | 0.156 |
| PKD-scraped → FP (ours) | 11.7M | 44.6 MB | 97.4% | 0.9650 | 0.188 |
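
The baseline row shows accuracy (77.1%) and macro F1 (0.6126) diverging sharply, which is typical when a model neglects a minority class. The sketch below computes both metrics on a small made-up example to illustrate the effect; the labels and predictions are invented for illustration and are not taken from the model's outputs.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: the single negative example is never predicted correctly.
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"]
y_pred = ["neutral"] * 6 + ["positive", "positive", "neutral"] + ["neutral"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}, macro_f1={macro_f1(y_true, y_pred):.3f}")
# accuracy=0.800, macro_f1=0.552
```

Because every class counts equally in the macro average, one missed class pulls the score down far more than it affects accuracy.
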
A live Gradio demo is available on Hugging Face Spaces. Paste any financial headline or sentence and receive a sentiment label with a confidence score.
Example inputs from the held-out test set:
| Input | Prediction | Confidence |
|---|---|---|
| "Charles Schwab price target raised to $121 from $119 at JPMorgan." | ✅ Positive | 93.1% |
| "Costco assumed with a Peer Perform at Wolfe Research." | ➖ Neutral | 92.2% |
| "IBM Explains How AI Models Are Making a Familiar Human Mistake." | ❌ Negative | 90.1% |
The /predict endpoint accepts a POST request with a JSON body containing the raw text and returns a label, a confidence score, and a latency measurement in milliseconds.
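```python
import requests

API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"

# Send one headline to the /predict endpoint as a JSON body.
response = requests.post(
    API_URL,
    json={"text": "Charles Schwab price target raised to $121 from $119 at JPMorgan."},
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())
# Example response:
# {
#   "label": "positive",
#   "confidence": 0.93,
#   "latency_ms": 79.3
# }
```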
- Input: { "text": "..." }, where the text is any financial sentence or headline (max 128 tokens)
- Output: { "label": str, "confidence": float, "latency_ms": float }
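
A client may want to check the response against the documented schema before using it. The following is a minimal sketch using only the standard library; the `Prediction` dataclass and the `parse_prediction` and `predict` helpers are hypothetical names introduced here, and only the URL and the three response fields come from the description above.

```python
import json
import urllib.request
from dataclasses import dataclass

# Endpoint taken from the usage example in this card.
API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"

@dataclass
class Prediction:
    label: str
    confidence: float
    latency_ms: float

def parse_prediction(payload: dict) -> Prediction:
    """Validate a decoded JSON response against the documented output schema."""
    missing = {"label", "confidence", "latency_ms"} - payload.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return Prediction(
        label=str(payload["label"]),
        confidence=float(payload["confidence"]),
        latency_ms=float(payload["latency_ms"]),
    )

def predict(text: str, timeout: float = 10.0) -> Prediction:
    """POST one sentence to the endpoint and return the parsed prediction."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(json.load(response))
```

Calling `predict("Costco assumed with a Peer Perform at Wolfe Research.")` would then return a `Prediction` object, provided the Space is running and reachable.
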
If you use this model or the distillation pipeline in your work, please cite:
```bibtex
@misc{pkd-albert-finbert,
  author    = {Ha Dang Vu},
  title     = {PKD-ALBERT: Lightweight Financial Sentiment via Patient Knowledge Distillation},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/hadangvu/pkd-albert-student}
}
```
Built as part of a financial NLP research project exploring efficient model compression for domain-specific sentiment analysis.
Base model: albert/albert-base-v2