---
language:
- en
tags:
- text-classification
- financial-sentiment
- knowledge-distillation
- patient-knowledge-distillation
- albert
- finbert
- finance
- nlp
- distillation
license: mit
datasets:
- financial_phrasebank
base_model: albert-base-v2
model-index:
- name: pkd-albert-student
  results:
  - task:
      type: text-classification
      name: Financial Sentiment Analysis
    dataset:
      name: Financial PhraseBank (100% agreement)
      type: financial_phrasebank
      split: test
    metrics:
    - type: accuracy
      value: 0.9735
      name: Test Accuracy
    - type: f1
      value: 0.9650
      name: Macro F1
---

# PKD-ALBERT: Lightweight Financial Sentiment Classifier via Patient Knowledge Distillation

[![Demo on Spaces](https://img.shields.io/badge/🤗%20Spaces-Live%20Demo-blue)](https://huggingface.co/spaces/hadangvu/pkd-sentiment-api)
[![API](https://img.shields.io/badge/API-/predict-green)](https://huggingface.co/spaces/hadangvu/pkd-sentiment-api)

---

## Model Summary

**PKD-ALBERT** is a lightweight financial sentiment classifier distilled from [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) using a two-stage **Patient Knowledge Distillation (PKD)** pipeline. It classifies financial text (headlines, earnings excerpts, news snippets) into **positive**, **neutral**, or **negative** sentiment, achieving **97.4% accuracy** and **0.965 Macro-F1** on Financial PhraseBank while using **~10× fewer parameters** than the teacher model.


| | Teacher (FinBERT) | Student (PKD-ALBERT) |
|---|---|---|
| Parameters | 109.5M | 11.7M |
| Model size | 417.7 MB | 44.6 MB |
| Test Accuracy | 97.6% | 97.4% |
| Macro F1 | 0.9696 | 0.9650 |
| Inference (ms/doc) | 1.49 ms | 2.02 ms |
| Accuracy drop | n/a | −0.2% |

> The student retains **99.8% of teacher accuracy** at **89% less disk space**.

---

## Try It Live

A live Gradio demo is available on Hugging Face Spaces. Paste any financial headline or sentence and receive a sentiment label with a confidence score.

👉 **[Open Live Demo →](https://huggingface.co/spaces/hadangvu/pkd-sentiment-api)**

**Example inputs from the held-out test set:**


| Input | Prediction | Confidence |
|---|---|---|
| "Charles Schwab price target raised to $121 from $119 at JPMorgan." | ✅ Positive | 93.1% |
| "Costco assumed with a Peer Perform at Wolfe Research." | ➖ Neutral | 92.2% |
| "IBM Explains How AI Models Are Making a Familiar Human Mistake." | ❌ Negative | 90.1% |


---

## API Usage

The `/predict` endpoint accepts raw text and returns a label and confidence score.

```python
import requests

API_URL = "https://hadangvu-pkd-sentiment-api.hf.space/predict"

# Send a single sentence to the /predict endpoint as a JSON payload.
response = requests.post(API_URL, json={
    "text": "Charles Schwab price target raised to $121 from $119 at JPMorgan."
})

print(response.json())
# {
#   "label": "positive",
#   "confidence": 0.93,
#   "latency_ms": 79.3
# }
```

**Input:** `{ "text": "..." }`, where the text is any financial sentence or headline (max 128 tokens)

**Output:** `{ "label": str, "confidence": float, "latency_ms": float }`

---

## Distillation Approach

This model was trained using a **two-stage Patient Knowledge Distillation** strategy.

### Stage 1: Distillation on Pseudo-Labeled Financial News

The student (ALBERT-base) was first trained on a corpus of roughly 8,000 scraped financial news items pseudo-labeled by the FinBERT teacher, using a combined loss of KL divergence against the teacher's temperature-softened outputs and intermediate-layer alignment.


| Parameter | Value |
|---|---|
| Dataset | Scraped financial news (pseudo-labeled by FinBERT) |
| Train / Val / Test split | 5,587 / 1,197 / 1,198 |
| Epochs | 3 |
| Batch size | 32 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| KD temperature sweep | [2, 5, 9] |
| Alpha (KD loss weight) | 0.3 |
| PKD beta | 0.02 |
| PKD student layers | [2, 4, 8, 12] |

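
To make the roles of the temperature, alpha, and PKD beta concrete, here is a minimal pure-Python sketch of a PKD-style objective for a single example. It is not the training code behind this model. Several pieces are assumptions taken from the original PKD formulation rather than from this card: the hard-label cross-entropy term weighted by `1 - alpha`, the `T^2` scaling of the KD term, the use of L2-normalized hidden vectors in the layer-alignment term, and how student layers are paired with teacher layers. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions.

    Assumption: scaled by T^2, a common convention in distillation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


def cross_entropy(student_logits, label_id):
    """Cross-entropy of the student against a hard (pseudo-)label."""
    return -math.log(softmax(student_logits)[label_id])


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def patient_loss(student_hidden, teacher_hidden):
    """Mean squared distance between normalized hidden vectors of paired layers.

    Each argument is a list with one vector per aligned layer.
    """
    total = 0.0
    for s_vec, t_vec in zip(student_hidden, teacher_hidden):
        s_n, t_n = l2_normalize(s_vec), l2_normalize(t_vec)
        total += sum((a - b) ** 2 for a, b in zip(s_n, t_n))
    return total / len(student_hidden)


def stage1_loss(student_logits, teacher_logits, label_id,
                student_hidden, teacher_hidden,
                temperature=5.0, alpha=0.3, beta=0.02):
    """Assumed combination: (1 - alpha) * CE + alpha * KD + beta * PT."""
    ce = cross_entropy(student_logits, label_id)
    kd = kd_loss(student_logits, teacher_logits, temperature)
    pt = patient_loss(student_hidden, teacher_hidden)
    return (1.0 - alpha) * ce + alpha * kd + beta * pt


if __name__ == "__main__":
    # Made-up logits and tiny 2-d "hidden states" for two aligned layers.
    loss = stage1_loss(
        student_logits=[0.2, 0.1, 1.5],
        teacher_logits=[0.0, 0.3, 2.0],
        label_id=2,
        student_hidden=[[0.5, 0.1], [0.2, 0.9]],
        teacher_hidden=[[0.4, 0.2], [0.1, 1.0]],
    )
    print(f"stage 1 loss: {loss:.4f}")
```

With `beta = 0.02`, the layer-alignment term acts as a light regularizer next to the output-level terms, and `alpha = 0.3` keeps most of the output-level weight away from the soft-target term in this assumed formulation.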

### Stage 2: Fine-tuning on Financial PhraseBank

The distilled student was then fine-tuned on the high-quality **Financial PhraseBank (100% annotator agreement)** subset using standard cross-entropy loss to align the student with gold-label financial sentiment.

| Parameter | Value |
|---|---|
| Dataset | Financial PhraseBank (100% agreement) |
| Train / Val / Test split | 1,584 / 340 / 340 |
| Epochs | 1 |
| Loss | Cross-entropy |

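
The split sizes reported for both stages are consistent with a 70/15/15 partition. The short check below reproduces the counts with integer arithmetic. It only verifies the arithmetic; whether the data was shuffled, stratified, or split with this exact procedure is not stated in this card, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_examples, train_pct=70):
    """Return (train, val, test) sizes for a train_pct / remainder-halved split."""
    train = n_examples * train_pct // 100
    val = (n_examples - train) // 2
    test = n_examples - train - val  # test absorbs any odd leftover example
    return train, val, test


if __name__ == "__main__":
    print(split_sizes(5587 + 1197 + 1198))  # Stage 1 total: (5587, 1197, 1198)
    print(split_sizes(1584 + 340 + 340))    # Stage 2 total: (1584, 340, 340)
```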

### Loss Function

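The Stage 1 total loss combines a KL-divergence distillation term on the teacher's temperature-softened outputs (weighted by alpha) with an intermediate-layer alignment term (weighted by the PKD beta).

---

## Full Performance Comparison

The table below compares all training strategies, evaluated on the same Financial PhraseBank test set (340 samples):
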

| Model | Params | Size | Test Acc | Macro F1 | KL (teacher→student) |
|---|---|---|---|---|---|
| Teacher FinBERT | 109.5M | 417.7 MB | 97.6% | 0.9696 | n/a |
| Fresh → FP (baseline) | 11.7M | 44.6 MB | 77.1% | 0.6126 | 0.359 |
| CE-scraped → FP | 11.7M | 44.6 MB | 95.9% | 0.9392 | 0.230 |
| KD-scraped → FP | 11.7M | 44.6 MB | 96.5% | 0.9514 | 0.156 |
| PKD-scraped → FP (ours) | 11.7M | 44.6 MB | 97.4% | 0.9650 | 0.188 |

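
Accuracy and Macro F1 can diverge sharply in the table above (for example 77.1% accuracy against 0.6126 Macro F1 for the baseline), because Macro F1 averages per-class F1 scores without weighting by class frequency. The sketch below uses the standard definition on made-up labels; it is not the evaluation script used for this card, and the toy data does not come from Financial PhraseBank.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["negative", "neutral", "positive"]
    # Toy, imbalanced data: a model that always answers "neutral".
    y_true = ["neutral"] * 8 + ["positive", "negative"]
    y_pred = ["neutral"] * 10

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy = {accuracy:.3f}")                          # accuracy = 0.800
    print(f"macro F1 = {macro_f1(y_true, y_pred, labels):.3f}")  # macro F1 = 0.296
```

A model that ignores the minority classes keeps a high accuracy on imbalanced data but is penalized heavily in Macro F1, which is why both metrics are reported.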

**Key takeaway:** Patient KD achieves the highest accuracy and Macro F1 among all student variants (0.9650 Macro F1 vs. 0.9514 for standard KD) and comes within 0.005 Macro F1 of the teacher. This suggests that intermediate-layer alignment improves distillation quality beyond standard KD.

---

## Intended Use

### Designed for

### Not designed for

---

## Limitations

---

## Training Framework

---

## Citation

If you use this model or the distillation pipeline in your work, please cite:

```bibtex
@misc{pkd-albert-finbert,
  author    = {Ha Dang Vu},
  title     = {PKD-ALBERT: Lightweight Financial Sentiment via Patient Knowledge Distillation},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/hadangvu/pkd-albert-student}
}
```

## Related Work

---

Built as part of a financial NLP research project exploring efficient model compression for domain-specific sentiment analysis.