PT-BR Financial Sentiment Analysis
A fine-tuned BERT model for sentiment classification of Brazilian Portuguese financial news. Given a news headline or short text about the Brazilian financial market, the model classifies it as POSITIVE, NEGATIVE, or NEUTRAL.
This model was developed as part of an undergraduate thesis (TCC) analysing sentiment trends in the Brazilian financial market from 2016 to 2025.
Base Model
This model is a fine-tuned version of lucas-leme/FinBERT-PT-BR, a BERT model pre-trained on Brazilian Portuguese financial texts.
Labels
| ID | Label | Description |
|---|---|---|
| 0 | POSITIVE | News with a positive financial outlook or outcome |
| 1 | NEGATIVE | News with a negative financial outlook or outcome |
| 2 | NEUTRAL | News that is neither clearly positive nor negative |
Training Details
- Architecture: BertForSequenceClassification (12 layers, 768 hidden, 12 attention heads)
- Loss function: Label-smoothed cross-entropy (label_smoothing=0.1)
- Epochs: 4
- Learning rate: 7e-6
- Weight decay: 0.03
- Class weighting: Square-root balanced (to handle class imbalance)
- Post-hoc calibration: Additive logit bias per class (POSITIVE: -0.65, NEGATIVE: -0.20, NEUTRAL: 0.00)
- Ensemble: 2-seed ensemble (seeds 789 and 123) used during hyperparameter selection
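The list above does not spell out the formula behind the square-root balanced class weighting. One common reading, assumed here, takes the usual balanced weight N / (K * n_c) and applies a square root, which softens the correction for rare classes. The function name and the class counts below are hypothetical and are not the real label distribution of the training split.

```python
import math


def sqrt_balanced_weights(class_counts):
    """Return one weight per class using sqrt(N / (K * n_c)).

    Assumed formula: the model card only says "square-root balanced".
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [math.sqrt(total / (num_classes * n)) for n in class_counts]


# Hypothetical counts for POSITIVE, NEGATIVE, NEUTRAL (illustration only).
example_counts = [100, 120, 182]
weights = sqrt_balanced_weights(example_counts)
print([round(w, 3) for w in weights])
```

Weights like these could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` together with `label_smoothing=0.1` to obtain a weighted, label-smoothed loss.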
Dataset
- Total labeled examples: 629 Brazilian financial news items (headlines and short summaries)
- Training split: 402 examples
- Calibration split: 101 examples (used for post-hoc bias calibration)
- Holdout split: 126 examples (stratified 20%, seed=2026; never seen during training or calibration)
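The split sizes above are consistent with two successive 20% cuts in which the held-out size is rounded up. The second cut (calibration as 20% of the remainder) is inferred from the numbers and is not stated in this card, so treat the sketch below as a sanity check on the arithmetic rather than the actual splitting code. The function name and the `calib_frac` parameter are hypothetical.

```python
import math


def two_stage_split_sizes(total, holdout_frac=0.2, calib_frac=0.2):
    """Sizes for a holdout cut followed by a calibration cut on the remainder."""
    holdout = math.ceil(total * holdout_frac)       # rounded up
    remainder = total - holdout
    calibration = math.ceil(remainder * calib_frac)  # assumed 20% of the remainder
    train = remainder - calibration
    return train, calibration, holdout


print(two_stage_split_sizes(629))  # (402, 101, 126)
```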
Evaluation
Evaluated on a stratified holdout of 126 examples:
| Model | Accuracy | Macro F1 |
|---|---|---|
| Base (FinBERT-PT-BR) | 34.1% | 0.331 |
| Fine-tuned (this model) | 64.3% | 0.643 |
On this domain-specific holdout, the fine-tuned model improves accuracy by about 30 percentage points (34.1% to 64.3%) and macro F1 by about 0.31 (0.331 to 0.643) over the base model.
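Macro F1 averages the per-class F1 scores with equal weight, so each of the three labels counts the same regardless of how often it appears. A minimal sketch of both metrics follows. The function name is hypothetical, and the toy labels are made up for illustration and are not drawn from the holdout.

```python
def accuracy_and_macro_f1(y_true, y_pred, num_classes=3):
    """Compute accuracy and unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes F1 = 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return accuracy, sum(f1_scores) / num_classes


# Toy example: 0 = POSITIVE, 1 = NEGATIVE, 2 = NEUTRAL
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
acc, macro_f1 = accuracy_and_macro_f1(toy_true, toy_pred)
print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f}")
```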
Usage
from transformers import AutoTokenizer, BertForSequenceClassification
import torch
model_id = "lucasalmda/pt-br-financial-sentimental-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()
id2label = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
# Optional: apply the same logit biases used during calibration
BIASES = {"POSITIVE": -0.65, "NEGATIVE": -0.20, "NEUTRAL": 0.00}
bias_tensor = torch.tensor([BIASES["POSITIVE"], BIASES["NEGATIVE"], BIASES["NEUTRAL"]])
text = "Ibovespa fecha em alta com expectativa de corte na taxa Selic"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
calibrated_logits = logits + bias_tensor
pred = calibrated_logits.argmax(dim=-1).item()
print(id2label[pred]) # e.g. "POSITIVE"
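To see what the optional biases do, the sketch below applies them to a made-up logit vector without loading the model. Because POSITIVE and NEGATIVE receive negative offsets and NEUTRAL receives none, a prediction with a small raw margin can move to NEUTRAL. The helper name and the example logits are hypothetical.

```python
import math

ID2LABEL = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
BIASES = [-0.65, -0.20, 0.00]  # same order as ID2LABEL


def predict_from_logits(logits, biases=None):
    """Return (label, probabilities) after optionally adding per-class biases."""
    if biases is not None:
        logits = [x + b for x, b in zip(logits, biases)]
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs


raw_logits = [1.0, 0.2, 0.6]  # made-up values, not real model output
print(predict_from_logits(raw_logits)[0])          # POSITIVE
print(predict_from_logits(raw_logits, BIASES)[0])  # NEUTRAL
```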
Limitations
- Trained on a small labeled dataset (629 examples in total, 402 of them for training), so performance on rare or unusual headlines may be unreliable.
- Optimised for Brazilian Portuguese financial news. It is not suited for general-purpose sentiment analysis or other languages.
- The post-hoc calibration biases were selected on a held-out calibration split and may not generalise perfectly to all domains within Brazilian finance.
- Lexically ambiguous headlines (e.g. "Selic cai" combined with negative macro context) remain the most common error pattern.
Citation
If you use this model, please cite the base model:
lucas-leme/FinBERT-PT-BR: https://huggingface.co/lucas-leme/FinBERT-PT-BR