PT-BR Financial Sentiment Analysis

A fine-tuned BERT model for sentiment classification of Brazilian Portuguese financial news. Given a news headline or short text about the Brazilian financial market, the model classifies it as POSITIVE, NEGATIVE, or NEUTRAL.

This model was developed as part of an undergraduate thesis (TCC) analysing sentiment trends in the Brazilian financial market from 2016 to 2025.


Base Model

This model is a fine-tuned version of lucas-leme/FinBERT-PT-BR, a BERT model pre-trained on Brazilian Portuguese financial texts.


Labels

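| ID | Label | Description |
|----|----------|----------------------------------------------------|
| 0 | POSITIVE | News with a positive financial outlook or outcome |
| 1 | NEGATIVE | News with a negative financial outlook or outcome |
| 2 | NEUTRAL | News that is neither clearly positive nor negative |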

Training Details

  • Architecture: BertForSequenceClassification (12 layers, 768 hidden, 12 attention heads)
  • Loss function: Label-smoothed cross-entropy (label_smoothing=0.1)
  • Epochs: 4
  • Learning rate: 7e-6
  • Weight decay: 0.03
  • Class weighting: Square-root balanced (to handle class imbalance)
  • Post-hoc calibration: Additive logit bias per class (POSITIVE: -0.65, NEGATIVE: -0.20, NEUTRAL: 0.00)
  • Ensemble: 2-seed ensemble (seeds 789 and 123) used during hyperparameter selection
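
The class weighting and loss listed above can be illustrated with a small standard-library sketch. It rests on assumptions: "square-root balanced" is read here as the square root of the usual balanced weight N / (K * n_c), the class counts are hypothetical (the real distribution is not stated), and the loss follows one common formulation of weighted, label-smoothed cross-entropy. The actual training code may differ.

```python
import math

def sqrt_balanced_weights(class_counts):
    """Square root of the usual 'balanced' weight N / (K * n_c) per class (assumed reading)."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [math.sqrt(total / (num_classes * n)) for n in class_counts]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def smoothed_weighted_ce(logits, target, weights, label_smoothing=0.1):
    """Per-example cross-entropy against a smoothed target distribution."""
    k = len(logits)
    log_probs = log_softmax(logits)
    loss = 0.0
    for c in range(k):
        # Smoothed target: eps/K on every class, plus (1 - eps) on the true class.
        q = label_smoothing / k + ((1.0 - label_smoothing) if c == target else 0.0)
        loss -= weights[c] * q * log_probs[c]
    return loss

# Hypothetical training counts for (POSITIVE, NEGATIVE, NEUTRAL); not taken from the dataset.
counts = [100, 150, 152]
weights = sqrt_balanced_weights(counts)
example_loss = smoothed_weighted_ce([2.0, 0.5, -1.0], target=0, weights=weights)
print(weights, example_loss)
```

Taking the square root softens the correction: rare classes are still up-weighted, but less aggressively than with fully balanced weights.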

Dataset

  • Total labeled examples: 629 Brazilian financial news items (headlines and short summaries)
  • Training split: 402 examples
  • Calibration split: 101 examples (used for post-hoc bias calibration)
  • Holdout split: 126 examples (stratified 20%, seed=2026; never seen during training or calibration)
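
A two-stage stratified split like the one described above could be sketched as follows, using only the standard library. The label distribution, the 20% calibration fraction, and the reuse of seed 2026 for the second stage are assumptions made for illustration; `stratified_split` is a hypothetical helper, not the author's code.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed):
    """Return (rest_idx, test_idx), keeping each class's share roughly equal in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    rest, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        rest.extend(indices[n_test:])
    return sorted(rest), sorted(test)

# Hypothetical label distribution over 629 items; the real class counts are not stated.
labels = [0] * 200 + [1] * 210 + [2] * 219

# Stage 1: hold out about 20% of all examples.
dev_idx, holdout_idx = stratified_split(labels, 0.20, seed=2026)

# Stage 2: carve a calibration split out of the remainder (fraction and seed assumed).
dev_labels = [labels[i] for i in dev_idx]
train_pos, calib_pos = stratified_split(dev_labels, 0.20, seed=2026)
train_idx = [dev_idx[i] for i in train_pos]
calib_idx = [dev_idx[i] for i in calib_pos]

print(len(train_idx), len(calib_idx), len(holdout_idx))
```

Splitting off the holdout first means neither fine-tuning nor bias calibration can touch the examples used for the final evaluation.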

Evaluation

Evaluated on a stratified holdout of 126 examples:

| Model | Accuracy | Macro F1 |
|-------------------------|----------|----------|
| Base (FinBERT-PT-BR) | 34.1% | 0.331 |
| Fine-tuned (this model) | 64.3% | 0.643 |

On this domain-specific holdout, fine-tuning improves accuracy by about 30 percentage points (34.1% to 64.3%) and macro F1 by about 0.31 (0.331 to 0.643) over the base model.
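
For readers unfamiliar with the two metrics, here is a minimal standard-library sketch of how accuracy and macro F1 are computed for three classes. The toy labels are illustrative only and are not the holdout predictions; libraries such as scikit-learn provide equivalent functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy labels (0=POSITIVE, 1=NEGATIVE, 2=NEUTRAL), for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because macro F1 averages the classes with equal weight, it penalises a model that does well only on the most frequent class, which is why it is reported alongside accuracy here.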


Usage

from transformers import AutoTokenizer, BertForSequenceClassification
import torch

model_id = "lucasalmda/pt-br-financial-sentimental-analysis"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()

id2label = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}

# Optional: apply the same logit biases used during calibration
BIASES = {"POSITIVE": -0.65, "NEGATIVE": -0.20, "NEUTRAL": 0.00}
bias_tensor = torch.tensor([BIASES["POSITIVE"], BIASES["NEGATIVE"], BIASES["NEUTRAL"]])

text = "Ibovespa fecha em alta com expectativa de corte na taxa Selic"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    calibrated_logits = logits + bias_tensor
    pred = calibrated_logits.argmax(dim=-1).item()

print(id2label[pred])  # e.g. "POSITIVE"
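
To see what the optional biases do, the sketch below applies them to a set of made-up logits using only the standard library. The logit values are hypothetical, not real model output, and `predict` is an illustrative helper; the bias values and label order are the ones given above.

```python
import math

ID2LABEL = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
BIASES = [-0.65, -0.20, 0.00]  # same order as ID2LABEL

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, biases=None):
    """Return (label, probabilities), optionally adding per-class logit biases first."""
    if biases is not None:
        logits = [x + b for x, b in zip(logits, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs

# Hypothetical raw logits for a borderline headline (not real model output).
raw_logits = [1.10, 0.20, 0.80]
label_raw, _ = predict(raw_logits)
label_cal, probs_cal = predict(raw_logits, BIASES)
print(label_raw, label_cal)
```

With these made-up logits the uncalibrated prediction is POSITIVE, while the calibrated one becomes NEUTRAL: the negative POSITIVE bias only changes the outcome when the top two classes are close.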

Limitations

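  • Trained on a small labeled dataset (629 examples, of which 126 form the holdout), so the reported metrics carry considerable uncertainty and performance on unusual or out-of-distribution headlines may be noticeably worse.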
  • Optimised for Brazilian Portuguese financial news. It is not suited for general-purpose sentiment analysis or other languages.
  • The post-hoc calibration biases were selected on a held-out calibration split and may not generalise perfectly to all domains within Brazilian finance.
  • Lexically ambiguous headlines (e.g. "Selic cai" combined with negative macro context) remain the most common error pattern.

Citation

If you use this model, please cite the base model:

lucas-leme/FinBERT-PT-BR: https://huggingface.co/lucas-leme/FinBERT-PT-BR