PT-BR Financial Sentiment Analysis

A fine-tuned BERT model for sentiment classification of Brazilian Portuguese financial news. Given a news headline or short text about the Brazilian financial market, the model classifies it as POSITIVE, NEGATIVE, or NEUTRAL.

This model was developed as part of an undergraduate thesis (TCC) analysing sentiment trends in the Brazilian financial market from 2016 to 2025.

Base Model

This model is a fine-tuned version of lucas-leme/FinBERT-PT-BR, a BERT model pre-trained on Brazilian Portuguese financial texts.

Labels

ID	Label	Description
0	POSITIVE	News with a positive financial outlook or outcome
1	NEGATIVE	News with a negative financial outlook or outcome
2	NEUTRAL	News that is neither clearly positive nor negative

Training Details

Architecture: BertForSequenceClassification (12 layers, 768 hidden, 12 attention heads)
Loss function: Label-smoothed cross-entropy (label_smoothing=0.1)
Epochs: 4
Learning rate: 7e-6
Weight decay: 0.03
Class weighting: Square-root balanced (to handle class imbalance)
Post-hoc calibration: Additive logit bias per class (POSITIVE: -0.65, NEGATIVE: -0.20, NEUTRAL: 0.00)
Ensemble: 2-seed ensemble (seeds 789 and 123) used during hyperparameter selection
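
The class weighting and loss settings above can be sketched in a few lines. This is only an illustration under stated assumptions: the card does not give the exact weighting formula or the per-class counts, so the sketch assumes "square-root balanced" means the square root of the common balanced weight N / (K * n_c), and the counts below are hypothetical. The helper names `sqrt_balanced_weights` and `smoothed_cross_entropy` are likewise made up for this example.

```python
import math

# Hypothetical class counts, for illustration only; the card does not
# report the per-class distribution of the 402 training examples.
counts = {"POSITIVE": 120, "NEGATIVE": 100, "NEUTRAL": 182}


def sqrt_balanced_weights(counts):
    """Assumed formula: sqrt(N / (K * n_c)) for each class c."""
    total = sum(counts.values())
    k = len(counts)
    return {label: math.sqrt(total / (k * n)) for label, n in counts.items()}


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Label-smoothed cross-entropy for a single example (unweighted)."""
    k = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    # Smoothed target: smoothing / k on every class, plus
    # (1 - smoothing) extra mass on the true class.
    q = [smoothing / k + (1.0 - smoothing) * (i == target) for i in range(k)]
    return -sum(qi * lp for qi, lp in zip(q, log_probs))


weights = sqrt_balanced_weights(counts)
loss = smoothed_cross_entropy([2.0, 0.1, -0.5], target=0)
print(weights)
print(loss)
```

Taking the square root softens the correction, so rare classes are up-weighted less aggressively than with fully balanced weights. In PyTorch, `torch.nn.CrossEntropyLoss` accepts both a `weight` tensor and a `label_smoothing` argument, which is one way such a setup could be expressed.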

Dataset

Total labeled examples: 629 Brazilian financial news items (headlines and short summaries)
Training split: 402 examples
Calibration split: 101 examples (used for post-hoc bias calibration)
Holdout split: 126 examples (stratified 20%, seed=2026; never seen during training or calibration)
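
A three-way split of this shape could be produced as in the sketch below. It is not the author's script: the class mix is synthetic, the helper `stratified_split` is hypothetical, and stratifying the calibration split and reusing seed 2026 for it are assumptions (the card only states that the holdout is stratified with seed=2026).

```python
import random
from collections import defaultdict


def stratified_split(labels, fraction, seed):
    """Return (kept, held_out) index lists, holding out `fraction` of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept, held_out = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_held = round(len(indices) * fraction)
        held_out.extend(indices[:n_held])
        kept.extend(indices[n_held:])
    return sorted(kept), sorted(held_out)


# Synthetic labels standing in for the 629 annotated items (hypothetical mix).
labels = [0] * 190 + [1] * 160 + [2] * 279

# Step 1: carve out the stratified 20% holdout.
rest, holdout = stratified_split(labels, fraction=0.20, seed=2026)

# Step 2: split the remainder into training and calibration sets.
rest_labels = [labels[i] for i in rest]
train_pos, calib_pos = stratified_split(rest_labels, fraction=0.20, seed=2026)
train = [rest[i] for i in train_pos]
calib = [rest[i] for i in calib_pos]

print(len(train), len(calib), len(holdout))
```

Splitting off the holdout first keeps it isolated from every later decision, including the choice of calibration biases.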

Evaluation

Evaluated on a stratified holdout of 126 examples:

Model	Accuracy	Macro F1
Base (`FinBERT-PT-BR`)	34.1%	0.331
Fine-tuned (this model)	64.3%	0.643

On this domain-specific holdout, fine-tuning raises accuracy by about 30 percentage points (34.1% to 64.3%) and macro F1 by about 0.31 (0.331 to 0.643) relative to the base model.
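
For readers unfamiliar with the two metrics in the table, the following sketch shows how accuracy and macro F1 are computed for a three-class problem. The toy labels are made up for illustration and are unrelated to the actual holdout predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: 0 = POSITIVE, 1 = NEGATIVE, 2 = NEUTRAL.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because macro F1 averages the classes with equal weight, a model that neglects a minority class is penalised even when its overall accuracy looks acceptable.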

Usage

from transformers import AutoTokenizer, BertForSequenceClassification
import torch

model_id = "lucasalmda/pt-br-financial-sentimental-analysis"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()

id2label = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}

# Optional: apply the same logit biases used during calibration
BIASES = {"POSITIVE": -0.65, "NEGATIVE": -0.20, "NEUTRAL": 0.00}
bias_tensor = torch.tensor([BIASES["POSITIVE"], BIASES["NEGATIVE"], BIASES["NEUTRAL"]])

text = "Ibovespa fecha em alta com expectativa de corte na taxa Selic"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    calibrated_logits = logits + bias_tensor
    pred = calibrated_logits.argmax(dim=-1).item()

print(id2label[pred])  # e.g. "POSITIVE"
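
To make the effect of the optional biases concrete, here is a small worked example in plain Python (no model required). The raw logits are invented for illustration; only the bias values come from this card, and the helper `predict` is hypothetical.

```python
ID2LABEL = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
BIASES = [-0.65, -0.20, 0.00]  # POSITIVE, NEGATIVE, NEUTRAL


def predict(logits, biases=None):
    """Return the label with the highest (optionally bias-shifted) logit."""
    if biases is not None:
        logits = [x + b for x, b in zip(logits, biases)]
    best = max(range(len(logits)), key=logits.__getitem__)
    return ID2LABEL[best]


# Made-up logits for a borderline headline.
raw_logits = [1.00, 0.50, 0.60]

print(predict(raw_logits))          # POSITIVE (1.00 is the largest raw logit)
print(predict(raw_logits, BIASES))  # NEUTRAL  (shifted logits: 0.35, 0.30, 0.60)
```

The biases only change the decision when the classes are close: a headline with a clearly dominant POSITIVE logit keeps its label, while a marginal one is pushed towards NEUTRAL.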

Limitations

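Trained on a small labeled dataset (629 examples in total, of which 402 were used for training) and evaluated on only 126 holdout examples, so the reported metrics carry noticeable sampling uncertainty and performance on edge cases may vary.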
Optimised for Brazilian Portuguese financial news. It is not suited for general-purpose sentiment analysis or other languages.
The post-hoc calibration biases were selected on a held-out calibration split and may not generalise perfectly to all domains within Brazilian finance.
Lexically ambiguous headlines (e.g. "Selic cai" combined with negative macro context) remain the most common error pattern.
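
To give a sense of how much the small holdout limits the precision of the reported accuracy, the sketch below computes a 95% Wilson score interval. It assumes 81 correct predictions, the only integer count out of 126 that rounds to the reported 64.3%, and it treats the holdout examples as independent draws. The function name `wilson_interval` is chosen for this example.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# 81 / 126 = 0.6429, matching the reported 64.3% accuracy.
low, high = wilson_interval(81, 126)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 0.56 to 0.72, so differences of a few percentage points between model variants on this holdout should be read with caution.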

Citation

If you use this model, please cite the base model:

lucas-leme/FinBERT-PT-BR — https://huggingface.co/lucas-leme/FinBERT-PT-BR

Safetensors

Model size: 0.1B params
Tensor type: F32
