---
language:
  - pt
license: mit
tags:
  - bert
  - finance
  - sentiment-analysis
  - portuguese
  - financial-news
  - text-classification
base_model: lucas-leme/FinBERT-PT-BR
pipeline_tag: text-classification
---

# PT-BR Financial Sentiment Analysis

A fine-tuned BERT model for **sentiment classification of Brazilian Portuguese financial news**. Given a news headline or short text about the Brazilian financial market, the model classifies it as **POSITIVE**, **NEGATIVE**, or **NEUTRAL**.

This model was developed as part of an undergraduate thesis (TCC) analysing sentiment trends in the Brazilian financial market from 2016 to 2025.

---

## Base Model

This model is a fine-tuned version of [lucas-leme/FinBERT-PT-BR](https://huggingface.co/lucas-leme/FinBERT-PT-BR), a BERT model pre-trained on Brazilian Portuguese financial texts.

---

## Labels

| ID | Label    | Description                                         |
|----|----------|-----------------------------------------------------|
| 0  | POSITIVE | News with a positive financial outlook or outcome   |
| 1  | NEGATIVE | News with a negative financial outlook or outcome   |
| 2  | NEUTRAL  | News that is neither clearly positive nor negative  |

---

## Training Details

- **Architecture**: `BertForSequenceClassification` (12 layers, 768 hidden, 12 attention heads)
- **Loss function**: Label-smoothed cross-entropy (`label_smoothing=0.1`)
- **Epochs**: 4
- **Learning rate**: 7e-6
- **Weight decay**: 0.03
- **Class weighting**: Square-root balanced (to handle class imbalance)
- **Post-hoc calibration**: Additive logit bias per class (`POSITIVE: -0.65`, `NEGATIVE: -0.20`, `NEUTRAL: 0.00`)
- **Ensemble**: 2-seed ensemble (seeds 789 and 123) used during hyperparameter selection
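
The class weighting and loss listed above can be illustrated with a small dependency-free sketch. It assumes that "square-root balanced" means the square root of the common balanced weight `n / (num_classes * count_c)`, and that the class weight of the true label scales the smoothed loss. Both are assumptions, since the card does not give the exact formula. The function names and the toy label counts are hypothetical.

```python
import math
from collections import Counter

def sqrt_balanced_weights(labels, num_classes=3):
    """Square root of the usual 'balanced' weight n / (num_classes * count_c)."""
    counts = Counter(labels)
    n = len(labels)
    return [math.sqrt(n / (num_classes * counts[c])) for c in range(num_classes)]

def smoothed_weighted_ce(logits, target, weights, smoothing=0.1):
    """Label-smoothed cross-entropy for one example, scaled by the true class weight."""
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    k = len(logits)
    # Smoothed target: (1 - smoothing) on the true class plus smoothing / k on every class.
    target_dist = [smoothing / k + (1.0 - smoothing) * (i == target) for i in range(k)]
    loss = -sum(t * lp for t, lp in zip(target_dist, log_probs))
    return weights[target] * loss

# Hypothetical label counts, NOT the real class distribution of the 629 examples.
toy_labels = [0] * 9 + [1] * 4 + [2] * 36
weights = sqrt_balanced_weights(toy_labels)
example_loss = smoothed_weighted_ce([1.2, -0.3, 0.4], target=1, weights=weights)
print(weights, example_loss)
```

Taking the square root softens the correction: in the toy data the rarest class is 9 times less frequent than the most common one but receives only 3 times its weight. In PyTorch, the same two ingredients are available through `torch.nn.CrossEntropyLoss(weight=..., label_smoothing=0.1)`, whose handling of the weights inside the smoothing term may differ from this sketch.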

### Dataset

- **Total labeled examples**: 629 Brazilian financial news items (headlines and short summaries)
- **Training split**: 402 examples
- **Calibration split**: 101 examples (used for post-hoc bias calibration)
- **Holdout split**: 126 examples (stratified 20% sample, seed=2026; never seen during training or calibration)
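
The three split sizes are consistent with two successive 20% splits that round the held-out part up, which is the convention scikit-learn's `train_test_split` uses for a fractional `test_size`. The sketch below checks only the arithmetic. That the calibration split is 20% of the remaining data is an assumption inferred from the numbers, and `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(total, holdout_frac=0.2, calib_frac=0.2):
    """Sizes after carving out a holdout, then a calibration split from the rest."""
    holdout = math.ceil(total * holdout_frac)   # held-out part is rounded up
    remaining = total - holdout
    calib = math.ceil(remaining * calib_frac)   # assumed: 20% of what is left
    train = remaining - calib
    return train, calib, holdout

train_n, calib_n, holdout_n = split_sizes(629)
print(train_n, calib_n, holdout_n)  # 402 101 126
```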

---

## Evaluation

Evaluated on a stratified holdout of **126 examples**:

| Model                        | Accuracy | Macro F1 |
|------------------------------|----------|----------|
| Base (`FinBERT-PT-BR`)       | 34.1%    | 0.331    |
| Fine-tuned (this model)      | **64.3%** | **0.643** |

The fine-tuned model gains **+30.2 pp accuracy** and **+0.312 macro F1** over the base model on this domain-specific holdout.
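
Because the holdout has only 126 examples, it helps to see how wide the uncertainty around these accuracies is. The sketch below uses a normal-approximation (Wald) interval, which is a rough choice for small samples. The counts of correct predictions (81 and 43) are back-computed from the reported percentages and are an assumption, not figures stated in this card.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation (Wald) 95% interval for an accuracy estimate."""
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width

# Counts back-computed from the reported percentages (assumption):
# 81 / 126 = 64.3% and 43 / 126 = 34.1%.
HOLDOUT_SIZE = 126
fine_tuned_low, fine_tuned_high = accuracy_interval(81, HOLDOUT_SIZE)
base_low, base_high = accuracy_interval(43, HOLDOUT_SIZE)

print(f"fine-tuned: {fine_tuned_low:.3f} to {fine_tuned_high:.3f}")
print(f"base:       {base_low:.3f} to {base_high:.3f}")
```

Under these assumptions each interval spans roughly 8 pp on either side of the point estimate, and the two intervals do not overlap.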

---

## Usage

```python
from transformers import AutoTokenizer, BertForSequenceClassification
import torch

model_id = "lucasalmda/pt-br-financial-sentimental-analysis"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()

id2label = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}

# Optional: apply the same logit biases used during calibration
BIASES = {"POSITIVE": -0.65, "NEGATIVE": -0.20, "NEUTRAL": 0.00}
bias_tensor = torch.tensor([BIASES["POSITIVE"], BIASES["NEGATIVE"], BIASES["NEUTRAL"]])

# English: "Ibovespa closes higher on expectations of a Selic rate cut"
text = "Ibovespa fecha em alta com expectativa de corte na taxa Selic"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    calibrated_logits = logits + bias_tensor
    pred = calibrated_logits.argmax(dim=-1).item()

print(id2label[pred])  # e.g. "POSITIVE"
```
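
To see what the optional bias step in the snippet above changes, the following sketch applies the same biases to plain Python lists, without loading the model. The logits are made up for illustration and are not real model output; `predict_label` is a hypothetical helper.

```python
ID2LABEL = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}
BIASES = [-0.65, -0.20, 0.00]  # POSITIVE, NEGATIVE, NEUTRAL (values from this card)

def predict_label(logits, biases=None):
    """Return the label with the highest (optionally bias-shifted) logit."""
    if biases is not None:
        logits = [x + b for x, b in zip(logits, biases)]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]

# Made-up logits for illustration only, not real model output.
borderline = [1.00, 0.20, 0.70]
clear_cut = [3.00, 0.10, 0.50]

print(predict_label(borderline))          # POSITIVE
print(predict_label(borderline, BIASES))  # NEUTRAL
print(predict_label(clear_cut, BIASES))   # POSITIVE
```

The biases only flip predictions where POSITIVE or NEGATIVE leads NEUTRAL by a small margin; confident predictions are unchanged.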

---

## Limitations

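- Trained on a small labeled dataset (629 examples) and evaluated on a 126-example holdout, so the reported metrics carry noticeable sampling uncertainty and performance on unusual headlines may be lower.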
- Optimised for **Brazilian Portuguese** financial news. It is not suited for general-purpose sentiment analysis or other languages.
- The post-hoc calibration biases were selected on a held-out calibration split and may not generalise perfectly to all domains within Brazilian finance.
- Lexically ambiguous headlines (e.g. "Selic cai", "the Selic rate falls", combined with negative macro context) remain the most common error pattern.
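
For readers who want to re-tune the calibration biases on their own data, one possible procedure is a small grid search that maximises macro F1 on a calibration set. This card does not describe how the published biases were chosen, so the sketch below is a hypothetical illustration and not the author's method. The grid, the toy logits, and the function names are all assumptions.

```python
from itertools import product

def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

def search_biases(logit_rows, y_true, grid):
    """Grid-search biases for classes 0 and 1; class 2 (NEUTRAL) stays at 0.0."""
    best_biases, best_score = (0.0, 0.0, 0.0), -1.0
    for b0, b1 in product(grid, repeat=2):
        biases = (b0, b1, 0.0)
        preds = [max(range(3), key=lambda i: row[i] + biases[i]) for row in logit_rows]
        score = macro_f1(y_true, preds)
        if score > best_score:  # keep the first combination that reaches the best score
            best_biases, best_score = biases, score
    return best_biases, best_score

# Toy calibration data (made up): the raw logits over-predict POSITIVE (class 0).
toy_logits = [[1.0, 0.2, 0.7], [0.9, 0.1, 0.6], [2.0, 0.0, 0.1],
              [0.1, 1.5, 0.2], [0.8, 0.3, 0.5]]
toy_labels = [2, 2, 0, 1, 2]

best_biases, best_score = search_biases(toy_logits, toy_labels, grid=[-0.5, -0.25, 0.0])
print(best_biases, best_score)
```

Biases found this way are tied to the calibration data they were fitted on, so they should be re-checked on a separate holdout before use.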

---

## Citation

If you use this model, please cite the base model:

```
lucas-leme/FinBERT-PT-BR — https://huggingface.co/lucas-leme/FinBERT-PT-BR
```