---
license: cc-by-4.0
language:
  - pt
tags:
  - lora
  - brand-voice
  - brazilian-portuguese
  - institutional-communication
  - instruction-tuning
  - adaption
  - marketing
library_name: peft
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.3-70B-Instruct
---

![banner](https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/8410bb23-1a13-4ba5-bce6-21f47da3191e.png)

# VozBR-BrandVoice — Brazilian Portuguese Institutional Brand-Voice Adapter

LoRA adapter fine-tuned on **Llama-3.3-70B-Instruct** (70B) for Brazilian Portuguese institutional brand-voice compliance, via [Adaption's](https://adaptionlabs.ai) AutoScientist platform.

---

## The problem this adapter addresses

Corporate and institutional Portuguese-language assistants routinely drift away from brand/communication guidelines — wrong register, missing required structure, banned informal markers. Given a raw citizen request like *"a aposentadoria não foi paga em março, preciso de uma explicação urgente"*, a base model typically responds informally and without the structure an institution requires:

> *"Foi um erro, vamos verificar e te aviso depois."*

The brand-voice-compliant response follows an explicit structure — formal opening, objective context, explanatory body with protocol/deadline references, closing with a follow-up channel, and institutional identification:

> *"Prezado(a) cidadão(a), em atenção à manifestação registrada sob o protocolo nº [...], informamos que sua solicitação foi encaminhada ao setor competente. O processo encontra-se em andamento, com prazo estimado de 15 dias úteis para conclusão. Você poderá acompanhar a tramitação por meio do canal da ouvidoria..."*

This adapter teaches the model to apply an explicit, written **Brand Voice Guide** (structure, tone, required vocabulary, banned terms) to a raw input, and to comply with a deterministic 10-point conformance rubric.

---

## Adaptive Data results

| Metric | Before | After |
|---|---|---|
| Quality score | 8.0 | 9.4 |
| Quality grade | B | **A** |
| Relative improvement | — | **+17.5%** |
| Percentile (Governance domain) | 16.7 | **57.7** |

---

## Training metrics

| Metric | Value |
|---|---|
| Base model | `meta-llama/Llama-3.3-70B-Instruct` (70B) |
| Trained model name | `adaption_pt_br_formal_gov_complaints` |
| Training method | SFT + LoRA |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0 |
| Trainable modules | all-linear |
| Epochs | 1 |
| Training steps | 113 |
| Learning rate | 1e-4 (cosine scheduler) |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Max grad norm | 1 |
| Dataset size | 20,204 examples (Grade A) |
| **Adapted model win rate** | **79%** (vs 21% base) |

---

## Dataset

| Platform | Link |
|---|---|
| HuggingFace Dataset (base, 6,505 examples) | [Fernandosr85/vozbr-brandvoice](https://huggingface.co/datasets/Fernandosr85/vozbr-brandvoice) |
| HuggingFace Dataset (expanded, 20,204 examples, used for training) | [Fernandosr85/adaption-pt-br-formal-gov-complaints](https://huggingface.co/datasets/Fernandosr85/adaption-pt-br-formal-gov-complaints) |
| Kaggle Dataset | [VozBR-BrandVoice Dataset](https://www.kaggle.com/datasets/fernandosr85/vozbr-brandvoice-dataset) |
| Source dataset | [FalaBR-SynthLetters](https://www.kaggle.com/datasets/fernandosr85/falabr-synthletters-enhanced-br-ombudsman) |

6,505 base instruction-tuning examples (expanded to 20,204 via Adaption Adaptive Data augmentation — 8,000 domain-specific + 5,700 general-purpose data points), each pairing:

- **`prompt`**: an explicit Brand Voice Guide plus a reframed raw citizen request
- **`completion`**: a formal institutional response, pre-filtered to score ≥ 7/10 on the conformance rubric below

### Brand Voice conformance rubric (10 checks)

| Check | Description |
|---|---|
| `formal_opener` | Formal opening salutation (e.g. "Prezado(a) cidadão(a),") |
| `institutional_voice` | Impersonal institutional voice ("Informa-se que...", "Cumpre informar...") |
| `process_vocab` | Reference to protocol / process / request |
| `progress_vocab` | Progress/deadline terms ("prazo", "andamento", "concluído") |
| `followup_vocab` | Follow-up/escalation channel ("ouvidoria", "canal", "recurso") |
| `formal_closing` | Formal closing |
| `no_banned_terms` | No slang, internet language, or emojis |
| `no_excess_caps` | No excessive capitalization |
| `min_length` | At least 40 words |
| `no_first_person_singular` | No informal first-person singular ("eu acho") |

---

## Source data & provenance

- **CGU / Fala.BR** — Brazil's federal ombudsman open data ([dados.gov.br](https://dadosabertos.cgu.gov.br)), CC BY 4.0
- **[FalaBR-SynthLetters](https://www.kaggle.com/datasets/fernandosr85/falabr-synthletters-enhanced-br-ombudsman)** — 8,203 instruction-completion pairs of formal pt-BR letters, remastered via Adaption Adaptive Data (Grade A, 9/10 quality, Governance domain), CC BY-SA 4.0
- **[FalaBR-GovBench](https://www.kaggle.com/datasets/fernandosr85/falabr-govbench-curation)** — 11-year Brazilian ombudsman benchmark, the original source corpus

All personal identifiers in training examples are templated placeholders (e.g. `[Nome do Requerente]`, `[CPF]`), not real citizen data.

---

## Credits

- **Fine-tuning platform:** [Adaption](https://adaptionlabs.ai) — AutoScientist & Adaptive Data
- **Challenge:** [AutoScientist Challenge 2026](https://adaptionlabs.ai/blog/autoscientist-challenge) — Marketing category
- **Training infrastructure:** Adaption compute credits
- **Dataset remastering:** Adaption Adaptive Data pipeline (Grade A, +17.5% quality improvement)
- **Author:** Fernando Rodrigues · [Kaggle: fernandosr85](https://www.kaggle.com/fernandosr85) · [HuggingFace: Fernandosr85](https://huggingface.co/Fernandosr85)

---

## Disclaimer

Experimental research artifact submitted to AutoScientist Challenge 2026 (Marketing category).
This adapter is derived from public-sector ombudsman correspondence. The Brand Voice Guide and conformance rubric reflect a formal government-correspondence register; applying them to other corporate brand voices may require adjusting the guide's required vocabulary and tone rules. Not a substitute for legal or compliance review of institutional communications.