--- license: cc-by-4.0 language: - pt tags: - lora - brand-voice - brazilian-portuguese - institutional-communication - instruction-tuning - adaption - marketing library_name: peft pipeline_tag: text-generation base_model: meta-llama/Llama-3.3-70B-Instruct --- ![banner](https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/8410bb23-1a13-4ba5-bce6-21f47da3191e.png) # VozBR-BrandVoice — Brazilian Portuguese Institutional Brand-Voice Adapter LoRA adapter fine-tuned on **Llama-3.3-70B-Instruct** (70B) for Brazilian Portuguese institutional brand-voice compliance, via [Adaption's](https://adaptionlabs.ai) AutoScientist platform. --- ## The problem this adapter addresses Corporate and institutional Portuguese-language assistants routinely drift away from brand/communication guidelines — wrong register, missing required structure, banned informal markers. Given a raw citizen request like *"a aposentadoria não foi paga em março, preciso de uma explicação urgente"*, a base model typically responds informally and without the structure an institution requires: > *"Foi um erro, vamos verificar e te aviso depois."* The brand-voice-compliant response follows an explicit structure — formal opening, objective context, explanatory body with protocol/deadline references, closing with a follow-up channel, and institutional identification: > *"Prezado(a) cidadão(a), em atenção à manifestação registrada sob o protocolo nº [...], informamos que sua solicitação foi encaminhada ao setor competente. O processo encontra-se em andamento, com prazo estimado de 15 dias úteis para conclusão. Você poderá acompanhar a tramitação por meio do canal da ouvidoria..."* This adapter teaches the model to apply an explicit, written **Brand Voice Guide** (structure, tone, required vocabulary, banned terms) to a raw input, and to comply with a deterministic 10-point conformance rubric. --- ## Adaptive Data results | Metric | Before | After | |---|---|---| | Quality score | 8.0 | 9.4 | | Quality grade | B | **A** | | Relative improvement | — | **+17.5%** | | Percentile (Governance domain) | 16.7 | **57.7** | --- ## Training metrics | Metric | Value | |---|---| | Base model | `meta-llama/Llama-3.3-70B-Instruct` (70B) | | Trained model name | `adaption_pt_br_formal_gov_complaints` | | Training method | SFT + LoRA | | LoRA rank (r) | 64 | | LoRA alpha | 128 | | LoRA dropout | 0 | | Trainable modules | all-linear | | Epochs | 1 | | Training steps | 113 | | Learning rate | 1e-4 (cosine scheduler) | | Warmup ratio | 0.03 | | Weight decay | 0.01 | | Max grad norm | 1 | | Dataset size | 20,204 examples (Grade A) | | **Adapted model win rate** | **79%** (vs 21% base) | --- ## Dataset | Platform | Link | |---|---| | HuggingFace Dataset (base, 6,505 examples) | [Fernandosr85/vozbr-brandvoice](https://huggingface.co/datasets/Fernandosr85/vozbr-brandvoice) | | HuggingFace Dataset (expanded, 20,204 examples, used for training) | [Fernandosr85/adaption-pt-br-formal-gov-complaints](https://huggingface.co/datasets/Fernandosr85/adaption-pt-br-formal-gov-complaints) | | Kaggle Dataset | [VozBR-BrandVoice Dataset](https://www.kaggle.com/datasets/fernandosr85/vozbr-brandvoice-dataset) | | Source dataset | [FalaBR-SynthLetters](https://www.kaggle.com/datasets/fernandosr85/falabr-synthletters-enhanced-br-ombudsman) | 6,505 base instruction-tuning examples (expanded to 20,204 via Adaption Adaptive Data augmentation — 8,000 domain-specific + 5,700 general-purpose data points), each pairing: - **`prompt`**: an explicit Brand Voice Guide plus a reframed raw citizen request - **`completion`**: a formal institutional response, pre-filtered to score ≥ 7/10 on the conformance rubric below ### Brand Voice conformance rubric (10 checks) | Check | Description | |---|---| | `formal_opener` | Formal opening salutation (e.g. "Prezado(a) cidadão(a),") | | `institutional_voice` | Impersonal institutional voice ("Informa-se que...", "Cumpre informar...") | | `process_vocab` | Reference to protocol / process / request | | `progress_vocab` | Progress/deadline terms ("prazo", "andamento", "concluído") | | `followup_vocab` | Follow-up/escalation channel ("ouvidoria", "canal", "recurso") | | `formal_closing` | Formal closing | | `no_banned_terms` | No slang, internet language, or emojis | | `no_excess_caps` | No excessive capitalization | | `min_length` | At least 40 words | | `no_first_person_singular` | No informal first-person singular ("eu acho") | --- ## Source data & provenance - **CGU / Fala.BR** — Brazil's federal ombudsman open data ([dados.gov.br](https://dadosabertos.cgu.gov.br)), CC BY 4.0 - **[FalaBR-SynthLetters](https://www.kaggle.com/datasets/fernandosr85/falabr-synthletters-enhanced-br-ombudsman)** — 8,203 instruction-completion pairs of formal pt-BR letters, remastered via Adaption Adaptive Data (Grade A, 9/10 quality, Governance domain), CC BY-SA 4.0 - **[FalaBR-GovBench](https://www.kaggle.com/datasets/fernandosr85/falabr-govbench-curation)** — 11-year Brazilian ombudsman benchmark, the original source corpus All personal identifiers in training examples are templated placeholders (e.g. `[Nome do Requerente]`, `[CPF]`), not real citizen data. --- ## Credits - **Fine-tuning platform:** [Adaption](https://adaptionlabs.ai) — AutoScientist & Adaptive Data - **Challenge:** [AutoScientist Challenge 2026](https://adaptionlabs.ai/blog/autoscientist-challenge) — Marketing category - **Training infrastructure:** Adaption compute credits - **Dataset remastering:** Adaption Adaptive Data pipeline (Grade A, +17.5% quality improvement) - **Author:** Fernando Rodrigues · [Kaggle: fernandosr85](https://www.kaggle.com/fernandosr85) · [HuggingFace: Fernandosr85](https://huggingface.co/Fernandosr85) --- ## Disclaimer Experimental research artifact submitted to AutoScientist Challenge 2026 (Marketing category). This adapter is derived from public-sector ombudsman correspondence. The Brand Voice Guide and conformance rubric reflect a formal government-correspondence register; applying them to other corporate brand voices may require adjusting the guide's required vocabulary and tone rules. Not a substitute for legal or compliance review of institutional communications.