---
license: cc-by-2.0
datasets:
- mbazaNLP/Kinyarwanda_English_parallel_dataset
- mbazaNLP/NMT_Education_parallel_data_en_kin
- mbazaNLP/NMT_Tourism_parallel_data_en_kin
language:
- en
- rw
library_name: transformers
pipeline_tag: translation
---

# Nllb_finetuned_general_en_kin: English ↔ Kinyarwanda (General Purpose)

General-purpose machine translation model for English ↔ Kinyarwanda. Fine-tuned from [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B).

**Fine-tuning code:** [Digital-Umuganda/twb_nllb_finetuning](https://github.com/Digital-Umuganda/twb_nllb_finetuning)

## Usage

```python
from transformers import pipeline

# English → Kinyarwanda
# src_lang / tgt_lang are NLLB language codes; max_length caps the generated output.
translator = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn",
    max_length=400,
)
result = translator("Rwanda is a country in East Africa known for its biodiversity.")
print(result[0]["translation_text"])

# Kinyarwanda → English (same checkpoint, language codes swapped)
translator_rev = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin",
    src_lang="kin_Latn",
    tgt_lang="eng_Latn",
    max_length=400,
)
result = translator_rev("U Rwanda ni igihugu giri mu Afurika yo Hagati.")
print(result[0]["translation_text"])
```

## Intended Use

**Suitable for:**
- General-purpose English ↔ Kinyarwanda translation
- Applications requiring broad language coverage across domains
- Research baseline for NLLB Kinyarwanda translation

**Not intended for:**
- High-stakes translation without human review
- Specialised domains where the education or tourism models may perform better

## Training

Fine-tuned on a general-purpose corpus in a single phase:
- [mbazaNLP/Kinyarwanda_English_parallel_dataset](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset)
- [mbazaNLP/NMT_Education_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin)
- [mbazaNLP/NMT_Tourism_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Tourism_parallel_data_en_kin)

Training hardware: A100 40 GB GPU. Training length: 2 epochs.

## Evaluation

| Lang. Direction | BLEU | spBLEU | chrF++ | TER |
|-----------------|------|--------|--------|-----|
| Eng → Kin       | —    | —      | —      | —   |
| Kin → Eng       | —    | —      | —      | —   |

## Limitations

- As a general-purpose model, it may underperform domain-specific models in education or tourism contexts.
- Low-frequency Kinyarwanda vocabulary and tonal nuances may not be handled accurately.
- Outputs should be reviewed by a human before use in high-stakes applications.
- Maximum reliable input length is approximately 200 tokens.

## Bias and Fairness

Machine translation models can reflect and amplify biases present in training data. The training data spans multiple domains but may not represent all registers of Kinyarwanda equally, so colloquial or dialectal text may translate with lower quality. Known limitations include:

- **Domain bias:** Fine-tuned on specific domain data; performance may be lower on out-of-domain text.
- **Cultural bias:** Idiomatic expressions, gender-neutral constructs, and culturally specific references in English may not translate accurately or naturally into Kinyarwanda.
- **Data source bias:** Training data was sourced from specific platforms; text from other sources or registers may yield lower-quality translations.
- **Gender:** English gender-neutral pronouns may be rendered with gendered forms in Kinyarwanda based on distributional patterns in training data.

Validate translation quality on domain-representative samples before deployment in high-stakes contexts (legal, medical, government communications).

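
Because the Limitations section puts the reliable input length at approximately 200 tokens, longer documents may translate better when split into smaller pieces first. The following is a minimal sketch of one way to do that, not part of the released fine-tuning code. The helper names `chunk_text` and `whitespace_token_count` are hypothetical, the sentence splitter is deliberately naive, and the default counter only approximates the model's subword tokenizer by counting whitespace-separated words.

```python
import re
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_tokens: int = 200,  # assumption: mirrors the ~200-token limit noted above
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Group consecutive sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens is kept whole as its own chunk.
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in chunk_text(sample, max_tokens=6):
        print(chunk)
```

Each chunk can then be passed to the `translator` pipeline from the Usage section and the outputs joined. For counts that match the model more closely, `count_tokens` could be replaced with a function based on the checkpoint's own tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])` with a tokenizer loaded through `AutoTokenizer.from_pretrained`; subword counts are typically higher than word counts, so the whitespace default is optimistic.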
## Citation

```bibtex
@misc{mbazaNLP2023nllb_finetuned_general,
  author = {MBAZA-NLP Community},
  title  = {Nllb\_finetuned\_general\_en\_kin: English--Kinyarwanda Machine Translation (General Purpose)},
  year   = {2023},
  url    = {https://huggingface.co/mbazaNLP/Nllb_finetuned_general_en_kin},
  note   = {Hugging Face model repository}
}
```