mbazaNLP/Nllb_finetuned_general_en_kin

@@ -1,58 +1,99 @@
 ---
 license: cc-by-2.0
 datasets:
-- mbazaNLP/NMT_Tourism_parallel_data_en_kin
-- mbazaNLP/NMT_Education_parallel_data_en_kin
 - mbazaNLP/Kinyarwanda_English_parallel_dataset
 language:
 - en
 - rw
 library_name: transformers
 ---
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is a Machine Translation model, finetuned from [NLLB](https://huggingface.co/facebook/nllb-200-distilled-1.3B)-200's distilled 1.3B model, it is meant to be used in machine translation for education-related data.
-- **Finetuning code repository:** the code used to finetune this model can be found [here](https://github.com/Digital-Umuganda/twb_nllb_finetuning)
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-## How to Get Started with the Model
-Use the code below to get started with the model.
-### Training Procedure
-The model was finetuned on three datasets; a [general](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset) purpose dataset, a [tourism](https://huggingface.co/datasets/mbazaNLP/NMT_Tourism_parallel_data_en_kin), and an [education](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin) dataset.
-The model was finetuned on an A100 40GB GPU for two epochs.
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-#### Testing Data
-<!-- This should link to a Data Card if possible. -->
-#### Metrics
-Model performance was measured using BLEU, spBLEU, TER, and chrF++ metrics.
-### Results

 ---
 license: cc-by-2.0
 datasets:
 - mbazaNLP/Kinyarwanda_English_parallel_dataset
+- mbazaNLP/NMT_Education_parallel_data_en_kin
+- mbazaNLP/NMT_Tourism_parallel_data_en_kin
 language:
 - en
 - rw
 library_name: transformers
+pipeline_tag: translation
 ---
+# Nllb_finetuned_general_en_kin — English ↔ Kinyarwanda (General Purpose)
+General-purpose machine translation model for English ↔ Kinyarwanda.
+Fine-tuned from [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B).
+**Fine-tuning code:** [Digital-Umuganda/twb_nllb_finetuning](https://github.com/Digital-Umuganda/twb_nllb_finetuning)
+## Usage
+```python
+from transformers import pipeline
+# English → Kinyarwanda
+translator = pipeline(
+    "translation",
+    model="mbazaNLP/Nllb_finetuned_general_en_kin",
+    src_lang="eng_Latn",
+    tgt_lang="kin_Latn",
+    max_length=400,
+)
+result = translator("Rwanda is a country in East Africa known for its biodiversity.")
+print(result[0]["translation_text"])
+# Kinyarwanda → English
+translator_rev = pipeline(
+    "translation",
+    model="mbazaNLP/Nllb_finetuned_general_en_kin",
+    src_lang="kin_Latn",
+    tgt_lang="eng_Latn",
+    max_length=400,
+)
+result = translator_rev("U Rwanda ni igihugu giri mu Afurika yo Hagati.")
+print(result[0]["translation_text"])
+```
+## Intended Use
+**Suitable for:**
+- General-purpose English ↔ Kinyarwanda translation
+- Applications requiring broad language coverage across domains
+- Research baseline for NLLB Kinyarwanda translation
+**Not intended for:**
+- High-stakes translation without human review
+- Specialised domains where the education or tourism models may perform better
+## Training
+Fine-tuned in a single phase on three datasets (one general-purpose, one education, one tourism):
+- [mbazaNLP/Kinyarwanda_English_parallel_dataset](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset)
+- [mbazaNLP/NMT_Education_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin)
+- [mbazaNLP/NMT_Tourism_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Tourism_parallel_data_en_kin)
+Training setup: 2 epochs on an A100 40 GB GPU.
 ## Evaluation
+<!-- TODO: add BLEU/spBLEU/chrF++/TER scores from evaluation -->
+| Lang. Direction | BLEU | spBLEU | chrF++ | TER |
+|-----------------|------|--------|--------|-----|
+| Eng → Kin       | —    | —      | —      | —   |
+| Kin → Eng       | —    | —      | —      | —   |
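
Until the scores are filled in, the sketch below may help readers see what one of the listed metrics measures. It computes a simplified, shift-free variant of TER: word-level edit distance divided by reference length. This is an illustration only, not the scorer used for this model. Real TER also allows block shifts, and reported numbers would normally come from a standard evaluation toolkit. The function names `word_edit_distance` and `simple_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the list `hyp` into the list `ref`."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, reference):
    """Shift-free approximation of TER (lower is better).
    Assumption: whitespace tokenisation, no block-shift edits."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)
```

A score of 0 means the hypothesis matches the reference word for word. Because shifts are not modelled, this variant can only overestimate true TER on reordered output.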
+## Limitations
+- As a general-purpose model, it may underperform domain-specific models in education or tourism contexts.
+- Low-frequency Kinyarwanda vocabulary and tonal nuances may not be handled accurately.
+- Outputs should be reviewed for high-stakes applications.
+- Maximum reliable input length is approximately 200 tokens.
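
One possible way to work within the input-length limit noted above is to split long text into groups of sentences before translating. The sketch below is a minimal illustration under two stated assumptions: it counts whitespace-separated words as a stand-in for the model's subword tokens (the real tokenizer will usually produce more tokens per word, so the default budget is deliberately conservative), and it uses a naive regex sentence splitter. The names `chunk_text` and `max_words` are hypothetical.

```python
import re


def chunk_text(text, max_words=100):
    """Group sentences into chunks of at most `max_words` words.
    Assumption: word count approximates token count; a single sentence
    longer than the budget is kept whole as its own chunk."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk could then be passed to the `translator` pipeline from the Usage section and the outputs joined. Translating chunks independently loses cross-sentence context, so this is a trade-off rather than a fix.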
+## Bias and Fairness
+Training data spans multiple domains but may not equally represent all registers of Kinyarwanda. Colloquial or dialectal text may translate with lower quality.
+## Citation
+```bibtex
+@misc{mbazaNLP2023nllb_finetuned_general,
+  author    = {MBAZA-NLP Community},
+  title     = {Nllb\_finetuned\_general\_en\_kin: English--Kinyarwanda Machine Translation (General Purpose)},
+  year      = {2023},
+  url       = {https://huggingface.co/mbazaNLP/Nllb_finetuned_general_en_kin},
+  note      = {Hugging Face model repository}
+}
+```