Update model card: Intended Use, Limitations, code example, BibTeX, licence/email/description fixes
README.md
@@ -1,58 +1,99 @@
 ---
 license: cc-by-2.0
 datasets:
-- mbazaNLP/NMT_Tourism_parallel_data_en_kin
-- mbazaNLP/NMT_Education_parallel_data_en_kin
 - mbazaNLP/Kinyarwanda_English_parallel_dataset
 language:
 - en
 - rw
 library_name: transformers
 ---
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-##
-##
 ## Evaluation
-<!--
-#### Testing Data
-<!-- This should link to a Data Card if possible. -->
-#### Metrics
-##
---
license: cc-by-2.0
datasets:
- mbazaNLP/Kinyarwanda_English_parallel_dataset
- mbazaNLP/NMT_Education_parallel_data_en_kin
- mbazaNLP/NMT_Tourism_parallel_data_en_kin
language:
- en
- rw
library_name: transformers
pipeline_tag: translation
---

# Nllb_finetuned_general_en_kin: English ↔ Kinyarwanda (General Purpose)

General-purpose machine translation model for English ↔ Kinyarwanda.
Fine-tuned from [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B).

**Fine-tuning code:** [Digital-Umuganda/twb_nllb_finetuning](https://github.com/Digital-Umuganda/twb_nllb_finetuning)

## Usage

```python
from transformers import pipeline

# English → Kinyarwanda
translator = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn",
    max_length=400,
)
result = translator("Rwanda is a country in East Africa known for its biodiversity.")
print(result[0]["translation_text"])

# Kinyarwanda → English
translator_rev = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin",
    src_lang="kin_Latn",
    tgt_lang="eng_Latn",
    max_length=400,
)
result = translator_rev("U Rwanda ni igihugu giri mu Afurika yo Hagati.")
print(result[0]["translation_text"])
```

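The pipeline call above can also be written against the tokenizer and model classes directly, which gives more control over batching and generation. The following is a minimal sketch, assuming the checkpoint loads with `AutoModelForSeq2SeqLM` as NLLB checkpoints normally do; the `translate` helper and its defaults are illustrative and not part of the model repository.

```python
MODEL_ID = "mbazaNLP/Nllb_finetuned_general_en_kin"


def translate(texts, src_lang="eng_Latn", tgt_lang="kin_Latn", max_length=400):
    """Translate a list of sentences using the seq2seq classes directly.

    Hypothetical helper for illustration; language codes follow the
    NLLB convention used in the pipeline example.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Pad so that several sentences can be translated in one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **inputs,
        # NLLB selects the output language via the first generated token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model weights on first run.
    print(translate(["Rwanda is a country in East Africa known for its biodiversity."]))
```

Swapping `src_lang` and `tgt_lang` gives the Kinyarwanda to English direction with the same loaded weights, so a long-running service would typically load the model once rather than on every call as this sketch does.
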
## Intended Use

**Suitable for:**
- General-purpose English ↔ Kinyarwanda translation
- Applications requiring broad language coverage across domains
- Research baseline for NLLB Kinyarwanda translation

**Not intended for:**
- High-stakes translation without human review
- Specialised domains where the education or tourism models may perform better

## Training

Fine-tuned on a general-purpose corpus in a single phase:
- [mbazaNLP/Kinyarwanda_English_parallel_dataset](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset)
- [mbazaNLP/NMT_Education_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin)
- [mbazaNLP/NMT_Tourism_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Tourism_parallel_data_en_kin)

Training setup: A100 40 GB GPU, 2 epochs.

## Evaluation

<!-- TODO: add BLEU/spBLEU/chrF++ scores from evaluation -->

| Lang. Direction | BLEU | spBLEU | chrF++ | TER |
|-----------------|------|--------|--------|-----|
| Eng → Kin | TBD | TBD | TBD | TBD |
| Kin → Eng | TBD | TBD | TBD | TBD |

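The four metrics named in the table can be computed from a list of system outputs and a list of reference translations. The sketch below assumes the `sacrebleu` package (a recent release that ships the `flores200` tokenizer for spBLEU); the model card does not say which tool the authors use, and `score_direction` and `format_row` are hypothetical helper names.

```python
def score_direction(hypotheses, references):
    """Return corpus-level BLEU, spBLEU, chrF++ and TER for one direction."""
    # Imported lazily; sacrebleu is an assumed dependency, not named by the model card.
    import sacrebleu

    refs = [references]  # sacrebleu expects a list of reference streams
    return {
        "BLEU": sacrebleu.corpus_bleu(hypotheses, refs).score,
        # spBLEU: BLEU computed over SentencePiece tokens (FLORES-200 tokenizer).
        "spBLEU": sacrebleu.corpus_bleu(hypotheses, refs, tokenize="flores200").score,
        # chrF++ is chrF extended with word bigrams (word_order=2).
        "chrF++": sacrebleu.corpus_chrf(hypotheses, refs, word_order=2).score,
        "TER": sacrebleu.corpus_ter(hypotheses, refs).score,
    }


def format_row(direction, scores):
    """Render one Markdown row in the column order of the evaluation table."""
    columns = ["BLEU", "spBLEU", "chrF++", "TER"]
    cells = [direction] + [f"{scores[name]:.1f}" for name in columns]
    return "| " + " | ".join(cells) + " |"
```

Calling `format_row("Eng → Kin", score_direction(hyps, refs))` on a held-out test set would produce a row that can be pasted into the table in place of the placeholders.
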
## Limitations

- General-purpose training may underperform domain-specific models in education or tourism contexts.
- Low-frequency Kinyarwanda vocabulary and tonal nuances may not be handled accurately.
- Outputs should be reviewed for high-stakes applications.
- Maximum reliable input length is approximately 200 tokens.

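For inputs longer than the roughly 200-token limit noted above, one option is to split the text at sentence boundaries and translate the pieces separately. This is a minimal sketch, not the authors' method: `chunk_text` is a hypothetical helper, and the default token counter is a whitespace word count, which only approximates the model's SentencePiece token count. Passing `count_tokens=lambda s: len(tokenizer(s).input_ids)` with the model's tokenizer would give an exact budget.

```python
import re


def chunk_text(text, max_tokens=200, count_tokens=lambda s: len(s.split())):
    """Group sentences into chunks whose estimated token count stays within max_tokens.

    A single sentence that exceeds the budget on its own is kept whole
    rather than being cut mid-sentence.
    """
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the translator and the outputs joined with spaces. Sentence-level splitting loses cross-sentence context, so the result may read less smoothly than a single-pass translation of a short text.
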
## Bias and Fairness

Training data spans multiple domains but may not equally represent all registers of Kinyarwanda. Colloquial or dialectal text may translate with lower quality.

## Citation

```bibtex
@misc{mbazaNLP2023nllb_finetuned_general,
  author = {MBAZA-NLP Community},
  title  = {Nllb\_finetuned\_general\_en\_kin: English--Kinyarwanda Machine Translation (General Purpose)},
  year   = {2023},
  url    = {https://huggingface.co/mbazaNLP/Nllb_finetuned_general_en_kin},
  note   = {Hugging Face model repository}
}
```