---
language:
- 'zh'
- 'en'
tags:
- translation
- game
- cultivation
license: 'cc-by-nc-4.0'
datasets:
- Custom
metrics:
- BLEU
---
This is a fine-tuned version of Facebook's M2M100 (418M).
It's a project born from the activity of [Amateur Modding Avenue](https://discord.gg/agFA6xa6un), a Discord-based modding community.
Special thanks to the Path of Wuxia modding team for kindly sharing their translations to help build the dataset.
It has been trained on a parallel corpus built from translations of several Chinese video games. All of them are human/fan translations.
It's not perfect but it's the best I could do.
It should be sitting somewhere between Google Translate and DeepL, I guess.
So... Before you go any further, lower your expectations.
No, lower.
Just a bit lower... and... here we are.
That being said, it has upsides as a first MT pass in a game translation context:
1) It should not mess up tags
2) It has basic cultivation/martial arts vocabulary
3) Nothing is locked behind a paywall \o/
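Since the tags claim is a "should", it is worth checking after the MT pass. Here is a minimal sketch of such a check. The tag syntax is an assumption (angle-bracket markup like `<color=#ff0000>` and curly-brace placeholders like `{0}`), the names `TAG_PATTERN`, `extract_tags` and `tags_preserved` are made up for this example, and the sample strings are invented, so adapt the pattern to whatever your game actually uses.

```python
import re
from collections import Counter

# Assumed tag syntax (not taken from any specific game):
#   <color=#ff0000> ... </color>   angle-bracket markup
#   {0}, {player_name}             curly-brace placeholders
TAG_PATTERN = re.compile(r"<[^<>]+>|\{[^{}]+\}")


def extract_tags(text: str) -> Counter:
    """Count every tag/placeholder occurrence found in `text`."""
    return Counter(TAG_PATTERN.findall(text))


def tags_preserved(source: str, translation: str) -> bool:
    """Return True if both strings contain the same tags the same number of times."""
    return extract_tags(source) == extract_tags(translation)


if __name__ == "__main__":
    # Invented sample strings, for illustration only.
    src = "获得<color=#ff0000>{0}</color>点经验"
    good = "Gained <color=#ff0000>{0}</color> experience points"
    bad = "Gained <color=#ff0000>{0} experience points"  # closing tag lost
    print(tags_preserved(src, good))  # True
    print(tags_preserved(src, bad))   # False
```

The check only compares which tags appear and how often, not their order, because word order legitimately changes between Chinese and English.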
Sample generation script:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

tokenizer = AutoTokenizer.from_pretrained("CadenzaBaron/M2M100-418M-for-GameTranslation-Finetuned-Zh-En")
model = AutoModelForSeq2SeqLM.from_pretrained("CadenzaBaron/M2M100-418M-for-GameTranslation-Finetuned-Zh-En")
model.to(device)

# Source language is Chinese, target language is English.
tokenizer.src_lang = "zh"
tokenizer.tgt_lang = "en"

test_string = "地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"
inputs = tokenizer(test_string, return_tensors="pt").to(device)

# Beam search combined with sampling: the output can differ between runs.
translated_tokens = model.generate(**inputs, num_beams=10, do_sample=True)
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print("CH : ", test_string, " // EN : ", translation)
```
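A game usually has a lot more than one line to translate, so here is a rough sketch of running the same calls over a list of strings in batches. The helper names (`chunked`, `translate_lines`), the batch size and the beam count are arbitrary choices for this example, not tuned settings, and sampling is left off on the assumption that reproducible output is preferred for a first pass.

```python
from typing import Iterator, List

MODEL_ID = "CadenzaBaron/M2M100-418M-for-GameTranslation-Finetuned-Zh-En"


def chunked(lines: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `lines` holding at most `size` items each."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def translate_lines(lines: List[str], batch_size: int = 8, num_beams: int = 5) -> List[str]:
    """Translate Chinese lines to English, one batch at a time."""
    # Heavy imports stay local so `chunked` can be reused without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(device)
    tokenizer.src_lang = "zh"
    tokenizer.tgt_lang = "en"

    results: List[str] = []
    for batch in chunked(lines, batch_size):
        # Padding lets lines of different lengths share one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        with torch.no_grad():
            tokens = model.generate(**inputs, num_beams=num_beams)
        results.extend(tokenizer.batch_decode(tokens, skip_special_tokens=True))
    return results


if __name__ == "__main__":
    sample = ["地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"]
    for zh, en in zip(sample, translate_lines(sample)):
        print("CH : ", zh, " // EN : ", en)
```

Smaller batches use less memory; if you run out of memory on the GPU, lowering `batch_size` or `num_beams` is the first thing to try.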
Translation sample and comparison with Google Translate and DeepL: [Link to Spreadsheet](https://docs.google.com/spreadsheets/d/1J1i9P0nyI9q5-m2iZGSUatt3ZdHSxU8NOp9tJH7wxsk/edit?usp=sharing)