---
language:
- 'zh'
- 'en'
tags:
- translation
- game
- cultivation
license: 'cc-by-nc-4.0'
datasets:
- Custom
metrics:
- BLEU
---

This is a fine-tuned version of Facebook's M2M100 (418M).

It's a project born from the activity of [Amateur Modding Avenue](https://discord.gg/agFA6xa6un), a Discord-based modding community.
Special thanks to the Path of Wuxia modding team for kindly sharing their translations to help build the dataset.

It was trained on a 46k-line parallel corpus built from the translations of several Chinese video games, all of them human/fan translations.

It's not perfect, but it's the best I could do.
It should sit somewhere between Google Translate and DeepL, I guess.

So... before you go any further, lower your expectations.

No, lower.

Just a bit lower... and... here we are.
That being said, it has some upsides for a first MT pass in a game translation context:

1) It should not mess up tags.
2) It has basic cultivation/martial arts vocabulary.
3) Nothing is locked behind a paywall \o/
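Even if the model should not mess up tags, every line of a first MT pass still gets reviewed, so an automated check that flags dropped or invented tags can save time. Below is a minimal sketch, assuming tags look like `<...>`, `[...]` or `{...}`; the `TAG_PATTERN` regex and the helper names are hypothetical, so adjust them to whatever markup your game actually uses. The sample strings are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical tag syntax: <color=red>...</color>, [b]...[/b], {0} / {name}.
# Adjust this pattern to the markup your game's text files actually use.
TAG_PATTERN = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|\{[^{}]*\}")


def extract_tags(text: str) -> Counter:
    """Count every tag occurrence in one line of text."""
    return Counter(TAG_PATTERN.findall(text))


def tags_preserved(source: str, translation: str) -> bool:
    """True if the translation has exactly the same tags, the same number of times, as the source."""
    return extract_tags(source) == extract_tags(translation)


def missing_or_extra_tags(source: str, translation: str) -> tuple[Counter, Counter]:
    """Return (tags missing from the translation, tags the translation added)."""
    src, tgt = extract_tags(source), extract_tags(translation)
    return src - tgt, tgt - src


# Made-up example line, not taken from any real game.
source = "<color=red>剑遁</color>:消耗{0}点真气。"
print(tags_preserved(source, "<color=red>Sword Escape</color>: costs {0} qi."))  # True
print(tags_preserved(source, "Sword Escape: costs {0} qi."))  # False
```

Lines where `tags_preserved` returns `False` can be sent to manual review first; `missing_or_extra_tags` tells you which tags to put back or remove. Comparing counts (rather than sets) also catches a tag that was duplicated.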
Sample generation script:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model_name = "CadenzaBaron/M2M100-418M-for-GameTranslation-Finetuned-Zh-En"
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.to(device)

# Tell the tokenizer the input is Chinese (tgt_lang only matters when tokenizing target-side text).
tokenizer.src_lang = "zh"
tokenizer.tgt_lang = "en"

test_string = "地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"
inputs = tokenizer(test_string, return_tensors="pt").to(device)

# M2M100 is multilingual: force the first generated token to be the English language token.
# num_beams=10 together with do_sample=True is beam-sample decoding, so output can vary between runs.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),
    num_beams=10,
    do_sample=True,
)
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print("CH : ", test_string, " // EN : ", translation)
```
Translation samples and a comparison with Google Translate and DeepL: [Link to Spreadsheet](https://docs.google.com/spreadsheets/d/1J1i9P0nyI9q5-m2iZGSUatt3ZdHSxU8NOp9tJH7wxsk/edit?usp=sharing)
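The metadata lists BLEU as the metric. If you have human reference translations for some of your own lines, a rough way to compare this model's output against Google Translate or DeepL is corpus BLEU. The sketch below is a plain-Python approximation written for illustration (the function names are hypothetical): it assumes whitespace tokenization, one reference per line, and no smoothing. For numbers you intend to report, use an established implementation such as sacreBLEU, whose tokenization will give different scores.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """All n-grams of a token list, counted as tuples."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Rough sketch: whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
            # Counter & Counter keeps the minimum count: each hypothesis n-gram
            # is credited at most as often as it appears in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, any n-gram order with zero matches makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: outputs shorter than the references are penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

Call it as `corpus_bleu(model_outputs, human_references)` once per system and compare the scores. On a handful of lines BLEU is noisy, so treat it as a hint alongside reading the outputs, not a verdict.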