Completely wrong translations from English to Korean.
#3
by titericz - opened
Helsinki-NLP/opus-mt-tc-big-en-ko is not working.
Me too, lol
I found a way.
You can build a target_vocab.json file from the "spm" files in the repo's file list using the code below (you should modify it for your model, though).
from transformers import MarianTokenizer
import json

# Example uses the ko-en checkpoint; swap in the model you are fixing.
tokenizer = MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-tc-big-ko-en')

# Source-side vocab: SentencePiece piece -> id
vocab = {tokenizer.spm_source.id_to_piece(i): i for i in range(tokenizer.spm_source.get_piece_size())}
vocab[tokenizer.pad_token] = tokenizer.pad_token_id
with open("vocab_ko_en.json", "w") as f:
    json.dump(vocab, f, indent=2)

# Target-side vocab, built the same way
target_vocab = {tokenizer.spm_target.id_to_piece(i): i for i in range(tokenizer.spm_target.get_piece_size())}
target_vocab[tokenizer.pad_token] = tokenizer.pad_token_id
with open("target_vocab_ko_en.json", "w") as f:
    json.dump(target_vocab, f, indent=2)
Then change the tokenizer loading code like this (point target_vocab_file at the file you generated):
tokenizer = MarianTokenizer.from_pretrained(model_name, target_vocab_file="./target_vocab.json", separate_vocabs=True)
And it will work!
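For reference, the vocab-building step in this post could be wrapped in a small helper. This is only a sketch: `build_vocab` and the `FakePieces` stand-in are hypothetical names, and it assumes the object passed in exposes `get_piece_size()` and `id_to_piece()` the way `tokenizer.spm_source` and `tokenizer.spm_target` are used in the post.

```python
import json


def build_vocab(spm, pad_token, pad_token_id):
    """Map every piece of a SentencePiece-like processor to its id."""
    vocab = {spm.id_to_piece(i): i for i in range(spm.get_piece_size())}
    # Same step as in the post: make sure the pad token is present.
    vocab[pad_token] = pad_token_id
    return vocab


class FakePieces:
    """Tiny stand-in (hypothetical) so the sketch runs without downloading a model."""

    def __init__(self, pieces):
        self._pieces = list(pieces)

    def get_piece_size(self):
        return len(self._pieces)

    def id_to_piece(self, i):
        return self._pieces[i]


# Made-up pieces, for illustration only.
demo_vocab = build_vocab(FakePieces(["<unk>", "</s>", "▁hello"]), "<pad>", 3)
demo_json = json.dumps(demo_vocab, indent=2, ensure_ascii=False)
```

With a real tokenizer the call would presumably look like `build_vocab(tokenizer.spm_target, tokenizer.pad_token, tokenizer.pad_token_id)`, with the result written out via `json.dump` as in the post.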
Wouldn't work for me, I still get some very odd results. I made one amendment, passing the source vocab to from_pretrained as well, but to no avail :/
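One thing that might help narrow down results like these is comparing the ids in a generated vocab file against a reference mapping, for example a vocab.json shipped with the model, if the repo has one. The sketch below is an assumption about how to debug this, not a confirmed fix; `diff_vocabs` is a hypothetical helper and the two small dicts are made-up data.

```python
def diff_vocabs(generated, reference):
    """Report pieces missing from either side or mapped to different ids."""
    report = {"only_generated": [], "only_reference": [], "id_mismatch": []}
    for piece, idx in generated.items():
        if piece not in reference:
            report["only_generated"].append(piece)
        elif reference[piece] != idx:
            # (piece, id in generated file, id in reference file)
            report["id_mismatch"].append((piece, idx, reference[piece]))
    for piece in reference:
        if piece not in generated:
            report["only_reference"].append(piece)
    return report


# Made-up data, for illustration only.
generated = {"<unk>": 0, "</s>": 1, "▁hello": 2, "<pad>": 3}
reference = {"</s>": 0, "<unk>": 1, "▁hello": 2, "▁world": 3}
report = diff_vocabs(generated, reference)
```

In practice the two dicts would be loaded with `json.load` from the generated file and the reference file. A long `id_mismatch` list would suggest the generated ids do not line up with what the model expects.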