Nematus RNN · English → Turkish (BPE)

An attentional RNN (bi-GRU encoder / GRU decoder) neural machine translation model that translates English → Turkish, trained with the Nematus toolkit on a ~142k-sentence news-domain parallel corpus using a joint 32k BPE subword vocabulary.

This is the RNN baseline (reverse direction) for the machine translation models collection. Siblings: the stronger Transformer atahanuz/transformers-translator-en-tr-75M and the forward RNN atahanuz/rnn-translator-tr-en-63M. All four models in the matrix below (two architectures × two directions) share the same data and the same joint BPE model.

TL;DR

BLEU 33.93 on a 1,000-sentence held-out test set (beam 12, length-normalized).
~63M parameters · single A100-80GB · ~125 min training.
A baseline: the Transformer sibling scores 42.04 on the same test set (see matrix below).

Model details


Toolkit	Nematus (TensorFlow), commit `49d050863bc9644b8c0a9d9ab6e54ccd30f927dd`
Architecture	Attentional RNN — single-layer bi-GRU encoder, single-layer GRU decoder with attention (conditional GRU, `dec_base_transition_depth=2`)
Direction	English (`en`) → Turkish (`tr`)
Embedding size	512
Hidden / state size	1000
Dropout	none (0.0)
Embedding tying	none (untied)
Subword model	joint 32k BPE (subword-nmt)
Vocab caps	source 18,000 / target 24,000
Parameters	~63M

Training data

Corpus: mt_datasets_vol2 — 144,065 English–Turkish sentence pairs (news / current affairs).
Filtering: drop any pair where either side is empty or longer than 60 whitespace tokens → 143,926 pairs.
Split (shuffled, seed 42): 141,926 train / 1,000 dev / 1,000 test.
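
The filtering and split described above can be sketched as follows. This is a minimal illustration, not the script used for this model: the function names are hypothetical, and the choice of Python's `random.Random` as the shuffler and the order in which dev, test and train are sliced are assumptions, so it will not reproduce the exact same split.

```python
import random

def filter_pairs(pairs, max_tokens=60):
    """Keep pairs where both sides are non-empty and at most max_tokens whitespace tokens."""
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if 0 < src_len <= max_tokens and 0 < tgt_len <= max_tokens:
            kept.append((src, tgt))
    return kept

def split_pairs(pairs, n_dev=1000, n_test=1000, seed=42):
    """Shuffle with a fixed seed, then slice off dev and test (slice order is an assumption)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

With 143,926 filtered pairs and the default sizes, the remaining training portion has 143,926 - 2,000 = 141,926 pairs, matching the split reported above.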

Preprocessing

Tokenization: Moses `tokenizer.perl -l <lang> -no-escape`.
Joint BPE: 32,000 merges learned on the training split only (EN+TR concatenated), applied without a vocabulary-frequency threshold. Resulting subword vocab: TR 23,211 / EN 17,659.

(BPE is identical across all four models — it is architecture- and direction-independent.)
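
To make the segmentation step concrete, here is a toy sketch of how an ordered merge list is applied to a single word. It is not the subword-nmt implementation: the function name `apply_merges` and the three-entry merge list are hypothetical, whereas the real codes file (tr-en.bpe.codes) holds 32,000 merges.

```python
def apply_merges(word, merges):
    """Segment one word with an ordered list of BPE merges (toy version)."""
    if not word:
        return ""
    # Lower rank = learned earlier = applied first.
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    # Start from characters; the last one carries an end-of-word marker.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # Drop the end-of-word marker and mark non-final pieces with "@@".
    symbols[-1] = symbols[-1][: -len("</w>")]
    return " ".join([s + "@@" for s in symbols[:-1]] + [symbols[-1]])

toy_merges = [("g", "ö"), ("gö", "n"), ("m", "e</w>")]  # hypothetical merges
print(apply_merges("gönderme", toy_merges))
```

The `@@` suffix on non-final pieces is the same marker that the postprocessing step later strips to restore whole words.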

Training configuration

Optimizer: Adam (β1 0.9, β2 0.999, ε 1e-8), learning rate 1e-4 constant, gradient-norm clip 1.0.
Batch 320 sentences, maxlen 120, label smoothing 0.0.
Validation every 200 updates; early stopping with patience 10 on dev cross-entropy.
Hardware: 1× NVIDIA A100-80GB.
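
The validation and early-stopping schedule above can be illustrated with a small sketch. It is a simplified model of patience-based stopping, not Nematus code: the function name is hypothetical, and the exact rule Nematus uses to count non-improving validations (for example, whether it stops at or after the tenth) is an assumption here.

```python
def early_stopping_point(dev_losses, valid_freq=200, patience=10):
    """Return (best_update, stop_update) for dev losses measured every valid_freq updates.

    stop_update is None if training would not have stopped yet.
    """
    best_loss = float("inf")
    best_update = None
    bad_validations = 0
    for i, loss in enumerate(dev_losses, start=1):
        update = i * valid_freq
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_update, bad_validations = loss, update, 0
        else:
            bad_validations += 1
            if bad_validations >= patience:
                return best_update, update
    return best_update, None
```

Under this simplified rule, the saved checkpoint is the one at `best_update`, and training continues for `patience` further validations before stopping.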

Training run

Best validation at update 24,400 (dev cross-entropy 56.70) — the checkpoint in this repo.
Early-stopped at update ~26,900. Wall-clock ≈ 125 min (the RNN converges more slowly than the Transformer and runs more slowly per word because of sequential recurrence).

Evaluation

BLEU via multi-bleu.perl on the merged-BPE (Moses-tokenized) hypothesis vs the tokenized reference; beam 12, length-normalized.

Architecture × direction matrix (same 1,000 mirrored test pairs):

Architecture	TR → EN	EN → TR
Transformer (~75M)	42.78	42.04
RNN (~63M)	35.10	33.93

This model = RNN EN→TR = 33.93 (n-gram precisions 60.1 / 39.6 / 27.9 / 20.0, BP 1.00). The Transformer beats the RNN by 7.7 BLEU (TR→EN) and 8.1 BLEU (EN→TR); Turkish-target (→TR) is the slightly harder direction for both architectures, plausibly because of Turkish agglutinative morphology.
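
As a sanity check on the figures above, BLEU can be recomputed from its reported components: it is the brevity penalty times the geometric mean of the 1- to 4-gram precisions. The sketch below uses only the rounded numbers from this card, so it agrees with the reported 33.93 only up to rounding; the function name is hypothetical.

```python
import math

def bleu_from_components(precisions_pct, brevity_penalty=1.0):
    """BLEU (in percent) from n-gram precisions (in percent) and a brevity penalty."""
    log_mean = sum(math.log(p / 100.0) for p in precisions_pct) / len(precisions_pct)
    return 100.0 * brevity_penalty * math.exp(log_mean)

# Components reported for this model (RNN EN→TR).
score = bleu_from_components([60.1, 39.6, 27.9, 20.0], brevity_penalty=1.00)
print(f"recomputed BLEU: {score:.2f}")

# Transformer minus RNN, per direction, from the matrix.
gap_tr_en = 42.78 - 35.10
gap_en_tr = 42.04 - 33.93
print(f"gap TR→EN: {gap_tr_en:.2f}, gap EN→TR: {gap_en_tr:.2f}")
```

The recomputed score lands within a few hundredths of 33.93; the small difference comes from the precisions being rounded to one decimal place.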

Example

English (input)	Model output	Reference
`The US Embassy in Bosnia and Herzegovina welcomed the offer to send soldiers to Iraq.`	`ABD'nin Bosna-Hersek Büyükelçiliği Irak'a asker gönderme teklifini memnuniyetle karşıladı.`	`ABD'nin BH Büyükelçiliği Irak'a asker gönderme teklifini memnuniyetle karşıladı.`

Files in this repo

model.npz.* — Nematus/TensorFlow checkpoint (best-validation, update 24,400).
train.bpe.en.json, train.bpe.tr.json — Nematus source/target vocabularies (referenced by model.npz.json).
tr-en.bpe.codes, vocab.tr, vocab.en — the joint BPE model, used to segment new input.
nematus_tf220_compat.patch — makes Nematus run on TensorFlow ≥ 2.16 / NumPy ≥ 2 / Python 3.12.

Installation

git clone https://github.com/EdinburghNLP/nematus.git
cd nematus && git checkout 49d050863bc9644b8c0a9d9ab6e54ccd30f927dd
git apply /path/to/nematus_tf220_compat.patch
pip install "tensorflow>=2.16" "numpy>=2" subword-nmt sacremoses
cd ..

Usage

# 0. download this model
python3 -c "from huggingface_hub import snapshot_download; \
snapshot_download('atahanuz/rnn-translator-en-tr-63M', local_dir='en_tr_rnn')"

# 1. preprocess an English input file (one sentence per line)
perl nematus/data/tokenizer.perl -l en -no-escape < input.en > input.tok.en
subword-nmt apply-bpe -c en_tr_rnn/tr-en.bpe.codes < input.tok.en > input.bpe.en

# 2. translate — run from the model dir so the dict paths in model.npz.json resolve
cd en_tr_rnn
python3 ../nematus/nematus/translate.py -m model.npz -i ../input.bpe.en -o ../out.bpe.tr -k 12 -n -b 50
cd ..

# 3. postprocess: undo BPE, then detokenize (Turkish)
sed -E 's/(@@ )|(@@ ?$)//g' out.bpe.tr > out.tok.tr
python3 - <<'EOF'
import re
from sacremoses import MosesDetokenizer
d = MosesDetokenizer(lang='tr')
for line in open('out.tok.tr', encoding='utf-8'):
    print(re.sub(r"\s*'\s*", "'", d.detokenize(line.split())))   # join Turkish suffix apostrophes
EOF
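
For readers who prefer to stay in Python, the BPE-undo and apostrophe-joining parts of step 3 can be sketched with the standard library alone. This reuses the same two regular expressions as the commands above but skips the sacremoses detokenization, so it is not a full replacement; the function names and the sample string are hypothetical.

```python
import re

def undo_bpe(line):
    """Remove BPE joiners: '@@ ' inside a line and a dangling '@@' at the end."""
    return re.sub(r"(@@ )|(@@ ?$)", "", line)

def join_apostrophes(text):
    """Re-attach Turkish suffix apostrophes, e.g. "Irak ' a" becomes "Irak'a"."""
    return re.sub(r"\s*'\s*", "'", text)

segmented = "Irak ' a asker gönder@@ me teklif@@ ini"  # made-up sample line
print(join_apostrophes(undo_bpe(segmented)))
```

Note that the apostrophe rule joins every apostrophe to its neighbours, which suits Turkish suffixes but would also close up quotation-style apostrophes.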

Intended use & limitations

A research baseline; for best quality use the Transformer sibling (+8.1 BLEU).
Domain: news / current affairs. Degrades out of domain and on long / complex sentences.
Trained on sentences of at most 60 whitespace tokens per side (before BPE).
Reported scores are tokenized multi-bleu.perl BLEU, which depends on the tokenization; for citable numbers use sacreBLEU on detokenized output.
No safety/bias auditing.

License

cc-by-4.0 placeholder — set to match your training-data terms.

Acknowledgements

Nematus; subword-nmt. GRU attentional NMT: Bahdanau et al. (2015) / Sennrich et al. (Nematus).
