---
language:
  - en
  - hi
license: mit
library_name: pytorch
tags:
  - translation
  - transformer
  - seq2seq
  - english-to-hindi
  - pytorch
  - from-scratch
  - ray-tune
  - optuna
datasets:
  - tatoeba
metrics:
  - bleu
model-index:
  - name: EN-HI Transformer v1.0.0
    results:
      - task:
          type: translation
          name: Machine Translation
        dataset:
          name: Tatoeba EN-HI (raw export, 13186 pairs)
          type: tatoeba
        metrics:
          - type: bleu
            value: 75.66
            name: BLEU (NLTK method4 ×100)
  - name: EN-HI Transformer v1.1.0
    results:
      - task:
          type: translation
          name: Machine Translation
        dataset:
          name: Tatoeba EN-HI (raw export, 13186 pairs)
          type: tatoeba
        metrics:
          - type: bleu
            value: 83.69
            name: BLEU (NLTK method4 ×100)
---

# English → Hindi Transformer

A **from-scratch PyTorch encoder-decoder Transformer** for English → Hindi machine translation,
trained on a **raw [Tatoeba](https://tatoeba.org/en/downloads) EN-HI export**
(13 186 sentence pairs, including multiple Hindi translations per English sentence).

This repository provides **two versioned checkpoints**:

| Version | Description | BLEU | Epochs | Weights file |
|---|---|---|---|---|
| **v1.0.0** | Baseline — fixed hyperparameters | 0.7566 | 100 | `v1.0.0/transformer_translation_final.pth` |
| **v1.1.0** ✔ *recommended* | Ray Tune + Optuna optimised | **0.8369** | **50** | `v1.1.0/m25csa023_ass_4_best_model.pth` |

> v1.1.0 improves BLEU by **10.6% relative** (0.7566 → 0.8369, about +8.0 points on the 0-100 scale) in **half the epochs** of v1.0.0.
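
The 10.6% figure is a relative gain over the baseline, not a difference in BLEU points. As a quick arithmetic check using only the two scores from the table (variable names are illustrative):

```python
# BLEU scores from the version table (0-1 scale).
bleu_v100 = 0.7566
bleu_v110 = 0.8369

# Absolute gain, expressed in BLEU points on the 0-100 scale.
absolute_points = (bleu_v110 - bleu_v100) * 100

# Relative gain over the baseline, as a percentage.
relative_pct = (bleu_v110 - bleu_v100) / bleu_v100 * 100

print(f"absolute: +{absolute_points:.2f} points")  # absolute: +8.03 points
print(f"relative: +{relative_pct:.1f}%")           # relative: +10.6%
```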

---

## Training Summary

![Training & Evaluation Summary](assets/summary.png)

- **(a)** Training loss curves: baseline (100 epochs) vs tuned (50 epochs).
- **(b)** BLEU progression across epochs.
- **(c)** Loss curves for all 20 Ray Tune trials (grey = pruned by ASHA, orange = best).
- **(d)** Hyperparameter importance (Spearman ρ): batch size and dropout matter most.
- **(e–g)** Scatter plots of learning rate, dropout, and batch size against final loss across all trials.
- **(h)** Final comparison bar chart: time, loss, and BLEU for v1.0.0 vs v1.1.0.

---

## Dataset

- **Source:** Raw export of English-Hindi sentence pairs from [tatoeba.org/en/downloads](https://tatoeba.org/en/downloads).
- **Note:** This is the **unprocessed Tatoeba dump**, not the Helsinki-NLP filtered version.
- **File used during training:** `English-Hindi.tsv`

### TSV Column Structure

| Column | Content | Example |
|---|---|---|
| 1 | English sentence ID (Tatoeba) | `1282` |
| 2 | English sentence | `Muiriel is 20 now.` |
| 3 | Hindi sentence ID (Tatoeba) | `485968` |
| 4 | Hindi sentence | `म्यूरियल अब बीस साल की हो गई है।` |
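
A minimal sketch of reading this four-column layout with the standard library. It assumes the file is UTF-8, tab-separated, and has no header row; `read_pairs` is a hypothetical helper, not a function from the training script.

```python
import csv


def read_pairs(path):
    """Return (english, hindi) tuples from a 4-column, tab-separated file."""
    pairs = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters inside sentences as plain text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 4:
                continue  # skip malformed or empty lines
            _en_id, en_text, _hi_id, hi_text = row
            pairs.append((en_text.strip(), hi_text.strip()))
    return pairs
```

Calling `read_pairs("English-Hindi.tsv")` would then yield one tuple per row, so an English sentence with several Hindi translations appears several times.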

### Statistics

| Property | Value |
|---|---|
| Total sentence pairs | 13 186 |
| Unique English sentences | 11 109 (2 077 have multiple Hindi translations) |
| Mean English length | 5.6 words |
| Mean Hindi length | 6.3 words |
| Max English length | 53 tokens |
| Max Hindi length | 57 tokens |
| English ID range | 1 277 – 12 886 231 (Tatoeba IDs) |
| Hindi ID range | 440 811 – 13 125 624 (Tatoeba IDs) |
| Tokenisation | Whitespace split, lowercased |
| Min word frequency (vocab) | 2 |
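
The tokenisation and frequency cut-off in the table can be sketched as follows. `tokenize` and `build_vocab` are hypothetical names, and the ordering of ids (special tokens first, then alphabetical) is an assumption. The released `.pkl` vocabularies may order tokens differently, so this sketch should not be used to rebuild them for the published weights.

```python
from collections import Counter

# Special tokens listed in this model card; their id order here is assumed.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def tokenize(sentence):
    """Lowercase, then split on whitespace (punctuation stays attached)."""
    return sentence.lower().split()


def build_vocab(sentences, min_freq=2):
    """Map each token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    itos = SPECIALS + sorted(t for t, c in counts.items() if c >= min_freq)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


stoi, itos = build_vocab(["How are you?", "How old are you?", "Fine."])
print(itos)  # ['<pad>', '<sos>', '<eos>', '<unk>', 'are', 'how', 'you?']
```

Because splitting is on whitespace only, `you?` and `you` count as different tokens, which is one reason rare words fall below the frequency threshold and map to `<unk>`.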

### Repository File Structure

```
en-hi-transformer/
├── README.md                    ← model card (this page)
├── config.json                  ← shared architecture config
├── assets/
│   └── summary.png              ← training & evaluation plots
├── v1.0.0/
│   ├── transformer_translation_final.pth   ← baseline weights  (~192 MB)
│   └── config.json              ← v1.0.0 hyperparameters
├── v1.1.0/
│   ├── m25csa023_ass_4_best_model.pth      ← optimised weights (~216 MB)  ← recommended
│   └── config.json              ← v1.1.0 hyperparameters + search config
└── vocab/
    ├── en_vocab.pkl             ← English vocabulary (4 117 tokens)
    └── hi_vocab.pkl             ← Hindi vocabulary   (4 044 tokens)
```

---

## Model Architecture

Built from scratch following
[Vaswani et al. (2017), *"Attention Is All You Need"*](https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf).
The **HuggingFace Transformers library is not used internally**.

| Property | Value |
|---|---|
| Architecture | Encoder-Decoder Transformer |
| d_model | 512 |
| num_layers | 6 encoder + 6 decoder |
| num_heads | 8 |
| d_ff | 2048 (v1.0.0) / **2560 (v1.1.0)** |
| Dropout | 0.10 (v1.0.0) / **0.081 (v1.1.0)** |
| Max sequence length | 50 tokens |
| Positional encoding | Sinusoidal (fixed) |
| Source vocabulary | 4 117 English tokens |
| Target vocabulary | 4 044 Hindi tokens |
| Special tokens | `<pad>` `<sos>` `<eos>` `<unk>` |
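
As a rough sanity check, the table is enough to estimate the parameter count. The sketch below assumes a standard layout that this card does not confirm: biases on every linear layer, two LayerNorms per encoder layer and three per decoder layer, untied source/target embeddings, a separate output projection, and no parameters for the fixed sinusoidal positions. `estimate_params` is a hypothetical helper.

```python
def estimate_params(src_vocab, tgt_vocab, d_model=512, d_ff=2048, num_layers=6):
    """Approximate parameter count of an encoder-decoder Transformer."""
    attn = 4 * (d_model * d_model + d_model)             # Q, K, V, output projections
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norm = 2 * d_model                                   # LayerNorm scale + shift
    enc_layer = attn + ffn + 2 * norm                    # self-attention + FFN
    dec_layer = 2 * attn + ffn + 3 * norm                # self- and cross-attention + FFN
    embeddings = (src_vocab + tgt_vocab) * d_model
    out_proj = d_model * tgt_vocab + tgt_vocab           # final linear layer over the vocabulary
    return num_layers * (enc_layer + dec_layer) + embeddings + out_proj


for name, d_ff in [("v1.0.0", 2048), ("v1.1.0", 2560)]:
    n = estimate_params(4117, 4044, d_ff=d_ff)
    print(f"{name}: {n / 1e6:.1f}M parameters, {n * 4 / 2**20:.0f} MiB in float32")
# v1.0.0: 50.4M parameters, 192 MiB in float32
# v1.1.0: 56.7M parameters, 216 MiB in float32
```

Under these assumptions the estimates land close to the checkpoint sizes listed in the file tree, which is consistent with float32 weights but does not prove the exact layer layout.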

---

## Versions

### v1.0.0  -  Baseline

Trained for **100 epochs** with manually chosen hyperparameters on an NVIDIA A100 80 GB
(BF16 autocast + `torch.compile` + `cudnn.benchmark`).

| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-4 |
| Batch size | 60 |
| d_ff | 2048 |
| Dropout | 0.10 |
| Gradient clipping | None |

**Results:** BLEU **0.7566** · Loss **0.0998** · Training time **12.3 min**

---

### v1.1.0  -  Ray Tune + Optuna Optimised ✔ 

Hyperparameters discovered automatically using **Ray Tune 2.x** with **OptunaSearch (TPE)**
and an **ASHA early-stopping scheduler** (20 trials, ~65% pruned early).

| Hyperparameter | Optimised Value |
|---|---|
| Learning rate | **1.112e-4** |
| Batch size | **32** |
| d_ff | **2560** |
| Dropout | **0.081** |
| Gradient clipping | max_norm = 1.0 |

**Results:** BLEU **0.8369** · Loss **0.1264** · Training time **13.5 min** · Epochs **50**

The winning configuration first surpassed the v1.0.0 BLEU at **epoch 10** during the search sweep.
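
A search of this shape could be wired up roughly as below. This is a sketch, not the configuration actually used: the search ranges, `grace_period`, and `reduction_factor` are assumptions chosen only to bracket the winning values, and `train_fn` is a hypothetical trainable that reports a `loss` metric once per epoch. It needs `ray[tune]` and `optuna` installed.

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch


def build_tuner(train_fn):
    """Assemble a 20-trial search over the four tuned hyperparameters."""
    # Assumed search ranges; the real ranges are not stated in this card.
    search_space = {
        "lr": tune.loguniform(1e-5, 1e-3),
        "batch_size": tune.choice([32, 64, 128]),
        "d_ff": tune.choice([1024, 1536, 2048, 2560]),
        "dropout": tune.uniform(0.05, 0.30),
    }
    return tune.Tuner(
        train_fn,
        param_space=search_space,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            search_alg=OptunaSearch(),  # Optuna's default sampler is TPE
            # ASHA stops weak trials early; max_t is the per-trial epoch budget.
            scheduler=ASHAScheduler(max_t=50, grace_period=5, reduction_factor=2),
            num_samples=20,
        ),
    )
```

Running `build_tuner(train_fn).fit()` returns a result grid, and `get_best_result().config` on that grid gives the winning hyperparameters.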

---

## How to Use

### 1. Clone the repo & install dependencies

```bash
git lfs install
git clone https://huggingface.co/priyadip/en-hi-transformer
pip install torch
```

### 2. Load a checkpoint

```python
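import pickle

import torch

# Note: pickle.load needs any custom vocabulary class stored in the .pkl files
# to be importable; only unpickle files from a source you trust.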

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load vocabularies
with open("en-hi-transformer/vocab/en_vocab.pkl", "rb") as f:
    en_vocab = pickle.load(f)
with open("en-hi-transformer/vocab/hi_vocab.pkl", "rb") as f:
    hi_vocab = pickle.load(f)

# Instantiate the model  (Transformer class from the training script)
model = Transformer(
    src_vocab_size = len(en_vocab),
    tgt_vocab_size = len(hi_vocab),
    d_model  = 512,
    num_layers = 6,
    num_heads  = 8,
    d_ff     = 2560,   # use 2048 for v1.0.0
    max_len  = 50,
    dropout  = 0.081,  # use 0.10 for v1.0.0
).to(DEVICE)

# Load weights  -  pick the version you need
model.load_state_dict(
    torch.load("en-hi-transformer/v1.1.0/m25csa023_ass_4_best_model.pth", map_location=DEVICE)
    # or: "en-hi-transformer/v1.0.0/transformer_translation_final.pth"
)
model.eval()
```

### 3. Translate a sentence

```python
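def translate(model, sentence, max_len=50):
    """Greedy-decode a Hindi translation for one English sentence."""
    tokens = encode_sentence(sentence, en_vocab, max_len)
    src = torch.tensor(tokens).unsqueeze(0).to(DEVICE)
    sos, eos = hi_vocab["<sos>"], hi_vocab["<eos>"]
    tgt = [sos]
    with torch.no_grad():
        for _ in range(max_len):
            out = model(src, torch.tensor(tgt).unsqueeze(0).to(DEVICE),
                        en_vocab["<pad>"], hi_vocab["<pad>"])
            nxt = out[0, -1].argmax().item()  # most likely next token
            if nxt == eos:
                break
            tgt.append(nxt)
    # tgt holds <sos> plus the generated tokens; <eos> is never appended,
    # so no real token is lost when decoding stops at max_len instead.
    return " ".join(hi_vocab.itos[i] for i in tgt[1:])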

print(translate(model, "How are you?"))         # → तुम कैसी हो?
print(translate(model, "I love you."))          # → मैं तुमसे प्यार करती हूँ।
print(translate(model, "What is your name?"))   # → आपका नाम क्या है?
```

> `Transformer` and `encode_sentence` are defined in the training script available
> in the linked GitHub repository.
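
For orientation only, an encoder along these lines would fit the tokenisation described in this card. It is a hypothetical stand-in, not the training script's `encode_sentence`: whether the original wraps the source in `<sos>`/`<eos>` or pads to `max_len` is not stated here, so use the real function with the released weights.

```python
def encode_sentence_sketch(sentence, stoi, max_len=50):
    """Turn a sentence into a list of token ids, at most `max_len` long."""
    unk = stoi["<unk>"]
    # Lowercase + whitespace split; out-of-vocabulary tokens become <unk>.
    ids = [stoi.get(tok, unk) for tok in sentence.lower().split()]
    # Reserve two positions for the <sos> / <eos> markers (assumed convention).
    ids = ids[: max_len - 2]
    return [stoi["<sos>"]] + ids + [stoi["<eos>"]]


toy_stoi = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3, "how": 4, "are": 5}
print(encode_sentence_sketch("How are you?", toy_stoi))  # [1, 4, 5, 3, 2]
```

In the toy vocabulary `you?` is missing, so it maps to the `<unk>` id 3.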

---

## Sample Outputs (v1.1.0)

| English | Hindi |
|---|---|
| How are you? | तुम कैसी हो? |
| I love you. | मैं तुमसे प्यार करती हूँ। |
| What is your name? | आपका नाम क्या है? |
| The weather is nice today. | आज मौसम अच्छा है। |
| She is a good teacher. | वह अच्छा शिक्षक है। |

---

## Limitations

- Vocabulary of ~4 K tokens; unknown words map to `<unk>`.
- Optimised for short sentences (≤ 10 words); quality degrades on longer input.
- Greedy decoding only; no beam search.
- BLEU evaluated on a small held-out set; treat scores as indicative.

---

## Citation

If you use this model, please cite:

**This model:**
```bibtex
@misc{en_hi_transformer_2026,
  author       = {priyadip},
  title        = {English to Hindi Transformer (v1.0.0 / v1.1.0)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/priyadip/en-hi-transformer}},
  note         = {v1.0.0: BLEU 0.7566 / 100 epochs.
                   v1.1.0: BLEU 0.8369 / 50 epochs via Ray Tune + Optuna (+10.6\% relative).}
}
```

**Architecture  -  Attention Is All You Need:**
```bibtex
@inproceedings{vaswani2017attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and
               Kaiser, Lukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
  url       = {https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf}
}
```

- Paper: https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf
- Papers with Code: https://paperswithcode.com/paper/attention-is-all-you-need

**Dataset  -  Tatoeba:**
```bibtex
@misc{tatoeba,
  title        = {Tatoeba: A multilingual sentence collection},
  author       = {Tatoeba contributors},
  howpublished = {\url{https://tatoeba.org}},
  note         = {Raw EN-HI export used; 13 186 pairs including multiple
                   Hindi translations per English sentence.}
}
```