---
title: Audio Language Translator
emoji: 🌍
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
suggested_hardware: t4-small
---

# 🌍 Audio Language Translator

Translate spoken audio between 15 languages with a three-stage pipeline: speech recognition, machine translation, and speech synthesis.

## 🎯 What This Does

1. **Upload or record** audio in any supported language
2. **Automatic detection** of the source language
3. **Translation** to your chosen target language
4. **Speech synthesis** in the target language with selectable voices

## 🔌 REST API

The same pipeline is also available to developers as a REST API.

**📚 Interactive API Docs:** [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |
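
The read-only endpoints in the table can be queried with the standard library alone. The sketch below is a minimal, hypothetical client: `BASE_URL` is taken from the docs link above, the helper names `api_url` and `get_json` are illustrative, and the body is returned as parsed JSON because the exact response schema is not documented in this README.

```python
import json
import urllib.request

# Base URL taken from the interactive docs link above.
BASE_URL = "https://nav772-audio-language-translator.hf.space"


def api_url(path: str) -> str:
    """Join the base URL and an endpoint path such as '/api/health'."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, opener=urllib.request.urlopen, timeout: float = 30.0):
    """GET an endpoint and return the decoded JSON body.

    `opener` is injectable so the helper can be exercised without network access.
    """
    with opener(api_url(path), timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json("/api/health")` or `get_json("/api/languages")` would fetch the corresponding endpoint. A call such as `get_json("/api/voices/es")` assumes that `{lang}` takes the same short codes as `target_language`, which the interactive docs should confirm.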

### Quick Example (Python)
```python
import requests

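# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"},
        timeout=120,  # model inference can be slow, especially on a cold start
    )

# Fail loudly on HTTP errors instead of trying to parse an error body
response.raise_for_status()

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")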
```

### Quick Example (cURL)
```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```

## 🛠️ Built With This API

| Project | Developer | Description |
|---------|-----------|-------------|
| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |

*Want your project featured here? Open a discussion or PR!*

## 🏗️ Architecture
```
Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
```
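
The flow in the diagram is a plain chain of three stages. The sketch below shows one way to express that chain in Python with the stages passed in as callables, so it makes no claim about how `app.py` actually wires Whisper, NLLB, and Edge-TTS together. `run_pipeline`, `TranslationResult`, and the `source_language` field are hypothetical names; `original_text` and `translated_text` mirror the JSON keys used in the API example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslationResult:
    source_language: str   # language detected by the ASR stage
    original_text: str     # transcript in the source language
    translated_text: str   # transcript translated to the target language
    audio: bytes           # synthesized speech in the target language


def run_pipeline(
    audio: bytes,
    target_language: str,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Chain the three stages: ASR -> translation -> TTS."""
    # Stage 1: ASR returns the transcript and the detected source language.
    text, source_language = transcribe(audio)
    # Stage 2: translate the transcript from the detected to the target language.
    translated = translate(text, source_language, target_language)
    # Stage 3: synthesize speech for the translated text.
    speech = synthesize(translated, target_language)
    return TranslationResult(source_language, text, translated, speech)
```

Passing the stages as arguments keeps each model replaceable and lets the chain be tested with simple stand-in functions.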

## 🔧 Technical Stack

| Component | Model | Parameters | Purpose |
|-----------|-------|------------|---------|
| **ASR** | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| **Translation** | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| **TTS** | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
| **API** | FastAPI | - | REST API endpoints |
| **UI** | Gradio | - | Interactive web interface |

## 🌐 Supported Languages

### Tier 1: Multiple Voice Options (3 voices each)
- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)

### Tier 2: Single High-Quality Voice
- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

**Total: 15 languages, 25 voices**
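
Whisper reports detected languages as short codes such as `es`, while NLLB-200 checkpoints expect FLORES-200 codes such as `spa_Latn`, so a pipeline like this one needs a lookup between the two. The following is a sketch of such a lookup for the 15 languages listed above. The names `NLLB_CODES` and `to_nllb_code`, and the choice of Simplified Chinese (`zho_Hans`) and Modern Standard Arabic (`arb_Arab`), are assumptions and are not taken from the app's source.

```python
# Short language code -> FLORES-200 code used by NLLB-200.
# Assumption: the app may use a different mapping or script variant.
NLLB_CODES = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "zh": "zho_Hans",
    "ar": "arb_Arab",
    "hi": "hin_Deva",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
    "it": "ita_Latn",
    "nl": "nld_Latn",
    "pl": "pol_Latn",
    "tr": "tur_Latn",
}


def to_nllb_code(language: str) -> str:
    """Return the NLLB code for a short language code, or raise ValueError."""
    try:
        return NLLB_CODES[language.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
```

Raising on unknown codes, rather than falling back to English, makes an unsupported detection visible to the caller.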

## 📚 Research Foundation

| Paper | Authors | Year | Contribution |
|-------|---------|------|--------------|
| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |

## 📝 Limitations

- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS calls Microsoft's online speech service, so synthesis needs a network connection
- GPU recommended: CPU inference is significantly slower
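
Since the pipeline is tuned for clips under 30 seconds, a client may want to check the duration before uploading. The following is a minimal sketch using only the standard-library `wave` module, so it covers uncompressed WAV files only. `MAX_SECONDS`, `wav_duration_seconds`, and `is_short_enough` are hypothetical names, and the API itself may or may not enforce such a limit.

```python
import wave

# Clip length the pipeline is optimized for, per the limitation above.
MAX_SECONDS = 30.0


def wav_duration_seconds(path_or_file) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path_or_file, limit: float = MAX_SECONDS) -> bool:
    """True when the clip is strictly shorter than `limit` seconds."""
    return wav_duration_seconds(path_or_file) < limit
```

A caller could run `is_short_enough("input.wav")` before posting the file and split or trim longer recordings first.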

## 👤 Author

**[Nav772](https://huggingface.co/Nav772)** — Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

## 📚 Related Projects

- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)

## 📄 License

MIT License