---
title: Audio Language Translator
emoji: 🌍
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
suggested_hardware: t4-small
---

# 🌍 Audio Language Translator

Translate spoken audio between 15 languages using a complete AI pipeline.

## 🎯 What This Does

1. **Upload or record** audio in any supported language
2. **Automatic detection** of source language
3. **Translation** to your chosen target language
4. **Speech synthesis** in the target language with selectable voices

## 🔌 REST API

This translator is also available as a REST API for developers!

**📚 Interactive API Docs:** [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |

### Quick Example (Python)

```python
import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"}
    )

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")
```

### Quick Example (cURL)

```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```

## 🛠️ Built With This API

| Project | Developer | Description |
|---------|-----------|-------------|
| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |

*Want your project featured here? Open a discussion or PR!*

## 🏗️ Architecture

```
Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
```

## 🔧 Technical Stack

| Component | Model | Parameters | Purpose |
|-----------|-------|------------|---------|
| **ASR** | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| **Translation** | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| **TTS** | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
| **API** | FastAPI | - | REST API endpoints |
| **UI** | Gradio | - | Interactive web interface |

## 🌐 Supported Languages

### Tier 1: Multiple Voice Options (3 each)

- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)

### Tier 2: Single High-Quality Voice

- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

**Total: 15 languages, 25 voices**

## 📚 Research Foundation

| Paper | Authors | Year | Contribution |
|-------|---------|------|--------------|
| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |

## 📝 Limitations

- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS requires connectivity
- GPU recommended: CPU inference is significantly slower

## 👤 Author

**[Nav772](https://huggingface.co/Nav772)**: Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.
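
## ⏱️ Checking Clip Length Before Upload

The Limitations section notes that the pipeline is optimized for clips under 30 seconds. A client could check this locally before calling the API. The following is a minimal sketch, assuming WAV input and using only Python's standard `wave` module. The helper names (`wav_duration_seconds`, `is_within_clip_limit`, `make_silent_wav`) and the hard 30-second cutoff are illustrative assumptions, not part of this project's API.

```python
import io
import wave

# Assumption: cutoff taken from the "under 30 seconds" guideline in Limitations.
MAX_CLIP_SECONDS = 30.0


def wav_duration_seconds(source):
    """Return the duration in seconds of a WAV file (str path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_clip_limit(source, limit=MAX_CLIP_SECONDS):
    """Return True if the clip is shorter than `limit` seconds (hypothetical helper)."""
    return wav_duration_seconds(source) < limit


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    clip = make_silent_wav(2)
    print("OK to upload:", is_within_clip_limit(clip))
```

With a real recording, pass the file path instead of the in-memory buffer, for example `is_within_clip_limit("input.wav")`. Other formats such as MP3 would need a different duration check.
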
## 📚 Related Projects

- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)

## 📄 License

MIT License