---
title: Afghan Pashto Voice Hub
emoji: 🎙️
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 6.12.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Afghan Pashto TTS, ASR, and dialect speech hub.
tags:
  - gradio
  - pashto
  - speech-processing
  - text-to-speech
  - automatic-speech-recognition
  - afghanistan
models:
  - openai/whisper-small
---

## 🎙️ Afghan Pashto Voice Hub: له اصل پښتو سره

This Space is a dialect-aware Afghan Pashto speech playground focused on:

- **Text-to-Speech (TTS)**
- **Automatic Speech Recognition (ASR)**
- **Voice cloning demo workflows**
- **Dialect and cultural context guidance**

Supported dialect coverage in the interface includes:

- Kandahari
- Paktiawal
- Mazari
- Herati
- Nangarhari

## What makes this Space special

- Afghan Pashto-first interface and examples
- traditional phoneme guidance for letters like `ښ`, `ږ`, `ڼ`, and `ړ`
- cultural context modes such as **National**, **Tribal**, **Religious**, **Cultural**, and **Folk Tales**
- graceful fallback demo behavior when heavyweight model checkpoints are unavailable

## Real model hooks

The app is wired to support real model loading through environment variables.

### Supported variables

- `PASHTO_ASR_MODEL_ID`
  - default: `openai/whisper-small`
  - used for real multilingual ASR via Transformers pipeline
- `PASHTO_TTS_MODEL_ID`
  - optional
  - should point to a VITS-compatible text-to-speech checkpoint
- `PASHTO_TTS_SPEAKER_ID`
  - optional
  - reserved for future multi-speaker model routing

If a configured model cannot be loaded, the Space falls back to built-in lightweight demo behavior so the UI still works.

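
The variable handling and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the names `ModelConfig`, `read_model_config`, and `load_or_fallback` are hypothetical, and only the variable names and the `openai/whisper-small` default come from this README.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple, TypeVar

T = TypeVar("T")

# Default ASR checkpoint, as documented under "Supported variables".
DEFAULT_ASR_MODEL_ID = "openai/whisper-small"


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the three model-related variables."""
    asr_model_id: str
    tts_model_id: Optional[str]
    tts_speaker_id: Optional[str]


def read_model_config(env: Mapping[str, str] = os.environ) -> ModelConfig:
    """Read the model variables, treating empty or blank values as unset."""
    def clean(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return ModelConfig(
        asr_model_id=clean("PASHTO_ASR_MODEL_ID") or DEFAULT_ASR_MODEL_ID,
        tts_model_id=clean("PASHTO_TTS_MODEL_ID"),
        tts_speaker_id=clean("PASHTO_TTS_SPEAKER_ID"),
    )


def load_or_fallback(
    loader: Callable[[], T], fallback: Callable[[], T]
) -> Tuple[T, bool]:
    """Try the real loader; on any failure return the demo fallback instead.

    The second element is True when the real model loaded, False otherwise.
    """
    try:
        return loader(), True
    except Exception:
        # Assumption: any load error (missing weights, network, OOM) should
        # degrade to demo behavior rather than crash the UI.
        return fallback(), False
```

In an app like this one, `loader` might wrap a call such as `transformers.pipeline("automatic-speech-recognition", model=cfg.asr_model_id)`, with `fallback` returning whatever lightweight demo object the UI expects. Returning the boolean alongside the model lets the interface tell users whether they are seeing real inference or the demo path.
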
## Recommended setup on Hugging Face Spaces

Add these as Space Variables / Secrets if you want real model-backed inference:

- `PASHTO_ASR_MODEL_ID`
- `PASHTO_TTS_MODEL_ID`
- `PASHTO_TTS_SPEAKER_ID`
- `HF_TOKEN` (recommended for higher Hub rate limits and faster authenticated downloads)

## Current implementation notes

- ASR can use a real Transformers pipeline when available.
- TTS can use a real VITS-compatible model when configured.
- Voice cloning currently remains a structured demo path built on extracted audio traits plus synthesis fallback.

## Deployment troubleshooting

If you see startup/runtime warnings in Space logs, check these first:

- **Unauthenticated HF Hub requests**
  - Symptom: warning about unauthenticated requests and lower rate limits
  - Fix: set `HF_TOKEN` in Space Secrets
- **CUDA initialized before `spaces` import**
  - Symptom: `RuntimeError` about importing `spaces` after CUDA-related imports
  - Fix: ensure `spaces` is imported before `torch`/CUDA initialization
- **Gradio 6 constructor warnings (`theme`/`css`)**
  - Symptom: warning that parameters moved from `Blocks(...)` to `launch(...)`
  - Fix: pass `theme` and `css` in `app.launch(...)`
- **Large model download delays**
  - Symptom: long startup while downloading model weights
  - Fix: keep model IDs stable, set `HF_TOKEN`, and avoid unnecessary model ID changes between deploys

## Future improvements

- plug in a dedicated Pashto TTS checkpoint
- add speaker embeddings for true cloning
- add waveform visualizations and downloadable transcripts
- add dialect-specific lexicons and normalization rules

## Space config reference

For metadata options, see: [Hugging Face Spaces configuration reference](https://huggingface.co/docs/hub/spaces-config-reference)