---
title: Afghan Pashto Voice Hub
emoji: 🎙️
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 6.12.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Afghan Pashto TTS, ASR, and dialect speech hub.
tags:
  - gradio
  - pashto
  - speech-processing
  - text-to-speech
  - automatic-speech-recognition
  - afghanistan
models:
  - openai/whisper-small
---

## 🎙️ Afghan Pashto Voice Hub — له اصل پښتو سره

This Space is a dialect-aware Afghan Pashto speech playground focused on:

- **Text-to-Speech (TTS)**
- **Automatic Speech Recognition (ASR)**
- **Voice cloning demo workflows**
- **Dialect and cultural context guidance**

The interface covers these dialects:

- Kandahari
- Paktiawal
- Mazari
- Herati
- Nangarhari

## What makes this Space special

- Afghan Pashto-first interface and examples
- traditional phoneme guidance for letters like `ښ`, `ږ`, `ڼ`, and `ړ`
- cultural context modes such as **National**, **Tribal**, **Religious**, **Cultural**, and **Folk Tales**
- graceful fallback demo behavior when heavyweight model checkpoints are unavailable

## Real model hooks

The app is wired to support real model loading through environment variables.

### Supported variables

- `PASHTO_ASR_MODEL_ID`
  - default: `openai/whisper-small`
  - used for real multilingual ASR via the Transformers `pipeline` API

- `PASHTO_TTS_MODEL_ID`
  - optional
  - should point to a VITS-compatible text-to-speech checkpoint

- `PASHTO_TTS_SPEAKER_ID`
  - optional
  - reserved for future multi-speaker model routing

If a configured model cannot be loaded, the Space falls back to built-in lightweight demo behavior so the UI still works.
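
The variable handling and fallback described above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the helper names (`read_model_config`, `load_with_fallback`, `load_asr_pipeline`, `demo_asr`) are hypothetical, and only the variable names and the ASR default come from this README.

```python
import os

DEFAULT_ASR_MODEL_ID = "openai/whisper-small"


def read_model_config(env=None):
    """Collect the three model variables, applying the documented ASR default."""
    env = os.environ if env is None else env
    return {
        "asr_model_id": env.get("PASHTO_ASR_MODEL_ID") or DEFAULT_ASR_MODEL_ID,
        "tts_model_id": env.get("PASHTO_TTS_MODEL_ID") or None,
        "tts_speaker_id": env.get("PASHTO_TTS_SPEAKER_ID") or None,
    }


def load_with_fallback(loader, fallback):
    """Return (model, True) on success, or (fallback, False) if loading raises."""
    try:
        return loader(), True
    except Exception as exc:  # missing package, bad model ID, network error, ...
        print(f"Model load failed, using demo fallback: {exc}")
        return fallback, False


def load_asr_pipeline(model_id):
    """Build a Transformers ASR pipeline; imported lazily so failures are catchable."""
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def demo_asr(audio):
    """Hypothetical stand-in for the Space's lightweight demo transcription."""
    return {"text": "[demo transcript]"}


# Typical use at startup (downloads weights on first run, so not executed here):
#   config = read_model_config()
#   asr, is_real = load_with_fallback(
#       lambda: load_asr_pipeline(config["asr_model_id"]), demo_asr
#   )
```

Returning a flag alongside the model lets the UI tell users whether they are seeing real inference or the demo path.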

## Recommended setup on Hugging Face Spaces

To enable real model-backed inference, add the model variables as Space Variables and `HF_TOKEN` as a Space Secret:

- `PASHTO_ASR_MODEL_ID`
- `PASHTO_TTS_MODEL_ID`
- `PASHTO_TTS_SPEAKER_ID`
- `HF_TOKEN` (recommended for higher Hub rate limits and faster authenticated downloads)
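
To confirm that the settings actually reached the running Space, a startup check along these lines may help. It is a sketch only: `describe_space_config` is a hypothetical helper, and it deliberately reports whether `HF_TOKEN` is present without ever printing its value.

```python
import os

MODEL_VARIABLES = (
    "PASHTO_ASR_MODEL_ID",
    "PASHTO_TTS_MODEL_ID",
    "PASHTO_TTS_SPEAKER_ID",
)


def describe_space_config(env=None):
    """Return log-safe lines describing which settings are present."""
    env = os.environ if env is None else env
    lines = [f"{name}={env.get(name) or '(unset)'}" for name in MODEL_VARIABLES]
    # Never echo the token itself; only report whether it exists.
    lines.append(f"HF_TOKEN={'(set)' if env.get('HF_TOKEN') else '(unset)'}")
    return lines


# Example: print the summary once at startup.
#   for line in describe_space_config():
#       print(line)
```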

## Current implementation notes

- ASR uses a real Transformers pipeline when the model named by `PASHTO_ASR_MODEL_ID` loads successfully.
- TTS uses a real VITS-compatible model when `PASHTO_TTS_MODEL_ID` is set and the checkpoint loads.
- Voice cloning is currently a structured demo path that combines extracted audio traits with the synthesis fallback.

## Deployment troubleshooting

If you see startup or runtime warnings or errors in the Space logs, check these first:

- **Unauthenticated HF Hub requests**
  - Symptom: warning about unauthenticated requests and lower rate limits
  - Fix: set `HF_TOKEN` in Space Secrets

- **CUDA initialized before `spaces` import**
  - Symptom: `RuntimeError` about importing `spaces` after CUDA-related imports
  - Fix: ensure `spaces` is imported before `torch`/CUDA initialization

- **Gradio 6 constructor warnings (`theme`/`css`)**
  - Symptom: warning that parameters moved from `Blocks(...)` to `launch(...)`
  - Fix: pass `theme` and `css` in `app.launch(...)`

- **Large model download delays**
  - Symptom: long startup while downloading model weights
  - Fix: set `HF_TOKEN` and keep model IDs stable between deploys
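
For the `spaces` import-order issue above, a small standard-library check can catch regressions before a deploy. This is a rough sketch under stated assumptions: the helper names are hypothetical, it compares source line numbers only, and it sees only direct imports in the file it is given, so a library that pulls in `torch` transitively (for example `transformers`) is not detected.

```python
import ast


def first_import_line(source, module):
    """Return the line number of the first import of `module`, or None."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # Match the module itself or any of its submodules.
        if any(n == module or n.startswith(module + ".") for n in names):
            found.append(node.lineno)
    return min(found) if found else None


def spaces_imported_before_torch(source):
    """True unless `torch` is imported on an earlier line than `spaces`."""
    spaces_line = first_import_line(source, "spaces")
    torch_line = first_import_line(source, "torch")
    if spaces_line is None or torch_line is None:
        return True  # nothing to order
    return spaces_line < torch_line


# Example: check the Space entry point before pushing.
#   with open("app.py", encoding="utf-8") as handle:
#       assert spaces_imported_before_torch(handle.read())
```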

## Future improvements

- plug in a dedicated Pashto TTS checkpoint
- add speaker embeddings for true cloning
- add waveform visualizations and downloadable transcripts
- add dialect-specific lexicons and normalization rules

## Space config reference

For metadata options, see the [Hugging Face Spaces configuration reference](https://huggingface.co/docs/hub/spaces-config-reference).