---
title: Numberblocks One Voice Extraction (CPU - Fixed)
emoji: 🔊
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: mit
---

# Numberblocks One Voice Extraction (CPU Version - Fixed)

**🔧 Fixed version**: Added a web server so the Space passes Hugging Face health checks; the batch processing task now runs in the background.

This Hugging Face Space automatically extracts **One's** voice from Numberblocks audio files using speaker diarization.

## What It Does

1. **Downloads** all audio files from the `ayf3/numberblocks-audio` dataset
2. **Analyzes** each file using `pyannote.audio` speaker diarization
3. **Identifies** which speaker is "One" using heuristic analysis
4. **Extracts** all speech segments belonging to One
5. **Saves** the clean audio segments to `/data/output/one_audio/`

## 🔧 What Was Fixed

### Problem

- **Runtime Error**: "Launch timed out, workload was not healthy after 30 min"
- **Cause**: The original Dockerfile ran a long-running batch process (15-30 hours) without a web server
- **Issue**: Hugging Face's health check expects an HTTP response on port 7860

### Solution

- ✅ **Added Flask Web Server**: Responds to health checks immediately
- ✅ **Background Processing**: The batch task runs in a separate thread
- ✅ **Status Dashboard**: Shows processing progress in real time on the Space homepage
- ✅ **Progress Tracking**: Status is saved after each file for crash recovery

## Features

- ✅ **CPU-friendly**: Runs on basic CPU hardware (no GPU required)
- ✅ **Fully automated**: Runs on container startup, no user interaction needed
- ✅ **Web Dashboard**: Real-time progress tracking in the browser
- ✅ **Health Checks**: Passes Hugging Face's health monitoring
- ✅ **Smart speaker identification**: Uses heuristics to identify One's voice
- ✅ **Error handling**: Continues processing even if individual files fail

## Usage

### Viewing Progress

Visit this Space's homepage to see:

- Current processing status (running/completed/error)
- Progress counter (processed X of Y files)
- Current file being processed
- Number of output files generated
- Total duration of extracted audio

### API Endpoints

- **`/`** - HTML status dashboard
- **`/status`** - JSON status API
- **`/health`** - Health check endpoint (used by Hugging Face)

## Output

The extracted audio segments are saved in `/data/output/one_audio/` with the format:

```
S01E01_One_12.34_15.67.wav
```

Where:

- `S01E01_One`: Episode name
- `12.34`: Start time in seconds
- `15.67`: End time in seconds

## Technical Details

- **Model**: `pyannote/speaker-diarization-3.1`
- **Hardware**: CPU (no GPU required)
- **SDK**: Docker (for automated batch processing)
- **Processing time**: ~15-30 hours for 124 files (CPU)
- **Web Framework**: Flask (for health checks and status dashboard)

## Progress Tracking

Current progress is saved in `/data/output/processing_report.json`:

```json
{
  "total_files": 124,
  "processed_files": 124,
  "total_one_audio_hours": 8.5,
  "segments": [...],
  "completed_at": "2026-03-18 18:30:00"
}
```

## Logs

View real-time processing logs in the **Logs** tab of this Space.

## Next Steps

Once extraction is complete:

1. Download the extracted audio files from `/data/output/one_audio/`
2. Use them to train an RVC (Retrieval-based Voice Conversion) model
3. Generate new speech in One's voice

---

**Note**: This Space runs automatically on startup and provides a web dashboard for monitoring. No manual interaction is required.

**Status**: 🔧 Fixed - Health checks passing, background processing enabled.
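
When working with the downloaded segments, it can help to recover the episode name and timestamps from each filename. The following is a minimal sketch based only on the filename format described in the Output section; the helper names (`segment_filename`, `parse_segment_filename`, `total_hours`) and the two-decimal formatting are assumptions for illustration, not the code this Space actually runs.

```python
import re
from typing import Iterable, NamedTuple

# Assumed pattern: <episode>_<start>_<end>.wav, with times in seconds.
# The episode part may itself contain underscores (e.g. "S01E01_One").
SEGMENT_RE = re.compile(
    r"^(?P<episode>.+)_(?P<start>\d+(?:\.\d+)?)_(?P<end>\d+(?:\.\d+)?)\.wav$"
)


class Segment(NamedTuple):
    episode: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def segment_filename(episode: str, start: float, end: float) -> str:
    """Build a filename in the documented format (two decimals assumed)."""
    return f"{episode}_{start:.2f}_{end:.2f}.wav"


def parse_segment_filename(name: str) -> Segment:
    """Split a segment filename back into episode, start, and end."""
    match = SEGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a segment filename: {name!r}")
    segment = Segment(
        match["episode"], float(match["start"]), float(match["end"])
    )
    if segment.end < segment.start:
        raise ValueError(f"end precedes start in {name!r}")
    return segment


def total_hours(names: Iterable[str]) -> float:
    """Sum the durations encoded in the filenames, in hours."""
    seconds = sum(parse_segment_filename(n).duration for n in names)
    return seconds / 3600.0


if __name__ == "__main__":
    seg = parse_segment_filename("S01E01_One_12.34_15.67.wav")
    print(seg.episode, seg.start, seg.end)
```

Summing durations this way gives a quick cross-check against the `total_one_audio_hours` value in `processing_report.json`, assuming that field is derived from the same segment boundaries.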