---
title: Open Voice Agent
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8501
pinned: false
---
# 🎤 Open Voice Agent

> A real-time voice AI agent powered by open-source components.

**[Live Demo on HuggingFace Spaces](https://huggingface.co/spaces/dvalle08/open-voice-agent)**

<!-- Replace with your actual GIF after recording -->
![Open Voice Agent Demo](assets/demo2.gif)

---

## Description

A real-time conversational voice agent built with open-source components, deployable on consumer hardware (8GB VRAM).

The core value is in the **custom LiveKit plugins** for STT/TTS and a clean runtime layer for MCP-enabled LLM providers.

## Architecture

Open Voice Agent is built as a local, open-source-first voice pipeline, with hosted providers available when you want to swap a layer.

```text
Recommended stack:
User Audio -> Silero VAD + turn detection -> Moonshine STT -> Ollama -> Pocket TTS -> Audio Response
                            ^ LiveKit realtime transport + Streamlit UI ^
```


| Layer         | Recommended path                                                             | Also supported                                                                      | Notes                                                             |
| ------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| **STT**       | [Moonshine](https://huggingface.co/usefulsensors/moonshine-streaming-medium) | Deepgram `nova-3`, NVIDIA `parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer` | Moonshine is the local open-source path                           |
| **LLM**       | [Ollama](https://ollama.com/)                                                | NVIDIA NIM                                                                          | Ollama can run locally or via Ollama Cloud                        |
| **TTS**       | [Pocket TTS](https://huggingface.co/kyutai/pocket-tts)                       | Deepgram `aura-2-thalia-en`, NVIDIA Riva with `Magpie-Multilingual.EN-US.Leo`       | Pocket TTS is the local open-source path                          |
| **VAD**       | Silero VAD                                                                   | -                                                                                   | Handles speech start and end detection inside the LiveKit session |
| **Transport** | [LiveKit](https://livekit.io)                                                | -                                                                                   | Realtime WebRTC transport and agent runtime                       |
| **UI**        | Streamlit                                                                    | -                                                                                   | Lightweight browser UI for local and Spaces deployments           |


## Open-Source Core

The reusable core lives in `src/plugins/`. It provides custom LiveKit-compatible components for the local speech stack:

```text
/plugins/
├── moonshine_stt/   # Streaming STT plugin built on Moonshinesrc
└── pocket_tts/      # Streaming TTS plugin built on Pocket TTS
```

These plugins implement LiveKit's `stt.STT` and `tts.TTS` interfaces, so the local stack behaves like any other provider-backed LiveKit pipeline while staying easy to reuse in other agent projects.

- `moonshine_stt` wraps `usefulsensors/moonshine-streaming-medium` for streaming speech-to-text.
- `pocket_tts` wraps `kyutai/pocket-tts` for streaming text-to-speech with native 24 kHz output.
- Deepgram and NVIDIA can replace individual layers without changing the rest of the agent flow.

## Quick Start

- Python 3.10+
- [UV](https://github.com/astral-sh/uv)
- [LiveKit server](https://docs.livekit.io) (cloud or self-hosted)

Recommended setup: Moonshine for STT, Ollama for LLM, and Pocket TTS for speech output.

```bash
git clone https://github.com/dvalle08/open-voice-agent.git
cd open-voice-agent

uv sync
cp .env.example .env
# Edit .env with your LiveKit credentials and provider choices

uv run src/agent/agent.py start
uv run streamlit run src/streamlit_app.py
```

If you prefer hosted components, switch the STT, LLM, or TTS provider in `.env` to Deepgram or NVIDIA for the layers that need it.

Docker remains available for local or Spaces-style deployment:

```bash
docker build -t open-voice-agent .
docker run -p 8501:8501 --env-file .env open-voice-agent
```

## Configuration


| Layer          | Supported providers               | Recommended choice |
| -------------- | --------------------------------- | ------------------ |
| `STT_PROVIDER` | `moonshine`, `deepgram`, `nvidia` | `moonshine`        |
| `LLM_PROVIDER` | `ollama`, `nvidia`                | `ollama`           |
| `TTS_PROVIDER` | `pocket`, `deepgram`, `nvidia`    | `pocket`           |


Recommended `.env` for the local open-source stack:

```bash
STT_PROVIDER=moonshine
MOONSHINE_MODEL_ID=usefulsensors/moonshine-streaming-medium
MOONSHINE_LANGUAGE=en

LLM_PROVIDER=ollama
OLLAMA_CLOUD_MODE=false
OLLAMA_MODEL=your-local-ollama-model
OLLAMA_API_KEY=ollama

TTS_PROVIDER=pocket
POCKET_TTS_VOICE=alba
```

Hosted alternatives by layer:

```bash
STT_PROVIDER=deepgram
DEEPGRAM_STT_MODEL=nova-3
DEEPGRAM_STT_LANGUAGE=en-US
DEEPGRAM_API_KEY=your_deepgram_api_key_here

# or NVIDIA STT
STT_PROVIDER=nvidia
NVIDIA_STT_MODEL=parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer
NVIDIA_STT_LANGUAGE_CODE=en-US
NVIDIA_STT_API_KEY=

# NVIDIA LLM
LLM_PROVIDER=nvidia
NVIDIA_API_KEY=your_nvidia_api_key_here
NVIDIA_MODEL=meta/llama-3.1-8b-instruct

# Deepgram TTS
TTS_PROVIDER=deepgram
DEEPGRAM_API_KEY=your_deepgram_api_key_here

# or NVIDIA TTS / self-hosted Riva
TTS_PROVIDER=nvidia
NVIDIA_TTS_VOICE=Magpie-Multilingual.EN-US.Leo
NVIDIA_TTS_LANGUAGE_CODE=en-US
NVIDIA_TTS_SERVER=grpc.nvcf.nvidia.com:443
NVIDIA_TTS_FUNCTION_ID=877104f7-e885-42b9-8de8-f6e4c6303969
NVIDIA_TTS_USE_SSL=true
NVIDIA_TTS_API_KEY=
```

## MCP Support

```bash
MCP_ENABLED=true
MCP_SERVER_URL=https://huggingface.co/mcp
MCP_EXTRA_SERVER_URLS=https://docs.livekit.io/mcp
```

## Langfuse Tracing

Langfuse tracing is optional and supported out of the box.

```bash
LANGFUSE_ENABLED=true
LANGFUSE_HOST=https://cloud.langfuse.com
LANGFUSE_PROJECT_ID=your_project_id
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_TRACES=false
```

- The app creates one Langfuse trace per finalized user turn.
- Traces cover the core voice pipeline stages `stt`, `llm`, and `tts`, plus tool activity when tools are used.
- The Streamlit UI shows a live trace dropdown with `Open Trace` links for the current session.
- `LANGFUSE_PROJECT_ID` is required for those trace links.

## Operational Notes

```bash
LLM_CONN_TIMEOUT_SEC=20.0
LLM_CONN_MAX_RETRY=1
LLM_CONN_RETRY_INTERVAL_SEC=1.0
TURN_LLM_STALL_TIMEOUT_SEC=12.0
MCP_STARTUP_GREETING_TIMEOUT_SEC=0.0
POCKET_TTS_CONN_TIMEOUT_SEC=45.0
LIVEKIT_NUM_IDLE_PROCESSES=1
LIVEKIT_INITIALIZE_PROCESS_TIMEOUT_SEC=20.0
```

- `LLM_CONN_*` controls LLM timeout and retry behavior.
- `POCKET_TTS_CONN_TIMEOUT_SEC` is the per-attempt TTS timeout used by Pocket TTS, Deepgram TTS, and NVIDIA TTS in this app.
- `MCP_STARTUP_GREETING_TIMEOUT_SEC=0.0` disables forced interruption of the startup greeting.
- `LIVEKIT_NUM_IDLE_PROCESSES` and `LIVEKIT_INITIALIZE_PROCESS_TIMEOUT_SEC` control idle worker warm-up.
- If STT is working but no reply is produced, start with the backend log line `Turn stalled before LLM stage`.

## Project Structure

```text
open-voice-agent/
├── src/
│   ├── agent/            # LiveKit agent runtime, prompts, tools, tracing
│   ├── plugins/          # Custom Moonshine STT and Pocket TTS plugins
│   ├── api/              # Session bootstrap and token helpers
│   ├── core/             # Settings and logging
│   ├── ui/               # Browser client assets
│   └── streamlit_app.py  # Streamlit entrypoint
├── .env.example          # Full provider and runtime configuration reference
├── Dockerfile            # Container build for local use or Spaces
├── pyproject.toml        # Dependencies
└── start.sh              # Container entrypoint
```

## License

Apache 2.0