---
license: cc-by-nc-sa-4.0
base_model: magma90909/vocence_miner_v7
pipeline_tag: text-to-speech
library_name: transformers
language:
  - en
tags:
  - tts
  - prompttts
  - qwen3-tts
  - voice-design
  - vocence
  - british-english
  - uk-accent
---

# vocence_miner_v8

A naturalness-first, prompt-driven TTS model, built on top of `magma90909/vocence_miner_v7`.

## Generate

```bash
pip install qwen-tts transformers torch soundfile
```

```python
from qwen_tts import Qwen3TTSModel
import soundfile as sf

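# Load the checkpoint from the Hugging Face Hub.
m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v8")

# `text` is what gets spoken; `instruct` describes the voice and delivery.
wavs, sr = m.generate_voice_design(
    text="The train to Edinburgh departs from platform four.",
    instruct="A man with a British English accent, calm and natural.",
    language="english",
)
# Save the first returned waveform at the returned sample rate.
sf.write("out.wav", wavs[0], sr)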
```

`demo.py` walks through three preset prompts.
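
For readers who want to loop over several prompts themselves, here is a minimal sketch of a preset loop. It is not the contents of `demo.py`: the `PRESETS` names, the `render_presets` helper, and the output file layout are all hypothetical, and the three prompts are borrowed from the example prompts later in this card. It only assumes that the model object exposes `generate_voice_design` as used in the snippet above.

```python
from pathlib import Path

# Hypothetical preset names; the prompts are taken from this card's examples.
PRESETS = {
    "calm_british_man": "A British man speaks calmly and naturally.",
    "scottish_woman": "A woman with a Scottish accent, in an everyday speaking tone.",
    "news_anchor": "A British news anchor, professional and neutral, at a brisk steady pace.",
}


def render_presets(model, text, out_dir="presets", write=None):
    """Render `text` once per preset and return the written file paths.

    `model` is any object with a `generate_voice_design(text, instruct, language)`
    method returning `(wavs, sample_rate)`. `write` defaults to `soundfile.write`.
    """
    if write is None:
        import soundfile as sf  # imported lazily so the helper loads without it

        write = sf.write

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for name, instruct in PRESETS.items():
        wavs, sr = model.generate_voice_design(
            text=text, instruct=instruct, language="english"
        )
        path = out / f"{name}.wav"
        write(str(path), wavs[0], sr)  # keep only the first waveform
        paths.append(path)
    return paths
```

With the model `m` loaded as above, `render_presets(m, "The train to Edinburgh departs from platform four.")` would write one `.wav` file per preset under `presets/`.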

## How to write `instruct`

The model responds best to **subtle, conversational** language rather than intensifiers like *"intensely sad"* or *"nearly shouting"*. Stack these layers freely:

| Layer | Phrasings |
|-------|-----------|
| Accent / region | *British English*, *Scottish*, *Welsh*, *Northern Irish*, *Irish*, *unspecified* |
| Gender | *a man*, *a woman*, *a British woman* |
| Mood | *speaking warmly*, *softly sad*, *quietly pleased*, *with a touch of anger* |
| Persona | *bedtime storyteller, soft and warm*; *news anchor, professional and neutral*; *meditation guide, soft and serene* |
| Pace | *unhurried*, *brisk steady*, *naturally measured* |

Some example prompts that work well:

```
A British man speaks calmly and naturally.
A woman with a Scottish accent, in an everyday speaking tone.
A man, softly sad, calm and unhurried.
A British news anchor, professional and neutral, at a brisk steady pace.
A clear, neutral voice reading the sentence.
```
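
The layering in the table can also be done programmatically. The sketch below joins whichever layers are supplied into a single `instruct` string in roughly the shape of the examples above. The `build_instruct` helper and its parameter names are hypothetical and not part of `qwen_tts`; the model accepts free text, so this is only one convenient way to assemble it.

```python
def build_instruct(speaker="A man", accent=None, persona=None, mood=None, pace=None):
    """Compose an `instruct` string from optional layers.

    Hypothetical helper: layer names mirror the table in this card,
    and any layer left as None is simply skipped.
    """
    head = speaker
    if accent:
        # Pick "a" or "an" so that e.g. "Irish" reads naturally.
        article = "an" if accent[0].lower() in "aeiou" else "a"
        head = f"{speaker} with {article} {accent} accent"

    layers = [head] + [layer for layer in (persona, mood, pace) if layer]
    return ", ".join(layers) + "."


print(build_instruct("A woman", accent="Scottish", mood="softly sad", pace="unhurried"))
# A woman with a Scottish accent, softly sad, unhurried.
```

The result can be passed directly as the `instruct` argument of `generate_voice_design`.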

## Best-fit and not-fit

**Best at:**
* Natural, everyday English — both US and UK
* Bedtime storyteller / news anchor / meditation guide style reads
* Conversational sadness, warmth, mild anger, gentle pleasure

**Less suited for:**
* Theatrical / caricatured delivery (loud anger, shouted joy, dramatic sadness)
* Extreme intensifier prompts ("nearly shouting", "intensely sad") — the model intentionally tones these down
* Languages other than English
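
Since intensifier-heavy prompts are toned down, it can help to flag them before generating. The following is a rough sketch of such a check. The `INTENSIFIERS` word list and the `find_intensifiers` helper are assumptions made for illustration, not something shipped with this model, and the list should be adjusted to taste.

```python
import re

# Assumed, non-exhaustive word list; extend or trim as needed.
INTENSIFIERS = ("intensely", "extremely", "shouting", "shouted", "screaming", "dramatic")


def find_intensifiers(instruct):
    """Return the listed intensifiers that appear as whole words in `instruct`."""
    lowered = instruct.lower()
    return [
        word
        for word in INTENSIFIERS
        if re.search(rf"\b{re.escape(word)}\b", lowered)
    ]


print(find_intensifiers("A man, nearly shouting, intensely sad."))
# ['intensely', 'shouting']
print(find_intensifiers("A man, softly sad, calm and unhurried."))
# []
```

A non-empty result suggests rephrasing the prompt in the subtler style described earlier, for example *softly sad* instead of *intensely sad*.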

## License

CC BY-NC-SA 4.0: research and non-commercial use only.

## Files

```
model.safetensors            # merged Talker weights (3.6 GB)
speech_tokenizer/            # Qwen3 12 Hz audio codec (~650 MB)
tokenizer.json + ...         # text tokenizer
config.json + ...            # model configs
miner.py                     # Vocence engine
chute_config.yml             # Chutes build (TEE / pro_6000)
vocence_config.yaml          # runtime knobs
demo.py                      # quick smoke test
```

The Vocence files make this repo deployable on **Bittensor SN78 (Vocence)** via the canonical Vocence/Chutes wrapper without modification.