---
license: cc-by-nc-sa-4.0
base_model: magma90909/vocence_miner_v7
pipeline_tag: text-to-speech
library_name: transformers
language:
- en
tags:
- tts
- prompttts
- qwen3-tts
- voice-design
- vocence
- british-english
- uk-accent
---

# vocence_miner_v8

A naturalness-first, prompt-driven TTS model built on top of `magma90909/vocence_miner_v7`.

## Generate

```bash
pip install qwen-tts transformers torch soundfile
```

```python
from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the merged Talker weights, speech tokenizer and text tokenizer from the Hub
m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v8")

# `text` is what gets spoken; `instruct` describes the voice that speaks it
wavs, sr = m.generate_voice_design(
    text="The train to Edinburgh departs from platform four.",
    instruct="A man with a British English accent, calm and natural.",
    language="english",
)

# Write the first generated waveform at the sample rate returned by the model
sf.write("out.wav", wavs[0], sr)
```

`demo.py` walks through three preset prompts.

## How to write `instruct`

The model responds best to **subtle, conversational** language, not intensifiers like *"intensely sad"* or *"nearly shouting"*. Stack these elements freely:

| Layer | Phrasings |
|-------|-----------|
| Accent / region | *British English*, *Scottish*, *Welsh*, *Northern Irish*, *Irish*, or leave it unspecified |
| Gender | *a man*, *a woman*, *a British woman* |
| Mood | *speaking warmly*, *softly sad*, *quietly pleased*, *with a touch of anger* |
| Persona | *bedtime storyteller, soft and warm*; *news anchor, professional and neutral*; *meditation guide, soft and serene* |
| Pace | *unhurried*, *brisk steady*, *naturally measured* |

Some example prompts that work well:

```
A British man speaks calmly and naturally.
A woman with a Scottish accent, in an everyday speaking tone.
A man, softly sad, calm and unhurried.
A British news anchor, professional and neutral, at a brisk steady pace.
A clear, neutral voice reading the sentence.
```

## Best fit and poor fit

**Best at:**

* Natural, everyday English, both US and UK
* Bedtime storyteller, news anchor and meditation guide style reads
* Conversational sadness, warmth, mild anger, gentle pleasure

**Less suited for:**

* Theatrical or caricatured delivery (loud anger, shouted joy, dramatic sadness)
* Extreme intensifier prompts ("nearly shouting", "intensely sad"); the model intentionally tones these down
* Languages other than English

## License

CC BY-NC-SA 4.0: research and non-commercial use only.

## Files

```
model.safetensors        # merged Talker weights (3.6 GB)
speech_tokenizer/        # Qwen3 12 Hz audio codec (~650 MB)
tokenizer.json + ...     # text tokenizer
config.json + ...        # model configs
miner.py                 # Vocence engine
chute_config.yml         # Chutes build (TEE / pro_6000)
vocence_config.yaml      # runtime knobs
demo.py                  # quick smoke test
```

The Vocence files (`miner.py`, `chute_config.yml`, `vocence_config.yaml`) make this repo deployable on **Bittensor SN78 (Vocence)** via the canonical Vocence/Chutes wrapper without modification.
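Before deploying, it can help to confirm that a local copy of the repo actually contains the files listed above. The sketch below is a hypothetical helper (`missing_repo_files` and `EXPECTED` are illustrative names, not part of this repo or of any library). It only checks the explicitly named entries; the files hidden behind `+ ...` in the listing are not enumerated in this card, so they are skipped.

```python
from pathlib import Path

# Hypothetical list built from the "Files" section of this card.
# The "+ ..." tokenizer/config extras are not named there, so they are not checked.
EXPECTED = [
    "model.safetensors",
    "speech_tokenizer",
    "tokenizer.json",
    "config.json",
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "demo.py",
]


def missing_repo_files(repo_dir):
    """Return the expected entries absent from repo_dir, in listing order (hypothetical helper)."""
    root = Path(repo_dir)
    missing = []
    for name in EXPECTED:
        path = root / name
        # speech_tokenizer/ is a directory; every other entry is a regular file.
        present = path.is_dir() if name == "speech_tokenizer" else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

One way to use it, assuming `huggingface_hub` is installed, is to pass it the path returned by `huggingface_hub.snapshot_download("magma90909/vocence_miner_v8")`; an empty list means every named file is present.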