asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1
GGUF quantizations of a fine-tuned model for translating Japanese ASMR transcriptions (ASR/Whisper output) into Traditional Chinese.
The model normalizes imperfect audio transcriptions, applies domain-specific glossaries, and translates character dialogue while preserving emotion and nuance.
Echo Mode
The model "echoes" the source Japanese text in an "input" field alongside the target translation. Restating the source immediately before each translation keeps the model's attention anchored to the segment being translated, which reduces hallucinations and omitted segments.
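The echoed field can also be checked after generation. The following is a minimal sketch, not part of the author's tooling: it assumes the output has already been parsed into Python dicts, and `check_echo` is a hypothetical helper name. It rebuilds the expected `input` by joining the source segments with spaces and reports any item whose echo differs.

```python
def check_echo(segments, outputs):
    """Return the output items whose echoed `input` differs from the source text."""
    source_by_id = {seg["id"]: seg["text"] for seg in segments}
    mismatches = []
    for item in outputs:
        # The prompt asks for the original segments joined with spaces.
        expected = " ".join(source_by_id.get(i, "") for i in item["ids"])
        if item["input"] != expected:
            mismatches.append(item)
    return mismatches


segments = [
    {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
    {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000},
]
outputs = [
    {"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?",
     "text": "欸,放學後,要不要一起回去?", "start": 3000, "end": 7000},
]
print(check_echo(segments, outputs))  # an empty list means every echo matched
```

A mismatch does not necessarily mean the translation is wrong, but it is a cheap signal that a segment may have been altered or skipped and deserves a second look.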
Available Quantizations
| Quantization | Filename | Size | Description |
|---|---|---|---|
| q4_k_m | asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf | 5.2 GB | Good balance of quality and size |
| q6_k | asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q6_k.gguf | 6.9 GB | Higher quality, moderate size |
| q8_0 | asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q8_0.gguf | 8.9 GB | Near-lossless quality |
| bf16 | asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-bf16.gguf | 16.7 GB | Full BF16, no quantization loss |
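As a rough aid for choosing a file, the sizes in the table can be compared against a memory budget. This is only a sketch: `pick_quantization` is a hypothetical helper, and the `headroom_gb` default is an assumed allowance for the KV cache and runtime overhead, not a measured figure.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {"q4_k_m": 5.2, "q6_k": 6.9, "q8_0": 8.9, "bf16": 16.7}
FILE_PREFIX = "asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1"


def pick_quantization(budget_gb, headroom_gb=1.5):
    """Return the filename of the largest file that fits the budget, or None."""
    usable = budget_gb - headroom_gb  # headroom_gb is an assumption, tune it
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None
    _, name = max(fitting)  # largest size that still fits
    return f"{FILE_PREFIX}-{name}.gguf"


print(pick_quantization(12))
```

Actual memory use depends on context length and backend, so treat the result as a starting point rather than a guarantee.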
Prompt Example
將以下日語ASMR逐字稿翻譯成繁體中文。
音軌:track01_示例音軌
場景說明:主角與青梅竹馬在校園下午的對話...
術語表(請嚴格使用zh欄位的譯名):
{
"cvs": [],
"characters": [],
"terms": [{"ja": "放課後", "zh": "放學後"}]
}
翻譯前請靜默修正下列Whisper識別錯誤:
- 重複片語(連續3次以上且無變化):僅保留一次
- 錯字/同音異字:依上下文修正
- 字幕版權行(字幕:/翻訳:/QQ/LINE水印):text設為null
- 錯誤專有名詞:依術語表修正
翻譯規則:
- 呻吟與氣息聲(あ、ん、はあ)→ 自然對應(啊、嗯、哈、呼)
- 擬聲詞:日語形式翻譯(パンパン→啪啪);中文形式保留原樣
- 保留角色語氣與口吻
- text欄位只輸出譯文,不加注釋或括號說明
- input欄位為ids所對應的原始日文片段
輸入:逐字稿JSON陣列 — {"id": <n>, "text": "<日文>", "start": <ms>, "end": <ms>}
輸出:將連續構成同一句話的片段合併,JSON陣列格式:
{"ids": [<n>, ...], "input": "<合併後的原始日文,以空格連接>", "text": "<繁體中文>", "start": <最早ms>, "end": <最晚ms>}
字幕版權行:{"ids": [<n>], "input": "<原始日文>", "text": null, "start": <ms>, "end": <ms>}
每個輸入id必須恰好出現在一個輸出項中。
input欄位為ids所對應的原始日文片段以空格連接,text為其繁體中文翻譯。
逐字稿:
[
{"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
{"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000}
]
Example Output:
[{"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?", "text": "欸,放學後,要不要一起回去?", "start": 3000, "end": 7000}]
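The output rules in the prompt (every input id appears in exactly one item, `start` is the earliest timestamp, `end` is the latest) can be verified mechanically. The sketch below is an illustration under that reading of the prompt; `validate_output` is a hypothetical name and the model card does not ship such a checker.

```python
import json


def validate_output(segments, raw_output):
    """Check id coverage and timestamps; return a list of problem descriptions."""
    items = json.loads(raw_output)
    by_id = {seg["id"]: seg for seg in segments}
    problems = []
    seen = []
    for item in items:
        seen.extend(item["ids"])
        members = [by_id[i] for i in item["ids"] if i in by_id]
        if len(members) != len(item["ids"]):
            problems.append(f"item {item['ids']} references an unknown id")
        if not members:
            continue
        if item["start"] != min(m["start"] for m in members):
            problems.append(f"item {item['ids']} has an unexpected start")
        if item["end"] != max(m["end"] for m in members):
            problems.append(f"item {item['ids']} has an unexpected end")
    for seg_id in by_id:
        count = seen.count(seg_id)
        if count != 1:
            problems.append(f"id {seg_id} appears {count} times")
    return problems


transcript = [
    {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
    {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000},
]
raw = ('[{"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?", '
       '"text": "欸,放學後,要不要一起回去?", "start": 3000, "end": 7000}]')
print(validate_output(transcript, raw))  # an empty list means no problems were found
```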
Usage
llama-server
llama-server -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -c 4096 --port 8080
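Once the server is running, it can be queried through its OpenAI-compatible chat endpoint. The following sketch uses only the Python standard library; the function names are hypothetical, the URL assumes the port from the command above, and `max_tokens=2048` simply mirrors the `-n 2048` used with llama-cli.

```python
import json
import urllib.request

# Assumes llama-server is listening on the port used in the command above.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt, max_tokens=2048):
    """Build a POST request carrying the prompt as a single user message."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(prompt):
    """Send the prompt and return the raw text of the model's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


# With the server running: print(translate(full_prompt_text))
```

The reply is a JSON array encoded as a string, so pass it through `json.loads` before use.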
llama-cli
llama-cli -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -p "<your prompt>" -n 2048
Structured Decoding (Recommended)
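This model outputs JSON arrays. Using structured decoding (e.g. a GBNF grammar or JSON schema constraints) avoids wasted computation on malformed output and ensures that every completed generation is valid JSON matching the schema. A generation that is cut off by the token limit can still be truncated, so leave enough room for the full output.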
JSON Schema:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"ids": {
"type": "array",
"items": {
"type": "integer"
},
"minItems": 1
},
"input": {
"type": "string"
},
"text": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
]
},
"start": {
"type": "integer"
},
"end": {
"type": "integer"
}
},
"required": [
"ids",
"input",
"text",
"start",
"end"
],
"additionalProperties": false
},
"minItems": 1
}
JSON schema constraints are supported by llama.cpp (--json-schema), vLLM, and Outlines.
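When talking to a server over HTTP rather than passing `--json-schema` on the command line, the schema can be attached to the request. The sketch below builds a chat request body using the OpenAI-style `response_format` field. Treat the exact shape as an assumption: which variants are accepted depends on the server and its version, and `build_constrained_body` and the schema name `asmr_translation` are hypothetical.

```python
import json

# Same schema as above, written as a Python dict.
OUTPUT_SCHEMA = {
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "properties": {
            "ids": {"type": "array", "items": {"type": "integer"}, "minItems": 1},
            "input": {"type": "string"},
            "text": {"anyOf": [{"type": "string"}, {"type": "null"}]},
            "start": {"type": "integer"},
            "end": {"type": "integer"},
        },
        "required": ["ids", "input", "text", "start", "end"],
        "additionalProperties": False,
    },
}


def build_constrained_body(prompt):
    """Return a chat request body that asks the server to enforce the schema."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Assumed OpenAI-style shape; check your server's documentation.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "asmr_translation", "schema": OUTPUT_SCHEMA},
        },
    }


body = build_constrained_body("<your prompt>")
print(json.dumps(body["response_format"]["json_schema"]["schema"]["items"]["required"]))
```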
Training Details
- Base model: unsloth/Qwen3.5-9B
- Method: LoRA (r=16, alpha=16)
- Target modules: gate_proj, down_proj, q_proj, o_proj, k_proj, up_proj, v_proj
- Locale: zh-tw (Traditional Chinese)
- Mode: Echo Mode
- Max sequence length: 4096
- Precision: bf16
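Because the model was trained with a maximum sequence length of 4096, long transcripts likely need to be split across several requests. The sketch below groups consecutive segments under a character budget. It is an assumption-heavy illustration: `chunk_segments` is a hypothetical helper, character count is only a crude proxy for tokens, and the `max_chars` default is a placeholder to tune, since the instructions, glossary, echoed input, and translation all share the same context.

```python
def chunk_segments(segments, max_chars=600):
    """Group consecutive segments so each chunk's source text stays within max_chars."""
    chunks, current, used = [], [], 0
    for seg in segments:
        length = len(seg["text"])
        # Start a new chunk when adding this segment would exceed the budget.
        if current and used + length > max_chars:
            chunks.append(current)
            current, used = [], 0
        current.append(seg)
        used += length
    if current:
        chunks.append(current)
    return chunks


demo = [{"id": n, "text": "あいうえ", "start": n * 1000, "end": (n + 1) * 1000}
        for n in range(1, 6)]
print([[seg["id"] for seg in chunk] for chunk in chunk_segments(demo, max_chars=10)])
```

A split can fall in the middle of a sentence, in which case the model cannot merge the segments on either side of the boundary; splitting at long pauses between timestamps is one way to reduce that.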