Libraries

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with llama-cpp-python:

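# !pip install llama-cpp-python

from llama_cpp import Llama

# This downloads the full BF16 file (16.7 GB). For a smaller download, use
# filename="asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf" instead.
llm = Llama.from_pretrained(
    repo_id="mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1",
    filename="asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-bf16.gguf",
)

# create_chat_completion returns an OpenAI-style response dict.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])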

Local Apps

llama.cpp

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Use Docker

docker model run hf.co/mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

vLLM

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Ollama:
```
ollama run hf.co/mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
```

Unsloth Studio

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 to start chatting

Pi

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Docker Model Runner:
```
docker model run hf.co/mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M
```

Lemonade

How to use mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1:Q4_K_M

Run and chat with the model

lemonade run user.asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-Q4_K_M

List all available models

lemonade list

asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1

GGUF quantizations of a fine-tuned model for translating Japanese ASMR transcriptions (ASR/Whisper output) into Traditional Chinese.

The model normalizes imperfect audio transcriptions, applies domain-specific glossaries, and translates character dialogue while preserving emotion and nuance.

Echo Mode

The model "echoes" the source Japanese text in an "input" field alongside the target translation. Because each source segment is emitted directly before its translation, the translation stays anchored to the source, which reduces hallucinations and omitted segments.
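
Because the source text is echoed, the output can be checked mechanically against the transcript. The following is a minimal sketch, assuming the echoed "input" should equal the original segments joined by a single space (as the prompt in the Prompt Example section specifies). `echo_mismatches` is a hypothetical helper for illustration, not part of the model or of any library.

```python
def echo_mismatches(segments, output):
    """Return output items whose echoed "input" differs from the source text."""
    source = {seg["id"]: seg["text"] for seg in segments}
    mismatches = []
    for item in output:
        # Rebuild what the echo should look like from the referenced ids.
        expected = " ".join(source.get(i, "") for i in item["ids"])
        if item["input"] != expected:
            mismatches.append(
                {"ids": item["ids"], "expected": expected, "got": item["input"]}
            )
    return mismatches


segments = [
    {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
    {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000},
]
output = [
    {"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?",
     "text": "欸，放學後，要不要一起回去？", "start": 3000, "end": 7000},
]
print(echo_mismatches(segments, output))  # prints []
```

Strict string equality may be too strict in practice (for example if whitespace differs), so a mismatch is better treated as a flag for review than as a hard failure.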

Available Quantizations

| Quantization | Filename | Size | Description |
|---|---|---|---|
| q4_k_m | `asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf` | 5.2 GB | Good balance of quality and size |
| q6_k | `asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q6_k.gguf` | 6.9 GB | Higher quality, moderate size |
| q8_0 | `asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q8_0.gguf` | 8.9 GB | Near-lossless quality |
| bf16 | `asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-bf16.gguf` | 16.7 GB | Full BF16, no quantization loss |
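
As a rough illustration of choosing among these files, the sketch below picks the largest quantization whose file size fits a given budget and builds its filename. The sizes are copied from the table; `gguf_filename`, `largest_quant_within`, and `download` are hypothetical helper names, and using `huggingface_hub` for the download is an assumption (any method of fetching the file works). File size is only a lower bound on memory use, since the context cache needs additional space.

```python
# File sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {"q4_k_m": 5.2, "q6_k": 6.9, "q8_0": 8.9, "bf16": 16.7}
REPO_ID = "mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1"


def gguf_filename(quant):
    """Build the GGUF filename for a quantization listed in the table."""
    if quant not in QUANT_SIZES_GB:
        raise ValueError(f"unknown quantization: {quant}")
    return f"asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-{quant}.gguf"


def largest_quant_within(budget_gb):
    """Return the largest listed quantization whose file fits the budget, or None."""
    fitting = [q for q, size in QUANT_SIZES_GB.items() if size <= budget_gb]
    return max(fitting, key=QUANT_SIZES_GB.get) if fitting else None


def download(quant):
    """Download one GGUF file; requires `pip install huggingface_hub`."""
    from huggingface_hub import hf_hub_download  # lazy import: optional dependency
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


print(largest_quant_within(8.0))  # prints q6_k
print(gguf_filename("q4_k_m"))    # prints asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf
```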

Prompt Example

將以下日語ASMR逐字稿翻譯成繁體中文。

音軌：track01_示例音軌
場景說明：主角與青梅竹馬在校園下午的對話...

術語表（請嚴格使用zh欄位的譯名）：
{
  "cvs": [],
  "characters": [],
  "terms": [{"ja": "放課後", "zh": "放學後"}]
}

翻譯前請靜默修正下列Whisper識別錯誤：
- 重複片語（連續3次以上且無變化）：僅保留一次
- 錯字／同音異字：依上下文修正
- 字幕版權行（字幕：／翻訳：／QQ／LINE水印）：text設為null
- 錯誤專有名詞：依術語表修正

翻譯規則：
- 呻吟與氣息聲（あ、ん、はあ）→ 自然對應（啊、嗯、哈、呼）
- 擬聲詞：日語形式翻譯（パンパン→啪啪）；中文形式保留原樣
- 保留角色語氣與口吻
- text欄位只輸出譯文，不加注釋或括號說明
- input欄位為ids所對應的原始日文片段

輸入：逐字稿JSON陣列 — {"id": <n>, "text": "<日文>", "start": <ms>, "end": <ms>}

輸出：將連續構成同一句話的片段合併，JSON陣列格式：
{"ids": [<n>, ...], "input": "<合併後的原始日文，以空格連接>", "text": "<繁體中文>", "start": <最早ms>, "end": <最晚ms>}

字幕版權行：{"ids": [<n>], "input": "<原始日文>", "text": null, "start": <ms>, "end": <ms>}
每個輸入id必須恰好出現在一個輸出項中。
input欄位為ids所對應的原始日文片段以空格連接，text為其繁體中文翻譯。

逐字稿：
[
  {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
  {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000}
]

Example Output:

[{"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?", "text": "欸，放學後，要不要一起回去？", "start": 3000, "end": 7000}]
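
The prompt requires that every input id appear in exactly one output item and that merged items span the earliest start and latest end of their segments. A minimal sketch of checking that contract after parsing the model output follows; `coverage_errors` is a hypothetical helper, not something shipped with the model.

```python
import json
from collections import Counter


def coverage_errors(segments, output):
    """List violations of the id-coverage and timing rules stated in the prompt."""
    by_id = {seg["id"]: seg for seg in segments}
    counts = Counter(i for item in output for i in item["ids"])
    errors = []
    # Every transcript id must be used exactly once.
    for seg_id in by_id:
        if counts[seg_id] != 1:
            errors.append(f"id {seg_id} appears {counts[seg_id]} times")
    # No output item may reference an id that was never in the transcript.
    for seg_id in counts:
        if seg_id not in by_id:
            errors.append(f"id {seg_id} is not in the transcript")
    # Merged timings must span the referenced segments.
    for item in output:
        known = [by_id[i] for i in item["ids"] if i in by_id]
        if not known:
            continue
        if item["start"] != min(s["start"] for s in known):
            errors.append(f"ids {item['ids']}: start is not the earliest start")
        if item["end"] != max(s["end"] for s in known):
            errors.append(f"ids {item['ids']}: end is not the latest end")
    return errors


segments = [
    {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
    {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000},
]
raw = ('[{"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?", '
       '"text": "欸，放學後，要不要一起回去？", "start": 3000, "end": 7000}]')
print(coverage_errors(segments, json.loads(raw)))  # prints []
```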

Usage

llama-server

llama-server -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -c 4096 --port 8080
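
Once the server is running, it can be queried over its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server from the command above is reachable at localhost on port 8080. `build_payload` and `translate` are hypothetical names, the `max_tokens` default simply mirrors the `-n 2048` used with llama-cli, and the model is assumed to reply with a bare JSON array as in the Example Output.

```python
import json
import urllib.request

# Assumed from the command above: the server listens on localhost:8080.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(prompt, max_tokens=2048):
    """Build an OpenAI-style chat completion request body for one prompt."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def translate(prompt, url=SERVER_URL):
    """Send the prompt to a running llama-server and return the parsed JSON array."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The translation is returned as JSON text inside the assistant message.
    return json.loads(body["choices"][0]["message"]["content"])
```

Calling `translate(prompt)` with the full prompt from the Prompt Example section should return a list of objects with `ids`, `input`, `text`, `start`, and `end` keys, provided the server is up and the output is well-formed.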

llama-cli

llama-cli -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -p "<your prompt>" -n 2048

Structured Decoding (Recommended)

This model outputs JSON arrays. Structured decoding (e.g. a GBNF grammar or a JSON schema constraint) avoids wasted computation on malformed output and ensures that every generation that runs to completion is valid JSON. Output cut off by the token limit can still be incomplete.

JSON Schema:

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "ids": {
        "type": "array",
        "items": {
          "type": "integer"
        },
        "minItems": 1
      },
      "input": {
        "type": "string"
      },
      "text": {
        "anyOf": [
          {
            "type": "string"
          },
          {
            "type": "null"
          }
        ]
      },
      "start": {
        "type": "integer"
      },
      "end": {
        "type": "integer"
      }
    },
    "required": [
      "ids",
      "input",
      "text",
      "start",
      "end"
    ],
    "additionalProperties": false
  },
  "minItems": 1
}

JSON schema constraints are supported by llama.cpp (--json-schema), vLLM, and Outlines.
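
For llama-cpp-python, one way to apply the schema is the `response_format` argument of `create_chat_completion`, which accepts a `schema` entry in its JSON mode. The following is a sketch under that assumption rather than a tested recipe for this model; `completion_kwargs` and `translate_constrained` are hypothetical names, and whether the installed llama-cpp-python version handles a top-level array schema should be verified.

```python
import json

# The same schema as above, written as a Python dict.
OUTPUT_SCHEMA = {
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "properties": {
            "ids": {"type": "array", "items": {"type": "integer"}, "minItems": 1},
            "input": {"type": "string"},
            "text": {"anyOf": [{"type": "string"}, {"type": "null"}]},
            "start": {"type": "integer"},
            "end": {"type": "integer"},
        },
        "required": ["ids", "input", "text", "start", "end"],
        "additionalProperties": False,
    },
}


def completion_kwargs(prompt):
    """Keyword arguments for Llama.create_chat_completion with constrained output."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object", "schema": OUTPUT_SCHEMA},
    }


def translate_constrained(llm, prompt):
    """Run one constrained generation; `llm` is a llama_cpp.Llama instance."""
    result = llm.create_chat_completion(**completion_kwargs(prompt))
    return json.loads(result["choices"][0]["message"]["content"])
```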

Training Details

Base model: unsloth/Qwen3.5-9B
Method: LoRA (r=16, alpha=16)
Target modules: gate_proj, down_proj, q_proj, o_proj, k_proj, up_proj, v_proj
Locale: zh-tw (Traditional Chinese)
Mode: Echo Mode
Max sequence length: 4096
Precision: bf16
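
For readers who want to reproduce a comparable adapter setup, the listed hyperparameters map onto a PEFT `LoraConfig` roughly as sketched below. This is not the author's training script: the card does not say which library was used, so PEFT is an assumption, `make_lora_config` is a hypothetical helper, and every setting not listed above (dropout, learning rate, and so on) is left at the library default.

```python
# Hyperparameters copied from the Training Details list.
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 16,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def make_lora_config():
    """Build a PEFT LoraConfig; requires `pip install peft`."""
    from peft import LoraConfig  # lazy import: optional dependency
    return LoraConfig(task_type="CAUSAL_LM", **LORA_SETTINGS)


# With alpha equal to r, the standard LoRA scaling factor alpha / r is 1.
print(LORA_SETTINGS["lora_alpha"] / LORA_SETTINGS["r"])  # prints 1.0
```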

Downloads last month: 44

Format: GGUF
Model size: 9B params
Architecture: qwen35
Quantization bit widths: 4-bit, 6-bit, 8-bit, 16-bit

Model tree for mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: unsloth/Qwen3.5-9B
Quantized (19): this model