asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1

GGUF quantizations of a fine-tuned model for translating Japanese ASMR transcriptions (ASR/Whisper output) into Traditional Chinese.

The model normalizes imperfect audio transcriptions, applies domain-specific glossaries, and translates character dialogue while retaining emotion and nuances.

Echo Mode

The model "echoes" the source Japanese text in an "input" field alongside the target translation. This anchors cross-attention, significantly reducing hallucinations and omitted segments.

Available Quantizations

Quantization Filename Size Description
q4_k_m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf 5.2 GB Good balance of quality and size
q6_k asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q6_k.gguf 6.9 GB Higher quality, moderate size
q8_0 asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q8_0.gguf 8.9 GB Near-lossless quality
bf16 asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-bf16.gguf 16.7 GB Full BF16, no quantization loss

Prompt Example

將以下日語ASMR逐字稿翻譯成繁體中文。

音軌:track01_示例音軌
場景說明:主角與青梅竹馬在校園下午的對話...

術語表(請嚴格使用zh欄位的譯名):
{
  "cvs": [],
  "characters": [],
  "terms": [{"ja": "放課後", "zh": "放學後"}]
}

翻譯前請靜默修正下列Whisper識別錯誤:
- 重複片語(連續3次以上且無變化):僅保留一次
- 錯字/同音異字:依上下文修正
- 字幕版權行(字幕:/翻訳:/QQ/LINE水印):text設為null
- 錯誤專有名詞:依術語表修正

翻譯規則:
- 呻吟與氣息聲(あ、ん、はあ)→ 自然對應(啊、嗯、哈、呼)
- 擬聲詞:日語形式翻譯(パンパン→啪啪);中文形式保留原樣
- 保留角色語氣與口吻
- text欄位只輸出譯文,不加注釋或括號說明
- input欄位為ids所對應的原始日文片段

輸入:逐字稿JSON陣列 — {"id": <n>, "text": "<日文>", "start": <ms>, "end": <ms>}

輸出:將連續構成同一句話的片段合併,JSON陣列格式:
{"ids": [<n>, ...], "input": "<合併後的原始日文,以空格連接>", "text": "<繁體中文>", "start": <最早ms>, "end": <最晚ms>}

字幕版權行:{"ids": [<n>], "input": "<原始日文>", "text": null, "start": <ms>, "end": <ms>}
每個輸入id必須恰好出現在一個輸出項中。
input欄位為ids所對應的原始日文片段以空格連接,text為其繁體中文翻譯。

逐字稿:
[
  {"id": 1, "text": "ねぇ、放課後、", "start": 3000, "end": 5000},
  {"id": 2, "text": "一緒に帰らない?", "start": 5000, "end": 7000}
]

Example Output:

[{"ids": [1, 2], "input": "ねぇ、放課後、 一緒に帰らない?", "text": "欸,放學後,要不要一起回去?", "start": 3000, "end": 7000}]

Usage

llama-server

llama-server -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -c 4096 --port 8080

llama-cli

llama-cli -m asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1-q4_k_m.gguf -p "<your prompt>" -n 2048

Structured Decoding (Recommended)

This model outputs JSON arrays. Using structured decoding (e.g. GBNF grammar or JSON schema constraints) avoids wasted computation on malformed output and guarantees valid JSON on every generation.

JSON Schema:

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "ids": {
        "type": "array",
        "items": {
          "type": "integer"
        },
        "minItems": 1
      },
      "input": {
        "type": "string"
      },
      "text": {
        "anyOf": [
          {
            "type": "string"
          },
          {
            "type": "null"
          }
        ]
      },
      "start": {
        "type": "integer"
      },
      "end": {
        "type": "integer"
      }
    },
    "required": [
      "ids",
      "input",
      "text",
      "start",
      "end"
    ],
    "additionalProperties": false
  },
  "minItems": 1
}

Supported by llama.cpp (--json-schema), vLLM, and outlines.

Training Details

  • Base model: unsloth/Qwen3.5-9B
  • Method: LoRA (r=16, alpha=16)
  • Target modules: gate_proj, down_proj, q_proj, o_proj, k_proj, up_proj, v_proj
  • Locale: zh-tw (Traditional Chinese)
  • Mode: Echo Mode
  • Max sequence length: 4096
  • Precision: bf16
Downloads last month
44
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mmis1000/asmr-qwen3.5-9b-zh-tw-echo-gguf-v0.1

Finetuned
Qwen/Qwen3.5-9B
Quantized
(19)
this model