Instructions to use wnwu/Qwen3.5-9B-gelv-poet with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use wnwu/Qwen3.5-9B-gelv-poet with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wnwu/Qwen3.5-9B-gelv-poet")
# The "text-generation" pipeline takes plain-text chat messages as its first argument.
messages = [
    {
        "role": "user",
        "content": "请以「秋夜」为题，写一首严格符合格律的七言律诗。\n\n请直接输出诗句。",
    },
]
print(pipe(messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("wnwu/Qwen3.5-9B-gelv-poet")
model = AutoModelForMultimodalLM.from_pretrained("wnwu/Qwen3.5-9B-gelv-poet")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use wnwu/Qwen3.5-9B-gelv-poet with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="wnwu/Qwen3.5-9B-gelv-poet",
	filename="Qwen3.5-9B-gelv-poet.Q8_0.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
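
`create_chat_completion` returns an OpenAI-style completion dictionary rather than printing anything, so the reply has to be pulled out of the result. A minimal sketch, assuming the usual `choices[0].message.content` layout; `extract_reply` is a hypothetical helper, and the sample dictionary is a hand-written stand-in, not real model output:

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the return value of llm.create_chat_completion(...).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample_response))
```

The same helper should also work on the JSON body returned by an OpenAI-compatible server, once it has been parsed into a dictionary.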

Local Apps

llama.cpp

How to use wnwu/Qwen3.5-9B-gelv-poet with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0
# Run inference directly in the terminal:
llama-cli -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0
# Run inference directly in the terminal:
llama-cli -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Use Docker

docker model run hf.co/wnwu/Qwen3.5-9B-gelv-poet:Q8_0

vLLM

How to use wnwu/Qwen3.5-9B-gelv-poet with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "wnwu/Qwen3.5-9B-gelv-poet"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wnwu/Qwen3.5-9B-gelv-poet",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
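
The same request can be issued from Python with only the standard library. This is a sketch, assuming the server started above is listening on `localhost:8000`; `build_chat_request` and `send` are hypothetical helper names, and the prompt is the one from the curl call.

```python
import json
import urllib.request

MODEL_ID = "wnwu/Qwen3.5-9B-gelv-poet"


def build_chat_request(
    prompt: str, base_url: str = "http://localhost:8000/v1"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With a server running: print(send(request))
```

Building the request separately from sending it lets the payload be inspected without a running server. A different `base_url` can be passed for servers on another port.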

Use Docker

docker model run hf.co/wnwu/Qwen3.5-9B-gelv-poet:Q8_0

SGLang

How to use wnwu/Qwen3.5-9B-gelv-poet with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wnwu/Qwen3.5-9B-gelv-poet" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wnwu/Qwen3.5-9B-gelv-poet",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wnwu/Qwen3.5-9B-gelv-poet" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wnwu/Qwen3.5-9B-gelv-poet",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use wnwu/Qwen3.5-9B-gelv-poet with Ollama:
```
ollama run hf.co/wnwu/Qwen3.5-9B-gelv-poet:Q8_0
```

Unsloth Studio

How to use wnwu/Qwen3.5-9B-gelv-poet with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wnwu/Qwen3.5-9B-gelv-poet to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wnwu/Qwen3.5-9B-gelv-poet to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for wnwu/Qwen3.5-9B-gelv-poet to start chatting

Pi

How to use wnwu/Qwen3.5-9B-gelv-poet with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "wnwu/Qwen3.5-9B-gelv-poet:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use wnwu/Qwen3.5-9B-gelv-poet with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use wnwu/Qwen3.5-9B-gelv-poet with Docker Model Runner:
```
docker model run hf.co/wnwu/Qwen3.5-9B-gelv-poet:Q8_0
```

Lemonade

How to use wnwu/Qwen3.5-9B-gelv-poet with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull wnwu/Qwen3.5-9B-gelv-poet:Q8_0

Run and chat with the model

lemonade run user.Qwen3.5-9B-gelv-poet-Q8_0

List all available models

lemonade list

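Qwen3.5-9B Regulated-Verse Poetry Model (Gelv Poet)

A model for generating classical Chinese regulated verse (格律诗), fine-tuned from Qwen3.5-9B. It strictly follows the level/oblique (平仄) tonal rules and can compose five- and seven-character quatrains (绝句) and regulated verse (律诗).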

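Model features

Strictly follows the 「二四六分明」 tonal rule (the tones of the 2nd, 4th, and 6th characters of each line must be clearly fixed)
Supports five-character quatrains (五言绝句), seven-character quatrains (七言绝句), five-character regulated verse (五言律诗), and seven-character regulated verse (七言律诗)
In regulated verse, the second couplet (颔联) and third couplet (颈联) are strictly parallel
Even-numbered lines rhyme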
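
The 「二四六分明」 rule can be illustrated on a line whose tones are already annotated. The sketch below assumes each character has been labelled 平 (level) or 仄 (oblique) beforehand, since mapping characters to tones needs a rhyme dictionary that this card does not provide. `check_246` is a hypothetical helper that only checks that the tones at positions 2, 4, and 6 alternate, as they do in the standard line patterns; it is not the author's evaluation code.

```python
LEVEL, OBLIQUE = "平", "仄"


def check_246(tones: str) -> bool:
    """Check that the tones at positions 2, 4 (and 6) of one line alternate.

    `tones` is a pre-annotated pattern such as "平平仄仄平平仄",
    with one mark per character of the line.
    """
    if len(tones) not in (5, 7) or any(t not in (LEVEL, OBLIQUE) for t in tones):
        raise ValueError("expected 5 or 7 marks, each 平 or 仄")
    key_positions = tones[1::2]  # 1-based positions 2, 4 (and 6)
    return all(a != b for a, b in zip(key_positions, key_positions[1:]))


print(check_246("平平仄仄平平仄"))  # a standard seven-character pattern
print(check_246("仄仄平平仄"))      # a standard five-character pattern
print(check_246("平平平平仄仄平"))  # positions 2 and 4 are both level
```

Positions 1, 3, and 5 are deliberately ignored here; a fuller checker would also need to cover rhyme, parallelism, and the relations between adjacent lines.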

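Training details

Base model: Qwen/Qwen3.5-9B
Method: Unsloth + QLoRA (4-bit, LoRA r=64, alpha=64)
Data: 24,525 strictly compliant regulated-verse poems filtered from chinese-poetry, expanded into 110,060 training samples
Hardware: RTX 4090 24GB
Training epochs: 3 (10,320 steps)
Final eval_loss: 0.094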
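
The training figures above are mutually consistent under one reading. The arithmetic below is a sketch, assuming all 110,060 samples were used for training (the card does not say how the evaluation split was made) and that rank-stabilized LoRA was not enabled. The effective batch size of 32 is inferred, not stated by the author.

```python
import math

poems = 24_525
samples = 110_060
epochs = 3
total_steps = 10_320
lora_r, lora_alpha = 64, 64

steps_per_epoch = total_steps // epochs         # optimizer steps in one epoch
samples_per_poem = samples / poems              # average samples derived from one poem
implied_batch = samples * epochs / total_steps  # samples consumed per optimizer step
lora_scaling = lora_alpha / lora_r              # standard LoRA scaling factor alpha / r

# An effective batch size of 32 reproduces the reported step count exactly.
assert math.ceil(samples / 32) * epochs == total_steps

print(steps_per_epoch, round(samples_per_poem, 2), round(implied_batch, 2), lora_scaling)
```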

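Evaluation results (structural score)

Form	Score
Five-character quatrain (五言绝句)	0.90
Seven-character quatrain (七言绝句)	1.00
Five-character regulated verse (五言律诗)	0.97
Seven-character regulated verse (七言律诗)	1.00
Overall	0.97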
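
The card does not define how the structural score (结构评分) is computed. As a rough illustration of the kind of property such a score could cover, the sketch below classifies a poem by line count and line length only. `classify_form` is a hypothetical helper that ignores tones, rhyme, and parallelism, and the sample text is the 「雪夜」 output quoted in the generation examples.

```python
import re
from typing import List, Optional

FORMS = {
    (4, 5): "五言绝句",  # five-character quatrain
    (4, 7): "七言绝句",  # seven-character quatrain
    (8, 5): "五言律诗",  # five-character regulated verse
    (8, 7): "七言律诗",  # seven-character regulated verse
}


def split_lines(poem: str) -> List[str]:
    """Split a poem into lines on Chinese punctuation and whitespace."""
    return [part for part in re.split(r"[，。！？；、\s]+", poem) if part]


def classify_form(poem: str) -> Optional[str]:
    """Return the form name if line count and line length match, else None."""
    lines = split_lines(poem)
    lengths = {len(line) for line in lines}
    if len(lengths) != 1:  # empty input or lines of unequal length
        return None
    return FORMS.get((len(lines), lengths.pop()))


sample = "酒力欺寒淺，心清睡較遲。梅花擎雪影，和月度疏籬。"
print(classify_form(sample))
```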

Usage

Python

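import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wnwu/Qwen3.5-9B-gelv-poet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True,
)
model.eval()

# System prompt, kept in Chinese as in the original card. In English:
# "You are a poet well versed in the metrical rules of classical Chinese poetry.
# You strictly follow the level/oblique tonal rules and excel at regulated forms
# such as five- and seven-character regulated verse and quatrains. You know the
# 「二四六分明」 tonal rule and the requirements of parallelism and rhyme."
SYSTEM_PROMPT = (
    "你是一位精通中国古典诗词格律的诗人。你严格遵循平仄格律规则，"
    "擅长创作五言律诗、七言律诗、五言绝句、七言绝句等格律诗。"
    "你熟知「二四六分明」的平仄规则，懂得对仗、押韵的要求。"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # User prompt: "Write a seven-character regulated verse on the theme
    # 'Autumn Night' that strictly follows the rules. Output only the poem."
    {"role": "user", "content": "请以「秋夜」为题，写一首严格符合格律的七言律诗。\n\n请直接输出诗句。"},
]
# Render the chat template to a string; enable_thinking=False asks it to skip the thinking block.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a poem without tracking gradients.
with torch.no_grad():
    outputs = model.generate(
        **inputs, max_new_tokens=256, temperature=0.7,
        top_p=0.9, top_k=50, do_sample=True, repetition_penalty=1.15,
    )
# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))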
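
To request other titles or forms, the user message can be built from the same template. A sketch, assuming the prompt wording shown above carries over to the other three forms (the card only shows it for 七言律诗); `build_messages` is a hypothetical helper, and the system prompt is copied from the example.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "你是一位精通中国古典诗词格律的诗人。你严格遵循平仄格律规则，"
    "擅长创作五言律诗、七言律诗、五言绝句、七言绝句等格律诗。"
    "你熟知「二四六分明」的平仄规则，懂得对仗、押韵的要求。"
)

# The four forms the card says the model supports.
FORMS = ("五言绝句", "七言绝句", "五言律诗", "七言律诗")


def build_messages(title: str, form: str) -> List[Dict[str, str]]:
    """Build chat messages asking for a poem with the given title and form."""
    if form not in FORMS:
        raise ValueError(f"unsupported form: {form}")
    user = f"请以「{title}」为题，写一首严格符合格律的{form}。\n\n请直接输出诗句。"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


messages = build_messages("雪夜", "五言绝句")
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages`.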

Interactive inference

python inference.py --model wnwu/Qwen3.5-9B-gelv-poet

Ollama (GGUF)

The GGUF Q8_0 quantized version can be downloaded from the gguf branch of this repository.

Generation examples

Seven-character quatrain, 「塞下曲」 ("Frontier Song"):

寒塞無因見落梅，胡人吹入內宮來。君恩如水東流去，應與春風豈復迴。

Five-character regulated verse, 「山居」 ("Mountain Dwelling"):

欲出還中止，微陰却快晴。檻花栽盡活，籠鳥教初鳴。身寄江湖久，心知富貴輕。還家雖有日，遠宦尚餘生。

Five-character quatrain, 「雪夜」 ("Snowy Night"):

酒力欺寒淺，心清睡較遲。梅花擎雪影，和月度疏籬。

Safetensors

Model size: 10B params
Tensor type: BF16, F32

Model tree for wnwu/Qwen3.5-9B-gelv-poet

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Quantized (296): this model
Finetunes: 1 model