In my personal opinion, this is currently the best model for Chinese role-play chat.

acsr-y34b-4bpw-hb6-exl2

base model: Yi-34B-Chat
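LoRA: Yi-34b-alpaca-cot-lora (adds support for Alpaca-format dialogue, but the results are poor, so the Alpaca instruction format is not recommended)
LoRA: Yi-34B-Spicyboros-3.1-LoRA (unofficial dialogue dataset)
LoRA: limarpv3-yi-llama-34b-lora (long role-play replies)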
Instruction template: ChatML
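Original max token size: 4096
When text-generation-webui is the backend and SillyTavern is the frontend, and MaxToken is set to 8K in the webui, SillyTavern's MaxToken has to be set to 18K; otherwise the WebUI truncates early.
With an 8K context length and Alpha set to about 2.5, repeated replies and lower reply quality are still unavoidable once a chat goes beyond 6K tokens.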
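To make the Alpha setting mentioned above more concrete, here is a minimal sketch of the NTK-aware RoPE scaling formula that ExLlama-style loaders are commonly described as applying for the alpha value. The formula itself, the RoPE base of 5,000,000, and the head dimension of 128 are assumptions that are not stated in this card; check the model's config.json and the loader's source before relying on them.

```python
def ntk_scaled_rope_base(base: float, alpha: float, head_dim: int) -> float:
    """Return the RoPE base after NTK-aware alpha scaling.

    Assumed formula: base * alpha ** (head_dim / (head_dim - 2)).
    """
    if head_dim <= 2:
        raise ValueError("head_dim must be greater than 2")
    return base * alpha ** (head_dim / (head_dim - 2))


# Assumed values for illustration only; verify them against config.json.
ASSUMED_ROPE_BASE = 5_000_000.0
ASSUMED_HEAD_DIM = 128

if __name__ == "__main__":
    for alpha in (1.0, 2.0, 2.5):
        scaled = ntk_scaled_rope_base(ASSUMED_ROPE_BASE, alpha, ASSUMED_HEAD_DIM)
        print(f"alpha={alpha}: scaled RoPE base = {scaled:,.0f}")
```

An alpha of 1.0 leaves the base unchanged; larger values stretch the positional frequencies so that a model trained on 4096 tokens can be run with a longer context, at some cost in quality.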
To get long, Roleplay-style replies in SillyTavern, make the following adjustments:
- Use ChatML as the instruction format
- Enable "Wrap Sequences with Newline"
- Set Last Output Sequence to: ### Response (2 paragraphs, engaging, natural, authentic, descriptive, creative):
- Set Custom Stopping Strings to: ["<|im_end|>用户", "\n### Input"]
- Set System Prompt to: Avoid repetition, don't loop. Develop the plot slowly, always stay in character. Describe all actions in full, elaborate, explicit, graphic, and vivid detail. Mention all relevant sensory perceptions.
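The settings listed above can be approximated outside SillyTavern. The sketch below assembles a ChatML prompt that ends with the custom last output sequence and trims a generated reply at the first custom stop string. The helper names `build_prompt` and `truncate_at_stop` are hypothetical, and the exact layout SillyTavern produces may differ; this only illustrates the idea.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

# Values copied from the settings list above.
LAST_OUTPUT_SEQUENCE = (
    "### Response (2 paragraphs, engaging, natural, authentic, "
    "descriptive, creative):"
)
STOP_STRINGS = ["<|im_end|>用户", "\n### Input"]
SYSTEM_PROMPT = (
    "Avoid repetition, don't loop. Develop the plot slowly, always stay in "
    "character. Describe all actions in full, elaborate, explicit, graphic, "
    "and vivid detail. Mention all relevant sensory perceptions."
)


def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Build a ChatML prompt; turns are (role, text) pairs in chat order."""
    parts = [f"{IM_START}system\n{system}{IM_END}\n"]
    for role, text in turns:
        parts.append(f"{IM_START}{role}\n{text}{IM_END}\n")
    # The last output sequence takes the place of the usual assistant header
    # and is followed by a newline (an assumption about the frontend layout).
    parts.append(f"{LAST_OUTPUT_SEQUENCE}\n")
    return "".join(parts)


def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    prompt = build_prompt(SYSTEM_PROMPT, [("user", "Hello!")])
    print(prompt)
    print(truncate_at_stop("A reply.\n### Input: next", STOP_STRINGS))
```

The resulting string can be sent to any raw text-completion endpoint, with `STOP_STRINGS` passed as stop sequences when the backend supports them.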

description

This is a test model for exllamav2.
4bpw conversion command: python convert.py -i acsr-v2-y34b -c exl2/0000.parquet -o acsr-v2-y34b-4bpw-hb6-exl2 -hb 6 -l 4096 -b 4.15
convert doc
calibration dataset: WikiText-2-v1
With oobabooga/text-generation-webui, add --trust-remote-code to CMD_FLAGS.txt and load the model with the ExLlamav2 loader.
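For anyone scripting the quantization, the convert.py invocation given above can be assembled as an argument list. This sketch only builds and prints the command and does not run it. The function name `build_convert_command` is hypothetical, and the flag meanings in the comments (input model directory, calibration parquet, output directory, head bits, calibration row length, target bits per weight) are assumptions inferred from the command; check them against the exllamav2 conversion documentation.

```python
import shlex


def build_convert_command(
    input_dir: str,
    calibration_parquet: str,
    output_dir: str,
    head_bits: int,
    length: int,
    bits: float,
) -> list[str]:
    """Return the argv list for an exllamav2 convert.py run."""
    return [
        "python", "convert.py",
        "-i", input_dir,            # assumed: unquantized model directory
        "-c", calibration_parquet,  # assumed: calibration dataset file
        "-o", output_dir,           # assumed: output directory
        "-hb", str(head_bits),      # assumed: bits for the output head layer
        "-l", str(length),          # assumed: tokens per calibration row
        "-b", str(bits),            # assumed: target average bits per weight
    ]


if __name__ == "__main__":
    command = build_convert_command(
        "acsr-v2-y34b",
        "exl2/0000.parquet",
        "acsr-v2-y34b-4bpw-hb6-exl2",
        head_bits=6,
        length=4096,
        bits=4.15,
    )
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(command))
```

To execute the conversion rather than print it, the list could be passed to `subprocess.run(command, check=True)` from inside an exllamav2 checkout.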
