Libraries

PEFT

How to use ronan7878/yiliao_qwen with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-8B-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "ronan7878/yiliao_qwen")

Transformers

How to use ronan7878/yiliao_qwen with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ronan7878/yiliao_qwen")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
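from transformers import AutoModelForCausalLM
# AutoModelForCausalLM keeps the language-modeling head that text generation needs
model = AutoModelForCausalLM.from_pretrained("ronan7878/yiliao_qwen", dtype="auto")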

Local Apps

vLLM

How to use ronan7878/yiliao_qwen with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ronan7878/yiliao_qwen"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ronan7878/yiliao_qwen",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
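
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on http://localhost:8000, and the helper names `build_chat_payload` and `post_chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server, so it is left as a comment):
#   payload = build_chat_payload("ronan7878/yiliao_qwen", "What is the capital of France?")
#   reply = post_chat("http://localhost:8000", payload)
#   print(reply["choices"][0]["message"]["content"])
```

Because the helper takes the base URL as an argument, it should work unchanged against any other OpenAI-compatible endpoint, such as a server listening on port 30000.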

Use Docker

docker model run hf.co/ronan7878/yiliao_qwen

SGLang

How to use ronan7878/yiliao_qwen with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ronan7878/yiliao_qwen" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ronan7878/yiliao_qwen",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ronan7878/yiliao_qwen" \
        --host 0.0.0.0 \
        --port 30000

Unsloth Studio

How to use ronan7878/yiliao_qwen with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ronan7878/yiliao_qwen to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ronan7878/yiliao_qwen to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ronan7878/yiliao_qwen to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="ronan7878/yiliao_qwen",
    max_seq_length=2048,
)

Docker Model Runner
How to use ronan7878/yiliao_qwen with Docker Model Runner:
```
docker model run hf.co/ronan7878/yiliao_qwen
```

Model Card for ronan7878/yiliao_qwen

This is a LoRA adapter for unsloth/Qwen3-8B-unsloth-bnb-4bit, fine-tuned for Chinese medical question answering. It is intended to give helpful, detailed, and harmless responses to general medical questions, within the limits of its training dataset.

Model Details

Model Description

This model was produced by Supervised Fine-Tuning (SFT) of unsloth/Qwen3-8B-unsloth-bnb-4bit, a 4-bit quantized build of Qwen3-8B, using LoRA through the PEFT library. Fine-tuning was run with the Unsloth library for speed and memory efficiency. The training data is the krisfu/delicate_medical_r1_data_chinese dataset, a collection of Chinese medical questions and answers.

Developed by: Ronan
Model type: Qwen3 (Transformer-based Language Model)
Language(s) (NLP): Chinese (zh)
License: Apache-2.0 (inherited from the base model)
Finetuned from model: unsloth/Qwen3-8B-unsloth-bnb-4bit

Model Sources

Repository: https://huggingface.co/ronan7878/yiliao_qwen/
Base Model Repository: https://huggingface.co/unsloth/Qwen3-8B-unsloth-bnb-4bit

Uses

Direct Use

This model is intended for direct use as a question-answering assistant in the Chinese medical domain. It can be used in chatbots, as a research aid, or for generating preliminary medical information. This model is not a substitute for professional medical advice, diagnosis, or treatment.

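Example prompt: "我最近总是感到疲劳，并且伴有头痛，可能是什么原因？" (English: "Lately I have been feeling tired all the time, along with headaches. What could be the cause?")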

Out-of-Scope Use

This model should not be used for:

Providing definitive medical diagnoses or treatment plans.
Emergency medical situations.
Generating content that could be harmful, unethical, or misleading.
Any use case where model error could lead to harm.

Bias, Risks, and Limitations

The model's knowledge is limited to the data it was trained on, which may not be comprehensive or up-to-date. It may generate plausible-sounding but incorrect medical information. The training data may contain inherent biases, which the model could reproduce or amplify. It is a language model and has no true understanding of medical concepts.

Recommendations

Always consult a qualified healthcare professional for any medical concerns. The model's outputs should be fact-checked and reviewed by a human expert before being used in any critical application.

How to Get Started with the Model

Use the code below to get started with the model using the Unsloth library.

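from unsloth import FastLanguageModel

# Load the 4-bit base model together with the LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ronan7878/yiliao_qwen",
    max_seq_length = 2048,
    dtype = None,          # None lets Unsloth choose a dtype for the GPU
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Tokenize the prompt and move the tensors to the GPU.
# Prompt in English: "Lately I have been feeling tired all the time,
# along with headaches. What could be the cause?"
inputs = tokenizer(
    ["我最近总是感到疲劳，并且伴有头痛，可能是什么原因？"],
    return_tensors = "pt",
).to("cuda")

# Generate up to 128 new tokens and print the decoded text
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))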
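
Qwen3 models can emit their reasoning inside a `<think>…</think>` block before the final answer. Whether this adapter does so for a given prompt is not stated here, so the following is only a sketch under that assumption. The helper name `split_reasoning` is hypothetical; it separates the reasoning from the text that follows it in a decoded string.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no <think> block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is taken to be everything after the closing </think> tag
    answer = text[match.end():].strip()
    return reasoning, answer
```

Applied to each string returned by `tokenizer.batch_decode(outputs)`, this yields the reasoning and the answer as separate values, which makes it easier to show only the answer to an end user.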

Training Details

Training Data

The model was fine-tuned on the krisfu/delicate_medical_r1_data_chinese dataset. This dataset consists of Chinese medical question-and-answer pairs designed for fine-tuning language models.

Training Procedure

The model was trained using the Supervised Fine-Tuning (SFT) method with the TRL library. The Unsloth library was used to enable 4-bit quantization and LoRA for efficient training.
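
Supervised fine-tuning on question-and-answer pairs usually starts by turning each pair into a chat-style record. The sketch below shows one way to do that; the column names `question` and `answer` and both helper names are assumptions made for illustration, and the actual preprocessing used for this model is not documented here.

```python
def to_chat_messages(question: str, answer: str) -> list[dict]:
    """Turn one question-and-answer pair into a two-turn chat conversation."""
    return [
        {"role": "user", "content": question.strip()},
        {"role": "assistant", "content": answer.strip()},
    ]


def format_rows(rows: list[dict]) -> list[dict]:
    """Map rows with hypothetical 'question'/'answer' columns to 'messages' records."""
    return [
        {"messages": to_chat_messages(row["question"], row["answer"])}
        for row in rows
    ]


# Illustrative placeholder row, not taken from the real dataset
sample = [{"question": " 示例问题 ", "answer": " 示例回答 "}]
print(format_rows(sample))
```

Records with a `messages` list of role/content dictionaries are a common input layout for chat-oriented SFT tooling, and a tokenizer's chat template can then render them into training text.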

Training Hyperparameters

Training regime: bnb-4bit quantization with LoRA adapters
LoRA r: 16
LoRA alpha: 32
Optimizer: AdamW
Learning Rate: 2e-4
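
To make the LoRA settings concrete, the sketch below works through the arithmetic for r = 16 and alpha = 32. It assumes the standard LoRA scaling of alpha / r (not a variant such as rank-stabilized LoRA) and uses a 4096 x 4096 weight matrix purely as an illustrative size; the card does not state which layers were adapted or their dimensions.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


R, ALPHA = 16, 32   # values from the hyperparameter list above
HIDDEN = 4096       # assumption: illustrative matrix size, not taken from the card

print(lora_scaling(ALPHA, R))               # 2.0
print(lora_param_count(HIDDEN, HIDDEN, R))  # 131072
print(HIDDEN * HIDDEN)                      # 16777216 weights in the frozen matrix
```

Under these assumptions the adapter trains 131,072 parameters per adapted matrix, under 1% of the 16,777,216 frozen weights, which is why LoRA fits alongside a 4-bit base model in limited GPU memory.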

Framework versions

PEFT 0.17.1
Transformers 4.43.3
TRL 0.9.6
Unsloth 2025.9

Model tree for ronan7878/yiliao_qwen

Base model: Qwen/Qwen3-8B-Base
Finetuned: Qwen/Qwen3-8B
Quantized: unsloth/Qwen3-8B-unsloth-bnb-4bit
Adapter: ronan7878/yiliao_qwen (this model)