Model Card for ronan7878/yiliao_qwen
This is a LoRA adapter for unsloth/Qwen3-8B-unsloth-bnb-4bit, fine-tuned for Chinese medical question-answering tasks. It's designed to provide helpful, detailed, and harmless responses to general medical inquiries based on the provided dataset.
Model Details
Model Description
This model is the result of Supervised Fine-Tuning (SFT) on the Qwen3-8B base model using the PEFT library with LoRA. The fine-tuning was performed using the Unsloth library to optimize for speed and memory efficiency. The model is specifically trained on the krisfu/delicate_medical_r1_data_chinese dataset, which contains a collection of Chinese medical questions and answers.
- Developed by: Ronan
- Model type: Qwen3 (Transformer-based Language Model)
- Language(s) (NLP): Chinese (zh)
- License: Apache-2.0 (inherited from the base model)
- Finetuned from model: unsloth/Qwen3-8B-unsloth-bnb-4bit
Model Sources
- Repository: https://huggingface.co/ronan7878/yiliao_qwen/
- Base Model Repository: https://huggingface.co/unsloth/Qwen3-8B-unsloth-bnb-4bit
Uses
Direct Use
This model is intended for direct use as a question-answering assistant in the Chinese medical domain. It can be used in chatbots, as a research aid, or for generating preliminary medical information. This model is not a substitute for professional medical advice, diagnosis, or treatment.
Example prompt: "我最近总是感到疲劳,并且伴有头痛,可能是什么原因?"
Out-of-Scope Use
This model should not be used for:
- Providing definitive medical diagnoses or treatment plans.
- Emergency medical situations.
- Generating content that could be harmful, unethical, or misleading.
- Any use case where model error could lead to harm.
Bias, Risks, and Limitations
The model's knowledge is limited to the data it was trained on, which may not be comprehensive or up-to-date. It may generate plausible-sounding but incorrect medical information. The training data may contain inherent biases, which the model could reproduce or amplify. It is a language model and has no true understanding of medical concepts.
Recommendations
Always consult a qualified healthcare professional for any medical concerns. The model's outputs should be fact-checked and reviewed by a human expert before being used in any critical application.
How to Get Started with the Model
Use the code below to get started with the model using the Unsloth library.
from unsloth import FastLanguageModel

# Load the model in 4-bit precision.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ronan7878/yiliao_qwen",
    max_seq_length = 2048,
    dtype = None,  # None lets Unsloth pick the dtype automatically
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Perform inference
inputs = tokenizer(
    ["我最近总是感到疲劳,并且伴有头痛,可能是什么原因?"],
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))
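Because this repository is a LoRA adapter, it can also be attached to the base model with PEFT and Transformers instead of Unsloth. The following is a minimal sketch rather than a tested recipe: the helper names (`build_messages`, `load_model_and_tokenizer`, `generate_answer`) are hypothetical, the tokenizer is assumed to load from the base repository, and a CUDA GPU with `bitsandbytes` installed is assumed for the 4-bit base checkpoint. The heavy libraries are imported lazily so the module can be imported without them.

```python
BASE_MODEL_ID = "unsloth/Qwen3-8B-unsloth-bnb-4bit"
ADAPTER_ID = "ronan7878/yiliao_qwen"


def build_messages(question: str) -> list[dict]:
    """Wrap a single user question in the chat-message format."""
    return [{"role": "user", "content": question}]


def load_model_and_tokenizer():
    """Load the 4-bit base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported here so the heavy dependencies are only needed when loading.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base repository's tokenizer matches the adapter.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return model, tokenizer


def generate_answer(model, tokenizer, question: str, max_new_tokens: int = 128) -> str:
    """Render the question with the chat template and return only the new text."""
    prompt = tokenizer.apply_chat_template(
        build_messages(question),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the generated continuation is decoded.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `model, tokenizer = load_model_and_tokenizer()` followed by `generate_answer(model, tokenizer, "我最近总是感到疲劳,并且伴有头痛,可能是什么原因?")`. Using the chat template rather than a raw string is a design choice here; it keeps the prompt in the format a Qwen3 chat model expects.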
Training Details
Training Data
The model was fine-tuned on the krisfu/delicate_medical_r1_data_chinese dataset. This dataset consists of Chinese medical question-and-answer pairs designed for fine-tuning language models.
Training Procedure
The model was trained using the Supervised Fine-Tuning (SFT) method with the TRL library. The Unsloth library was used to enable 4-bit quantization and LoRA for efficient training.
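As an illustration of the data preparation that SFT on question-and-answer pairs typically involves, the sketch below converts one record into a chat-style example. It is not the author's training script: the field names `question` and `answer` and the helper `to_chat_example` are assumptions made for this example, and the actual dataset schema may differ.

```python
from typing import Optional


def to_chat_example(record: dict) -> Optional[dict]:
    """Turn one Q&A record into a chat-style SFT example.

    Assumes hypothetical field names "question" and "answer".
    Returns None when either field is missing or blank.
    """
    question = (record.get("question") or "").strip()
    answer = (record.get("answer") or "").strip()
    if not question or not answer:
        return None
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


# Made-up record, for illustration only.
sample = {"question": " 头痛怎么办? ", "answer": "建议及时就医。"}
example = to_chat_example(sample)
print(example["messages"][0]["content"])
```

Records in this shape can then be rendered into training text with the tokenizer's chat template before being passed to the trainer.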
Training Hyperparameters
- Training regime: bnb-4bit quantization with LoRA adapters
- LoRA r: 16
- LoRA alpha: 32
- Optimizer: AdamW
- Learning Rate: 2e-4
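To make the LoRA settings concrete, the sketch below works out what r = 16 and alpha = 32 imply for a single weight matrix. It assumes the standard LoRA scaling of alpha / r (not the rank-stabilized variant) and uses a hypothetical 4096 x 4096 projection; the set of adapted layers is not stated in this card, so the numbers are per-matrix illustrations only.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 32      # values from the hyperparameters above
D = 4096               # hypothetical layer width, for illustration

scale = lora_scaling(R, ALPHA)
added = lora_param_count(D, D, R)
full = D * D

print(f"scaling factor: {scale}")
print(f"LoRA parameters per matrix: {added}")
print(f"fraction of the full matrix: {added / full:.4%}")
```

With these assumptions the adapter update is scaled by 2.0 and adds 131,072 trainable parameters per matrix, under 1% of the 16,777,216 weights in the frozen matrix.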
Framework versions
- PEFT 0.17.1
- Transformers 4.43.3
- TRL 0.9.6
- Unsloth 2025.9