Instructions for using FlameF0X/Qwen2-0.2B-it with Transformers and with local serving apps (vLLM, SGLang, Docker).

Libraries

How to use FlameF0X/Qwen2-0.2B-it with Transformers:

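```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FlameF0X/Qwen2-0.2B-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Chat-style input returns the whole conversation; the last message is the reply.
result = pipe(messages)
print(result[0]["generated_text"][-1]["content"])
```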

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FlameF0X/Qwen2-0.2B-it")
model = AutoModelForCausalLM.from_pretrained("FlameF0X/Qwen2-0.2B-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Build the chat prompt and tokenize it in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
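To see roughly what `apply_chat_template(..., add_generation_prompt=True)` hands to the model, the sketch below renders messages by hand in the ChatML layout used by Qwen2-family chat templates. It assumes this checkpoint's tokenizer uses a ChatML-style template, which this page does not confirm; compare against `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`. `render_chatml` is a hypothetical helper, not a Transformers API, and the real template may add extra turns (for example a default system message).

```python
# Hypothetical helper: renders chat messages in a ChatML-style layout.
# ASSUMPTION: this model's chat template is ChatML-like (as in Qwen2 models);
# check tokenizer.chat_template before relying on this.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

The trailing open assistant turn is what `add_generation_prompt=True` controls; without it the model may continue the user's message instead of replying.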


vLLM

How to use FlameF0X/Qwen2-0.2B-it with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FlameF0X/Qwen2-0.2B-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FlameF0X/Qwen2-0.2B-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
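The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`) to build the POST request from the curl example; `build_chat_request` is a hypothetical helper, and its default `base_url` assumes vLLM's default address `http://localhost:8000` as used above.

```python
import json
import urllib.request


# Hypothetical helper: builds a POST request for an OpenAI-compatible
# /v1/chat/completions endpoint. Assumes the local vLLM server from the
# curl example (default port 8000).
def build_chat_request(prompt, model="FlameF0X/Qwen2-0.2B-it",
                       base_url="http://localhost:8000"):
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("What is the capital of France?")
# With the server running, send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it lets you inspect the payload first; for the SGLang server, pass `base_url="http://localhost:30000"`.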

Use Docker

```
docker model run hf.co/FlameF0X/Qwen2-0.2B-it
```

SGLang

How to use FlameF0X/Qwen2-0.2B-it with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FlameF0X/Qwen2-0.2B-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FlameF0X/Qwen2-0.2B-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
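Both servers answer in the OpenAI chat-completions JSON shape, with the assistant's reply at `choices[0].message.content`. A minimal sketch for pulling it out of the raw response body; `extract_reply` is a hypothetical helper, and the sample body is an abridged, made-up illustration rather than real output from this model.

```python
import json


# Hypothetical helper: extracts the assistant reply from an OpenAI-compatible
# chat completion response body (as returned by the SGLang or vLLM servers).
def extract_reply(response_text):
    data = json.loads(response_text)
    # The reply is the message of the first choice.
    return data["choices"][0]["message"]["content"]


# Illustrative, abridged response body; the content is made up.
sample = '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "Paris."}}]}'
print(extract_reply(sample))
```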

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FlameF0X/Qwen2-0.2B-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FlameF0X/Qwen2-0.2B-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


Qwen2-0.2B-it / README.md (author: FlameF0X, commit ad52d8f)

---
library_name: transformers
base_model:
- FlameF0X/Qwen2-0.2B-pt
license: apache-2.0
datasets:
- Salesforce/wikitext
- roneneldan/TinyStories
- FlameF0X/arXiv-AI-ML
- Skylion007/openwebtext
- flytech/python-codes-25k
- bookcorpus/bookcorpus
- HuggingFaceH4/ultrachat_200k
- openai/gsm8k
- microsoft/orca-math-word-problems-200k
- laion/OIG
- microsoft/wiki_qa
metrics:
- accuracy
model-index:
- name: FlameF0X/Qwen2-0.2B-it
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      id: openai/gsm8k
      name: GSM8K
      type: gsm8k
      config: main
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 2.00
      verified: false
    source:
      name: Local Benchmark
      url: https://huggingface.co/FlameF0X/Qwen2-0.2B-it
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      id: TIGER-Lab/MMLU-Pro
      name: MMLU-Pro
      type: TIGER-Lab/MMLU-Pro
      config: default
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 4.00
      verified: false
    source:
      name: Local Benchmark
      url: https://huggingface.co/FlameF0X/Qwen2-0.2B-it
---
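The `model-index` metadata and the Evaluation Results table carry the same numbers. As a sketch of how one maps onto the other, the code below mirrors the two results as a Python dict (abridged to the fields used) and renders them as Markdown table rows. Parsing the YAML front matter is omitted to stay dependency-free; `results_table` is a hypothetical helper, and values are treated as percentages, matching the table.

```python
# Abridged copy of the model-index results from the front matter.
model_index = [{
    "name": "FlameF0X/Qwen2-0.2B-it",
    "results": [
        {"dataset": {"name": "GSM8K", "split": "test"},
         "metrics": [{"name": "Accuracy", "value": 2.00}]},
        {"dataset": {"name": "MMLU-Pro", "split": "test"},
         "metrics": [{"name": "Accuracy", "value": 4.00}]},
    ],
}]


def results_table(index):
    """Render each metric as a Markdown row: | Benchmark (split) | Score |."""
    rows = ["| Benchmark | Score |", "|-----------|-------|"]
    for entry in index:
        for result in entry["results"]:
            ds = result["dataset"]
            for metric in result["metrics"]:
                # Values are percentages, e.g. 2.00 -> "2.00%".
                rows.append(f"| {ds['name']} ({ds['split']}) | {metric['value']:.2f}% |")
    return "\n".join(rows)


print(results_table(model_index))
```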

## Evaluation Results

| Benchmark | Score |
|-----------|-------|
| GSM8K (test) | 2.00% |
| MMLU-Pro (test) | 4.00% |

> Results obtained via local evaluation. Given the model size (0.2B parameters), low benchmark scores are expected.
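The card does not describe how the local GSM8K accuracy was computed. For orientation only, here is a sketch of a common exact-match scheme: GSM8K reference solutions end with `#### <final answer>`, and the last number in the model's output is taken as its answer. This is an assumption, not necessarily the "Local Benchmark" procedure behind the table, and the helper names are hypothetical.

```python
import re


# ASSUMPTION: a common GSM8K-style exact-match scorer, not the card's method.
def reference_answer(solution):
    # GSM8K reference solutions end with "#### <final answer>".
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(generation):
    # Take the last number in the model output as its final answer.
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", generation)
    return numbers[-1].replace(",", "").rstrip(".") if numbers else None


def accuracy(generations, solutions):
    correct = sum(predicted_answer(g) == reference_answer(s)
                  for g, s in zip(generations, solutions))
    return 100.0 * correct / len(solutions)  # percentage, as in the table
```

String comparison keeps the sketch simple; a stricter scorer would normalize numbers (for example treating `8` and `8.0` as equal).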

## Model Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "FlameF0X/Qwen2-0.2B-it"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how a transformer model works in one sentence."},
]

# Render the conversation as a prompt string ending with an open assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)

# Drop the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"--- Assistant Response ---\n{response}")
```
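The slicing step in the Model Usage code relies on `model.generate()` returning the prompt tokens followed by the new tokens for a decoder-only model. A toy illustration with plain lists; the token IDs are made up and `strip_prompt` is a hypothetical helper.

```python
# Toy version of the prompt-stripping step: drop the first len(prompt)
# positions of each output so only the generated continuation remains.
def strip_prompt(batch_input_ids, batch_output_ids):
    return [out[len(inp):] for inp, out in zip(batch_input_ids, batch_output_ids)]


prompt_ids = [[101, 7, 8, 9]]           # made-up prompt token IDs
full_output = [[101, 7, 8, 9, 42, 43, 2]]  # prompt + made-up new tokens
print(strip_prompt(prompt_ids, full_output))  # [[42, 43, 2]]
```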

## Training Data

This model was instruction-tuned on a mixture of:

- `Salesforce/wikitext` — General text
- `roneneldan/TinyStories` — Short story generation
- `FlameF0X/arXiv-AI-ML` — AI/ML research papers
- `Skylion007/openwebtext` — Web text
- `flytech/python-codes-25k` — Python code
- `bookcorpus/bookcorpus` — Books
- `HuggingFaceH4/ultrachat_200k` — Instruction following
- `openai/gsm8k` — Math reasoning
- `microsoft/orca-math-word-problems-200k` — Math word problems
- `laion/OIG` — Open instruction generalist
- `microsoft/wiki_qa` — Question answering