How to use PhilSad/Lucie-7B-GRPO-Science-500 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="PhilSad/Lucie-7B-GRPO-Science-500")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("PhilSad/Lucie-7B-GRPO-Science-500")
model = AutoModelForCausalLM.from_pretrained("PhilSad/Lucie-7B-GRPO-Science-500")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use PhilSad/Lucie-7B-GRPO-Science-500 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PhilSad/Lucie-7B-GRPO-Science-500"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "PhilSad/Lucie-7B-GRPO-Science-500",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use PhilSad/Lucie-7B-GRPO-Science-500 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "PhilSad/Lucie-7B-GRPO-Science-500" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "PhilSad/Lucie-7B-GRPO-Science-500",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "PhilSad/Lucie-7B-GRPO-Science-500" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "PhilSad/Lucie-7B-GRPO-Science-500",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use PhilSad/Lucie-7B-GRPO-Science-500 with Docker Model Runner:
docker model run hf.co/PhilSad/Lucie-7B-GRPO-Science-500
This is an experiment in adding reasoning to Lucie-7B-Instruct with GRPO fine-tuning.
I used 500 examples from the Science subset of open-r1/Mixture-of-Thoughts.
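For context, GRPO samples several completions for the same prompt and turns their rewards into group-relative advantages. The sketch below only illustrates that idea and is not the training code used for this model; the function name `group_relative_advantages`, the use of the population standard deviation, and the example rewards are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    # Population standard deviation; an assumption, implementations differ.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 if the boxed letter is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and the others a negative one, so no separate value model is needed to obtain a baseline.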
For evaluation, I used the same system prompt and the same generation parameters on 100 test examples from the Science subset of open-r1/Mixture-of-Thoughts, and used gemini-2.0-flash-lite to compare each model's answer to the ground truth.
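Because the system prompt asks for the answer as `\boxed{letter}`, a simple exact-match score can serve as a sanity check next to an LLM judge. The sketch below is that alternative, not the gemini-2.0-flash-lite comparison used here; the helper names and the sample outputs are made up for illustration.

```python
import re

# Matches \boxed{X} where X is a single letter.
BOXED_LETTER = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_letter(text):
    """Return the last boxed letter in the text, upper-cased, or None."""
    matches = BOXED_LETTER.findall(text)
    return matches[-1].upper() if matches else None


def exact_match_accuracy(predictions, gold_letters):
    """Fraction of predictions whose boxed letter equals the gold letter."""
    if len(predictions) != len(gold_letters):
        raise ValueError("predictions and gold_letters must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        extract_letter(pred) == gold.upper()
        for pred, gold in zip(predictions, gold_letters)
    )
    return correct / len(predictions)


# Made-up model outputs, for illustration only.
preds = [
    "<think>...</think><answer>\\boxed{D}</answer>",
    "<think>...</think><answer>\\boxed{B}</answer>",
    "The answer is probably A.",
]
gold = ["D", "C", "A"]
accuracy = exact_match_accuracy(preds, gold)
print(f"{accuracy:.2f}")
```

An output that does not follow the boxed format counts as wrong under this scheme, which is stricter than a judge that reads free text.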
import torch
import transformers

messages = [
    {'content': 'A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks and then provides the user with the answer. You begin you answer with the reasoning process and answer enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer>\\boxed{letter}</answer>. Your reasoning process should be detailed and should include all the steps you took to arrive at the answer. The answer should be based on the reasoning process and should be only the answer letter.',
     'role': 'system'},
    {'content': 'What happens to the equilibrium constant when the concentration of a reactant is increased in a reversible reaction?A: The equilibrium constant will fluctuate until a new equilibrium is reached.\nB: The equilibrium constant will increase.\nC: The equilibrium constant will decrease.\nD: The equilibrium constant will not change.',
     'role': 'user'},
]

# Load the fine-tuned model
model_name = "PhilSad/Lucie-7B-GRPO-Science-500"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)
# The tokenizer is taken from the base instruct model
tokenizer = transformers.AutoTokenizer.from_pretrained("OpenLLM-France/Lucie-7B-Instruct-v1.1")

# The model is already placed on a device by device_map="auto" above
pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    temperature=0.1,
    top_p=0.95,
    top_k=50,
)

with torch.no_grad():
    out = pipe(messages)
# The last message of the returned conversation is the assistant's reply
print(out[0]["generated_text"][-1]["content"])
# > <think> When the concentration of a reactant is increased in a reversible reaction, the system will shift towards the products to re-establish equilibrium. This shift will cause the equilibrium constant to decrease, as the reaction will favor the formation of more products. </think><answer>\boxed{D}</answer>
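A response in this format can be split into its reasoning and its final answer with a small parser. This is a minimal sketch that assumes the model follows the `<think> ... </think><answer> ... </answer>` layout requested by the system prompt; `split_response` and the sample string are hypothetical and not part of the model or of Transformers.

```python
import re

# Reasoning inside <think>, final answer inside <answer>; DOTALL lets the
# reasoning span several lines.
RESPONSE_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL
)


def split_response(text):
    """Return (reasoning, answer), or None if the format was not followed."""
    match = RESPONSE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


# Made-up response, for illustration only.
sample = (
    "<think> The equilibrium constant depends only on temperature. </think>"
    "<answer>\\boxed{D}</answer>"
)
parsed = split_response(sample)
if parsed is not None:
    reasoning, answer = parsed
    print(answer)
```

Returning `None` for malformed outputs makes it easy to count how often the model breaks the expected format.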