Instructions to use PhilSad/Lucie-7B-GRPO-Science-500 with libraries and local apps.

Libraries

How to use PhilSad/Lucie-7B-GRPO-Science-500 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PhilSad/Lucie-7B-GRPO-Science-500")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PhilSad/Lucie-7B-GRPO-Science-500")
model = AutoModelForCausalLM.from_pretrained("PhilSad/Lucie-7B-GRPO-Science-500")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use PhilSad/Lucie-7B-GRPO-Science-500 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PhilSad/Lucie-7B-GRPO-Science-500"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PhilSad/Lucie-7B-GRPO-Science-500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
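
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are made up for this example and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="PhilSad/Lucie-7B-GRPO-Science-500"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network; calling ask() does.
request = build_chat_request("What is the capital of France?")
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started as described in the SGLang section, since it exposes the same endpoint path.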

Use Docker

docker model run hf.co/PhilSad/Lucie-7B-GRPO-Science-500

SGLang

How to use PhilSad/Lucie-7B-GRPO-Science-500 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PhilSad/Lucie-7B-GRPO-Science-500" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PhilSad/Lucie-7B-GRPO-Science-500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PhilSad/Lucie-7B-GRPO-Science-500" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PhilSad/Lucie-7B-GRPO-Science-500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PhilSad/Lucie-7B-GRPO-Science-500 with Docker Model Runner:
```
docker model run hf.co/PhilSad/Lucie-7B-GRPO-Science-500
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

About

This is an experiment to add reasoning to Lucie-7B-Instruct with GRPO fine-tuning.

I used 500 examples from the Science subset of open-r1/Mixture-of-Thoughts.
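
For readers new to GRPO (Group Relative Policy Optimization): several completions are sampled for each prompt, each one is scored by a reward function, and a completion's advantage is its reward normalized against the other completions in the same group. The sketch below shows only that normalization step. The reward values are made up, and the reward function and group size used for this model are not stated here. It uses the population standard deviation, and implementations may differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids a division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one question (1.0 = correct letter).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and are reinforced. If all completions in a group score the same, every advantage is zero and that prompt contributes no learning signal.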

Evaluation procedure

I used the same system prompt and the same generation parameters on 100 test examples from the Science subset of open-r1/Mixture-of-Thoughts. I used gemini-2.0-flash-lite to compare each model's answer to the ground truth.
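
As a cheaper cross-check on the LLM judge, answers can also be scored with a regular expression, since the system prompt asks for the final letter inside `\boxed{}`. This is not the procedure used above, only a sketch of an alternative. The outputs and ground-truth letters below are made up for illustration and are not results from the model.

```python
import re

# Matches the final answer format requested by the system prompt.
BOXED = re.compile(r"<answer>\s*\\boxed\{([A-Za-z])\}\s*</answer>")


def extract_letter(output):
    """Return the boxed answer letter, or None if the output does not follow the format."""
    match = BOXED.search(output)
    return match.group(1).upper() if match else None


def accuracy(outputs, truths):
    """Fraction of outputs whose extracted letter equals the ground-truth letter."""
    correct = sum(extract_letter(o) == t for o, t in zip(outputs, truths))
    return correct / len(truths)


# Made-up outputs for illustration only.
outputs = [
    "<think> K depends only on temperature. </think><answer>\\boxed{D}</answer>",
    "<think> Some reasoning. </think><answer>\\boxed{B}</answer>",
    "no tags at all",
]
truths = ["D", "C", "A"]
score = accuracy(outputs, truths)
```

An output that ignores the format counts as wrong here, whereas an LLM judge may still accept it, so the two scores can differ.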

Usage

import torch
import transformers

messages = [
  {'content': 'A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks and then provides the user with the answer. You begin you answer with the reasoning process and answer enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer>\\boxed{letter}</answer>. Your reasoning process should be detailed and should include all the steps you took to arrive at the answer. The answer should be based on the reasoning process and should be only the answer letter.',
   'role': 'system'},
  {'content': 'What happens to the equilibrium constant when the concentration of a reactant is increased in a reversible reaction?A: The equilibrium constant will fluctuate until a new equilibrium is reached.\nB: The equilibrium constant will increase.\nC: The equilibrium constant will decrease.\nD: The equilibrium constant will not change.',
   'role': 'user'}
]

model_name = "PhilSad/Lucie-7B-GRPO-Science-500"
# device_map="auto" places the weights on the available device(s).
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)
# The tokenizer is loaded from the instruct model this checkpoint was fine-tuned from.
tokenizer = transformers.AutoTokenizer.from_pretrained("OpenLLM-France/Lucie-7B-Instruct-v1.1")

# The model is already on its device, so the pipeline needs no device argument.
pipeline_base = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    do_sample=True,  # temperature, top_p and top_k only apply when sampling
    temperature=0.1,
    top_p=0.95,
    top_k=50,
)

with torch.no_grad():
    out = pipeline_base(messages)

# The pipeline returns the whole conversation; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])

# > <think> When the concentration of a reactant is increased in a reversible reaction, the system will shift towards the products to re-establish equilibrium. This shift will cause the equilibrium constant to decrease, as the reaction will favor the formation of more products. </think><answer>\boxed{D}</answer>
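
Because the reply follows a fixed tag format, the reasoning and the final answer can be separated after generation. The sketch below is one way to do it. The names `ParsedReply` and `parse_reply` are made up for this example, and the reply string is a hand-written stand-in rather than real model output.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    reasoning: str
    answer: str


# DOTALL lets the reasoning span several lines.
PATTERN = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def parse_reply(text: str) -> Optional[ParsedReply]:
    """Split a reply into reasoning and answer, or return None if the tags are missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return ParsedReply(match.group(1).strip(), match.group(2).strip())


# Hand-written stand-in for a model reply.
reply = "<think> K only depends on temperature. </think><answer>\\boxed{D}</answer>"
parsed = parse_reply(reply)
```

A reply that is cut off before the closing `</answer>` tag, for instance when generation hits `max_length`, makes `parse_reply` return `None`, so callers should handle that case.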

Downloads last month: 2

Safetensors

Model size: 7B params
Tensor type: F32

Model tree for PhilSad/Lucie-7B-GRPO-Science-500

Base model: OpenLLM-France/Lucie-7B
Finetuned: OpenLLM-France/Lucie-7B-Instruct-v1.1
Finetuned (8): this model
Quantizations: 3 models
