How to use naazimsnh02/ernie-45-math-finetuned with Unsloth Studio:

On macOS or Linux:

```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/ernie-45-math-finetuned to start chatting
```

On Windows (PowerShell):

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naazimsnh02/ernie-45-math-finetuned to start chatting
```

With no setup: open https://huggingface.co/spaces/unsloth/studio in your browser and search for naazimsnh02/ernie-45-math-finetuned to start chatting.
To load the model in Python with Unsloth:

```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/ernie-45-math-finetuned",
    max_seq_length=2048,
)
```

This model is a fine-tuned version of unsloth/ERNIE-4.5-21B-A3B-PT on the nvidia/Nemotron-RL-math-OpenMathReasoning dataset.
This model specializes in solving complex mathematical problems.
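The base model, ERNIE-4.5-21B-A3B, is a mixture-of-experts model with about 21B total parameters, of which about 3B are active per token. All experts still have to sit in memory, which makes 4-bit loading (`load_in_4bit=True` in the usage code below) attractive. The following is a rough back-of-envelope sketch of weight storage only, assuming a flat 21B parameters at a uniform bit width; it ignores the KV cache, activations, and quantization metadata, and real 4-bit usage is typically higher because some layers are usually kept in 16-bit.

```python
# Rough weight-storage estimate only: ignores the KV cache, activations,
# quantization scales, and any layers that stay in 16-bit.
TOTAL_PARAMS = 21e9  # assumption: ~21B total parameters in ERNIE-4.5-21B-A3B


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Storage needed for num_params weights at the given bit width, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights come to a quarter of the bf16 footprint, roughly 10 GiB versus roughly 39 GiB.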
Example inference with the model loaded in 4-bit:

```python
from unsloth import FastModel

# Load the fine-tuned model with 4-bit quantized weights
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/ernie-45-math-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Prepare the model for inference (Unsloth's inference mode)
FastModel.for_inference(model)

# Solve a math problem
messages = [{
    "role": "user",
    "content": "Solve the equation: 2x² + 5x - 3 = 0",
}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Fall back to EOS only when no pad token is defined (a pad id of 0 is valid)
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=pad_token_id,
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```
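In the sample output, the model wraps its final answer in `\boxed{...}`. To grade answers automatically, you can pull that span out of `response`. The helper below, `extract_boxed`, is a hypothetical sketch and not part of Unsloth or Transformers: it returns the contents of the last `\boxed{...}`, tracking brace depth so nested LaTeX such as `\frac{1}{2}` survives, and returns `None` when no complete box is present (for example, if generation stopped at `max_new_tokens`).

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Brace depth is tracked so nested LaTeX such as \\boxed{\\frac{1}{2}}
    is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer at all
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # box never closed, e.g. generation was cut off


print(extract_boxed(r"Therefore: \boxed{x = -2, -3}"))
print(extract_boxed(r"x = \boxed{\frac{1}{2}}"))
```

After running the usage code, `extract_boxed(response)` gives the final answer string, or `None` if the model did not finish one.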
Input:

```
Solve the equation: x² + 5x + 6 = 0
```

Output:

```
To solve x² + 5x + 6 = 0, we can factor:
Find two numbers that multiply to 6 and add to 5:
2 and 3 work because 2 × 3 = 6 and 2 + 3 = 5
Factored form:
(x + 2)(x + 3) = 0
Setting each factor to zero:
x + 2 = 0 → x = -2
x + 3 = 0 → x = -3
Therefore: \boxed{x = -2, -3}
```
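The sample answer can be checked with the quadratic formula: for x² + 5x + 6 = 0 the discriminant is 5² - 4 × 1 × 6 = 1, so x = (-5 ± 1)/2, giving -2 and -3. The prompt in the usage code, 2x² + 5x - 3 = 0, has discriminant 5² - 4 × 2 × (-3) = 49 and roots (-5 ± 7)/4, that is 1/2 and -3. A minimal sketch of that check with exact rational arithmetic, assuming integer coefficients (the function names are illustrative):

```python
from fractions import Fraction
from math import isqrt
from typing import List


def rational_roots(a: int, b: int, c: int) -> List[Fraction]:
    """Roots of a*x^2 + b*x + c = 0 (a != 0) as exact fractions.

    Raises ValueError unless the discriminant is a non-negative perfect
    square, i.e. unless both roots are rational.
    """
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("roots are irrational")
    # Quadratic formula; the set removes the duplicate when disc == 0
    return sorted({Fraction(-b - r, 2 * a), Fraction(-b + r, 2 * a)})


def satisfies(a: int, b: int, c: int, x: Fraction) -> bool:
    """Check a claimed root by substituting it back into the equation."""
    return a * x * x + b * x + c == 0


# Sample output: x^2 + 5x + 6 = 0 with claimed roots -2 and -3
claimed = [Fraction(-2), Fraction(-3)]
print(all(satisfies(1, 5, 6, x) for x in claimed))
print(rational_roots(1, 5, 6))

# Prompt from the usage code: 2x^2 + 5x - 3 = 0
print(rational_roots(2, 5, -3))
```

Using `Fraction` instead of floats keeps the comparison with zero exact, so a root such as 1/2 is confirmed without rounding tolerance.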
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 0.589 | 0.673 |
| 200 | 0.661 | 0.648 |
| 300 | 0.637 | 0.646 |
| 400 | 0.557 | 0.640 |
| 500 | 0.587 | 0.633 |
| 600 | 0.589 | 0.617 |
| 700 | 0.605 | 0.611 |
Training stopped at step 700, the checkpoint with the lowest validation loss (0.611) among those logged.
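In the table, validation loss falls at every logged step, so step 700 is both the last logged checkpoint and the lowest-loss one; the table alone does not show whether further training would have kept improving it. A small sketch that picks the best checkpoint from the logged values and measures the relative drop from step 100 (the variable names are illustrative):

```python
# (step, training loss, validation loss), copied from the table above
history = [
    (100, 0.589, 0.673),
    (200, 0.661, 0.648),
    (300, 0.637, 0.646),
    (400, 0.557, 0.640),
    (500, 0.587, 0.633),
    (600, 0.589, 0.617),
    (700, 0.605, 0.611),
]

# Checkpoint with the lowest validation loss
best_step, _, best_val = min(history, key=lambda row: row[2])

# Relative drop in validation loss from the first logged step to the best one
first_val = history[0][2]
rel_drop = (first_val - best_val) / first_val

# True if validation loss went down at every logged step
is_monotonic = all(later[2] < earlier[2] for earlier, later in zip(history, history[1:]))

print(f"best step: {best_step}, validation loss {best_val:.3f}")
print(f"relative drop since step {history[0][0]}: {rel_drop:.1%}")
print(f"strictly decreasing: {is_monotonic}")
```

From 0.673 to 0.611 is a drop of about 9% in validation loss, while training loss fluctuates between 0.557 and 0.661 over the same steps.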
```bibtex
@misc{ernie45-math-2025,
  title={ERNIE-4.5 Fine-tuned for Mathematical Reasoning},
  author={naazimsnh02},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/naazimsnh02/ernie-45-math-finetuned}}
}
```
MIT License. See the repository for details.
Trained with ❤️ using Unsloth and Modal