ERNIE-4.5 Fine-tuned for Mathematical Reasoning

This model is a fine-tuned version of unsloth/ERNIE-4.5-21B-A3B-PT on the nvidia/Nemotron-RL-math-OpenMathReasoning dataset.

Model Description

This model specializes in solving complex mathematical problems, including:

Algebra (equations, factoring, systems)
Calculus (derivatives, integrals)
Geometry and trigonometry
Word problems requiring multi-step reasoning
Competition-level mathematics

Training Details

Training Data

Dataset: nvidia/Nemotron-RL-math-OpenMathReasoning
Training Samples: 7,600
Evaluation Samples: 400
Format: Conversational (ERNIE-4.5 chat template)
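
As a small illustration of the split and the conversational format, the sketch below wraps one problem and its solution as a two-turn chat. The helper to_conversation and its argument names (problem, solution) are hypothetical and are not taken from the dataset's actual schema.

```python
TRAIN_SAMPLES = 7_600
EVAL_SAMPLES = 400

total_samples = TRAIN_SAMPLES + EVAL_SAMPLES
eval_fraction = EVAL_SAMPLES / total_samples  # share held out for evaluation


def to_conversation(problem: str, solution: str) -> list:
    """Wrap one problem/solution pair as a two-turn chat (hypothetical helper)."""
    return [
        {"role": "user", "content": problem},
        {"role": "assistant", "content": solution},
    ]


example = to_conversation("Solve x + 1 = 3.", r"x = 2, so \boxed{2}")
print(total_samples, eval_fraction)
print(example)
```

A list in this shape is what tokenizer.apply_chat_template expects in the Usage section.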

Training Configuration

Base Model: unsloth/ERNIE-4.5-21B-A3B-PT (mixture-of-experts, 21B total parameters, about 3B active per token)
Method: QLoRA (4-bit quantization + LoRA)
LoRA Rank: 16
LoRA Alpha: 16
Trainable Parameters: 355,090,432 (3.11% of total)
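
To give a rough sense of what rank and alpha control: a LoRA adapter on a linear layer with d_in inputs and d_out outputs adds r * (d_in + d_out) trainable weights, and its update is scaled by alpha / r. The sketch below is plain arithmetic. The 4096 x 4096 layer is a hypothetical size, not a dimension taken from ERNIE-4.5. It also prints the total that the reported 3.11% implies, which differs from the nominal 21B. One possible explanation, which is an assumption and not confirmed by this card, is that 4-bit packed weights are counted as fewer parameters.

```python
LORA_RANK = 16
LORA_ALPHA = 16
TRAINABLE = 355_090_432   # reported trainable parameters
REPORTED_SHARE = 0.0311   # reported 3.11%
NOMINAL_TOTAL = 21e9      # nominal size of the base model


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = LORA_ALPHA / LORA_RANK                 # multiplier applied to the update B @ A
per_layer = lora_params(4096, 4096, LORA_RANK)   # hypothetical 4096 x 4096 layer
implied_total = TRAINABLE / REPORTED_SHARE       # denominator implied by 3.11%
share_of_nominal = TRAINABLE / NOMINAL_TOTAL     # share of the nominal 21B

print(scaling, per_layer)
print(f"{implied_total / 1e9:.1f}B implied total, {share_of_nominal:.2%} of 21B")
```

With rank equal to alpha, the scaling factor is 1, so the low-rank update is added to the frozen weights without extra amplification.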

Hyperparameters

Batch Size: 4 (per device)
Gradient Accumulation: 2
Effective Batch Size: 8
Learning Rate: 0.0002
LR Scheduler: Cosine with warmup
Warmup Ratio: 0.05
Training Steps: 707 (stopped early, near the lowest recorded validation loss at step 700)
Optimizer: AdamW 8-bit
Precision: BF16
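
The figures above combine as sketched below. Two assumptions are made because the card does not state them: training ran on a single GPU, and the schedule was planned for one full epoch over the 7,600 training samples. The function lr_at is a hypothetical helper that follows the usual linear-warmup-then-cosine-decay shape; it is not the exact scheduler used for this run.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM = 2
NUM_GPUS = 1             # assumption: single GPU
TRAIN_SAMPLES = 7_600
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch                 # assumption: schedule spans one epoch
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, warmup_steps)
for step in (0, warmup_steps, 400, 707):
    print(step, lr_at(step))
```

Under these assumptions, stopping at step 707 ends the run before the end of the first epoch, while the learning rate is still above zero.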

Training Results

Final Training Loss: 0.6046
Final Validation Loss: 0.6115
Best Validation Loss: 0.6115
Validation Loss Improvement: 9.2% (from 0.6732 at step 100 to 0.6115 at step 700)
Training Time: 4.64 hours
GPU: NVIDIA A100-SXM4-40GB
Peak Memory: 19.375 GB / 39.494 GB (49.058%)
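
The derived percentages above can be reproduced from the raw figures. The perplexity line is an extra illustration and assumes the reported loss is a mean token-level cross-entropy in nats, which this card does not state explicitly.

```python
import math

first_val_loss = 0.6732
final_val_loss = 0.6115
peak_mem_gb = 19.375
total_mem_gb = 39.494

# Relative drop in validation loss between the first and last evaluation
improvement_pct = (first_val_loss - final_val_loss) / first_val_loss * 100

# Share of GPU memory in use at the peak
memory_pct = peak_mem_gb / total_mem_gb * 100

# Assumption: loss is natural-log cross-entropy per token
perplexity = math.exp(final_val_loss)

print(f"improvement: {improvement_pct:.1f}%")
print(f"peak memory: {memory_pct:.3f}%")
print(f"perplexity:  {perplexity:.3f}")
```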

Framework

Unsloth: fine-tuning framework (advertised as 2x faster training with 70% less memory)
Modal: Serverless GPU infrastructure (40GB A100)
Transformers: 4.56.2
TRL: 0.22.2

Usage

from unsloth import FastModel

# Load the fine-tuned model
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/ernie-45-math-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Prepare for inference
FastModel.for_inference(model)

# Solve a math problem
messages = [{
    "role": "user",
    "content": "Solve the equation: 2x² + 5x - 3 = 0"
}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)

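# Decode only the newly generated tokens, skipping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)
print(response)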

Example Output

Input:

Solve the equation: x² + 5x + 6 = 0

Output:

To solve x² + 5x + 6 = 0, we can factor:

Find two numbers that multiply to 6 and add to 5:
2 and 3 work because 2 × 3 = 6 and 2 + 3 = 5

Factored form:
(x + 2)(x + 3) = 0

Setting each factor to zero:
x + 2 = 0  →  x = -2
x + 3 = 0  →  x = -3

Therefore: \boxed{x = -2, -3}
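
Because the final answer is wrapped in \boxed{...}, it can be pulled out of the response and checked automatically. The following is a minimal sketch: extract_boxed is a hypothetical helper, and the integer parsing only suits simple answers like the one above.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


response = r"Therefore: \boxed{x = -2, -3}"
answer = extract_boxed(response)

# Parse the integer roots and substitute them back into x^2 + 5x + 6
roots = [int(v) for v in re.findall(r"-?\d+", answer)]
residuals = [x**2 + 5 * x + 6 for x in roots]
print(answer, roots, residuals)
```

A residual of zero for each root confirms that the factored solution satisfies the original equation.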

Training Progress

Step	Training Loss	Validation Loss
100	0.589	0.673
200	0.661	0.648
300	0.637	0.646
400	0.557	0.640
500	0.587	0.633
600	0.589	0.617
700	0.605	0.611

The lowest validation loss (0.611) was recorded at the step-700 evaluation; training was stopped early shortly afterwards, at step 707.
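
The table above can be scanned programmatically to find the evaluation with the lowest validation loss. The values are copied from the table; the variable names are illustrative.

```python
# (step, training loss, validation loss) copied from the table above
progress = [
    (100, 0.589, 0.673),
    (200, 0.661, 0.648),
    (300, 0.637, 0.646),
    (400, 0.557, 0.640),
    (500, 0.587, 0.633),
    (600, 0.589, 0.617),
    (700, 0.605, 0.611),
]

# Evaluation with the lowest validation loss
best_step, _, best_val = min(progress, key=lambda row: row[2])

# Check whether validation loss fell at every evaluation
val_losses = [row[2] for row in progress]
strictly_decreasing = all(a > b for a, b in zip(val_losses, val_losses[1:]))

print(best_step, best_val, strictly_decreasing)
```

Validation loss decreases at every evaluation in the table, so the best checkpoint is simply the last one evaluated.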

Training Infrastructure

Platform: Modal (modal.com)
GPU: 40GB A100
Training Duration: ~4.6 hours
Checkpointing: Every 100 steps
Evaluation: Every 100 steps
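
The hyperparameters and the checkpoint and evaluation cadence listed in this card could be written as trainer arguments along the following lines. This is a sketch, not the author's actual training script: the keys follow transformers.TrainingArguments naming (which trl.SFTConfig extends), and output_dir is a hypothetical path.

```python
# Keyword arguments mirroring the values reported in this card
training_kwargs = dict(
    output_dir="outputs/ernie-math",      # hypothetical path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_8bit",
    bf16=True,
    eval_strategy="steps",
    eval_steps=100,                       # evaluate every 100 steps
    save_strategy="steps",
    save_steps=100,                       # checkpoint every 100 steps
)

effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)
```

With TRL installed, the dictionary could be unpacked as SFTConfig(**training_kwargs). Keeping save_steps aligned with eval_steps means every evaluated step also has a checkpoint that can be restored.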

Limitations

Optimized for mathematical reasoning; may not perform as well on other domains
Trained on English-language problems only
Works best on problems formatted like the training data
The 4-bit setup shown under Usage requires a CUDA GPU for inference
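
A back-of-the-envelope estimate shows why 4-bit loading matters for a model of this size. The sketch counts weight storage only and ignores quantization constants, layers kept in higher precision, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at a given precision."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 21e9  # nominal size of the base model

four_bit_gb = weight_memory_gb(NUM_PARAMS, 4)
bf16_gb = weight_memory_gb(NUM_PARAMS, 16)

print(f"4-bit weights: ~{four_bit_gb:.1f} GB")
print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
```

By this estimate the unquantized BF16 weights alone would not fit on the 40GB A100 used for training, while the 4-bit weights leave room for activations and adapters.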

Citation

@misc{ernie45-math-2025,
  title={ERNIE-4.5 Fine-tuned for Mathematical Reasoning},
  author={naazimsnh02},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/naazimsnh02/ernie-45-math-finetuned}}
}

Acknowledgments

ERNIE Team for the base model
Unsloth for the optimization framework
NVIDIA for the Nemotron-RL dataset
Modal for GPU infrastructure
ERNIE AI Developer Challenge for the opportunity

License

MIT License - See repository for details

Trained with ❤️ using Unsloth and Modal
