assn2-simpo-qwen2.5-1.5b-lora

SimPO LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct.

This repository contains a LoRA adapter trained for CAS4133 Assignment 2.

Base model

Qwen/Qwen2.5-1.5B-Instruct

Evaluation accuracy on 100 held-out GSM8K-style examples

Model Accuracy
Base 0.69
SFT 0.58
DPO 0.57
SimPO 0.58

Dataset

https://huggingface.co/datasets/Riacrdo/assn2-preference-dataset

Note

The full merged model was not uploaded due to network instability. This adapter can be loaded with the base model.

Downloads last month
16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Riacrdo/assn2-simpo-qwen2.5-1.5b-lora

Adapter
(1032)
this model