tenacious-bench-simpo-judge-v1

LoRA adapter for Qwen2.5-0.5B-Instruct trained as a B2B sales compliance judge via CPO (Contrastive Preference Optimization) on 137 Tenacious-Bench preference pairs.
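For readers unfamiliar with CPO, the per-pair objective can be sketched as below. This is a simplified illustration of the standard sigmoid formulation (a preference term plus a likelihood term on the chosen response), not the author's training code, which the card does not show. The value `beta=2.0` is taken from the card; `nll_weight` and the function names are hypothetical, and the inputs are assumed to be sequence log-probabilities under the policy.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(
    chosen_logp: float,
    rejected_logp: float,
    beta: float = 2.0,        # value listed in the card
    nll_weight: float = 1.0,  # hypothetical name for the likelihood-term weight
) -> float:
    """Simplified CPO loss for a single preference pair.

    chosen_logp / rejected_logp: log-probability the policy assigns to the
    preferred and dispreferred response (assumed inputs for this sketch).
    """
    # Preference term: push the chosen response above the rejected one.
    prefer = -log_sigmoid(beta * (chosen_logp - rejected_logp))
    # Likelihood term: keep the chosen response probable under the policy.
    nll = -chosen_logp
    return prefer + nll_weight * nll
```

Unlike DPO, this objective needs no frozen reference model, which helps keep memory use low on a small GPU such as the Colab T4 mentioned in the card.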

Accuracy: 92.7% on the held-out partition (vs. 69.1% for the rule-only baseline)
Training: CPO, LoRA r=16, beta=2.0, 3 epochs on Colab T4
Dataset: eyobed7b/tenacious-bench
Author: Eyobed Feleke
License: CC-BY-4.0
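A minimal loading sketch, assuming the repository holds a standard PEFT LoRA adapter that attaches to Qwen/Qwen2.5-0.5B-Instruct (the card describes it that way but does not give loading instructions). The helper name `load_judge` is hypothetical; the `transformers` and `peft` calls are the usual ones for adapters of this kind.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER_ID = "eyobed7b/tenacious-bench-simpo-judge-v1"


def load_judge(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the LoRA adapter attached to the base model.

    Hypothetical helper: assumes the adapter repo is in PEFT format.
    """
    # Imported lazily so the constants above can be used without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference only
    return tokenizer, model
```

Calling `load_judge()` downloads both the base weights and the adapter. The prompt format the judge expects is not documented in the card, so it is not shown here; the dataset at eyobed7b/tenacious-bench is the place to check.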

Model tree for eyobed7b/tenacious-bench-simpo-judge-v1

Base model: Qwen/Qwen2.5-0.5B
Finetuned: Qwen/Qwen2.5-0.5B-Instruct
Adapter: this model