--- library_name: peft base_model: unsloth/Qwen3-4B-unsloth-bnb-4bit tags: - simpo - alignment - text-generation-inference - sales-automation - judge --- # Qwen3 4B Tenacious Critic (SimPO) This is a LoRA-adapted 4-bit critic model developed as part of the **Week 11 Tenacious Sales Agent Evaluation Bench** (Act IV). It was trained using **SimPO** (Simple Preference Optimization) to evaluate and rank B2B sales outreach drafts against the Tenacious verification rubric. ## Intended Use This model is intended to be deployed as a **rejection-sampling layer** (a "Judge") in front of the Week 10 Conversion Engine composer. * **Input:** A drafted sales email and context. * **Output / Reward:** Instead of generating text, it provides a length-normalized token log-probability (SimPO reward) to rank multiple candidates. It penalizes tone-fails, hallucinated signals, and condescending gap-framing. ## Training Configuration * **Base Model:** `unsloth/Qwen3-4B-unsloth-bnb-4bit` * **Algorithm:** SimPO (pure preference, no NLL mixing) * **LoRA Rank:** 16 * **LoRA Alpha:** 32 * **Beta (Reward scale):** 2.0 * **Gamma (Margin):** 0.5 * **Precision:** fp16 + 4-bit QLoRA * **Infrastructure:** Google Colab T4 (16 GB VRAM) leveraging Unsloth ## Evaluation Metrics (Tenacious-Bench v0.1 Dev Partition) During ablation, this specific `gamma=0.5` checkpoint achieved the following zero-shot metrics on the held-out development partition: * **Preference Accuracy:** 1.0 (100%) * **Average Reward Gap:** 1.333 * **Judge-Evaluator Agreement:** 1.0 (100% agreement with the deterministic [scoring_evaluator.py](cci:7://file:///home/kg/Projects/10Academy/sales-agent-evaluation-bench/scoring_evaluator.py:0:0-0:0)) *Prior to training, the baseline `Qwen3-4B` model had a preference accuracy of merely 8.65% with a negative reward gap.* ## How to Load ```python from peft import PeftModel from transformers import AutoModelForCausalLM, AutoTokenizer base_model_id = "unsloth/Qwen3-4B-unsloth-bnb-4bit" adapter_id = "kgutd/Qwen3-4B-Tenacious-Critic-SimPO" # Load the base model model = AutoModelForCausalLM.from_pretrained(base_model_id, load_in_4bit=True) tokenizer = AutoTokenizer.from_pretrained(base_model_id) # Apply the trained LoRA adapter model = PeftModel.from_pretrained(model, adapter_id)