daviddavidlu/DAPO-with-prompt-augmentation-step2820 Text Generation • 2B • Updated 12 days ago • 103
daviddavidlu/DAPO-with-prompt-augmentation-step2720 Text Generation • 2B • Updated 12 days ago • 95
daviddavidlu/DAPO-with-prompt-augmentation-step2480 Text Generation • 2B • Updated 12 days ago • 89
Pritish92/lavida-variant-D-seed0-oracleaug-alpha0p001 Reinforcement Learning • Updated 25 days ago • 21
Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2 Reinforcement Learning • Updated 25 days ago • 21
Pritish92/lavida-variant-B-seed0-selfdistill-alpha0p02 Reinforcement Learning • Updated 25 days ago • 22
daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1100 Text Generation • 2B • Updated 12 days ago • 40
daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1160 Text Generation • 2B • Updated 12 days ago • 41