ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4 Reinforcement Learning • 8B • Updated Mar 26, 2025 • 664 • 228
mradermacher/Reflector-Internalizing-Safety-Llama-3.1-8B-RL-GGUF Reinforcement Learning • 8B • Updated 22 days ago • 887 • 1
mradermacher/Reflector-Internalizing-Safety-Llama-3.1-8B-RL-i1-GGUF Reinforcement Learning • 8B • Updated 22 days ago • 2.43k • 1
Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline Reinforcement Learning • 3B • Updated 4 days ago • 29 • 1
ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4 Reinforcement Learning • 15B • Updated Feb 13, 2025 • 1.49k • 838