BounharAbdelaziz's Collections
RLHF/RLVR
Updated Feb 11
Some RLHF/RLVR experiments using GRPO and DPO.
BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K • Text Generation • 3B • Updated Jun 25, 2025 • 21 downloads • 1 like
BounharAbdelaziz/Qwen2.5-0.5B-DPO-English-Orca • Text Generation • 0.5B • Updated Jun 25, 2025 • 4 downloads
BounharAbdelaziz/Qwen2.5-0.5B-DPO-French-Orca • Text Generation • 0.5B • Updated Jun 25, 2025 • 4 downloads