RLHF/RLVR - a BounharAbdelaziz Collection

RLHF/RLVR

Updated Feb 11

Experiments in RLHF and RLVR post-training: GRPO on GSM8K math for the 3B model, and DPO on English and French Orca data for the 0.5B models.

BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K

Text Generation • 3B • Updated Jun 25, 2025 • 21 • 1

BounharAbdelaziz/Qwen2.5-0.5B-DPO-English-Orca

Text Generation • 0.5B • Updated Jun 25, 2025 • 4

BounharAbdelaziz/Qwen2.5-0.5B-DPO-French-Orca

Text Generation • 0.5B • Updated Jun 25, 2025 • 4