--- language: ar license: apache-2.0 base_model: Qwen/Qwen3-8B tags: - arabic - grpo - activation-steering - lora --- # AraSteer: Activation Steering + GRPO for Arabic Two LoRA adapters trained with Group Relative Policy Optimization (GRPO) on Qwen3-8B for Arabic language generation improvement. ## Adapters ### grpo_a/ - **Method**: Raw GRPO (200 steps, r=8, 21.8M params) - **Reward improvement**: +4.8% relative over 200 steps ### grpo_b/ - **Method**: CLAS-warm-started GRPO (200 steps, r=16, 43.6M params) - **CLAS config**: alpha=1.25, top-4 Arabic-specific layers {34, 33, 32, 0} - **Reward improvement**: +9.1% relative, +15.9% faster convergence vs GRPO-A at step 50 ## Usage ## Paper AraSteer: Bimodal Neuron Specialization and Activation Steering for Arabic in Multilingual LLMs