--- license: apache-2.0 base_model: Qwen/Qwen3-4B-Instruct-2507 tags: - medical - reinforcement-learning - qwen3 - healthcare --- # Qwen3-4B-MedMCQA-RL Qwen3-4B fine-tuned with RL on MedMCQA for medical multiple choice QA. LoRA weights properly merged. ## Model Details - **Base Model**: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) - **Training Method**: Reinforcement Learning (GRPO) with LoRA - **Framework**: [verifiers](https://github.com/willieneis/verifiers) + [prime-rl](https://github.com/PRIME-RL/PRIME-RL) ## Usage Please ask your administrator. ## License Apache 2.0