---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- medical
- reinforcement-learning
- qwen3
- healthcare
---

# Qwen3-4B-MedMCQA-RL

Qwen3-4B fine-tuned with RL on MedMCQA for medical multiple choice QA. LoRA weights properly merged.

## Model Details

- **Base Model**: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Training Method**: Reinforcement Learning (GRPO) with LoRA
- **Framework**: [verifiers](https://github.com/willieneis/verifiers) + [prime-rl](https://github.com/PRIME-RL/PRIME-RL)

## Usage

Please ask your administrator.

## License

Apache 2.0