Qwen Arabic MCQ Generator (Instruction-less) 📝

An optimized, instruction-less causal language model fine-tuned for high-quality Multiple Choice Question (MCQ) generation from Arabic educational documents.

This model is based on Qwen/Qwen3-1.7B and has been aligned using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) via Unsloth.

Project Resources

GitHub Repository: https://github.com/Akil-x/Doc-to-MCQ-Generator

Model Details

Developed by: Akil Al-Sharafi, Jihad Nassari, Bashar Alwan, Adeeb Hassan, and Akram AL-subari
Base Model: Qwen/Qwen3-1.7B
Training Method: SFT (LoRA) + DPO (LoRA) via Unsloth
Inference Mode: Causal Auto-Completion (Instruction-less / Prompt-less)
Primary Domain: Educational Assessment / Automatic Exam Generation (Arabic)

Research Paper & Citation

This model is part of the research paper:
Bridging the Performance Gap in Arabic MCQ Generation: Instruction-less Fine-Tuning of Compact 1.7B Models via DPO and RLAIF to Match Giant 120B LLMs

Preprint / DOI: 10.5281/zenodo.20533068
Zenodo Record: https://zenodo.org/records/20533068

@misc{alsharafi2026bridging,
  author       = {Al-Sharafi, Akil and Nassari, Jihad and Alwan, Bashar and Hassan, Adeeb and AL-subari, Akram},
  title        = {Bridging the Performance Gap in Arabic MCQ Generation: Instruction-less Fine-Tuning of Compact 1.7B Models via DPO and RLAIF to Match Giant 120B LLMs},
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20533068},
  url          = {https://doi.org/10.5281/zenodo.20533068}
}

How It Works (Instruction-less Causal Auto-Completion)

Unlike general-purpose models that require natural-language instructions (e.g. "أولد سؤالاً...", meaning "Generate a question..."), this model acts as a pure causal text completer trained on custom XML tags. It expects the source text wrapped in <context>...</context>, followed by an opening <question> tag. From that prefix it autocompletes the MCQ sequence (question, answer, and distractors) and stops generating at </distractors>.

Prompt Prefix Format

<context>
[Input Arabic text segment from document]
</context>
<question>
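
The prefix can be assembled with a small helper. This is a minimal sketch: the function name build_prompt is hypothetical (it is not part of the repository), and the newline placement simply mirrors the format shown above, on the assumption that it matches the layout used during fine-tuning.

```python
def build_prompt(context_text: str) -> str:
    """Wrap a text segment in <context> tags and open the <question> tag."""
    # Strip surrounding whitespace so each tag sits on its own line.
    segment = context_text.strip()
    return f"<context>\n{segment}\n</context>\n<question>\n"


# Example segment (illustrative input, padded with stray whitespace on purpose).
prompt = build_prompt("  الذكاء الاصطناعي يحاكي القدرات الذهنية البشرية.  ")
print(prompt)
```

The model is then expected to continue from the trailing <question> line, so no instruction text is added before or after the prefix.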

Usage Example (Python)

You can load and run this model locally with the Hugging Face transformers library. The 4-bit loading used below also requires the bitsandbytes package and a CUDA-capable GPU:

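import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "akilx/qwen-arabic-mcq"

# 4-bit quantization (recommended for low VRAM; requires the bitsandbytes package)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quant_config,
)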

# Format the input prefix
context_text = "الذكاء الاصطناعي هو سلوك وخصائص معينة تتسم بها البرامج الحاسوبية تجعلها تحاكي القدرات الذهنية البشرية."
prompt = f"<context>\n{context_text}\n</context>\n<question>\n"

# Tokenize and move the inputs to whichever device the model was placed on
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

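# Generate completion
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        use_cache=True,
        stop_strings=["</distractors>"],
        tokenizer=tokenizer,  # required when stop_strings is set
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.1,
    )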

# Decode only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
generated_mcq = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(generated_mcq)

Expected Generated Output

ما هو العلم الذي يعنى بجعل البرمجيات تحاكي القدرات الذهنية البشرية؟
</question>
<answer>
الذكاء الاصطناعي
</answer>
<distractors>
- البرمجة اللغوية العصبية
- هندسة البرمجيات
- تنقيب البيانات
</distractors>
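
To use the completion downstream (for example, to shuffle the options into an exam sheet), the tagged text can be parsed into a structured record. The following is a sketch under stated assumptions: the MCQ dataclass and parse_mcq function are hypothetical names, not part of the model or its repository, and the parser assumes the output follows the tag layout shown above, with one distractor per line prefixed by "-".

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class MCQ:
    question: str
    answer: str
    distractors: List[str]


def parse_mcq(generated: str) -> MCQ:
    """Parse a completion in the tagged format shown above into an MCQ."""
    # The completion starts inside <question>, so the opening tag may be absent.
    text = generated.strip()
    if text.startswith("<question>"):
        text = text[len("<question>"):]

    question, sep, rest = text.partition("</question>")
    if not sep:
        raise ValueError("missing </question> tag")

    answer = re.search(r"<answer>(.*?)</answer>", rest, re.DOTALL)
    block = re.search(r"<distractors>(.*?)</distractors>", rest, re.DOTALL)
    if answer is None or block is None:
        raise ValueError("missing <answer> or <distractors> block")

    # Keep only the lines that start with "-" and drop that marker.
    distractors = [
        line.strip()[1:].strip()
        for line in block.group(1).splitlines()
        if line.strip().startswith("-")
    ]
    return MCQ(question.strip(), answer.group(1).strip(), distractors)


sample = """ما هو العلم الذي يعنى بجعل البرمجيات تحاكي القدرات الذهنية البشرية؟
</question>
<answer>
الذكاء الاصطناعي
</answer>
<distractors>
- البرمجة اللغوية العصبية
- هندسة البرمجيات
- تنقيب البيانات
</distractors>"""

mcq = parse_mcq(sample)
print(mcq.question)
print(mcq.answer)
print(mcq.distractors)
```

Raising ValueError on a missing tag makes truncated generations (for instance, when max_new_tokens is reached before </distractors>) easy to detect and retry.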

Safetensors

Model size: 2B params
Tensor type: F16

Model tree for akilx/qwen-arabic-mcq

Base model: Qwen/Qwen3-1.7B-Base
Fine-tuned: Qwen/Qwen3-1.7B
Fine-tuned: this model