How to use with Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Kandil7/Baligh-1.5B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Kandil7/Baligh-1.5B to start chatting
Use Unsloth Studio on Hugging Face Spaces
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Kandil7/Baligh-1.5B to start chatting
Load model with FastModel
# Install the package first (shell command)
pip install unsloth
# Then, in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Kandil7/Baligh-1.5B",
    max_seq_length=2048,
)

🌙 Baligh-1.5B: Arabic LLM Assistant

بليغ: مساعد ذكاء اصطناعي عربي (Baligh: an Arabic AI assistant)



🧠 Model Summary

Baligh-1.5B is a compact Arabic language model fine-tuned for structured knowledge tasks, grounded question answering, and Arabic instruction following.
It is built on Qwen2.5-1.5B-Instruct using QLoRA with Unsloth and trained on curated Arabic knowledge datasets covering classical and contemporary Islamic texts, with a focus on grounded, citation-aware responses that reduce hallucination.

This is v0, the initial public release. Further alignment iterations (v0.5 through v1) are planned, with v0.5 in progress.


✨ Key Features

  • 🌍 Arabic-first: optimized for Modern Standard Arabic (MSA) and Classical Arabic
  • 📚 Knowledge-grounded: trained on curated domain-specific corpora (Shamela4, Islamic QA)
  • 🛡️ Hallucination-resistant as a design goal: fine-tuning emphasizes grounded, citation-aware responses
  • ⚡ Compact and efficient: 1.5B parameters, runs on a single GPU such as a T4 or RTX 3090
  • 🔧 RAG-ready: designed to integrate with the Athar retrieval system and hybrid search pipelines
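
As a rough check on the single-GPU claim, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 1.5e9 parameters (taken from the model name) and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.5e9  # assumption: nominal size from the model name

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions the fp16 weights fit comfortably within the memory of a T4, leaving room for the context and generation buffers.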

🏗️ Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-1.5B-Instruct |
| Method | QLoRA (4-bit quantization) |
| Framework | Unsloth + TRL |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Seq Length | 2048 |
| Batch Size | 4 (gradient accumulation = 4) |
| Learning Rate | 2e-4 |
| Epochs | 3 |
| Optimizer | AdamW (8-bit) |
| Hardware | Google Colab T4 (15 GB VRAM) |
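
To make these hyperparameters concrete, the following sketch works out three derived quantities: the LoRA scaling factor, the effective batch size, and the number of trainable adapter parameters. The layer dimensions are assumptions taken from the published Qwen2.5-1.5B configuration and are not stated in this card; the effective batch size assumes a single GPU.

```python
# Assumed dimensions of the Qwen2.5-1.5B base model (not stated in this card).
HIDDEN = 1536         # hidden size
INTERMEDIATE = 8960   # MLP intermediate size
KV_DIM = 256          # 2 key/value heads x 128 head dimension
NUM_LAYERS = 28

# Values from the training table.
RANK, ALPHA = 16, 32
PER_DEVICE_BATCH, GRAD_ACCUM = 4, 4

# (in_features, out_features) for each targeted linear layer.
TARGET_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: (rank x in) and (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_MODULES.values())
trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK                         # multiplier applied to the adapter output
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # single-GPU assumption

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"effective batch size: {effective_batch}")
```

Under these assumptions the adapters hold about 18.5 million trainable parameters, a small fraction of the base model, which is what makes training on a single T4 feasible.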

📦 Training Data

Trained on a curated mixture of Arabic knowledge datasets:

| Dataset | Type | Source |
|---|---|---|
| Kandil7/Athar-Shamela4 | Classical Arabic corpus | Shamela Library (4,500+ downloads) |
| Kandil7/Athar-Datasets | RAG QA pairs | Athar project |
| Islamic QA Egyptian Arabic | Instruction tuning | Community curated |
| Arabic instruction mix | General Arabic SFT | Open-source Arabic datasets |

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Kandil7/Baligh-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    # System prompt (Arabic): "You are Baligh, an Arabic AI assistant specialized in
    # Islamic knowledge. Answer accurately and rely on the sources."
    {"role": "system", "content": "أنت بليغ، مساعد ذكاء اصطناعي عربي متخصص في المعرفة الإسلامية. أجب بدقة واستند إلى المصادر."},
    # User question (Arabic): "What are the five pillars of Islam?"
    {"role": "user", "content": "ما هي أركان الإسلام الخمسة؟"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

🔗 Integration with Athar RAG

Baligh is designed to work as the generation layer of the Athar RAG system:

# Athar + Baligh pipeline
from athar import HybridRetriever
from transformers import pipeline

# 1. Retrieve relevant passages
retriever = HybridRetriever(qdrant_url="...", collection="athar-shamela4")
passages = retriever.search(query="أركان الإسلام", top_k=5)  # query: "pillars of Islam"

# 2. Build grounded prompt
# The Arabic template reads: "Based on the following sources: <context>
# Question: The five pillars of Islam? Answer:"
context = "\n\n".join(p["text"] for p in passages)
prompt = f"""استناداً إلى المصادر التالية:
{context}

السؤال: أركان الإسلام الخمسة؟
الجواب:"""

# 3. Generate grounded response with Baligh
pipe = pipeline("text-generation", model="Kandil7/Baligh-1.5B", device_map="auto")
# do_sample=True makes the temperature setting explicit;
# return_full_text=False returns only the generated answer, not the prompt.
response = pipe(prompt, max_new_tokens=300, do_sample=True, temperature=0.3, return_full_text=False)
print(response[0]["generated_text"])
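
A variation on step 2 is to number the retrieved passages so the answer can cite them. The sketch below is one possible way to do this, not part of the Athar API: the helper name `build_grounded_messages`, the English system prompt, and the placeholder passages are all hypothetical. It only assumes that each passage is a dict with a "text" key, as in the pipeline above.

```python
from typing import Dict, List


def build_grounded_messages(question: str, passages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return chat messages that ask the model to answer only from numbered sources."""
    # Number each passage so the answer can cite it as [1], [2], ...
    numbered = [f"[{i}] {p['text']}" for i, p in enumerate(passages, start=1)]
    context = "\n\n".join(numbered)
    # Hypothetical system prompt; adjust the wording (or use Arabic) as needed.
    system = (
        "You are Baligh, an Arabic AI assistant. Answer only from the numbered "
        "sources, cite them as [n], and say so if they do not contain the answer."
    )
    user = f"Sources:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Placeholder passages standing in for retriever output.
passages = [{"text": "first passage"}, {"text": "second passage"}]
messages = build_grounded_messages("What are the five pillars of Islam?", passages)
print(messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the Quick Start.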

⚠️ Limitations

  • v0 release: this is an early baseline model; quality is expected to improve in v0.5 and v1
  • Not recommended for fatwa issuance or binding religious rulings
  • Performance on dialectal Arabic (Egyptian, Gulf, etc.) is limited in this version
  • May hallucinate on rare or ambiguous topics; always verify with primary sources
  • Best used in RAG pipelines with retrieval grounding for factual tasks

🗺️ Roadmap

| Version | Status | Key Improvements |
|---|---|---|
| v0 | ✅ Released | Initial SFT baseline |
| v0.5 | 🔄 In Progress | Expanded dataset, better alignment |
| v0.9 | 📅 Planned | DPO/ORPO alignment, evaluation suite |
| v1 | 📅 Planned | Full release with benchmarks |

📊 Evaluation (v0 Baseline)

A full evaluation suite is in progress. Results will be published with v0.5.

Preliminary testing on an internal Arabic QA benchmark:

  • Grounded answering (with RAG context): ✅ Good
  • Open-domain factual QA (without retrieval): ⚠️ Limited; use with RAG
  • Arabic fluency: ✅ Good for MSA, limited dialect support
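
The card does not describe how the internal benchmark is scored. As an illustration only, the sketch below shows one common way to score QA answers: SQuAD-style token-level F1, here with light Arabic normalization (removing diacritics and tatweel). The function names and the normalization choices are assumptions, not the project's evaluation method.

```python
import re
from collections import Counter
from typing import List

# Arabic diacritics (U+064B to U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text: str) -> List[str]:
    """Strip diacritics and tatweel, then split on whitespace."""
    return _DIACRITICS.sub("", text).split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("a b", "a c"))
```

Token overlap rewards partially correct answers but does not check whether an answer is actually supported by the retrieved context, so it complements rather than replaces manual review.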

🔗 Related Resources

| Resource | Link |
|---|---|
| Athar RAG System | github.com/Kandil7 |
| Athar-Shamela4 Dataset | HuggingFace |
| Athar-Embeddings | HuggingFace |
| Egyptian Mobile Action Model | HuggingFace |

📜 Citation

If you use Baligh-1.5B in your research or applications, please cite:

@misc{kandil2025baligh,
  author    = {Mohamed Kandil},
  title     = {Baligh-1.5B: A Knowledge-Grounded Arabic LLM for Islamic Domain QA},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Kandil7/Baligh-1.5B}
}

👤 Author

Mohamed Kandil, AI / NLP Engineer | Arabic LLMs, RAG, and Applied AI
📍 Kafr El-Sheikh, Egypt
🔗 GitHub · HuggingFace · LinkedIn

Part of the Athar Islamic AI project: building production-grade Arabic AI systems
