Instructions for using Kandil7/Baligh-1.5B with libraries, inference servers, and local apps.

Libraries

How to use Kandil7/Baligh-1.5B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Kandil7/Baligh-1.5B")

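# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kandil7/Baligh-1.5B")
# AutoModelForCausalLM keeps the language-modeling head that generation needs
model = AutoModelForCausalLM.from_pretrained("Kandil7/Baligh-1.5B", dtype="auto")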

Local Apps

vLLM

How to use Kandil7/Baligh-1.5B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kandil7/Baligh-1.5B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kandil7/Baligh-1.5B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
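
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM. It assumes a server is already running at `http://localhost:8000` as started above (the SGLang server described in this document listens on port 30000, so pass a different `base_url` for it).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="Kandil7/Baligh-1.5B",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Needs a running server, so it is left commented out here:
# print(complete("Once upon a time,"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.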

Use Docker

docker model run hf.co/Kandil7/Baligh-1.5B

SGLang

How to use Kandil7/Baligh-1.5B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kandil7/Baligh-1.5B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kandil7/Baligh-1.5B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kandil7/Baligh-1.5B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kandil7/Baligh-1.5B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use Kandil7/Baligh-1.5B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Kandil7/Baligh-1.5B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Kandil7/Baligh-1.5B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Kandil7/Baligh-1.5B to start chatting

Load model with FastModel

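# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel

# from_pretrained returns the model together with its tokenizer
model, tokenizer = FastModel.from_pretrained(
    model_name="Kandil7/Baligh-1.5B",
    max_seq_length=2048,
)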

🌙 Baligh-1.5B — Arabic LLM Assistant

بليغ (Baligh): an Arabic AI assistant

🧠 Model Summary

Baligh-1.5B is a compact Arabic language model fine-tuned for structured knowledge tasks, grounded question answering, and Arabic instruction following.
It is built on Qwen2.5-1.5B-Instruct and fine-tuned with QLoRA through Unsloth on curated Arabic knowledge datasets covering classical and contemporary Islamic texts. The training focus is on hallucination-resistant, citation-grounded responses.

This is v0, the initial public release. Version v0.5 is in progress, with v0.9 and v1 planned.

✨ Key Features

🌐 Arabic-first: optimized for Modern Standard Arabic (MSA) and Classical Arabic
📚 Knowledge-grounded: trained on curated domain-specific corpora (Shamela4, Islamic QA)
🛡️ Hallucination-resistant: fine-tuning emphasizes grounded, citation-aware responses
⚡ Compact and efficient: 1.5B parameters, runs on a single GPU such as a T4 or RTX 3090
🔧 RAG-ready: designed to integrate with Athar retrieval system and hybrid search pipelines

🏗️ Training Details

Parameter	Value
Base Model	Qwen2.5-1.5B-Instruct
Method	QLoRA (4-bit quantization)
Framework	Unsloth + TRL
LoRA Rank	16
LoRA Alpha	32
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Max Seq Length	2048
Batch Size	4 (gradient accumulation = 4, effective batch size 16)
Learning Rate	2e-4
Epochs	3
Optimizer	AdamW (8-bit)
Hardware	Google Colab T4 (15GB VRAM)
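
To make the table concrete, the sketch below works through three quantities implied by these settings: the LoRA scaling factor, the effective batch size, and the number of trainable adapter parameters. The layer dimensions in `QwenDims` are an assumption taken from the published Qwen2.5-1.5B configuration rather than from this card, and the helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QwenDims:
    # ASSUMPTION: values from the published Qwen2.5-1.5B config, not from this card.
    hidden_size: int = 1536
    intermediate_size: int = 8960
    num_layers: int = 28
    num_attention_heads: int = 12
    num_key_value_heads: int = 2

    @property
    def kv_size(self) -> int:
        # Output width of k_proj / v_proj under grouped-query attention.
        head_dim = self.hidden_size // self.num_attention_heads
        return head_dim * self.num_key_value_heads


def lora_trainable_params(dims: QwenDims, rank: int) -> int:
    """Count LoRA parameters: each adapted Linear(in, out) adds rank * (in + out)."""
    h, m, kv = dims.hidden_size, dims.intermediate_size, dims.kv_size
    shapes = {
        "q_proj": (h, h), "k_proj": (h, kv), "v_proj": (h, kv), "o_proj": (h, h),
        "gate_proj": (h, m), "up_proj": (h, m), "down_proj": (m, h),
    }
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * dims.num_layers


# Values from the Training Details table.
rank, alpha = 16, 32
batch_size, grad_accum = 4, 4

lora_scale = alpha / rank                 # multiplier applied to the adapter update
effective_batch = batch_size * grad_accum  # sequences per optimizer step on one GPU
trainable = lora_trainable_params(QwenDims(), rank)

print(f"LoRA scale (alpha / rank): {lora_scale}")
print(f"Effective batch size: {effective_batch}")
print(f"Trainable LoRA parameters: {trainable:,}")
```

Under the assumed dimensions, the adapters add roughly 18.5M trainable parameters, a little over 1% of the base model, which is what makes training on a single T4 practical.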

📦 Training Data

Trained on a curated mixture of Arabic knowledge datasets:

Dataset	Type	Source
Kandil7/Athar-Shamela4	Classical Arabic corpus	Shamela Library (4,500+ downloads)
Kandil7/Athar-Datasets	RAG QA pairs	Athar project
Islamic QA Egyptian Arabic	Instruction tuning	Community curated
Arabic instruction mix	General Arabic SFT	Open-source Arabic datasets

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Kandil7/Baligh-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

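# System prompt (Arabic): "You are Baligh, an Arabic AI assistant specialized in
# Islamic knowledge. Answer accurately and rely on the sources."
# User message (Arabic): "What are the five pillars of Islam?"
messages = [
    {"role": "system", "content": "أنت بليغ، مساعد ذكاء اصطناعي عربي متخصص في المعرفة الإسلامية. أجب بدقة واستند إلى المصادر."},
    {"role": "user", "content": "ما هي أركان الإسلام الخمسة؟"}
]

# Render the conversation with the model's chat template, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)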

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

🔗 Integration with Athar RAG

Baligh is designed to work as the generation layer of the Athar RAG system:

# Athar + Baligh pipeline
from athar import HybridRetriever
from transformers import pipeline

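# 1. Retrieve relevant passages (query in Arabic: "pillars of Islam")
retriever = HybridRetriever(qdrant_url="...", collection="athar-shamela4")
passages = retriever.search(query="أركان الإسلام", top_k=5)

# 2. Build grounded prompt. The Arabic template reads:
#    "Based on the following sources: {context} / Question: The five pillars of Islam? / Answer:"
context = "\n\n".join([p["text"] for p in passages])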
prompt = f"""استناداً إلى المصادر التالية:
{context}

السؤال: أركان الإسلام الخمسة؟
الجواب:"""

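# 3. Generate grounded response with Baligh
pipe = pipeline("text-generation", model="Kandil7/Baligh-1.5B", device_map="auto")
# do_sample=True makes the low temperature explicit; return_full_text=False drops the prompt
response = pipe(prompt, max_new_tokens=300, do_sample=True, temperature=0.3, return_full_text=False)
print(response[0]["generated_text"])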
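
The pipeline above passes a raw prompt string. Since the Quick Start uses the chat template, one possible variation is to wrap the retrieved passages in chat messages and number them so an answer can point back to a specific source. This is a sketch only: `build_grounded_messages` is a hypothetical helper, not part of Athar or Transformers, and it assumes each passage is a dict with a `"text"` key, as in the pipeline above.

```python
def build_grounded_messages(question, passages, system_prompt=None):
    """Return chat messages that put numbered source passages before the question."""
    if not passages:
        raise ValueError("at least one retrieved passage is required")
    # Number each passage so the answer can refer to a specific source.
    sources = "\n\n".join(
        f"[{i}] {p['text'].strip()}" for i, p in enumerate(passages, start=1)
    )
    user_content = (
        "استناداً إلى المصادر التالية:\n"  # "Based on the following sources:"
        f"{sources}\n\n"
        f"السؤال: {question}\n"  # "Question:"
        "الجواب:"  # "Answer:"
    )
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    return messages


# Stand-in passages with the same {"text": ...} shape used in the pipeline above.
example_passages = [
    {"text": "placeholder passage one"},
    {"text": "placeholder passage two"},
]
messages = build_grounded_messages("ما هي أركان الإسلام الخمسة؟", example_passages)
print(messages[-1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(...)` exactly as in the Quick Start, in place of the hand-written `messages`.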

⚠️ Limitations

v0 release: this is an early baseline model; quality is expected to improve in v0.5 and v1
Not recommended for fatwa issuance or binding religious rulings
Performance on dialectal Arabic (Egyptian, Gulf, etc.) is limited in this version
May hallucinate on rare or ambiguous topics — always verify with primary sources
Best used in RAG pipelines with retrieval grounding for factual tasks
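
One way to act on these limitations in an application is to generate only when retrieval returns sufficiently relevant passages, and to abstain otherwise. The sketch below illustrates that pattern under stated assumptions: the `"score"` field, the `0.5` threshold, and the function names are all hypothetical and would need to match whatever the retriever actually returns.

```python
def select_grounding(passages, min_score=0.5, max_passages=5):
    """Keep the highest-scoring passages at or above min_score (may return [])."""
    kept = [p for p in passages if p.get("score", 0.0) >= min_score]
    kept.sort(key=lambda p: p.get("score", 0.0), reverse=True)
    return kept[:max_passages]


def answer_or_abstain(question, passages, generate, min_score=0.5):
    """Call generate(question, grounding) only when grounding is available."""
    grounding = select_grounding(passages, min_score=min_score)
    if not grounding:
        # No sufficiently relevant source: decline rather than answer from memory.
        return {"answer": None, "sources": [], "abstained": True}
    return {
        "answer": generate(question, grounding),
        "sources": [p["text"] for p in grounding],
        "abstained": False,
    }


def fake_generate(question, grounding):
    # Stand-in for a real model call, used only to demonstrate the control flow.
    return f"answer based on {len(grounding)} source(s)"


hits = [
    {"text": "relevant passage", "score": 0.82},
    {"text": "weak match", "score": 0.21},
]
grounded = answer_or_abstain("example question", hits, fake_generate)
ungrounded = answer_or_abstain("example question", hits[1:], fake_generate)
print(grounded)
print(ungrounded)
```

Returning the source texts alongside the answer keeps verification against primary sources straightforward for the caller.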

🗺️ Roadmap

Version	Status	Key Improvements
v0	✅ Released	Initial SFT baseline
v0.5	🔄 In Progress	Expanded dataset, better alignment
v0.9	📅 Planned	DPO/ORPO alignment, evaluation suite
v1	📅 Planned	Full release with benchmarks

📊 Evaluation (v0 Baseline)

Full evaluation suite in progress. Results will be updated in v0.5.

Preliminary testing on internal Arabic QA benchmark:

Grounded answering (with RAG context): ✅ Good
Open-domain factual QA (without retrieval): ⚠️ Limited — use with RAG
Arabic fluency: ✅ Good for MSA, limited dialect support

🔗 Related Resources

Resource	Link
Athar RAG System	github.com/Kandil7
Athar-Shamela4 Dataset	HuggingFace
Athar-Embeddings	HuggingFace
Egyptian Mobile Action Model	HuggingFace

📜 Citation

If you use Baligh-1.5B in your research or applications, please cite:

@misc{kandil2025baligh,
  author    = {Mohamed Kandil},
  title     = {Baligh-1.5B: A Knowledge-Grounded Arabic LLM for Islamic Domain QA},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Kandil7/Baligh-1.5B}
}

👤 Author

Mohamed Kandil — AI / NLP Engineer | Arabic LLMs, RAG, and Applied AI
📍 Kafr El-Sheikh, Egypt
🔗 GitHub · HuggingFace · LinkedIn

Part of the Athar Islamic AI project — building production-grade Arabic AI systems

Model tree for Kandil7/Baligh-1.5B: Qwen/Qwen2.5-1.5B (base model) → Qwen/Qwen2.5-1.5B-Instruct (fine-tuned) → this model (fine-tuned)