DictaLM 2.0 - Israeli Law
A Hebrew legal language model fine-tuned on 140,000+ Israeli legal documents. Built for understanding, generating, and working with Israeli law, court rulings, and civil rights content.
Model Details
| Property | Value |
|---|---|
| Base Model | dicta-il/dictalm2.0 (Mistral-based, 7B params) |
| Training | Continued pretraining with QLoRA (4-bit) via Unsloth |
| Language | Hebrew |
| Domain | Israeli law, court rulings, legislation, civil rights |
| License | Apache 2.0 |
Training Data
The model was trained on ~140,000 Israeli legal documents. The table lists each source with its approximate document count after filtering:
| Source | Documents | Description |
|---|---|---|
| Israeli Courts (court.gov.il) | ~97,000 | Supreme Court and district court rulings |
| Kol-Zchut (kolzchut.org.il) | ~5,300 | Citizens' rights guides and legal explainers |
| Wikisource Laws | ~3,800 | Israeli legislation and basic laws |
| Total (after filtering) | ~106,000 | |
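The ~140,000 headline figure and the ~106,000 total in the table can be reconciled if the larger number counts training examples after the 5x upsampling of Kol-Zchut and Wikisource mentioned in the data pipeline. That reading is an inference from the published numbers, not something the card states; the quick check below uses only the rounded counts from the table.

```python
# Approximate per-source document counts after filtering (from the table above).
filtered = {"courts": 97_000, "kol_zchut": 5_300, "wikisource": 3_800}

# Assumption: the two smaller sources are each repeated 5 times; court rulings are not.
upsample = {"courts": 1, "kol_zchut": 5, "wikisource": 5}

unique_total = sum(filtered.values())
training_total = sum(count * upsample[name] for name, count in filtered.items())

court_share_before = filtered["courts"] / unique_total
court_share_after = filtered["courts"] / training_total

print(unique_total)    # 106100
print(training_total)  # 142500
print(f"court share: {court_share_before:.1%} -> {court_share_after:.1%}")
```

Under this assumption, court rulings drop from roughly nine tenths of the unique documents to about two thirds of the training mix.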
Data Pipeline:
- Text cleaning and normalization (niqqud removal, whitespace, template stripping)
- PII scrubbing (Israeli ID numbers, phone numbers, emails, credit cards)
- Quality filtering (minimum length, Hebrew ratio, repetition, boilerplate)
- Near-deduplication via MinHash LSH (threshold 0.7)
- Source balancing: Kol-Zchut and Wikisource upsampled 5x to balance court dominance
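The cleaning, scrubbing, filtering, and deduplication steps above can be sketched in plain Python. This is an illustration only, not the author's pipeline: every regular expression, the `MIN_CHARS` and `MIN_HEBREW_RATIO` thresholds, and all function names are assumptions. Credit card scrubbing, repetition and boilerplate filters are omitted, and exact Jaccard similarity over character shingles stands in for MinHash LSH, which approximates the same quantity at scale.

```python
import re

# Combining Hebrew marks only (cantillation and vowel points); punctuation such as maqaf is kept.
NIQQUD = re.compile(r"[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"(?<!\d)0\d{1,2}-?\d{7}(?!\d)")   # e.g. 03-1234567 or 0501234567
ID_NUMBER = re.compile(r"(?<!\d)\d{9}(?!\d)")         # 9-digit Israeli ID number

MIN_CHARS = 200          # hypothetical minimum length
MIN_HEBREW_RATIO = 0.5   # hypothetical minimum share of Hebrew letters
DUP_THRESHOLD = 0.7      # similarity threshold quoted in the pipeline description


def clean(text: str) -> str:
    """Remove niqqud and collapse all whitespace runs to single spaces."""
    text = NIQQUD.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def scrub_pii(text: str) -> str:
    """Replace emails, phone numbers, and ID numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)   # run before IDs so 9-digit phones are tagged as phones
    return ID_NUMBER.sub("[ID]", text)


def hebrew_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Hebrew letters (alef to tav)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if "\u05D0" <= ch <= "\u05EA") / len(letters)


def passes_quality(text: str) -> bool:
    return len(text) >= MIN_CHARS and hebrew_ratio(text) >= MIN_HEBREW_RATIO


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Exact Jaccard similarity between the sets of k-character shingles."""
    shingles_a = {a[i:i + k] for i in range(max(len(a) - k + 1, 1))}
    shingles_b = {b[i:i + k] for i in range(max(len(b) - k + 1, 1))}
    return len(shingles_a & shingles_b) / len(shingles_a | shingles_b)


def is_near_duplicate(a: str, b: str) -> bool:
    return jaccard(a, b) >= DUP_THRESHOLD


print(scrub_pii(clean("mail  a@b.co.il   id 123456789 tel 03-1234567")))
```

Pairwise Jaccard is quadratic in the number of documents, which is why a corpus of this size would need MinHash LSH or a similar approximate index in practice.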
Training Details
| Parameter | Value |
|---|---|
| GPU | NVIDIA A100-SXM4-40GB |
| Precision | BF16 + 4-bit QLoRA |
| LoRA rank | 64 |
| LoRA target modules | q, k, v, o, gate, up, down projections |
| Trainable parameters | 167M / 7.4B (2.26%) |
| Batch size | 16 (4 x 4 gradient accumulation) |
| Learning rate | 2e-4 (cosine schedule) |
| Epochs | 1 |
| Context length | 2,048 tokens |
| Packing | Enabled |
| Training time | ~7.75 hours |
| Framework | Unsloth + HuggingFace TRL |
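The trainable-parameter count in the table can be cross-checked with a short calculation. The layer dimensions below are the standard Mistral-7B values and are an assumption, since the card only says the base model is Mistral-based. The token estimate takes the final step (8785) from the training loss table and assumes packing fills every 2,048-token sequence, so it is an upper bound.

```python
# Assumed Mistral-7B dimensions (not stated in this card).
hidden = 4096
intermediate = 14336
n_layers = 32
kv_dim = 1024   # 8 key/value heads x 128 head dim (grouped-query attention)
rank = 64       # LoRA rank from the table

# A LoRA adapter on a d_in x d_out linear layer adds rank * (d_in + d_out) parameters.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}
per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
trainable = per_layer * n_layers
print(f"trainable LoRA parameters: {trainable:,}")  # 167,772,160

# Upper bound on tokens seen in one epoch.
sequences_per_step = 4 * 4                     # batch size 16
tokens_per_step = sequences_per_step * 2048    # context length
total_tokens = tokens_per_step * 8785          # final step in the loss table
print(f"tokens per step: {tokens_per_step:,}")  # 32,768
print(f"total tokens:    {total_tokens:,}")     # 287,866,880
```

The result of about 168M adapter parameters agrees with the 167M reported above.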
Training Loss
| Step | Train Loss | Val Loss |
|---|---|---|
| 500 | 0.850 | 0.827 |
| 1000 | 0.781 | 0.816 |
| 2000 | 0.794 | 0.801 |
| 4000 | 0.697 | 0.782 |
| 6000 | 0.636 | 0.770 |
| 8000 | 0.564 | 0.769 |
| 8785 | 0.700 | 0.769 |
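To make the loss values easier to interpret, they can be converted to perplexity. The sketch below assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention for Hugging Face trainers but is not stated in the card.

```python
import math

# (step, train_loss, val_loss) rows copied from the table above.
history = [
    (500, 0.850, 0.827),
    (1000, 0.781, 0.816),
    (2000, 0.794, 0.801),
    (4000, 0.697, 0.782),
    (6000, 0.636, 0.770),
    (8000, 0.564, 0.769),
    (8785, 0.700, 0.769),
]

for step, train_loss, val_loss in history:
    perplexity = math.exp(val_loss)   # perplexity = exp(cross-entropy in nats)
    gap = val_loss - train_loss       # positive gap: validation is harder than training
    print(f"step {step:>5}: val perplexity {perplexity:.2f}, val - train gap {gap:+.3f}")

final_perplexity = math.exp(history[-1][2])
```

Validation loss never increases between logged steps and is flat from step 8000 onward, which corresponds to a final validation perplexity of about 2.16 under the stated assumption.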
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("mufeedh28/dictalm2-israeli-law-merged")
tokenizer = AutoTokenizer.from_pretrained("mufeedh28/dictalm2-israeli-law-merged")
# Hebrew prompt: "The Supreme Court ruled that"
prompt = "בית המשפט העליון פסק כי"
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.15)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
With Unsloth (faster inference)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("mufeedh28/dictalm2-israeli-law-merged")
FastLanguageModel.for_inference(model)
# Hebrew prompt: "Basic Law: Human Dignity and Liberty states that"
inputs = tokenizer("חוק יסוד: כבוד האדם וחירותו קובע כי", return_tensors="pt").to("cuda")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Example Outputs
Prompt: חוק יסוד: כבוד האדם וחירותו קובע כי (Basic Law: Human Dignity and Liberty states that)
ืื ืืื ืืืื ืืืื ื ืขื ืืืื, ืืืคื ืืืืืื. ืืืืืช ืืืื ื ืขืฆืืืช ืืื ืืืง ืืืืืืช ืืืืื. ืื ืกืืืืช ืฉื ืืื ื ืขืฆืืืช ืืืชืจ ืืืื ืืชืงืืฃ ืืช ืื ืฉืืืืื ืขืืื ืื ืชืืงืฃ ืืืชื ืืื ืืืคืกืืง ืืช ืืชืงืืคื ืื ืืื ืืข ืืืชื. ืฉืืืืฉ ืืืื ืืืฆื ืฉื ืืื ื ืขืฆืืืช ืืืืื ืืืืื ืืกืืืจื ืืืจืืฉื ืื ืกืืืืช ืืืงืจื.
Prompt: על פי חוק השכירות והשאילה (According to the Hire and Loan Law)
ืฉืืจ-ืืื ืืื ืกืื ืืกืืื ืฉื ืขืจืืื ืฉืืืชืจ ืืืจืืฉ ืืฉืืืจ ืืืจื ืืืคืงืื ืืืื ืืืฉืืืจ. ืฉืืจ-ืืืื ืฆืจืื ืืขืืื ืืื ืืชื ืืื ืืืืื ืืื ืฉืืืื ืชืงืฃ: ืฉืืืจื ืืืืจื ืืืื ืจืฉืืืื ืื ืืืืืืื, ืขื ืฉืืจ ืืืื ืืืคืืข ืกืืื ืืกืคื, ืืฉืืจ ืืืื ืืจ ืคืจืขืื ืืืืคื ืืืืื ืขื ืืจืืฉื ืฉื ืืขื ืืืืจื.
Intended Use
- Legal text completion and generation
- Hebrew legal NLP research
- Legal document understanding and analysis
- Building legal search and retrieval systems
- Educational tools for Israeli law
Limitations
- This is a text completion model, not a chatbot. It continues text rather than answering questions.
- May generate plausible-sounding but incorrect legal information. Do not use as legal advice.
- Trained primarily on court rulings, so it may be less knowledgeable about specific regulatory domains.
- May occasionally reproduce patterns from training data.
- Hebrew-only training data. Performance in other languages is expected to be no better than that of the base DictaLM 2.0 model.
Citation
@misc{dictalm2-israeli-law,
title={DictaLM 2.0 - Israeli Law},
author={Mufeed Haj},
year={2026},
url={https://huggingface.co/mufeedh28/dictalm2-israeli-law-merged},
note={Fine-tuned from dicta-il/dictalm2.0 on Israeli legal corpus}
}
Acknowledgments
- Dicta - The Israel Center for Text Analysis for the base DictaLM 2.0 model
- Unsloth for efficient fine-tuning
- Israeli Courts, Kol-Zchut, and Hebrew Wikisource for the source data