How to use with SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "yam-peleg/Hebrew-Mixtral-8x22B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yam-peleg/Hebrew-Mixtral-8x22B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
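For callers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the server was started as shown above on localhost:30000 and that the response follows the OpenAI completions shape (a "choices" list whose items carry a "text" field). The helper names build_completion_request and complete are illustrative and are not part of SGLang.

```python
import json
import urllib.request

# Assumed to match the launch command above (port 30000 on this machine).
SERVER_URL = "http://localhost:30000/v1/completions"
MODEL_ID = "yam-peleg/Hebrew-Mixtral-8x22B"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=SERVER_URL):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response, {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With the server running, complete("Once upon a time,") should return the generated continuation as a string.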
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "yam-peleg/Hebrew-Mixtral-8x22B" \
        --host 0.0.0.0 \
        --port 30000

Hebrew-Mixtral-8x22B

Hebrew-Mixtral-8x22B is an open-source Large Language Model (LLM) with 141 billion parameters, pretrained in Hebrew and English and based on Mixtral-8x22B from Mistral.

It was continually pretrained from Mixtral-8x22B on tokens in both English and Hebrew.

The resulting model is a powerful general-purpose language model suitable for a wide range of natural language processing tasks, with a focus on Hebrew language understanding and generation.

Usage

Below are some code snippets to help you get started quickly with running the model.

First make sure to run pip install -U transformers, then copy the snippet from the section that is relevant to your use case.

Running on CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Mixtral-8x22B")
model = AutoModelForCausalLM.from_pretrained("yam-peleg/Hebrew-Mixtral-8x22B")

input_text = "שלום! מה שלומך היום?"  # "Hello! How are you today?"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Running on GPU

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Mixtral-8x22B")
model = AutoModelForCausalLM.from_pretrained("yam-peleg/Hebrew-Mixtral-8x22B", device_map="auto")

input_text = "שלום! מה שלומך היום?"  # "Hello! How are you today?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Running with 4-Bit precision

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Mixtral-8x22B")
model = AutoModelForCausalLM.from_pretrained(
    "yam-peleg/Hebrew-Mixtral-8x22B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

input_text = "שלום! מה שלומך היום?"  # "Hello! How are you today?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
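
To see why the 4-bit option matters, a rough estimate of weight memory helps. The sketch below multiplies the 141 billion parameters stated in the model description by an assumed 2 bytes per parameter for 16-bit weights and 0.5 bytes per parameter for 4-bit weights. It ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
PARAMS = 141e9  # parameter count stated in the model description

# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"float16": 2.0, "4-bit": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
# float16: ~282.0 GB
# 4-bit: ~70.5 GB
```

Under these assumptions, 4-bit loading cuts the weight footprint to roughly a quarter of the 16-bit figure, which is still more than a single typical GPU holds.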

Notice

Hebrew-Mixtral-8x22B is a pretrained base model and therefore does not have any moderation mechanisms.

Model size: 141B params (Safetensors). Tensor type: F16.