
Libraries

How to use zettafleet/z1-1b-hybrid-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zettafleet/z1-1b-hybrid-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zettafleet/z1-1b-hybrid-instruct")
# Z1 is a text-only causal language model, so use the causal LM auto class.
model = AutoModelForCausalLM.from_pretrained("zettafleet/z1-1b-hybrid-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the conversation with the model's chat template and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use zettafleet/z1-1b-hybrid-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zettafleet/z1-1b-hybrid-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zettafleet/z1-1b-hybrid-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
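
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on localhost:8000 and that the response follows the OpenAI-compatible chat completion shape (`choices[0].message.content`). The helper names `build_chat_request` and `ask` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "zettafleet/z1-1b-hybrid-instruct"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumption: the server returns the OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply text. For a server on a different port, pass `base_url`, for example `base_url="http://localhost:30000"`.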

Use Docker

docker model run hf.co/zettafleet/z1-1b-hybrid-instruct

SGLang

How to use zettafleet/z1-1b-hybrid-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zettafleet/z1-1b-hybrid-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zettafleet/z1-1b-hybrid-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zettafleet/z1-1b-hybrid-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zettafleet/z1-1b-hybrid-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use zettafleet/z1-1b-hybrid-instruct with Docker Model Runner:
```
docker model run hf.co/zettafleet/z1-1b-hybrid-instruct
```

Model Card for Z1 1B Hybrid Instruct

We are excited to introduce the Z1 family of models! These models are based on the OLMo 2 1B architecture developed by the Allen Institute for AI. Starting from the OLMo 2 1B pre-training checkpoint, we performed continued pre-training (i.e., midtraining) to produce Z1 1B Hybrid, using the same dataset as OLMo 2 1B (dolmino-mix-1124).

What sets the Z1 models apart is that the continued pre-training was performed via Zettafleet’s AI Training Platform on 8 NVIDIA GPUs in a fully decentralized way, without high-bandwidth, short-range communication links (i.e., NVLink) between the accelerators. See our blog post for further details.

This model, zettafleet/z1-1b-hybrid-instruct, is an instruction-tuned version of zettafleet/z1-1b-hybrid, trained with the same post-training datasets as allenai/OLMo-2-0425-1B-Instruct. For more information about post-training, see the OLMo 2 paper or the Tülu 3 paper. The post-training pipeline (i.e., the training code) was reconstructed following instructions provided by engineers and researchers at the Allen Institute for AI.

We release the following models as part of the Z1 family:

zettafleet/z1-1b-hybrid: A base model where continued pre-training was performed in a fully decentralized way on 8 NVIDIA H100 GPUs.
zettafleet/z1-1b-hybrid-rtx: A base model where continued pre-training was performed in a fully decentralized way on 8 NVIDIA RTX Pro 6000 GPUs.
zettafleet/z1-1b-hybrid-instruct: An instruction-tuned model derived from z1-1b-hybrid, using a reconstructed post-training pipeline and the datasets from OLMo 2 1B Instruct.

The Z1 family of models shares the same architecture:

Model	Layers	Hidden Size	Attention Heads	Context Length
z1-1b-hybrid*	16	2048	16	4096
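
As a small worked example of what these numbers imply, the sketch below derives the per-head dimension from the table. It assumes the usual multi-head attention convention in which the hidden size is split evenly across heads. The `Z1Config` dataclass is a hypothetical container used only for illustration and is not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Z1Config:
    # Values copied from the architecture table above.
    layers: int = 16
    hidden_size: int = 2048
    attention_heads: int = 16
    context_length: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is divided evenly among the attention heads.
        if self.hidden_size % self.attention_heads != 0:
            raise ValueError("hidden size must be divisible by the number of heads")
        return self.hidden_size // self.attention_heads


config = Z1Config()
print(config.head_dim)  # 2048 // 16 = 128
```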

Model description

Developed by: Zettafleet Ltd.
Contact: research@zettafleet.com.
Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
Language(s) (NLP): English.
License: The code and model are released under the Zettafleet Open License, version 1.0 (ZOL-1.0-MIT).

Model Sources

Company page: https://www.zettafleet.com/
Repositories used:
- Post-training code: https://github.com/allenai/open-instruct
- Evaluation code: https://github.com/allenai/olmes
Demo: Zettafleet Launch Event

Using the Model

Loading with Hugging Face

To load the model with Hugging Face Transformers, use the following snippet:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("zettafleet/z1-1b-hybrid-instruct")

Chat Template

We have retained the OLMo 2 chat template, which uses the following formatting:

<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
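
To make the layout concrete, here is a minimal sketch that renders a message list in the format shown above. It reproduces only what the example displays (user and assistant turns, with `<|endoftext|>` closing an assistant turn). The tokenizer's own template may handle details not visible here, such as a leading special token, system turns, or separators between turns, so prefer `tokenizer.apply_chat_template` in real use. The function name `format_chat` is hypothetical.

```python
def format_chat(messages, add_generation_prompt=False):
    """Render messages in the layout shown above (illustrative only)."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            # Assistant turns end with the end-of-text marker, as in the example.
            parts.append(f"<|assistant|>\n{content}<|endoftext|>")
        else:
            raise ValueError(f"role not covered by this sketch: {role}")
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_chat(
    [{"role": "user", "content": "How are you doing?"}],
    add_generation_prompt=True,
)
print(prompt)
```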

Data Processing

All datasets used for training were processed, tokenized, and partitioned using Zettafleet’s Data Platform.

Training Stages of Z1 models

The training stages we carried out are as follows:

Continued pre-training:
- Performed in a decentralized way via Zettafleet’s AI Training Platform.
- Trained on a mix of high-quality web data and academic/Q&A/instruction/mathematical content [dataset].
Post-training (Z1 Hybrid Instruct):
- Performed via Zettafleet’s AI Training Platform on a mix of data for conversational chatbots, preferences, instruction following, and mathematics.
- Performed using a reconstructed version of the training pipeline of OLMo 2 1B Instruct, which consists of the following phases:
  - Supervised Fine-Tuning (SFT) [dataset].
  - Direct Preference Optimization (DPO) [dataset].
  - Reinforcement Learning with Verifiable Rewards (RLVR) [dataset 1] [dataset 2].

Performance

Our hybrid instruction model is competitive with other small models: it averages 40.4 across the six benchmarks, compared with 41.1 for OLMo 2 1B Instruct as reported in the paper and 38.5 for our own reproduction. We report results for OLMo 2 1B Instruct from both the paper and our reproduction, in which we post-trained the OLMo 2 1B base model using the same post-training pipeline and datasets as the paper.

Instruct Model	Average	DROP	GSM8K	IFEval	MATH	MMLU	PopQA
OLMo 2 1B (Paper)	41.1	34.6	68.3	70.1	20.7	40.0	12.9
OLMo 2 1B (Reproduction)	38.5	30.5	62.2	68.4	12.8	44.2	13.0
Z1 1B Hybrid	40.4	31.6	67.0	70.4	19.1	42.6	11.4
Qwen 2.5 1.5B	39.9	13.4	66.2	44.2	40.6	59.7	15.5
LLaMA 3.2 1B	35.6	32.2	45.4	54.0	21.6	46.7	13.8
Gemma 3 1B	34.9	25.1	35.0	60.6	40.3	38.9	9.6
SmolLM2 1.7B	33.1	30.9	45.3	51.6	20.3	34.3	16.4
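
The card does not state how the Average column is computed. The short check below, with all numbers copied from the table above, shows that it is consistent with an unweighted mean of the six benchmark scores in each row, up to rounding to one decimal place. This is an inference from the reported numbers, not a description of the evaluation code.

```python
# Scores copied from the table above: (reported average, six benchmark scores
# in the order DROP, GSM8K, IFEval, MATH, MMLU, PopQA).
RESULTS = {
    "OLMo 2 1B (Paper)": (41.1, [34.6, 68.3, 70.1, 20.7, 40.0, 12.9]),
    "OLMo 2 1B (Reproduction)": (38.5, [30.5, 62.2, 68.4, 12.8, 44.2, 13.0]),
    "Z1 1B Hybrid": (40.4, [31.6, 67.0, 70.4, 19.1, 42.6, 11.4]),
    "Qwen 2.5 1.5B": (39.9, [13.4, 66.2, 44.2, 40.6, 59.7, 15.5]),
    "LLaMA 3.2 1B": (35.6, [32.2, 45.4, 54.0, 21.6, 46.7, 13.8]),
    "Gemma 3 1B": (34.9, [25.1, 35.0, 60.6, 40.3, 38.9, 9.6]),
    "SmolLM2 1.7B": (33.1, [30.9, 45.3, 51.6, 20.3, 34.3, 16.4]),
}


def mean(scores):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


for name, (reported, scores) in RESULTS.items():
    print(f"{name}: reported {reported:.1f}, recomputed {mean(scores):.2f}")
```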

Bias, Risks and Limitations

AI models can be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from Z1, as from any LLM, can be inaccurate, so facts should be verified.