Model Card for Z1 1B Hybrid RTX
We are excited to introduce the Z1 family of models! These models are based on the OLMo 2 1B architecture developed by the Allen Institute for AI. Starting from the OLMo 2 1B pre-training checkpoint, we performed continued pre-training (i.e., midtraining) to produce Z1 1B Hybrid, using the same dataset as OLMo 2 1B (dolmino-mix-1124).
What is unusual about the Z1 models is that the continued pre-training was performed via Zettafleet’s AI Training Platform on 8 NVIDIA GPUs in a fully decentralized way, without high-bandwidth short-range communication links (i.e., NVLink) between the accelerators. See our blog post for further details.
We release the following models as part of the Z1 family:
- zettafleet/z1-1b-hybrid: A base model where continued pre-training was performed in a fully decentralized way on 8 NVIDIA H100 GPUs.
- zettafleet/z1-1b-hybrid-rtx: A base model where continued pre-training was performed in a fully decentralized way on 8 NVIDIA RTX Pro 6000 GPUs.
- zettafleet/z1-1b-hybrid-instruct: An instruction model tuned from z1-1b-hybrid, using a reconstructed post-training pipeline and datasets from OLMo 2 1B Instruct.
The post-training pipeline was reconstructed through instructions provided by engineers and researchers at Allen Institute for AI.
The Z1 family of models shares the same architecture:
| Model | Layers | Hidden Size | Attention Heads | Context Length |
|---|---|---|---|---|
| z1-1b-hybrid* | 16 | 2048 | 16 | 4096 |
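As a small worked example, the per-head dimension follows from the table values. The sketch below assumes the hidden size is split evenly across the attention heads, which the card does not state explicitly; the `Z1Shape` class is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Z1Shape:
    """Architecture values copied from the table above (hypothetical helper)."""

    layers: int = 16
    hidden_size: int = 2048
    attention_heads: int = 16
    context_length: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is divided evenly across the heads.
        if self.hidden_size % self.attention_heads != 0:
            raise ValueError("hidden size must be divisible by the number of heads")
        return self.hidden_size // self.attention_heads


shape = Z1Shape()
print(shape.head_dim)  # 2048 // 16 = 128
```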
Using the Model
Z1 1B Hybrid is supported in transformers v4.48 or higher:
pip install "transformers>=4.48"
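To confirm that an existing environment satisfies the version requirement above, a check along the following lines may help. This is a minimal sketch using only the standard library; `meets_minimum` and `check_installed` are hypothetical helper names, and only the major and minor version numbers are compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum (major, minor) version stated in this model card.
MINIMUM = (4, 48)


def meets_minimum(version_string: str, minimum: tuple = MINIMUM) -> bool:
    """Return True if the leading 'major.minor' of version_string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_installed() -> str:
    """Describe whether the installed transformers package is recent enough."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    status = "ok" if meets_minimum(installed) else "too old"
    return f"transformers {installed}: {status}"


print(check_installed())
```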
You can use Z1 1B Hybrid in your Python code as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model weights and the matching tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("zettafleet/z1-1b-hybrid")
tokenizer = AutoTokenizer.from_pretrained("zettafleet/z1-1b-hybrid")
# Tokenize a batch containing a single prompt.
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# Sample up to 100 new tokens with top-k and nucleus (top-p) sampling.
response = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# Example output (sampling is random, so results vary):
# 'Language modeling is a key component of any text-based application, but its effectiveness...'
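The `top_k=50` and `top_p=0.95` arguments above restrict which tokens can be sampled at each step. The following is a simplified pure-Python illustration of that idea, not the transformers implementation: `top_k_top_p_filter` is a hypothetical function, and the toy probabilities are made up for the example.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep the top_k most likely tokens, then the smallest leading group
    whose renormalized cumulative probability reaches top_p."""
    # Step 1: top-k. Rank tokens by probability and keep the k most likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)

    # Step 2: top-p. Walk down the ranking until the cumulative
    # (renormalized) probability mass reaches the threshold.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        if cumulative >= top_p:
            break

    # Step 3: renormalize the surviving tokens so they sum to 1.
    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}


# Toy next-token distribution (illustrative values only).
toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.8)
print(sorted(filtered))  # ['a', 'b']
```

In this toy case, top-k drops `d`, and top-p then drops `c` because `a` and `b` already cover more than 80% of the remaining mass.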
Model Description
- Developed by: Zettafleet Ltd.
- Contact: research@zettafleet.com.
- Model type: A transformer-style autoregressive language model.
- Language(s) (NLP): English.
- License: The code and model are released under the Zettafleet Open License, version 1.0 (ZOL-1.0-MIT).
Evaluation
Below is an evaluation comparison of the original OLMo 2 1B and the two Z1 base models.
| Base Model | Avg | MMLU | ARC Challenge | HellaSwag | WinoGrande | NaturalQuestions | DROP | AGIEval | GSM8K | MMLU Pro | TriviaQA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OLMo 2 1B | 42.8 | 44.7 | 50.4 | 68.3 | 65.8 | 19.1 | 35.3 | 34.5 | 37.7 | 16.0 | 56.2 |
| Z1 1B Hybrid | 44.6 | 46.2 | 52.9 | 68.7 | 65.4 | 20.8 | 36.5 | 37.2 | 47.6 | 16.3 | 54.4 |
| Z1 1B Hybrid RTX | 44.6 | 46.6 | 53.2 | 69.1 | 65.3 | 19.5 | 37.2 | 36.2 | 48.2 | 16.5 | 54.3 |
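The Avg column can be reproduced from the ten benchmark scores in each row. The sketch below copies the values from the table above, in column order, and assumes Avg is the unweighted mean rounded to one decimal place, which matches the reported figures.

```python
# Benchmark scores copied from the table above, in column order (Avg excluded).
scores = {
    "OLMo 2 1B": [44.7, 50.4, 68.3, 65.8, 19.1, 35.3, 34.5, 37.7, 16.0, 56.2],
    "Z1 1B Hybrid": [46.2, 52.9, 68.7, 65.4, 20.8, 36.5, 37.2, 47.6, 16.3, 54.4],
    "Z1 1B Hybrid RTX": [46.6, 53.2, 69.1, 65.3, 19.5, 37.2, 36.2, 48.2, 16.5, 54.3],
}


def average(values):
    """Unweighted mean, rounded to one decimal place as in the table."""
    return round(sum(values) / len(values), 1)


averages = {name: average(values) for name, values in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")

# GSM8K is the eighth benchmark column (index 7); difference vs. the baseline.
gsm8k_gain = round(scores["Z1 1B Hybrid RTX"][7] - scores["OLMo 2 1B"][7], 1)
print(f"GSM8K gain of Z1 1B Hybrid RTX over OLMo 2 1B: {gsm8k_gain}")
```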
Model Details
Data Processing
All datasets used for training were processed, tokenized and partitioned using Zettafleet’s Data Platform.
Training Stages of Z1 models
The training stages we carried out are as follows:
- Continued pre-training:
  - Performed in a decentralized way via Zettafleet’s AI Training Platform.
  - Trained on a mix of high-quality web data and academic/Q&A/instruction/mathematical content [dataset].
- Post-training (Z1 Hybrid Instruct):
  - Performed via Zettafleet’s AI Training Platform on a mix of data for conversational chatbots, preferences, instruction following and mathematics.
  - Performed using a reconstructed version of the training pipeline of OLMo 2 1B Instruct, which consists of the following phases:
    - Supervised Fine-Tuning (SFT) [dataset].
    - Direct Preference Optimization (DPO) [dataset].
    - Reinforcement Learning with Verifiable Rewards (RLVR) [dataset 1] [dataset 2].
Bias, Risks and Limitations
AI models can be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from Z1 or any LLM are often inaccurate, so facts should be verified.