---
license: apache-2.0
language:
- en
tags:
- from-scratch
- sft
- instruction-tuned
- trc
- tpu
- maxtext
- jax
- grouped-query-attention
- granite
- gguf-compatible
base_model: 0arch-io/kisoku-3b-base
---

# Kisoku 3B SFT

The instruction-tuned version of [Kisoku 3B Base](https://huggingface.co/0arch-io/kisoku-3b-base), trained with supervised fine-tuning (SFT) on Google Cloud TPUs using [MaxText](https://github.com/AI-Hypercomputer/maxtext).

The model was trained **entirely from scratch** (pretraining + SFT) by a solo researcher, with support from [Google's TPU Research Cloud (TRC)](https://sites.research.google/trc/).

## Overview

This model was fine-tuned from the Kisoku 3B base checkpoint using a custom text-only chat template (`### User` / `### Assistant` format), designed to avoid the out-of-vocabulary special-token issues common with Llama-family tokenizers. The model uses the **Granite architecture** (identical to Llama but with runtime logit scaling), which enables GGUF conversion and local deployment via llama.cpp.

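
To make the effect of runtime logit scaling concrete, here is a minimal sketch of how it interacts with sampling temperature. It assumes the scaling divides the final logits by the configured factor (55.43 for this model), which is how the Hugging Face Transformers Granite implementation applies `logits_scaling`. The example logits and the helper names `softmax` and `granite_probs` are hypothetical and are not taken from this model or from llama.cpp.

```python
import math

LOGITS_SCALING = 55.43  # Granite logit scaling factor listed for this model


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def granite_probs(raw_logits, temperature):
    """Divide logits by the scaling factor (assumed behavior), then apply softmax."""
    scaled = [z / LOGITS_SCALING for z in raw_logits]
    return softmax(scaled, temperature)


# Hypothetical pre-scaling logits for three candidate tokens.
raw = [4.0, 2.0, 0.5]

reference = softmax(raw)                      # no scaling, temperature 1.0
flat = granite_probs(raw, temperature=1.0)    # scaling applied, temperature 1.0
restored = granite_probs(raw, temperature=1.0 / LOGITS_SCALING)

print("reference:", [round(p, 4) for p in reference])
print("flat:     ", [round(p, 4) for p in flat])
print("restored: ", [round(p, 4) for p in restored])
```

Under this assumption, sampling at temperature `T` after scaling behaves like sampling the unscaled logits at an effective temperature of `55.43 * T`. At `T = 1.0` the distribution is close to uniform, a temperature of about `1 / 55.43 ≈ 0.018` cancels the scaling, and a lower setting such as 0.01 corresponds to an effective temperature of about 0.55, which gives a sharper distribution.
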
## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | GraniteForCausalLM |
| Parameters | ~3B |
| Layers | 28 |
| Hidden size | 3072 |
| FFN size | 8192 |
| Attention heads | 24 |
| KV heads | 6 (Grouped-Query Attention) |
| Head dim | 128 |
| Vocab size | 128,256 |
| Context length | 4,096 |
| Logit scaling | 55.43 (Granite-specific) |
| Activation | SiLU |

## Training Details

### Pretraining (Base Model)

| Detail | Value |
|--------|-------|
| Framework | MaxText (JAX) on TPU v4-32 |
| Steps | 460,000 |
| Data | DCLM-Baseline 1.0, FineWeb-Edu |

### SFT

| Detail | Value |
|--------|-------|
| Framework | MaxText SFT on TPU |
| Steps | ~2,499 |
| Final loss | ~1.6 |
| Chat template | Custom text-only (`### User` / `### Assistant`) |
| Tokenizer | Custom (at `kisoku-sft-tokenizer/`) |

## Local Deployment (GGUF)

A GGUF quantized version (Q8_0, 3.5 GB) is available for local serving via llama.cpp:

```bash
# Serve with llama-server
llama-server -m kisoku-3b-sft-q8.gguf -c 4096 --port 8900

# Use with any OpenAI-compatible client
curl http://localhost:8900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kisoku", "messages": [{"role": "user", "content": "Hello!"}]}'
```

**Note:** Due to Granite logit scaling (55.4x), use temperature ~0.01 for standard behavior, or use the included proxy script that auto-adjusts temperature and injects `logit_bias` for special tokens.

## Limitations

- Undertrained base model (needs more pretraining tokens for competitive performance)
- English-focused
- No safety alignment (RLHF/DPO) applied
- Granite logit scaling requires temperature adjustment at inference

## Acknowledgments

Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/).

## License

Apache 2.0