---
license: apache-2.0
language:
- en
tags:
- from-scratch
- sft
- instruction-tuned
- trc
- tpu
- maxtext
- jax
- grouped-query-attention
- granite
- gguf-compatible
base_model: 0arch-io/kisoku-3b-base
---

# Kisoku 3B SFT

The instruction-tuned version of [Kisoku 3B Base](https://huggingface.co/0arch-io/kisoku-3b-base), trained with supervised fine-tuning (SFT) on Google Cloud TPUs using [MaxText](https://github.com/AI-Hypercomputer/maxtext).

Trained **entirely from scratch** (pretraining + SFT) by a solo researcher, supported by [Google's TPU Research Cloud (TRC)](https://sites.research.google/trc/).

## Overview

This model was fine-tuned from the Kisoku 3B base checkpoint using a custom text-only chat template (`### User` / `### Assistant` format). The template relies on plain-text markers rather than special tokens, which avoids the out-of-vocabulary special-token issues common with Llama-family tokenizers.

The model uses the **Granite architecture** (identical to Llama except for runtime logit scaling), which allows GGUF conversion and local deployment via llama.cpp.

## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | GraniteForCausalLM |
| Parameters | ~3B |
| Layers | 28 |
| Hidden size | 3072 |
| FFN size | 8192 |
| Attention heads | 24 |
| KV heads | 6 (Grouped-Query Attention) |
| Head dim | 128 |
| Vocab size | 128,256 |
| Context length | 4,096 |
| Logit scaling | 55.43 (Granite-specific) |
| Activation | SiLU |
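
As a rough sanity check on the "~3B" figure, the table values can be plugged into the usual weight count for a Llama-style decoder. This is only a sketch: it assumes bias-free linear layers, a gated (gate/up/down) MLP, and two RMSNorm vectors per layer, none of which the table states, and it leaves open whether the input and output embeddings are tied. The function names are illustrative.

```python
# Values copied from the architecture table above.
VOCAB = 128_256
HIDDEN = 3072
FFN = 8192
LAYERS = 28
HEADS = 24
KV_HEADS = 6
HEAD_DIM = 128
CONTEXT = 4096


def count_parameters(tied_embeddings: bool) -> int:
    """Count weights for a bias-free Llama-style decoder (an assumption)."""
    q_proj = HIDDEN * HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM  # K and V use the smaller GQA width
    o_proj = HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * FFN  # gate, up, and down projections (assumed gated MLP)
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer (assumed)
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    embed = VOCAB * HIDDEN
    lm_head = 0 if tied_embeddings else VOCAB * HIDDEN
    final_norm = HIDDEN
    return LAYERS * per_layer + embed + lm_head + final_norm


def kv_cache_bytes(tokens: int, kv_heads: int, bytes_per_value: int = 2) -> int:
    """Size of the K and V caches for `tokens` positions (2 bytes = fp16/bf16)."""
    return 2 * LAYERS * kv_heads * HEAD_DIM * tokens * bytes_per_value


if __name__ == "__main__":
    print(f"tied embeddings:   {count_parameters(True):,}")
    print(f"untied embeddings: {count_parameters(False):,}")
    gqa = kv_cache_bytes(CONTEXT, KV_HEADS) / 2**20
    mha = kv_cache_bytes(CONTEXT, HEADS) / 2**20
    print(f"KV cache at full context, 6 KV heads:  {gqa:.0f} MiB")
    print(f"KV cache at full context, 24 KV heads: {mha:.0f} MiB")
```

Under these assumptions the count comes to roughly 3.17B weights with tied embeddings and 3.56B without. The second function shows why grouped-query attention matters for local serving: with 6 KV heads instead of 24, the 16-bit KV cache at the full 4,096-token context is a quarter of the size.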

## Training Details

### Pretraining (Base Model)

| Detail | Value |
|--------|-------|
| Framework | MaxText (JAX) on TPU v4-32 |
| Steps | 460,000 |
| Data | DCLM-Baseline 1.0, FineWeb-Edu |

### SFT

| Detail | Value |
|--------|-------|
| Framework | MaxText SFT on TPU |
| Steps | ~2,499 |
| Final loss | ~1.6 |
| Chat template | Custom text-only (`### User` / `### Assistant`) |
| Tokenizer | Custom (at `kisoku-sft-tokenizer/`) |
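
To illustrate what a text-only `### User` / `### Assistant` template might look like when applied to a conversation, here is a minimal sketch. The exact whitespace, turn separators, and any end-of-turn handling are assumptions made for illustration; the template shipped with the tokenizer is the authoritative version, and `format_chat` is a hypothetical helper, not part of this repository.

```python
from typing import Dict, List


def format_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages as plain-text `### User` / `### Assistant` turns.

    The newline layout here is an assumption, not the confirmed template.
    """
    headers = {"user": "### User", "assistant": "### Assistant"}
    parts = []
    for message in messages:
        header = headers[message["role"]]  # KeyError for roles this sketch does not cover
        parts.append(f"{header}\n{message['content'].strip()}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("### Assistant\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(format_chat([{"role": "user", "content": "Hello!"}]))
```

Because the markers are ordinary text, they tokenize with the regular vocabulary and do not require any reserved special-token IDs.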

## Local Deployment (GGUF)

A Q8_0-quantized GGUF build (3.5 GB) is available for local serving via llama.cpp:

```bash
# Serve with llama-server
llama-server -m kisoku-3b-sft-q8.gguf -c 4096 --port 8900

# Use with any OpenAI-compatible client
curl http://localhost:8900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kisoku", "messages": [{"role": "user", "content": "Hello!"}]}'
```

**Note:** Because of Granite logit scaling (a factor of 55.43), set the sampling temperature to ~0.01 for standard behavior, or use the included proxy script, which adjusts the temperature automatically and injects `logit_bias` entries for special tokens.
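
The same request can be sent from Python with an explicit low temperature. The sketch below uses only the standard library and assumes the server was started with the `llama-server` command above (port 8900) and returns the usual OpenAI-style response shape; `build_payload` and `chat` are illustrative helper names.

```python
import json
import urllib.request

# Endpoint and port taken from the llama-server command above.
URL = "http://localhost:8900/v1/chat/completions"


def build_payload(prompt: str, temperature: float = 0.01) -> dict:
    """Build a chat-completions request body with the low temperature from the note."""
    return {
        "model": "kisoku",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(prompt: str, temperature: float = 0.01) -> str:
    """POST the prompt to the local server and return the assistant's reply text."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt, temperature)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI-compatible layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello!"))` prints the model's reply. This sketch does not inject `logit_bias`; that behavior belongs to the proxy script mentioned above.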

## Limitations

- Undertrained base model (needs more pretraining tokens for competitive performance)
- English-focused
- No safety alignment (RLHF/DPO) applied
- Granite logit scaling requires temperature adjustment at inference
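
The interaction between logit scaling and temperature can be seen in a small numerical sketch. It assumes the scaling divides the logits by the factor before sampling (as the Hugging Face Granite implementation does) and that the underlying logits were calibrated without that division; the logit values are made up for illustration.

```python
import math
from typing import List

LOGIT_SCALE = 55.43  # value from the architecture table


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


raw = [4.0, 2.0, 0.5]  # made-up logits, for illustration only
divided = [x / LOGIT_SCALE for x in raw]  # assumed effect of Granite logit scaling

reference = softmax(raw)  # distribution without the scaling
flat = softmax(divided, temperature=1.0)  # nearly uniform after scaling
restored = softmax(divided, temperature=1.0 / LOGIT_SCALE)  # matches reference

if __name__ == "__main__":
    for name, dist in [("reference", reference), ("flat", flat), ("restored", restored)]:
        print(name, [round(p, 3) for p in dist])
```

Under this assumption, sampling at temperature 1.0 sees an almost flat distribution, while a temperature of 1/55.43 (about 0.018) exactly undoes the division. That is the same order of magnitude as the ~0.01 recommended in the deployment note.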

## Acknowledgments

Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/).

## License

Apache 2.0