---
language:
- en
license: apache-2.0
tags:
- causal-lm
- pretraining
- small-language-model
- gqa
- swiglu
- rope
- multiple-choice
- text-ranking
- nlp-research
metrics:
- perplexity
- accuracy
pipeline_tag: text-generation
---

# SLM-10M

A 9.97M-parameter causal language model trained from scratch, targeting the `<10M` tier of the [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard).

## Intended Use

This is a **research model** optimised for NLU benchmarking tasks, not open-ended generation. It is best suited for:

| Task | Examples |
|------|---------|
| **Multiple-choice QA** | ARC, HellaSwag, PIQA, ArithMark — score each candidate and pick the highest |
| **Log-likelihood ranking** | Rank candidate continuations or document relevance by perplexity |
| **SLM research** | Ablations, architecture studies, efficiency benchmarks at the <10M scale |
| **Perplexity evaluation** | Measuring language model fit on held-out text corpora |

It is **not suited** for open-ended text generation, chat, or instruction following: with roughly 10M parameters and an 8,192-token vocabulary, its capacity is too limited for fluent free-form output.

## Model Details

| Property | Value |
|----------|-------|
| Parameters | 9,968,640 (~10M) |
| Architecture | Causal Transformer |
| Vocabulary | 8,192 tokens |
| Context length | 1,024 tokens |
| Training tokens | 25B |
| Precision | bfloat16 |

## Architecture

| Component | Config |
|-----------|--------|
| Hidden size | 256 |
| Layers | 12 |
| Q heads / KV heads | 8 / 2 (GQA) |
| Head dim | 32 |
| FFN intermediate | 640 |
| Positional encoding | RoPE (θ=100k) |
| Normalization | RMSNorm (fp32 upcast) |
| Activation | SwiGLU |
| Attention | GQA + QK-Norm |
| Weight tying | Embed ↔ LM head |
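
As a sanity check, the parameter count in Model Details can be reproduced from the table above. This is a minimal sketch, assuming no bias terms, two RMSNorm weights per layer plus one final norm, and QK-Norm weights of size head-dim shared across heads; those layout details are assumptions, not taken from the model code.

```python
# Architecture values copied from the tables above.
VOCAB = 8192
HIDDEN = 256
LAYERS = 12
Q_HEADS, KV_HEADS = 8, 2
HEAD_DIM = 32
FFN = 640


def count_parameters() -> int:
    """Count weights under the stated assumptions (no biases anywhere)."""
    embed = VOCAB * HIDDEN                       # shared with the LM head (weight tying)
    q_proj = HIDDEN * Q_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM   # GQA: K and V use only 2 heads
    o_proj = Q_HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * FFN                       # SwiGLU: gate, up, and down projections
    norms = 2 * HIDDEN                           # assumed: pre-attention and pre-FFN RMSNorm
    qk_norm = 2 * HEAD_DIM                       # assumed: one weight vector each for Q and K
    per_layer = q_proj + kv_proj + o_proj + mlp + norms + qk_norm
    final_norm = HIDDEN                          # assumed: one RMSNorm before the LM head
    return embed + LAYERS * per_layer + final_norm


print(f"{count_parameters():,}")
# → 9,968,640
```

Under these assumptions the embedding table accounts for about 2.1M of the weights, which is why weight tying matters at this scale.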

The design follows recent SLM recipes (GPT-X2, Qwen3, Gemma2): QK-Norm prevents attention logit explosion, Z-loss stabilises early training and is disabled later in the run, and scaled residual init keeps the residual stream variance bounded.
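
To illustrate what Z-loss does, here is a pure-Python sketch for a single position. Z-loss is commonly formulated as a small coefficient times the squared log-partition function of the output logits; the coefficient `1e-4` and the helper names are assumptions for illustration, not values from this training run.

```python
import math


def logsumexp(logits):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy(logits, target):
    """Standard softmax cross-entropy for one position."""
    return logsumexp(logits) - logits[target]


def z_loss(logits, coeff=1e-4):
    """Auxiliary penalty on log Z; coeff is a hypothetical value."""
    return coeff * logsumexp(logits) ** 2


logits = [2.0, 0.5, -1.0]
shifted = [x + 10.0 for x in logits]  # same softmax, larger logit scale

# Cross-entropy is invariant to a constant shift, so it cannot stop logits drifting.
print(abs(cross_entropy(logits, 0) - cross_entropy(shifted, 0)) < 1e-9)
# The Z-loss term grows with the shift and pulls log Z back towards zero.
print(z_loss(shifted) > z_loss(logits))
```

In a real training loop the same quantity would be computed with tensor ops and added to the language-modelling loss.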

## Training

**Data mix (25B tokens total):**

| Source | Weight |
|--------|--------|
| FineWeb-Edu | 55% |
| Cosmopedia-v2 | 25% |
| FineWeb-HQ | 10% |
| FineMath | 10% |

**Optimizer:** AdamW (fused) — lr=3e-3, min_lr=3e-4, β=(0.9, 0.95), wd=0.1, grad_clip=1.0

**LR schedule:** Warmup (1k steps) → stable → cosine decay tail (last 15% of steps)

**Batch:** 512K tokens/step (micro-batch 32 × grad_accum 16 × seq_len 1024)
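
The schedule and batch settings above can be combined into a small sketch of the learning rate per optimizer step. It assumes linear warmup from near zero and that the total step count is simply the token budget divided by tokens per step; neither detail is stated in this card.

```python
import math

PEAK_LR = 3e-3
MIN_LR = 3e-4
WARMUP_STEPS = 1_000
DECAY_FRACTION = 0.15                      # cosine tail covers the last 15% of steps
TOKENS_PER_STEP = 32 * 16 * 1024           # micro-batch x grad_accum x seq_len = 524,288
TOTAL_TOKENS = 25_000_000_000
TOTAL_STEPS = TOTAL_TOKENS // TOKENS_PER_STEP  # assumed: about 47.7k steps


def lr_at(step: int) -> float:
    """Warmup -> stable -> cosine decay learning rate for a given step."""
    decay_start = int(TOTAL_STEPS * (1 - DECAY_FRACTION))
    if step < WARMUP_STEPS:
        # Assumption: linear warmup reaching PEAK_LR at the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step < decay_start:
        return PEAK_LR
    progress = min(1.0, (step - decay_start) / max(1, TOTAL_STEPS - decay_start))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


for step in (0, 1_000, 20_000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Note that "512K" here means 512 × 1024 = 524,288 tokens per step.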

**Hardware:** NVIDIA GB10, bfloat16, `torch.compile`

## Evaluation

Zero-shot evaluation on the [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard) benchmarks:

| Benchmark | Score |
|-----------|-------|
| HellaSwag (acc_norm) | 26.53% |
| ARC-Easy (acc_norm) | 30.47% |
| ARC-Challenge (acc_norm) | 25.00% |
| PIQA (acc_norm) | 50.92% |
| ArithMark-2.0 | 24.32% |
| **Avg** | **32.38%** |

Avg = (HellaSwag + (ARC-Easy + ARC-Challenge) / 2 + PIQA + ArithMark) / 4
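
The average can be checked directly from the table. The dictionary keys below are illustrative names, not identifiers used by the leaderboard.

```python
# Scores (in percent) copied from the table above.
scores = {
    "hellaswag": 26.53,
    "arc_easy": 30.47,
    "arc_challenge": 25.00,
    "piqa": 50.92,
    "arithmark": 24.32,
}


def leaderboard_average(s: dict) -> float:
    """ARC-Easy and ARC-Challenge are averaged first, then count as one benchmark."""
    arc = (s["arc_easy"] + s["arc_challenge"]) / 2
    return (s["hellaswag"] + arc + s["piqa"] + s["arithmark"]) / 4


print(f"{leaderboard_average(scores):.2f}")
# → 32.38
```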

Evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [ArithMark-2.0](https://huggingface.co/datasets/AxiomicLabs/ArithMark-2.0) custom benchmark script.

## Usage

This model is a **research artifact** for benchmarking, not a chat or generation model. At this scale it is better suited to log-likelihood ranking (multiple-choice benchmarks) than to free-text generation.

### Scoring / ranking (recommended)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F

model = AutoModelForCausalLM.from_pretrained(
    "liodon-ai/slm-10m",
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("liodon-ai/slm-10m", trust_remote_code=True)

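def score(context, completion):
    """Mean per-token log-likelihood of `completion` given `context` (higher is better)."""
    # Tokenise the two parts separately so the context/completion boundary is exact,
    # regardless of any special tokens the tokenizer would otherwise add.
    ctx_ids = tokenizer.encode(context, add_special_tokens=False)
    cmp_ids = tokenizer.encode(completion, add_special_tokens=False)
    full = torch.tensor([ctx_ids + cmp_ids], device=model.device)
    ctx_len = len(ctx_ids)
    with torch.no_grad():
        logits = model(full).logits[0].float()  # upcast bf16 logits for a stable loss
    # Logits at position i predict token i + 1, hence the one-step shift.
    return -F.cross_entropy(logits[ctx_len - 1:-1], full[0, ctx_len:]).item()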

context = "Which is an example of a renewable energy resource? Answer:"
choices = [" biomass", " coal", " gas", " oil"]
scores  = [score(context, c) for c in choices]
best    = choices[scores.index(max(scores))]
print(f"Best answer: {best.strip()}")
# → Best answer: biomass
```
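
The `score` function above averages over completion tokens, so it is a length-normalised score. The choice of normalisation can change which candidate wins, as the following sketch shows. The per-token log-probabilities are hypothetical numbers invented for illustration, and the `char` mode is only a stand-in for the length normalisation behind `acc_norm`, not the harness's exact code.

```python
def rank(choices, token_logprobs, normalise="token"):
    """Return the choice with the highest (optionally normalised) log-likelihood."""
    def key(i):
        total = sum(token_logprobs[i])
        if normalise == "none":    # raw sum, favours short completions
            return total
        if normalise == "token":   # mean per token, as in score() above
            return total / len(token_logprobs[i])
        if normalise == "char":    # per character of the completion string
            return total / len(choices[i])
        raise ValueError(f"unknown normalisation: {normalise}")

    best = max(range(len(choices)), key=key)
    return choices[best]


# Hypothetical per-token log-probs: one token for " coal", two for " solar power".
choices = [" coal", " solar power"]
token_logprobs = [[-4.0], [-2.5, -2.5]]

print(rank(choices, token_logprobs, "none"))   # sums: -4.0 vs -5.0
print(rank(choices, token_logprobs, "token"))  # means: -4.0 vs -2.5
```

When comparing against leaderboard numbers, it is worth matching the normalisation used by the benchmark being reproduced.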


## Citation

```bibtex
@software{liodonai2026slm10m,
  author = {{Liodon AI}},
  title = {SLM-10M},
  year = {2026},
  url = {https://huggingface.co/liodon-ai/slm-10m}
}
```

## License

Apache 2.0