---
language:
- en
license: apache-2.0
tags:
- causal-lm
- pretraining
- small-language-model
- gqa
- swiglu
- rope
- multiple-choice
- text-ranking
- nlp-research
metrics:
- perplexity
- accuracy
pipeline_tag: text-generation
---

# SLM-10M

A 9.97M-parameter causal language model trained from scratch, targeting the [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard) `<10M` tier.

## Intended Use

This is a **research model** optimised for NLU benchmarking tasks, not open-ended generation. It is best suited for:

| Task | Examples |
|------|---------|
| **Multiple-choice QA** | ARC, HellaSwag, PIQA, ArithMark: score each candidate and pick the highest |
| **Log-likelihood ranking** | Rank candidate continuations or document relevance by perplexity |
| **SLM research** | Ablations, architecture studies, efficiency benchmarks at the <10M scale |
| **Perplexity evaluation** | Measuring language model fit on held-out text corpora |

It is **not suited** for open-ended text generation, chat, or instruction following: at 10M parameters the vocabulary (8,192 tokens) and capacity are too limited for fluent free-form output.

## Model Details

| Property | Value |
|----------|-------|
| Parameters | 9,968,640 (~10M) |
| Architecture | Causal Transformer |
| Vocabulary | 8,192 tokens |
| Context length | 1,024 tokens |
| Training tokens | 25B |
| Precision | bfloat16 |

## Architecture

| Component | Config |
|-----------|--------|
| Hidden size | 256 |
| Layers | 12 |
| Q heads / KV heads | 8 / 2 (GQA) |
| Head dim | 32 |
| FFN intermediate | 640 |
| Positional encoding | RoPE (θ=100k) |
| Normalization | RMSNorm (fp32 upcast) |
| Activation | SwiGLU |
| Attention | GQA + QK-Norm |
| Weight tying | Embed ↔ LM head |

Design follows SotA SLM recipes (GPT-X2, Qwen3, Gemma2): QK-Norm prevents attention logit explosion, Z-loss stabilises early training (disabled after 31B tokens), scaled residual init keeps residual stream variance bounded.

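As a sanity check on the Architecture table, the reported parameter count can be reproduced with simple arithmetic. The sketch below is not taken from the model's source code. The helper `count_params` is hypothetical, and it assumes bias-free linear layers, one QK-Norm weight vector of size `head_dim` each for Q and K per layer, two RMSNorms per layer plus a final one, and no learned parameters in RoPE. Under those assumptions the total matches the 9,968,640 figure in Model Details.

```python
# Hypothetical helper: the breakdown is an assumption inferred from the
# Architecture table (no biases, per-layer QK-Norm weights shared across heads).
def count_params(vocab=8192, hidden=256, layers=12, q_heads=8, kv_heads=2,
                 head_dim=32, ffn=640):
    embed = vocab * hidden                      # tied with the LM head, counted once
    q_proj = hidden * q_heads * head_dim        # query projection
    kv_proj = 2 * hidden * kv_heads * head_dim  # GQA: K and V use only 2 heads
    o_proj = q_heads * head_dim * hidden        # attention output projection
    swiglu = 3 * hidden * ffn                   # gate, up and down projections
    norms = 2 * hidden                          # pre-attention and pre-FFN RMSNorm
    qk_norm = 2 * head_dim                      # one norm weight each for Q and K
    per_layer = q_proj + kv_proj + o_proj + swiglu + norms + qk_norm
    return embed + layers * per_layer + hidden  # trailing term: final RMSNorm


total = count_params()
print(f"{total:,}")  # 9,968,640
```

Most of the budget sits in the 12 transformer blocks (about 7.87M); the tied embedding accounts for roughly 2.1M, which is why weight tying matters at this scale.
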
## Training

**Data mix (25B tokens total):**

| Source | Weight |
|--------|--------|
| FineWeb-Edu | 55% |
| Cosmopedia-v2 | 25% |
| FineWeb-HQ | 10% |
| FineMath | 10% |

**Optimizer:** AdamW (fused), lr=3e-3, min_lr=3e-4, β=(0.9, 0.95), wd=0.1, grad_clip=1.0

**LR schedule:** Warmup (1k steps) → stable → cosine decay tail (last 15% of steps)

**Batch:** 512K tokens/step (micro-batch 32 × grad_accum 16 × seq_len 1024)

**Hardware:** NVIDIA GB10, bfloat16, `torch.compile`

## Evaluation

Zero-shot evaluation on the [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard) benchmarks:

| Benchmark | Score |
|-----------|-------|
| HellaSwag (acc_norm) | 26.53% |
| ARC-Easy (acc_norm) | 30.47% |
| ARC-Challenge (acc_norm) | 25.00% |
| PIQA (acc_norm) | 50.92% |
| ArithMark-2.0 | 24.32% |
| **Avg** | **32.38%** |

Avg = (HellaSwag + (ARC-Easy + ARC-Challenge) / 2 + PIQA + ArithMark) / 4

Evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [ArithMark-2.0](https://huggingface.co/datasets/AxiomicLabs/ArithMark-2.0) custom benchmark script.

## Usage

This model is a **research artifact** for benchmarking, not a chat or generation model. At 10M parameters it is better suited to log-likelihood ranking tasks (multiple-choice benchmarks) than to free-text generation.

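When comparing your own runs against the Evaluation table, the average can be recomputed from the per-benchmark scores. The snippet below is a minimal sketch of the Avg formula stated under Evaluation; the function name `leaderboard_avg` is hypothetical and not part of any leaderboard tooling.

```python
def leaderboard_avg(hellaswag, arc_easy, arc_challenge, piqa, arithmark):
    """Average four task groups; the two ARC splits count as a single group."""
    arc = (arc_easy + arc_challenge) / 2
    return (hellaswag + arc + piqa + arithmark) / 4


# Scores in percent, copied from the Evaluation table.
avg = leaderboard_avg(26.53, 30.47, 25.00, 50.92, 24.32)
print(f"{avg:.2f}%")  # 32.38%
```

Averaging the two ARC splits first keeps ARC from carrying twice the weight of the other benchmarks.
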
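### Scoring / ranking (recommended)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    "liodon-ai/slm-10m",
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained("liodon-ai/slm-10m", trust_remote_code=True)


def score(context, completion):
    """Return the mean per-token log-likelihood of `completion` given `context`."""
    full = tokenizer.encode(context + completion, return_tensors="pt").to(device)
    # Number of context tokens. This assumes the tokenizer adds no special
    # tokens by default; if it prepends BOS, ctx_len must be offset by one.
    ctx_len = len(tokenizer.encode(context, add_special_tokens=False))
    with torch.no_grad():
        logits = model(full).logits[0]
    # Logits at position i predict token i + 1, so shift by one to align
    # predictions with the completion tokens. Cross-entropy is the mean
    # negative log-likelihood, so negate it to get a score to maximise.
    return -F.cross_entropy(logits[ctx_len - 1:-1], full[0, ctx_len:]).item()


context = "Which is an example of a renewable energy resource? Answer:"
choices = [" biomass", " coal", " gas", " oil"]

# Score every candidate and pick the one with the highest log-likelihood.
scores = [score(context, c) for c in choices]
best = choices[scores.index(max(scores))]
print(f"Best answer: {best.strip()}")
# → Best answer: biomass
```

## Citation

```bibtex
@software{liodonai2026slm10m,
  author = {{Liodon AI}},
  title  = {SLM-10M},
  year   = {2026},
  url    = {https://huggingface.co/liodon-ai/slm-10m}
}
```

## License

Apache 2.0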