MicroLM-Ettin-SWA-5M

Tiny random-init causal LM for fast training smoke tests and pretraining pipeline dry runs.

This model is intentionally untrained. It is meant to test code paths, dataset packing, loss plumbing, checkpointing, evaluation loops, and distributed training setup without downloading or training a large model.
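Because the model is untrained, it also gives a quick check of the loss plumbing. A random-init LM should predict close to uniformly over the vocabulary, so the first-step cross-entropy is expected to sit near ln(4096), about 8.32 nats. The sketch below assumes the 4096-token vocabulary listed in the specs; the helper names and the tolerance are illustrative and not part of this repository.

```python
import math

VOCAB_SIZE = 4096  # tokenizer size taken from the specs


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over the vocabulary."""
    return math.log(vocab_size)


def looks_untrained(loss: float, vocab_size: int = VOCAB_SIZE, tol: float = 0.5) -> bool:
    """Return True if a measured loss is within `tol` nats of the uniform baseline.

    `tol` is an arbitrary illustrative margin, not a calibrated threshold.
    """
    return abs(loss - uniform_loss(vocab_size)) <= tol


print(f"uniform baseline: {uniform_loss(VOCAB_SIZE):.3f} nats")
```

A first-step loss far from this baseline (for example, close to zero) often points to a bug such as label leakage or mis-shifted targets, although the initialization scale can move the value somewhat.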

Specs

  • Architecture: ModernBertDecoderForCausalLM
  • Parameters: ~5M (5.52M, stored as F32)
  • Layers: 6
  • Hidden size: 256
  • Attention heads: 4
  • MLP intermediate size: 384
  • Context length: 512
  • Attention: sliding-window attention in every layer
  • Tokenizer: 4096-token tokenizer reused from SupraLabs/Supra-Mini-v4-2M
  • Weights: random initialization
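As a rough cross-check of the parameter count, the listed dimensions can be turned into a back-of-envelope estimate. This is only a sketch under assumptions the specs do not confirm: a gated (GLU) MLP, bias-free linear layers, untied input and output embeddings, and one dense layer in the LM head. Norm weights and any output bias are ignored, since they add only a few thousand parameters.

```python
# Back-of-envelope parameter estimate from the listed specs.
# Assumptions (not confirmed by the model card): gated MLP, bias-free linears,
# untied input/output embeddings, one hidden-by-hidden dense layer in the head.

VOCAB, HIDDEN, LAYERS, INTERMEDIATE = 4096, 256, 6, 384


def estimate_params(vocab=VOCAB, hidden=HIDDEN, layers=LAYERS, inter=INTERMEDIATE):
    embed = vocab * hidden                          # token embedding table
    attn = hidden * 3 * hidden + hidden * hidden    # fused QKV + output projection
    mlp = hidden * 2 * inter + inter * hidden       # gated up-projection + down-projection
    head = hidden * hidden + hidden * vocab         # assumed head dense + output projection
    return embed + layers * (attn + mlp) + head


total = estimate_params()
print(f"estimated parameters: {total / 1e6:.2f}M")
```

Under these assumptions the estimate comes out at about 5.5M, in line with the listed size, and the two vocabulary-sized matrices account for close to 40% of it.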

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

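# Hub repository holding the tokenizer and the random-init checkpoint.
repo_id = "sileod/microlm-ettin-swa-5m"

# The weights are untrained, so logits and generations are noise by design.
tok = AutoTokenizer.from_pretrained(repo_id)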
model = AutoModelForCausalLM.from_pretrained(repo_id)
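
For the dataset-packing dry runs mentioned above, tokenized documents are usually concatenated and cut into fixed-length blocks. The following is a minimal sketch under stated assumptions: `pack_blocks` is a hypothetical helper, the separator id `eos_id` is a placeholder that should come from the tokenizer actually in use, and the default block size of 512 mirrors the context length in the specs. The toy input stands in for lists of token ids such as those returned by `tok(text)["input_ids"]`.

```python
from typing import Iterable, List


def pack_blocks(
    docs: Iterable[List[int]], block_size: int = 512, eos_id: int = 0
) -> List[List[int]]:
    """Concatenate token-id lists (separated by eos_id) and split into full blocks.

    Any trailing remainder shorter than block_size is dropped.
    """
    stream: List[int] = []
    for ids in docs:
        stream.extend(ids)
        stream.append(eos_id)  # placeholder separator; use the tokenizer's real EOS id
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Toy ids standing in for tokenizer output; a small block size keeps the demo readable.
blocks = pack_blocks([[5, 6, 7], [8, 9], [10, 11, 12, 13]], block_size=4)
print(blocks)
```

Dropping the remainder keeps every block at exactly the context length, which is convenient for smoke tests; padding the last block instead is an equally valid choice when no tokens should be lost.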