Conv-Routed Induction LM
A small, attention-free, sub-quadratic language model built for the BabyLM 2026 Strict-Small track (a ~10M-word training budget). It is designed to test a specific hypothesis: that a transformer's self-attention can be replaced by a division of labour between two cheaper, complementary primitives (one for local word order, one for exact long-range recall) and still match a same-scale attention baseline on grammar (BLiMP) and perplexity.
⚠️ This card describes the architecture (which is stable). Exact hyperparameters, sizes, and headline metrics are still being iterated and live in the repo's hyperparameters.json / training logs for each revision rather than here.
Architecture
Each layer is three residual sub-blocks; none is redundant:
- Dynamic Conv: local, positional. A gated depthwise dilated convolution whose kernel weights are predicted per position from the token itself (content-adaptive local mixing, ~15-token reach). This is the "what just came before me" channel.
- Induction Mixer: global, content-based, exact. For each token it finds the last M occurrences of the exact same token earlier in the sequence (a non-learned O(T log T) index: sort/scatter, no attention matrix), softly ranks those occurrences by how well their surrounding context matches the present with a small multi-head score, and copies the raw representation of whatever token followed each one. In short: "what came after this token last time?" A learnable sink lets it abstain. Exactness and token identity are load-bearing: fuzzy/hashed matching destroys the effect.
- SwiGLU FFN: per-token computation.
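The lookup half of the Induction Mixer can be pictured with a small pure-Python sketch. It covers only the exact-match index ("last M occurrences, then read what followed each one"); the learned context score and the sink are left out. The function name `induction_targets` and the dictionary-based bookkeeping are illustrative assumptions, not the repository's code.

```python
from collections import defaultdict

def induction_targets(token_ids, m):
    """For each position t, list the positions that FOLLOWED the last `m`
    earlier occurrences of token_ids[t], most recent occurrence first.

    Hypothetical helper for illustration only.
    """
    seen = defaultdict(list)  # token id -> earlier positions where it occurred
    targets = []
    for t, tok in enumerate(token_ids):
        occ = seen[tok]
        prev = occ[max(0, len(occ) - m):]  # last m earlier occurrences (may be empty)
        # The continuation of an occurrence at p is position p + 1.
        # Since p < t, p + 1 <= t, so nothing from the future is read.
        targets.append([p + 1 for p in reversed(prev)])
        occ.append(t)
    return targets

if __name__ == "__main__":
    print(induction_targets([5, 7, 9, 5, 7, 5], m=2))
```

An empty list means the token has not occurred before, which is the case where a model of this kind would have to abstain.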
The design thesis: conv handles local order, induction handles long-range exact recall, the FFN computes. This splits the work that dense attention does into two parts with sharper inductive biases and no quadratic cost.
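As a rough picture of the "conv handles local order" half, here is a minimal pure-Python sketch of a gated, depthwise, dilated causal convolution whose tap weights are predicted from the current token. Everything here is an assumption made for illustration: the function name, the `predict_kernel` and `gate` callables, and the choice to share one set of tap weights across channels. The actual layer may differ in all of these.

```python
def dynamic_causal_conv(x, predict_kernel, gate, dilation=1):
    """x: list of T vectors (each a list of D floats).
    predict_kernel(x_t) -> list of K tap weights for position t (hypothetical).
    gate(x_t)           -> list of D multiplicative gate values (hypothetical).
    """
    out = []
    for t in range(len(x)):
        taps = predict_kernel(x[t])   # kernel predicted from the token itself
        g = gate(x[t])
        y = [0.0] * len(x[t])
        for k, w in enumerate(taps):
            src = t - k * dilation    # tap k looks k * dilation steps back
            if src < 0:
                continue              # implicit zero padding on the left
            for d in range(len(y)):
                y[d] += w * x[src][d] # depthwise: channel d only mixes with channel d
        out.append([g[d] * y[d] for d in range(len(y))])
    return out

if __name__ == "__main__":
    x = [[1.0], [2.0], [3.0]]
    print(dynamic_causal_conv(x, lambda v: [0.5, 0.5], lambda v: [1.0]))
```

The reach of such a layer is (K - 1) × dilation positions into the past, so a short kernel with dilation can cover a window of the size the card describes without any pairwise token comparison.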
Why it is sub-quadratic
There is no T × T attention anywhere. The induction index is built with a sort and a
scatter (O(T log T)), and each token reads only a fixed number (M) of prior continuations.
Memory and compute scale near-linearly in sequence length.
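A sketch of how a sort plus a scatter can produce such an index without comparing every pair of positions. It is pure Python rather than tensor code, and the name `last_occurrence_index` and the `-1` "no match" convention are assumptions for illustration. The idea: a stable sort by token id places equal tokens next to each other in position order, so the previous occurrences of a token are simply its left neighbours in the sorted order.

```python
def last_occurrence_index(token_ids, m):
    """index[t][j] = position of the (j+1)-th most recent earlier occurrence
    of token_ids[t], or -1 if there is none. Hypothetical helper."""
    T = len(token_ids)
    # Stable sort of positions by token id: O(T log T).
    # Equal tokens end up adjacent, still ordered by position.
    order = sorted(range(T), key=lambda t: token_ids[t])
    index = [[-1] * m for _ in range(T)]
    for rank, t in enumerate(order):
        for j in range(m):            # look at most m neighbours to the left: O(T * m) total
            r = rank - (j + 1)
            if r < 0 or token_ids[order[r]] != token_ids[t]:
                break                 # ran out of earlier occurrences of this token
            index[t][j] = order[r]    # scatter the result back to original position t
    return index

if __name__ == "__main__":
    print(last_occurrence_index([5, 7, 9, 5, 7, 5], m=2))
```

The total work is one sort plus at most M reads per token, so with M fixed the cost grows as O(T log T) rather than O(T²).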
Intended use & scope
Research artifact for data-efficient language modelling and architecture studies. It is a small model trained on a developmentally-motivated English corpus; it is not intended for production use, factual question answering, or deployment. Generations are short-range and reflect the small training budget.
How to load
The architecture is custom, so trust_remote_code=True is required (the modeling_induction.py
file ships with every revision):
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "<your-username>/conv-induction-babylm-strict-small"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
The model is causal (next-token; the index only ever references earlier positions) and is
padding-side agnostic: positions are derived from the attention mask and pad positions
are zeroed, so both left- and right-padded batches give identical results for the real
tokens. Learning-curve checkpoints are published on branches named chck_1M, chck_2M, …
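One common way to get padding-side-agnostic positions is to count real tokens along the attention mask. The sketch below assumes that scheme and assigns position 0 to pads; the helper name `positions_from_mask` is hypothetical and the model's own code may compute this differently.

```python
def positions_from_mask(attention_mask):
    """Give each real token its 0-based rank among real tokens; pads get 0.
    Hypothetical helper illustrating mask-derived positions."""
    positions = []
    count = 0
    for keep in attention_mask:
        if keep:
            positions.append(count)
            count += 1
        else:
            positions.append(0)
    return positions

def real_positions(attention_mask):
    """Positions of the non-pad tokens only, in sequence order."""
    pos = positions_from_mask(attention_mask)
    return [p for p, keep in zip(pos, attention_mask) if keep]

if __name__ == "__main__":
    left_padded = [0, 0, 1, 1, 1]
    right_padded = [1, 1, 1, 0, 0]
    # Both paddings give the real tokens the same positions.
    print(real_positions(left_padded), real_positions(right_padded))
```

Because the real tokens receive the same positions under either padding side, any computation that depends only on those positions and on the unpadded content is unaffected by where the pads sit.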
Training data
BabyLM 2026 Strict-Small (~10M words of developmentally-plausible English), tokenised with a byte-level BPE vocabulary trained on the same corpus.
Limitations
- Small capacity and budget: limited world knowledge and short effective context.
- English, child-directed / developmental register; not representative of general web text.
- A research architecture under active iteration: treat any single revision's numbers as provisional.
License
MIT. Code: https://github.com/joshua-taylor/conv-induction-babylm