VUAF-1-Femto
⚠️ Experimental research checkpoint — under active development. This model is a small from-scratch architecture exploration. Outputs may be incoherent, biased, or wrong. Do not use it for anything serious. Treat every release as a snapshot of work in progress.
VUAF (Void Ultimatus Architecture Fusion) is a tiny hybrid language model combining six ideas in a single stack:
- RWKV-7 "Goose" — generalized delta rule with vector-valued data-dependent decay, for fast O(1) recurrent memory
- Mamba-2 SSD — scalar-times-identity state-space duality, for matmul-friendly long-range mixing
- Jamba-style sparse attention — exactly one attention layer per stack, used as "rare strategic attention" for global lookups
- Top-1 sparse Mixture-of-Experts — specialized capacity
- Ouro-style parameter-shared loop — the entire stack is reapplied R times with shared weights, giving latent reasoning depth without more parameters
- PonderNet halt head — the model can learn to exit the loop early on easy tokens
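To make the top-1 routing idea from the list above concrete, here is a minimal pure-Python sketch in which each token is sent to the single expert with the highest router score. The router logits and the `route_top1` helper are invented for illustration; this is not VUAF's implementation, and MoE layers typically also scale the expert output by the gate probability and add a load-balancing term, both omitted here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(router_logits):
    """Return (expert_index, gate_probability) for one token (hypothetical helper)."""
    probs = softmax(router_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Hypothetical router logits for 3 tokens over 4 experts.
token_logits = [
    [0.1, 2.0, -1.0, 0.3],
    [1.5, 0.2, 0.1, 0.0],
    [-0.5, -0.2, 0.0, 3.0],
]
assignments = [route_top1(logits)[0] for logits in token_logits]
print(assignments)  # [1, 0, 3]
```

Only the selected expert runs for each token, which is why a top-1 MoE adds parameters without adding per-token compute in proportion.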
This is VUAF-1-Femto, the smallest member of the family, with ~2.46 M parameters.
Previously released as VUAF-Pico. The name was changed to VUAF-1-Femto once the next size up (VUAF-1-Pico, ~8.72 M) joined the family. Old checkpoints still load: the registry keeps VUAF-Pico as an alias of VUAF-1-Femto.
Open weights, closed training. The trained weights are released here under Apache-2.0. The training code, dataset curation, and full reference implementation are not public. A small inference-only Python package (vuaf-inference) is published on PyPI so anyone can run the weights locally.
Status
- ⚠️ Architecture: implemented and tested, but not extensively tuned
- ⚠️ Training: only small experimental runs so far on a multi-category mix. Do not expect coherent generations beyond toy prompts.
- ⚠️ Tokenizer: SentencePiece BPE, vocab 4096. Trained jointly with the data, so checkpoints are tied to a specific tokenizer.
- ⚠️ No safety filtering, no instruction tuning, no RLHF.
- ⚠️ Inference API on this page is disabled (inference: false). Run locally via vuaf-inference.
Architecture summary
| Family | VUAF |
| Variant | 1-Femto |
| Parameters | ~2.46 M |
| Vocab | 4 096 (SentencePiece BPE) |
| d_model | 128 |
| Layers | 6 |
| Loop steps (R) | 4 (parameter-shared) |
| Token mixers | RWKV-7 × 3, Mamba-2 × 2, Sparse Attn × 1 |
| Channel mixers | SwiGLU × 3, top-1 MoE (4 experts) × 3 |
| Halt head | PonderNet-style, per-token early exit |
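For a sense of where the parameter budget goes, the arithmetic below estimates the size of the token embedding from the vocab and d_model values in the table. It assumes a single vocab × d_model embedding matrix shared with the output head; the card does not say whether the weights are tied, so treat the share as a rough lower bound.

```python
vocab_size = 4096
d_model = 128
total_params = 2.46e6  # approximate total from the table

# Assumption: one embedding matrix of shape (vocab_size, d_model).
embedding_params = vocab_size * d_model
share = embedding_params / total_params

print(embedding_params)  # 524288
print(f"{share:.1%}")    # 21.3%
```

Under that assumption roughly a fifth of the model is the embedding table, which leaves about 1.9 M parameters for the six blocks and the halt head.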
Layer pattern (token mixer / channel mixer):
0 RWKV-7 SwiGLU
1 Mamba-2 MoE
2 RWKV-7 SwiGLU
3 Attn MoE ← single sparse attention layer
4 RWKV-7 SwiGLU
5 Mamba-2 MoE
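The pattern above can be written down as plain data and checked against the mixer counts in the summary table. This is only an illustrative encoding; the names `LAYER_PATTERN` and `LOOP_STEPS` are made up here and are not fields of VUAFConfig.

```python
from collections import Counter

# (token mixer, channel mixer) per layer, copied from the pattern above.
LAYER_PATTERN = [
    ("RWKV-7", "SwiGLU"),
    ("Mamba-2", "MoE"),
    ("RWKV-7", "SwiGLU"),
    ("Attn", "MoE"),
    ("RWKV-7", "SwiGLU"),
    ("Mamba-2", "MoE"),
]
LOOP_STEPS = 4  # R from the summary table

token_mixers = Counter(t for t, _ in LAYER_PATTERN)
channel_mixers = Counter(c for _, c in LAYER_PATTERN)

# Upper bound on block applications per token if no early exit happens.
max_block_applications = len(LAYER_PATTERN) * LOOP_STEPS

print(dict(token_mixers))      # {'RWKV-7': 3, 'Mamba-2': 2, 'Attn': 1}
print(dict(channel_mixers))    # {'SwiGLU': 3, 'MoE': 3}
print(max_block_applications)  # 24
```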
The 6-block stack is reapplied R = 4 times with shared weights (the Ouro loop), trained with PonderNet's entropy-regularized objective so the model can learn how much computation each token deserves.
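As a sketch of what a PonderNet-style halt head computes, the snippet below turns per-step halt probabilities into a distribution over exit steps, following the formulation in the cited PonderNet paper: the probability of halting at step n is λ_n times the probability of not having halted earlier, with any leftover mass assigned to the final step. The λ values are invented, and how VUAF wires this into its loop is not documented here.

```python
def halting_distribution(lambdas):
    """p_n = lambda_n * prod_{j<n}(1 - lambda_j); leftover mass goes to the last step."""
    probs = []
    still_running = 1.0
    for lam in lambdas[:-1]:
        probs.append(lam * still_running)
        still_running *= 1.0 - lam
    probs.append(still_running)  # forced halt at the final loop step
    return probs

# Hypothetical halt probabilities for one token across R = 4 loop steps.
lambdas = [0.5, 0.5, 0.5, 0.5]
p = halting_distribution(lambdas)
expected_steps = sum((n + 1) * p_n for n, p_n in enumerate(p))

print(p)               # [0.5, 0.25, 0.125, 0.125]
print(expected_steps)  # 1.875
```

A token whose halt head outputs large λ early spends few loop passes, while a token with small λ uses close to all four.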
Special tokens
VUAF uses category and role tokens at training and inference time:
<bos> sequence start
<eos> sequence end
<pad>
<unk>
<|user|> chat role
<|assistant|>
<|system|>
<|turn|> boundary between conversation turns
<|doc|> document marker
<|meta|> metadata block open
<|/meta|> metadata block close
<|cat:NAME|> one token per training category
Inputs are conditioned with <bos><|cat:NAME|><|meta|>...<|/meta|>...<eos>.
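The conditioning template can be illustrated with plain string assembly. The helper below is hypothetical and only shows the token order; it uses the <eos> spelling from the token list above, leaves the metadata block empty because its contents are not documented here, and is no substitute for the package's own build_chat_prompt.

```python
def format_conditioned_input(category, meta, body):
    """Assemble <bos><|cat:NAME|><|meta|>META<|/meta|>BODY<eos> as plain text.

    Hypothetical helper for illustration; not part of vuaf-inference.
    """
    return f"<bos><|cat:{category}|><|meta|>{meta}<|/meta|>{body}<eos>"

text = format_conditioned_input("stories", "", "Once upon a time")
print(text)
# <bos><|cat:stories|><|meta|><|/meta|>Once upon a time<eos>
```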
Files in this repo
model.safetensors weights
config.json VUAFConfig
spm.model SentencePiece tokenizer
spm.vocab tokenizer vocabulary listing
categories.json ordered category registry
tokenizer.json tokenizer metadata + special token ids
metadata.json training-run metadata
README.md this file
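Before loading a local copy, it can help to confirm that the download is complete. The sketch below checks a directory against the file list above; `missing_files` is a made-up helper, and the directory name is simply the one used in the download command in the Usage section.

```python
from pathlib import Path

# File names taken from the list above (README.md is not needed to run the model).
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "spm.model",
    "spm.vocab",
    "categories.json",
    "tokenizer.json",
    "metadata.json",
]

def missing_files(checkpoint_dir):
    """Return the expected files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

# Prints an empty list when the checkpoint directory is complete.
print(missing_files("./vuaf-1-femto"))
```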
Usage
This checkpoint cannot be loaded with transformers directly. Install
the small inference-only package and load with one call:
pip install vuaf-inference
huggingface-cli download Sqersters/vuaf-1-femto --local-dir ./vuaf-1-femto
from vuaf_inference import load_pretrained, GenConfig, build_chat_prompt, generate
model, tokenizer = load_pretrained("./vuaf-1-femto")
prompt = build_chat_prompt(
    tokenizer=tokenizer,
    category="stories",  # any registered category
    system=None,
    user_message="Once upon a time",
)
for tok_id, loop_step in generate(
    model, tokenizer, prompt, GenConfig(max_new_tokens=64)
):
    print(tokenizer.decode([tok_id]), end="", flush=True)
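Since generate yields a (tok_id, loop_step) pair per token, the stream can also be collected into a full string together with a histogram of loop steps. The helper below is a sketch that accepts any iterable of such pairs and any decode function, so it runs here on stand-in data without a GPU. It assumes loop_step is the loop pass at which the token was emitted; the card does not define the value or whether it is 0- or 1-based.

```python
from collections import Counter

def collect_generation(pairs, decode):
    """Consume (tok_id, loop_step) pairs; return decoded text and a loop-step histogram."""
    token_ids = []
    step_counts = Counter()
    for tok_id, loop_step in pairs:
        token_ids.append(tok_id)
        step_counts[loop_step] += 1
    return decode(token_ids), step_counts

# Stand-in data so the sketch runs without the real model or tokenizer.
fake_pairs = [(10, 1), (11, 1), (12, 3), (13, 4)]
fake_decode = lambda ids: " ".join(str(i) for i in ids)

text, steps = collect_generation(fake_pairs, fake_decode)
print(text)         # 10 11 12 13
print(dict(steps))  # {1: 2, 3: 1, 4: 1}
```

With the real package, the call would presumably look like collect_generation(generate(model, tokenizer, prompt, GenConfig(max_new_tokens=64)), tokenizer.decode). Decoding the whole id list at once may also give cleaner spacing than decoding token by token.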
You can also load directly from the Hub without downloading first:
model, tokenizer = load_pretrained("Sqersters/vuaf-1-femto")
VUAF is GPU-only: it refuses to run on CPU. Tested on AMD Radeon
RX 7600 XT with torch 2.9.1+rocm7.2.1. Should also work on CUDA.
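Because the model refuses to run on CPU, a quick preflight check can save a failed load. The sketch below uses only standard PyTorch calls; ROCm builds of PyTorch report devices through torch.cuda and set torch.version.hip, so the same check covers both backends. The `gpu_backend` helper is an illustration and not part of vuaf-inference.

```python
def gpu_backend():
    """Return 'rocm', 'cuda', or None if no usable GPU (or no torch install) is found."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"

backend = gpu_backend()
print(backend if backend else "no GPU detected; VUAF will not run on this machine")
```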
Training data
VUAF-1-Femto checkpoints in this repo are trained on a multi-category mix:
- stories/: TinyStories
- wiki/: wikitext-103
- code/: CodeParrot (Python)
- chat/: OpenAssistant Guanaco subset
- qa/: SQuAD reformatted as conversations
Exact subset sizes and step count are recorded in metadata.json for
each release.
Limitations
- Tiny. 2.46 M parameters is below the floor where general-purpose LMs become useful. Expect: short coherent fragments at best, factually wrong content often, no real reasoning.
- Experimental architecture. The hybrid + loop + halt combination is research-stage. Layer ratios, decay parameterizations, and the halt-distribution tuning have not been thoroughly ablated.
- No alignment. No RLHF, DPO, or safety post-training.
- Tokenizer locked to checkpoint. Mix-and-match tokenizers from different checkpoints will not work.
- Generations may be unsafe, biased, or factually incorrect. Do not use VUAF outputs for decisions, advice, or anything user-facing without manual review.
Intended use
- Architecture research and ablation
- Studying loop-based latent reasoning in tiny models
- Reproduction of RWKV-7 / Mamba-2 / Jamba / Ouro patterns at small scale on consumer GPUs
Not intended for production, deployment, or end-user applications.
License
The trained weights in this repository are released under
Apache-2.0. The vuaf-inference package on PyPI is also Apache-2.0.
The training code, data pipeline, and full reference implementation
are not covered by this license and are not publicly available.
Citation
If you build on VUAF, please reference the architectural prior work that VUAF combines:
- Peng et al., RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456)
- Dao & Gu, Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (arXiv:2405.21060)
- Lieber et al., Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
- Zhu et al., Scaling Latent Reasoning via Looped Language Models (arXiv:2510.25741)
- Banino et al., PonderNet: Learning to Ponder (arXiv:2107.05407)