VUAF-1-Femto

⚠️ Experimental research checkpoint — under active development. This model is a small from-scratch architecture exploration. Outputs may be incoherent, biased, or wrong. Do not use it for anything serious. Treat every release as a snapshot of work in progress.

VUAF (Void Ultimatus Architecture Fusion) is a tiny hybrid language model combining six ideas in a single stack:

RWKV-7 "Goose" — generalized delta rule with vector-valued data-dependent decay, for fast O(1) recurrent memory
Mamba-2 SSD — scalar-times-identity state-space duality, for matmul-friendly long-range mixing
Jamba-style sparse attention — exactly one attention layer per stack, used as "rare strategic attention" for global lookups
Top-1 sparse Mixture-of-Experts — specialized capacity
Ouro-style parameter-shared loop — the entire stack is reapplied R times with shared weights, giving latent reasoning depth without more parameters
PonderNet halt head — the model can learn to exit the loop early on easy tokens

This is VUAF-1-Femto, the smallest member of the family. ~2.46 M parameters.

Previously released as VUAF-Pico. The name was changed to VUAF-1-Femto once the next size up (VUAF-1-Pico, ~8.72 M) joined the family. Old checkpoints still load: the registry keeps VUAF-Pico as an alias of VUAF-1-Femto.

Open weights, closed training. The trained weights are released here under Apache-2.0. The training code, dataset curation, and full reference implementation are not public. A small inference-only Python package (vuaf-inference) is published on PyPI so anyone can run the weights locally.

Status

⚠️ Architecture: implemented and tested, but not extensively tuned
⚠️ Training: only small experimental runs so far on a multi-category mix. Do not expect coherent generations beyond toy prompts.
⚠️ Tokenizer: SentencePiece BPE, vocab 4096. Trained jointly with the data, so checkpoints are tied to a specific tokenizer.
⚠️ No safety filtering, no instruction tuning, no RLHF.
⚠️ Inference API on this page is disabled (inference: false). Run locally via vuaf-inference.

Architecture summary


Family	VUAF
Variant	1-Femto
Parameters	~2.46 M
Vocab	4 096 (SentencePiece BPE)
`d_model`	128
Layers	6
Loop steps (R)	4 (parameter-shared)
Token mixers	RWKV-7 × 3, Mamba-2 × 2, Sparse Attn × 1
Channel mixers	SwiGLU × 3, top-1 MoE (4 experts) × 3
Halt head	PonderNet-style, per-token early exit
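
The "top-1 MoE (4 experts)" row can be illustrated with a toy router. This is a minimal sketch, not the VUAF implementation: the function names are hypothetical, the experts are stand-in lambdas rather than feed-forward networks, and scaling the chosen expert's output by its softmax gate probability is an assumption about how the routing weight is applied.

```python
import math

def top1_route(router_logits):
    """Return (expert_index, gate_probability) for one token."""
    m = max(router_logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

def moe_forward(x, router_logits, experts):
    """Run only the selected expert and scale its output by the gate (assumed)."""
    idx, gate = top1_route(router_logits)
    return [gate * v for v in experts[idx](x)]

# Four toy "experts" standing in for small feed-forward networks.
experts = [
    lambda x: [v + 1.0 for v in x],
    lambda x: [v * 2.0 for v in x],
    lambda x: [-v for v in x],
    lambda x: [v * v for v in x],
]

logits = [0.1, 2.0, -1.0, 0.3]                  # hypothetical router output
idx, gate = top1_route(logits)
out = moe_forward([1.0, 2.0], logits, experts)
print("expert:", idx, "gate:", round(gate, 3), "output:", out)
```

Only one expert runs per token, so compute per token stays close to that of a single feed-forward block while total capacity grows with the number of experts.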

Layer pattern (token mixer / channel mixer):

0  RWKV-7    SwiGLU
1  Mamba-2   MoE
2  RWKV-7    SwiGLU
3  Attn      MoE        ← single sparse attention layer
4  RWKV-7    SwiGLU
5  Mamba-2   MoE
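
The layer pattern above can be written down as data and checked against the summary table. This is only an illustrative sketch: the variable names are hypothetical and do not reflect the actual VUAFConfig schema.

```python
from collections import Counter

# (token mixer, channel mixer) per block, copied from the pattern above.
LAYER_PATTERN = [
    ("RWKV-7", "SwiGLU"),
    ("Mamba-2", "MoE"),
    ("RWKV-7", "SwiGLU"),
    ("Attn", "MoE"),       # the single sparse attention layer
    ("RWKV-7", "SwiGLU"),
    ("Mamba-2", "MoE"),
]
LOOP_STEPS = 4             # R, with shared weights

token_mixers = Counter(t for t, _ in LAYER_PATTERN)
channel_mixers = Counter(c for _, c in LAYER_PATTERN)

# With no early exit, each token passes through 6 blocks x 4 loop steps.
effective_depth = len(LAYER_PATTERN) * LOOP_STEPS

print(dict(token_mixers))
print(dict(channel_mixers))
print("block applications per token (no early exit):", effective_depth)
```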

The 6-block stack is reapplied R = 4 times with shared weights (the Ouro loop), trained with PonderNet's entropy-regularized objective so the model can learn how much computation each token deserves.
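
To make the halt head concrete, the sketch below turns per-step halt probabilities into a distribution over exit steps, following the PonderNet formulation p_n = lambda_n * prod_{j<n} (1 - lambda_j). It assumes R = 4 loop steps and that the final step absorbs any leftover probability mass. The function names and the example lambdas are hypothetical, and the regularizer used in training is not shown.

```python
def halting_distribution(lambdas):
    """Map per-step halt probabilities to a distribution over exit steps.

    p_n = lambda_n * prod_{j<n} (1 - lambda_j). The last step takes the
    remaining mass so the result sums to 1 (an assumption for a fixed R).
    """
    probs = []
    still_running = 1.0                 # probability of not having halted yet
    for lam in lambdas[:-1]:
        probs.append(still_running * lam)
        still_running *= 1.0 - lam
    probs.append(still_running)         # forced exit at the final loop step
    return probs

def expected_steps(probs):
    """Expected number of loop steps under the halting distribution."""
    return sum((i + 1) * p for i, p in enumerate(probs))

lambdas = [0.5, 0.5, 0.5, 0.5]          # hypothetical halt-head outputs for one token
probs = halting_distribution(lambdas)
print(probs, expected_steps(probs))
```

A token whose halt probabilities are high early concentrates mass on the first steps, so its expected loop count drops well below R.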

Special tokens

VUAF uses category and role tokens at training and inference time:

<bos>           sequence start
<eos>           sequence end
<pad>
<unk>
<|user|>        chat role
<|assistant|>
<|system|>
<|turn|>        boundary between conversation turns
<|doc|>         document marker
<|meta|>        metadata block open
<|/meta|>       metadata block close
<|cat:NAME|>    one token per training category

Inputs are conditioned as <bos><|cat:NAME|><|meta|>...<|/meta|>...<eos>.
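
The conditioning layout can be sketched as plain string assembly. This is illustrative only: `format_conditioned_input` is a hypothetical helper, not part of vuaf-inference, and the metadata block is treated as an opaque string because its format is not described here. The sequence is closed with `<eos>` as spelled in the special-token list. In practice the package's own prompt builder should be preferred, since special tokens need to map to their reserved ids rather than be tokenized as text.

```python
def format_conditioned_input(category, meta, body):
    """Assemble one conditioned sequence as text (hypothetical helper)."""
    if any(ch in category for ch in "<>|"):
        raise ValueError("category name must not contain '<', '>' or '|'")
    return f"<bos><|cat:{category}|><|meta|>{meta}<|/meta|>{body}<eos>"

# "stories" is one of the category names used in the usage example.
example = format_conditioned_input("stories", "", "Once upon a time")
print(example)
```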

Files in this repo

model.safetensors   weights
config.json         VUAFConfig
spm.model           SentencePiece tokenizer
spm.vocab           tokenizer vocabulary listing
categories.json     ordered category registry
tokenizer.json      tokenizer metadata + special token ids
metadata.json       training-run metadata
README.md           this file
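
Before loading a local copy, it can help to confirm that the download is complete. The sketch below checks a directory against the file list above. The helper name is hypothetical, and whether every file is strictly required by the loader is an assumption.

```python
from pathlib import Path

# File names taken from the listing above (README.md omitted).
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "spm.model",
    "spm.vocab",
    "categories.json",
    "tokenizer.json",
    "metadata.json",
]

def missing_files(checkpoint_dir):
    """Return the expected files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

missing = missing_files("./vuaf-1-femto")
print("missing:", missing if missing else "none")
```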

Usage

This checkpoint cannot be loaded with transformers directly. Install the small inference-only package and load with one call:

pip install vuaf-inference
huggingface-cli download Sqersters/vuaf-1-femto --local-dir ./vuaf-1-femto

from vuaf_inference import load_pretrained, GenConfig, build_chat_prompt, generate

model, tokenizer = load_pretrained("./vuaf-1-femto")

prompt = build_chat_prompt(
    tokenizer=tokenizer,
    category="stories",          # any registered category
    system=None,
    user_message="Once upon a time",
)
for tok_id, loop_step in generate(
    model, tokenizer, prompt, GenConfig(max_new_tokens=64)
):
    print(tokenizer.decode([tok_id]), end="", flush=True)

You can also load directly from the Hub without downloading first:

model, tokenizer = load_pretrained("Sqersters/vuaf-1-femto")

VUAF is GPU-only: it refuses to run on CPU. It was tested on an AMD Radeon RX 7600 XT with torch 2.9.1+rocm7.2.1 and should also work on CUDA.
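
Because the model will not run on CPU, a quick pre-flight check can save a failed load. The sketch below relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` namespace, so the same call covers both backends. The helper names are hypothetical and this is not how vuaf-inference performs its own check.

```python
def gpu_available():
    """True if PyTorch is installed and can see a CUDA or ROCm device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

def describe_backend():
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu-only build"

if not gpu_available():
    print("No usable GPU found:", describe_backend())
else:
    print("GPU backend:", describe_backend())
```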

Training data

VUAF-1-Femto checkpoints in this repo are trained on a multi-category mix:

stories/ — TinyStories
wiki/ — wikitext-103
code/ — CodeParrot (Python)
chat/ — OpenAssistant Guanaco subset
qa/ — SQuAD reformatted as conversations

Exact subset sizes and step count are recorded in metadata.json for each release.

Limitations

Tiny. 2.46 M parameters is below the floor where general-purpose LMs become useful. Expect: short coherent fragments at best, factually wrong content often, no real reasoning.
Experimental architecture. The hybrid + loop + halt combination is research-stage. Layer ratios, decay parameterizations, and the halt-distribution tuning have not been thoroughly ablated.
No alignment. No RLHF, DPO, or safety post-training.
Tokenizer locked to checkpoint. Each checkpoint only works with the tokenizer it was trained with; pairing it with a tokenizer from a different checkpoint will not work.
Generations may be unsafe, biased, or factually incorrect. Do not use VUAF outputs for decisions, advice, or anything user-facing without manual review.

Intended use

Architecture research and ablation
Studying loop-based latent reasoning in tiny models
Reproduction of RWKV-7 / Mamba-2 / Jamba / Ouro patterns at small scale on consumer GPUs

Not intended for production, deployment, or end-user applications.

License

The trained weights in this repository are released under Apache-2.0. The vuaf-inference package on PyPI is also Apache-2.0. The training code, data pipeline, and full reference implementation are not covered by this license and are not publicly available.

Citation

If you build on VUAF, please reference the architectural prior work that VUAF combines:

Peng et al., RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456)
Dao & Gu, Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (arXiv:2405.21060)
Lieber et al., Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
Zhu et al., Scaling Latent Reasoning via Looped Language Models (arXiv:2510.25741)
Banino et al., PonderNet: Learning to Ponder (arXiv:2107.05407)