Laguna XS.2 Dense K=8 Reconstruction 2k

Research checkpoint from the Poolside Laguna XS.2 hackathon.

This model is a dense replacement experiment for poolside/Laguna-XS.2: each routed MoE MLP is replaced by a dense SwiGLU block with routed width 8 * expert_intermediate_size, while the shared expert path is kept. The checkpoint was initialized from cm2435-new/laguna-xs2-dense-k8-copied-shell and trained for 2,000 layer-wise reconstruction steps on nvidia/OpenCodeInstruct with sequence length 2048.
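The replacement described above can be sketched in PyTorch. This is a minimal illustration, not the checkpoint's actual modeling code: the class names, the projection names, the example sizes, and the choice to sum the dense and shared paths are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Standard SwiGLU MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class DenseReplacementMLP(nn.Module):
    """Hypothetical stand-in for one routed MoE MLP layer.

    The routed experts become a single dense SwiGLU of width
    k * expert_intermediate_size; the shared expert path is kept.
    Summing the two paths is an assumption of this sketch.
    """

    def __init__(self, hidden_size: int, expert_intermediate_size: int,
                 shared_intermediate_size: int, k: int = 8):
        super().__init__()
        self.dense = SwiGLU(hidden_size, k * expert_intermediate_size)
        self.shared_expert = SwiGLU(hidden_size, shared_intermediate_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dense(x) + self.shared_expert(x)
```

With illustrative sizes such as hidden_size=64 and expert_intermediate_size=32, the dense path has width 8 * 32 = 256, and the block maps a (batch, seq, hidden) tensor to a tensor of the same shape.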

Important: this is not a usable coding model yet. It reloads, runs a forward pass, and generates, and reconstruction moved its output from random tokens toward language-shaped text, but it still fails every functional test in a tiny coding smoke eval. It is intended as a starting point for token-level SFT / KL distillation.
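For the KL distillation follow-up mentioned above, a token-level loss might look like the sketch below. This is an assumption about one reasonable formulation, not something that was run for this checkpoint; the function name token_kl_loss and the temperature parameter are hypothetical.

```python
import torch
import torch.nn.functional as F


def token_kl_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) averaged over all token positions.

    Both inputs have shape (batch, seq_len, vocab_size).
    """
    vocab_size = student_logits.size(-1)
    # Flatten to (num_tokens, vocab) so "batchmean" averages per token.
    student_logp = F.log_softmax(
        student_logits.reshape(-1, vocab_size) / temperature, dim=-1)
    teacher_logp = F.log_softmax(
        teacher_logits.reshape(-1, vocab_size) / temperature, dim=-1)
    kl = F.kl_div(student_logp, teacher_logp,
                  log_target=True, reduction="batchmean")
    # The usual temperature**2 factor keeps gradient scale comparable.
    return kl * temperature ** 2
```

In practice the teacher logits would come from poolside/Laguna-XS.2 under torch.no_grad(), and padding positions would be masked out before averaging.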

Sanity Metrics

Tiny 5-prompt Python smoke eval on checkpoint-final:

non-empty generations: 5/5
parseable generations: 1/5
tests passed: 0/5
approximate generation speed in the local smoke script: 37.9 tok/s
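The three counts above could be produced by a helper along the following lines. The smoke script itself is not included here, so this is a hypothetical reconstruction: it assumes "parseable" means the generation passes ast.parse and "tests passed" means a test snippet runs without raising after the generated code is executed.

```python
import ast


def summarize_generations(generations, tests):
    """Count non-empty, parseable, and test-passing generations.

    generations: list of generated Python source strings.
    tests: list of test snippets (e.g. assert statements), one per prompt.
    """
    non_empty = parseable = passed = 0
    for code, test in zip(generations, tests):
        if not code.strip():
            continue
        non_empty += 1
        try:
            ast.parse(code)
        except (SyntaxError, ValueError):
            continue
        parseable += 1
        namespace = {}
        try:
            # WARNING: exec runs untrusted model output; sandbox this
            # in any real evaluation.
            exec(code, namespace)
            exec(test, namespace)
        except Exception:
            continue
        passed += 1
    return {"non_empty": non_empty, "parseable": parseable, "passed": passed}
```

Under these definitions, a generation counts toward "tests passed" only if it is also non-empty and parseable, which matches the ordering 5/5, 1/5, 0/5 reported above.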

Training Summary

teacher: poolside/Laguna-XS.2
student init: cm2435-new/laguna-xs2-dense-k8-copied-shell
dataset: nvidia/OpenCodeInstruct
objective: teacher MLP output reconstruction across dense-replaced layers
max steps: 2,000
sequence length: 2048
batch size: 1
learning rate: 2e-4
cosine auxiliary weight: 0.05
final reconstruction loss: ~0.0266
best logged reconstruction loss: ~0.0189 at step 1800
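The objective and the cosine auxiliary weight listed above suggest a per-layer loss of roughly the following shape. The exact formulation used in training is not given here, so this sketch assumes mean-squared error on MLP outputs plus a weighted (1 - cosine similarity) term; the function name and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F


def reconstruction_loss(student_out: torch.Tensor,
                        teacher_out: torch.Tensor,
                        cosine_weight: float = 0.05) -> torch.Tensor:
    """MSE between student and teacher MLP outputs plus a cosine term.

    Both tensors have shape (batch, seq_len, hidden_size). The teacher
    output is treated as a fixed target (detached).
    """
    teacher_out = teacher_out.detach()
    mse = F.mse_loss(student_out, teacher_out)
    # Cosine similarity per token over the hidden dimension.
    cos = F.cosine_similarity(student_out, teacher_out, dim=-1)
    cosine_term = (1.0 - cos).mean()
    return mse + cosine_weight * cosine_term
```

In a layer-wise setup, this would be evaluated for each dense-replaced layer with the teacher's MoE MLP output as the target and then summed or averaged across layers; which of the two was used is not specified above.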

Loading

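from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint ships custom modeling code, so trust_remote_code is required.
# Older transformers releases expect torch_dtype instead of dtype.
model = AutoModelForCausalLM.from_pretrained(
    "cm2435-new/laguna-xs2-dense-k8-recon-2k",
    trust_remote_code=True,
    dtype="bfloat16",
    device_map="auto",
)
# The tokenizer is loaded from the teacher repository.
tokenizer = AutoTokenizer.from_pretrained("poolside/Laguna-XS.2", trust_remote_code=True)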

Model size: 3B params (Safetensors)
Tensor type: BF16
