Laguna XS.2 Dense K=8 Reconstruction 2k
Research checkpoint from the Poolside Laguna XS.2 hackathon.
This model is a dense replacement experiment for poolside/Laguna-XS.2: each routed MoE MLP is replaced by a dense SwiGLU block with routed width 8 * expert_intermediate_size, while the shared expert path is kept. The checkpoint was initialized from cm2435-new/laguna-xs2-dense-k8-copied-shell and trained for 2,000 layer-wise reconstruction steps on nvidia/OpenCodeInstruct with sequence length 2048.
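To make the width rule concrete, here is a minimal sizing sketch. The `hidden_size` and `expert_intermediate_size` values are hypothetical placeholders, not read from the Laguna XS.2 config, and the bias-free three-matrix (gate, up, down) SwiGLU layout is an assumption.

```python
K = 8  # routed width multiplier used by this checkpoint


def dense_routed_width(expert_intermediate_size: int, k: int = K) -> int:
    """Intermediate width of the dense block that replaces the routed experts."""
    return k * expert_intermediate_size


def swiglu_param_count(hidden_size: int, intermediate_size: int) -> int:
    """Parameters in a bias-free SwiGLU MLP: gate, up, and down projections."""
    return 3 * hidden_size * intermediate_size


# Hypothetical sizes, for illustration only (not the real model config).
hidden_size = 2048
expert_intermediate_size = 512

dense_width = dense_routed_width(expert_intermediate_size)
dense_params = swiglu_param_count(hidden_size, dense_width)
print(dense_width, dense_params)
```

The shared expert path is kept as-is in this experiment, so it would be counted separately from the dense block sized here.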
Important: this is not a usable coding model yet. It reloads, forwards, and generates, and reconstruction moved it from random-token output toward language-shaped output, but a tiny coding smoke eval still fails its functional tests. It is intended as a starting point for token-level SFT / KL distillation.
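For readers planning the KL distillation step, the following is a rough sketch of a token-level KL term in plain Python. It is not code from this project: the function names, the `temperature` parameter, and the choice of KL(teacher || student) direction are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position (assumed direction)."""
    t = log_softmax([x / temperature for x in teacher_logits])
    s = log_softmax([x / temperature for x in student_logits])
    return sum(math.exp(tp) * (tp - sp) for tp, sp in zip(t, s))


def sequence_kl(teacher_seq, student_seq):
    """Mean per-token KL over a sequence of per-position logit vectors."""
    per_token = [token_kl(t, s) for t, s in zip(teacher_seq, student_seq)]
    return sum(per_token) / len(per_token)
```

In a real training loop this would be computed on batched tensors over the full vocabulary; the list-based version only shows the arithmetic.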
Sanity Metrics
Tiny 5-prompt Python smoke eval on checkpoint-final:
- non-empty generations: 5/5
- parseable generations: 1/5
- tests passed: 0/5
- approximate generation speed in the local smoke script: 37.9 tok/s
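The smoke script itself is not included here. As a hedged illustration, the "parseable generations" count could be computed as below, assuming "parseable" means the generated text is accepted by Python's `ast.parse`. The helper names are hypothetical.

```python
import ast


def is_parseable(source: str) -> bool:
    """Return True if the text parses as Python source."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def count_parseable(generations):
    """Count how many generations parse as Python."""
    return sum(is_parseable(g) for g in generations)
```

Note that an empty string also parses as a valid (empty) module, so a check like this is best paired with the separate non-empty count.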
Training Summary
- teacher: poolside/Laguna-XS.2
- student init: cm2435-new/laguna-xs2-dense-k8-copied-shell
- dataset: nvidia/OpenCodeInstruct
- objective: teacher MLP output reconstruction across dense-replaced layers
- max steps: 2,000
- sequence length: 2048
- batch size: 1
- learning rate: 2e-4
- cosine auxiliary weight: 0.05
- final reconstruction loss: ~0.0266
- best logged reconstruction loss: ~0.0189 at step 1800
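The exact form of the objective is not spelled out in this card. One plausible reading, sketched below as an assumption, is a mean-squared error between student and teacher MLP outputs plus a cosine-distance term scaled by the 0.05 auxiliary weight. The function names and the `1 - cosine similarity` form are illustrative, not confirmed.

```python
import math

COSINE_WEIGHT = 0.05  # "cosine auxiliary weight" from the training summary


def mse(student, teacher):
    """Mean squared error between two equal-length vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity with a small floor on the denominator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def reconstruction_loss(student, teacher, cosine_weight=COSINE_WEIGHT):
    """Assumed objective: MSE plus weighted cosine distance."""
    cosine_distance = 1.0 - cosine_similarity(student, teacher)
    return mse(student, teacher) + cosine_weight * cosine_distance
```

Under this reading, the loss for one layer would be averaged over token positions, and the per-layer losses combined across all dense-replaced layers.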
Loading
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "cm2435-new/laguna-xs2-dense-k8-recon-2k",
    trust_remote_code=True,
    dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("poolside/Laguna-XS.2", trust_remote_code=True)