Laguna XS.2 Dense K=8 Reconstruction 2k

Research checkpoint from the Poolside Laguna XS.2 hackathon.

This model is a dense replacement experiment for poolside/Laguna-XS.2: each routed MoE MLP is replaced by a dense SwiGLU block with routed width 8 * expert_intermediate_size, while the shared expert path is kept. The checkpoint was initialized from cm2435-new/laguna-xs2-dense-k8-copied-shell and trained for 2,000 layer-wise reconstruction steps on nvidia/OpenCodeInstruct with sequence length 2048.
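
The width rule and the SwiGLU computation described above can be sketched in plain Python. This is a minimal illustration, not the checkpoint's actual modeling code: the helper names (dense_width, swiglu_param_count, swiglu), the bias-free projections, and the example size of 512 are all assumptions made for the sketch.

```python
import math

def silu(v):
    # SiLU (swish) activation: v * sigmoid(v).
    return v / (1.0 + math.exp(-v))

def matvec(W, x):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dense_width(expert_intermediate_size, k=8):
    # Width of the dense replacement block: k * expert_intermediate_size.
    return k * expert_intermediate_size

def swiglu_param_count(hidden_size, width):
    # Gate, up, and down projections, assuming no bias terms.
    return 3 * hidden_size * width

def swiglu(x, W_gate, W_up, W_down):
    # down( silu(gate(x)) * up(x) )
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Hypothetical expert size, chosen only to show the arithmetic.
print(dense_width(512))  # 4096
```

Under these assumptions, the dense block costs three projection matrices of shape hidden_size by width, so its parameter count grows linearly with K.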

Important: this is not a usable coding model yet. It reloads, runs forward passes, and generates, and reconstruction moved its output from random tokens toward language-shaped text, but it still fails every functional test in a tiny coding smoke eval (0/5 passed). It is intended as a starting point for token-level SFT or KL distillation.

Sanity Metrics

Tiny 5-prompt Python smoke eval on checkpoint-final:

  • non-empty generations: 5/5
  • parseable generations: 1/5
  • tests passed: 0/5
  • approximate generation speed in the local smoke script: 37.9 tok/s
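
The three counts above (non-empty, parseable, tests passed) could be computed per generation roughly as follows. This is a hypothetical sketch, not the local smoke script itself: the function name score_generation and the assert-based test format are assumptions, and exec should only be run on untrusted model output inside a sandbox.

```python
import ast

def score_generation(code, test_src):
    """Score one generated Python snippet against assert-style tests."""
    result = {"non_empty": bool(code.strip()), "parseable": False, "passed": False}
    if not result["non_empty"]:
        return result
    try:
        ast.parse(code)  # syntax check only, nothing is executed here
    except (SyntaxError, ValueError):
        return result
    result["parseable"] = True
    namespace = {}
    try:
        exec(code, namespace)      # define the generated functions
        exec(test_src, namespace)  # run the assertions against them
        result["passed"] = True
    except Exception:
        pass  # any runtime error or failed assertion counts as a failure
    return result

def summarize(results):
    # Aggregate per-prompt results into counts for each metric.
    return {key: sum(r[key] for r in results) for key in ("non_empty", "parseable", "passed")}
```

A generation that is non-empty but fails ast.parse never reaches the test stage, which is consistent with a parseable count below the non-empty count.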

Training Summary

  • teacher: poolside/Laguna-XS.2
  • student init: cm2435-new/laguna-xs2-dense-k8-copied-shell
  • dataset: nvidia/OpenCodeInstruct
  • objective: teacher MLP output reconstruction across dense-replaced layers
  • max steps: 2,000
  • sequence length: 2048
  • batch size: 1
  • learning rate: 2e-4
  • cosine auxiliary weight: 0.05
  • final reconstruction loss: ~0.0266
  • best logged reconstruction loss: ~0.0189 at step 1800
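
One plausible reading of the objective and the cosine auxiliary weight is a mean-squared error between student and teacher MLP outputs plus a weighted cosine-distance term. The exact loss used for this checkpoint is not stated here, so the form below (MSE + weight * (1 - cosine similarity)) and the name reconstruction_loss are assumptions, shown for a single output vector in plain Python.

```python
import math

def reconstruction_loss(student, teacher, cosine_weight=0.05, eps=1e-8):
    """Assumed form: MSE + cosine_weight * (1 - cosine similarity)."""
    n = len(student)
    mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / n
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    # eps guards against division by zero for all-zero vectors.
    cosine = dot / max(norm_s * norm_t, eps)
    return mse + cosine_weight * (1.0 - cosine)
```

With this form, the cosine term penalizes direction mismatch independently of scale, while the MSE term dominates when magnitudes differ.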

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True runs the modeling code shipped with the repository.
model = AutoModelForCausalLM.from_pretrained(
    "cm2435-new/laguna-xs2-dense-k8-recon-2k",
    trust_remote_code=True,
    dtype="bfloat16",
    device_map="auto",
)
# The tokenizer is loaded from the teacher repository.
tokenizer = AutoTokenizer.from_pretrained("poolside/Laguna-XS.2", trust_remote_code=True)

Model size: 3B params (Safetensors, BF16)