Pythia-160M Pre-Pretraining: `control_nca` (seed 324)

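This model was trained from scratch (reinitialized Pythia-160M weights) using the ppt pre-pretraining research framework. Training ran in two stages: Stage 1 on shuffled NCA tokens, then Stage 2 on OpenWebText.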

Training Details

Parameter	Value
Base architecture	EleutherAI/pythia-160m (reinitialized)
Regimen	`control_nca`
Seed	324
Stage 1 dataset	Shuffled NCA tokens (unstructured control)
Stage 1 steps	5000
Stage 2 dataset	OpenWebText
Stage 2 steps	10000
Optimizer	AdamW (lr=1e-3, wd=0.0)
Effective batch size	64
Sequence length	2048
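
The step counts, batch size, and sequence length in the table imply a rough token budget per stage. The sketch below works through that arithmetic. It assumes that "effective batch size" counts sequences per optimizer step and that every sequence is filled to the full 2048 tokens; neither is confirmed here, so treat the numbers as upper-bound estimates. The helper name `tokens_per_stage` is hypothetical.

```python
def tokens_per_stage(steps: int, batch_size: int, seq_len: int) -> int:
    """Tokens seen in a stage, assuming one full batch of full-length sequences per step."""
    return steps * batch_size * seq_len


# Values taken from the Training Details table.
BATCH_SIZE = 64
SEQ_LEN = 2048

stage1_tokens = tokens_per_stage(5000, BATCH_SIZE, SEQ_LEN)   # shuffled NCA tokens
stage2_tokens = tokens_per_stage(10000, BATCH_SIZE, SEQ_LEN)  # OpenWebText

print(f"Stage 1: {stage1_tokens:,} tokens")  # Stage 1: 655,360,000 tokens
print(f"Stage 2: {stage2_tokens:,} tokens")  # Stage 2: 1,310,720,000 tokens
print(f"Total:   {stage1_tokens + stage2_tokens:,} tokens")  # Total:   1,966,080,000 tokens
```

Under these assumptions, Stage 1 accounts for one third of the total training tokens.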

Control Design

Stage 1 trains on NCA tokens in shuffled order. The token distribution matches that of the `nca` regimen, but the sequential structure is removed. This makes `control_nca` the unstructured control for `nca`.
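
The idea behind the control can be illustrated with a toy example: permuting a token sequence leaves its token counts unchanged while discarding the order. This is only a sketch. The ppt framework's actual shuffling procedure (for example, whether it shuffles within each sequence or across the whole corpus) is not described here, and the helper `shuffle_tokens` and the toy sequence are hypothetical.

```python
import random
from collections import Counter


def shuffle_tokens(tokens, seed):
    """Return a shuffled copy of `tokens`; the input list is left untouched."""
    rng = random.Random(seed)  # local RNG so the result is reproducible
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


# Toy stand-in for a structured token sequence (a repeating pattern).
structured = [0, 1, 2, 3] * 4
control = shuffle_tokens(structured, seed=324)

# Same tokens with the same frequencies, so the unigram distribution is preserved.
assert Counter(control) == Counter(structured)

print("structured:", structured)
print("control:   ", control)
```

A model trained on `control` sees the same vocabulary statistics as one trained on `structured`, so any downstream difference between the two can be attributed to sequential structure rather than to token frequencies.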

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub repository for this checkpoint
repo_id = "sashaboguraev/pythia-160m-ppt-control_nca-seed324"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
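
Once loaded, the model can be used like any other causal language model in `transformers`. The following is a minimal sketch of greedy text generation, assuming `torch` and `transformers` are installed and the Hub repository is reachable. The helper `generate_sample` and its default arguments are hypothetical and not part of the ppt framework.

```python
REPO_ID = "sashaboguraev/pythia-160m-ppt-control_nca-seed324"


def generate_sample(prompt, max_new_tokens=30):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Imported inside the function so that defining it does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID)
    model.eval()  # inference mode: disables dropout

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so output is deterministic
        pad_token_id=tokenizer.eos_token_id,  # explicit pad id for generate()
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_sample("The weather today is")` downloads the checkpoint on first use and returns the prompt followed by the generated continuation. Because this is a small research model trained for a limited number of steps, the output quality should not be expected to match a fully trained Pythia-160M.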

Citation

If you use this model, please cite the original pre-pretraining papers:

Papadimitriou & Jurafsky (2020) — tilt-transfer
Hahn & Rofin (2024) — pre-pretraining with formal languages (michahu)
Lee et al. (2024) — NCA pre-pretraining (danihyunlee)
