Pythia-160M Pre-Pretraining: control_nca (seed 324)

Trained from scratch using the ppt pre-pretraining research framework.

Training Details

| Parameter | Value |
|---|---|
| Base architecture | EleutherAI/pythia-160m (reinitialized) |
| Regimen | control_nca |
| Seed | 324 |
| Stage 1 dataset | Shuffled NCA tokens (unstructured control) |
| Stage 1 steps | 5000 |
| Stage 2 dataset | OpenWebText |
| Stage 2 steps | 10000 |
| Optimizer | AdamW (lr=1e-3, wd=0.0) |
| Effective batch size | 64 |
| Sequence length | 2048 |
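
The step counts, effective batch size, and sequence length above imply a rough token budget per stage. The sketch below is only an estimate: it assumes the effective batch size counts sequences (not tokens) and that every sequence is filled to the full 2048 tokens, neither of which is stated in this card. The helper name `tokens_seen` is hypothetical.

```python
def tokens_seen(steps, batch_size, seq_len):
    """Upper-bound estimate of tokens processed: steps * sequences per step * tokens per sequence."""
    return steps * batch_size * seq_len

# Values taken from the Training Details table.
BATCH_SIZE = 64    # assumed to be counted in sequences
SEQ_LEN = 2048     # assumed to be fully packed

stage1_tokens = tokens_seen(5000, BATCH_SIZE, SEQ_LEN)
stage2_tokens = tokens_seen(10000, BATCH_SIZE, SEQ_LEN)

print(f"Stage 1: {stage1_tokens:,} tokens")   # Stage 1: 655,360,000 tokens
print(f"Stage 2: {stage2_tokens:,} tokens")   # Stage 2: 1,310,720,000 tokens
print(f"Total:   {stage1_tokens + stage2_tokens:,} tokens")  # Total:   1,966,080,000 tokens
```

Under these assumptions, Stage 1 accounts for one third of the total training tokens.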

Control Design

Stage 1 trains on NCA tokens in shuffled order. Shuffling preserves the token distribution of the nca regimen but removes its sequential structure, so this regimen serves as the unstructured control.
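
The idea behind the control can be illustrated with a minimal sketch. It is not the ppt framework's actual code: the token IDs are made up, the function name `shuffle_tokens` is hypothetical, and whether the real pipeline shuffles within each sequence or across the whole stream is not stated here.

```python
import random
from collections import Counter

def shuffle_tokens(tokens, seed):
    """Return a shuffled copy of `tokens`; the input list is left unchanged."""
    rng = random.Random(seed)   # seeded generator for reproducibility
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical token IDs standing in for a structured NCA token stream.
structured = [0, 1, 2, 3] * 4
control = shuffle_tokens(structured, seed=324)

# Shuffling permutes positions only, so the unigram counts are identical.
same_distribution = Counter(structured) == Counter(control)
print(same_distribution)  # True
```

Because only the order changes, any difference between this model and the nca model after Stage 2 can be attributed to sequential structure rather than to token frequencies.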

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint and its tokenizer from the Hugging Face Hub
repo_id = "sashaboguraev/pythia-160m-ppt-control_nca-seed324"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Citation

If you use this model, please cite the original pre-pretraining papers:

  • Papadimitriou & Jurafsky (2020): tilt-transfer
  • Hahn & Rofin (2024): pre-pretraining with formal languages (michahu)
  • Lee et al. (2024): NCA pre-pretraining (danihyunlee)