---
language: en
license: mit
tags:
- pre-pretraining
- language-model
- pythia-160m
- control_nca
- openwebtext
base_model: EleutherAI/pythia-160m
---

# Pythia-160M Pre-Pretraining: `control_nca` (seed 324)

Trained from scratch using the [ppt](https://github.com/sashaboguraev/ppt) pre-pretraining research framework.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base architecture | EleutherAI/pythia-160m (reinitialized) |
| Regimen | `control_nca` |
| Seed | 324 |
| Stage 1 dataset | Shuffled NCA tokens (unstructured control) |
| Stage 1 steps | 5000 |
| Stage 2 dataset | OpenWebText |
| Stage 2 steps | 10000 |
| Optimizer | AdamW (lr = 1e-3, weight decay = 0.0) |
| Effective batch size | 64 |
| Sequence length | 2048 tokens |

## Control Design

Stage 1 trains on NCA tokens in *shuffled order*. The token distribution is the same as in the `nca` regimen, but the sequential structure is removed, which makes this the unstructured control.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model weights and tokenizer from the Hugging Face Hub
repo_id = "sashaboguraev/pythia-160m-ppt-control_nca-seed324"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```

## Citation

If you use this model, please cite the original pre-pretraining papers:

- Papadimitriou & Jurafsky (2020): tilt-transfer
- Hahn & Rofin (2024): pre-pretraining with formal languages (michahu)
- Lee et al. (2024): NCA pre-pretraining (danihyunlee)
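
## Shuffled-Control Sketch

The Stage 1 control keeps the `nca` token distribution while removing sequential structure. The Python below is a minimal sketch of one way to build such data under stated assumptions; it is not the ppt framework's implementation. It assumes tokens are shuffled independently within each sequence (shuffling across the whole corpus would also preserve the overall token distribution). The function names `shuffle_control` and `unigram_counts`, the toy sequences, and the use of seed 324 for the shuffle are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence


def shuffle_control(sequences: Sequence[Sequence[int]], seed: int = 324) -> List[List[int]]:
    """Hypothetical helper: return each token sequence with its positions randomly permuted.

    Shuffling within a sequence keeps that sequence's token multiset (and hence the
    corpus-level token distribution) unchanged, but destroys the order of the tokens.
    """
    rng = random.Random(seed)  # dedicated RNG so the permutation is reproducible
    shuffled = []
    for seq in sequences:
        tokens = list(seq)  # copy so the input is left untouched
        rng.shuffle(tokens)  # in-place random permutation
        shuffled.append(tokens)
    return shuffled


def unigram_counts(sequences: Sequence[Sequence[int]]) -> Counter:
    """Hypothetical helper: count how often each token id occurs across all sequences."""
    return Counter(tok for seq in sequences for tok in seq)


# Toy stand-in for NCA token sequences (made-up data, not real NCA output)
nca_like = [[1, 2, 3, 1, 2, 3, 1, 2, 3], [4, 4, 5, 5, 4, 4, 5, 5]]
control = shuffle_control(nca_like)

# Same token distribution, different (randomized) order
assert unigram_counts(control) == unigram_counts(nca_like)
print(control)
```

Shuffling per sequence also preserves each sequence's length, so the control batches contain the same number of tokens as the structured ones; only the positional information a model could learn from is removed.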