Pythia-160M Pre-Pretraining: control_nca (seed 324)
Trained from scratch using the ppt pre-pretraining research framework.
Training Details
| Parameter | Value |
|---|---|
| Base architecture | EleutherAI/pythia-160m (reinitialized) |
| Regimen | control_nca |
| Seed | 324 |
| Stage 1 dataset | Shuffled NCA tokens (unstructured control) |
| Stage 1 steps | 5000 |
| Stage 2 dataset | OpenWebText |
| Stage 2 steps | 10000 |
| Optimizer | AdamW (lr=1e-3, wd=0.0) |
| Effective batch size | 64 |
| Sequence length | 2048 |
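The step counts, batch size, and sequence length in the table imply a token budget for each stage. A minimal sketch of that arithmetic follows, assuming every sequence in a batch is filled to the full 2048 tokens (the card does not say whether sequences are packed or padded); the variable names are illustrative and not taken from the ppt framework.

```python
# Values copied from the Training Details table above.
EFFECTIVE_BATCH_SIZE = 64
SEQUENCE_LENGTH = 2048
STAGE_STEPS = {"stage1": 5000, "stage2": 10000}

# Assumption: each sequence is filled to the full sequence length,
# so one optimizer step consumes batch_size * sequence_length tokens.
tokens_per_step = EFFECTIVE_BATCH_SIZE * SEQUENCE_LENGTH

tokens_per_stage = {
    name: steps * tokens_per_step for name, steps in STAGE_STEPS.items()
}

for name, total in tokens_per_stage.items():
    print(f"{name}: {total:,} tokens")
print(f"total: {sum(tokens_per_stage.values()):,} tokens")
```

Under that assumption, Stage 1 sees roughly 0.66B tokens and Stage 2 roughly 1.31B tokens.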
Control Design
Stage 1 uses NCA tokens in shuffled order. The token distribution is the same as in the nca regimen, but the sequential structure is removed. This regimen serves as the unstructured control.
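The idea behind the control can be sketched as follows: permuting a token sequence keeps its token counts unchanged while destroying the order. This is an illustration only, not the framework's actual code. The helper `shuffle_tokens` is hypothetical, and the card does not state whether shuffling is done within each sequence or across the whole corpus; the sketch shuffles within a single sequence.

```python
import random
from collections import Counter


def shuffle_tokens(token_ids, seed):
    """Return a shuffled copy of token_ids (hypothetical helper).

    The input list is left unmodified. A seeded generator makes the
    permutation reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(token_ids)
    rng.shuffle(shuffled)
    return shuffled


# Toy stand-in for a structured token sequence (not real NCA data).
structured = [0, 1, 2, 3] * 4

# The seed value is reused from this card purely for illustration.
control = shuffle_tokens(structured, seed=324)

# Same token counts as the structured sequence, in a permuted order.
assert Counter(control) == Counter(structured)
assert len(control) == len(structured)
```

Because the counts match, any difference between models trained on the two regimens can be attributed to sequential structure rather than to token frequencies.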
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("sashaboguraev/pythia-160m-ppt-control_nca-seed324")
tokenizer = AutoTokenizer.from_pretrained("sashaboguraev/pythia-160m-ppt-control_nca-seed324")
Citation
If you use this model, please cite the original pre-pretraining papers:
- Papadimitriou & Jurafsky (2020): tilt-transfer
- Hahn & Rofin (2024): pre-pretraining with formal languages (michahu)
- Lee et al. (2024): NCA pre-pretraining (danihyunlee)