---
language: en
license: mit
tags:
  - pre-pretraining
  - language-model
  - pythia-160m
  - control_nca
  - openwebtext
base_model: EleutherAI/pythia-160m
---

# Pythia-160M Pre-Pretraining: `control_nca` (seed 324)

This model was trained from scratch with the [ppt](https://github.com/sashaboguraev/ppt)
pre-pretraining research framework.

## Training Details

| Parameter | Value |
|-----------|-------|
| Base architecture | EleutherAI/pythia-160m (reinitialized) |
| Regimen | `control_nca` |
| Seed | 324 |
| Stage 1 dataset | Shuffled NCA tokens (unstructured control) |
| Stage 1 steps | 5000 |
| Stage 2 dataset | OpenWebText |
| Stage 2 steps | 10000 |
| Optimizer | AdamW (lr=1e-3, wd=0.0) |
| Effective batch size | 64 |
| Sequence length | 2048 |
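
As a rough sanity check on the training budget, the values in the table can be multiplied out into token counts. This is only a sketch: it assumes the effective batch size counts sequences and that every sequence is filled to the full 2048 tokens, neither of which this card confirms. The variable names are illustrative and do not come from the `ppt` framework.

```python
# Values copied from the Training Details table.
SEQ_LEN = 2048
EFFECTIVE_BATCH = 64      # assumed to be sequences per optimizer step
STAGE1_STEPS = 5_000
STAGE2_STEPS = 10_000

# Assumption: every sequence is fully packed, so each step sees
# EFFECTIVE_BATCH * SEQ_LEN tokens.
tokens_per_step = EFFECTIVE_BATCH * SEQ_LEN
stage1_tokens = STAGE1_STEPS * tokens_per_step
stage2_tokens = STAGE2_STEPS * tokens_per_step

print(f"tokens per step: {tokens_per_step:,}")   # 131,072
print(f"stage 1 tokens:  {stage1_tokens:,}")     # 655,360,000
print(f"stage 2 tokens:  {stage2_tokens:,}")     # 1,310,720,000
```

Under these assumptions, Stage 1 accounts for one third of the roughly 1.97B tokens seen over both stages.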

## Control Design

Stage 1 trains on NCA tokens in *shuffled order*. The shuffled data has the same token distribution as the `nca` regimen but no sequential structure, which makes this regimen the unstructured control.
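
The idea behind the control can be illustrated with a minimal sketch. The helper `shuffle_tokens` and the toy token ids below are hypothetical and are not taken from the `ppt` codebase. The card also does not say whether shuffling happens within each sequence or across the whole dataset, so the sketch simply shuffles one flat list.

```python
import random
from collections import Counter


def shuffle_tokens(token_ids, seed):
    """Return a shuffled copy of token_ids (hypothetical helper)."""
    rng = random.Random(seed)      # seeded so the control is reproducible
    shuffled = list(token_ids)     # copy, leave the input untouched
    rng.shuffle(shuffled)
    return shuffled


# Toy stand-in for a structured token stream with a repeating pattern.
structured = [5, 6, 7] * 4
control = shuffle_tokens(structured, seed=324)

# Shuffling permutes positions only, so the token counts (and therefore
# the unigram distribution) are identical to the structured stream.
assert Counter(control) == Counter(structured)
assert len(control) == len(structured)
```

Because only the order changes, any difference in downstream performance between `nca` and `control_nca` can be attributed to sequential structure rather than to token frequencies.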

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository for this checkpoint.
repo_id = "sashaboguraev/pythia-160m-ppt-control_nca-seed324"

model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```

## Citation

If you use this model, please cite the original pre-pretraining papers:

- Papadimitriou & Jurafsky (2020) — tilt-transfer
- Hahn & Rofin (2024) — pre-pretraining with formal languages (michahu)
- Lee et al. (2024) — NCA pre-pretraining (danihyunlee)