Pythia-160M Pre-Pretraining: `control_music_steps100` (seed 1024)

This model uses the Pythia-160M architecture with reinitialized weights and was trained from scratch using the ppt pre-pretraining research framework.

Training Details

| Parameter | Value |
| --- | --- |
| Base architecture | EleutherAI/pythia-160m (reinitialized) |
| Regimen | `control_music_steps100` |
| Seed | 1024 |
| Stage 1 dataset | Shuffled MIDI tokens (unstructured control) |
| Stage 1 steps | 100 |
| Stage 2 dataset | OpenWebText |
| Stage 2 steps | 10000 |
| Optimizer | AdamW (lr=1e-3, wd=0.0) |
| Effective batch size | 64 |
| Sequence length | 2048 |
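
The batch size, sequence length, and step counts above imply a rough token budget for each stage. The sketch below works it out under two assumptions that the card does not confirm: the effective batch size is counted in sequences, and every sequence is packed to the full 2048 tokens. The function name `tokens_seen` is illustrative and not part of the ppt framework.

```python
def tokens_seen(steps: int, batch_size: int = 64, seq_len: int = 2048) -> int:
    """Upper bound on tokens processed, assuming fully packed sequences."""
    return steps * batch_size * seq_len


stage1_tokens = tokens_seen(100)      # Stage 1: shuffled MIDI tokens
stage2_tokens = tokens_seen(10_000)   # Stage 2: OpenWebText

print(f"Tokens per step: {tokens_seen(1):,}")   # 131,072
print(f"Stage 1 tokens:  {stage1_tokens:,}")    # 13,107,200
print(f"Stage 2 tokens:  {stage2_tokens:,}")    # 1,310,720,000

# Fraction of the total token budget spent on the Stage 1 control data.
share = stage1_tokens / (stage1_tokens + stage2_tokens)
print(f"Stage 1 share:   {share:.2%}")
```

Under these assumptions, Stage 1 accounts for roughly 1% of all tokens the model sees.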

Control Design

Stage 1 trains on MIDI tokens whose order has been shuffled. The data has the same token distribution as the music data but no sequential structure, which makes this regimen the unstructured control.
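
To illustrate what such a control preserves and what it destroys, here is a minimal sketch. It is not the ppt framework's code: the card does not say whether tokens are shuffled within each sequence or across the whole corpus, so the sketch assumes a shuffle within one sequence, and the token IDs and the helper `shuffle_tokens` are hypothetical.

```python
import random
from collections import Counter


def shuffle_tokens(tokens: list[int], seed: int = 0) -> list[int]:
    """Return a shuffled copy of `tokens`; the token counts are unchanged."""
    rng = random.Random(seed)   # local RNG so the global random state is untouched
    shuffled = list(tokens)     # copy, so the input sequence is not modified
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical token IDs standing in for one tokenized MIDI sequence.
music = [60, 62, 64, 60, 62, 64, 67, 65, 64, 62, 60]
control = shuffle_tokens(music, seed=1024)

# The unigram distribution is identical; only the ordering differs.
print(Counter(control) == Counter(music))
```

Because a shuffle is a permutation, the token counts always match, so any difference between a model pre-pretrained on the music data and one pre-pretrained on this control can be attributed to sequential structure rather than to token frequencies.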

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this checkpoint
repo_id = "sashaboguraev/pythia-160m-ppt-control_music_steps100-seed1024"

model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Citation

If you use this model, please cite the original pre-pretraining papers:

Papadimitriou & Jurafsky (2020) — tilt-transfer
Hahn & Rofin (2024) — pre-pretraining with formal languages (michahu)
Lee et al. (2024) — NCA pre-pretraining (danihyunlee)
