Pythia-160M Pre-Pretraining: control_music_steps100 (seed 1024)

This model was trained from scratch with the ppt pre-pretraining research framework.

Training Details

Parameter               Value
Base architecture       EleutherAI/pythia-160m (reinitialized)
Regimen                 control_music_steps100
Seed                    1024
Stage 1 dataset         Shuffled MIDI tokens (unstructured control)
Stage 1 steps           100
Stage 2 dataset         OpenWebText
Stage 2 steps           10000
Optimizer               AdamW (lr=1e-3, wd=0.0)
Effective batch size    64
Sequence length         2048
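
As a rough guide to scale, the batch size, sequence length, and step counts listed above imply a token budget per stage. The sketch below works through that arithmetic. It assumes every sequence is packed to the full 2048 tokens, which this card does not state, and the dictionary keys and function names are illustrative rather than taken from the ppt framework.

```python
# Token budget implied by the training details.
# Assumption: every sequence is packed to the full sequence length.
EFFECTIVE_BATCH_SIZE = 64
SEQUENCE_LENGTH = 2048
STAGE_STEPS = {"stage1_shuffled_midi": 100, "stage2_openwebtext": 10_000}


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer step."""
    return batch_size * seq_len


def token_budget(steps: dict, batch_size: int, seq_len: int) -> dict:
    """Map each stage name to the total tokens it consumes."""
    per_step = tokens_per_step(batch_size, seq_len)
    return {name: n_steps * per_step for name, n_steps in steps.items()}


budget = token_budget(STAGE_STEPS, EFFECTIVE_BATCH_SIZE, SEQUENCE_LENGTH)
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")
```

Under that assumption, Stage 1 consumes about 13.1M tokens and Stage 2 about 1.31B tokens, so the control stage is roughly 1% of the Stage 2 budget.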

Control Design

Stage 1 trains on MIDI tokens whose order has been shuffled. The shuffled data keeps the same token distribution as the music data but has no sequential structure, which makes this regimen the unstructured control.
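
The idea behind the control can be illustrated with a toy example: shuffling a token sequence leaves its token counts unchanged while destroying the order. This is only a sketch. The token ids are made up, the helper name `shuffle_tokens` is hypothetical, and the card does not say at what granularity the real data was shuffled (within a sequence or across the corpus), so the code should not be read as the framework's actual preprocessing.

```python
import random
from collections import Counter


def shuffle_tokens(tokens, seed):
    """Return a shuffled copy of `tokens`; the input list is left untouched."""
    rng = random.Random(seed)  # local RNG so the shuffle is reproducible
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical toy "music" tokens with an obvious repeating pattern.
structured = [60, 62, 64, 65, 67, 65, 64, 62] * 4
# The seed value is illustrative only.
control = shuffle_tokens(structured, seed=1024)

# Token counts are identical, so the unigram distribution is preserved.
same_distribution = Counter(structured) == Counter(control)
print("same token distribution:", same_distribution)
print("structured:", structured[:8])
print("control:   ", control[:8])
```

Comparing a model pre-pretrained on such shuffled data with one pre-pretrained on the ordered data separates the effect of sequential structure from the effect of token statistics alone.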

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub repository for this checkpoint
repo_id = "sashaboguraev/pythia-160m-ppt-control_music_steps100-seed1024"

model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Citation

If you use this model, please cite the original pre-pretraining papers:

  • Papadimitriou & Jurafsky (2020): tilt-transfer
  • Hahn & Rofin (2024): pre-pretraining with formal languages (michahu)
  • Lee et al. (2024): NCA pre-pretraining (danihyunlee)