nanoGPT - TinyStories
A ~30M-parameter GPT trained from scratch on the TinyStories dataset.
- Tokenizer: GPT-2 BPE (tiktoken)
- Architecture: 6 layers, 6 heads, 384 embedding dim, context 256
- Best val loss: 1.7052
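As a rough sanity check on the ~30M figure, the parameter count can be estimated from the architecture listed above. This is a minimal sketch, not code from this repository: the helper `gpt_param_count` is hypothetical, and it assumes a GPT-2-style layout (4x MLP expansion, biases enabled, learned position embeddings, output head tied to the token embedding) and an unpadded GPT-2 vocabulary of 50257 tokens. The actual checkpoint may differ on any of these points.

```python
def gpt_param_count(n_layer, n_embd, block_size, vocab_size, bias=True):
    """Count parameters of a GPT-2-style model with tied input/output embeddings.

    Assumed layout (not confirmed by this README): 4x MLP expansion,
    learned position embeddings, LayerNorm with optional bias.
    """
    b = 1 if bias else 0
    tok_emb = vocab_size * n_embd      # token embedding, shared with the output head
    pos_emb = block_size * n_embd      # learned position embedding
    layer_norm = n_embd + b * n_embd   # scale plus optional bias
    # Attention: fused QKV projection followed by the output projection.
    attn = (n_embd * 3 * n_embd + b * 3 * n_embd) + (n_embd * n_embd + b * n_embd)
    # MLP: expand to 4 * n_embd, then project back to n_embd.
    mlp = (n_embd * 4 * n_embd + b * 4 * n_embd) + (4 * n_embd * n_embd + b * n_embd)
    block = 2 * layer_norm + attn + mlp
    # The trailing layer_norm is the final norm applied after the last block.
    return tok_emb + pos_emb + n_layer * block + layer_norm


# Values from the architecture list; vocab_size=50257 is an assumption.
total = gpt_param_count(n_layer=6, n_embd=384, block_size=256, vocab_size=50257)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions, roughly two thirds of the parameters sit in the token embedding table, which is typical for a small model paired with a large vocabulary. The number of heads does not affect the count, since the heads only partition the 384-dimensional projections.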
Load
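import json

import tiktoken
import torch

# Read the model configuration, closing the file afterwards.
with open('model_config.json') as f:
    config = json.load(f)
# Load the checkpoint onto the CPU so no GPU is required.
ckpt = torch.load('ckpt.pt', map_location='cpu')
# Re-instantiate GPT with the same config, then load state dict.
enc = tiktoken.get_encoding('gpt2')  # GPT-2 BPE tokenizer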