nanoGPT - TinyStories

A ~30M-parameter GPT trained from scratch on the TinyStories dataset.

  • Tokenizer: GPT-2 BPE (tiktoken)
  • Architecture: 6 layers, 6 heads, 384 embedding dim, context 256
  • Best val loss: 1.7052
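
As a rough sanity check on the ~30M figure, the parameter count can be estimated from the architecture listed above. This is only a sketch: the vocabulary size of 50257 (the standard GPT-2 BPE vocabulary), the tied input/output embeddings, the 4x MLP expansion, and the bias-free layers are assumptions not stated in this card, and the function name is made up for illustration.

```python
def estimate_gpt_params(n_layer, n_embd, block_size, vocab_size,
                        bias=False, tied_head=True):
    """Estimate the parameter count of a GPT-2 style decoder (assumed layout)."""
    tok_emb = vocab_size * n_embd           # token embedding table
    pos_emb = block_size * n_embd           # learned position embeddings
    attn = n_embd * 3 * n_embd + n_embd * n_embd      # QKV projection + output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   # up and down projections (4x width)
    norms = 2 * n_embd                      # two LayerNorm weight vectors per block
    per_block = attn + mlp + norms
    if bias:
        # Linear biases (3n + n + 4n + n) plus two LayerNorm biases (2n).
        per_block += 9 * n_embd + 2 * n_embd
    final_norm = n_embd * (2 if bias else 1)
    # With weight tying, the output head reuses the token embedding table.
    head = 0 if tied_head else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * per_block + final_norm + head


# Values from this card, plus the assumed GPT-2 vocabulary size.
n_params = estimate_gpt_params(n_layer=6, n_embd=384, block_size=256,
                               vocab_size=50257)
print(f"{n_params / 1e6:.1f}M parameters")
```

Under these assumptions most of the parameters sit in the token embedding table rather than in the transformer blocks, which is typical for a small model with a GPT-2 sized vocabulary.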

Load

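import json

import tiktoken
import torch

# Model hyperparameters saved alongside the checkpoint.
with open('model_config.json') as f:
    config = json.load(f)

# Load the checkpoint onto the CPU.
ckpt = torch.load('ckpt.pt', map_location='cpu')

# Re-instantiate GPT with the same config, then load the state dict.

# GPT-2 BPE tokenizer, matching the one used for training.
enc = tiktoken.get_encoding('gpt2')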
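
One thing to watch for when loading the state dict: if the checkpoint was saved from a model wrapped by torch.compile, its keys may carry an `_orig_mod.` prefix that an uncompiled model does not expect. Whether this checkpoint is affected is not stated here, so the sketch below is a precaution under that assumption. The helper name and the example keys are hypothetical.

```python
def strip_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Illustrative keys only; real checkpoints map names to tensors.
example = {
    "_orig_mod.transformer.wte.weight": "tensor-a",
    "lm_head.weight": "tensor-b",
}
cleaned = strip_prefix(example)
print(sorted(cleaned))
```

The cleaned dictionary can then be passed to the model's `load_state_dict`. Keys without the prefix pass through unchanged, so the helper is safe to apply either way.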