Arabic AI TinyStories GPT (12M parameters)

A small decoder-only GPT trained from scratch on TinyStories. Built line-by-line in PyTorch as part of the Build a GPT From Scratch on a 6 GB GPU YouTube playlist series, without using any transformers model classes.

Model description

| Property | Value |
|---|---|
| Parameters | ~12.4M |
| Layers | 6 |
| Attention heads | 6 |
| Embedding dim | 384 |
| Context length | 256 tokens |
| Vocabulary | 4,096 tokens (custom BPE) |
| Training data | TinyStories |
| Training steps | ~25,000 |
| Validation loss | ~1.7 (varies by run) |
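As a sanity check on the ~12.4M figure, the parameter count can be estimated from the hyperparameters above. This is a sketch that assumes a GPT-2-style block (fused Q/K/V projection, 4× MLP, biases and LayerNorm affine parameters everywhere, learned positional embeddings, LM head tied to the token embedding); model.py may differ on any of these points, and `estimate_gpt_params` is a hypothetical helper.

```python
def estimate_gpt_params(
    n_layer: int = 6,
    n_embd: int = 384,
    block_size: int = 256,
    vocab_size: int = 4096,
    mlp_ratio: int = 4,
    tied_lm_head: bool = True,
) -> int:
    """Rough parameter count for a GPT-2-style decoder (assumes biases everywhere)."""
    token_emb = vocab_size * n_embd  # token embedding table
    pos_emb = block_size * n_embd    # learned positional embeddings
    hidden = mlp_ratio * n_embd      # MLP hidden width (assumed 4x)

    per_block = (
        n_embd * 3 * n_embd + 3 * n_embd  # fused Q/K/V projection (weight + bias)
        + n_embd * n_embd + n_embd        # attention output projection
        + n_embd * hidden + hidden        # MLP up-projection
        + hidden * n_embd + n_embd        # MLP down-projection
        + 2 * (2 * n_embd)                # two LayerNorms (weight + bias each)
    )
    final_ln = 2 * n_embd
    # A tied LM head reuses the token embedding matrix, so it adds nothing.
    lm_head = 0 if tied_lm_head else vocab_size * n_embd

    return token_emb + pos_emb + n_layer * per_block + final_ln + lm_head


total = estimate_gpt_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 12,318,720 parameters (~12.3M)
```

Under these assumptions the estimate lands close to the table's figure; an untied LM head would add another 4,096 × 384 = 1,572,864 parameters. The number of heads does not enter the count, because six heads of size 64 split the same 384-wide projections.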

The model writes short children's stories in English. It was trained with a custom tokenizer (tinystories_tokenizer.json) and a hand-written GPT class in pure PyTorch.
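The tokenizer file is loaded with the Hugging Face tokenizers library, and the exact Episode 2 training settings are not documented on this card. The core idea of byte-pair encoding can still be sketched in pure Python: start from single characters and repeatedly merge the most frequent adjacent pair until the vocabulary reaches its target size (4,096 here). This toy `train_bpe` function is illustrative only; it ignores details a real tokenizer handles, such as pre-tokenization rules and special tokens like `<|endoftext|>`.

```python
from collections import Counter


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges over a toy word list, starting from single characters."""
    # Represent each word as a tuple of symbols and count word frequencies.
    corpus = Counter(tuple(word) for word in words)
    merges: list[tuple[str, str]] = []

    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol

        # Most frequent pair; ties go to the pair seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)

        # Rewrite every word, replacing occurrences of `best` with one merged symbol.
        new_corpus: Counter = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus

    return merges
```

On the toy corpus `"the cat sat on the mat the end".split()`, three merges yield `('t', 'h')`, `('th', 'e')`, `('a', 't')`: the most frequent word, "the", becomes a single token first.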

Files in this repository

| File | Description |
|---|---|
| model.pt | Full training checkpoint (model, config, step, val_loss) |
| config.json | Architecture hyperparameters |
| tinystories_tokenizer.json | BPE tokenizer from Episode 2 |
| model.py | GPT and GPTConfig source (for loading locally) |
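config.json holds the architecture hyperparameters, but its exact keys are not listed on this card. The sketch below assumes nanoGPT-style names (`n_layer`, `n_head`, `n_embd`, `block_size`, `vocab_size`), which are hypothetical here, and shows one way to read such a file into a typed config while checking that the embedding width divides evenly across the attention heads. `SketchGPTConfig` and `load_config` are illustrative names, not the GPTConfig class from model.py.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SketchGPTConfig:
    # Hypothetical field names; check config.json and model.py for the real keys.
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384
    block_size: int = 256
    vocab_size: int = 4096

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the embedding.
        return self.n_embd // self.n_head


def load_config(text: str) -> SketchGPTConfig:
    """Build a config from JSON text, ignoring keys the dataclass does not define."""
    raw = json.loads(text)
    known = {f.name for f in fields(SketchGPTConfig)}
    cfg = SketchGPTConfig(**{k: v for k, v in raw.items() if k in known})
    if cfg.n_embd % cfg.n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return cfg


example = '{"n_layer": 6, "n_head": 6, "n_embd": 384, "block_size": 256, "vocab_size": 4096, "dropout": 0.1}'
cfg = load_config(example)
print(cfg.head_dim)  # 64
```

With the real file, the input would be `open(hf_hub_download(repo_id=REPO_ID, filename="config.json")).read()`; unrecognized keys, such as the hypothetical `dropout` in the example, are skipped.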

Usage

Install dependencies

```bash
pip install torch tokenizers huggingface_hub
```

Download and generate

```python
import importlib.util

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

REPO_ID = "luayas1977/arabicai-tinystories-gpt"

# Download artifacts (cached locally by huggingface_hub)
ckpt_path = hf_hub_download(repo_id=REPO_ID, filename="model.pt")
tok_path = hf_hub_download(repo_id=REPO_ID, filename="tinystories_tokenizer.json")
model_py = hf_hub_download(repo_id=REPO_ID, filename="model.py")

# Import the GPT class straight from the downloaded model.py,
# so the file does not need to be copied into the working directory
spec = importlib.util.spec_from_file_location("gpt_model", model_py)
gpt_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(gpt_model)
GPT = gpt_model.GPT

device = "cuda" if torch.cuda.is_available() else "cpu"
# weights_only=False lets torch.load unpickle non-tensor entries such as the
# stored config; only use it with checkpoints from sources you trust
checkpoint = torch.load(ckpt_path, map_location=device, weights_only=False)

model = GPT(checkpoint["config"]).to(device)
model.load_state_dict(checkpoint["model"])
model.eval()

tokenizer = Tokenizer.from_file(tok_path)
prompt = "Once upon a time"
prompt_ids = tokenizer.encode(prompt).ids
idx = torch.tensor([prompt_ids], dtype=torch.long, device=device)  # shape (1, T)

# token_to_id returns None if the tokenizer has no such token
eos_id = tokenizer.token_to_id("<|endoftext|>")
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40, eos_token_id=eos_id)

text = tokenizer.decode(out[0].tolist())
print(text)
```
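The `model.generate` call above lives in model.py, whose internals are not shown on this card. A common way such a method works is sketched here in pure Python, with a stand-in `next_logits` function in place of the network: crop the context to the last 256 tokens, divide the logits by the temperature, keep the top-k candidates, sample from their softmax, and stop early at the EOS id. Treat this as an illustration of temperature and top-k sampling, not as the repository's implementation; `sample_next` and `generate` here are hypothetical helpers.

```python
import math
import random
from typing import Callable, Optional

Logits = list[float]


def sample_next(logits: Logits, temperature: float, top_k: int, rng: random.Random) -> int:
    """Pick one token id from raw logits using temperature and top-k filtering."""
    scaled = [x / temperature for x in logits]  # temperature must be > 0
    # Keep only the k highest-scoring token ids.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the survivors (subtract the max for numerical stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[list[int]], Logits],
    ids: list[int],
    max_new_tokens: int,
    block_size: int = 256,
    temperature: float = 0.8,
    top_k: int = 40,
    eos_token_id: Optional[int] = None,
    seed: int = 0,
) -> list[int]:
    rng = random.Random(seed)
    ids = list(ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # the model only sees the last block_size tokens
        next_id = sample_next(next_logits(context), temperature, top_k, rng)
        ids.append(next_id)
        if next_id == eos_token_id:
            break  # stop once end-of-text is sampled
    return ids
```

Lower temperatures sharpen the distribution toward the highest-scoring tokens, while `top_k` removes the long tail of unlikely tokens entirely; with `top_k=1` the loop reduces to greedy decoding.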

Try it in the browser

Arabic AI TinyStories GPT Playground: an interactive Gradio demo with temperature, top-k, and max-tokens controls.

Model repo: luayas1977/arabicai-tinystories-gpt

Training details

  • Optimizer: AdamW (lr 3e-4, weight decay 0.1)
  • Schedule: Linear warmup (500 steps) + cosine decay
  • Batch size: 32 sequences × 256 tokens
  • Hardware: Consumer GPU (~6 GB VRAM)
  • Runtime: ~2 hours for 25,000 steps
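The learning-rate schedule in the list above can be sketched as a plain function of the step number. The 3e-4 peak, 500-step warmup, and 25,000 total steps come from this card; the floor the cosine decays to (`MIN_LR`) is a hypothetical choice, since the card does not state it. The same numbers also give the run's token budget, assuming no gradient accumulation.

```python
import math

MAX_LR = 3e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 25_000
MIN_LR = 3e-5  # hypothetical floor; the card does not say where the cosine ends


def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Tokens per optimizer step and over the whole run (32 sequences x 256 tokens).
tokens_per_step = 32 * 256                     # 8,192
total_tokens = tokens_per_step * TOTAL_STEPS   # 204,800,000 (~205M)
```

In PyTorch, a function like this is typically attached to the optimizer with `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: lr_at(step) / MAX_LR)`, because `LambdaLR` multiplies the optimizer's base learning rate by the returned factor.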

Limitations

  • Small model trained only on TinyStories; not suitable for general knowledge, code, or adult topics
  • English only
  • Context limited to 256 tokens
  • May repeat story openers or run past a natural ending if the end-of-text (EOS) token was not learned strongly, which depends on how the training data was prepared
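The last point depends on whether the training stream marks where each story ends. One common approach, sketched below in pure Python as an assumption rather than this repository's documented pipeline, is to append the `<|endoftext|>` id after every tokenized story before cutting the stream into 256-token training windows. Without that separator, the model never sees an explicit ending. `pack_stories` and `training_windows` are hypothetical helpers.

```python
from typing import Iterator


def pack_stories(stories: list[list[int]], eos_id: int) -> list[int]:
    """Concatenate tokenized stories into one stream, ending each with EOS."""
    stream: list[int] = []
    for ids in stories:
        stream.extend(ids)
        stream.append(eos_id)  # marks where a story stops
    return stream


def training_windows(
    stream: list[int], block_size: int = 256
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield non-overlapping (input, target) pairs; targets are inputs shifted by one."""
    for start in range(0, len(stream) - block_size, block_size):
        chunk = stream[start : start + block_size + 1]
        yield chunk[:-1], chunk[1:]
```

For example, `pack_stories([[1, 2], [3]], eos_id=0)` gives `[1, 2, 0, 3, 0]`, and with `block_size=2` that stream yields the pairs `([1, 2], [2, 0])` and `([0, 3], [3, 0])`, so the model is trained to predict EOS right after a story's last token.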

Citation / series

Part of the Build a GPT From Scratch on a 6 GB GPU YouTube playlist.
