---
license: mit
language:
- en
tags:
- gpt
- transformer
- pytorch
- language-model
- text-generation
library_name: pytorch
---

# Custom GPT Language Model

A custom GPT-style autoregressive transformer language model implemented from scratch in PyTorch.

This project contains:

- custom multi-head self-attention
- transformer blocks
- causal masking
- autoregressive text generation
- mixed precision training
- top-k / top-p sampling
- safetensors model weights

The model was trained on a subset of FineWeb-Edu using the GPT-2 tokenizer.

---

# Architecture

Model configuration:

```python
{
    "vocab_size": 50257,
    "context_length": 256,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False
}
```

Approximate parameter count:

- ~124M parameters

Architecture components:

- token embeddings
- positional embeddings
- masked multi-head self-attention
- feed-forward MLP blocks
- pre-layer normalization
- residual connections
- causal language modeling head

---

# Training

Training setup:

- PyTorch
- AdamW optimizer
- Automatic Mixed Precision (AMP)
- Gradient clipping
- Top-k / top-p sampling for text generation

Hardware used:

- NVIDIA RTX 3060 Ti (8 GB)

Dataset:

- FineWeb-Edu subset (10M tokens)

Tokenizer:

- GPT-2 tokenizer

---

# Installation

Install dependencies:

```bash
pip install torch transformers safetensors
```

---

# Loading The Model

```python
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from model import GPTModel  # architecture defined in this repository's model.py

# load config
with open("config.json") as f:
    cfg = json.load(f)

# create model
model = GPTModel(cfg)

# load weights
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)

# move to GPU if one is available, then switch to inference mode (disables dropout)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# tokenizer (GPT-2 tokenizer files shipped in this directory)
tokenizer = AutoTokenizer.from_pretrained(".")
```

---

# Text Generation Example

```python
from model import generate_and_print_sample

# pass the same device the model was moved to when loading
print(generate_and_print_sample(model, tokenizer, device, "The world is big"))
```

---

# Sample Generations

An example generation from early-stage
training:

> "The world is big and is a whole for children. The best part of which has been made in the lives, and the state is an ideal man, but also the same one is in the world. “The only one has been created by people,” said the new study of the journal In the past, it is the best “s not people who have no longer to have not been seen in a few years.” “The only one who have one, the most famous in the country has no one at least three years. “If you’re very low, it is not a big or less than one’s risk.” The study is a study of people who have already been reported that the risk of people who are diagnosed with HIV-S"

At this early stage, the samples show:

- mostly well-formed syntax at the phrase and sentence level
- some topic persistence (the sample above stays on studies, risk, and health)
- plausible autoregressive continuation of the prompt
- early semantic structure

---

# Files

```text
model.py            # GPT architecture
model.safetensors   # trained weights
config.json         # model configuration
tokenizer files     # GPT-2 tokenizer assets
README.md           # project documentation
```

---

# Notes

This is a custom PyTorch implementation and is not directly compatible with Hugging Face `AutoModelForCausalLM`. Load the model with the architecture defined in the provided `model.py`.

---

# License

MIT License.
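# Parameter Count Check

The ~124M figure in the Architecture section can be cross-checked by hand from the configuration. The sketch below assumes a GPT-2-style layout: learned positional embeddings, two LayerNorms with scale and shift per block, bias-free Q/K/V projections (since `qkv_bias` is `False`), a biased attention output projection, a 4× feed-forward MLP with biases, and a bias-free output head. `count_gpt_params` and `tied_head` are hypothetical names, and the real `model.py` may make different bias or weight-tying choices.

```python
# Hypothetical helper: parameter count for an ASSUMED GPT-2-style layout.
# It may not match model.py exactly (e.g. bias or weight-tying choices).
def count_gpt_params(cfg, tied_head=False):
    V = cfg["vocab_size"]
    T = cfg["context_length"]
    d = cfg["emb_dim"]
    L = cfg["n_layers"]
    qkv_b = d if cfg["qkv_bias"] else 0  # bias size per Q/K/V projection

    tok_emb = V * d                                # token embedding table
    pos_emb = T * d                                # learned positional embeddings
    attn = 3 * (d * d + qkv_b) + (d * d + d)       # Q, K, V + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # d -> 4d -> d, with biases
    norms = 2 * (2 * d)                            # two LayerNorms (scale + shift)
    block = attn + mlp + norms
    final_norm = 2 * d
    lm_head = 0 if tied_head else V * d            # bias-free output head

    return tok_emb + pos_emb + L * block + final_norm + lm_head


cfg = {
    "vocab_size": 50257,
    "context_length": 256,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

print(f"{count_gpt_params(cfg, tied_head=True):,}")  # 123,822,336
print(f"{count_gpt_params(cfg):,}")                  # 162,419,712
```

Under these assumptions, ~124M corresponds to the count with the output head tied to (or excluded from) the token embedding; an untied head adds another ~38.6M. `n_heads` does not change the total, because the heads split `emb_dim` rather than adding weights. For the checkpoint itself, `sum(p.numel() for p in model.parameters())` on the loaded model gives the authoritative number.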
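# Top-k / Top-p Sampling Sketch

The README lists top-k / top-p sampling but does not show how it is applied. Below is a minimal sketch of one common way to implement both filters in PyTorch. `filter_logits` and `sample_next_token` are hypothetical names, not functions from `model.py`, and the sketch assumes the model's forward pass returns logits of shape `(batch, seq_len, vocab_size)`.

```python
import torch


def filter_logits(logits, top_k=None, top_p=None):
    """Mask logits outside the top-k set and/or the top-p (nucleus) set.

    logits: tensor of shape (batch, vocab_size) for the last position.
    """
    logits = logits.clone()
    if top_k is not None:
        # k-th largest logit per row, shape (batch, 1)
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits[logits < kth] = float("-inf")
    if top_p is not None:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        probs = torch.softmax(sorted_logits, dim=-1)
        cum = torch.cumsum(probs, dim=-1)
        # drop a token once the tokens ranked above it already reach top_p;
        # the highest-probability token is always kept
        remove = (cum - probs) >= top_p
        sorted_logits[remove] = float("-inf")
        # scatter the filtered values back to their original vocabulary positions
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
    return logits


@torch.no_grad()
def sample_next_token(model, idx, context_length, temperature=1.0, top_k=None, top_p=None):
    """Append one sampled token to idx (batch, seq_len). temperature must be > 0."""
    idx_cond = idx[:, -context_length:]        # crop to the model's context window
    logits = model(idx_cond)[:, -1, :]         # assumed output: (batch, seq, vocab)
    logits = filter_logits(logits / temperature, top_k, top_p)
    probs = torch.softmax(logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)
    return torch.cat([idx, next_id], dim=1)
```

In a generation loop, `sample_next_token` would be called repeatedly on the tokenized prompt, cropping to `context_length` (256 for this model). When both filters are set (for example `top_k=50, top_p=0.95`), top-k is applied first and nucleus filtering then runs over the survivors.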