Try the live demo Space: https://huggingface.co/spaces/Vedant3907/gpt2-irish-folk-tune-generator-space

Weekend project demo: an original dataset tune is played first, followed by a GPT-2 generated tune from similar ABC notation context.

GPT-2 Irish ABC Tune Generator

This model is a full fine-tune of openai-community/gpt2 on the sander-wood/irishman dataset, a collection of Irish folk tunes represented in ABC notation.

The model generates symbolic music text, not audio. Generated output can be pasted into an ABC player such as abc.rectanglered.com or ABCjs Editor to hear the tune.

Model Details

  • Base model: openai-community/gpt2
  • Training method: Full fine-tuning
  • Dataset: sander-wood/irishman
  • Task: Causal language modeling / ABC notation continuation
  • Max sequence length: 512 tokens
  • Training hardware: Google Colab T4 GPU
  • Training duration: Approximately one epoch
  • Validation loss: 0.9962592720985413

What The Model Learns

The training text was formatted as:

<control code>
<ABC notation>

So a prompt can start with a control code, and the model will continue by generating ABC notation.

Example prompt:

S:2 B:8 E:6 B:8

The output should look like ABC music notation with headers and note sequences.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Vedant3907/gpt2-irish-folk-tune-generator"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def generate_tune(prompt="S:2 B:8 E:6 B:8\n", max_new_tokens=400):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.9,
            top_k=40,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_tune())

Example Prompt

S:2 B:8 E:6 B:8

If the generated output does not include a complete ABC header, add one manually before playing it:

X:1
M:4/4
L:1/8
K:G

Then paste the full ABC text into:

Evaluation

The model was evaluated on the dataset validation split.

eval_loss: 0.9962592720985413

The final logged training loss was around 1.02, and the validation loss was close to that value, suggesting the model did not obviously overfit during this run.

Training Loss Table

Compact view of the training loss curve. The full logged loss table is available in training_loss_curve.csv.

Step Training Loss
50 3.143394
500 1.461160
1000 1.315162
1500 1.255494
2000 1.233580
2500 1.160884
3000 1.140212
3500 1.128867
4000 1.115015
4500 1.089873
5000 1.078019
5500 1.059447
6000 1.082058
6500 1.072979
7000 1.060261
7500 1.066326
8000 1.051891
8500 1.054574
9000 1.058629
9500 1.041122
10000 1.046246
10500 1.023033
11000 1.031550
11500 1.030448
12000 1.031115
12500 1.029919
12800 1.029599

Limitations

  • The model generates ABC notation, not direct audio.
  • Some generations may be syntactically invalid ABC.
  • Some outputs may need manual cleanup before playback.
  • The model may reproduce fragments or patterns from the training data.
  • Musical quality varies; sampling multiple outputs and selecting the best one is recommended.

Intended Use

This model is intended for experimentation with Irish folk tune generation, symbolic music modeling, and ABC notation text generation.

It is not intended for claims of originality or commercial music production without additional review for memorization and licensing concerns.

Training Summary

This was trained as a practical free-GPU fine-tuning experiment after the original Karpathy autoresearch training setup proved unsuitable for Google Colab's free T4 GPU. Instead of training from scratch, this model uses GPT-2 as a pretrained base and adapts it to Irish ABC notation.

Downloads last month
80
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Vedant3907/gpt2-irish-folk-tune-generator

Finetuned
(2184)
this model
Quantizations
1 model

Dataset used to train Vedant3907/gpt2-irish-folk-tune-generator

Space using Vedant3907/gpt2-irish-folk-tune-generator 1