Instructions for using Romarchive/toyllama-50m-GGUF with libraries and local apps.

Libraries

How to use Romarchive/toyllama-50m-GGUF with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub  (Llama.from_pretrained downloads via huggingface-hub)

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Romarchive/toyllama-50m-GGUF",
	filename="toyllama-50m BF16 (2026)[Romarchive].gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
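# The result is a completion dict; the text (including the prompt, since echo=True) is in choices[0]["text"]
print(output["choices"][0]["text"])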
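For longer generations it can be nicer to print text as it is produced. This is a minimal sketch that assumes the `llm` object created above and relies on the `stream=True` option of the completion call, which makes llama-cpp-python yield completion chunks instead of returning a single dict. The helper name `stream_completion` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


def stream_completion(
    llm: Callable[..., Iterable[Dict[str, Any]]],
    prompt: str,
    max_tokens: int = 128,
) -> str:
    """Print generated text as it arrives and return the full generation."""
    pieces = []
    # With stream=True each chunk carries only the newly generated text.
    for chunk in llm(prompt, max_tokens=max_tokens, stream=True):
        piece = chunk["choices"][0]["text"]
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)
```

Usage: `text = stream_completion(llm, "Once upon a time,", max_tokens=512)`. Unlike the example above, the prompt is not echoed back, so `text` contains only the continuation.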

Local Apps

llama.cpp

How to use Romarchive/toyllama-50m-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Romarchive/toyllama-50m-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf Romarchive/toyllama-50m-GGUF:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Romarchive/toyllama-50m-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf Romarchive/toyllama-50m-GGUF:BF16

Use a pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Romarchive/toyllama-50m-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf Romarchive/toyllama-50m-GGUF:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Romarchive/toyllama-50m-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Romarchive/toyllama-50m-GGUF:BF16

Use Docker

docker model run hf.co/Romarchive/toyllama-50m-GGUF:BF16


vLLM

How to use Romarchive/toyllama-50m-GGUF with vLLM:

Install from pip and serve the model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Romarchive/toyllama-50m-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Romarchive/toyllama-50m-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
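The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the vLLM server started above is listening on http://localhost:8000. The same payload should also work against llama-server's OpenAI-compatible API, which listens on http://localhost:8080 by default.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Usage: `print(complete("http://localhost:8000", "Romarchive/toyllama-50m-GGUF", "Once upon a time,"))`.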


Ollama
How to use Romarchive/toyllama-50m-GGUF with Ollama:
```
ollama run hf.co/Romarchive/toyllama-50m-GGUF:BF16
```

Unsloth Studio

How to use Romarchive/toyllama-50m-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Romarchive/toyllama-50m-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Romarchive/toyllama-50m-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Romarchive/toyllama-50m-GGUF to start chatting

Docker Model Runner
How to use Romarchive/toyllama-50m-GGUF with Docker Model Runner:
```
docker model run hf.co/Romarchive/toyllama-50m-GGUF:BF16
```

Lemonade

How to use Romarchive/toyllama-50m-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Romarchive/toyllama-50m-GGUF:BF16

Run and chat with the model

lemonade run user.toyllama-50m-GGUF-BF16

List all available models

lemonade list

Brought to you by Romarchive

ToyLlama 50M

The third version of the ToyLlama model. See more ToyLlamas on my profile.

(P.S. This one was intended to be 30M v2, but because the amount of training data grew, I chose to use the pretrained openai-community/gpt2 tokenizer.)

Example generations

All examples were generated with the interactive CLI script test.py (usage: python3 test.py).

>>> Enter prompt: Once upon a time, the cow walked to the park
------------------------------------------------------------
Once upon a time, the cow walked to the park in the west of the town of Pippa. It was the first of the town's most popular and popular sports. The town was renamed a "Woo" in January 2019.
The town's name was changed to "Doves of the Year" in May 2021.
References
External links
1952 births
Living people
American male football players
American football pitchers
21st-century American football players
21st-century American football players
People from Bambisha, New York
Canadian men's footballers
New York City footballers
Basketball players from New York (state)
Players of American football from New York (state)
People from Santa Cruz, New York (state)
Australian
------------------------------------------------------------
>>> Enter prompt: Abraham Lincoln
------------------------------------------------------------
Abraham Lincoln, the first of the first students of the National Academy of Sciences, the first of the first students of the National Academy of Sciences. 
The university was founded in 2005 by the National Academy of Sciences in 2006.
The first student school of the university was named for the first student school in the University of Chicago in 2006. The college is named after the college.
The college is located in the north-central corner of the campus.  The school is located in the south-west corner of the campus.
The school is located in the centre of the city, and is now part of the University of Minnesota.
See also
List of the School of Arts and Sciences
References
External links
 University of Wisconsin at University
------------------------------------------------------------

(As you can see, the output became slightly Wikipedia-like.)

Training information

Trained for 26 hours on a single AMD Radeon RX 6600 (8 GB VRAM).

Parameter	Value
Loss	2.5885
Epochs	1
grad_norm	0.5695
Learning rate	5e-4
Batch size	8
Gradient accumulation steps	4
Training tokens	13763555964 (~13.8B)
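With a batch size of 8 and 4 gradient accumulation steps, each optimizer step sees 8 × 4 = 32 sequences (the number of tokens per step also depends on the context length, which is not listed here). The toy sketch below illustrates why accumulation behaves like a larger batch. It uses a hypothetical one-parameter model y = w·x in plain Python together with the learning rate from the table; it is not the training script used for this model.

```python
from typing import List, Tuple

LEARNING_RATE = 5e-4  # "Learning rate" from the table
MICRO_BATCH = 8       # "Batch size"
ACCUM_STEPS = 4       # "Gradient accumulation steps"

Sample = Tuple[float, float]


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w for the toy model."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w: float, data: List[Sample]) -> float:
    """One optimizer step assembled from ACCUM_STEPS micro-batches."""
    assert len(data) == MICRO_BATCH * ACCUM_STEPS
    grad = 0.0
    for i in range(ACCUM_STEPS):
        micro = data[i * MICRO_BATCH:(i + 1) * MICRO_BATCH]
        # Scale each micro-batch gradient so the sum equals the mean over all samples.
        grad += grad_mse(w, micro) / ACCUM_STEPS
    return w - LEARNING_RATE * grad


def full_batch_step(w: float, data: List[Sample]) -> float:
    """Reference: a single step on the whole batch of 32 samples."""
    return w - LEARNING_RATE * grad_mse(w, data)


data = [(k / 10, 3 * k / 10) for k in range(MICRO_BATCH * ACCUM_STEPS)]
w_accum = accumulated_step(0.0, data)
w_full = full_batch_step(0.0, data)
print("effective batch size:", MICRO_BATCH * ACCUM_STEPS)
print("accumulated:", w_accum, "full batch:", w_full)
```

Dividing each micro-batch gradient by the number of accumulation steps is what makes the two updates match; the equivalence is exact only when all micro-batches have the same size.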

Training data

The training data consisted of Hugging Face datasets plus several other websites (2,262,401 files in total):

$ du -sh ./data/*
12K	./data/cows.info.gf
472K	./data/github.com
5,8M	./data/gutenberg.org
88K	./data/habr.com
9,0G	./data/huggingface.co
474M	./data/textfiles.com
12K	./data/www.reddit.com
$ du -sh ./data/
9,5G	./data/
$ du -sh ./data/huggingface.co/*/*
34M	./data/huggingface.co/cornell-movie-review-data/rotten_tomatoes
21M	./data/huggingface.co/EleutherAI/lambada_openai
5,8M	./data/huggingface.co/gaianet/bible_bot
92K	./data/huggingface.co/gaianet/paris
344K	./data/huggingface.co/gaianet/trumpVSharris
16K	./data/huggingface.co/krinal/fifa_2022
680M	./data/huggingface.co/prayslaks/wikimedia_wikipedia_100K
8,2G	./data/huggingface.co/roneneldan/TinyStories
68M	./data/huggingface.co/stas/openwebtext-10k
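The du -sh figures above can be turned into approximate shares of the corpus. This sketch copies the top-level sizes from that listing (note the comma decimal separator) and assumes du's binary units (1K = 1024 bytes). Because du reports rounded disk usage rather than text length or token counts, the percentages are only rough.

```python
from typing import Dict, List, Tuple

# Top-level sizes copied from the `du -sh ./data/*` listing above.
TOP_LEVEL: List[Tuple[str, str]] = [
    ("12K", "cows.info.gf"),
    ("472K", "github.com"),
    ("5,8M", "gutenberg.org"),
    ("88K", "habr.com"),
    ("9,0G", "huggingface.co"),
    ("474M", "textfiles.com"),
    ("12K", "www.reddit.com"),
]

# du -h uses binary units.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_du_size(text: str) -> float:
    """Convert a du -h size such as '5,8M' to an approximate byte count."""
    number, unit = text[:-1], text[-1]
    # The listing was produced in a locale that uses a comma as the decimal separator.
    return float(number.replace(",", ".")) * UNITS[unit]


def shares(entries: List[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the listed total taken by each entry."""
    sizes = {name: parse_du_size(size) for size, name in entries}
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# TinyStories (8,2G) relative to the whole ./data/ directory (9,5G).
TINYSTORIES_SHARE = parse_du_size("8,2G") / parse_du_size("9,5G")

for name, share in sorted(shares(TOP_LEVEL).items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {share:7.2%}")
print(f"{'TinyStories':16s} {TINYSTORIES_SHARE:7.2%} of ./data/")
```

With these numbers, huggingface.co accounts for roughly 95% of the data on disk, and TinyStories alone for about 86% of the 9.5G total.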

Advanced model information

Detailed information about this exact model.

Parameter	Value
Hidden size	512
Intermediate size	1536
Hidden layers	8
Attention heads	8
Key-value heads	4
Total parameters	50906112
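The total can be reproduced from the table using the standard Llama layer layout (bias-free attention and SwiGLU MLP projections, RMSNorm weight vectors). Two values are assumptions not listed in the table: a vocabulary of 50,257 tokens (the size of the GPT-2 tokenizer mentioned above) and tied input/output embeddings. With those assumptions the count matches the stated total exactly, and it shows that about half of the parameters sit in the embedding matrix.

```python
from dataclasses import dataclass


@dataclass
class LlamaConfig:
    vocab_size: int = 50257           # assumption: openai-community/gpt2 vocabulary size
    hidden_size: int = 512
    intermediate_size: int = 1536
    num_hidden_layers: int = 8
    num_attention_heads: int = 8
    num_key_value_heads: int = 4
    tie_word_embeddings: bool = True  # assumption: output head reuses the embedding matrix


def count_parameters(cfg: LlamaConfig) -> int:
    head_dim = cfg.hidden_size // cfg.num_attention_heads  # 64
    kv_dim = cfg.num_key_value_heads * head_dim            # 256: grouped-query attention
    # q and o projections are hidden x hidden; k and v are hidden x kv_dim (no biases).
    attention = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size      # gate, up and down projections
    norms = 2 * cfg.hidden_size                            # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_word_embeddings else embeddings
    final_norm = cfg.hidden_size
    return cfg.num_hidden_layers * per_layer + embeddings + lm_head + final_norm


cfg = LlamaConfig()
print(count_parameters(cfg))                             # → 50906112
print(cfg.num_attention_heads // cfg.num_key_value_heads)  # query heads per KV head → 2
```

The embedding matrix alone is 50,257 × 512 = 25,731,584 parameters, which is consistent with the note above that switching to the GPT-2 tokenizer pushed the model past its intended 30M size.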


GGUF

Model size: 50.9M params

Architecture: llama
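As a rough guide to disk and memory requirements, raw weight storage is the parameter count times the bits per weight. This sketch ignores GGUF metadata and the per-block scales used by quantized formats, so real files are somewhat larger; it is an estimate, not the size of any particular file in this repo.

```python
PARAMS = 50_906_112  # "Total parameters" from the table above


def weight_bytes(params: int, bits_per_weight: float) -> int:
    """Raw weight storage in bytes, ignoring file metadata and quantization scales."""
    return int(params * bits_per_weight / 8)


for label, bits in [("32-bit (F32)", 32), ("16-bit (BF16)", 16), ("8-bit", 8)]:
    size = weight_bytes(PARAMS, bits)
    print(f"{label:14s} {size:>12,d} bytes  ~{size / 1024 ** 2:6.1f} MiB")
```

At 16 bits, as in the BF16 file, the weights come to 101,812,224 bytes (about 97.1 MiB).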


Model tree for Romarchive/toyllama-50m-GGUF

Base model: sapbot/toyllama-50m

Quantized (1): this model
