
olmostories-8m

A small language model using the OLMo 2 architecture, trained on the TinyStories dataset.

Training took around 2.5 hours on an A100 (80GB) GPU.

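from transformers import Olmo2Config

config = Olmo2Config(
    vocab_size=5000,              # tokenizer vocabulary size
    hidden_size=288,              # model (embedding) dimension
    intermediate_size=720,        # MLP inner dimension
    num_hidden_layers=6,          # number of decoder layers
    num_attention_heads=6,        # 288 / 6 = 48 dimensions per head
    num_key_value_heads=6,        # same as attention heads: plain multi-head attention
    max_position_embeddings=256,  # maximum sequence length in tokens
    initializer_range=0.02,       # std of the weight initialization
    attention_dropout=0.1,        # dropout applied to attention probabilities
)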
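As a sanity check on the config above, the parameter count can be worked out by hand. The sketch below is plain Python arithmetic, not part of the original training code. It assumes the layer layout of the Hugging Face Olmo2 implementation with default settings: untied input and output embeddings, no bias terms, and four RMSNorm weight vectors per layer (query norm, key norm, post-attention norm, post-feedforward norm) plus one final norm. The function name `olmo2_param_count` is hypothetical.

```python
def olmo2_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_hidden_layers: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    tie_word_embeddings: bool = False,  # assumed default: separate lm_head
) -> int:
    """Estimate parameters for an OLMo 2 style decoder (assumes no biases)."""
    head_dim = hidden_size // num_attention_heads
    q_out = num_attention_heads * head_dim
    kv_out = num_key_value_heads * head_dim

    embed = vocab_size * hidden_size
    # q_proj, k_proj, v_proj, o_proj
    attn = hidden_size * q_out + 2 * hidden_size * kv_out + q_out * hidden_size
    # gate_proj, up_proj, down_proj
    mlp = 3 * hidden_size * intermediate_size
    # q_norm, k_norm, post-attention norm, post-feedforward norm
    norms = q_out + kv_out + 2 * hidden_size
    per_layer = attn + mlp + norms

    final_norm = hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_hidden_layers * per_layer + final_norm + lm_head


total = olmo2_param_count(
    vocab_size=5000,
    hidden_size=288,
    intermediate_size=720,
    num_hidden_layers=6,
    num_attention_heads=6,
    num_key_value_heads=6,
)
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the total is 8,610,336, which rounds to the 8.61M reported for this model. Roughly a third of that (2 × 5000 × 288 = 2,880,000) sits in the input embedding and output projection.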
Model size: 8.61M params (Safetensors, tensor type BF16)