How to use with vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anakin87/LFM2-2.6B-ttt-rl-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anakin87/LFM2-2.6B-ttt-rl-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
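The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on localhost:8000, as in the curl call. The helper names build_chat_request and chat are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "anakin87/LFM2-2.6B-ttt-rl-merged"
# Assumption: the vLLM server runs locally on port 8000, as in the curl example.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, chat("What is the capital of France?") should return the model's reply as a string.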
Use Docker
docker model run hf.co/anakin87/LFM2-2.6B-ttt-rl-merged

LFM2-2.6B-ttt-rl-merged

Merged standalone model after the first round of CISPO training for Tic Tac Toe.

This is the result of merging anakin87/LFM2-2.6B-ttt-rl (LoRA adapter) into anakin87/LFM2-2.6B-ttt-sft (SFT base). It serves as the base for the second round of RL training.
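What merging an adapter means can be shown on a toy weight matrix. The sketch below assumes standard LoRA, where the adapter contributes an update (alpha / r) * B @ A on top of a frozen weight W0. The adapter's actual rank, alpha, and the tool used for this merge are not stated here, so every value in the sketch is illustrative. In practice a library utility such as PEFT's merge_and_unload is typically used instead of hand-written arithmetic.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply(weight, x):
    """Compute y = W x for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def merge_lora(w0, lora_a, lora_b, alpha, rank):
    """Return the merged weight W0 + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Toy values (assumptions): 2 outputs, 3 inputs, rank-1 adapter.
w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # frozen base weight, shape 2x3
lora_a = [[0.5, -1.0, 0.0]]                # A, shape r x in = 1x3
lora_b = [[2.0], [4.0]]                    # B, shape out x r = 2x1
alpha, rank = 2, 1
x = [1.0, 2.0, 3.0]

merged = merge_lora(w0, lora_a, lora_b, alpha, rank)

# Unmerged path: base output plus the scaled adapter output B (A x).
adapter_out = apply(lora_b, apply(lora_a, x))
unmerged_y = [b + (alpha / rank) * d for b, d in zip(apply(w0, x), adapter_out)]

# Merged path: a single matrix gives the same output with no adapter at inference.
merged_y = apply(merged, x)
```

Both paths produce the same output, which is why the merged model can be served standalone, without loading the adapter.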

This is an intermediate checkpoint from 🎓 LLM RL Environments Lil Course, a hands-on course on building RL environments for Language Models, where models learn from rewards, not examples. It walks through the full process of turning a small open model into a specialist that outperforms a large proprietary one on a specific task (Tic Tac Toe). The final model is anakin87/LFM2-2.6B-mr-tictactoe.

🤗🕹️ Play against the final model

Evaluation

100 games per setting.

| Model vs random opponent | % Wins | % Draws | % Losses | % Follows format | % Games with invalid moves |
| --- | --- | --- | --- | --- | --- |
| LiquidAI/LFM2-2.6B | 40 | 11 | 49 | 27.8 | 40 |
| anakin87/LFM2-2.6B-ttt-sft | 74 | 13 | 13 | 99.8 | 11 |
| anakin87/LFM2-2.6B-ttt-rl-merged | 86 | 12 | 2 | 100 | 1 |

| Model vs optimal opponent | % Wins | % Draws | % Losses | % Follows format | % Games with invalid moves |
| --- | --- | --- | --- | --- | --- |
| LiquidAI/LFM2-2.6B | 0 | 11 | 89 | 24.7 | 43 |
| anakin87/LFM2-2.6B-ttt-sft | 0 | 52 | 48 | 99 | 14 |
| anakin87/LFM2-2.6B-ttt-rl-merged | 0 | 85 | 15 | 100 | 1 |
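The 0% win rate against the optimal opponent is expected: Tic Tac Toe is a draw under perfect play, so a draw is the best achievable result in that setting. The sketch below shows one way such an evaluation loop could look. It is not the harness used for the numbers above. The policy interface, the alternation of who moves first, and the omission of format and invalid-move tracking are all assumptions made for illustration.

```python
import random
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def place(board, move, mark):
    return board[:move] + (mark,) + board[move + 1:]


def other(mark):
    return "O" if mark == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board, to_move):
    """Value for the player to move: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        return -1  # the previous player completed a line
    moves = legal_moves(board)
    if not moves:
        return 0
    return max(-minimax(place(board, m, to_move), other(to_move)) for m in moves)


def optimal_policy(board, mark):
    """Pick a move that leaves the opponent with the worst minimax value."""
    return max(legal_moves(board),
               key=lambda m: -minimax(place(board, m, mark), other(mark)))


def make_random_policy(seed):
    rng = random.Random(seed)
    return lambda board, mark: rng.choice(legal_moves(board))


def play_game(policy_x, policy_o):
    """Play one game; return 'X', 'O', or None for a draw."""
    board, mark = (" ",) * 9, "X"
    players = {"X": policy_x, "O": policy_o}
    while winner(board) is None and legal_moves(board):
        board = place(board, players[mark](board, mark), mark)
        mark = other(mark)
    return winner(board)


def evaluate(model_policy, opponent_policy, games=100):
    """Return the model's win, draw, and loss percentages."""
    counts = {"win": 0, "draw": 0, "loss": 0}
    for g in range(games):
        model_mark = "X" if g % 2 == 0 else "O"  # assumption: alternate first move
        if model_mark == "X":
            result = play_game(model_policy, opponent_policy)
        else:
            result = play_game(opponent_policy, model_policy)
        key = "draw" if result is None else ("win" if result == model_mark else "loss")
        counts[key] += 1
    return {k: 100 * v / games for k, v in counts.items()}
```

A real evaluation would replace model_policy with a function that prompts the language model and parses its move, and would also record format adherence and invalid moves.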

The model is a competent player, but it still falls into fork traps against the optimal opponent.
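A fork is taken here in its usual Tic Tac Toe sense: a move that creates two winning threats at once, so the opponent can block only one of them. The sketch below detects such moves on a given board. The board encoding (a list of 9 cells holding "X", "O", or " ") and the function names are assumptions for illustration.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winning_moves(board, mark):
    """Cells where `mark` would complete a line immediately."""
    cells = set()
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(mark) == 2 and marks.count(" ") == 1:
            cells.add(line[marks.index(" ")])
    return cells


def fork_moves(board, mark):
    """Empty cells where playing `mark` creates two or more immediate threats."""
    forks = []
    for move in range(9):
        if board[move] != " ":
            continue
        after = board[:move] + [mark] + board[move + 1:]
        if len(winning_moves(after, mark)) >= 2:
            forks.append(move)
    return forks


# X holds opposite corners (0 and 8); O took the center (4) and then corner 2.
#  X |   | O
#    | O |
#    |   | X
board = ["X", " ", "O",
         " ", "O", " ",
         " ", " ", "X"]

o_threats = winning_moves(board, "X" if False else "O")  # O threatens cell 6
x_forks = fork_moves(board, "X")                         # X's forking replies
```

In this position X must block at cell 6, and that same move threatens cells 3 and 7 simultaneously. O's corner reply on the previous move is the kind of mistake that walks into a fork trap.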

Safetensors · Model size: 3B params · Tensor type: BF16