Instructions to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with libraries, inference providers, and local apps.

Libraries

How to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="seacorn/Mistral-Small-24B-Instruct-2501-Reasoner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# Mistral-Small-24B is a text-only causal language model, so use AutoModelForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("seacorn/Mistral-Small-24B-Instruct-2501-Reasoner")
model = AutoModelForCausalLM.from_pretrained("seacorn/Mistral-Small-24B-Instruct-2501-Reasoner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/seacorn/Mistral-Small-24B-Instruct-2501-Reasoner

SGLang

How to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for seacorn/Mistral-Small-24B-Instruct-2501-Reasoner to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for seacorn/Mistral-Small-24B-Instruct-2501-Reasoner to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for seacorn/Mistral-Small-24B-Instruct-2501-Reasoner to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="seacorn/Mistral-Small-24B-Instruct-2501-Reasoner",
    max_seq_length=2048,
)

Docker Model Runner
How to use seacorn/Mistral-Small-24B-Instruct-2501-Reasoner with Docker Model Runner:
```
docker model run hf.co/seacorn/Mistral-Small-24B-Instruct-2501-Reasoner
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Mistral-Small-24B-Instruct-2501-Reasoner (Experimental)

This model is a finetuned version of unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit on the open-thoughts/OpenThoughts-114k dataset, giving the model reasoning capability.

Fine Tuning Details

The base model was finetuned on the open-thoughts/OpenThoughts-114k dataset for 1 epoch, using a single RTX 4090 for approximately 71 hours.

LoRA details:

LoRA Rank: 32
LoRA Alpha: 16 # I know, I forgot to change this number after I changed the rank and only realised it when I was almost done with the finetune
Quantization: QLoRA
Optimizer: adamw_8bit
Learning rate: 2e-4
Weight Decay: 0.01
Learning rate scheduler type: linear
Gradient accumulation steps: 8
Per device train batch size: 2
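
Two derived quantities follow from the settings above: the effective batch size and the LoRA scaling factor. The sketch below only collects the listed values and does the arithmetic. The `FinetuneConfig` class and its field names are hypothetical, not part of the original training script, and the scaling assumes the standard LoRA formula alpha / rank rather than a rank-stabilized variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters listed above."""

    lora_rank: int = 32
    lora_alpha: int = 16
    learning_rate: float = 2e-4
    weight_decay: float = 0.01
    gradient_accumulation_steps: int = 8
    per_device_train_batch_size: int = 2
    num_devices: int = 1  # a single RTX 4090, per the fine-tuning details

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step.
        return (
            self.per_device_train_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scaling (alpha / rank).
        return self.lora_alpha / self.lora_rank


config = FinetuneConfig()
```

With these values, `config.effective_batch_size` is 16 and `config.lora_scaling` is 0.5, so the adapter update is scaled down by half rather than left at 1.0, which is the effect of keeping alpha at 16 after raising the rank to 32.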

Prompting Format

Recommended system prompt:

Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:
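
When calling an OpenAI-compatible endpoint such as the vLLM or SGLang servers, the recommended prompt goes into a system message ahead of the user question. This is a minimal sketch of building that request body. `build_chat_payload` is a hypothetical helper, and `SYSTEM_PROMPT` is a shortened placeholder that should be replaced with the full prompt text above.

```python
import json

MODEL_ID = "seacorn/Mistral-Small-24B-Instruct-2501-Reasoner"

# Placeholder only: paste the full recommended system prompt here.
SYSTEM_PROMPT = "Your role as an assistant involves thoroughly exploring questions ..."


def build_chat_payload(question, system_prompt=SYSTEM_PROMPT, model=MODEL_ID):
    """Return a chat-completions request body with the system prompt first."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }


payload = build_chat_payload("What is the capital of France?")
body = json.dumps(payload)  # string to send as the POST body
```

The resulting `body` can be sent to the `/v1/chat/completions` route in place of the user-only payload.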

The model's response will be in the format of:

<|begin_of_thought|>
...
<|end_of_thought|>

<|begin_of_solution|>
...
<|end_of_solution|>
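
Because the reasoning and the answer are wrapped in distinct markers, a response can be split with a small parser. The sketch below is one way to do it. `split_response` is a hypothetical helper, and it assumes each section appears with both its opening and closing marker (a response cut off by a token limit would return `None` for the unfinished section).

```python
import re

# Matches <|begin_of_X|> ... <|end_of_X|> for X in {thought, solution}.
_SECTION = re.compile(
    r"<\|begin_of_(thought|solution)\|>(.*?)<\|end_of_\1\|>",
    re.DOTALL,
)


def split_response(text):
    """Return the first thought and solution sections, or None if absent."""
    sections = {"thought": None, "solution": None}
    for name, content in _SECTION.findall(text):
        if sections[name] is None:
            sections[name] = content.strip()
    return sections


sample = (
    "<|begin_of_thought|>\nstep 1\n\nstep 2\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\n42\n<|end_of_solution|>"
)
parsed = split_response(sample)
```

Here `parsed["solution"]` holds only the final answer, which is usually what an application wants to display, while `parsed["thought"]` can be logged or hidden.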

You may need to change the default prompt template to Llama 3, as highlighted here, if you're using a program like LM Studio.

If you decide not to use the recommended system prompt, you can prefix the model's response with <|begin_of_thought|> to force the model into reasoning mode.
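
With Transformers, one way to prefix the response is to end the conversation with a partial assistant message and ask the chat template to leave that turn open. This is a sketch under assumptions: `build_prefilled_messages` and `render_prompt` are hypothetical helpers, `continue_final_message` requires a reasonably recent Transformers release, and the tokenizer's chat template must accept a trailing assistant message.

```python
THOUGHT_OPEN = "<|begin_of_thought|>"


def build_prefilled_messages(question):
    """Return a chat whose last turn is a partial assistant reply."""
    return [
        {"role": "user", "content": question},
        # The model continues from this text instead of starting a new turn.
        {"role": "assistant", "content": THOUGHT_OPEN},
    ]


def render_prompt(tokenizer, messages):
    """Render the chat to a string, leaving the final assistant turn open."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        continue_final_message=True,  # do not close the assistant turn
    )


messages = build_prefilled_messages("Who are you?")
```

The string returned by `render_prompt(tokenizer, messages)` can then be tokenized and passed to `model.generate`. The generated text will not repeat the opening marker, so prepend `THOUGHT_OPEN` before parsing the output.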

Appreciation

Thank you so much to the Unsloth team for their efforts in bringing finetuning to consumer-level devices. This finetune wouldn't have been possible without their contribution.


Safetensors

Model size: 24B params
Tensor type: BF16

Model tree for seacorn/Mistral-Small-24B-Instruct-2501-Reasoner

Base model: mistralai/Mistral-Small-24B-Base-2501
Finetuned: mistralai/Mistral-Small-24B-Instruct-2501
Quantized: unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit
Finetuned (12): this model
Quantizations: 2 models
