Mistral-Small-24B-Instruct-2501-Reasoner (Experimental)
This model is a finetuned version of unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit on the open-thoughts/OpenThoughts-114k dataset, giving the model reasoning capability.
Fine Tuning Details
The base model was finetuned on the open-thoughts/OpenThoughts-114k dataset for 1 epoch, using a single RTX 4090 for approximately 71 hours.
LoRA details:
- LoRA Rank: 32
- LoRA Alpha: 16 # I know, I forgot to change this number after I changed the rank, and only realised it when I was almost done with the finetune
- Quantization: QLoRA
- Optim: adamw_8bit
- Learning rate: 2e-4
- Weight Decay: 0.01
- Learning rate scheduler type: linear
- Gradient accumulation steps: 8
- Per device train batch size: 2
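Two derived quantities follow from the settings above and may help when comparing against other runs: the LoRA scaling factor and the effective batch size. The sketch below simply does the arithmetic. It assumes the standard LoRA scaling of alpha / rank (the card does not say whether rank-stabilized LoRA was used) and a single GPU, as stated above; the variable names are illustrative only.

```python
# Values copied from the fine-tuning details above.
lora_rank = 32
lora_alpha = 16
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1  # the card mentions a single RTX 4090

# Assumption: standard LoRA scaling (alpha / rank), not rank-stabilized LoRA.
lora_scaling = lora_alpha / lora_rank

# Number of sequences that contribute to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

print(f"LoRA scaling factor: {lora_scaling}")           # 0.5
print(f"Effective batch size: {effective_batch_size}")  # 16
```

Under these assumptions the adapter update is scaled by 0.5 rather than the 1.0 or 2.0 that a matched or doubled alpha would give, which is the effect of the alpha value noted in the list.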
Prompting Format
Recommended system prompt:
Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:
The model's response will be in the format of:
<|begin_of_thought|>
...
<|end_of_thought|>
<|begin_of_solution|>
...
<|end_of_solution|>
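Because the response uses fixed delimiters, the two sections can be separated with a small parser. The following is a minimal sketch using only the Python standard library; the function name `split_response` and the example text are made up for illustration and are not part of the model or any library.

```python
import re

# Patterns for the two sections described above; DOTALL lets "." span newlines.
THOUGHT_RE = re.compile(
    r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL
)
SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL
)


def split_response(text):
    """Return (thought, solution); an entry is None if its tags are missing."""
    thought = THOUGHT_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return (
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )


# Hypothetical response text, used only to exercise the parser.
example = (
    "<|begin_of_thought|>\n"
    "2 + 2 is basic addition.\n\n"
    "The sum is 4.\n"
    "<|end_of_thought|>\n"
    "<|begin_of_solution|>\n"
    "4\n"
    "<|end_of_solution|>"
)

thought, solution = split_response(example)
print(solution)
```

Returning `None` for a missing section makes it easy to detect truncated generations, for example when the token limit is reached before the solution section is closed.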
If you're using a program such as LM Studio, you may need to change the default prompt template to Llama 3, as highlighted here.
If you decide not to use the recommended system prompt, you can prefix the model's response with <|begin_of_thought|> to force the model into reasoning mode.
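One way to apply the prefix when calling the model through a plain text-completion interface is to append the tag to the already-rendered prompt and then re-attach it to the completion before parsing. This is only a sketch: the helper names are hypothetical, the prompt string is a placeholder, and it assumes the backend returns just the continuation without repeating the prefilled tag.

```python
THOUGHT_OPEN = "<|begin_of_thought|>"


def prefill_thought(rendered_prompt):
    """Append the opening tag so generation continues inside the Thought section."""
    return rendered_prompt + THOUGHT_OPEN


def restore_thought(completion):
    """Re-attach the prefilled tag unless the completion already starts with it."""
    if completion.lstrip().startswith(THOUGHT_OPEN):
        return completion
    return THOUGHT_OPEN + completion


# Placeholder for a prompt rendered by your chat template. With Transformers this
# string would typically come from
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True).
rendered = "<rendered chat prompt ending at the assistant turn>"
prompt = prefill_thought(rendered)

# Hypothetical continuation returned by the model for the prefilled prompt.
completion = "\nLet me think step by step.\n<|end_of_thought|>"
full_response = restore_thought(completion)
print(full_response)
```

Restoring the tag keeps the response in the same shape as one produced with the system prompt, so the same downstream parsing can be reused.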
Appreciation
Thank you so much to the Unsloth team for their efforts in bringing finetuning to consumer-level devices. This finetune wouldn't have been possible without their contribution.
Model tree for seacorn/Mistral-Small-24B-Instruct-2501-Reasoner
Base model
mistralai/Mistral-Small-24B-Base-2501