Instructions to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm
    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent
    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm
    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup
    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF
Run Hermes
hermes
- MLX LM
How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm
    # Interactive chat REPL
    mlx_lm.chat --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm
    # Start the server (listens on port 8080 unless --port is given)
    mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
    # Call the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
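The same chat-completions call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL assumes the server is on 127.0.0.1:8080 (change it to match whatever `--host`/`--port` the server was started with), the helper names `build_chat_request` and `chat` are made up for this example, and the response is assumed to follow the usual OpenAI chat-completions shape.

```python
import json
import urllib.request

# Assumption: address of the local mlx_lm.server instance; adjust as needed.
BASE_URL = "http://127.0.0.1:8080/v1"
MODEL_ID = "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. The generous timeout is a guess to allow for slow first-token latency on a large model.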
model.safetensors.index.json?
Hi,
I'm new to running local models, so I apologize if I'm wasting time, but isn't an index required to run this model? I tried both Exo and Inferencer, with distributed compute across two 512GB Mac Studios, with no luck.
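For anyone checking the same thing on a local download, here is a minimal sketch that reports whether `model.safetensors.index.json` is present and, if it is, whether every shard it names is on disk. It assumes the usual sharded-safetensors layout, where the index has a `weight_map` from tensor names to shard filenames. The function name `check_sharded_checkpoint` is made up for this example, and it does not answer whether Exo, Inferencer, or MLX actually require the index.

```python
import json
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"


def check_sharded_checkpoint(model_dir):
    """Return (has_index, missing_shards) for a local model directory.

    has_index is False when the index file is absent; missing_shards lists
    shard files named in the index that are not present in the directory.
    """
    model_dir = Path(model_dir)
    index_path = model_dir / INDEX_NAME
    if not index_path.is_file():
        return False, []

    with index_path.open() as f:
        index = json.load(f)

    # "weight_map" maps each tensor name to the shard file that stores it.
    shards = sorted(set(index.get("weight_map", {}).values()))
    missing = [name for name in shards if not (model_dir / name).is_file()]
    return True, missing
```

Calling `check_sharded_checkpoint("/path/to/model")` (a placeholder path) returning `(False, [])` means the directory has no index at all, while `(True, [...])` with a non-empty list points to an incomplete download.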
Which version of Inferencer did you try running it on?
This is just an observation comment, not technical:
I can't say for certain, but I think one thing that was getting in my way was the network settings of my Macs. I was not being as pragmatic as I could be when I was trying to hunt down what my problems were. So I changed my settings to "Using DHCP with Manual Address" instead of automatic DHCP. I'm pretty sure this helped me with both Exo and Inferencer.
I was using two 512GB Mac Studios to distribute the models across. I successfully distributed both the models from Inferencer (the ones it has quantized) and other models from Hugging Face.
I was then able to go further and connect to the distributed models and do inferencing from a third M2 Mac Ultra.
While it's only a comment, at least you know someone's achieved what you're looking to do, which I think is helpful anyway. At least I hope so.
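If the network setup is the suspect, a quick way to rule out basic connectivity is to test whether one Mac can open a TCP connection to the other. The sketch below is generic and uses only the standard library. It is not specific to Exo or Inferencer, and the function name `can_connect` is made up for this example. Which host and port to probe depends on your own setup and is not stated in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("192.168.1.20", 8080)` (a hypothetical address and port) returning False from one machine but True from the other suggests an addressing or firewall issue rather than a problem with the model files.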
Which version of Inferencer did you try running it on?
Latest version; this was just a few hours ago.
Thank you; I am going to keep slamming my head against the wall and (hopefully) it will work <3
Can you provide a screenshot of the error you're seeing?