Instructions to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Notebooks
Google Colab
Kaggle
Local Apps
LM Studio

Pi

How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF

Run Hermes

hermes

MLX LM

How to use inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
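The same request can be sent from Python using only the standard library. This is a minimal sketch, not part of the official snippets: the helper names `build_chat_request` and `chat` are hypothetical, and the base URL you pass in is an assumption that must match the host and port `mlx_lm.server` is actually listening on.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_message: str, timeout: float = 600.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    # Large models can take a while to answer, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server; the base URL is an assumption):
# print(chat("http://localhost:8080/v1",
#            "inferencerlabs/MiMo-V2.5-Pro-MLX-4.3bit-INF", "Hello"))
```

Keeping request construction separate from the network call makes it easy to inspect the URL and payload before anything is sent.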

model.safetensors.index.json?

by jeweled - opened 8 days ago

Discussion

jeweled

8 days ago

edited 8 days ago

Hi,

I'm new to running local models, so I apologize if I'm wasting time, but isn't an index required to run this model? I tried both Exo and Inferencer with distributed compute across two 512GB Mac Studios, with no luck.
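For context on what such an index contains: in sharded Hugging Face checkpoints, `model.safetensors.index.json` is a small JSON file with a `weight_map` from tensor names to shard filenames plus a `metadata.total_size` field. The sketch below rebuilds one from a local folder of shards by reading only each file's header. It is an illustration, not a confirmed fix: the thread does not establish whether this repository needs an index or whether Exo or Inferencer require one, and the function names are hypothetical.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading any weights."""
    with path.open("rb") as f:
        # The file begins with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of JSON.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def build_index(model_dir: str) -> dict:
    """Map every tensor name to the shard file that stores it."""
    weight_map = {}
    total_size = 0
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        header = read_safetensors_header(shard)
        for name, info in header.items():
            if name == "__metadata__":  # optional metadata entry, not a tensor
                continue
            begin, end = info["data_offsets"]
            weight_map[name] = shard.name
            total_size += end - begin  # tensor size in bytes
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}


# Example (the path is a placeholder for a locally downloaded copy of the repo):
# index = build_index("path/to/model")
# Path("path/to/model", "model.safetensors.index.json").write_text(
#     json.dumps(index, indent=2))
```

Because only the headers are read, this runs quickly even on checkpoints that are hundreds of gigabytes in size.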

inferencerlabs

Owner 8 days ago

Which version of Inferencer did you try running it on?

phuket2

8 days ago

This is just an observation, not a technical answer:
I can't say for certain, but I think one thing that was getting in my way was the network settings on my Macs. I was not being as pragmatic as I could have been when I was trying to hunt down my problems. So I changed the setting to "Using DHCP with manual address" instead of automatic DHCP. I'm pretty sure this helped me with both Exo and Inferencer.
I was using two 512GB Mac Studios to distribute the models across. I successfully distributed both the models from Inferencer (the ones it has quantized) and other models from Hugging Face.
I was then able to go further, connecting to the distributed models and running inference from a third Mac, an M2 Ultra.

While it's only a comment, at least you know someone's achieved what you're looking to do, which I think is helpful anyway. At least I hope so.
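If fixed addresses are the suspect, a quick way to confirm that one Mac can reach another before starting a distributed run is a plain TCP connection test. This is a generic sketch: the address and port in the example are placeholders, since the thread does not say which ports Exo or Inferencer use, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (placeholder address and port; substitute the other Mac's manual
# address and whatever port the serving app listens on):
# print(can_connect("192.168.1.50", 8080))
```

A `False` result points at addressing, firewall, or service problems rather than at the model files themselves.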

jeweled

8 days ago

> Which version of Inferencer did you try running it on?

Latest version; this was just a few hours ago.

> This is just an observation, not a technical answer:
> I can't say for certain, but I think one thing that was getting in my way was the network settings on my Macs. I was not being as pragmatic as I could have been when I was trying to hunt down my problems. So I changed the setting to "Using DHCP with manual address" instead of automatic DHCP. I'm pretty sure this helped me with both Exo and Inferencer.
> I was using two 512GB Mac Studios to distribute the models across. I successfully distributed both the models from Inferencer (the ones it has quantized) and other models from Hugging Face.
> I was then able to go further, connecting to the distributed models and running inference from a third Mac, an M2 Ultra.
>
> While it's only a comment, at least you know someone's achieved what you're looking to do, which I think is helpful anyway. At least I hope so.

Thank you; I am going to keep slamming my head against the wall and (hopefully) it will work <3

inferencerlabs

Owner 8 days ago

Can you provide a screenshot of the error you're seeing?
