Instructions to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
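The models.json entry for Pi can also be produced from a script, which avoids hand-editing JSON. The following is a minimal sketch that assumes the file location and schema shown in the Pi configuration step (~/.pi/agent/models.json). The helper names `pi_provider_config` and `write_pi_config` are hypothetical, and merging into existing providers rather than overwriting the file is an assumption about how you may want to manage it.

```python
import json
from pathlib import Path

MODEL_ID = "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"


def pi_provider_config(model_id=MODEL_ID, base_url="http://localhost:8080/v1"):
    """Return the "mlx-lm" provider entry from the Pi step as a dict."""
    return {
        "providers": {
            "mlx-lm": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


def write_pi_config(path, config):
    """Merge `config` into the JSON file at `path` (hypothetical helper).

    Providers already present in the file are kept; only the keys
    in `config["providers"]` are added or replaced.
    """
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing.setdefault("providers", {}).update(config["providers"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(existing, indent=2) + "\n")
    return existing


if __name__ == "__main__":
    # Print the entry for inspection; nothing is written to disk here.
    print(json.dumps(pi_provider_config(), indent=2))
```

To write the file, call `write_pi_config("~/.pi/agent/models.json", pi_provider_config())` once the printed entry looks right.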
- Hermes Agent
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF
Run Hermes
hermes
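A model of this size can take a while to load, so it may help to confirm the local server is answering before launching an agent against it. The sketch below polls the server's model listing. It assumes that mlx_lm.server exposes the OpenAI-style `GET /v1/models` route at the base URL used in the Hermes configuration; that route, and the helper names `models_url`, `server_ready` and `wait_for_server`, are assumptions made for illustration and should be checked against your installed version.

```python
import json
import time
import urllib.error
import urllib.request


def models_url(base_url):
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def server_ready(base_url="http://127.0.0.1:8080/v1", timeout=2.0):
    """Return True if the server answers the listing request with valid JSON."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            ok = resp.status == 200
            json.load(resp)  # raises ValueError if the body is not JSON
            return ok
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


def wait_for_server(base_url="http://127.0.0.1:8080/v1", attempts=30, delay=2.0):
    """Poll until the server is ready or the attempts run out."""
    for _ in range(attempts):
        if server_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` before running `hermes` gives a simple yes/no answer on whether the endpoint is reachable.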
- MLX LM
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
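For scripted use, a similar multi-turn chat can be approximated from Python with the same `load` and `generate` calls used in the MLX snippet. This is a rough sketch, not how `mlx_lm.chat` itself is implemented: it re-encodes the whole conversation on every turn instead of reusing a prompt cache, and the names `add_turn` and `chat_loop`, as well as the `max_tokens=512` default, are illustrative choices.

```python
MODEL_ID = "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"


def add_turn(history, role, content):
    """Return a new history list with one chat message appended."""
    return history + [{"role": role, "content": content}]


def chat_loop(model_id=MODEL_ID, max_tokens=512):
    """Minimal multi-turn loop; an empty line or EOF ends the session."""
    # Imported lazily so add_turn can be used without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)
    history = []
    while True:
        try:
            user = input("> ").strip()
        except EOFError:
            break
        if not user:
            break
        history = add_turn(history, "user", user)
        # The full history is re-templated each turn (simple, but not cached).
        prompt = tokenizer.apply_chat_template(
            history, add_generation_prompt=True
        )
        reply = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
        print(reply)
        history = add_turn(history, "assistant", reply)
```

Run `chat_loop()` on a machine with mlx-lm installed to start the session.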
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (listens on port 8080 by default)
    mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"

    # Calling the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
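The same chat completion request can be sent from Python using only the standard library. This sketch assumes the server listens on port 8080, matching the base URLs in the Pi and Hermes configurations; pass a different `base_url` if you started the server with another `--port`. The function names `build_chat_request` and `chat` are hypothetical, and the response is assumed to follow the usual OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"


def build_chat_request(content, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content, **kwargs):
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(content, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply.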
Commit History
Update chat_template.jinja 82db41e verified
Update chat_template.jinja 915592b verified
Upload model file 920b1c7 verified
Upload model file 8d89a47 verified
Upload model file 69c11c8 verified
Upload model file 4d0f1f8 verified
Upload model file 96c6b12 verified
Upload model file 5597f1d verified
Upload model file 674f420 verified
Upload model file 1619a96 verified
Upload model file fa3cca5 verified
Upload model file ea8fb3b verified
Upload model file 3db323e verified
Upload model file 69ff009 verified
Upload model file eb86fe4 verified
Squash history to free storage 7296cd4
History Squash committed on