Instructions to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX:
```python
# Make sure mlx-lm is installed:
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server (mlx_lm.server listens on port 8080 by default)
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
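The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library and assumes the server started above is listening on `http://localhost:8080/v1` (the base URL the Pi and Hermes configs also point at) and returns an OpenAI-style response body. `build_chat_request` and `chat` are hypothetical helper names, not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"


def build_chat_request(messages, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, base_url=BASE_URL, model=MODEL_ID, timeout=600):
    """Send the request and return the first choice's message content."""
    req = build_chat_request(messages, base_url, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat([{"role": "user", "content": "Hello"}]))
```

Keeping request construction separate from sending makes it easy to inspect or log the exact payload without a running server.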
Update chat_template.jinja
chat_template.jinja (changed, +58 -8)
```diff
@@ -4,16 +4,41 @@
 {{- "Reasoning Effort: Absolute maximum with no shortcuts permitted.\nYou MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.\nExplicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.\n\n" -}}
 {%- endif -%}
 
+{#- System Prompt -#}
+{%- if messages|length > 0 and messages[0]['role'] == 'system' -%}
+{{- messages[0]['content'] -}}
+{%- endif -%}
+
+{#- Tools Template -#}
+{%- if tools -%}
+{{- "\n\n## Tools\n\nYou have access to a set of tools to help answer the user's question. You can invoke tools by writing a \"<|DSML|tool_calls>\" block like the following:\n\n<|DSML|tool_calls>\n<|DSML|invoke name=\"$TOOL_NAME\">\n<|DSML|parameter name=\"$PARAMETER_NAME\" string=\"true|false\">$PARAMETER_VALUE</|DSML|parameter>\n...\n</|DSML|invoke>\n<|DSML|invoke name=\"$TOOL_NAME2\">\n...\n</|DSML|invoke>\n</|DSML|tool_calls>\n\nString parameters should be specified as is and set `string=\"true\"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string=\"false\"`.\n\nIf thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response.\n\nOtherwise, output directly after </think> with tool calls or final response.\n\n### Available Tool Schemas\n\n" -}}
+{%- for tool in tools -%}
+{{- tool | tojson -}}{{- "\n" -}}
+{%- endfor -%}
+{{- "\nYou MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls." -}}
+{%- endif -%}
+
 {%- for message in messages -%}
-{%- if message['role'] == '
-{{-
-
-
-
+{%- if message['role'] == 'user' or message['role'] == 'developer' -%}
+{{- "<|User|>" -}}
+
+{#- 2a. Handle content blocks if preprocessed (for tool results merged into user) -#}
+{%- if message['content_blocks'] -%}
+{%- for block in message['content_blocks'] -%}
+{%- if block['type'] == 'text' -%}
+{{- block['text'] -}}
+{%- elif block['type'] == 'tool_result' -%}
+{{- "<tool_result>" ~ block['content'] ~ "</tool_result>" -}}
+{%- endif -%}
+{%- if not loop.last -%}{{- "\n\n" -}}{%- endif -%}
+{%- endfor -%}
+{%- else -%}
+{{- message['content'] -}}
 {%- endif -%}
 
-{
-
+{#- Fallback for raw OpenAI-style tool messages not pre-processed into user blocks -#}
+{%- elif message['role'] == 'tool' -%}
+{{- "<|User|><tool_result>" ~ message['content'] ~ "</tool_result>" -}}
 
 {%- elif message['role'] == 'assistant' -%}
 {{- "<|Assistant|>" -}}
@@ -22,7 +47,32 @@
 {%- else -%}
 {{- "</think>" -}}
 {%- endif -%}
-
+
+{%- if message['content'] -%}
+{{- message['content'] -}}
+{%- endif -%}
+
+{#- 3. Handle Assistant Tool Calls using DSML tags -#}
+{%- if message['tool_calls'] -%}
+{{- "\n\n<|DSML|tool_calls>\n" -}}
+{%- for tool_call in message['tool_calls'] -%}
+{{- "<|DSML|invoke name=\"" ~ tool_call['function']['name'] ~ "\">\n" -}}
+
+{#- Hugging Face automatically parses JSON string arguments into dictionaries. -#}
+{%- set args = tool_call['function']['arguments'] -%}
+{%- for key, value in args.items() -%}
+{%- if value is string -%}
+{{- "<|DSML|parameter name=\"" ~ key ~ "\" string=\"true\">" ~ value ~ "</|DSML|parameter>\n" -}}
+{%- else -%}
+{{- "<|DSML|parameter name=\"" ~ key ~ "\" string=\"false\">" ~ (value | tojson) ~ "</|DSML|parameter>\n" -}}
+{%- endif -%}
+{%- endfor -%}
+
+{{- "</|DSML|invoke>\n" -}}
+{%- endfor -%}
+{{- "</|DSML|tool_calls>" -}}
+{%- endif -%}
+
 {%- if not loop.last -%}
 {{- eos_token -}}
 {%- endif -%}
```
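To make the new template's serialization rules concrete, here is a minimal plain-Python sketch (no Jinja) that mirrors two of the added branches: the DSML rendering of assistant `tool_calls`, and the `user`/`developer`/`tool` rendering including `content_blocks`. The function names are hypothetical. It assumes the template's `tojson` filter behaves like `json.dumps(value, ensure_ascii=False)` with no key sorting, which is how the filter that Hugging Face transformers registers for chat templates is understood to work here; treat it as an approximation of the template, not a replacement for it.

```python
import json


def render_tool_calls(tool_calls):
    """Serialize assistant tool calls the way the template's DSML branch does."""
    out = "\n\n<|DSML|tool_calls>\n"
    for call in tool_calls:
        fn = call["function"]
        out += '<|DSML|invoke name="' + fn["name"] + '">\n'
        # Like the template, this assumes `arguments` is already a dict;
        # a raw JSON string would need json.loads() first.
        for key, value in fn["arguments"].items():
            if isinstance(value, str):
                # Strings are emitted as-is with string="true".
                out += f'<|DSML|parameter name="{key}" string="true">{value}</|DSML|parameter>\n'
            else:
                # Other types are JSON-encoded with string="false".
                # Assumption: `tojson` ~ json.dumps(..., ensure_ascii=False).
                encoded = json.dumps(value, ensure_ascii=False)
                out += f'<|DSML|parameter name="{key}" string="false">{encoded}</|DSML|parameter>\n'
        out += "</|DSML|invoke>\n"
    return out + "</|DSML|tool_calls>"


def render_user_turn(message):
    """Render a user/developer or tool message like the template's first branches."""
    role = message["role"]
    if role in ("user", "developer"):
        blocks = message.get("content_blocks")
        if not blocks:
            return "<|User|>" + message["content"]
        parts = []
        for block in blocks:
            if block["type"] == "text":
                parts.append(block["text"])
            elif block["type"] == "tool_result":
                parts.append("<tool_result>" + block["content"] + "</tool_result>")
            else:
                # Unknown block types emit nothing but still get a separator.
                parts.append("")
        # The template writes "\n\n" after every block except the last.
        return "<|User|>" + "\n\n".join(parts)
    if role == "tool":
        # Fallback for raw OpenAI-style tool messages.
        return "<|User|><tool_result>" + message["content"] + "</tool_result>"
    raise ValueError(f"role {role!r} is not handled by this sketch")
```

For example, a hypothetical `get_weather` call with arguments `{"city": "Paris", "days": 3}` renders `city` with `string="true"` and `days` as the JSON value `3` with `string="false"`, matching the rule stated in the template's tools prompt.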
|