Instructions for using inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with libraries and local apps.

Libraries

How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF

Run Hermes

hermes

MLX LM

How to use inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
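
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server listens on port 8080 (the base URL used in the Pi and Hermes configuration above) and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "inferencerlabs/DeepSeek-V4-Flash-MLX-2.8bit-INF"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only works while mlx_lm.server is running locally.
    print(chat("Hello"))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running server.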

inferencerlabs committed 4 days ago

Commit 82db41e (verified), 1 parent: 915592b

Update chat_template.jinja

Files changed (1)

chat_template.jinja +58 -8

@@ -4,16 +4,41 @@
     {{- "Reasoning Effort: Absolute maximum with no shortcuts permitted.\nYou MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.\nExplicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.\n\n" -}}
 {%- endif -%}
 {%- for message in messages -%}
-    {%- if message['role'] == 'system' -%}
-        {{- message['content'] -}}
-        {%- if tools -%}
-            {{- "\n\n## Tools\n\n(Standard Tool Header logic here...)" -}}
-            {%- for tool in tools -%}{{- tool | tojson -}}{{- "\n" -}}{%- endfor -%}
         {%- endif -%}
-    {%- elif message['role'] == 'user' or message['role'] == 'developer' -%}
-        {{- "<｜User｜>" -}}{{- message['content'] -}}
     {%- elif message['role'] == 'assistant' -%}
         {{- "<｜Assistant｜>" -}}
@@ -22,7 +47,32 @@
         {%- else -%}
             {{- "</think>" -}}
         {%- endif -%}
-        {{- message['content'] -}}
         {%- if not loop.last -%}
             {{- eos_token -}}
         {%- endif -%}

     {{- "Reasoning Effort: Absolute maximum with no shortcuts permitted.\nYou MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.\nExplicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked.\n\n" -}}
 {%- endif -%}
+{#- System Prompt -#}
+{%- if messages|length > 0 and messages[0]['role'] == 'system' -%}
+    {{- messages[0]['content'] -}}
+{%- endif -%}
+{#- Tools Template -#}
+{%- if tools -%}
+    {{- "\n\n## Tools\n\nYou have access to a set of tools to help answer the user's question. You can invoke tools by writing a \"<｜DSML｜tool_calls>\" block like the following:\n\n<｜DSML｜tool_calls>\n<｜DSML｜invoke name=\"$TOOL_NAME\">\n<｜DSML｜parameter name=\"$PARAMETER_NAME\" string=\"true|false\">$PARAMETER_VALUE</｜DSML｜parameter>\n...\n</｜DSML｜invoke>\n<｜DSML｜invoke name=\"$TOOL_NAME2\">\n...\n</｜DSML｜invoke>\n</｜DSML｜tool_calls>\n\nString parameters should be specified as is and set `string=\"true\"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string=\"false\"`.\n\nIf thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response.\n\nOtherwise, output directly after </think> with tool calls or final response.\n\n### Available Tool Schemas\n\n" -}}
+    {%- for tool in tools -%}
+        {{- tool | tojson -}}{{- "\n" -}}
+    {%- endfor -%}
+    {{- "\nYou MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls." -}}
+{%- endif -%}
 {%- for message in messages -%}
+    {%- if message['role'] == 'user' or message['role'] == 'developer' -%}
+        {{- "<｜User｜>" -}}
+        {#- 2a. Handle content blocks if preprocessed (for tool results merged into user) -#}
+        {%- if message['content_blocks'] -%}
+            {%- for block in message['content_blocks'] -%}
+                {%- if block['type'] == 'text' -%}
+                    {{- block['text'] -}}
+                {%- elif block['type'] == 'tool_result' -%}
+                    {{- "<tool_result>" ~ block['content'] ~ "</tool_result>" -}}
+                {%- endif -%}
+                {%- if not loop.last -%}{{- "\n\n" -}}{%- endif -%}
+            {%- endfor -%}
+        {%- else -%}
+            {{- message['content'] -}}
         {%- endif -%}
+    {#- Fallback for raw OpenAI-style tool messages not pre-processed into user blocks -#}
+    {%- elif message['role'] == 'tool' -%}
+        {{- "<｜User｜><tool_result>" ~ message['content'] ~ "</tool_result>" -}}
     {%- elif message['role'] == 'assistant' -%}
         {{- "<｜Assistant｜>" -}}
         {%- else -%}
             {{- "</think>" -}}
         {%- endif -%}
+        {%- if message['content'] -%}
+            {{- message['content'] -}}
+        {%- endif -%}
+        {#- 3. Handle Assistant Tool Calls using DSML tags -#}
+        {%- if message['tool_calls'] -%}
+            {{- "\n\n<｜DSML｜tool_calls>\n" -}}
+            {%- for tool_call in message['tool_calls'] -%}
+                {{- "<｜DSML｜invoke name=\"" ~ tool_call['function']['name'] ~ "\">\n" -}}
+                {#- Hugging Face automatically parses JSON string arguments into dictionaries. -#}
+                {%- set args = tool_call['function']['arguments'] -%}
+                {%- for key, value in args.items() -%}
+                    {%- if value is string -%}
+                        {{- "<｜DSML｜parameter name=\"" ~ key ~ "\" string=\"true\">" ~ value ~ "</｜DSML｜parameter>\n" -}}
+                    {%- else -%}
+                        {{- "<｜DSML｜parameter name=\"" ~ key ~ "\" string=\"false\">" ~ (value | tojson) ~ "</｜DSML｜parameter>\n" -}}
+                    {%- endif -%}
+                {%- endfor -%}
+                {{- "</｜DSML｜invoke>\n" -}}
+            {%- endfor -%}
+            {{- "</｜DSML｜tool_calls>" -}}
+        {%- endif -%}
         {%- if not loop.last -%}
             {{- eos_token -}}
         {%- endif -%}
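
To make the new tool-call formatting easier to follow, here is a rough plain-Python sketch of what the added assistant and tool branches appear to emit. It is an approximation for illustration, not the template itself: `render_tool_calls` and `render_tool_result` are hypothetical names, `json.dumps` stands in for Jinja's `tojson` filter (escaping may differ), and the handling of string-encoded arguments is an assumption added for convenience.

```python
import json

DSML = "｜DSML｜"  # tag prefix copied from the template


def render_tool_calls(tool_calls):
    """Approximate the template's assistant tool-call branch (hypothetical helper)."""
    parts = [f"\n\n<{DSML}tool_calls>\n"]
    for call in tool_calls:
        name = call["function"]["name"]
        args = call["function"]["arguments"]
        if isinstance(args, str):
            # Assumption: decode JSON-string arguments; the template itself expects a dict.
            args = json.loads(args)
        parts.append(f'<{DSML}invoke name="{name}">\n')
        for key, value in args.items():
            if isinstance(value, str):
                # Strings are emitted verbatim with string="true".
                parts.append(
                    f'<{DSML}parameter name="{key}" string="true">{value}</{DSML}parameter>\n'
                )
            else:
                # Everything else is JSON-encoded with string="false".
                encoded = json.dumps(value, ensure_ascii=False)
                parts.append(
                    f'<{DSML}parameter name="{key}" string="false">{encoded}</{DSML}parameter>\n'
                )
        parts.append(f"</{DSML}invoke>\n")
    parts.append(f"</{DSML}tool_calls>")
    return "".join(parts)


def render_tool_result(content):
    """Approximate the fallback branch for raw 'tool' role messages."""
    return f"<｜User｜><tool_result>{content}</tool_result>"


if __name__ == "__main__":
    calls = [{"function": {"name": "get_weather",
                           "arguments": {"city": "Paris", "days": 3}}}]
    print(render_tool_calls(calls))
    print(render_tool_result('{"temp_c": 21}'))
```

The tool name `get_weather` and its parameters are made up for the example. For the authoritative output, render the actual template through `tokenizer.apply_chat_template` with a `tools` argument.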