---
library_name: mlx-vlm
base_model: Qwen/Qwen3.5-35B-A3B
tags:
  - mlx
  - mlx-vlm
  - vision-language-model
  - qwen
  - fine-tuned
license: other
pipeline_tag: image-text-to-text
---

# FineVine Qwen3.5-35B-A3B Vision MLX V18 (6bit)

FineVine Qwen3.5-35B-A3B V18 is a fine-tuned version of `Qwen/Qwen3.5-35B-A3B` that spends fewer thinking tokens per answer. This makes it more efficient for local inference, especially on Apple silicon laptops. Significant effort also went into reducing looping behavior.

This repository contains the **6-bit** MLX variant of FineVine Qwen3.5-35B-A3B Vision V18.

## Other Quantizations

- [4-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-4bit)
- [6-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-6bit) (this repository)
- [8-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-8bit)

## Release Focus

This release focuses on:

- less looping overall, especially during function calls and reasoning
- significantly fewer tokens needed to reach an answer
- better prompt adherence on structured tasks
- better handling of contradictions, underdetermined prompts, and no-solution cases

This release is published in several MLX quantizations so you can choose the one that fits your hardware. It is well suited to Macs with less than 50 GB of unified memory.

## Base Model

This model is based on:
- `Qwen/Qwen3.5-35B-A3B`

## What Improved

Compared with the base model, this release was tuned to address several behavioral problems at once:

- Reduced overthinking on easy and medium prompts
- Much less looping and repeated branch restarts on quantized versions
- Better prompt adherence on structured coding and reasoning tasks
- Better balance between concise answers and still giving enough detail when helpful
- Better formatting for practical answers, including selective use of lists, tables, and diagrams
- Better handling of valid logic, contradictions, and insufficient-information cases
- Better behavior on harder coding prompts, covering more cases and branches before writing code
- Stronger tool-calling behavior on structured tasks and tool chains
- More stable local assistant behavior overall, making it a strong fit when you need a local model that answers quickly

In practice, the model tends to answer more directly, with less wasted reasoning, while still remaining capable on harder structured tasks.

## Training Data

Training data sources included:

- custom hand-written and hand-edited datasets (60%)
- manually reviewed synthetic traces (20%)
- high-quality examples and reasoning traces derived from models including Claude Opus and GLM-5 (20%)

## Quantizations

This release includes three MLX-VLM quantization options:

- `mlx-4bit`
- `mlx-6bit`
- `mlx-8bit`
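To put the quantization choice in rough numbers, the sketch below estimates how much memory the weights alone occupy at each bit width. It is a back-of-the-envelope calculation under stated assumptions, not a measurement of these files. It treats the nominal 35B as the total parameter count (the A3B suffix refers to roughly 3B parameters active per token, but all experts must still be resident in memory), assumes every weight is quantized at the same width, and assumes MLX-style affine quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group. The helper name `estimate_weight_gb` is hypothetical.

```python
# Back-of-the-envelope estimate of quantized weight memory (hypothetical helper).
# Assumptions, not measurements of this release:
#   - 35e9 parameters in total (the nominal "35B"); all MoE experts stay in memory
#   - every weight is quantized at the same bit width
#   - MLX-style affine quantization: group_size=64 with one 16-bit scale and one
#     16-bit bias per group, i.e. 32 extra bits per 64 weights (0.5 bits/weight)
# KV cache, activations, runtime overhead and the OS are NOT included.


def estimate_weight_gb(num_params: float, bits: int, group_size: int = 64) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    overhead_bits_per_weight = 32 / group_size  # 16-bit scale + 16-bit bias per group
    total_bytes = num_params * (bits + overhead_bits_per_weight) / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for bits in (4, 6, 8):
        print(f"{bits}-bit: ~{estimate_weight_gb(35e9, bits):.1f} GB of weights")
```

Under these assumptions the 4-, 6- and 8-bit weights come to roughly 19.7, 28.4 and 37.2 GB. The rest of the machine's memory has to hold the KV cache (which grows with context length), the runtime, and everything else running on the Mac, so the larger quantizations leave noticeably less headroom on smaller machines.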

## Repository Layout

```text
FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18/
  README.md
  mlx-4bit/
  mlx-6bit/
  mlx-8bit/
```

## Intended Use

This model is meant for:

- local laptop use
- direct-answer assistant tasks
- strong tool-calling and structured tool-use workflows in apps such as Osaurus and OpenCode
- coding help

## Important Note

This model is **not ideal as a general chatbot or roleplaying model**.

If you want a warmer, more open-ended, highly conversational chat style, or strong roleplay behavior, this release is probably not the best fit.

## Suggested Sampling Parameters

The following sampling settings are suggested:

- `temperature = 0.6`
- `top_p = 0.95`
- `top_k = 20`
- `min_p = 0.0`
- `presence_penalty = 0.0`
- `repetition_penalty = 1.0`
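For readers less familiar with these settings, the sketch below shows one common way temperature, `top_k`, `top_p` and `min_p` reshape the next-token distribution, with defaults matching the suggested values. It is an illustrative pure-Python model, not the implementation used by mlx-vlm, LM Studio or any other runtime; those may apply the filters in a different order or handle ties differently. The function name `filter_probs` is hypothetical. The two penalties are omitted because `presence_penalty = 0.0` and `repetition_penalty = 1.0` are the neutral values that leave the logits unchanged.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95, min_p=0.0):
    """Hypothetical sampler filter: returns {token_index: probability}.

    Defaults mirror the suggested settings in this card. Real runtimes may
    apply these steps in a different order and handle ties differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")

    def softmax(indices):
        # Temperature-scaled, numerically stable softmax over the given tokens;
        # restricting it to a subset is the same as renormalizing that subset.
        scaled = {i: logits[i] / temperature for i in indices}
        m = max(scaled.values())
        exps = {i: math.exp(v - m) for i, v in scaled.items()}
        z = sum(exps.values())
        return {i: e / z for i, e in exps.items()}

    probs = softmax(range(len(logits)))
    order = sorted(probs, key=probs.get, reverse=True)  # most likely first

    # top_k: keep the k most likely tokens (k <= 0 disables), then renormalize.
    if top_k > 0:
        order = order[:top_k]
        probs = softmax(order)

    # top_p: keep the smallest high-probability prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    probs = softmax(kept)

    # min_p: drop tokens below min_p * p(most likely token); 0.0 disables it.
    threshold = min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= threshold]
    return softmax(kept)
```

For example, with the suggested settings and logits `[3.0, 2.5, 1.0, 0.2, -1.0]`, the first two tokens already cover more than 95% of the probability mass, so `top_p` discards the other three and sampling happens between tokens 0 and 1 only.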


## LM Studio Note


If you want to turn off thinking in LM Studio, add this line at the top of the model's Jinja prompt template:

```jinja
{%- set enable_thinking = false %}
```