---
library_name: mlx-vlm
base_model: Qwen/Qwen3.5-35B-A3B
tags:
- mlx
- mlx-vlm
- vision-language-model
- qwen
- fine-tuned
license: other
pipeline_tag: image-text-to-text
---

# FineVine Qwen3.5-35B-A3B Vision MLX V18 (6-bit)

FineVine Qwen3.5-35B-A3B V18 is a fine-tuned version of `Qwen/Qwen3.5-35B-A3B`. It is optimized for efficient local inference by reducing thinking-token usage, especially on Apple silicon laptops. Significant effort also went into reducing looping behavior.

This repository contains the **6-bit** MLX variant of FineVine Qwen3.5-35B-A3B Vision V18.

## Other Quantizations

- [4-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-4bit)
- [6-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-6bit)
- [8-bit](https://huggingface.co/pea-tree/FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18-8bit)

This release focuses on:

- less looping overall, with a significant reduction in looping during function calls and reasoning
- a significant reduction in the tokens needed to reach an answer
- better prompt adherence on structured tasks
- better handling of contradictions, underdetermined prompts, and no-solution cases

The release is published in several MLX quantizations so you can choose the one that fits your hardware. It is ideal for Macs with less than 50 GB of memory.
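To compare the quantizations, a rough estimate of weight memory helps: it scales with the parameter count multiplied by the effective bits per weight. The sketch below is a back-of-the-envelope estimate that rests on several assumptions not stated in this card. It treats the "35B" in the model name as the total parameter count, assumes every weight is quantized at the nominal bit width, and assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias, which is MLX's default affine quantization layout. It ignores unquantized layers, the KV cache, and runtime overhead. `pick_quant` and its `reserve_gb` headroom are a hypothetical rule of thumb, not guidance from the model authors.

```python
from typing import Optional

# Assumptions (not from the model card): "35B" is the total parameter count,
# every weight is quantized, and MLX's default group size of 64 is used.
TOTAL_PARAMS = 35e9
GROUP_SIZE = 64
# Each group stores one 16-bit scale and one 16-bit bias: 32 / 64 = 0.5 extra bits per weight.
OVERHEAD_BITS = 2 * 16 / GROUP_SIZE


def weight_gb(bits: int, params: float = TOTAL_PARAMS) -> float:
    """Approximate weight storage in decimal gigabytes for a given bit width."""
    effective_bits = bits + OVERHEAD_BITS
    return params * effective_bits / 8 / 1e9


def pick_quant(memory_gb: float, reserve_gb: float = 12.0) -> Optional[int]:
    """Hypothetical rule of thumb: return the largest bit width whose weights fit
    after setting aside reserve_gb for macOS, the KV cache, and other apps."""
    budget = memory_gb - reserve_gb
    for bits in (8, 6, 4):
        if weight_gb(bits) <= budget:
            return bits
    return None  # none of the three quantizations fits the budget


if __name__ == "__main__":
    for bits in (4, 6, 8):
        print(f"{bits}-bit: ~{weight_gb(bits):.1f} GB of weights")
    for ram in (32, 48, 64):
        print(f"{ram} GB Mac -> {pick_quant(ram)}")
```

Under these assumptions, the weights come to roughly 19.7 GB, 28.4 GB, and 37.2 GB for the 4-, 6-, and 8-bit variants. Actual memory use will be higher once the context and runtime are loaded.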
## Base Model

This model is based on:

- `Qwen/Qwen3.5-35B-A3B`

## What Improved

Compared with the base model, this release was tuned to address several behavior problems at once:

- Less overthinking on easy and medium prompts
- Much less looping and fewer repeated branch restarts in the quantized versions
- Better prompt adherence on structured coding and reasoning tasks
- Better balance between concise answers and enough detail when it helps
- Better formatting for practical answers, including selective use of lists, tables, and diagrams
- Better handling of valid logic, contradictions, and insufficient-information cases
- Better behavior on harder coding prompts, with more branch coverage before writing code
- Strong tool-calling behavior on structured tasks and tool chains
- More stable local assistant behavior overall

If you need a local model that answers quickly, this fine-tune is a strong fit. In practice, the model tends to answer more directly and waste less reasoning, while remaining capable on harder structured tasks.

## Training Data

Training data sources included:

- custom hand-written and hand-edited datasets (60%)
- manually reviewed synthetic traces (20%)
- high-quality examples and reasoning traces derived from models including Claude Opus and GLM-5 (20%)

## Quantizations

This release includes three MLX-VLM quantization options:

- `mlx-4bit`
- `mlx-6bit`
- `mlx-8bit`

## Repository Layout

```text
FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18/
  README.md
  mlx-4bit/
  mlx-6bit/
  mlx-8bit/
```

## Intended Use

This model is meant for:

- local laptop use
- direct-answer assistant tasks
- tool calling and structured tool-use workflows in tools such as Osaurus and OpenCode
- coding help

## Important Note

This model is **not ideal as a general chatbot or roleplaying model**. If you want a warmer, more open-ended, highly conversational chat style, or strong roleplay behavior, this release is probably not the best fit.
## Suggested Parameters

Suggested model parameters:

- `temperature = 0.6`
- `top_p = 0.95`
- `top_k = 20`
- `min_p = 0.0`
- `presence_penalty = 0.0`
- `repetition_penalty = 1.0`

## LM Studio Note

To turn off visible thinking in LM Studio, add this line at the top of the model's Jinja prompt template:

```jinja
{%- set enable_thinking = false %}
```
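The suggested sampling parameters are easier to interpret with a concrete model of what they do to a next-token distribution. The pure-Python sketch below applies `temperature`, `top_k`, `top_p`, and `min_p` in one common order: temperature scaling, then top-k, then nucleus (top-p), then min-p. It is a simplified illustration and not the sampler used by MLX-VLM or LM Studio, which may order or implement these steps differently. The function names are hypothetical. `repetition_penalty = 1.0` and `presence_penalty = 0.0` are left out because those values leave the distribution unchanged.

```python
import math
import random
from typing import Dict, List, Optional


def _renormalize(dist: Dict[int, float]) -> Dict[int, float]:
    """Rescale probabilities so they sum to 1."""
    total = sum(dist.values())
    return {tok: p / total for tok, p in dist.items()}


def filtered_distribution(
    logits: List[float],
    temperature: float = 0.6,
    top_k: int = 20,
    top_p: float = 0.95,
    min_p: float = 0.0,
) -> Dict[int, float]:
    """Map token ids to sampling probabilities after the usual filters.
    temperature must be > 0 in this sketch (no greedy special case)."""
    # Temperature scaling + softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    probs = _renormalize({i: math.exp(x - peak) for i, x in enumerate(scaled)})

    # Token ids ordered from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # top-k: keep the k most likely tokens (k <= 0 disables this step).
    if top_k > 0:
        ranked = ranked[:top_k]
    dist = _renormalize({tok: probs[tok] for tok in ranked})

    # top-p (nucleus): keep the smallest high-probability prefix whose mass reaches top_p.
    nucleus: Dict[int, float] = {}
    cumulative = 0.0
    for tok in ranked:
        nucleus[tok] = dist[tok]
        cumulative += dist[tok]
        if cumulative >= top_p:
            break
    dist = _renormalize(nucleus)

    # min-p: drop tokens below min_p times the top token's probability (0.0 keeps all).
    threshold = min_p * max(dist.values())
    return _renormalize({tok: p for tok, p in dist.items() if p >= threshold})


def sample(dist: Dict[int, float], rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With the suggested defaults and logits `[2.0, 1.0, 0.5, -1.0, -3.0]`, only the three most likely tokens survive the `top_p = 0.95` cut. Because `min_p = 0.0`, the min-p step removes nothing.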