Instructions to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF",
	filename="*Q4_K_M.gguf",  # glob for the Q4_K_M weights; the mmproj file holds only the multimodal projector, not the language model
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
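Because this is a reasoning model, the text returned by `create_chat_completion` may contain a `<think>...</think>` trace ahead of the final answer. The sketch below shows one way to separate the two. It assumes the OpenAI-style response dict that llama-cpp-python returns and assumes the trace is delimited by `<think>` tags; `split_reasoning` and `message_text` are hypothetical helper names, not part of any library.

```python
import re

# Matches one <think>...</think> block; DOTALL lets the trace span several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def message_text(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw assistant text."""
    match = _THINK_RE.search(text)
    if match is None:
        # Assumption: some chat templates open <think> inside the prompt,
        # so only the closing tag shows up in the generated text.
        head, sep, tail = text.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

With the `llm` object from the snippet above, `split_reasoning(message_text(llm.create_chat_completion(messages=[...])))` would give the trace and the answer as two strings. Some runtimes strip or relocate the trace on their own, in which case the reasoning part simply comes back empty.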

Local Apps

llama.cpp

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M


vLLM

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
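The same request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the server started above is listening on `http://localhost:8000/v1` and returns the usual OpenAI-style response shape; `build_chat_request` and `ask` are illustrative names.

```python
import json
import urllib.request

MODEL_ID = "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF"


def build_chat_request(
    prompt: str, base_url: str = "http://localhost:8000/v1"
) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(
    prompt: str,
    base_url: str = "http://localhost:8000/v1",
    timeout: float = 600.0,
) -> str:
    """Send the prompt and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url)
    # Generous timeout: a reasoning model can take a while before it answers.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` mirrors the curl example. Passing a different `base_url` lets the same helper target another OpenAI-compatible server.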

Use Docker

docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Ollama

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Ollama:

```
ollama run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```

Unsloth Studio

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting

Pi

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
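If `~/.pi/agent/models.json` already lists other providers, editing it by hand risks overwriting them. The sketch below merges the provider entry into an existing file instead. It assumes the file is plain JSON with a top-level `providers` object as shown above; `add_provider` and `LLAMA_CPP_PROVIDER` are hypothetical names and this is not a Pi API.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [
        {"id": "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M"}
    ],
}


def add_provider(config_path: Path, name: str, provider: dict) -> dict:
    """Insert or replace one provider, keeping any other providers intact."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("providers", {})[name] = provider
    # Create ~/.pi/agent (or the given parent directory) if it is missing.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A typical call would be `add_provider(Path.home() / ".pi" / "agent" / "models.json", "llama-cpp", LLAMA_CPP_PROVIDER)`.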

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Docker Model Runner:

```
docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```

Lemonade

How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF-Q4_K_M

List all available models

lemonade list

🌟 Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill

💡 Model Introduction

Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill is a reasoning model fine-tuned on top of Qwen3.5-27B.
The model is primarily optimized through high-density reasoning distillation sourced from Gemini 3.1 Pro, while also incorporating additional reasoning traces distilled from Qwen3.5-27B and a broader Gemini 3 Pro reasoning corpus.

Through Supervised Fine-Tuning focused on structured analytical behavior, this model aims to reshape the base model’s reasoning style into a more coherent, better-organized, and higher-density Chain-of-Thought (CoT) pattern.
It is especially designed to improve decomposition, planning, abstraction, and response cleanliness on complex multi-step tasks.

🧠 Example of Learned Reasoning Scaffold

This model inherits a more structured reasoning style influenced by Gemini 3.1-style analytical planning.
Compared with more loosely exploratory reasoning patterns, this model tends to organize the problem before answering:

My Thought Process / My Analysis of the problem:

1. Restate the task and identify the true objective.
2. Abstract the problem into a higher-level reasoning frame.
3. Identify the key mechanism, failure mode, or constraint.
4. Separate likely misconceptions from the actual core issue.
5. Plan the structure of the final response.
6. Deliver a cleaner, more direct, and higher-density answer.
...

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-27B)
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA + Reasoning Distillation
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Final Model Text Only (Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill)
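To illustrate what response-only training means in the pipeline above, the sketch below masks every label up to and including the `<|im_start|>assistant\n<think>` marker, so that loss is computed only on the tokens the model is expected to generate. It is a simplified single-turn illustration, not the training code used for this model: the character-level `toy_encode` stands in for a real tokenizer, and masking the marker itself is an assumption.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default
RESPONSE_MARKER = "<|im_start|>assistant\n<think>"


def toy_encode(text: str) -> list[int]:
    """Stand-in tokenizer for illustration: one token id per character."""
    return [ord(ch) for ch in text]


def find_subsequence(haystack: list[int], needle: list[int]) -> int:
    """Return the start index of the first occurrence of needle, or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i : i + n] == needle:
            return i
    return -1


def mask_prompt_labels(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Copy input_ids to labels, ignoring everything through the marker."""
    start = find_subsequence(input_ids, marker_ids)
    if start < 0:
        # No assistant turn found: nothing in this sample contributes to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(marker_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])


sample = (
    "<|im_start|>user\nWhat is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n<think>Add the numbers.</think>4<|im_end|>"
)
labels = mask_prompt_labels(toy_encode(sample), toy_encode(RESPONSE_MARKER))
```

In `labels`, the user turn and the marker are all `-100`, while the reasoning trace and answer keep their token ids. A multi-turn conversation would need this applied to each assistant turn separately.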

📋 Stage Details

🔹 Supervised Fine-Tuning (SFT)

Objective: To inject reasoning behavior into Qwen3.5-27B and strengthen its performance on complex analytical tasks requiring decomposition and multi-step inference.
Method: The model is trained on distilled reasoning traces collected from stronger teacher-style reasoning sources, with the goal of transferring cleaner analytical structure, stronger planning habits, and more stable task-solving behavior.
Target Behavior: Compared with a standard instruct model, the tuned model is expected to respond with more deliberate reasoning organization, reduced shallow guessing, and stronger cross-domain analytical consistency.

📚 All Datasets Used

The dataset consists of multiple reasoning distillation sources:

| Dataset Name | Description / Purpose |
| --- | --- |
| Roman1111111/gemini-3.1-pro-hard-high-reasoning | Primary high-quality reasoning source used to shape structured analytical style, planning behavior, and dense CoT patterns. |
| Jackrong/Qwen3.5-reasoning-700x | Provides additional Qwen-family reasoning trajectories distilled from Qwen3.5-27B, improving style stability and complementary reasoning diversity. 🪐 (Only a small portion of this dataset was used to help avoid excessive degradation of generalization ability and mitigate catastrophic forgetting.) |
| Roman1111111/gemini-3-pro-10000x-hard-high-reasoning | A broader multi-domain reasoning corpus used to enhance coverage across mathematics, systems, science, law, medicine, finance, and adversarial reasoning tasks. |

📊 Approximate Domain Composition

| Domain | Samples | Share |
| --- | --- | --- |
| Mathematics / Logic | 3947 | 28.5% |
| Computer Science / Programming / Systems | 3019 | 21.8% |
| Security / Adversarial Reasoning | 1551 | 11.2% |
| Physics / Astronomy / Engineering | 1482 | 10.7% |
| Law / Philosophy / Humanities | 1191 | 8.6% |
| Biology / Medicine | 817 | 5.9% |
| Finance / Economics | 679 | 4.9% |
| Chemistry / Materials | 540 | 3.9% |
| Applied / Social Systems (Urban Planning, Traffic, Supply Chain, etc.) | 360 | 2.6% |
| Other | 264 | 1.9% |
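The share column follows directly from the sample counts, which sum to 13,850. A short check, using only the numbers in the table above (`domain_shares` is an illustrative helper name):

```python
# Sample counts copied from the domain composition table.
DOMAIN_SAMPLES = {
    "Mathematics / Logic": 3947,
    "Computer Science / Programming / Systems": 3019,
    "Security / Adversarial Reasoning": 1551,
    "Physics / Astronomy / Engineering": 1482,
    "Law / Philosophy / Humanities": 1191,
    "Biology / Medicine": 817,
    "Finance / Economics": 679,
    "Chemistry / Materials": 540,
    "Applied / Social Systems": 360,
    "Other": 264,
}


def domain_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each domain's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, share in domain_shares(DOMAIN_SAMPLES).items():
    print(f"{name}: {share}%")
```

Roughly half of the samples (about 50.3%) fall into the two largest domains, mathematics/logic and computer science.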

⚠️ Distillation & Task-Specific Fine-Tuning Effects: This model has been distilled and further fine-tuned on top of the base model for reasoning-oriented tasks. These techniques may improve performance on certain specialized tasks, but they can also influence the model’s generalization ability in broader scenarios and may lead to partial forgetting of some pretraining knowledge. The extent of these effects depends on factors such as the quality, scale, and distribution of the datasets used during distillation and fine-tuning. As a result, the model’s behavior may differ from the base model across different tasks or application contexts. Users are encouraged to evaluate the model according to their specific requirements before deployment. Thank you for your understanding～

🌟 Core Skills & Capabilities

Structured Analytical Reasoning: The model is optimized to first identify the real task structure before generating an answer, rather than relying on shallow immediate completion.
Improved Multi-Step Planning: It performs more reliably on tasks requiring decomposition, constraint tracking, sequential planning, and trade-off analysis.
Cross-Domain Reasoning Strength: The training corpus provides broad reasoning coverage across math, programming, systems, physics, law, medicine, finance, chemistry, and applied domains.
Security & Adversarial Awareness: A dedicated portion of the distilled data includes adversarial, attack-defense, and failure-mode reasoning tasks, improving robustness in difficult prompts.
Compact but Strong Footprint: Built on a 27B base, the model aims to deliver significantly denser reasoning behavior and cleaner analytical output than a generic base instruct model of similar size.

⚠️ Limitations & Intended Use

Hallucination Risk: Although reasoning behavior is improved, the model remains an autoregressive LLM and may still hallucinate niche facts, citations, or unverifiable real-world details.
Reasoning Style Bias: Because the model is tuned for analytical depth, it may sometimes produce longer or more structured answers than necessary for very simple prompts.
Teacher-Style Distillation Bias: Some response behaviors reflect the reasoning style of the teacher traces used during distillation, rather than purely native behavior emerging from the base model itself.
Preview Version Notice: As a relatively specialized distilled reasoning model, surrounding inference templates, prompt formatting strategies, and ecosystem integrations may still require tuning. Users may encounter occasional compatibility differences depending on runtime or deployment stack.

🙏 Acknowledgements

Special thanks to the Qwen team for the strong base architecture, and to the broader open-source ecosystem for enabling efficient reasoning distillation workflows. We also acknowledge the value of the distilled reasoning corpora derived from Gemini 3.1 Pro, Qwen3.5, and Gemini 3 Pro, which made this model possible.


GGUF

Model size: 27B params
Architecture: qwen35
Hardware compatibility: 4-bit, 8-bit
