Instructions to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with llama-cpp-python:
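```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Note: the generated snippet pointed at "Qwen3.5-27B.BF16-mmproj.gguf", which is a
# multimodal projector file rather than the language model weights.
# Assumption: a glob for the Q4_K_M quant referenced elsewhere on this page;
# adjust it to a file that actually exists in the repo.
llm = Llama.from_pretrained(
    repo_id="Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF",
    filename="*Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks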
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Use Docker
```shell
docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
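Once `llama-server` is running, its OpenAI-compatible endpoint can also be called from a script. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8080/v1` (the base URL used in the Pi configuration on this page); the helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this base URL (port 8080, as in the Pi config).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server started as shown above, `print(ask("What is the capital of France?"))` should print the model's reply. Keeping request construction separate from the network call makes the payload easy to inspect without a running server.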
- LM Studio
- Jan
- vLLM
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
- Ollama
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Ollama:
```shell
ollama run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
- Unsloth Studio
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF to start chatting
```
- Pi
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Run Hermes
```shell
hermes
```
- Docker Model Runner
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Docker Model Runner:
```shell
docker model run hf.co/Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
- Lemonade
How to use Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:Q4_K_M
```
Run and chat with the model
```shell
lemonade run user.Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF-Q4_K_M
```
List all available models
```shell
lemonade list
```
🌟 Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill
💡 Model Introduction
Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill is a reasoning model fine-tuned on top of Qwen3.5-27B.
The model is primarily optimized through high-density reasoning distillation sourced from Gemini 3.1 Pro, while also incorporating additional reasoning traces distilled from Qwen3.5-27B and a broader Gemini 3 Pro reasoning corpus.
Through Supervised Fine-Tuning focused on structured analytical behavior, this model aims to reshape the base model’s reasoning style into a more coherent, better-organized, and higher-density Chain-of-Thought (CoT) pattern.
It is especially designed to improve decomposition, planning, abstraction, and response cleanliness on complex multi-step tasks.
🧠 Example of Learned Reasoning Scaffold
This model inherits a more structured reasoning style influenced by Gemini 3.1-style analytical planning.
Compared with more loosely exploratory reasoning patterns, this model tends to organize the problem before answering:
My Thought Process / My Analysis of the problem:
1. Restate the task and identify the true objective.
2. Abstract the problem into a higher-level reasoning frame.
3. Identify the key mechanism, failure mode, or constraint.
4. Separate likely misconceptions from the actual core issue.
5. Plan the structure of the final response.
6. Deliver a cleaner, more direct, and higher-density answer.
.
.
.
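When consuming raw completions, it can help to separate the reasoning scaffold from the final answer. The sketch below assumes the model wraps its reasoning in `<think>` ... `</think>` tags; only the opening `<think>` tag appears in this card (in the training mask), so the closing tag is an assumption based on common Qwen chat conventions. Some runtimes may already return reasoning in a separate field, in which case this helper is unnecessary. The function name `split_reasoning` is illustrative.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    Assumption: reasoning is delimited by <think> ... </think>.
    If no closing tag is found, the whole text is treated as the answer.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the first opening tag, if present, and trim surrounding whitespace.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>\n1. Restate the task and identify the true objective.\n</think>\n\nParis."
reasoning, answer = split_reasoning(raw)
```

Here `reasoning` holds the scaffold text and `answer` holds only the final response, which is convenient when logging or displaying them separately.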
🗺️ Training Pipeline Overview
Base Model (Qwen3.5-27B)
│
▼
Supervised Fine-Tuning (SFT) + LoRA + Reasoning Distillation
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
│
▼
Final Model Text Only (Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill)
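The "Response-Only Training" step in the diagram can be illustrated with a small label-masking sketch. This is not the author's training code: it is a plain-Python illustration that assumes the usual convention of setting prompt labels to `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) so that only tokens after the response marker contribute to the loss. The token IDs and helper names are hypothetical, and only the first marker occurrence is handled, so multi-turn samples would need extra logic.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1 if absent."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids, response_marker_ids):
    """Build labels that hide everything up to and including the response marker."""
    start = find_subsequence(input_ids, response_marker_ids)
    if start < 0:
        # No marker found: mask the whole sample so it adds nothing to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    boundary = start + len(response_marker_ids)
    return [IGNORE_INDEX] * boundary + list(input_ids[boundary:])


# Hypothetical token IDs: 90, 91, 92 stand in for "<|im_start|>assistant\n<think>".
prompt_ids = [11, 12, 13]
marker_ids = [90, 91, 92]
response_ids = [21, 22, 23]
labels = mask_prompt_tokens(prompt_ids + marker_ids + response_ids, marker_ids)
```

In this toy sample the first six positions are masked and only the three response tokens remain as training targets.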
📋 Stage Details
🔹 Supervised Fine-Tuning (SFT)
- Objective: To inject reasoning behavior into Qwen3.5-27B and strengthen its performance on complex analytical tasks requiring decomposition and multi-step inference.
- Method: The model is trained on distilled reasoning traces collected from stronger teacher-style reasoning sources, with the goal of transferring cleaner analytical structure, stronger planning habits, and more stable task-solving behavior.
- Target Behavior: Compared with a standard instruct model, the tuned model is expected to respond with more deliberate reasoning organization, reduced shallow guessing, and stronger cross-domain analytical consistency.
📚 All Datasets Used
The dataset consists of multiple reasoning distillation sources:
| Dataset Name | Description / Purpose |
|---|---|
| Roman1111111/gemini-3.1-pro-hard-high-reasoning | Primary high-quality reasoning source used to shape structured analytical style, planning behavior, and dense CoT patterns. |
| Jackrong/Qwen3.5-reasoning-700x | Provides additional Qwen-family reasoning trajectories distilled from Qwen3.5-27B, improving style stability and complementary reasoning diversity. 🪐 (Only a small portion of this dataset was used to help avoid excessive degradation of generalization ability and mitigate catastrophic forgetting.) |
| Roman1111111/gemini-3-pro-10000x-hard-high-reasoning | A broader multi-domain reasoning corpus used to enhance coverage across mathematics, systems, science, law, medicine, finance, and adversarial reasoning tasks. |
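The mixing strategy described in the table, two sources used in full plus a small portion of the Qwen-family set, could look roughly like the sketch below. The card does not state the actual fraction or sampling method, so `stabilizer_fraction`, the seed, and the function name `mix_sources` are hypothetical placeholders rather than the values used for this model.

```python
import random


def mix_sources(primary, broad, stabilizer, stabilizer_fraction, seed=0):
    """Combine two full sources with a random subset of a third, then shuffle.

    stabilizer_fraction is a hypothetical knob; the real value is not given in the card.
    """
    rng = random.Random(seed)
    keep = round(len(stabilizer) * stabilizer_fraction)
    subset = rng.sample(list(stabilizer), keep)
    mixed = list(primary) + list(broad) + subset
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for the three datasets listed above.
primary = ["gemini31-a", "gemini31-b"]
broad = ["gemini3-a", "gemini3-b", "gemini3-c"]
stabilizer = [f"qwen-{i}" for i in range(10)]
mixed = mix_sources(primary, broad, stabilizer, stabilizer_fraction=0.2)
```

Using a seeded `random.Random` instance keeps the subset reproducible across runs, which makes it easier to compare training runs that differ only in the mixing ratio.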
📊 Approximate Domain Composition
| Domain | Samples | Share |
|---|---|---|
| Mathematics / Logic | 3947 | 28.5% |
| Computer Science / Programming / Systems | 3019 | 21.8% |
| Security / Adversarial Reasoning | 1551 | 11.2% |
| Physics / Astronomy / Engineering | 1482 | 10.7% |
| Law / Philosophy / Humanities | 1191 | 8.6% |
| Biology / Medicine | 817 | 5.9% |
| Finance / Economics | 679 | 4.9% |
| Chemistry / Materials | 540 | 3.9% |
| Applied / Social Systems (Urban Planning, Traffic, Supply Chain, etc.) | 360 | 2.6% |
| Other | 264 | 1.9% |
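The Share column follows directly from the Samples column. As a quick consistency check, the snippet below recomputes each share from the counts in the table; the variable names are illustrative.

```python
# Sample counts copied from the table above.
domain_samples = {
    "Mathematics / Logic": 3947,
    "Computer Science / Programming / Systems": 3019,
    "Security / Adversarial Reasoning": 1551,
    "Physics / Astronomy / Engineering": 1482,
    "Law / Philosophy / Humanities": 1191,
    "Biology / Medicine": 817,
    "Finance / Economics": 679,
    "Chemistry / Materials": 540,
    "Applied / Social Systems": 360,
    "Other": 264,
}

total = sum(domain_samples.values())

# Percentage share per domain, rounded to one decimal place as in the table.
shares = {
    domain: round(100 * count / total, 1)
    for domain, count in domain_samples.items()
}

for domain, share in shares.items():
    print(f"{domain}: {share}%")
```

The counts sum to 13,850 samples, and the recomputed shares match the table (for example, 3947 / 13850 ≈ 28.5% for Mathematics / Logic).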
⚠️ Distillation & Task-Specific Fine-Tuning Effects: This model has been distilled and further fine-tuned on top of the base model for reasoning-oriented tasks. These techniques may improve performance on certain specialized tasks, but they can also influence the model’s generalization ability in broader scenarios and may lead to partial forgetting of some pretraining knowledge. The extent of these effects depends on factors such as the quality, scale, and distribution of the datasets used during distillation and fine-tuning. As a result, the model’s behavior may differ from the base model across different tasks or application contexts. Users are encouraged to evaluate the model according to their specific requirements before deployment. Thank you for your understanding~
🌟 Core Skills & Capabilities
- Structured Analytical Reasoning: The model is optimized to first identify the real task structure before generating an answer, rather than relying on shallow immediate completion.
- Improved Multi-Step Planning: It performs more reliably on tasks requiring decomposition, constraint tracking, sequential planning, and trade-off analysis.
- Cross-Domain Reasoning Strength: The training corpus provides broad reasoning coverage across math, programming, systems, physics, law, medicine, finance, chemistry, and applied domains.
- Security & Adversarial Awareness: A dedicated portion of the distilled data includes adversarial, attack-defense, and failure-mode reasoning tasks, improving robustness in difficult prompts.
- Compact but Strong Footprint: Built on a 27B base, the model aims to deliver significantly denser reasoning behavior and cleaner analytical output than a generic base instruct model of similar size.
⚠️ Limitations & Intended Use
- Hallucination Risk: Although reasoning behavior is improved, the model remains an autoregressive LLM and may still hallucinate niche facts, citations, or unverifiable real-world details.
- Reasoning Style Bias: Because the model is tuned for analytical depth, it may sometimes produce longer or more structured answers than necessary for very simple prompts.
- Teacher-Style Distillation Bias: Some response behaviors reflect the reasoning style of the teacher traces used during distillation, rather than purely native behavior emerging from the base model itself.
- Preview Version Notice: As a relatively specialized distilled reasoning model, surrounding inference templates, prompt formatting strategies, and ecosystem integrations may still require tuning. Users may encounter occasional compatibility differences depending on runtime or deployment stack.
🙏 Acknowledgements
Special thanks to the Qwen team for the strong base architecture, and to the broader open-source ecosystem for enabling efficient reasoning distillation workflows. We also acknowledge the value of the distilled reasoning corpora derived from Gemini 3.1 Pro, Qwen3.5, and Gemini 3 Pro, which made this model possible.
Model tree for Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill-GGUF
Base model
Qwen/Qwen3.5-27B