Unofficial models
Hello, I wanted to ask whether you could make an IQ4_KSS quantization for unofficial AI models. As far as I can see, you are the only one making IQK quantizations on Hugging Face. I would like someone to make IQ4_KSS (or IQ4_KT, though as far as I can see you only do IQK quants) for https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2
If you have 100GB of free disk space, you can quantize it yourself pretty quickly to any recipe you like, including copy-pasting my "secret recipe" for the 27B here and adjusting it to IQ4_KSS or IQ4_KT etc. Both are very nice for dense models with full GPU offload. I do like KT "trellis" quants for low BPW, but I find IQ4_KSS is more generally applicable for CPU inference, and it is the same 4.0 BPW with similar PPL/KLD stats.
Basically:
- download full bf16 safetensors from original repo
- use mainline llama.cpp `convert_hf_to_gguf.py`
- You can use the imatrix from here: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/resolve/main/mmproj-Qwen3.5-122B-A10B-BF16.gguf
- Just convert the `gguf` imatrix file to `.dat` format using mainline's `llama-imatrix` so that ik_llama.cpp can use it.
- run `llama-quantize` using the .dat imatrix file and my 'secret recipe' adjusted to your liking
It doesn't take any VRAM to do this, and even on a modest gaming rig it won't take too long and will probably need less than 32GB RAM total.
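The steps above can be sketched as a small dry-run script that only assembles and prints the three command lines; nothing is executed. This is a rough sketch, not the exact workflow: the file names, the `build_quant_commands` helper, and the thread count are hypothetical, and the `llama-imatrix` conversion flags (`--in-file`, `--output-format dat`) are an assumption to check against `llama-imatrix --help` on your build. The per-tensor "secret recipe" is not reproduced here, so the sketch passes a single quant type.

```python
import shlex
from pathlib import Path


def build_quant_commands(model_dir, work_dir, quant_type="IQ4_KSS",
                         imatrix_gguf="imatrix.gguf", threads=8):
    """Return the three pipeline commands as argument lists (dry run only)."""
    work = Path(work_dir)
    bf16_gguf = work / "model-BF16.gguf"            # hypothetical file name
    imatrix_dat = work / "imatrix.dat"              # hypothetical file name
    quant_gguf = work / f"model-{quant_type}.gguf"  # hypothetical file name
    return [
        # 1. Mainline llama.cpp: bf16 safetensors -> bf16 GGUF.
        ["python", "convert_hf_to_gguf.py", "--outtype", "bf16",
         "--outfile", str(bf16_gguf), str(model_dir)],
        # 2. Mainline llama-imatrix: GGUF imatrix -> .dat imatrix.
        #    Flag names are an assumption; verify with --help.
        ["llama-imatrix", "--in-file", str(imatrix_gguf),
         "--output-format", "dat", "-o", str(imatrix_dat)],
        # 3. llama-quantize with the .dat imatrix and one target type.
        ["llama-quantize", "--imatrix", str(imatrix_dat),
         str(bf16_gguf), str(quant_gguf), quant_type, str(threads)],
    ]


if __name__ == "__main__":
    for cmd in build_quant_commands("Qwen3.5-27B-heretic-v2", "out"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths and swap `IQ4_KSS` for `IQ4_KT` before committing to a long run.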
Let me know if you need any help!
Thank you! Do you think 32GB RAM is enough? I only have an Intel Core Ultra 7 265K and 32GB of 6800 MHz RAM, no dGPU. Do you think I can try to make quants for this model?
Anyway, thank you for your help. I think I will try to do it regardless.
I make all my quants with exactly 0 vram haha...
Yes, the most demanding step (in terms of hardware and RAM) is generating the imatrix from the full-size bf16, as you must be able to run inference with that. But if someone else has made the imatrix for you, then llama-quantize itself takes very little resources by comparison. A fast NVMe drive is nice for the disk I/O, but if you're patient it should be fine.
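To put rough numbers on that, here is a back-of-the-envelope size estimate. It assumes a nominal 27e9 weights and a uniform bits-per-weight figure, and it ignores metadata and any tensors kept at higher precision, so treat the results as ballpark only; `gguf_size_gb` is a hypothetical helper.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1e9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 27e9  # assumption: nominal parameter count of a "27B" model

for name, bpw in [("bf16", 16.0), ("4.0 BPW (IQ4_KSS / IQ4_KT)", 4.0)]:
    print(f"{name}: ~{gguf_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions the bf16 weights come to about 54 GB, well above 32GB of RAM, while a 4.0 BPW quant is about 13.5 GB.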
If you want a very high-level overview of the process, you can check my recent talk: https://blog.aifoundry.org/p/adventures-in-model-quantization. I also have a very old (out-of-date) quant cookers guide here: https://github.com/ikawrakow/ik_llama.cpp/discussions/434
There are also more up-to-date commands in my recent logs/ folders.
Let me know if you get stuck on any part, good luck!