Imatrix GGUF
Hello, I was wondering whether you are thinking about adding imatrix quantizations. I would like it if you made imatrix quantizations for <10B AI models (of course, only if your GPU allows it). There is a public imatrix dataset if you need an example: https://github.com/ggerganov/llama.cpp/files/15440637/groups_merged-enhancedV3.txt
No, what I can do is Safetensors, GGUF, GPTQ (4-bit and 8-bit precision) and maybe AWQ (4-bit precision).
Okay, thanks.
Yeah sorry, I have a full plate already as they say, I can't add more workload on top of everything.
Okay, I have a bit of free time and I am working on it. Are you still interested? If so, let me know which models you want imatrix GGUFs of.
Sorry for answering so late. I'm interested, of course, but I actually got the idea of using IQK quantizations. These are special quants for ik_llama.cpp (a llama.cpp fork). There is only one popular IQK creator, ubergarm; if you would make them as well, I would appreciate it a lot.
I would actually make my own imatrix/IQK quants if I had a GPU, but renting a GPU in my country costs 4x more than in a country like the USA, because of different salaries.
As for which model I want you to quantize: your Qwen3.5-27B-Heretic-v2 or, if you prefer, a different Qwen3.5-27B-Heretic model.
You can search the internet or ask any AI model how to make IQK quantizations, because it's a fairly long process. Anyway, if you don't want to dig into it, I wouldn't be against a basic imatrix quantization; it's good as well. If you are not doing IQK quants, then I would prefer the IQ4_XS quantization.
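For readers following along, the "basic imatrix quantization" mentioned above usually comes down to two llama.cpp commands: compute an importance matrix from a calibration text, then quantize with it. The sketch below only builds those two command lines; it is not the uploader's actual pipeline. The file names, the `./build/bin` location, and the helper name `imatrix_commands` are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

DRY_RUN = True  # set to False only once the binaries and input files exist


def imatrix_commands(f16_gguf, calibration_txt, quant_type="IQ4_XS", bin_dir="./build/bin"):
    """Return the two llama.cpp command lines for an imatrix-guided quantization.

    f16_gguf and calibration_txt are hypothetical paths supplied by the caller.
    """
    src = Path(f16_gguf)
    imatrix = src.with_name("imatrix.dat")
    out = src.with_name(f"{src.stem}-{quant_type}.gguf")
    # Step 1: run the calibration text through the model to collect importance data.
    compute = [f"{bin_dir}/llama-imatrix", "-m", str(src),
               "-f", str(calibration_txt), "-o", str(imatrix)]
    # Step 2: quantize the full-precision GGUF, guided by the importance matrix.
    quantize = [f"{bin_dir}/llama-quantize", "--imatrix", str(imatrix),
                str(src), str(out), quant_type]
    return [compute, quantize]


if __name__ == "__main__":
    for cmd in imatrix_commands("model-f16.gguf", "groups_merged-enhancedV3.txt"):
        print(" ".join(cmd))
        if not DRY_RUN:
            subprocess.run(cmd, check=True)
```

With `DRY_RUN` left on, the script just prints the commands so they can be checked before committing to a long run. IQK types such as IQ4_K would need the binaries built from ik_llama.cpp instead, as far as I understand.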
Here's the imatrix version that you asked for: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-i1-GGUF
Still working on IQK (requires building ik_llama.cpp).
If you find my work useful, consider supporting me on Patreon: https://patreon.com/LLMfan46
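To grab a single quant from a repo like the one linked above, it can help to list the files first and filter by quant type. This is a rough sketch that assumes the GGUF file names contain the quant tag (for example `IQ4_XS`), which is a common convention but not something this thread confirms. The helper names `match_quant` and `list_quant_files` are made up for illustration.

```python
def match_quant(filenames, quant_type):
    """Return the .gguf filenames whose name contains quant_type (case-insensitive)."""
    tag = quant_type.lower()
    return [f for f in filenames if f.lower().endswith(".gguf") and tag in f.lower()]


def list_quant_files(repo_id, quant_type):
    """Network call: list the files in a Hugging Face repo and keep the matching GGUFs."""
    from huggingface_hub import list_repo_files  # pip install huggingface_hub

    return match_quant(list_repo_files(repo_id), quant_type)
```

For example, `list_quant_files("llmfan46/Qwen3.5-27B-heretic-v2-i1-GGUF", "IQ4_XS")` should return the candidate file names, which can then be passed to `hf_hub_download`. Because the match is a plain substring test, searching for `IQ4_K` would also return `IQ4_KSS` files.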
Thank you for making the imatrix version! I think I will download it soon.
By the way, I think I'm going to try to make a Thireus quantization for this AI model if my system allows it.
Update: I didn't see that you are making IQK. I won't even try it (both because you are making it and because my resources are probably insufficient).
Anyway, thank you once again for making the IQK quantization!
No problem, but keep in mind that I am still building the tools needed for this and I have to fulfill other people's requests. If you can wait a few days, you will have IQK; in the meantime I hope you can make use of the imatrix quantizations that I just posted.
What IQK quant are you looking for? Just wondering because you did not mention it before.
Since you didn't specify which quant type, I went ahead and made IQ4_K and IQ4_KSS. Hopefully one of these is what you were looking for; if not, let me know and I can make the one you need: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-IQK-GGUF
If you like my work and find the releases helpful, consider subscribing to my Patreon (https://patreon.com/LLMfan46) or sending me a tip on Ko-Fi (https://ko-fi.com/llmfan46). Your support helps cover compute costs and motivates more releases!
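When choosing between the quant types named in this thread, a back-of-the-envelope size estimate can help: parameters times bits per weight, divided by 8. The bits-per-weight figures below are the nominal values as I understand them from the llama.cpp and ik_llama.cpp quantize help text, and 27e9 is just the nominal parameter count taken from the model name. Both are assumptions. Real files are usually somewhat larger because some tensors are kept at higher precision.

```python
# Nominal bits per weight (assumed values, not measured from the uploaded files).
NOMINAL_BPW = {"IQ4_KSS": 4.0, "IQ4_XS": 4.25, "IQ4_K": 4.5}


def estimate_size_gb(n_params, bits_per_weight):
    """Rough file size in GB (10^9 bytes): params * bits / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in NOMINAL_BPW.items():
        print(f"{name}: ~{estimate_size_gb(27e9, bpw):.1f} GB")
```

Under these assumptions the three types land between roughly 13.5 GB and 15.2 GB, so the choice mostly comes down to how much memory headroom is left for context.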