Imatrix GGUF

by Austriani - opened Mar 6

Discussion

Austriani

Mar 6

Hello, I was wondering whether you are thinking about adding imatrix quantizations. I would like it if you made imatrix quantizations for <10B models (only if your GPU allows it, of course). There is a public imatrix dataset if you need an example: https://github.com/ggerganov/llama.cpp/files/15440637/groups_merged-enhancedV3.txt

llmfan46

Owner Mar 6

No, what I can do is Safetensors, GGUF, GPTQ (4-bit and 8-bit precision), and maybe AWQ (4-bit precision).

Austriani

Mar 6

•

edited Mar 6

Okay, thanks.

Austriani changed discussion status to closed Mar 6

llmfan46

Owner Mar 6

Yeah, sorry, I already have a full plate, as they say. I can't add more workload on top of everything.

llmfan46

Owner Mar 8

Okay, I have a bit of free time and I am working on it. Are you still interested? If so, let me know which models you want imatrix GGUFs of.

Austriani

Mar 8

•

edited Mar 8

Sorry for answering so late. I'm interested, of course, but I had the idea of using IQK quantizations. These are special quants for ik_llama.cpp (a llama.cpp fork). There is only one popular IQK creator, ubergarm; if you made them as well, I would appreciate it a lot.

I would make my own imatrix/IQK quants if I had a GPU, but renting a GPU in my country costs 4x as much as in a country like the USA, because of different salaries.

As for which model I would like quantized: your Qwen3.5-27B-Heretic-v2, or, if you prefer, a different Qwen3.5-27B-Heretic model.

You can search the internet or ask any AI model how to make IQK quantizations, because it's a somewhat long process. Anyway, if you don't want to dig into it, I wouldn't be against a basic imatrix quantization; it's good as well. If you are not doing IQK quants, then I would prefer the IQ4_XS quantization.

llmfan46

Owner Mar 9

•

edited Mar 9

Here's the imatrix version that you asked for: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-i1-GGUF

Still working on IQK (requires building ik_llama.cpp).

If you find my work useful, consider supporting me on Patreon: https://patreon.com/LLMfan46

llmfan46 changed discussion status to open Mar 9

Austriani

Mar 9

Thank you for making the imatrix version! I think I will download it soon.

By the way, I think I'm going to try to make a Thireus quantization for this model if my system allows it.

Austriani

Mar 9

•

edited Mar 9

Update: I missed that you are making IQK. I won't even try it (both because you are making it and because my resources are probably insufficient).

Anyway, thank you once again for making the IQK quantization!

llmfan46

Owner Mar 9

•

edited Mar 9

No problem, but keep in mind that I am still building the tools needed for this and I have to fulfill other people's requests. If you can wait a few days, you will have IQK; in the meantime, I hope you can make use of the imatrix quantizations that I just posted.

llmfan46

Owner Mar 9

What IQK quant are you looking for? Just wondering because you did not mention it before.

llmfan46

Owner Mar 10

•

edited Mar 10

Since you didn't specify which quant type, I went ahead and made IQ4_K and IQ4_KSS. Hopefully one of these is what you were looking for; if not, let me know and I can make the one you need: https://huggingface.co/llmfan46/Qwen3.5-27B-heretic-v2-IQK-GGUF
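For anyone choosing between the quant types mentioned in this thread, here is a rough sketch of how file size scales with bits per weight (bpw). The parameter count is an assumption read off the "27B" in the model name, and the bpw figures are nominal values assumed for illustration. Real GGUF files mix tensor types and carry metadata, so actual sizes will differ.

```python
# Rough GGUF size estimate from a nominal bits-per-weight (bpw) figure.
# All constants below are assumptions for illustration, not measured values.

PARAMS = 27e9  # assumed parameter count, taken from "27B" in the model name

# Nominal bpw per quant type (assumed; real files mix several tensor types).
NOMINAL_BPW = {
    "IQ4_KSS": 4.0,
    "IQ4_XS": 4.25,
    "IQ4_K": 4.5,
}


def estimate_size_gb(params: float, bpw: float) -> float:
    """Return the approximate file size in decimal gigabytes."""
    return params * bpw / 8 / 1e9


for name, bpw in NOMINAL_BPW.items():
    print(f"{name}: ~{estimate_size_gb(PARAMS, bpw):.1f} GB")
```

Under these assumptions, the gap between the smallest and largest of the three comes to under 2 GB, so the choice mostly matters when the model barely fits in memory.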

If you like my work and find the releases helpful, consider subscribing to my Patreon (https://patreon.com/LLMfan46) or sending me a tip on Ko-Fi (https://ko-fi.com/llmfan46). Your support helps cover compute costs and motivates more releases!

llmfan46 changed discussion status to closed Mar 11
