Qwopus

by TotallyNotAusCon - opened May 25

Discussion

TotallyNotAusCon

May 25

Hey, llmfan46! Can you do a heretic version of https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-MTP-GGUF? I'd like you to preserve MTP, thanks!

llmfan46

Owner May 25

Hey TotallyNotAusCon,

That repo is in NVFP4 format plus multiple GGUFs. I can only work with the original BF16 safetensors, like this one:

https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash

But not the GGUFs.

TotallyNotAusCon

May 25 • edited May 25

Yeah, I attached the wrong link, sorry. Use the original safetensors, and please tell me if you can preserve MTP.

P.S.: If you can't preserve MTP, just make it without it.

llmfan46

Owner May 25

Yes, I can. I won't do it right away because I am releasing other models and formats; I'll do it once I've finished all that.

Austriani

May 25

I don't know if you'll agree, but I would also like to see IQ4_KS again, like you did for the Qwen3.6-35B model, if it's possible. Anyway, thank you for your work and efforts.

llmfan46

Owner May 25 • edited May 25

IQK is a very niche format. 90% of people download standard K GGUFs, which are compatible with a vast array of inference engines and backends. Making IQK quants means extra time, extra storage, and extra compute for a format that can really only be used with ik_llama.cpp and Oobabooga Text Generation Web UI (with the ik_llama.cpp fork, so basically still ik_llama.cpp with a different UI), and that only a handful of people download.

The other issue is that I don't have access to a good dataset. Unlike traditional/standard "K" GGUFs, IQK is a format where a dataset is definitely recommended, especially for anything below IQ6_K. Since Qwen3.5-9B-DeepSeek-V4-Flash is a 9B model, a dataset might even be recommended for IQ6_K too.

llmfan46

Owner May 26 • edited May 26

Yeah, sorry. I tried with MPOA and it doesn't look like it's going to be possible. I'm getting terrible results like 98/100 and 99/100 refusals; the best result so far is 92/100. The other issue is that, like all of Jackrong's models that I've tested, the uncensoring process is insanely slow: this 9B model has an ETA of 39 hours! So even if I could maybe get better results with ARA, I don't feel like spending 39 hours of compute on this one 9B model.

Just for info, this model has 99/100 refusals, so it's very censored!
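For readers unfamiliar with scores like 92/100 or 99/100: they read as "refusals out of test prompts". A minimal sketch of how such a count could be produced is below. The marker list and function names are hypothetical and for illustration only; they are not taken from the heretic tool, which may score refusals differently.

```python
# Hypothetical refusal counter. REFUSAL_MARKERS and the function names are
# illustrative assumptions, not the heretic tool's actual implementation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def count_refusals(responses: list[str]) -> tuple[int, int]:
    """Return (number of refusals, number of responses)."""
    refusals = sum(is_refusal(r) for r in responses)
    return refusals, len(responses)


def format_score(refusals: int, total: int) -> str:
    """Format a score the way it is quoted in this thread, e.g. '92/100'."""
    return f"{refusals}/{total}"
```

With a check like this, lower is better for an uncensored model: a result of 92/100 would mean the model still refused 92 of 100 prompts.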

Austriani

May 26 • edited May 26

Got it! Thanks for letting me know. Also, how long does it take you to finish one cycle in the heretic program? A long time ago I tried to make my own heretic version of Qwen3.5-0.8B, and it took me at least 4 hours. I have an Intel Core Ultra 7 265K and decent RAM (no dGPU). I'd like to know whether that's a normal time or whether Qwen3.5 models just take much longer per cycle.

llmfan46

Owner May 26

I dunno, I've never been able to do any model that wasn't entirely in VRAM. I tried bigger models like 70B, 72B, or 122B that spill over into RAM, and then I get a message about a "meta device" and the process just hangs there, seemingly making no progress. This is why almost all of my models are 26B-35B. There is a big gap in the current generation: models are either 26B-35B or they suddenly jump to 119B-122B (70B and 72B models are old generation, like Qwen2.5-VL-72B-Instruct and Llama-3.3-70B-Instruct).
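A rough way to see why the larger models spill into system RAM: BF16 stores each parameter in 2 bytes, so the weights alone need about 2 bytes times the parameter count. The sketch below is back-of-the-envelope only. It ignores activations, KV cache, and framework overhead, and the VRAM sizes and the `headroom` factor are hypothetical, since the thread does not say which GPU is used.

```python
# Back-of-the-envelope BF16 weight size estimate. Function names, the
# headroom factor, and the example VRAM sizes are illustrative assumptions.
BYTES_PER_BF16_PARAM = 2


def bf16_weight_gib(params_billions: float) -> float:
    """Approximate size of BF16 weights in GiB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_BF16_PARAM / 2**30


def fits_in_vram(params_billions: float, vram_gib: float, headroom: float = 0.9) -> bool:
    """Return True if the weights fit in the given VRAM, leaving some headroom."""
    return bf16_weight_gib(params_billions) <= vram_gib * headroom


if __name__ == "__main__":
    # Model sizes mentioned in the thread; 96 GiB is a hypothetical VRAM budget.
    for size in (9, 35, 72, 122):
        print(f"{size}B: {bf16_weight_gib(size):.1f} GiB, fits in 96 GiB: {fits_in_vram(size, 96)}")
```

By this estimate a 35B model needs roughly 65 GiB for weights alone and a 122B model roughly 227 GiB, which is consistent with the gap between the two size classes described above.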

Austriani

May 27

Sorry for not responding for so long.

I'd like to know how long it takes you to make a heretic version entirely in VRAM, for both small and large models. I just want to compare your GPU's speed with my CPU's. By the way, do you find that Qwen models take longer than others to make heretic versions of? I'm asking to figure out whether the problem is Qwen or my CPU's power.
