Qwopus
Hey, llmfan46! Can you do a heretic version of https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-MTP-GGUF? I'd like you to preserve MTP, thanks!
Hey TotallyNotAusCon,
That repo is in NVFP4 format and multiple GGUFs. I can only work with the original BF16 safetensors, like this one:
https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash
But not the GGUFs.
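The distinction above can be checked roughly from a repo's file list before making a request. The following is a minimal sketch, not part of the heretic tooling: the function name and both file listings are hypothetical, and a file extension only separates GGUF from safetensors. It says nothing about the dtype, so an NVFP4 checkpoint stored as safetensors would still need its config inspected.

```python
from pathlib import PurePosixPath


def weight_formats(filenames):
    """Return the weight formats suggested by the file extensions in a repo."""
    known = {".safetensors": "safetensors", ".gguf": "gguf"}
    found = set()
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in known:
            found.add(known[suffix])
    return found


# Hypothetical file listings, for illustration only.
gguf_repo = ["README.md", "model-Q4_K_M.gguf", "model-Q8_0.gguf"]
original_repo = ["config.json", "model-00001-of-00004.safetensors", "tokenizer.json"]

print(weight_formats(gguf_repo))      # only GGUF files: not usable as a starting point
print(weight_formats(original_repo))  # safetensors present: a candidate starting point
```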
Yeah, I attached the wrong link, sorry. Use the original safetensors, and please tell me if you can preserve MTP.
P.S.: If you can't make MTP, just make it without it.
Yes, I can. I won't do it right away because I am releasing other models and formats; I'll do it when I have finished all that.
I don't know if you agree, but I would also like to see IQ4_KS again, like you did for the Qwen3.6-35B model, if it's possible. Anyway, thank you for your work and efforts.
IQK is a very niche format. 90% of people download standard K GGUFs, which are compatible with a vast array of inference engines and backends. IQK means extra time, extra storage and extra compute for a format that can really only be used with ik_llama.cpp and Oobabooga Text Generation Web UI (with the ik_llama.cpp fork, so basically still ik_llama.cpp with a different UI), and that only a handful of people download. The other issue is that I don't have access to a good dataset. Unlike standard "K" GGUFs, IQK is a format where a dataset is definitely recommended, especially for anything below IQ6_K, and since Qwen3.5-9B-DeepSeek-V4-Flash is a 9B model, a dataset might even be recommended for IQ6_K too.
Yeah sorry, I tried with MPOA and it doesn't look like it's going to be possible. I'm getting terrible results like 98/100 and 99/100 refusals; the best result so far is 92/100. The other issue is that, as with all of Jackrong's models that I have tested, the uncensoring process is insanely slow: this 9B model has an ETA of 39 hours! So even if I could maybe get better results with ARA, I don't feel like wasting 39 hours of compute on this one 9B model.
Just for info, this model has 99/100 refusals, so it's very censored!
Got it! Thanks for letting me know. Also, I want to ask how long it takes you to finish one cycle in the heretic program. A long time ago I tried to make my own heretic version of Qwen3.5-0.8B, and it took me at least 4 hours. I have an Intel Core Ultra 7 265K and decent RAM (no dGPU). I would like to know whether that is a normal time or whether Qwen3.5 models just take much longer per cycle.
I dunno, I was never able to do any model that wasn't entirely in VRAM. I tried bigger models like 70B, 72B or 122B, which spill over into RAM, and then I get a message about a "meta device" and the process just hangs there and seemingly makes no progress. This is why almost all of my models are 26B-35B. There is a big gap in the current generation, where models are either 26B-35B or suddenly jump to 119B-122B (70B and 72B models are old generation, like Qwen2.5-VL-72B-Instruct and Llama-3.3-70B-Instruct).
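For a rough sense of why the larger sizes spill over into RAM, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch only: it assumes BF16 weights at 2 bytes per parameter, ignores activations and any other overhead, and the helper names and the 96 GiB VRAM figure are hypothetical (the actual GPU is not stated in this thread).

```python
def weight_gib(params_billion, bytes_per_param=2):
    """Approximate size of the weights alone, in GiB (BF16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits_in_vram(params_billion, vram_gib, bytes_per_param=2):
    """True if the weights alone fit; real runs need extra headroom on top."""
    return weight_gib(params_billion, bytes_per_param) <= vram_gib


HYPOTHETICAL_VRAM_GIB = 96  # assumption for illustration, not the author's hardware

for size in (9, 35, 72, 122):
    verdict = "fits" if fits_in_vram(size, HYPOTHETICAL_VRAM_GIB) else "spills over"
    print(f"{size}B: ~{weight_gib(size):.0f} GiB of BF16 weights, {verdict}")
```

Under these assumptions a 35B model needs roughly 65 GiB for weights alone, while 72B and 122B models need well over 100 GiB, which is consistent with the gap described above.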
Sorry for not responding for so long.
I would like to know how long it takes you to make a heretic version entirely in VRAM, for both small and large models. I just want to compare your GPU's speed with my CPU's. By the way, do you find that Qwen models take longer to process than others? I'm asking to find out whether the problem is Qwen or my CPU's power.