humbleakh committed on Jun 8, 2025

Commit 7d33284 · verified · 1 parent: a0a9a96

Upload 4-bit quantized Qwen2.5-VL-3B for Chain-of-Zoom

Files changed (6)

README.md +4 -4
config.json +1 -0
model.safetensors +1 -1
preprocessor_config.json +1 -1
tokenizer_config.json +1 -1
video_preprocessor_config.json +1 -1

README.md CHANGED

@@ -11,14 +11,14 @@ base_model: Qwen/Qwen2.5-VL-3B-Instruct
 license: apache-2.0
 language:
 - en
-pipeline_tag: vision-language-understanding
+pipeline_tag: image-text-to-text
 ---
 # Qwen2.5-VL-3B 4-bit Quantized for Chain-of-Zoom
 ## 📋 Model Description
-4-bit quantized Vision-Language Model optimized for super-resolution prompt generation
+4-bit quantized Vision-Language Model optimized for Chain-of-Zoom super-resolution
 This model is part of the **Chain-of-Zoom 4-bit Quantized Pipeline** - a memory-optimized version of the original Chain-of-Zoom super-resolution framework.
@@ -68,11 +68,11 @@ bnb_config = BitsAndBytesConfig(
 ## 🔧 Technical Specifications
-- **Created**: 2025-06-08 16:28:34
+- **Created**: 2025-06-08 17:10:40
 - **Quantization Library**: BitsAndBytes
 - **Framework**: PyTorch + Transformers
 - **Precision**: 4-bit NF4
-- **Model Size**: 2899.8801851272583 MB
+- **Model Size**: 2899.8802061080933 MB
 ## 📝 Citation
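
The README hunks above mention a `bnb_config = BitsAndBytesConfig(` block and list "4-bit NF4" as the precision, but the actual arguments are not visible in this diff. The following is a minimal sketch of what an NF4 load might look like with Transformers and bitsandbytes. The `bfloat16` compute dtype, the `device_map="auto"` setting, and the helper name `load_quantized` are assumptions for illustration, not values confirmed by the commit.

```python
# Sketch only: the exact BitsAndBytesConfig arguments used by the author are
# not shown in the diff. Everything except "nf4" / 4-bit is an assumption.
REPO_ID = "humbleakh/qwen2.5-vl-3b-4bit-chain-of-zoom"

NF4_KWARGS = {
    "load_in_4bit": True,                  # 4-bit weights, per the README
    "bnb_4bit_quant_type": "nf4",          # NF4, per the README
    "bnb_4bit_compute_dtype": "bfloat16",  # assumed; resolved to a torch dtype
}


def load_quantized(repo_id: str = REPO_ID):
    """Hypothetical helper: load the processor and the 4-bit model.

    Imports are deferred so this module can be imported without
    transformers, torch, bitsandbytes, or accelerate installed.
    """
    from transformers import (
        AutoModelForImageTextToText,
        AutoProcessor,
        BitsAndBytesConfig,
    )

    bnb_config = BitsAndBytesConfig(**NF4_KWARGS)
    processor = AutoProcessor.from_pretrained(repo_id)
    model = AutoModelForImageTextToText.from_pretrained(
        repo_id,
        quantization_config=bnb_config,
        device_map="auto",  # assumed; requires the accelerate package
    )
    return processor, model
```

If the checkpoint already stores its quantization settings in `config.json`, passing `quantization_config` again may be redundant; it is kept here only to make the NF4 settings explicit.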

config.json CHANGED

@@ -112,6 +112,7 @@
     "spatial_patch_size": 14,
     "temporal_patch_size": 2,
     "tokens_per_second": 2,
+    "torch_dtype": "bfloat16",
     "window_size": 112
   },
   "vision_end_token_id": 151653,
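
The added `torch_dtype` key sits inside a nested object whose name falls outside the hunk. A small sketch like the one below can list every path in a config where a given key is declared. The section name `nested_section` in the sample is a placeholder, since the real parent key is not shown in the diff, and `find_key` is a hypothetical helper.

```python
import json


def find_key(node, key, path=""):
    """Return (dotted_path, value) pairs for every occurrence of `key`."""
    hits = []
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}" if path else name
            if name == key:
                hits.append((child, value))
            hits.extend(find_key(value, key, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_key(value, key, f"{path}[{index}]"))
    return hits


# Sample fragment modelled on the hunk; "nested_section" is a placeholder name.
SAMPLE = json.loads("""
{
  "nested_section": {
    "spatial_patch_size": 14,
    "temporal_patch_size": 2,
    "tokens_per_second": 2,
    "torch_dtype": "bfloat16",
    "window_size": 112
  },
  "vision_end_token_id": 151653
}
""")

DTYPE_HITS = find_key(SAMPLE, "torch_dtype")
print(DTYPE_HITS)
```

Running the same function over the full `config.json` would show whether the top-level and nested dtype declarations agree.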

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5e79c18e6f20fd0e15d9e522b68ab5c9357933809a1e97df2abd0082171f0afe
+oid sha256:de593e892f76e8b97f2344896ad5a1b8db248be9b92cdfde7bcd2d231dcee6a6
 size 3024861693
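
The `model.safetensors` entry is a Git LFS pointer, so the diff shows only the content hash and byte size. The hash changed while the size stayed at 3024861693 bytes. As a rough sketch, a pointer can be parsed and its size converted to other units with the standard library alone; `parse_lfs_pointer` is a hypothetical helper name.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:de593e892f76e8b97f2344896ad5a1b8db248be9b92cdfde7bcd2d231dcee6a6
size 3024861693
"""

INFO = parse_lfs_pointer(POINTER)
print(f"{INFO['size'] / 1e6:.2f} MB (decimal)")
print(f"{INFO['size'] / 2**20:.2f} MiB (binary)")
```

Comparing these figures with the "Model Size" line in the README is one way to check which unit the README value was computed in.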

preprocessor_config.json CHANGED

@@ -18,7 +18,7 @@
   "merge_size": 2,
   "min_pixels": 3136,
   "patch_size": 14,
-  "processor_class": "Qwen2_5_VLProcessor",
+  "processor_class": "Qwen2VLProcessor",
   "resample": 3,
   "rescale_factor": 0.00392156862745098,
   "size": {

tokenizer_config.json CHANGED

@@ -201,7 +201,7 @@
   "extra_special_tokens": {},
   "model_max_length": 131072,
   "pad_token": "<|endoftext|>",
-  "processor_class": "Qwen2_5_VLProcessor",
+  "processor_class": "Qwen2VLProcessor",
   "split_special_tokens": false,
   "tokenizer_class": "Qwen2Tokenizer",
   "unk_token": null

video_preprocessor_config.json CHANGED

@@ -73,7 +73,7 @@
     "merge_size"
   ],
   "patch_size": 14,
-  "processor_class": "Qwen2_5_VLProcessor",
+  "processor_class": "Qwen2VLProcessor",
   "resample": 3,
   "rescale_factor": 0.00392156862745098,
   "size": {
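
The same `processor_class` change appears in three files: `preprocessor_config.json`, `tokenizer_config.json`, and `video_preprocessor_config.json`. A check along the following lines could confirm that a local copy of the repository declares one class consistently. The helper names `processor_classes` and `is_consistent` are hypothetical, and the directory passed in is assumed to be a local download of the repo.

```python
import json
from pathlib import Path

# The three files touched by this commit that carry a processor_class key.
CONFIG_FILES = (
    "preprocessor_config.json",
    "tokenizer_config.json",
    "video_preprocessor_config.json",
)


def processor_classes(repo_dir) -> dict:
    """Map each config file found in repo_dir to its processor_class value."""
    found = {}
    for name in CONFIG_FILES:
        path = Path(repo_dir) / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            found[name] = data.get("processor_class")
    return found


def is_consistent(found: dict) -> bool:
    """True when every file that was found declares the same class."""
    return len(set(found.values())) <= 1


if __name__ == "__main__":
    import sys

    result = processor_classes(sys.argv[1] if len(sys.argv) > 1 else ".")
    for filename, cls in result.items():
        print(f"{filename}: {cls}")
    print("consistent" if is_consistent(result) else "MISMATCH")
```

A mismatch between the files would suggest that only some of them were regenerated during an upload.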