Instructions to use unsloth/Llama-3.2-11B-Vision-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use unsloth/Llama-3.2-11B-Vision-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="unsloth/Llama-3.2-11B-Vision-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct")
model = AutoModelForImageTextToText.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use unsloth/Llama-3.2-11B-Vision-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "unsloth/Llama-3.2-11B-Vision-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Llama-3.2-11B-Vision-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
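
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the step above is listening on `localhost:8000` and returns an OpenAI-style response body. The helper names `build_chat_payload` and `post_chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to the chat completions endpoint and parse the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_chat_payload(
        "unsloth/Llama-3.2-11B-Vision-Instruct",
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
    # Requires a running server; the reply layout is assumed to follow the OpenAI format.
    reply = post_chat("http://localhost:8000", payload)
    print(reply["choices"][0]["message"]["content"])
```

For the SGLang server shown further down, the same sketch should work with the base URL changed to `http://localhost:30000`.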

Use Docker

docker model run hf.co/unsloth/Llama-3.2-11B-Vision-Instruct

SGLang

How to use unsloth/Llama-3.2-11B-Vision-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "unsloth/Llama-3.2-11B-Vision-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Llama-3.2-11B-Vision-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "unsloth/Llama-3.2-11B-Vision-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Llama-3.2-11B-Vision-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use unsloth/Llama-3.2-11B-Vision-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Llama-3.2-11B-Vision-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Llama-3.2-11B-Vision-Instruct to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/Llama-3.2-11B-Vision-Instruct to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/Llama-3.2-11B-Vision-Instruct",
    max_seq_length=2048,
)

Docker Model Runner
How to use unsloth/Llama-3.2-11B-Vision-Instruct with Docker Model Runner:
```
docker model run hf.co/unsloth/Llama-3.2-11B-Vision-Instruct
```

Issue Loading Llama-3.2 (11B) Vision Instruct Model in Colab

by Sampas - opened Oct 3, 2024

Discussion

Sampas

Oct 3, 2024

Hello Unsloth Team,

I am attempting to load the Llama-3.2 (11B)-Vision-Instruct model in Colab, but I am encountering the following error:

RuntimeError: The checkpoint you are trying to load has model type `mllama` but Transformers does not recognize this model type.

Could you please confirm if this model is supported on Colab, or if there are specific configurations or dependencies I might be missing? I have updated the necessary libraries, but the issue persists.

I would appreciate your assistance in resolving this.

Best regards,
[Enes Tura]
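
An error of this kind usually means the installed Transformers release predates the `mllama` architecture. The sketch below checks the installed version against 4.45.0, which is assumed here to be the first release with Llama 3.2 Vision support. It covers only the Transformers side, not Unsloth's own vision support discussed later in this thread. The helper names are hypothetical.

```python
from importlib import metadata

# Assumption: `mllama` (Llama 3.2 Vision) first shipped in Transformers 4.45.0.
MIN_TRANSFORMERS = (4, 45, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '4.45.0.dev0' into (4, 45, 0), ignoring suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_mllama(version: str) -> bool:
    """Return True when the version is at least the assumed minimum."""
    return parse_version(version) >= MIN_TRANSFORMERS


def installed_transformers_version():
    """Return the installed Transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed")
    elif supports_mllama(version):
        print(f"transformers {version} should recognize mllama")
    else:
        print(f"transformers {version} is too old; try: pip install -U transformers")
```

In Colab, an upgraded package normally takes effect only after the runtime is restarted, which may explain why updating the libraries alone did not clear the error.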

shimmyshimmer

Unsloth AI org Oct 4, 2024 • edited Nov 21, 2024

Hey guys apologies for the delays. Vision models are now supported! :)

Read our blogpost: https://unsloth.ai/blog/vision
Tweet: https://x.com/UnslothAI/status/1859667930075758793
GitHub post: https://github.com/unslothai/unsloth/releases/tag/November-2024

Babu

Oct 9, 2024

is there any update on this?

shimmyshimmer

Unsloth AI org Oct 9, 2024

> is there any update on this?

Still in the works - apologies.

eturanes

Oct 12, 2024

Hello, is there any update on this model? Or can you please inform me whenever the updated model comes...

shimmyshimmer

Unsloth AI org Oct 12, 2024

> Hello, is there any update on this model? Or can you please inform me whenever the updated model comes...

Yes, we will let you guys know once it's supported so we can update all the model cards. Many thanks!

eturanes

Oct 14, 2024

thank you :)

eturanes

Oct 20, 2024

Hello, is there any progress?

eturanes

Nov 14, 2024

Hello, is there any progress?

shimmyshimmer

Unsloth AI org Nov 18, 2024 • edited Nov 18, 2024

Yes. It's done and ready to go, but we still need to announce it, so this week for sure! :) I will notify you all.

shimmyshimmer

Unsloth AI org Nov 21, 2024 • edited Nov 21, 2024

Hey guys apologies for the delays. Vision models are now supported! :)

Read our blogpost: https://unsloth.ai/blog/vision
Tweet: https://x.com/UnslothAI/status/1859667930075758793
GitHub post: https://github.com/unslothai/unsloth/releases/tag/November-2024

eturanes

Nov 23, 2024

Thank you for the update. Is it possible to fine-tune the 11B Instruct version of the Llama 3.2 model? I don't want to do image processing; I just want to fine-tune it for the Turkish language using Unsloth.

shimmyshimmer

Unsloth AI org Nov 23, 2024

> Thank you for the update. Is it possible to fine-tune the 11B Instruct version of the Llama 3.2 model? I don't want to do image processing; I just want to fine-tune it for the Turkish language using Unsloth.

Yes, of course you can do that. You can fine-tune the vision and language parts separately.
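
A minimal sketch of what language-only fine-tuning might look like, assuming Unsloth's `FastVisionModel` API as used in its vision notebooks. The `finetune_*` flags and the LoRA settings are assumptions to check against the current Unsloth documentation, and the helper names are hypothetical. Loading the model requires a GPU with `unsloth` installed.

```python
def language_only_lora_kwargs(rank: int = 16) -> dict:
    """LoRA options that leave the vision layers frozen and adapt only the language layers."""
    return {
        "finetune_vision_layers": False,    # keep the image encoder untouched
        "finetune_language_layers": True,   # adapt the text side only
        "finetune_attention_modules": True,
        "finetune_mlp_modules": True,
        "r": rank,
        "lora_alpha": rank,
        "lora_dropout": 0,
        "bias": "none",
    }


def load_for_language_finetuning():
    """Load the model in 4-bit and attach language-only LoRA adapters."""
    # Imported lazily because it needs a GPU environment with unsloth installed.
    from unsloth import FastVisionModel

    model, tokenizer = FastVisionModel.from_pretrained(
        model_name="unsloth/Llama-3.2-11B-Vision-Instruct",
        load_in_4bit=True,
    )
    model = FastVisionModel.get_peft_model(model, **language_only_lora_kwargs())
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_for_language_finetuning()
```

With the vision layers frozen, a text-only dataset (for example Turkish instruction data) would then update only the language-side adapters.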
