Instructions for using google/gemma-3-12b-it-qat-q4_0-unquantized with libraries and local apps.

Libraries

How to use google/gemma-3-12b-it-qat-q4_0-unquantized with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-12b-it-qat-q4_0-unquantized")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run inference and print the result (a bare call shows nothing when run as a script)
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it-qat-q4_0-unquantized")
model = AutoModelForMultimodalLM.from_pretrained("google/gemma-3-12b-it-qat-q4_0-unquantized")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use google/gemma-3-12b-it-qat-q4_0-unquantized with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-3-12b-it-qat-q4_0-unquantized"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
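
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `post_chat` are hypothetical, and it assumes the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style chat completion JSON.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `post_chat("http://localhost:8000", build_chat_payload("google/gemma-3-12b-it-qat-q4_0-unquantized", "Describe this image in one sentence.", image_url))["choices"][0]["message"]["content"]` should hold the generated text. Pointing `base_url` at port 30000 would target an SGLang server in the same way.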

Use Docker

docker model run hf.co/google/gemma-3-12b-it-qat-q4_0-unquantized

SGLang

How to use google/gemma-3-12b-it-qat-q4_0-unquantized with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3-12b-it-qat-q4_0-unquantized" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3-12b-it-qat-q4_0-unquantized" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use google/gemma-3-12b-it-qat-q4_0-unquantized with Docker Model Runner:
```
docker model run hf.co/google/gemma-3-12b-it-qat-q4_0-unquantized
```

Persistent 401 GatedRepoError for Gemma-3 on Space despite accepted license

by Gigantos89 - opened May 12

Discussion

Gigantos89

May 12

Dear Hugging Face Support Team / Google Model Developers,

I am experiencing a persistent authentication issue while trying to load a gated model within my Hugging Face Space.

Technical Details:

Space ID: Gigantos89/LTX-2-3-First-Last-Frame

Target Model: google/gemma-3-12b-it-qat-q4_0-unquantized

Error: huggingface_hub.errors.GatedRepoError: 401 Client Error

Current Status & Troubleshooting Performed:

License Acceptance: I have manually visited the model card and clicked "Acknowledge license." My account status for this model is officially "ACCEPTED".

Authentication in Code: My app.py uses snapshot_download(repo_id=GEMMA_REPO, token=HF_TOKEN), where HF_TOKEN is a valid secret provided in the Space settings.

Successful Downloads: Other models in the same session (e.g., Lightricks/LTX-2.3-fp8) download without issues using the same token, proving the token is functional.

Specific Error Point: The logs show that the failure occurs when huggingface_hub tries to fetch the .gitattributes file for the Gemma-3 model.

Could you please verify if there is a permission sync lag between my user account and the Space’s runtime environment for this specific Google repository?

Thank you for your assistance in resolving this manufacturing bottleneck.

Best regards,
Gigantos89
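
For readers following along, the download step described in this post can be sketched as below. This is an illustration, not the poster's actual app.py: the helper names `read_token` and `download_gated` are hypothetical, and stripping whitespace from the secret is a defensive assumption rather than something the thread identifies as the cause. `GatedRepoError` is caught before `RepositoryNotFoundError` because it is a subclass of it.

```python
import os


def read_token(env=None, name="HF_TOKEN"):
    """Return the token from the environment without stray whitespace, or None."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return None
    value = value.strip()
    return value or None


def download_gated(repo_id, token):
    """Download a repo snapshot and turn access failures into readable errors."""
    # Imported lazily so read_token() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    from huggingface_hub.errors import GatedRepoError, RepositoryNotFoundError

    if token is None:
        raise RuntimeError("HF_TOKEN is not set; gated repos need an authenticated request.")
    try:
        return snapshot_download(repo_id=repo_id, token=token)
    except GatedRepoError as err:
        # The repo exists but this token is not allowed to read it.
        raise RuntimeError(
            f"Access to gated repo {repo_id!r} was denied for this token."
        ) from err
    except RepositoryNotFoundError as err:
        # Wrong repo id, or a private repo that this token cannot see.
        raise RuntimeError(f"Repo {repo_id!r} was not found for this token.") from err
```

Calling `download_gated("google/gemma-3-12b-it-qat-q4_0-unquantized", read_token())` separates a missing secret from a rejected token.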

thnamratha

Google org 25 days ago

Hi @Gigantos89,

Apologies for the late reply.
Thanks for raising the issue!

Could you please confirm if you have checked your Hugging Face token type and permissions?
If you are using Hugging Face's newer Fine-Grained tokens, a token scoped strictly to "Read public repositories" will fail on gated repos, even if your underlying account has already accepted the license agreement.

To verify and fix this:

1. Go to your Hugging Face Account Settings > Access Tokens.
2. Check the configuration of the token you are using for HF_TOKEN.
3. Ensure it is either a Classic "Read" token or a Fine-Grained token that explicitly includes permissions for "Read access to contents of all public/gated repos you have access to".

Please let us know if upgrading the token permissions resolves the 401 GatedRepoError for you!
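
The token check described above can also be done from code. The sketch below classifies the dictionary returned by `HfApi().whoami()`. The key names it reads (`auth.accessToken.role` and `auth.accessToken.fineGrained.canReadGatedRepos`) are an assumption about the response layout and may differ between Hub versions, so the function falls back to "unknown" when they are missing. The function names are hypothetical.

```python
def gated_read_status(whoami_info: dict) -> str:
    """Classify a whoami() response by whether the token can read gated repos.

    Returns "classic", "fine-grained-ok", "fine-grained-blocked", or "unknown".
    The key names below are assumed, not guaranteed by the Hub API.
    """
    access_token = (whoami_info.get("auth") or {}).get("accessToken") or {}
    role = access_token.get("role")
    if role in ("read", "write"):
        # Classic tokens inherit the account's access, including accepted licenses.
        return "classic"
    if role == "fineGrained":
        can_read = (access_token.get("fineGrained") or {}).get("canReadGatedRepos")
        if can_read is True:
            return "fine-grained-ok"
        if can_read is False:
            return "fine-grained-blocked"
    return "unknown"


def check_token(token: str) -> str:
    """Fetch account info for the token and classify it (requires network access)."""
    # Imported lazily so gated_read_status() can be used on its own.
    from huggingface_hub import HfApi

    return gated_read_status(HfApi().whoami(token=token))
```

A result of "fine-grained-blocked" would match the situation described in this reply: the account has accepted the license, but the token itself lacks gated read permission.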

armour27

9 days ago

Looks like downloads of this model have gone to zero because of the 401. I could be wrong, but I couldn't defeat it.
