Instructions to use google/gemma-3-12b-it-qat-q4_0-unquantized with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use google/gemma-3-12b-it-qat-q4_0-unquantized with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-12b-it-qat-q4_0-unquantized")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it-qat-q4_0-unquantized")
model = AutoModelForMultimodalLM.from_pretrained("google/gemma-3-12b-it-qat-q4_0-unquantized")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
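The last line of the direct-loading snippet decodes only the reply: for a decoder-only model such as Gemma 3, generate() returns the prompt tokens followed by the new ones, so slicing from inputs["input_ids"].shape[-1] drops the prompt. A minimal standard-library sketch of the same slicing on plain lists; strip_prompt and the token IDs are hypothetical stand-ins for the real tensors:

```python
def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Return only the tokens generated after the prompt.

    For a decoder-only model, generate() returns the prompt tokens followed
    by the newly generated ones, so the first prompt_len entries are input.
    """
    if prompt_len > len(output_ids):
        raise ValueError("prompt is longer than the returned sequence")
    return output_ids[prompt_len:]


# Hypothetical token IDs: a 4-token prompt followed by 3 generated tokens.
prompt_ids = [2, 105, 2364, 107]
generated = prompt_ids + [818, 5279, 1]
print(strip_prompt(generated, len(prompt_ids)))  # [818, 5279, 1]
```

Decoding the full sequence instead would echo the chat template and the question back before the answer.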
- Local Apps
- vLLM
How to use google/gemma-3-12b-it-qat-q4_0-unquantized with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-12b-it-qat-q4_0-unquantized"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/google/gemma-3-12b-it-qat-q4_0-unquantized
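The curl call above can also be made from Python with only the standard library. A rough sketch, assuming the vLLM server is running on localhost:8000 and accepts the same OpenAI-style body; build_chat_payload, post_chat and PAYLOAD are hypothetical names, not part of vLLM:

```python
import json
import urllib.request
from typing import Optional

MODEL_ID = "google/gemma-3-12b-it-qat-q4_0-unquantized"


def build_chat_payload(model: str, prompt: str, image_url: Optional[str] = None) -> dict:
    """Build an OpenAI-style /v1/chat/completions body, text first, then an optional image."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload and return the text of the first choice."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Same request as the curl example; building it does not touch the network.
PAYLOAD = build_chat_payload(
    MODEL_ID,
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
```

With the server up, print(post_chat("http://localhost:8000", PAYLOAD)) should print the model's one-sentence description. The SGLang server below exposes the same route, so the helpers would only need its port (30000) instead.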
- SGLang
How to use google/gemma-3-12b-it-qat-q4_0-unquantized with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3-12b-it-qat-q4_0-unquantized" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3-12b-it-qat-q4_0-unquantized" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-12b-it-qat-q4_0-unquantized",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
- Docker Model Runner
How to use google/gemma-3-12b-it-qat-q4_0-unquantized with Docker Model Runner:
docker model run hf.co/google/gemma-3-12b-it-qat-q4_0-unquantized
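The vLLM (port 8000) and SGLang (port 30000) servers above both need time to download and load the 12B weights before they answer chat requests, and a gated-repo 401 during that download keeps them from coming up at all. A hedged sketch that polls the OpenAI-style GET /v1/models route until the model appears, assuming the server lists it under its repo id; wait_until_ready, fetch_models and the fetch parameter are hypothetical helpers:

```python
import json
import time
import urllib.request
from typing import Callable


def model_is_listed(models_response: dict, model_id: str) -> bool:
    """True if an OpenAI-style /v1/models response lists model_id."""
    return any(entry.get("id") == model_id for entry in models_response.get("data", []))


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/v1/models and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(
    base_url: str,
    model_id: str,
    deadline_s: float = 900.0,
    poll_s: float = 5.0,
    fetch: Callable[[str], dict] = fetch_models,
) -> bool:
    """Poll until the server lists model_id; return False once the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            if model_is_listed(fetch(base_url), model_id):
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors
            # (urllib's URLError/HTTPError subclass it); ValueError covers a
            # body that is not valid JSON while the server is starting.
            pass
        if time.monotonic() >= end:
            return False
        time.sleep(poll_s)
```

For example, wait_until_ready("http://localhost:30000", "google/gemma-3-12b-it-qat-q4_0-unquantized") returning False is a cue to check the server logs for an authentication error. The fetch parameter only exists so the loop can be exercised without a live server.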
Persistent 401 GatedRepoError for Gemma-3 on Space despite accepted license
Dear Hugging Face Support Team / Google Model Developers,
I am experiencing a persistent authentication issue while trying to load a gated model within my Hugging Face Space.
Technical Details:
Space ID: Gigantos89/LTX-2-3-First-Last-Frame
Target Model: google/gemma-3-12b-it-qat-q4_0-unquantized
Error: huggingface_hub.errors.GatedRepoError: 401 Client Error
Current Status & Troubleshooting Performed:
License Acceptance: I have manually visited the model card and clicked "Acknowledge license." My account status for this model is officially "ACCEPTED".
Authentication in Code: My app.py uses snapshot_download(repo_id=GEMMA_REPO, token=HF_TOKEN), where HF_TOKEN is a valid secret provided in the Space settings.
Successful Downloads: Other models in the same session (e.g., Lightricks/LTX-2.3-fp8) download without issues using the same token, proving the token is functional.
Specific Error Point: The logs show the failure occurs specifically when the huggingface_hub tries to fetch the .gitattributes file for the Gemma-3 model.
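One way to narrow down a failure like this inside the Space is to fail fast on the secret and wrap the download so a gated-repo refusal surfaces with context. A minimal sketch, assuming huggingface_hub is installed and the secret is exposed as the HF_TOKEN environment variable; read_hf_token and download_gated are hypothetical helpers, and stripping whitespace from the secret is a precaution, not a confirmed cause of this error:

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ, name: str = "HF_TOKEN") -> str:
    """Return the secret with surrounding whitespace removed.

    Raises immediately when the secret is missing or blank, rather than
    letting the problem show up later as a 401 during the download.
    """
    raw: Optional[str] = env.get(name)
    token = (raw or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set or is empty in this environment")
    return token


def download_gated(repo_id: str, token: str) -> str:
    """Download a gated repo, re-raising access refusals with a hint."""
    # Imported lazily so read_hf_token can be used without huggingface_hub.
    from huggingface_hub import snapshot_download
    from huggingface_hub.errors import GatedRepoError

    try:
        return snapshot_download(repo_id=repo_id, token=token)
    except GatedRepoError as err:
        raise RuntimeError(
            f"Access to {repo_id} was refused. Accepting the license is not "
            "enough if this token's permissions do not cover gated repos."
        ) from err
```

In app.py this would stand in for the direct call, e.g. download_gated(GEMMA_REPO, read_hf_token()).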
Could you please verify if there is a permission sync lag between my user account and the Space’s runtime environment for this specific Google repository?
Thank you for your assistance in resolving this manufacturing bottleneck.
Best regards,
Gigantos89
Hi @Gigantos89 ,
Apologies for the late reply.
Thanks for reporting the issue!
Could you please confirm if you have checked your Hugging Face token type and permissions?
If you are using Hugging Face's newer Fine-Grained tokens, a token scoped strictly to "Read public repositories" will fail on gated repos, even if your underlying account has already accepted the license agreement.
To verify and fix this:
- Go to your Hugging Face account Settings → Access Tokens.
- Check the configuration of the token you are using for HF_TOKEN.
- Ensure it is either a Classic "Read" token or a Fine-Grained token that explicitly includes permissions for "Read access to contents of all public/gated repos you have access to".
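The token check in the steps above can also be done from code. A sketch that classifies the dict returned by huggingface_hub's HfApi().whoami(token=...); gated_read_status is a hypothetical helper, and the field names it reads (auth, accessToken, role, fineGrained, canReadGatedRepos) are assumptions about the response shape, so print the raw dict to confirm them for your token:

```python
from typing import Any, Mapping


def gated_read_status(whoami: Mapping[str, Any]) -> str:
    """Classify a whoami() response as "ok", "blocked", or "unknown" for gated reads.

    The keys below are assumptions about the response shape; anything
    unexpected falls through to "unknown" instead of guessing.
    """
    access = whoami.get("auth", {}).get("accessToken", {})
    role = access.get("role")
    if role in ("read", "write"):
        # Per the advice above, classic tokens carry the account's gated access.
        return "ok"
    if role == "fineGrained":
        can_read = access.get("fineGrained", {}).get("canReadGatedRepos")
        if can_read is True:
            return "ok"
        if can_read is False:
            return "blocked"
    return "unknown"
```

For example: from huggingface_hub import HfApi, then print(gated_read_status(HfApi().whoami(token=HF_TOKEN))). A "blocked" result would match the fine-grained failure mode described above.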
Please let us know if upgrading the token permissions resolves the 401 GatedRepoError for you!
Looks like downloads of this model have dropped to zero because of the 401. I could be wrong, but I couldn't get past it.