How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Xingyu-Zheng/gemma-4-31B-it-int8-foem")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated reply
print(pipe(text=messages))
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Xingyu-Zheng/gemma-4-31B-it-int8-foem")
model = AutoModelForMultimodalLM.from_pretrained("Xingyu-Zheng/gemma-4-31B-it-int8-foem")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
⚠️ Warning

The 31B version may encounter an error at language_model.model.layers.5.self_attn.qkv_proj when deployed with vLLM.

This issue appears to originate from GPTQModel, as it does not occur in the E2B version. We are currently investigating and working on a fix.

This is an unofficial quantized version of google/gemma-4-31B-it.

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

FOEM is an improved quantization method over GPTQ. The resulting model preserves the same inference structure as GPTQ, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
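To make "the same inference structure as GPTQ" more concrete, here is a minimal sketch of the kind of storage a GPTQ-style int8 checkpoint typically uses: small integer weights packed into 32-bit words (consistent with the I32 tensor type of this repository) and mapped back to floats with a scale and a zero point. The byte order, the zero-point convention, and the helper names `pack_int8`, `unpack_int8`, and `dequantize` are assumptions for illustration only; they are not taken from GPTQModel or from the FOEM paper.

```python
def pack_int8(values):
    """Pack four unsigned 8-bit integers into one 32-bit word (lowest byte first).

    The byte order is an assumption made for this sketch.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 255:
            raise ValueError("values must fit in 8 bits")
        word |= v << (8 * i)
    return word


def unpack_int8(word):
    """Recover the four 8-bit integers stored in a 32-bit word."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to floats: w = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    packed = pack_int8([128, 130, 126, 255])
    restored = unpack_int8(packed)
    print(restored)
    print(dequantize(restored, scale=0.5, zero_point=128))
```

Because only the stored integers and scales differ between quantization methods, a runtime that reads this layout does not need to know whether GPTQ or FOEM produced the values.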

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
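A random subset like this could be drawn along the following lines. This is a sketch rather than the authors' actual script: the seed, the `train` split name, and the helper names `sample_calibration_indices` and `load_calibration_subset` are assumptions, since the card does not state them.

```python
import random


def sample_calibration_indices(dataset_size, num_samples=512, seed=0):
    """Pick `num_samples` distinct row indices uniformly at random.

    The seed is hypothetical; the card does not say which one was used.
    """
    if num_samples > dataset_size:
        raise ValueError("cannot sample more rows than the dataset contains")
    rng = random.Random(seed)
    return sorted(rng.sample(range(dataset_size), num_samples))


def load_calibration_subset(
    dataset_id="nohurry/Opus-4.6-Reasoning-3000x-filtered",
    split="train",  # assumed split name
    num_samples=512,
    seed=0,
):
    """Load the dataset from the Hub and keep a random subset of rows."""
    # Imported lazily so the index helper above works without `datasets`.
    from datasets import load_dataset

    dataset = load_dataset(dataset_id, split=split)
    indices = sample_calibration_indices(len(dataset), num_samples, seed)
    return dataset.select(indices)


if __name__ == "__main__":
    subset = load_calibration_subset()
    print(len(subset))
```

Fixing the seed makes the calibration subset reproducible, which helps when comparing quantization runs.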

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
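As a rough illustration, offline inference with vLLM's Python API might look like the sketch below. It assumes a vLLM build that supports this architecture and enough GPU memory for the 31B checkpoint; given the warning above, the 31B model may still fail to load. The helpers `build_messages` and `run_with_vllm` and the `max_tokens` value are hypothetical and chosen for this example.

```python
MODEL_ID = "Xingyu-Zheng/gemma-4-31B-it-int8-foem"


def build_messages(question, image_url=None):
    """Build a single-turn chat request in the OpenAI-style message format."""
    content = []
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


def run_with_vllm(messages, max_tokens=40):
    """Generate one reply with vLLM; requires vLLM and a suitable GPU."""
    # Imported lazily so build_messages can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID)
    params = SamplingParams(max_tokens=max_tokens)
    outputs = llm.chat(messages, sampling_params=params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    messages = build_messages(
        "What animal is on the candy?",
        image_url="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    )
    print(run_with_vllm(messages))
```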

Safetensors
Model size: 31B params
Tensor type: BF16 · I32