How to use with vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ig1/medgemma-27b-text-it-FP8-Dynamic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ig1/medgemma-27b-text-it-FP8-Dynamic",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
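The same request can be sent from Python. The minimal sketch below uses only the standard library and assumes the server started above is listening on the default localhost:8000. The helper names (build_payload, chat) are hypothetical. If the server was started with --served-model-name, the model argument must be that name rather than the repository id.

```python
import json
import urllib.request

# Model name as registered by the vllm serve command above; change it if you
# start the server with --served-model-name.
MODEL = "ig1/medgemma-27b-text-it-FP8-Dynamic"
BASE_URL = "http://localhost:8000/v1"  # default vLLM host/port, as in the curl call


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completions body with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, model: str = MODEL, base_url: str = BASE_URL) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("What is the capital of France?")) sends the same request as the curl command.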
Use Docker
docker model run hf.co/ig1/medgemma-27b-text-it-FP8-Dynamic

MedGemma-27B-Text-IT-FP8-Dynamic

Overview

MedGemma-27B-Text-IT-FP8-Dynamic is a derivative of Google’s MedGemma-27B-Text-IT model, quantized with the FP8 Dynamic scheme. It targets higher inference throughput and lower memory use while aiming to preserve the base model's performance on text-only medical and biomedical instruction-following tasks.

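This version is intended for vLLM deployment on modern NVIDIA GPUs. It uses a conservative FP8 Dynamic configuration, in which only Linear layers are quantized and lm_head is left unquantized. The configuration is chosen for stability rather than maximum compression.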


Base Model

  • Base model: google/medgemma-27b-text-it
  • Architecture: Decoder-only Transformer (instruction-tuned)
  • Domain: Medical / Biomedical NLP
  • Modality: Text-only

Quantization Details

  • Method: FP8 Dynamic
  • Tooling: llmcompressor
  • Quantized layers: Linear layers
  • Excluded components:
    • lm_head
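The pure-Python sketch below illustrates the arithmetic behind FP8 Dynamic quantization. It makes several assumptions:

  • The target format is E4M3 (the F8_E4M3 tensor type in the model metadata), with 3 mantissa bits and a largest finite value of 448.
  • As we understand llmcompressor's FP8_DYNAMIC preset, activations get one scale per token, computed at inference time. Weight scales are computed once per output channel, offline.

The helper names are hypothetical. The code only simulates the rounding and is not how vLLM's FP8 kernels are implemented.

```python
import math

E4M3_MAX = 448.0      # largest finite FP8 E4M3 value (the "fn" variant used on NVIDIA GPUs)
MANTISSA_BITS = 3
MIN_NORMAL_EXP = -6   # exponent of the smallest normal E4M3 number, 2**-6


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), E4M3_MAX)
    _, e = math.frexp(a)                  # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)      # binade exponent; the clamp handles subnormals
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing of representable values in this binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_rows(rows):
    """Symmetric FP8 quantization with one scale per row.

    Applied to activations at inference time, each row is one token (dynamic
    per-token scales). Applied once offline to a weight matrix, each row is one
    output channel (static per-channel scales).
    """
    result = []
    for row in rows:
        amax = max(abs(x) for x in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0  # map the row's max onto 448
        result.append(([round_to_e4m3(x / scale) for x in row], scale))
    return result


def dequantize(q_row, scale):
    """Recover approximate real values: x ~= q * scale."""
    return [q * scale for q in q_row]


tokens = [[0.5, -2.0, 1.0], [0.1, 0.3, -0.05]]
for q_row, scale in quantize_rows(tokens):
    print(q_row, round(scale, 6), dequantize(q_row, scale))
```

Because each token's scale is derived from its own maximum magnitude, no calibration dataset is needed for activations. This is the usual motivation for the "dynamic" variant.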

Rationale

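  • Storing Linear-layer weights in FP8 (E4M3) takes half the bytes of BF16. This reduces VRAM usage and can improve inference throughput.
  • Leaving lm_head unquantized avoids adding quantization error to the final vocabulary projection, which helps preserve output stability.
  • The resulting checkpoint can be served directly with vLLM.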
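The sketch below gives a rough, weights-only check of the VRAM saving. It multiplies the 28B parameter count from the model metadata by the bytes per parameter: 2 for BF16 and 1 for FP8 E4M3. It treats every parameter as a single dtype, so the two results are bounds rather than exact sizes. The real checkpoint presumably sits somewhat above the FP8 figure, because lm_head and other unquantized tensors are kept at higher precision. KV cache and activations are not counted.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8_e4m3": 1}
N_PARAMS = 28e9  # "28B params" from the model card metadata (rounded)


def weight_gb(n_params: float, dtype: str) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


bf16_gb = weight_gb(N_PARAMS, "bf16")
fp8_gb = weight_gb(N_PARAMS, "fp8_e4m3")
print(f"BF16 weights: ~{bf16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.0f} GB")
```

vLLM allocates GPU memory left over after loading the weights to the KV cache. The roughly halved weight footprint therefore also leaves room for longer contexts or more concurrent requests.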

Weights are already quantized, so do not apply runtime quantization (for example, do not pass a --quantization option to vLLM).


Intended Use

  • Medical and biomedical instruction-following
  • Clinical text summarization
  • Medical RAG pipelines
  • Decision-support and research assistance

Deployment (vLLM)

Recommended

vllm serve ig1/medgemma-27b-text-it-FP8-Dynamic \
  --served-model-name medgemma-27b-text-it-fp8 \
  --dtype auto
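With --served-model-name set, clients must send that name (here medgemma-27b-text-it-fp8) in the model field instead of the repository id used in the earlier curl example. The minimal standard-library sketch below checks which names the server exposes through the OpenAI-compatible GET /v1/models endpoint. It assumes the default localhost:8000 address, and the helper names are hypothetical.

```python
import json
import urllib.request

SERVED_NAME = "medgemma-27b-text-it-fp8"  # value passed to --served-model-name above


def served_model_ids(models_response: dict) -> list:
    """Extract model ids from an OpenAI-style GET /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url: str = "http://localhost:8000/v1") -> dict:
    """Query the running vLLM server for the models it is serving."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)
```

Against a live server, SERVED_NAME in served_model_ids(fetch_models()) should be True once the model has finished loading.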
Model size: 28B params (Safetensors)
Tensor types: BF16, F8_E4M3