BioGPT INT8 Quantized for Medical Feature Extraction
This is an INT8 dynamically quantized version of Microsoft's BioGPT, intended for CPU inference.
Quick Start
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the base model in float32; dynamic quantization of Linear layers expects float32 weights
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt", torch_dtype=torch.float32)
model.eval()
# Quantize the Linear layers to INT8 (weights are stored as int8; activations are quantized at runtime)
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
# Use for inference
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
# Calling the tokenizer returns both input_ids and attention_mask
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Model Details
- Base: microsoft/biogpt
- Quantization: INT8 dynamic
- Size: ~85MB (vs 1.56GB original)
- Optimized for: CPU inference
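To make the "INT8 dynamic" and size entries above more concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 weight quantization. It is a simplified illustration of the idea only, not PyTorch's actual kernels; the function names and the sample weights are hypothetical.

```python
# Simplified illustration of symmetric per-tensor INT8 weight quantization.
# Teaching sketch only; function names and sample weights are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding error per weight is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage per weight: 4 bytes as float32 versus 1 byte as int8.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
print(q, scale, max_error, fp32_bytes, int8_bytes)
```

Each quantized weight takes one byte instead of four, so the quantized layers shrink by roughly 4x. In the Quick Start code only `torch.nn.Linear` modules are passed to `quantize_dynamic`, so other layers such as embeddings keep their original precision.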