BioGPT INT8 Quantized for Medical Feature Extraction

This is an INT8 dynamically quantized version of Microsoft's BioGPT, intended for CPU inference.

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

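# Load the base model in float32 (the default dtype); PyTorch dynamic
# quantization expects float32 Linear weights, so do not load in float16
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
model.eval()

# Replace every nn.Linear layer with an INT8 dynamically quantized version (CPU only)
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)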

# Use for inference
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Details

  • Base: microsoft/biogpt
  • Quantization: INT8 dynamic
  • Size: ~85MB (vs 1.56GB original)
  • Optimized for: CPU inference
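
The size figures above can be checked by serializing a model's state dict before and after quantization. The following is a minimal sketch, not the procedure used to produce the numbers in this card: `serialized_size_mb` is a hypothetical helper, and the two-layer `toy` network is a small stand-in so the example runs without downloading BioGPT. To measure the real model, pass the `model` object from the Quick Start instead.

```python
import io

import torch
from torch import nn


def serialized_size_mb(model: nn.Module) -> float:
    """Size in MB of the model's state_dict when written with torch.save."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Toy stand-in (assumption for illustration only, not BioGPT).
toy = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
).eval()

# Same call as in the Quick Start: only nn.Linear layers are quantized.
toy_int8 = torch.quantization.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)

fp32_mb = serialized_size_mb(toy)
int8_mb = serialized_size_mb(toy_int8)
print(f"FP32: {fp32_mb:.2f} MB, INT8: {int8_mb:.2f} MB")
```

Only the `nn.Linear` weights shrink. Embeddings, layer norms, and biases stay in float32, so the overall reduction for a full model is generally smaller than the 4x seen on layers that are purely linear.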