Pritish92/ner-medgemma15-4b-it-lora-b0
This is a LoRA adapter fine-tuned from google/medgemma-1.5-4b-it for instruction-following named-entity recognition (NER). The model is trained to emit entities as a strict JSON list:
[{"label":"...","text":"..."}]
This repository contains adapter weights only (not full base model weights). You must have access to google/medgemma-1.5-4b-it to run it.
Prompt format (exact)
### Instruction:
{instruction}
Maintain the JSON key order exactly as shown.
Output format: [{"label":"...","text":"..."}]
### Input:
{input_chunk}
### Response:
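The template above can be assembled programmatically so that inference prompts stay aligned with training. The following is a minimal sketch, not the author's code: the function name build_prompt is hypothetical, and the exact whitespace (single newlines between sections, a trailing newline after "### Response:") is an assumption, since blank lines may have been lost from the card.

```python
def build_prompt(instruction: str, input_chunk: str) -> str:
    """Fill the card's prompt template (whitespace between sections is assumed)."""
    return (
        "### Instruction:\n"
        f"{instruction}\n"
        "Maintain the JSON key order exactly as shown.\n"
        'Output format: [{"label":"...","text":"..."}]\n'
        "### Input:\n"
        f"{input_chunk}\n"
        "### Response:\n"
    )


# Illustrative instruction and input text (both made up for this example).
prompt = build_prompt(
    "Extract all symptom, diagnosis and clinical_status entities.",
    "Patient reports fever and persistent cough.",
)
```

Plain string concatenation is used instead of str.format because the template itself contains literal braces.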
How to load
import torch
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText

adapter_id = "Pritish92/ner-medgemma15-4b-it-lora-b0"
base_id = "google/medgemma-1.5-4b-it"

# Load the processor from the adapter repository.
processor = AutoProcessor.from_pretrained(adapter_id, use_fast=False)

# Load the base model in bfloat16, then attach the LoRA adapter weights.
base_model = AutoModelForImageTextToText.from_pretrained(
    base_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()  # inference mode (disables dropout)
Training details
- Date: 2026-02-26
- Sequence length cap (max_length): 8192
- Chunking strategy: entity_aware
- prompt overhead tokens reserved: 256
- output overhead tokens reserved: 1024
- max input chunk tokens: 2048
- overlap chunk tokens: 256
- min chunk tokens: 256
- Batch size: 1
- Gradient accumulation: 8 (effective batch: 8)
- Learning rate: 2e-05
- Planned epochs: 3.0
- Loss masking: response-only (prompt + input chunk tokens masked with -100)
- W&B run name: N/A
- W&B run id: 2pjkzb4r
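To illustrate how the chunk size and overlap listed above interact, here is a plain sliding-window sketch over token ids. It is not the entity_aware strategy, which the card does not describe. The function name sliding_chunks is hypothetical, and handling of the min chunk tokens setting is omitted because its semantics are not stated.

```python
def sliding_chunks(token_ids, max_chunk=2048, overlap=256):
    """Split token_ids into windows of at most max_chunk tokens.

    Consecutive windows share `overlap` tokens. This is a generic
    sliding window, NOT the card's entity_aware chunker.
    """
    if not 0 <= overlap < max_chunk:
        raise ValueError("overlap must be in [0, max_chunk)")
    if not token_ids:
        return []
    stride = max_chunk - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + max_chunk, len(token_ids))
        chunks.append(token_ids[start:end])
        if end == len(token_ids):
            break
        start += stride
    return chunks


# Token budget from the card: prompt + output overhead + one input chunk.
budget_used = 256 + 1024 + 2048
fits_in_cap = budget_used <= 8192

# Dummy token ids stand in for a tokenized document.
chunks = sliding_chunks(list(range(5000)))
```

With these settings one example reserves 3328 tokens in total, well under the 8192-token sequence cap.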
LoRA / PEFT
- LoRA rank (r): 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Target modules: language_model.layers.20.self_attn.q_proj, language_model.layers.25.self_attn.k_proj, language_model.layers.18.self_attn.q_proj, language_model.layers.18.self_attn.k_proj, language_model.layers.23.self_attn.v_proj, language_model.layers.19.self_attn.k_proj, language_model.layers.7.self_attn.v_proj, language_model.layers.1.self_attn.q_proj, language_model.layers.10.self_attn.q_proj, language_model.layers.16.self_attn.k_proj, language_model.layers.16.self_attn.v_proj, o_proj, language_model.layers.24.self_attn.v_proj, language_model.layers.7.self_attn.k_proj, language_model.layers.13.self_attn.v_proj, language_model.layers.10.self_attn.k_proj, 28.self_attn.k_proj, language_model.layers.0.self_attn.v_proj, language_model.layers.8.self_attn.k_proj, language_model.layers.21.self_attn.q_proj, language_model.layers.23.self_attn.k_proj, language_model.layers.9.self_attn.q_proj, language_model.layers.25.self_attn.v_proj, language_model.layers.15.self_attn.k_proj, language_model.layers.2.self_attn.k_proj, 31.self_attn.k_proj, language_model.layers.9.self_attn.v_proj, language_model.layers.26.self_attn.k_proj, language_model.layers.11.self_attn.k_proj, language_model.layers.3.self_attn.q_proj, language_model.layers.11.self_attn.q_proj, language_model.layers.4.self_attn.q_proj, language_model.layers.17.self_attn.q_proj, language_model.layers.24.self_attn.k_proj, 30.self_attn.v_proj, language_model.layers.14.self_attn.v_proj, language_model.layers.5.self_attn.q_proj, 27.self_attn.k_proj, 27.self_attn.v_proj, language_model.layers.3.self_attn.v_proj, language_model.layers.17.self_attn.k_proj, language_model.layers.3.self_attn.k_proj, up_proj, 29.self_attn.q_proj, language_model.layers.4.self_attn.k_proj, language_model.layers.14.self_attn.k_proj, 31.self_attn.v_proj, language_model.layers.19.self_attn.q_proj, 32.self_attn.k_proj, 29.self_attn.v_proj, 31.self_attn.q_proj, 30.self_attn.k_proj, 27.self_attn.q_proj, language_model.layers.22.self_attn.v_proj, 
language_model.layers.23.self_attn.q_proj, language_model.layers.5.self_attn.v_proj, language_model.layers.17.self_attn.v_proj, language_model.layers.24.self_attn.q_proj, 28.self_attn.q_proj, language_model.layers.16.self_attn.q_proj, language_model.layers.13.self_attn.k_proj, language_model.layers.26.self_attn.q_proj, language_model.layers.26.self_attn.v_proj, language_model.layers.5.self_attn.k_proj, 29.self_attn.k_proj, 32.self_attn.q_proj, language_model.layers.21.self_attn.v_proj, language_model.layers.19.self_attn.v_proj, language_model.layers.11.self_attn.v_proj, language_model.layers.21.self_attn.k_proj, language_model.layers.9.self_attn.k_proj, language_model.layers.14.self_attn.q_proj, language_model.layers.1.self_attn.k_proj, language_model.layers.8.self_attn.q_proj, 33.self_attn.k_proj, language_model.layers.12.self_attn.q_proj, language_model.layers.2.self_attn.q_proj, language_model.layers.25.self_attn.q_proj, language_model.layers.15.self_attn.v_proj, 28.self_attn.v_proj, language_model.layers.4.self_attn.v_proj, language_model.layers.0.self_attn.k_proj, 30.self_attn.q_proj, language_model.layers.22.self_attn.k_proj, language_model.layers.12.self_attn.v_proj, language_model.layers.2.self_attn.v_proj, language_model.layers.18.self_attn.v_proj, language_model.layers.6.self_attn.k_proj, language_model.layers.6.self_attn.v_proj, 33.self_attn.q_proj, language_model.layers.22.self_attn.q_proj, language_model.layers.20.self_attn.v_proj, down_proj, language_model.layers.12.self_attn.k_proj, language_model.layers.1.self_attn.v_proj, language_model.layers.10.self_attn.v_proj, language_model.layers.0.self_attn.q_proj, language_model.layers.8.self_attn.v_proj, 33.self_attn.v_proj, language_model.layers.20.self_attn.k_proj, language_model.layers.7.self_attn.q_proj, language_model.layers.13.self_attn.q_proj, language_model.layers.6.self_attn.q_proj, gate_proj, language_model.layers.15.self_attn.q_proj, 32.self_attn.v_proj
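For orientation, the rank and alpha above translate into a scaling factor and a per-layer parameter count as sketched below. This assumes standard LoRA scaling (lora_alpha / r); whether a variant such as rank-stabilized scaling was used is not stated in the card. The layer dimensions in the example are hypothetical and not taken from the MedGemma configuration.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_param_count(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


scaling = lora_scaling(r=64, lora_alpha=128)

# Hypothetical projection shape, for illustration only.
example_params = lora_param_count(r=64, d_in=2560, d_out=2048)
```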
Training data
Local CSVs:
- NER/NER-Data/ner_train_dataset.csv
- NER/NER-Data/ner_dev_dataset.csv
- NER/NER-Data/ner_test_dataset.csv
Example counts: raw train=18,115, raw val=2,010; after chunking train examples=25,570
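The chunked example count can be cross-checked against the batch settings under Training details. The sketch below assumes single-device training and that a partial final accumulation window still counts as one optimizer step; neither is stated in the card.

```python
import math

train_examples = 25_570      # after chunking
per_device_batch = 1
grad_accum = 8
epochs = 3

effective_batch = per_device_batch * grad_accum
# Assumption: the last, partially filled accumulation window still triggers a step.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
```

Under these assumptions the run has 3197 optimizer steps per epoch and 9591 in total, which agrees with the best-checkpoint step reported under Evaluation and suggests the best checkpoint is the final one.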
Evaluation
- Best checkpoint metric: eval_entity_f1_balanced=0.339016 (best checkpoint: step 9591)
- Train runtime: 58611.6s (16h 16m 51s)
- eval_entity_f1: 0.367399
- eval_entity_f1_balanced: 0.339016
- eval_entity_f1_clinical_status: 0.180791
- eval_entity_f1_diagnosis: 0.372832
- eval_entity_f1_symptom: 0.378277
- eval_entity_label_macro_f1: 0.310633
- eval_entity_micro_f1: 0.344190
- eval_entity_parse_fail_rate: 0.906250
- eval_entity_precision: 0.654518
- eval_entity_precision_clinical_status: 0.405063
- eval_entity_precision_diagnosis: 0.708791
- eval_entity_precision_symptom: 0.721429
- eval_entity_recall: 0.278551
- eval_entity_recall_clinical_status: 0.116364
- eval_entity_recall_diagnosis: 0.252941
- eval_entity_recall_symptom: 0.256345
- eval_runtime: 104.118900
- eval_samples_per_second: 0.615000
- eval_steps_per_second: 0.077000
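The per-label scores above can be checked for internal consistency. This sketch recomputes each label's F1 as the harmonic mean of its reported precision and recall, then averages them. It is a sanity check on the published numbers, not the author's evaluation code.

```python
reported = {
    "clinical_status": {"precision": 0.405063, "recall": 0.116364, "f1": 0.180791},
    "diagnosis": {"precision": 0.708791, "recall": 0.252941, "f1": 0.372832},
    "symptom": {"precision": 0.721429, "recall": 0.256345, "f1": 0.378277},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_label_f1 = {
    label: f1_score(m["precision"], m["recall"]) for label, m in reported.items()
}
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

for label, value in per_label_f1.items():
    print(f"{label}: recomputed={value:.6f} reported={reported[label]['f1']:.6f}")
print(f"macro: recomputed={macro_f1:.6f} reported=0.310633")
```

The recomputed values agree with the reported per-label F1 scores and with eval_entity_label_macro_f1 to six decimals. The mean of eval_entity_f1 and eval_entity_label_macro_f1 also equals the reported eval_entity_f1_balanced (0.339016), though the card does not say whether that is how the metric is defined. The overall eval_entity_f1 is not the harmonic mean of the overall precision and recall, so it is presumably aggregated differently.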
Notes
- MedGemma can be prompt-sensitive; keep inference prompt formatting aligned with training.
- Validate JSON output before downstream use.
- If google/medgemma-1.5-4b-it is gated, authenticate first.
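Given the parse failure rate reported under Evaluation, the validation step matters in practice. One way to do it is sketched below, using only the standard library. The function name parse_entities is hypothetical, and enforcing the key order is an assumption based on the prompt's "Maintain the JSON key order" instruction.

```python
import json

EXPECTED_KEYS = ["label", "text"]


def parse_entities(raw_output: str) -> list:
    """Parse model output and check it matches [{"label": str, "text": str}, ...].

    Raises ValueError on any deviation so callers can log or retry.
    """
    try:
        data = json.loads(raw_output.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON list at the top level")
    for index, item in enumerate(data):
        # json.loads preserves key order, so the order can be compared directly.
        if not isinstance(item, dict) or list(item.keys()) != EXPECTED_KEYS:
            raise ValueError(f"item {index} must have exactly the keys {EXPECTED_KEYS}")
        if not all(isinstance(item[key], str) for key in EXPECTED_KEYS):
            raise ValueError(f"item {index} must map both keys to strings")
    return data


entities = parse_entities('[{"label":"symptom","text":"fever"}]')

try:
    parse_entities('[{"label":"symptom","text":"fever"')  # truncated output
    truncated_ok = True
except ValueError:
    truncated_ok = False
```

Raising ValueError rather than returning an empty list keeps parse failures distinguishable from inputs that genuinely contain no entities.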
References
- MedGemma model card: https://huggingface.co/google/medgemma-1.5-4b-it
- MedGemma notebooks: https://github.com/google-health/medgemma/tree/main/notebooks