CliniGuard NER -- PHI/PII De-identification by Genzeon Platforms

CliniGuard NER is a clinical Named Entity Recognition model developed by Genzeon Platforms for automated detection and de-identification of Protected Health Information (PHI) and Personally Identifiable Information (PII) in clinical text. Built on a domain-specialized BERT architecture fine-tuned on healthcare corpora, CliniGuard delivers production-grade entity recognition across 20 PHI categories.

Model Details

| Property | Value |
|---|---|
| Developed by | Genzeon Platforms |
| Architecture | BertForTokenClassification |
| Parameters | ~110M |
| Tagging scheme | BIO (41 labels) |
| Max sequence length | 512 tokens |
| License | Apache-2.0 |

Intended Use

CliniGuard NER is designed for enterprise healthcare environments where patient data privacy is critical. Primary use cases include:

  • Clinical text de-identification -- removing or masking patient identifiers before sharing medical records for research.
  • PII detection -- flagging sensitive information in healthcare documents, EHRs, and discharge summaries.
  • Regulatory compliance -- supporting HIPAA Safe Harbor de-identification requirements.
  • Healthcare AI pipelines -- preprocessing clinical text for downstream NLP tasks while ensuring patient privacy.

Entity Types

The model recognizes 20 PHI entity types using BIO tagging (41 labels total):

| Category | Entity Types |
|---|---|
| Patient identifiers | PATIENT_NAME, DATE_OF_BIRTH, AGE, GENDER, SSN, MRN |
| Contact information | PHONE, FAX, EMAIL |
| Location | ADDRESS, CITY, STATE, ZIP, COUNTRY |
| Organization | HOSPITAL |
| Provider | DOCTOR_NAME |
| Digital identifiers | USERNAME, ID_NUMBER, IP_ADDRESS, URL |
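
The label count follows from the tagging scheme: BIO assigns each entity type a B- (begin) and an I- (inside) tag, plus a single O tag for everything else, so 20 types give 2 × 20 + 1 = 41 labels. A minimal sketch of that arithmetic is below. The helper name `build_bio_labels` and the label ordering are assumptions for illustration; the ordering in the model's own `id2label` mapping may differ.

```python
# Entity types copied from the Entity Types table.
ENTITY_TYPES = [
    "PATIENT_NAME", "DATE_OF_BIRTH", "AGE", "GENDER", "SSN", "MRN",
    "PHONE", "FAX", "EMAIL",
    "ADDRESS", "CITY", "STATE", "ZIP", "COUNTRY",
    "HOSPITAL",
    "DOCTOR_NAME",
    "USERNAME", "ID_NUMBER", "IP_ADDRESS", "URL",
]


def build_bio_labels(entity_types):
    """Return "O" plus one B- tag and one I- tag per entity type."""
    labels = ["O"]
    for name in entity_types:
        labels.append(f"B-{name}")  # first token of an entity span
        labels.append(f"I-{name}")  # any following token of the same span
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(len(ENTITY_TYPES), len(labels))  # 20 41
```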

Performance

Overall Metrics

| Metric | Precision | Recall | F1 |
|---|---|---|---|
| Micro avg | 0.9659 | 0.9732 | 0.9695 |
| Macro avg | 0.9609 | 0.9706 | 0.9656 |

Per-Entity Metrics

| Entity | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| PATIENT_NAME | 0.9817 | 0.9853 | 0.9835 | 14335 |
| DATE_OF_BIRTH | 0.9798 | 0.9740 | 0.9769 | 9818 |
| AGE | 0.9028 | 0.9854 | 0.9423 | 1508 |
| GENDER | 0.9596 | 0.9885 | 0.9738 | 1562 |
| SSN | 0.9513 | 0.9935 | 0.9719 | 766 |
| MRN | 0.9938 | 0.9923 | 0.9930 | 1943 |
| PHONE | 0.9730 | 0.9869 | 0.9799 | 2590 |
| FAX | 0.9481 | 0.9454 | 0.9468 | 696 |
| EMAIL | 0.9965 | 0.9936 | 0.9950 | 4543 |
| ADDRESS | 0.9746 | 0.9844 | 0.9794 | 1985 |
| CITY | 0.9086 | 0.8891 | 0.8988 | 2047 |
| STATE | 0.9103 | 0.9060 | 0.9082 | 2734 |
| ZIP | 0.9770 | 0.9832 | 0.9801 | 951 |
| COUNTRY | 0.9485 | 0.9504 | 0.9495 | 2056 |
| HOSPITAL | 0.9033 | 0.9345 | 0.9186 | 5267 |
| DOCTOR_NAME | 0.9865 | 1.0000 | 0.9932 | 802 |
| USERNAME | 0.9689 | 0.9431 | 0.9559 | 1917 |
| ID_NUMBER | 0.9724 | 0.9898 | 0.9811 | 8555 |
| IP_ADDRESS | 0.9892 | 0.9924 | 0.9908 | 926 |
| URL | 0.9910 | 0.9947 | 0.9928 | 3001 |
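
The macro averages reported under Overall Metrics appear to be unweighted means of the per-entity columns, which is the usual convention. The sketch below recomputes them from the rounded per-entity values as a consistency check. Because the inputs are already rounded to four decimals, the result can differ from the reported figures in the last digit. The function name `macro_average` is an illustrative choice, not part of the model's tooling.

```python
# (precision, recall, f1) per entity, copied from the per-entity table.
PER_ENTITY = {
    "PATIENT_NAME": (0.9817, 0.9853, 0.9835),
    "DATE_OF_BIRTH": (0.9798, 0.9740, 0.9769),
    "AGE": (0.9028, 0.9854, 0.9423),
    "GENDER": (0.9596, 0.9885, 0.9738),
    "SSN": (0.9513, 0.9935, 0.9719),
    "MRN": (0.9938, 0.9923, 0.9930),
    "PHONE": (0.9730, 0.9869, 0.9799),
    "FAX": (0.9481, 0.9454, 0.9468),
    "EMAIL": (0.9965, 0.9936, 0.9950),
    "ADDRESS": (0.9746, 0.9844, 0.9794),
    "CITY": (0.9086, 0.8891, 0.8988),
    "STATE": (0.9103, 0.9060, 0.9082),
    "ZIP": (0.9770, 0.9832, 0.9801),
    "COUNTRY": (0.9485, 0.9504, 0.9495),
    "HOSPITAL": (0.9033, 0.9345, 0.9186),
    "DOCTOR_NAME": (0.9865, 1.0000, 0.9932),
    "USERNAME": (0.9689, 0.9431, 0.9559),
    "ID_NUMBER": (0.9724, 0.9898, 0.9811),
    "IP_ADDRESS": (0.9892, 0.9924, 0.9908),
    "URL": (0.9910, 0.9947, 0.9928),
}


def macro_average(rows):
    """Unweighted mean of each metric column across entity types."""
    n = len(rows)
    column_sums = [sum(column) for column in zip(*rows.values())]
    return tuple(total / n for total in column_sums)


macro_p, macro_r, macro_f1 = macro_average(PER_ENTITY)
print(f"macro P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

The micro averages cannot be recovered exactly from this table, since micro precision depends on per-entity prediction counts that are not reported here.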

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "genzeonplatform/cliniguard-ner"

# Option 1: Use the transformers pipeline
nlp = pipeline("token-classification", model=model_name, aggregation_strategy="simple")
text = "Patient John Smith, DOB 03/15/1960, was seen at Springfield General Hospital by Dr. Jane Doe."
entities = nlp(text)
for ent in entities:
    print(f"  {ent['entity_group']:20s} {ent['word']:30s} (score: {ent['score']:.3f})")

# Option 2: Manual inference
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

import torch
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions[0]):
    # id2label is keyed by int once the config is loaded, so no str() cast
    label = model.config.id2label[pred.item()]
    if label != "O":
        print(f"  {token:20s} -> {label}")
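
For de-identification, the detected spans still have to be replaced in the text. The sketch below masks each span with a bracketed label, assuming each entity is a dict with `entity_group`, `start`, and `end` keys, which is the shape the aggregated token-classification pipeline returns when the tokenizer provides character offsets. The function `mask_entities`, the helper `span`, and the hand-written entity list are illustrative stand-ins, not model output.

```python
def mask_entities(text, entities):
    """Replace each detected character span with a [LABEL] placeholder."""
    masked = text
    # Go right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        masked = masked[:ent["start"]] + placeholder + masked[ent["end"]:]
    return masked


def span(text, phrase, label):
    """Build a pipeline-style entity dict for the first match of phrase."""
    start = text.index(phrase)
    return {"entity_group": label, "start": start, "end": start + len(phrase)}


text = "Patient John Smith, DOB 03/15/1960, was seen by Dr. Jane Doe."
# Hypothetical predictions for illustration; real model output may differ.
entities = [
    span(text, "John Smith", "PATIENT_NAME"),
    span(text, "03/15/1960", "DATE_OF_BIRTH"),
    span(text, "Jane Doe", "DOCTOR_NAME"),
]
masked_text = mask_entities(text, entities)
print(masked_text)
```

This sketch assumes the spans do not overlap. Overlapping or nested predictions would need to be merged before masking.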

Training Details

  • Developed by: Genzeon Platforms
  • Architecture: Domain-specialized BERT fine-tuned on clinical corpora
  • Training data: Genzeon Platforms' proprietary clinical NER dataset with diverse healthcare note formats
  • Epochs: 15 (with early stopping, patience=3)
  • Learning rate: 3e-5 (linear schedule with warmup)
  • Batch size: 16 (train) / 32 (eval)
  • Max sequence length: 512 tokens
  • Optimizer: AdamW (weight decay 0.01)
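
A linear schedule with warmup raises the learning rate from zero to its peak over the warmup steps and then decays it linearly to zero by the end of training. The sketch below shows that shape for the 3e-5 peak listed above. The warmup length and total step count are not stated in this card, so the values in the example are placeholders, and `linear_warmup_lr` is an illustrative name.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, base_lr=3e-5):
    """Learning rate at a given step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Placeholder values: 1000 total steps with 100 warmup steps (assumed).
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps=1000, warmup_steps=100))
```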

Limitations

  • English only: Currently optimized for English clinical text. Multilingual support is on the Genzeon Platforms roadmap.
  • Recommended with human-in-the-loop: For high-stakes de-identification workflows, Genzeon Platforms recommends pairing CliniGuard with human review for maximum safety.
  • Entity coverage: Covers 20 common PHI types aligned with HIPAA Safe Harbor de-identification. Rare or domain-specific identifiers may require custom fine-tuning -- contact Genzeon Platforms for enterprise support.
  • Context window: Limited to 512 tokens per input. Longer documents should be chunked with overlap for best results.
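
Chunking with overlap can be sketched as a sliding window over the token sequence. In the example below, `chunk_with_overlap` and its default sizes are assumptions for illustration: a window of 510 leaves room for the two special tokens BERT adds, and the overlap of 64 is an arbitrary starting point rather than a tuned value. Each chunk keeps its start offset so predictions can be mapped back to the original document.

```python
def chunk_with_overlap(tokens, max_len=510, overlap=64):
    """Split tokens into windows of at most max_len that share overlap tokens.

    Returns a list of (start_index, chunk) pairs.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and less than max_len")
    stride = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


tokens = [f"tok{i}" for i in range(1200)]  # stand-in for a tokenized note
chunks = chunk_with_overlap(tokens)
print([(start, len(chunk)) for start, chunk in chunks])
```

Tokens in the overlap regions are predicted twice. How to reconcile the two predictions (for example, keeping the one from the window where the token is farther from the edge) is a design choice this card does not specify.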

Related Genzeon Platforms Models

**CliniGuard Vitals NER** is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of vital signs, body measurements, and physiological parameters from clinical text.

About Genzeon Platforms

Genzeon Platforms is a healthcare technology company building the agentic AI decision infrastructure for healthcare. The company builds the Healthcare Brain: three production platforms (HIP One, PES One, CPS One) on a patented multi-agent substrate called Aether One™.

Production deployment. Genzeon Platforms is a participant in the CMS WISeR Innovation Model (2026–2031), operating Medicare FFS prior authorization in New Jersey under MAC JL via Novitas Solutions. Live since January 1, 2026. Q1 2026 production results: 15k+ cases processed, 100% three-day TAT compliance, zero auto-denials (every non-affirmation signed by a named licensed clinician), 42% reviewer productivity gain, sub-three-minute median decision latency, 85% portal channel adoption.

Scale. 50+ payer and provider clients across Genzeon Platforms. 1M+ Medicare FFS members served under WISeR.

Patent portfolio. 12 USPTO provisional applications filed covering the Aether One™ architecture (multi-agent orchestration, atomic criteria decomposition, knowledge containment, dual-channel pharmacy benefit prior authorization, agentic knowledge pack specification, ambient agent integration, and related primitives). ~346 claims locked at provisional priority dates. USPTO portfolio anchor #226167.

Compliance posture. SOC 2 Type II, HIPAA. Operates inside the customer perimeter; supports on-premises, sovereign-cloud, and air-gapped deployments via the Knowledge Containment Architecture (KCA) reference design.

Partnerships. 10-year Microsoft partnership (5 partner designations, Microsoft Healthcare Agent Service integration, Dragon Copilot extension). UiPath Platinum (Top 3 HLS). Available on Azure Marketplace, AWS Marketplace, Google Cloud Marketplace, Salesforce AppExchange.

Open specifications. Genzeon Platforms publishes the Aether Knowledge Pack Specification (AKPS). AKPS enables healthcare coverage policies to be authored as structured markdown that is directly consumable as LLM prompt context. See github.com/genzeon/aether-akps.

Model policy. Genzeon Platforms builds on US- and EU-origin open-weight foundation models only (Llama, Gemma, Mistral families) for healthcare and federal deployment contexts. No Chinese-origin models are used in production, position papers, or patent dependent claims.

Headquarters. Exton, Pennsylvania, USA. Genzeon Platforms is a Genzeon company.

Where to find more

| Resource | Link |
|---|---|
| Company website | https://genzeon.one |
| Healthcare Brain overview | https://genzeon.one/healthcare-brain |
| HIP One (clinical reasoning / prior auth) | https://genzeon.one/hip-one |
| PES One (patient & member engagement) | https://genzeon.one/pes-one |
| CPS One (AI governance & compliance) | https://genzeon.one/cps-one |
| Aether One™ architecture | https://genzeon.one/aether-one |
| Patents | https://genzeon.one/patents |
| WISeR production deployment | https://genzeon.one/wiser |
| AKPS open spec | https://github.com/genzeon/aether-akps |
| Security & trust | https://genzeon.one/security |
| LinkedIn | https://www.linkedin.com/company/117124252 |
| Contact | https://genzeon.one/contact |

Citation

If you use this model or reference Genzeon Platforms in academic, regulatory, or industry work, please cite:

> Genzeon Platforms (2026).

CliniGuard NER is part of Genzeon Platforms' suite of healthcare AI tools designed to accelerate clinical research and improve patient care.

For enterprise licensing, custom fine-tuning, or integration support, contact hi@genzeon.one.
