episodic-ingestion-compiler ModernBERT-large multi-head (2000 steps)

Fine-tuned from answerdotai/ModernBERT-large for the ingestion-model-spec-v1 task in the episodic-rs project: given a blurb of text, emit a list of structured claim candidates conforming to the spec-v1 wire format (closed 21-predicate allowlist, 13-subject-type allowlist, predicate-specific object_value, character-offset source_spans, calibrated confidence, and first-class abstention).

This checkpoint is the backbone selected for production in the bakeoff conducted on 2026-05-09. See docs/bakeoff-decision-2026-05-09.md in the project repo for the full rationale.

Architecture

The encoder's [CLS] hidden state (the first token) feeds four linear heads:

Head | Type (layer → loss) | Output dim | Purpose
--- | --- | --- | ---
claim_present | Linear → BCE | 1 | P(blurb has ≥1 claim); drives abstention
predicate | Linear → CE | 21 | Argmax over the spec-v1 predicate allowlist
subject_type | Linear → CE | 13 | Argmax over the spec-v1 subject_type allowlist
confidence | Linear → sigmoid → MSE | 1 | Calibrated probability
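The loss column implies a multi-task objective. The sketch below shows one way the four per-head losses could be combined for a single blurb, in plain Python so the arithmetic is easy to check. The equal head weights, the choice to supervise predicate and subject_type only on blurbs that carry a claim, and the names multi_head_loss, has_claim and confidence_target are assumptions for illustration, not the training script's actual code.

```python
import math
from typing import Sequence, Tuple


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit, in the numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, computed via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def multi_head_loss(
    claim_logit: float,
    predicate_logits: Sequence[float],
    subject_type_logits: Sequence[float],
    confidence_pred: float,
    has_claim: bool,
    predicate_id: int,
    subject_type_id: int,
    confidence_target: float,
    weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),  # hypothetical
) -> float:
    w_claim, w_pred, w_subj, w_conf = weights
    # claim_present head: BCE on the raw logit (this is what drives abstention).
    loss = w_claim * bce_with_logits(claim_logit, 1.0 if has_claim else 0.0)
    # confidence head: squared error between the sigmoid output and its target.
    loss += w_conf * (confidence_pred - confidence_target) ** 2
    if has_claim:
        # Assumption: the classification heads have no label on claim-free
        # blurbs, so their cross-entropy terms are only added for positives.
        loss += w_pred * cross_entropy(predicate_logits, predicate_id)
        loss += w_subj * cross_entropy(subject_type_logits, subject_type_id)
    return loss
```

With all-zero logits and a perfect confidence prediction, a blurb with a claim costs ln 2 + ln 21 + ln 13 (chance-level loss on every classification head), while a claim-free blurb costs only ln 2.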

The checkpoint is not directly loadable via AutoModel.from_pretrained. Use the IngestionEncoder wrapper defined in scripts/train_ingestion_encoder.py of the source repo (reproduced under Loading below).

Training

Setting | Value
--- | ---
Training data | 94,894 labeled blurbs from 6,099 deduplicated agent sessions
Splits | Session-stratified 70/10/10/10 train/cal/val/test
Steps | 2,000 at batch size 16, learning rate 3e-5 (linear schedule, implicit 80-step warmup)
Precision | bf16 autocast
Elapsed | 832 s on an RTX 5090 (24 GiB)
Peak VRAM | 16.61 GiB
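Two mechanical pieces of this setup can be sketched under stated assumptions. The split function reads "session-stratified" as "every blurb from a session lands in the same split", so no session leaks between train and evaluation data; the SHA-256 bucketing and the function names are hypothetical. The schedule reproduces the shape of transformers' get_linear_schedule_with_warmup (linear warmup, then linear decay to zero), which the training script may or may not actually call.

```python
import hashlib

# Split names and percentages from the card; hashing by session ID is an assumption.
SPLITS = (("train", 70), ("cal", 10), ("val", 10), ("test", 10))


def split_for_session(session_id: str) -> str:
    """Deterministically assign a whole session to one split."""
    bucket = int(hashlib.sha256(session_id.encode("utf-8")).hexdigest(), 16) % 100
    upper = 0
    for name, pct in SPLITS:
        upper += pct
        if bucket < upper:
            return name
    raise ValueError("split percentages must sum to 100")


def linear_lr(step: int, peak_lr: float = 3e-5, warmup: int = 80, total: int = 2000) -> float:
    """Linear warmup from 0 to peak_lr over `warmup` steps, then linear decay
    to 0 at `total` steps."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))
```

At batch size 16, 2,000 steps see 32,000 examples. If the train split holds roughly 70% of the 94,894 blurbs (about 66,400), that is roughly 0.48 epochs, i.e. about half a pass over the training data.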

Eval metrics (val split, 13,179 blurbs)

Metric | Value
--- | ---
claim_present F1 | 0.930
claim_present precision | 0.955
claim_present recall | 0.906
predicate accuracy | 0.954
subject_type accuracy | 0.994
false_emission_rate | 0.012
confidence MAE | 0.028
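The card does not define the claim_present metrics, so the sketch below shows one plausible reading. Its definition of false_emission_rate (claims emitted on blurbs with no gold claim, divided by all claim-free blurbs) is a hypothesis; it is presumably not 1 − precision, which would give 0.045 rather than 0.012. The function names are hypothetical. The f1_from helper checks that the reported F1 is consistent with the reported precision and recall.

```python
from typing import Dict, Sequence


def claim_present_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Binary metrics for the claim_present decision over a set of blurbs.

    y_true[i]: blurb i carries at least one gold claim.
    y_pred[i]: the model emitted a claim for blurb i (did not abstain).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Hypothetical definition: false-positive rate on the claim-free blurbs.
    false_emission_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "false_emission_rate": false_emission_rate}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

round(f1_from(0.955, 0.906), 3) evaluates to 0.93, matching the F1 of 0.930 in the table.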

Loading

The checkpoint, best.pt, is a dict written with torch.save, containing the keys state_dict, backbone, args, step, and eval. Load it with:

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer


class IngestionEncoder(nn.Module):
    """ModernBERT backbone plus four linear heads on the [CLS] hidden state."""

    def __init__(self, backbone_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        h = self.backbone.config.hidden_size
        self.dropout = nn.Dropout(0.1)  # no-op once model.eval() is called
        self.head_claim_present = nn.Linear(h, 1)  # logit for P(blurb has >=1 claim)
        self.head_predicate = nn.Linear(h, 21)  # spec-v1 predicate allowlist
        self.head_subject_type = nn.Linear(h, 13)  # spec-v1 subject_type allowlist
        self.head_confidence = nn.Linear(h, 1)  # squashed by a sigmoid in forward()

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        pooled = self.dropout(out.last_hidden_state[:, 0])  # first ([CLS]) token
        return {
            "claim_logit": self.head_claim_present(pooled).squeeze(-1),
            "predicate_logits": self.head_predicate(pooled),
            "subject_type_logits": self.head_subject_type(pooled),
            "confidence_pred": torch.sigmoid(self.head_confidence(pooled).squeeze(-1)),
        }


# weights_only=False lets torch.load unpickle the non-tensor entries (e.g. args).
# Unpickling can execute arbitrary code, so only load checkpoints you trust.
ckpt = torch.load(
    hf_hub_download("Avifenesh/episodic-ingestion-compiler-modernbert-large-mh-2000", "best.pt"),
    map_location="cpu", weights_only=False,
)
model = IngestionEncoder(ckpt["backbone"])  # backbone: Hub model ID of the base encoder
model.load_state_dict(ckpt["state_dict"])  # overwrite backbone and head weights
model.eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt["backbone"])

Allowlists (closed vocabularies enforced by the spec)

Predicates (21): validated_by, had_outcome, failed_because, worked_because, decided, blocked_by, has_next_action, has_status, has_goal, touched_file, ran_command, logged_event, has_constraint, has_open_question, has_quality_finding, reverted_file, deleted_file, created_file, committed, deployed, incident_observed.

Subject types (13): objective, command, file, pr, incident, policy, person, repo, team, service, document, ticket, thread.
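Decoding the heads back into these closed vocabularies amounts to thresholding P(claim) and taking an argmax per classification head. This sketch assumes the head indices follow the order in which the allowlists are listed above and that a 0.5 threshold on P(claim) is the abstention rule; the card confirms neither. decode_heads is a hypothetical helper name.

```python
import math
from typing import Optional, Sequence

# Order as listed in the card; assumed (not confirmed) to match the head indices.
PREDICATES = (
    "validated_by", "had_outcome", "failed_because", "worked_because", "decided",
    "blocked_by", "has_next_action", "has_status", "has_goal", "touched_file",
    "ran_command", "logged_event", "has_constraint", "has_open_question",
    "has_quality_finding", "reverted_file", "deleted_file", "created_file",
    "committed", "deployed", "incident_observed",
)
SUBJECT_TYPES = (
    "objective", "command", "file", "pr", "incident", "policy", "person",
    "repo", "team", "service", "document", "ticket", "thread",
)
assert len(PREDICATES) == 21 and len(SUBJECT_TYPES) == 13


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def decode_heads(
    claim_logit: float,
    predicate_logits: Sequence[float],
    subject_type_logits: Sequence[float],
    confidence_pred: float,
    threshold: float = 0.5,  # hypothetical operating point
) -> Optional[dict]:
    """Map raw head outputs for one blurb to allowlist labels, or None to abstain."""
    if sigmoid(claim_logit) < threshold:
        return None  # abstain: no claim candidate for this blurb
    return {
        "predicate": PREDICATES[argmax(predicate_logits)],
        "subject_type": SUBJECT_TYPES[argmax(subject_type_logits)],
        "confidence": float(confidence_pred),
    }
```

Because labels can only be looked up in the two tuples, every decoded candidate stays inside the closed vocabularies by construction. For a single blurb run through the loading snippet, the inputs would be out["claim_logit"][0].item(), out["predicate_logits"][0].tolist(), out["subject_type_logits"][0].tolist() and out["confidence_pred"][0].item(), where out is the dict returned by model(input_ids=..., attention_mask=...).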

Wrapper responsibility (not done by this model)

This model emits only the information-bearing fields (the claim/abstain decision, predicate, subject_type, and confidence). A deterministic wrapper fills in the literal fields:

  • status = "candidate" (always)
  • source_authority = "model_draft" (always)
  • extraction_method = "ingestion_model_v1" (always)
  • source_memory_ids from the caller's scope
  • class_hint derived from predicate
  • qualifiers = {} (default)
  • source_spans: TBD; the current encoder emits the whole blurb as the single span, and a span-boundary head is planned
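Because the wrapper is deterministic, it can be sketched directly from the list above. wrap_candidates and class_hint_for are hypothetical names: the card says class_hint is derived from the predicate but does not give the mapping, so the caller passes it in. The {"start", "end"} span shape is likewise a placeholder for whatever spec-v1 actually prescribes, and abstention is assumed to map to an empty candidate list.

```python
from typing import Callable, List, Optional, Sequence


def wrap_candidates(
    decoded: Optional[dict],
    blurb: str,
    source_memory_ids: Sequence[str],
    class_hint_for: Callable[[str], str],  # hypothetical predicate -> class_hint mapping
) -> List[dict]:
    """Fill the literal spec-v1 fields around the model's output for one blurb.

    `decoded` holds predicate / subject_type / confidence, or None if the model
    abstained.
    """
    if decoded is None:
        return []  # abstention: no candidates for this blurb
    candidate = dict(decoded)
    candidate.update(
        status="candidate",
        source_authority="model_draft",
        extraction_method="ingestion_model_v1",
        source_memory_ids=list(source_memory_ids),
        class_hint=class_hint_for(decoded["predicate"]),
        qualifiers={},
        # Current behaviour: the whole blurb is the single span.
        # The {"start", "end"} shape is a hypothetical stand-in.
        source_spans=[{"start": 0, "end": len(blurb)}],
    )
    return [candidate]
```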

Bakeoff context

Competed against: ModernBERT-base-MH, DeBERTa-v3-large-MH, NuExtract-2.0-2B (QLoRA SFT), NuExtract-2.0-8B (QLoRA SFT), Qwen3.5-4B (QLoRA SFT). ModernBERT-large-MH won on every gate.
