episodic-ingestion-compiler ModernBERT-large multi-head (2000 steps)

Fine-tuned from answerdotai/ModernBERT-large for the ingestion-model-spec-v1 task in the episodic-rs project. The task: given a blurb of text, emit a list of structured claim candidates conforming to the spec-v1 wire format (closed 21-predicate allowlist, 13-subject-type allowlist, predicate-specific object_value, character-offset source_spans, calibrated confidence, and first-class abstention).

This checkpoint is the backbone selected for production in the bakeoff conducted on 2026-05-09. See the project's docs/bakeoff-decision-2026-05-09.md for the full rationale.

Architecture

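The encoder takes the final hidden state at the [CLS] position as the pooled representation, applies dropout, and feeds it to four linear heads: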

Head	Type	Output dim	Purpose
`claim_present`	Linear → BCE	1	P(blurb has ≥1 claim); drives abstention
`predicate`	Linear → CE	21	Argmax over spec-v1 predicate allowlist
`subject_type`	Linear → CE	13	Argmax over spec-v1 subject_type allowlist
`confidence`	Linear → sigmoid → MSE	1	Calibrated probability

This checkpoint cannot be loaded directly via AutoModel.from_pretrained. Use the IngestionEncoder wrapper in scripts/train_ingestion_encoder.py of the source repo.

Training


Training data	94,894 labeled blurbs from 6,099 deduplicated agent sessions
Splits	session-stratified 70/10/10/10 train/cal/val/test
Steps	2000 steps at batch size 16, learning rate 3e-5 (linear schedule, implicit 80-step warmup)
Precision	bf16 autocast
Elapsed	832s on RTX 5090 (24 GiB)
Peak VRAM	16.61 GiB
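
The schedule row can be made concrete with a short sketch. It assumes the common shape of linear warmup from zero to the peak rate followed by linear decay to zero; the table only states a linear schedule with an 80-step warmup, so the exact shape and the function name below are assumptions, not the training script's code.

```python
def linear_warmup_decay_lr(
    step: int,
    peak_lr: float = 3e-5,
    warmup_steps: int = 80,
    total_steps: int = 2000,
) -> float:
    """Learning rate at `step` under an assumed warmup-then-linear-decay shape."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup window.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for s in (0, 40, 80, 1040, 2000):
    print(s, linear_warmup_decay_lr(s))
```

Under these assumptions the rate peaks at step 80 (4% of the 2000 steps) and reaches half the peak at step 1040.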

Eval metrics (val split, 13,179 blurbs)

Metric	Value
claim_present F1	0.930
claim_present precision	0.955
claim_present recall	0.906
predicate accuracy	0.954
subject_type accuracy	0.994
false_emission_rate	0.012
confidence MAE	0.028
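
As a consistency check, the reported claim_present F1 follows from the precision and recall rows via the standard harmonic-mean formula. The helper name is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.955, 0.906)
print(f"{f1:.3f}")  # 0.930
```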

Loading

The checkpoint is a dict saved with torch.save, with keys state_dict, backbone, args, step, and eval. Load it with:

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer


class IngestionEncoder(nn.Module):
    """Backbone encoder with four linear heads on the [CLS] position."""

    def __init__(self, backbone_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        h = self.backbone.config.hidden_size
        self.dropout = nn.Dropout(0.1)
        self.head_claim_present = nn.Linear(h, 1)
        self.head_predicate = nn.Linear(h, 21)
        self.head_subject_type = nn.Linear(h, 13)
        self.head_confidence = nn.Linear(h, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Pool by taking the hidden state of the first token ([CLS]).
        pooled = self.dropout(out.last_hidden_state[:, 0])
        return {
            "claim_logit": self.head_claim_present(pooled).squeeze(-1),
            "predicate_logits": self.head_predicate(pooled),
            "subject_type_logits": self.head_subject_type(pooled),
            "confidence_pred": torch.sigmoid(self.head_confidence(pooled).squeeze(-1)),
        }


# weights_only=False unpickles arbitrary Python objects, so only load
# checkpoints from a source you trust.
ckpt = torch.load(
    hf_hub_download("Avifenesh/episodic-ingestion-compiler-modernbert-large-mh-2000", "best.pt"),
    map_location="cpu", weights_only=False,
)
model = IngestionEncoder(ckpt["backbone"])
model.load_state_dict(ckpt["state_dict"])
model.eval()  # disables dropout for inference
tokenizer = AutoTokenizer.from_pretrained(ckpt["backbone"])
```
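
The forward pass returns raw head outputs, so a caller still has to turn them into a decision. The sketch below is one plausible decoding, not the project's own: it abstains when the sigmoid of claim_logit falls below a threshold (0.5 is an assumed default, not a documented value) and otherwise takes the argmax of each classification head. The function name and the label lists passed in are hypothetical; the mapping from head index to label must come from the training code.

```python
import math
from typing import Optional, Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_heads(
    claim_logit: float,
    predicate_logits: Sequence[float],
    subject_type_logits: Sequence[float],
    confidence_pred: float,
    predicates: Sequence[str],
    subject_types: Sequence[str],
    threshold: float = 0.5,  # assumed abstention threshold
) -> Optional[dict]:
    """Return a decoded candidate, or None to abstain."""
    p_claim = _sigmoid(claim_logit)
    if p_claim < threshold:
        return None  # abstain: the blurb is judged to contain no claim
    pred_idx = max(range(len(predicate_logits)), key=lambda i: predicate_logits[i])
    subj_idx = max(range(len(subject_type_logits)), key=lambda i: subject_type_logits[i])
    return {
        "predicate": predicates[pred_idx],
        "subject_type": subject_types[subj_idx],
        "confidence": confidence_pred,  # already passed through sigmoid in forward()
        "claim_probability": p_claim,
    }
```

With the model loaded as above and run under torch.no_grad(), the scalar and list arguments can be obtained from the first batch element of each output tensor via .item() and .tolist().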

Allowlists (closed vocabularies enforced by the spec)

Predicates (21): validated_by, had_outcome, failed_because, worked_because, decided, blocked_by, has_next_action, has_status, has_goal, touched_file, ran_command, logged_event, has_constraint, has_open_question, has_quality_finding, reverted_file, deleted_file, created_file, committed, deployed, incident_observed.

Subject types (13): objective, command, file, pr, incident, policy, person, repo, team, service, document, ticket, thread.
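
Because both vocabularies are closed, a consumer can reject any candidate whose labels fall outside them. A minimal sketch follows; the names are illustrative, and the tuple order is copied from the lists above without any confirmation that it matches the label indices used by the heads.

```python
PREDICATES = (
    "validated_by", "had_outcome", "failed_because", "worked_because",
    "decided", "blocked_by", "has_next_action", "has_status", "has_goal",
    "touched_file", "ran_command", "logged_event", "has_constraint",
    "has_open_question", "has_quality_finding", "reverted_file",
    "deleted_file", "created_file", "committed", "deployed",
    "incident_observed",
)

SUBJECT_TYPES = (
    "objective", "command", "file", "pr", "incident", "policy", "person",
    "repo", "team", "service", "document", "ticket", "thread",
)


def is_allowed(predicate: str, subject_type: str) -> bool:
    """True only if both labels belong to the closed vocabularies."""
    return predicate in PREDICATES and subject_type in SUBJECT_TYPES


print(is_allowed("decided", "file"))  # True
print(is_allowed("merged", "file"))   # False: "merged" is not in the allowlist
```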

Wrapper responsibility (not done by this model)

This model emits only the information-bearing subset of each claim's fields. A deterministic wrapper fills in the literal fields:

status = "candidate" (always)
source_authority = "model_draft" (always)
extraction_method = "ingestion_model_v1" (always)
source_memory_ids from the caller's scope
class_hint derived from predicate
qualifiers = {} default
source_spans: to be determined. The current encoder treats the whole blurb as the single span; a span-boundary head is planned
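
The list above can be sketched as a small function. This is an illustration under assumptions, not the project's wrapper: the function name, the `class_hint_for` mapping (the predicate-to-class_hint rule is not given here), and the `{"start", "end"}` span shape are all hypothetical.

```python
def wrap_candidate(
    model_fields: dict,
    blurb: str,
    source_memory_ids: list[str],
    class_hint_for: dict[str, str],  # hypothetical predicate -> class_hint table
) -> dict:
    """Add the literal, deterministic fields to the model's output fields."""
    claim = dict(model_fields)  # copy so the caller's dict is not mutated
    claim.update({
        "status": "candidate",
        "source_authority": "model_draft",
        "extraction_method": "ingestion_model_v1",
        "source_memory_ids": list(source_memory_ids),
        "class_hint": class_hint_for.get(model_fields["predicate"]),
        "qualifiers": {},
        # Whole-blurb span as character offsets; the span shape is assumed.
        "source_spans": [{"start": 0, "end": len(blurb)}],
    })
    return claim


example = wrap_candidate(
    {"predicate": "ran_command", "subject_type": "command", "confidence": 0.9},
    blurb="ran cargo test",
    source_memory_ids=["mem-1"],           # placeholder id
    class_hint_for={"ran_command": "event"},  # placeholder mapping
)
print(example["status"], example["source_spans"])
```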

Bakeoff context

This checkpoint competed against ModernBERT-base-MH, DeBERTa-v3-large-MH, NuExtract-2.0-2B (QLoRA SFT), NuExtract-2.0-8B (QLoRA SFT), and Qwen3.5-4B (QLoRA SFT). ModernBERT-large-MH won on every gate.
