---
license: apache-2.0
library_name: transformers
base_model: answerdotai/ModernBERT-large
tags:
  - modernbert
  - episodic-ingestion-compiler
  - structured-extraction
  - ingestion-model-spec-v1
  - multi-head-classifier
language:
  - en
---

# episodic-ingestion-compiler ModernBERT-large multi-head (2000 steps)

Fine-tuned from `answerdotai/ModernBERT-large` for the
**ingestion-model-spec-v1** task in the `episodic-rs` project:
given a blurb of text, emit a list of structured claim candidates
conforming to the spec-v1 wire format (closed 21-predicate allowlist,
13-subject-type allowlist, predicate-specific object_value, character
offset source_spans, calibrated confidence, first-class abstention).

This checkpoint is the **production-selected backbone** of the
bakeoff conducted 2026-05-09. See the project's
`docs/bakeoff-decision-2026-05-09.md` for full rationale.

## Architecture

The encoder pools [CLS] into four heads:

| Head | Type | Output dim | Purpose |
|---|---|---:|---|
| `claim_present` | Linear → BCE | 1 | P(blurb has ≥1 claim); drives abstention |
| `predicate` | Linear → CE | 21 | Argmax over spec-v1 predicate allowlist |
| `subject_type` | Linear → CE | 13 | Argmax over spec-v1 subject_type allowlist |
| `confidence` | Linear → sigmoid → MSE | 1 | Calibrated probability |

Not directly loadable via `AutoModel.from_pretrained`. Use the
`IngestionEncoder` wrapper in
`scripts/train_ingestion_encoder.py` of the source repo.

## Training

| | |
|---|---|
| Training data | 94,894 labeled blurbs from 6,099 deduplicated agent sessions |
| Splits | session-stratified 70/10/10/10 train/cal/val/test |
| Steps | 2000 @ batch 16 × lr 3e-5 (linear schedule, 80 step warmup implicit) |
| Precision | bf16 autocast |
| Elapsed | 832s on RTX 5090 (24 GiB) |
| Peak VRAM | 16.61 GiB |

## Eval metrics (val split, 13,179 blurbs)

| Metric | Value |
|---|---:|
| claim_present F1 | **0.930** |
| claim_present precision | 0.955 |
| claim_present recall | 0.906 |
| predicate accuracy | 0.954 |
| subject_type accuracy | 0.994 |
| false_emission_rate | 0.012 |
| confidence MAE | 0.028 |

## Loading

The checkpoint is a `torch.save` of `{state_dict, backbone, args, step, eval}`.
Load via:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
import torch.nn as nn

class IngestionEncoder(nn.Module):
    def __init__(self, backbone_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        h = self.backbone.config.hidden_size
        self.dropout = nn.Dropout(0.1)
        self.head_claim_present = nn.Linear(h, 1)
        self.head_predicate = nn.Linear(h, 21)
        self.head_subject_type = nn.Linear(h, 13)
        self.head_confidence = nn.Linear(h, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        pooled = self.dropout(out.last_hidden_state[:, 0])
        return {
            "claim_logit": self.head_claim_present(pooled).squeeze(-1),
            "predicate_logits": self.head_predicate(pooled),
            "subject_type_logits": self.head_subject_type(pooled),
            "confidence_pred": torch.sigmoid(self.head_confidence(pooled).squeeze(-1)),
        }

ckpt = torch.load(
    hf_hub_download("Avifenesh/episodic-ingestion-compiler-modernbert-large-mh-2000", "best.pt"),
    map_location="cpu", weights_only=False,
)
model = IngestionEncoder(ckpt["backbone"])
model.load_state_dict(ckpt["state_dict"])
model.eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt["backbone"])
```

## Allowlists (closed vocabularies enforced by the spec)

**Predicates (21)**: `validated_by`, `had_outcome`, `failed_because`,
`worked_because`, `decided`, `blocked_by`, `has_next_action`,
`has_status`, `has_goal`, `touched_file`, `ran_command`, `logged_event`,
`has_constraint`, `has_open_question`, `has_quality_finding`,
`reverted_file`, `deleted_file`, `created_file`, `committed`,
`deployed`, `incident_observed`.

**Subject types (13)**: `objective`, `command`, `file`, `pr`,
`incident`, `policy`, `person`, `repo`, `team`, `service`, `document`,
`ticket`, `thread`.

## Wrapper responsibility (not done by this model)

This model emits the information-bearing subset. A deterministic
wrapper fills the literal fields:
- `status = "candidate"` (always)
- `source_authority = "model_draft"` (always)
- `extraction_method = "ingestion_model_v1"` (always)
- `source_memory_ids` from the caller's scope
- `class_hint` derived from predicate
- `qualifiers = {}` default
- `source_spans` TBD — current encoder emits whole-blurb as the single
  span; a span-boundary head is planned

## Bakeoff context

Competed against: ModernBERT-base-MH, DeBERTa-v3-large-MH,
NuExtract-2.0-2B (QLoRA SFT), NuExtract-2.0-8B (QLoRA SFT),
Qwen3.5-4B (QLoRA SFT). ModernBERT-large-MH won on every gate.