---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-large
pipeline_tag: text-classification
language:
  - en
datasets:
  - dipta007/decomposeRL-tiny-judge
tags:
  - fact-verification
  - claim-verification
  - reward-model
  - llm-as-a-judge
  - distillation
  - modernbert
  - text-classification
  - decomposition
  - atomicity
---

# DecomposeRL Tiny-Judge: Atomicity (verifiable) Judge

<p align="center">
  <a href="https://arxiv.org/abs/2605.27858v1">
    <img src="https://img.shields.io/badge/%F0%9F%93%84_Paper-arXiv-b12a00?style=for-the-badge&labelColor=ffb300" alt="Paper">
  </a>
</p>

[![Paper](https://img.shields.io/badge/arXiv-2605.27858-red)](https://arxiv.org/abs/2605.27858v1)
[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://dipta007.github.io/DecomposeRL/)
[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge)
[![Collection](https://img.shields.io/badge/HuggingFace-Collection-blueviolet)](https://huggingface.co/collections/dipta007/decomposerl)
[![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/dipta007/DecomposeRL)

A ModernBERT-large classifier that scores whether a generated sub-question is **verifiable** — one of the five binary checks that make up the **atomicity** sub-signal of DecomposeRL's joint multiplicative quality reward.

It is part of the **DecomposeRL tiny-judge stack** — eight task-specific LoRA classifier heads on a shared `ModernBERT-large` backbone that *distill* a `Qwen3-32B` LLM judge into small, fast reward models. Swapping the 32B judge for this ~400M-parameter stack cuts GRPO judge compute by ~80% (240 → 48 GPU-hours) while retaining ~99% of in-domain accuracy.

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | `ModernBertForSequenceClassification` (sequence classification) |
| **Base Model** | `answerdotai/ModernBERT-large` (~400M params) |
| **Training** | LoRA (r=64, α=128), merged into the base before release |
| **Labels** | 2-way: `no` / `yes` |
| **Distilled from** | `Qwen/Qwen3-32B` judge labels |
| **Dataset / config** | [`dipta007/decomposeRL-tiny-judge`](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge) · `atomicity_verifiable` |
| **Train split** | `train_balanced` (class-balanced); best checkpoint selected by macro-F1 on the `validation` split |
| **Language** | English |

## What it judges

This head is one of **five binary atomicity checks** (`is_question`, `single_focus`, `no_conjunctions`, `verifiable`, `grounded`). At reward time the five yes/no predictions are averaged into the per-question **atomicity** score `R_atom`, which is then multiplied with the answerability (`R_ans`) and answer-correctness (`R_corr`) sub-signals to form the joint multiplicative quality reward (Eq. 7 in the paper).
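The aggregation described above can be sketched as follows. This is a minimal illustration, not the project's reward code: the function names `atomicity_score` and `quality_reward` are hypothetical, the `R_ans` and `R_corr` values are placeholders, and Eq. 7 in the paper may include details that are not reproduced here.

```python
from typing import Mapping

# The five binary atomicity checks named in this card.
ATOMICITY_CHECKS = (
    "is_question",
    "single_focus",
    "no_conjunctions",
    "verifiable",
    "grounded",
)


def atomicity_score(checks: Mapping[str, int]) -> float:
    """Average the five yes/no (1/0) predictions into R_atom."""
    missing = [name for name in ATOMICITY_CHECKS if name not in checks]
    if missing:
        raise ValueError(f"missing atomicity checks: {missing}")
    return sum(checks[name] for name in ATOMICITY_CHECKS) / len(ATOMICITY_CHECKS)


def quality_reward(r_atom: float, r_ans: float, r_corr: float) -> float:
    """Multiply the three sub-signals (hypothetical helper, simplified form)."""
    return r_atom * r_ans * r_corr


# Example: the question passes every check except `no_conjunctions`.
predictions = {
    "is_question": 1,
    "single_focus": 1,
    "no_conjunctions": 0,
    "verifiable": 1,
    "grounded": 1,
}
r_atom = atomicity_score(predictions)
reward = quality_reward(r_atom, r_ans=1.0, r_corr=0.5)  # placeholder sub-signals
print(r_atom, reward)
```

Because the sub-signals are multiplied, a zero in any one of them zeroes the whole quality reward, whereas a single failed atomicity check only lowers `R_atom` by one fifth.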

### Input format

The input is the claim followed by the candidate sub-question, each on its own line:

```
Claim: {claim}
Question: {question}
```
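A small helper can keep inputs consistent with this template. The sketch below is illustrative only: `format_judge_input` is a hypothetical name, and collapsing internal whitespace is an assumption made here so that each field stays on one line, not something the card specifies.

```python
def format_judge_input(claim: str, question: str) -> str:
    """Build the two-line 'Claim / Question' string shown in the template."""
    # Assumption: collapse runs of whitespace (including stray newlines)
    # so the claim and the question each occupy exactly one line.
    claim = " ".join(claim.split())
    question = " ".join(question.split())
    return f"Claim: {claim}\nQuestion: {question}"


text = format_judge_input(
    "The cloth then undergoes dyeing.",
    "Is the cloth dyed after weaving?",
)
print(text)
```

The resulting string can be passed directly to the tokenizer in the Quickstart.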

### Label space

| Label | Name | Meaning |
|------:|------|---------|
| `0` | `no`  | the question is open-ended, vague, or has no checkable answer |
| `1` | `yes` | the question has a concrete verifiable answer (yes/no or a specific fact) |

## Quickstart

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "dipta007/atomicity-verifiable-judge-balanced"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = (
    'Claim: The cloth then undergoes dyeing, even in cases where the yarn was dyed before weaving.\n'
    'Question: Does the evidence show that even after the wear-dyed fabric, dyeing is necessary during the finishing processes?'
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    logits = model(**inputs).logits
pred = int(logits.argmax(-1))
print(pred, model.config.id2label[pred])
# expected: 1 -> yes
```

## Training Data

Trained on the `atomicity_verifiable` config of [`dipta007/decomposeRL-tiny-judge`](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge), whose labels are distilled from `Qwen3-32B` judge calls made during DecomposeRL reward computation. The model is fine-tuned with LoRA on the class-balanced `train_balanced` split, validated on the natural `validation` split, and the best checkpoint is chosen by macro-F1. LoRA adapters are merged into the backbone before release, so the model loads with a plain `from_pretrained` (no PEFT required).
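To illustrate why macro-F1 is a reasonable selection metric on a natural (possibly imbalanced) validation split, here is a minimal pure-Python sketch. The function `macro_f1` and the toy labels are illustrative assumptions, not the project's evaluation code.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes F1 = 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: a classifier that always answers "yes" (1) on skewed data.
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(accuracy, score)
```

In this toy case accuracy is 0.8 while macro-F1 is about 0.44, because the `no` class is never predicted correctly and each class counts equally in the average.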

## Role in DecomposeRL

DecomposeRL trains a claim-verification policy with GRPO over a seven-reward ensemble. Five of those rewards are scored by an LLM judge, which dominates training-time GPU cost. The tiny-judge stack replaces that 32B judge with eight small distilled heads so reward scoring runs on the same single GPU as training. See the [paper](https://arxiv.org/abs/2605.27858v1) (tiny-judge ablation) and the [DecomposeRL-7B model](https://huggingface.co/dipta007/decomposeRL-7b) for the full reward design.

## Intended Use

- **In-scope**: serving as a fast reward / scoring model inside the DecomposeRL training loop, or as a standalone classifier for the specific judgment above on claim-decomposition traces.
- **Out-of-scope**: general-purpose fact-checking, use on inputs that do not follow the input format above, or as a standalone end-to-end claim verifier (use [DecomposeRL-7B](https://huggingface.co/dipta007/decomposeRL-7b) for that).

## Citation

```bibtex
@article{dipta2025decomposerl,
  title={DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification},
  author={Shubhashis Roy Dipta and Ankur Padia and Francis Ferraro},
  year={2025},
  eprint={2605.27858},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.27858v1},
}
```

## License

Released under the Apache 2.0 License.