---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-large
pipeline_tag: text-classification
language:
- en
datasets:
- dipta007/decomposeRL-tiny-judge
tags:
- fact-verification
- claim-verification
- reward-model
- llm-as-a-judge
- distillation
- modernbert
- text-classification
- decomposition
- atomicity
---

# DecomposeRL Tiny-Judge: Atomicity (verifiable) Judge


[![Paper](https://img.shields.io/badge/arXiv-2605.27858-red)](https://arxiv.org/abs/2605.27858v1)
[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://dipta007.github.io/DecomposeRL/)
[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge)
[![Collection](https://img.shields.io/badge/HuggingFace-Collection-blueviolet)](https://huggingface.co/collections/dipta007/decomposerl)
[![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/dipta007/DecomposeRL)

A ModernBERT-large classifier that scores whether a generated sub-question is **verifiable**, one of the five binary checks that make up the **atomicity** sub-signal of DecomposeRL's joint multiplicative quality reward.

It is part of the **DecomposeRL tiny-judge stack**: eight task-specific LoRA classifier heads on a shared `ModernBERT-large` backbone that *distill* a `Qwen3-32B` LLM judge into small, fast reward models. Swapping the 32B judge for this ~400M-parameter stack cuts GRPO judge compute by ~80% (240 → 48 GPU-hours) while retaining ~99% of in-domain accuracy.

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | `ModernBertForSequenceClassification` (sequence classification) |
| **Base Model** | `answerdotai/ModernBERT-large` (~400M params) |
| **Training** | LoRA (r=64, α=128), merged into the base before release |
| **Labels** | 2-way: `no` / `yes` |
| **Distilled from** | `Qwen/Qwen3-32B` judge labels |
| **Dataset / config** | [`dipta007/decomposeRL-tiny-judge`](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge) · `atomicity_verifiable` |
| **Train split** | `train_balanced` (class-balanced); selected on macro-F1 |
| **Language** | English |

## What it judges

This head is one of **five binary atomicity checks** (`is_question`, `single_focus`, `no_conjunctions`, `verifiable`, `grounded`).
At reward time the five yes/no predictions are averaged into the per-question **atomicity** score `R_atom`, which is then multiplied with the answerability (`R_ans`) and answer-correctness (`R_corr`) sub-signals to form the joint multiplicative quality reward (Eq. 7 in the paper).

### Input format

The input is the claim followed by the candidate sub-question, each on its own line:

```
Claim: {claim}
Question: {question}
```

### Label space

| Label | Name | Meaning |
|------:|------|---------|
| `0` | `no` | the question is open-ended, vague, or has no checkable answer |
| `1` | `yes` | the question has a concrete verifiable answer (yes/no or a specific fact) |

## Quickstart

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "dipta007/atomicity-verifiable-judge-balanced"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Claim and question are separated by a real newline, matching the input format above.
text = (
    'Claim: The cloth then undergoes dyeing, even in cases where the yarn was dyed before weaving.\n'
    'Question: Does the evidence show that even after the wear-dyed fabric, dyeing is necessary during the finishing processes?'
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    logits = model(**inputs).logits

# Single example, so argmax over the label dimension yields one class id.
pred = int(logits.argmax(-1))
print(pred, model.config.id2label[pred])  # expected: 1 -> yes
```

## Training Data

Trained on the `atomicity_verifiable` config of [`dipta007/decomposeRL-tiny-judge`](https://huggingface.co/datasets/dipta007/decomposeRL-tiny-judge), whose labels are distilled from `Qwen3-32B` judge calls made during DecomposeRL reward computation. The model is fine-tuned with LoRA on the class-balanced `train_balanced` split and validated on the natural `validation` split; the best checkpoint is chosen by macro-F1 on that validation split. LoRA adapters are merged into the backbone before release, so the model loads with a plain `from_pretrained` (no PEFT required).
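
To illustrate how this head's prediction feeds the reward described under "What it judges", the sketch below averages the five yes/no atomicity predictions into `R_atom` and multiplies the result by `R_ans` and `R_corr`. It is a minimal sketch, not the paper's implementation: the helper names `atomicity_score` and `quality_reward`, the dictionary layout, and the assumption that `R_ans` and `R_corr` arrive as floats in [0, 1] are all hypothetical. Any aggregation across the questions of a trace is outside this sketch; refer to Eq. 7 in the paper for the exact definition.

```python
from statistics import fmean
from typing import Mapping

# The five binary atomicity checks named in this card.
ATOMICITY_CHECKS = (
    "is_question",
    "single_focus",
    "no_conjunctions",
    "verifiable",
    "grounded",
)


def atomicity_score(preds: Mapping[str, int]) -> float:
    """Average the five 0/1 predictions into R_atom (hypothetical helper)."""
    missing = [name for name in ATOMICITY_CHECKS if name not in preds]
    if missing:
        raise KeyError(f"missing atomicity checks: {missing}")
    return fmean(preds[name] for name in ATOMICITY_CHECKS)


def quality_reward(preds: Mapping[str, int], r_ans: float, r_corr: float) -> float:
    """Multiply R_atom by R_ans and R_corr (assumed here to be floats in [0, 1])."""
    return atomicity_score(preds) * r_ans * r_corr


# Example: one sub-question that fails only the `verifiable` check.
example = {
    "is_question": 1,
    "single_focus": 1,
    "no_conjunctions": 1,
    "verifiable": 0,
    "grounded": 1,
}
print(atomicity_score(example))           # 0.8
print(quality_reward(example, 1.0, 0.5))  # 0.4
```

Because the sub-signals are multiplied rather than summed, a zero in `R_ans` or `R_corr` zeroes the whole reward, whereas a single failed atomicity check only lowers `R_atom` by one fifth.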

## Role in DecomposeRL

DecomposeRL trains a claim-verification policy with GRPO over a seven-reward ensemble. Five of those rewards are scored by an LLM judge, which dominates training-time GPU cost. The tiny-judge stack replaces that 32B judge with eight small distilled heads so reward scoring runs on the same single GPU as training. See the [paper](https://arxiv.org/abs/2605.27858v1) (tiny-judge ablation) and the [DecomposeRL-7B model](https://huggingface.co/dipta007/decomposeRL-7b) for the full reward design.

## Intended Use

- **In-scope**: serving as a fast reward / scoring model inside the DecomposeRL training loop, or as a standalone classifier for the specific judgment above on claim-decomposition traces.
- **Out-of-scope**: general-purpose fact-checking, use on inputs that do not follow the input format above, or as a standalone end-to-end claim verifier (use [DecomposeRL-7B](https://huggingface.co/dipta007/decomposeRL-7b) for that).

## Citation

```bibtex
@article{dipta2025decomposerl,
  title={DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification},
  author={Shubhashis Roy Dipta and Ankur Padia and Francis Ferraro},
  year={2025},
  eprint={2605.27858},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.27858v1},
}
```

## License

Released under the Apache 2.0 License.