---
license: gemma
language:
  - en
library_name: transformers
base_model: google/gemma-4-E4B-it
base_model_relation: finetune
pipeline_tag: text-generation
tags:
  - judge
  - llm-as-judge
  - evaluation
  - social-bias
  - bbq
  - gemma
  - lora
  - sft
  - fine-tuned
datasets:
  - krishnakartik/gemma4-social-bias-judge-pairs
model-index:
  - name: gemma4-social-bias-judge-sft
    results:
      - task:
          type: text-classification
          name: Social-bias judge (A / B / TIE verdict)
        dataset:
          type: krishnakartik/gemma4-social-bias-judge-pairs
          name: Gemma 4 Social Bias Judge Pairs (eval holdout)
          config: eval_holdout
          split: train
        metrics:
          - type: cohen_kappa
            name: "Cohen's κ (in-distribution, 240 pairs)"
            value: 0.647
          - type: cohen_kappa
            name: "Cohen's κ (OOD religion, 60 pairs)"
            value: 0.695
          - type: cohen_kappa
            name: "Cohen's κ (tracked-vs-alternate)"
            value: 0.197
          - type: cohen_kappa
            name: "Cohen's κ (subtle-bias bucket)"
            value: 0.743
          - type: position_bias_rate
            name: "Position-bias rate (in-distribution; lower is better)"
            value: 0.084
          - type: self_consistency
            name: "Self-consistency rate (T=0.3)"
            value: 0.832
---

# Gemma 4 E4B — Social-Bias Judge (SFT only)

This is the **SFT-only checkpoint** from the [judge-from-scratch
project](https://github.com/krishnakartik1/judge-from-scratch). It is the
intermediate artifact before the DPO refinement pass that produced
[`krishnakartik/gemma4-social-bias-judge`](https://huggingface.co/krishnakartik/gemma4-social-bias-judge)
(the primary release).

**Use this checkpoint instead of the DPO version if your bias
categories are out-of-distribution relative to BBQ's training set.**
The DPO refinement narrows generalization by overfitting to the 10
in-distribution bias categories' specific patterns — fine when your
inputs match the training distribution, harmful when they don't.

For the full project narrative, eval methodology, training pipeline,
and limitations, **read the [primary model
card](https://huggingface.co/krishnakartik/gemma4-social-bias-judge)**.
This card focuses on what differs between the SFT-only and DPO
checkpoints.

---

## ⚠️ Important: Thinking Mode

This model was fine-tuned with **Gemma 4's native thinking mode
DISABLED**. Do **NOT** include `<|think|>` in the system prompt at
inference time — the model never saw that token during training and
will generate degraded, unparseable output. See the [primary model
card's thinking-mode
section](https://huggingface.co/krishnakartik/gemma4-social-bias-judge#%E2%9A%A0%EF%B8%8F-important-thinking-mode)
for the full explanation.

---

## Quick start

### Ollama

```bash
# IMPORTANT: thinking mode is disabled — do NOT add <|think|> to /system.
ollama run hf.co/krishnakartik/gemma4-social-bias-judge-gguf:Q8_0-sft
```

### Python (transformers)

```python
# Identical usage to the DPO checkpoint — only the model_id changes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krishnakartik/gemma4-social-bias-judge-sft"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
# ... see primary model card for the full inference snippet.
```

---

## When to choose this over the DPO checkpoint

| Use case | Recommended |
|---|---|
| Bias categories in BBQ's 10 trained set (age, disability, gender identity, nationality, physical appearance, race/ethnicity inc. intersectional, religion, sexual orientation, SES) | DPO (primary) |
| Bias categories outside the trained set (politics, ideology, novel demographic axes, intersectional categories not in training) | **This checkpoint (SFT)** |
| Tie-case detection (both responses clean) is critical | DPO — tie-κ jumps from −0.06 (SFT) to 0.36 (DPO) |
| Subtle bias discrimination on in-dist data | DPO — subtle-κ jumps from 0.74 (SFT) to 0.89 (DPO) |
| Tracked-vs-alternate (which specific stereotype is invoked) | This checkpoint (SFT-κ 0.20 vs DPO-κ 0.12) |
| Position-bias robustness on OOD | This checkpoint (SFT 11.7% vs DPO 16.7%) |

---

## Eval results (selected)

Same 300-pair holdout, same vLLM/bf16 backend as the [primary model
card's eval
table](https://huggingface.co/krishnakartik/gemma4-social-bias-judge#eval-results).

| Metric | Base | **SFT (this)** | DPO |
|---|---|---|---|
| Overall κ (in-dist) | 0.481 | 0.647 | 0.682 |
| **Overall κ (OOD religion)** | 0.542 | **0.695** | 0.643 |
| Tracked-vs-alternate κ | 0.145 | **0.197** | 0.119 |
| Subtle cases κ | 0.632 | 0.743 | 0.890 |
| Tie cases κ | 0.202 | −0.056 | 0.359 |
| Position-bias rate (OOD) | 21.7% | **11.7%** | 16.7% |
| Self-consistency (T=0.3) | 73.7% | 83.2% | 82.7% |

This checkpoint **wins on OOD κ, tracked-vs-alternate κ, and
OOD position-bias**. The DPO checkpoint wins on in-dist κ, subtle
cases, and tie cases — the metrics where the synth-hard-negatives
training shape was specifically designed to help.

The OOD-κ delta (+0.052 in this checkpoint's favor) is the load-bearing
reason this artifact exists. See the [primary model card's
OOD-regression
discussion](https://huggingface.co/krishnakartik/gemma4-social-bias-judge#%EF%B8%8F-the-ood-regression---read-this-before-deploying)
for the full analysis.

---

## Training summary

QLoRA SFT: 3,844 rows (1,938 base pairs × position-swap doubling), 3
epochs, 720 optimizer steps, r=16, α=32, dropout=0, all-linear LoRA
targets, lr=2e-4 cosine, peak VRAM 23.4 GB on A100-40GB. Final
`train_loss` 0.889, `mean_token_accuracy` 86.1%. Total Stage 6
spend: ~$4. Adapter merged to bf16 for Stage 8 eval and this
release.

The DPO step was applied to a copy of this checkpoint (not gated by
this checkpoint's existence), so the SFT artifact is the same one
that fed into DPO — it's a checkpoint snapshot of the pipeline,
unmodified.

---

## License & citation

Same as the [primary model
card](https://huggingface.co/krishnakartik/gemma4-social-bias-judge#citation).