Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop

Model Details

  • Model Type: Image Classification
  • Model Stats:
    • Params: 37.3M
    • FLOPs / MACs: 44.3G / 22.1G
    • Image size: train = 384 x 384, test = 384 x 384
  • Dataset: deepghs-cv/bangumibase-face-quality-cls
    • Classes: poor, ok, good, excellent

Results

Robustness Analysis

This is the recommended model. It was trained with head-aware random crop augmentation so that the classifier cannot exploit the head-to-frame ratio as a shortcut for the face-quality label.

Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. A baseline classifier can reach high accuracy by reading how much of the frame the head occupies instead of the face's intrinsic drawing quality. We trained this model with random crops that always fully contain the head, so every label has been seen at many different head-area fractions during training.

Test protocol. Same as the baseline counterpart: each of the 8,800 test images is independently random-cropped so that the crop fully contains the head bbox, with the margin on each side drawn uniformly from [0, max_margin × head_dim]. The same seed is used across all 7 settings, and the crops are paired with the baseline run.
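The cropping protocol above can be sketched as follows. This is a minimal illustration, not the evaluation code behind the card: the function name random_head_crop, the (x0, y0, x1, y1) box convention, the choice to scale horizontal margins by head width and vertical margins by head height, and the clamping to the image bounds are all assumptions.

```python
import math
import random
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, hypothetical convention


def random_head_crop(image_size: Tuple[int, int], head_box: Box,
                     max_margin: float, rng: random.Random) -> Box:
    """Sample a crop box that always fully contains head_box."""
    width, height = image_size
    x0, y0, x1, y1 = head_box
    head_w, head_h = x1 - x0, y1 - y0

    # Assumption: "head_dim" is the head width for left/right margins
    # and the head height for top/bottom margins.
    left = rng.uniform(0.0, max_margin * head_w)
    right = rng.uniform(0.0, max_margin * head_w)
    top = rng.uniform(0.0, max_margin * head_h)
    bottom = rng.uniform(0.0, max_margin * head_h)

    # Round outwards so the head is never clipped, then clamp to the image.
    crop_x0 = max(0, math.floor(x0 - left))
    crop_y0 = max(0, math.floor(y0 - top))
    crop_x1 = min(width, math.ceil(x1 + right))
    crop_y1 = min(height, math.ceil(y1 + bottom))
    return crop_x0, crop_y0, crop_x1, crop_y1


# Reusing one seed for every setting keeps the runs comparable.
head = (800, 200, 1000, 420)
loose_box = random_head_crop((1920, 1080), head, 4.0, random.Random(0))
tight_box = random_head_crop((1920, 1080), head, 0.0, random.Random(0))
```

With max_margin = 0.0 the crop collapses to the head box itself, which corresponds to the head_only setting. The returned box could be passed to Pillow's Image.crop before preprocessing.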

head_crop crop robustness

| crop setting | margin × head | macro_f1 | accuracy | Δ macro_f1 vs full |
|---|---|---|---|---|
| full | - | 0.9204 | 0.9205 | - |
| loose_4x | 4.0 | 0.9280 | 0.9282 | +0.008 |
| loose_2x | 2.0 | 0.9296 | 0.9298 | +0.009 |
| loose_1x | 1.0 | 0.9356 | 0.9358 | +0.015 |
| loose_0.5x | 0.5 | 0.9350 | 0.9353 | +0.015 |
| loose_0.25x | 0.25 | 0.9348 | 0.9351 | +0.014 |
| head_only | 0.0 | 0.9328 | 0.9332 | +0.012 |

macro_f1 varies by only about 0.015 across all 7 settings and is slightly higher under tight crops, so head-area fraction is no longer a signal the model relies on. Per-class F1 on head_only stays above 0.89 for all four classes (poor 0.947, ok 0.899, good 0.920, excellent 0.966), which indicates that the model identifies face quality from face content rather than from framing.
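The Δ column and the spread across settings can be re-derived from the reported macro_f1 values. The snippet below only restates the numbers from the crop robustness table and does simple arithmetic; the variable names are illustrative.

```python
# macro_f1 per crop setting, copied from the crop robustness table.
macro_f1 = {
    "full": 0.9204,
    "loose_4x": 0.9280,
    "loose_2x": 0.9296,
    "loose_1x": 0.9356,
    "loose_0.5x": 0.9350,
    "loose_0.25x": 0.9348,
    "head_only": 0.9328,
}

# Delta of each cropped setting relative to the uncropped "full" setting.
deltas = {
    name: round(value - macro_f1["full"], 3)
    for name, value in macro_f1.items()
    if name != "full"
}

# Spread between the best and worst setting.
spread = max(macro_f1.values()) - min(macro_f1.values())

for name, delta in deltas.items():
    print(f"{name:12s} {delta:+.3f}")
print(f"spread across settings: {spread:.4f}")
```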

Comparison with the framing-only baseline. Paired comparison against cls-bangumibase-face-quality.caformer_s36 on the same test rows and same crops:

| setting | baseline macro_f1 | this model | Δ |
|---|---|---|---|
| full (in-distribution) | 0.9197 | 0.9204 | +0.001 |
| head_only (framing removed) | 0.2492 | 0.9328 | +0.684 |

This model does not pay an in-distribution accuracy cost (Δ ≈ 0 on full) while gaining +68 pp macro_f1 robustness on head-only crops. Recommended for any downstream task where input framing may differ from the BangumiBase distribution.

Metrics

| Split | Acc / Top-2 | Macro (F1/P/R/AUC) | Micro (F1/P/R/AUC) |
|---|---|---|---|
| Validation | 92.12% / 99.81% | 0.921 / 0.921 / 0.921 / 0.990 | 0.921 / 0.921 / 0.921 / 0.991 |
| Test | 92.05% / 99.93% | 0.920 / 0.921 / 0.920 / 0.990 | 0.920 / 0.920 / 0.920 / 0.992 |
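For readers unfamiliar with the columns, the following sketch shows the standard definitions of macro F1, micro F1 and top-2 accuracy on a tiny made-up example. The labels and scores below are toy values for illustration only, not data from this model, and the card does not state which tooling produced its numbers.

```python
from typing import Dict, Sequence, Tuple

CLASSES = ["poor", "ok", "good", "excellent"]


def macro_micro_f1(y_true: Sequence[str], y_pred: Sequence[str],
                   classes: Sequence[str]) -> Tuple[float, float]:
    """Macro F1 averages per-class F1; micro F1 pools the counts first."""
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_class) / len(classes)
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return macro, micro


def top_k_accuracy(y_true: Sequence[str],
                   scores: Sequence[Dict[str, float]], k: int = 2) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for label, row in zip(y_true, scores):
        top_k = sorted(row, key=row.get, reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)


# Toy labels and predictions (illustrative only).
y_true = ["poor", "ok", "good", "excellent", "good", "ok"]
y_pred = ["poor", "good", "good", "excellent", "good", "ok"]
macro, micro = macro_micro_f1(y_true, y_pred, CLASSES)

# Toy score rows (illustrative only).
toy_scores = [
    {"poor": 0.1, "ok": 0.3, "good": 0.6, "excellent": 0.0},
    {"poor": 0.5, "ok": 0.3, "good": 0.1, "excellent": 0.1},
]
top2 = top_k_accuracy(["ok", "good"], toy_scores, k=2)
```

In single-label classification, micro F1 equals plain accuracy, which is consistent with the Micro F1 and Acc columns agreeing in the table above.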

Plots

[Plots: confusion matrix, P/R and F1 for the Validation and Test splits]

How to Use

We provide a sample image for the code samples below; it is the sample.webp file in the model repository.

Use Transformers And Torch

Install dghs-imgutils, timm and the other necessary requirements with the following command:

pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'

After that, you can load this model with the transformers library and use it for training, validation and testing with the following code:

import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True, use_infer_head=True)
model.eval()

image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']

with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32

print(dict(zip(classes, output[0].tolist())))
# {'poor': 1.061463763107895e-06,
#  'ok': 0.0002213950065197423,
#  'good': 0.9341738224029541,
#  'excellent': 0.06560368835926056}
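
To turn the score dictionary into a single predicted label, one option is to take the highest-scoring class. The sketch below reuses the example scores printed above as a plain dictionary so that it runs without downloading the model. Treating the outputs as probabilities is an inference from the sample values summing to roughly 1, not something the card states.

```python
# Example scores copied from the printed output above.
scores = {
    'poor': 1.061463763107895e-06,
    'ok': 0.0002213950065197423,
    'good': 0.9341738224029541,
    'excellent': 0.06560368835926056,
}

# Highest-scoring class is the predicted label.
label = max(scores, key=scores.get)

# Classes ranked from most to least likely, e.g. for a top-2 readout.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# The sample scores sum to approximately 1.
total = sum(scores.values())

print(label, ranked[:2], round(total, 6))
```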

Citation

@misc{cls_bangumibase_face_quality_caformer_s36_head_crop,
  title        = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop},
  author       = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop}},
  note         = {An anime-style image classification model for a classification task with 4 classes (poor, ok, good, excellent), trained on the anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
  license      = {mit}
}