Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop

Model Details

Model Type: Image Classification
Model Stats:
- Params: 37.3M
- FLOPs / MACs: 44.3G / 22.1G
- Image size: train = 384 x 384, test = 384 x 384
Dataset: deepghs-cv/bangumibase-face-quality-cls
- Classes: poor, ok, good, excellent

Results

Robustness Analysis

This is the recommended model. It was trained with head-aware random crop augmentation so that the classifier does not exploit head-to-frame ratio as a shortcut for the face-quality label.

Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. A baseline classifier can reach high accuracy by reading how much of the frame the head occupies instead of the face's intrinsic drawing quality. We trained this model with random crops that always fully contain the head, so every label has been seen at many different head-area fractions during training.
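
As a minimal sketch of the quantity the labels are derived from, the snippet below computes face_geo_mean for a few hypothetical face boxes. The box sizes are made up for illustration, and the thresholds that map this value to poor/ok/good/excellent are not stated here, so none are assumed.

```python
import math


def face_geo_mean(face_w: float, face_h: float) -> float:
    """Geometric mean of the face box sides, in original-frame pixels."""
    return math.sqrt(face_w * face_h)


# Hypothetical face boxes (width, height) in pixels, for illustration only.
print(face_geo_mean(64, 64))    # 64.0
print(face_geo_mean(256, 256))  # 256.0
print(face_geo_mean(100, 400))  # 200.0
```

Because the value is measured in the original frame's pixels, it grows with how large the face appears in the frame, which is why framing can leak label information.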

Test protocol. Same as for the baseline counterpart: each of the 8,800 test images is independently random-cropped so that the crop fully contains the head bbox, with each side's margin drawn uniformly from [0, max_margin × head_dim]. The same seed is used across all 7 settings, so every crop is paired with the baseline run.
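
The crop sampling described above could look roughly like the sketch below. It is not the evaluation code itself: the function name, the example image size and head box are hypothetical, and two details are assumptions (head_dim is taken as the head width for left/right margins and the head height for top/bottom margins, and the crop is clipped to the image bounds).

```python
import math
import random

# max_margin per crop setting, taken from the results table.
# `full` is omitted because it uses the uncropped image.
MAX_MARGINS = {
    "loose_4x": 4.0,
    "loose_2x": 2.0,
    "loose_1x": 1.0,
    "loose_0.5x": 0.5,
    "loose_0.25x": 0.25,
    "head_only": 0.0,
}


def sample_head_crop(img_w, img_h, head_box, max_margin, rng):
    """Sample a crop box (x0, y0, x1, y1) that fully contains head_box.

    Each side gets its own margin drawn uniformly from
    [0, max_margin * head_dim]. Assumptions: head_dim is the head width for
    left/right and the head height for top/bottom; the box is clipped to
    the image bounds.
    """
    x0, y0, x1, y1 = head_box
    head_w, head_h = x1 - x0, y1 - y0
    left = rng.uniform(0, max_margin * head_w)
    right = rng.uniform(0, max_margin * head_w)
    top = rng.uniform(0, max_margin * head_h)
    bottom = rng.uniform(0, max_margin * head_h)
    # Floor the near edges and ceil the far edges so rounding never cuts the head.
    return (
        max(0, math.floor(x0 - left)),
        max(0, math.floor(y0 - top)),
        min(img_w, math.ceil(x1 + right)),
        min(img_h, math.ceil(y1 + bottom)),
    )


head_box = (300, 120, 420, 260)  # hypothetical head bbox in a 1280x720 frame
for name, max_margin in MAX_MARGINS.items():
    rng = random.Random(0)  # same seed for every setting
    print(name, sample_head_crop(1280, 720, head_box, max_margin, rng))
```

With max_margin = 0.0 the crop is exactly the head box, which corresponds to the head_only setting. The returned tuple has the same layout as the box argument of PIL's Image.crop.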

crop setting	margin × head	macro_f1	accuracy	Δ macro_f1 vs `full`
`full`	—	0.9204	0.9205	—
`loose_4x`	4.0	0.9280	0.9282	+0.008
`loose_2x`	2.0	0.9296	0.9298	+0.009
`loose_1x`	1.0	0.9356	0.9358	+0.015
`loose_0.5x`	0.5	0.9350	0.9353	+0.015
`loose_0.25x`	0.25	0.9348	0.9351	+0.014
`head_only`	0.0	0.9328	0.9332	+0.012

macro_f1 varies by at most 0.015 across all 7 settings and is slightly higher under tight crops than on full, which indicates that head-area fraction is no longer a signal the model relies on. Per-class F1 on head_only stays above 0.89 for all four classes (poor 0.947, ok 0.899, good 0.920, excellent 0.966): the model identifies face quality from face content, not from framing.
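
As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores, so the head_only figure can be recovered from the per-class values quoted above. This sketch assumes the standard definition of macro F1; the inputs are already rounded to three decimals.

```python
# Per-class F1 on head_only, as quoted in the text.
per_class_f1 = {"poor": 0.947, "ok": 0.899, "good": 0.920, "excellent": 0.966}

# Macro F1: plain average over classes, ignoring class frequency.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 3))  # 0.933
```

The result agrees with the 0.9328 reported for head_only up to rounding.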

Comparison with the framing-only baseline. Paired comparison against cls-bangumibase-face-quality.caformer_s36 on the same test rows and same crops:

setting	baseline macro_f1	this model	Δ
`full` (in-distribution)	0.9197	0.9204	+0.001
`head_only` (framing removed)	0.2492	0.9328	+0.684

This model does not pay an in-distribution accuracy cost (Δ ≈ 0 on full) while gaining +68 pp macro_f1 robustness on head-only crops. Recommended for any downstream task where input framing may differ from the BangumiBase distribution.
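
The Δ column and the percentage-point figure follow directly from the paired macro_f1 values; a small sketch of that arithmetic, using only the numbers from the comparison table:

```python
# (baseline macro_f1, this model's macro_f1) per setting, from the table.
rows = {
    "full": (0.9197, 0.9204),
    "head_only": (0.2492, 0.9328),
}

deltas = {}
for setting, (baseline, this_model) in rows.items():
    deltas[setting] = this_model - baseline
    # Report the absolute difference and the same value in percentage points.
    print(f"{setting}: {deltas[setting]:+.3f} macro_f1 ({deltas[setting] * 100:+.1f} pp)")
# full: +0.001 macro_f1 (+0.1 pp)
# head_only: +0.684 macro_f1 (+68.4 pp)
```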

Metrics

#	Acc / Top-2	Macro (F1/P/R/AUC)	Micro (F1/P/R/AUC)
Validation	92.12% / 99.81%	0.921 / 0.921 / 0.921 / 0.990	0.921 / 0.921 / 0.921 / 0.991
Test	92.05% / 99.93%	0.920 / 0.921 / 0.920 / 0.990	0.920 / 0.920 / 0.920 / 0.992
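
For readers unfamiliar with the Top-2 column: a prediction counts as correct when the true class is among the two highest-scoring classes. The sketch below illustrates that definition on made-up scores; the helper name and the toy data are hypothetical and are not taken from the evaluation code.

```python
def top_k_accuracy(scores, labels, k=2):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores over (poor, ok, good, excellent); values are illustrative only.
scores = [
    [0.10, 0.20, 0.60, 0.10],
    [0.50, 0.30, 0.10, 0.10],
    [0.05, 0.05, 0.50, 0.40],
]
labels = [2, 1, 3]  # true class indices

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```

In this toy case only one of three samples is right at top-1, while all three are right at top-2, because each miss lands on the neighbouring class.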

Plots

Confusion matrix, P/R, and F1 plots for the validation and test splits.

How to Use

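We provide a sample image for the code samples: sample.webp in this model repository (https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop/resolve/main/sample.webp).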

Use Transformers And Torch

Install dghs-imgutils, timm, and the other requirements with the following command:

pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'

After that, you can load this model with the transformers library and use it for training, validation, and testing. The following code runs inference on the sample image:

import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True, use_infer_head=True)
model.eval()

image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']

with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32

print(dict(zip(classes, output[0].tolist())))
# {'poor': 1.061463763107895e-06,
#  'ok': 0.0002213950065197423,
#  'good': 0.9341738224029541,
#  'excellent': 0.06560368835926056}
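
To turn the score dictionary into a single label, one option is to take the highest-scoring class. The sketch below reuses the scores printed above so that it runs without downloading the model; treating them as probabilities is an assumption based on the fact that they sum to roughly 1.

```python
# Scores copied from the sample output above.
scores = {
    'poor': 1.061463763107895e-06,
    'ok': 0.0002213950065197423,
    'good': 0.9341738224029541,
    'excellent': 0.06560368835926056,
}

# Highest-scoring class and its score.
label = max(scores, key=scores.get)
print(label, round(scores[label], 4))  # good 0.9342

# Two best classes, highest first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print([name for name, _ in ranked[:2]])  # ['good', 'excellent']
```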

Citation

@misc{cls_bangumibase_face_quality_caformer_s36_head_crop,
  title        = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop},
  author       = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop}},
  note         = {An anime-style image classification model with 4 classes (poor, ok, good, excellent), trained on the anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
  license      = {mit}
}

Safetensors
- Model size: 37.3M params
- Tensor type: F32

Model tree for deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop
- Base model: timm/caformer_s36.sail_in22k_ft_in1k_384
