Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36

Model Details

Model Type: Image Classification
Model Stats:
- Params: 37.3M
- FLOPs / MACs: 44.3G / 22.1G
- Image size: train = 384 x 384, test = 384 x 384
Dataset: deepghs-cv/bangumibase-face-quality-cls
- Classes: poor, ok, good, excellent

Results

Robustness Analysis

This model is published mainly as a reference baseline for the framing-shortcut analysis below. For production use, see the .head_crop variant which corrects the issue documented here.

Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. By construction, poor rows tend to be wide shots and excellent rows tend to be close-ups. A classifier can therefore reach high accuracy by reading how much of the frame the head occupies rather than the face's intrinsic drawing quality.
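To make the shortcut concrete, here is a minimal sketch of the quantity the labels are derived from. The function names and the example numbers are hypothetical, and the dataset's actual class thresholds are not stated in this card, so none are reproduced here. The point is only that the same face scores differently depending on how the shot is framed.

```python
import math


def face_geo_mean(face_w: float, face_h: float) -> float:
    """Geometric mean of the face box sides, in original-frame pixels."""
    return math.sqrt(face_w * face_h)


def face_area_fraction(face_w: float, face_h: float,
                       frame_w: float, frame_h: float) -> float:
    """Share of the frame area covered by the face box (hypothetical helper)."""
    return (face_w * face_h) / (frame_w * frame_h)


# Illustrative numbers only: one 1920x1080 frame, wide shot vs. close-up.
wide_shot = face_geo_mean(60, 80)
close_up = face_geo_mean(480, 640)

# The label-driving value and the visible head-to-frame ratio rise together,
# which is the cue a classifier can exploit instead of drawing quality.
print(wide_shot, face_area_fraction(60, 80, 1920, 1080))
print(close_up, face_area_fraction(480, 640, 1920, 1080))
```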

Test protocol. Each of the 8,800 test images is independently random-cropped so that the head bbox stays fully contained, with each side's margin drawn uniformly from [0, max_margin × head_dim]. The full setting keeps the original frame; head_only tightens the crop to exactly the head bbox. The same seed is used across all 7 settings, and the same protocol is applied to the .head_crop variant for an apples-to-apples comparison.
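The cropping step can be sketched as follows. This is not the evaluation code used for the table, only an illustration under stated assumptions: head_dim is taken to be the head width for the left and right margins and the head height for the top and bottom margins, crops are clamped to the image bounds, and the function name, box layout, and seed value are hypothetical.

```python
import random

# max_margin per setting, taken from the table; `full` keeps the original
# frame and is therefore not cropped at all.
SETTINGS = {
    'loose_4x': 4.0, 'loose_2x': 2.0, 'loose_1x': 1.0,
    'loose_0.5x': 0.5, 'loose_0.25x': 0.25, 'head_only': 0.0,
}


def random_head_crop(img_w, img_h, head_box, max_margin, rng):
    """Return (left, top, right, bottom) that fully contains head_box.

    Each side gets its own margin drawn uniformly from
    [0, max_margin * head_dim]; the result is clamped to the image
    (clamping is an assumption of this sketch).
    """
    x0, y0, x1, y1 = head_box
    head_w, head_h = x1 - x0, y1 - y0
    left = max(0.0, x0 - rng.uniform(0.0, max_margin * head_w))
    right = min(float(img_w), x1 + rng.uniform(0.0, max_margin * head_w))
    top = max(0.0, y0 - rng.uniform(0.0, max_margin * head_h))
    bottom = min(float(img_h), y1 + rng.uniform(0.0, max_margin * head_h))
    return left, top, right, bottom


head_box = (800, 300, 1000, 520)  # hypothetical head bbox in a 1920x1080 frame
crops = {}
for name, max_margin in SETTINGS.items():
    rng = random.Random(0)  # same seed for every setting (seed value is illustrative)
    crops[name] = random_head_crop(1920, 1080, head_box, max_margin, rng)
    print(name, crops[name])
```

With max_margin = 0.0 every sampled margin is zero, so the head_only crop is exactly the head bbox.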

crop setting	margin × head	macro_f1	accuracy	Δ macro_f1 vs `full`
`full`	—	0.9197	0.9199	—
`loose_4x`	4.0	0.6258	0.6514	−0.294
`loose_2x`	2.0	0.5328	0.5881	−0.387
`loose_1x`	1.0	0.4403	0.5216	−0.479
`loose_0.5x`	0.5	0.3706	0.4713	−0.549
`loose_0.25x`	0.25	0.3200	0.4305	−0.600
`head_only`	0.0	0.2492	0.3585	−0.671

The 4-class chance line is 0.25; this model collapses to chance (macro_f1 = 0.249 on head_only) when the frame context is removed. Per-class F1 on head_only: poor 0.052, ok 0.013, good 0.298, excellent 0.634. The poor and ok classes are identified almost entirely from head-area fraction, not from face content.
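The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this section: it recomputes the Δ macro_f1 column and confirms that the unweighted mean of the four per-class F1 scores on head_only reproduces the reported macro_f1. The variable names are illustrative.

```python
FULL_MACRO_F1 = 0.9197

# macro_f1 per crop setting, copied from the table.
macro_f1 = {
    'loose_4x': 0.6258, 'loose_2x': 0.5328, 'loose_1x': 0.4403,
    'loose_0.5x': 0.3706, 'loose_0.25x': 0.3200, 'head_only': 0.2492,
}

# Delta macro_f1 relative to the `full` setting.
delta = {name: value - FULL_MACRO_F1 for name, value in macro_f1.items()}
for name, d in delta.items():
    print(f'{name}: {d:+.4f}')

# Macro F1 is the unweighted mean of the per-class F1 scores.
head_only_f1 = {'poor': 0.052, 'ok': 0.013, 'good': 0.298, 'excellent': 0.634}
head_only_macro = sum(head_only_f1.values()) / len(head_only_f1)
print(f'head_only macro_f1 from per-class scores: {head_only_macro:.4f}')
```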

What this means. This model is a strong in-distribution classifier (test macro_f1 = 0.920) but a fragile one: any preprocessing that changes the head-to-frame ratio (tighter framing, cropping, padding) will degrade predictions sharply. Recommended alternative: cls-bangumibase-face-quality.caformer_s36.head_crop, trained with head-aware random crops, holds macro_f1 ≈ 0.93 across all 7 crop settings with no loss of in-distribution accuracy.

Metrics

#	Acc / Top-2	Macro (F1/P/R/AUC)	Micro (F1/P/R/AUC)
Validation	93.07% / 99.81%	0.931 / 0.931 / 0.931 / 0.992	0.931 / 0.931 / 0.931 / 0.993
Test	91.99% / 99.92%	0.920 / 0.920 / 0.920 / 0.991	0.920 / 0.920 / 0.920 / 0.993

Plots

[Plots: confusion matrix, P/R, and F1 for the validation and test splits]

How to Use

We provide a sample image for the code samples; you can find it at https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp.

Use Transformers And Torch

Install dghs-imgutils, timm, and the other necessary requirements with the following command:

pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'

After that, you can load this model with the transformers library and use it for training, validation, and testing with the following code:

import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True, use_infer_head=True)
model.eval()

image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']

with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32

print(dict(zip(classes, output[0].tolist())))
# {'poor': 3.6020756510879437e-07,
#  'ok': 4.579680899041705e-05,
#  'good': 0.9659607410430908,
#  'excellent': 0.03399312123656273}
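
To turn the printed scores into a single prediction, one option is to take the highest-scoring class. The sketch below starts from the example output shown above, copied as a literal dict so that it runs without downloading the model; the variable names are illustrative.

```python
# Scores copied from the example output above.
scores = {
    'poor': 3.6020756510879437e-07,
    'ok': 4.579680899041705e-05,
    'good': 0.9659607410430908,
    'excellent': 0.03399312123656273,
}

# Pick the class with the highest score.
label = max(scores, key=scores.get)
print(label, scores[label])

# Rank all classes from most to least likely.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Keep the framing caveat from the robustness analysis in mind: a prediction is only comparable to the reported metrics when the input is a full frame rather than a tight crop.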

Citation

@misc{cls_bangumibase_face_quality_caformer_s36,
  title        = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36},
  author       = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36}},
  note         = {An anime-style image classification model for a classification task with 4 classes (poor, ok, good, excellent), trained on the anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
  license      = {mit}
}

Base model: timm/caformer_s36.sail_in22k_ft_in1k_384
