Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36

Model Details

  • Model Type: Image Classification
  • Model Stats:
    • Params: 37.3M
    • FLOPs / MACs: 44.3G / 22.1G
    • Image size: train = 384 x 384, test = 384 x 384
  • Dataset: deepghs-cv/bangumibase-face-quality-cls
    • Classes: poor, ok, good, excellent

Results

Robustness Analysis

This model is published mainly as a reference baseline for the framing-shortcut analysis below. For production use, see the .head_crop variant, which corrects the issue documented here.

Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. By construction, poor rows tend to be wide shots and excellent rows tend to be close-ups. A classifier can therefore reach high accuracy by reading how much of the frame the head occupies rather than the face's intrinsic drawing quality.
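As a rough illustration of the labeling signal described above, the sketch below evaluates face_geo_mean for two face boxes. The function name and the example box sizes are hypothetical, and the thresholds that map face_geo_mean to the four classes are not given in this card, so they are not reproduced.

```python
import math


def face_geo_mean(face_w: float, face_h: float) -> float:
    """Geometric mean of the face box sides, in original-frame pixels."""
    return math.sqrt(face_w * face_h)


# Hypothetical face boxes (width, height) in frames of the same resolution.
wide_shot = (40, 50)     # small face, as in a wide shot
close_up = (400, 500)    # large face, as in a close-up

small = face_geo_mean(*wide_shot)
large = face_geo_mean(*close_up)
print(f"wide shot: {small:.1f} px, close-up: {large:.1f} px")
```

Because the value is measured in the original frame's pixels, it grows with how much of the frame the face fills, which is the shortcut the analysis probes.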

Test protocol. Each of the 8,800 test images is independently random-cropped so that the head bbox is always fully contained, with each side's margin drawn uniformly from [0, max_margin × head_dim]. The full setting keeps the original frame; head_only tightens the crop to exactly the head bbox. The same seed is used across all 7 settings, and the evaluation is paired with the .head_crop variant for an apples-to-apples comparison.
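The crop sampling can be sketched as follows. This is a minimal illustration, not the evaluation code itself: the function name, the example frame and head box, and two details are assumptions (head_dim is taken to be the head width for the left/right sides and the head height for the top/bottom sides, and the crop is clipped to the image bounds).

```python
import random


def random_head_crop(image_size, head_bbox, max_margin, rng):
    """Return a crop box (x0, y0, x1, y1) that fully contains head_bbox.

    Each side gets an independent margin drawn uniformly from
    [0, max_margin * head_dim]. Assumed: head_dim is the head width for
    left/right and the head height for top/bottom; the result is clipped
    to the image bounds.
    """
    img_w, img_h = image_size
    hx0, hy0, hx1, hy1 = head_bbox
    head_w, head_h = hx1 - hx0, hy1 - hy0

    left = rng.uniform(0, max_margin * head_w)
    right = rng.uniform(0, max_margin * head_w)
    top = rng.uniform(0, max_margin * head_h)
    bottom = rng.uniform(0, max_margin * head_h)

    return (
        max(0, hx0 - left),
        max(0, hy0 - top),
        min(img_w, hx1 + right),
        min(img_h, hy1 + bottom),
    )


# max_margin per crop setting; "full" is the uncropped frame and needs no sampling.
SETTINGS = {
    'loose_4x': 4.0, 'loose_2x': 2.0, 'loose_1x': 1.0,
    'loose_0.5x': 0.5, 'loose_0.25x': 0.25, 'head_only': 0.0,
}

rng = random.Random(0)                 # fixed seed for repeatable crops
head = (800, 300, 1000, 520)           # hypothetical head bbox in a 1920 x 1080 frame
for name, max_margin in SETTINGS.items():
    print(name, random_head_crop((1920, 1080), head, max_margin, rng))
```

With max_margin = 0 every margin is zero, so the crop reduces to the head bbox, matching the head_only setting.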

baseline crop robustness

| crop setting | margin × head | macro_f1 | accuracy | Δ macro_f1 vs full |
|---|---|---|---|---|
| full | n/a | 0.9197 | 0.9199 | n/a |
| loose_4x | 4.0 | 0.6258 | 0.6514 | −0.294 |
| loose_2x | 2.0 | 0.5328 | 0.5881 | −0.387 |
| loose_1x | 1.0 | 0.4403 | 0.5216 | −0.479 |
| loose_0.5x | 0.5 | 0.3706 | 0.4713 | −0.549 |
| loose_0.25x | 0.25 | 0.3200 | 0.4305 | −0.600 |
| head_only | 0.0 | 0.2492 | 0.3585 | −0.671 |

The 4-class chance line is 0.25; this model's macro_f1 collapses to chance (0.2492 on head_only) when the frame context is removed. Per-class F1 on head_only: poor 0.052, ok 0.013, good 0.298, excellent 0.634. In other words, poor and ok are identified almost entirely from head-area fraction, not from face content.
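As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores, so the head_only figures can be recomputed from the rounded per-class values quoted above. The variable names are illustrative only.

```python
# Per-class F1 on head_only, as reported in this card (rounded to 3 decimals).
per_class_f1 = {'poor': 0.052, 'ok': 0.013, 'good': 0.298, 'excellent': 0.634}

# Macro F1 is the unweighted mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Drop relative to the full-frame macro F1 reported in the table.
full_macro_f1 = 0.9197
delta = macro_f1 - full_macro_f1

print(f"macro_f1 ~ {macro_f1:.3f}, delta vs full ~ {delta:.2f}")
```

The result agrees with the reported head_only macro_f1 of 0.2492 up to rounding of the per-class inputs.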

What this means. This model is a strong in-distribution classifier (test macro_f1 = 0.920) but a fragile one: any preprocessing that changes the head-to-frame ratio (tighter framing, cropping, padding) will degrade predictions sharply. Recommended alternative: cls-bangumibase-face-quality.caformer_s36.head_crop, trained with head-aware random crops, holds macro_f1 ≈ 0.93 across all 7 crop settings with no loss of in-distribution accuracy.
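To see why such preprocessing matters, the sketch below computes the fraction of the frame covered by the head before and after a crop and a pad. The helper name and all box coordinates are hypothetical; the point is only that the head's pixel size stays fixed while the ratio the model appears to rely on changes.

```python
def head_area_fraction(frame_size, head_bbox):
    """Fraction of the frame area covered by the head bbox (x0, y0, x1, y1)."""
    frame_w, frame_h = frame_size
    x0, y0, x1, y1 = head_bbox
    return ((x1 - x0) * (y1 - y0)) / (frame_w * frame_h)


# Hypothetical 200 x 220 px head in a 1920 x 1080 frame.
original = head_area_fraction((1920, 1080), (800, 300, 1000, 520))

# Cropping to a 960 x 540 window keeps the head's pixel size but shrinks the frame.
cropped = head_area_fraction((960, 540), (320, 160, 520, 380))

# Padding to a 1920 x 1920 square keeps the head's pixel size but enlarges the frame.
padded = head_area_fraction((1920, 1920), (800, 720, 1000, 940))

print(f"original {original:.4f}, cropped {cropped:.4f}, padded {padded:.4f}")
```

Comparing this fraction before and after a preprocessing step is one simple way to judge whether the step moves inputs away from the framing seen in training.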

Metrics

| Split | Acc / Top-2 | Macro (F1/P/R/AUC) | Micro (F1/P/R/AUC) |
|---|---|---|---|
| Validation | 93.07% / 99.81% | 0.931 / 0.931 / 0.931 / 0.992 | 0.931 / 0.931 / 0.931 / 0.993 |
| Test | 91.99% / 99.92% | 0.920 / 0.920 / 0.920 / 0.991 | 0.920 / 0.920 / 0.920 / 0.993 |

Plots

[Plots: Confusion, P/R, and F1 for the Validation and Test splits]

How to Use

We provide a sample image for the code samples below: https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp

Use Transformers And Torch

Install dghs-imgutils, timm and the other necessary requirements with the following command:

pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'

After that, you can load this model with the transformers library and use it for training, validation and testing with the following code:

import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True, use_infer_head=True)
model.eval()

image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']

with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32

print(dict(zip(classes, output[0].tolist())))
# {'poor': 3.6020756510879437e-07,
#  'ok': 4.579680899041705e-05,
#  'good': 0.9659607410430908,
#  'excellent': 0.03399312123656273}
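To turn the score dictionary into a single prediction, one option is to take the highest-scoring class. The snippet below is a small sketch that reuses the sample scores printed above, so it runs without downloading the model; the variable names are illustrative.

```python
# Sample scores copied from the printed output above.
scores = {
    'poor': 3.6020756510879437e-07,
    'ok': 4.579680899041705e-05,
    'good': 0.9659607410430908,
    'excellent': 0.03399312123656273,
}

# Top-1 label: the class with the highest score.
label = max(scores, key=scores.get)

# Classes ranked from most to least likely, e.g. for a top-2 readout.
ranked = sorted(scores, key=scores.get, reverse=True)

print(label)        # good
print(ranked[:2])   # ['good', 'excellent']
```

Given the robustness results above, such a prediction should only be trusted when the input keeps the original frame's head-to-frame ratio.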

Citation

@misc{cls_bangumibase_face_quality_caformer_s36,
  title        = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36},
  author       = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36}},
  note         = {An anime-style image classification model with 4 classes (poor, ok, good, excellent), trained on the anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
  license      = {mit}
}