Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36
Model Details
- Model Type: Image Classification
- Model Stats:
  - Params: 37.3M
  - FLOPs / MACs: 44.3G / 22.1G
  - Image size: train = 384 x 384, test = 384 x 384
- Dataset: deepghs-cv/bangumibase-face-quality-cls
- Classes: poor, ok, good, excellent
Results
Robustness Analysis
This model is published mainly as a reference baseline for the framing-shortcut analysis below. For production use, see the .head_crop variant, which corrects the issue documented here.
Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. By construction, poor rows tend to be wide shots and excellent rows tend to be close-ups. A classifier can therefore reach high accuracy by reading how much of the frame the head occupies rather than the face's intrinsic drawing quality.
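The labeling statistic described above can be sketched as follows. This is a minimal illustration, assuming the face width and height are given in pixels of the original frame; the function name mirrors the statistic's name, and the example boxes are hypothetical. The thresholds that map this value to the four classes are not stated here, so none are shown.

```python
import math


def face_geo_mean(face_w: float, face_h: float) -> float:
    """Geometric mean of the face box sides, in original-frame pixels."""
    if face_w < 0 or face_h < 0:
        raise ValueError("face dimensions must be non-negative")
    return math.sqrt(face_w * face_h)


# Hypothetical boxes: the same face drawn in a wide shot and in a close-up.
wide_shot = face_geo_mean(40, 50)    # small face in the frame
close_up = face_geo_mean(400, 500)   # same proportions, 10x larger in pixels
print(wide_shot, close_up)
```

Because the value depends only on the face's pixel size in the frame, it tracks framing (wide shot versus close-up) as much as anything about the drawing itself, which is the shortcut this section measures.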
Test protocol. Each of the 8,800 test images is independently random-cropped so that the head bbox is fully contained, with each side's margin drawn uniformly from [0, max_margin × head_dim]. The full setting keeps the original frame; head_only tightens the crop to exactly the head bbox. The same seed is used across all 7 settings, and the same crops are used for the .head_crop variant, for an apples-to-apples comparison.
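The cropping step of this protocol might look like the sketch below. It is not the authors' evaluation code. It assumes that head_dim means the head width for the left and right margins and the head height for the top and bottom margins, that the crop is clamped to the image bounds, and that "same seed" means re-seeding the generator for each setting. The function name `random_head_crop`, the frame size, and the head bbox are hypothetical.

```python
import random


def random_head_crop(img_w, img_h, head_box, max_margin, rng):
    """Return a crop box (left, top, right, bottom) that contains head_box.

    Each side gets its own margin drawn uniformly from
    [0, max_margin * head_dim]; the result is clamped to the image.
    Assumption: head_dim is the head width for left/right margins and
    the head height for top/bottom margins.
    """
    x0, y0, x1, y1 = head_box
    head_w, head_h = x1 - x0, y1 - y0
    left = max(0.0, x0 - rng.uniform(0.0, max_margin * head_w))
    top = max(0.0, y0 - rng.uniform(0.0, max_margin * head_h))
    right = min(float(img_w), x1 + rng.uniform(0.0, max_margin * head_w))
    bottom = min(float(img_h), y1 + rng.uniform(0.0, max_margin * head_h))
    return left, top, right, bottom


# Margin multipliers for the cropped settings; "full" keeps the original frame.
SETTINGS = {"loose_4x": 4.0, "loose_2x": 2.0, "loose_1x": 1.0,
            "loose_0.5x": 0.5, "loose_0.25x": 0.25, "head_only": 0.0}

head = (800, 200, 1000, 450)  # hypothetical head bbox in a 1920x1080 frame
for name, max_margin in SETTINGS.items():
    rng = random.Random(0)  # assumed: same seed for every setting
    print(name, random_head_crop(1920, 1080, head, max_margin, rng))
```

With max_margin set to 0.0 every margin is zero, so the crop reduces to the head bbox, which matches the head_only setting.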
| crop setting | margin × head | macro_f1 | accuracy | Δ macro_f1 vs full |
|---|---|---|---|---|
| full | n/a | 0.9197 | 0.9199 | n/a |
| loose_4x | 4.0 | 0.6258 | 0.6514 | −0.294 |
| loose_2x | 2.0 | 0.5328 | 0.5881 | −0.387 |
| loose_1x | 1.0 | 0.4403 | 0.5216 | −0.479 |
| loose_0.5x | 0.5 | 0.3706 | 0.4713 | −0.549 |
| loose_0.25x | 0.25 | 0.3200 | 0.4305 | −0.600 |
| head_only | 0.0 | 0.2492 | 0.3585 | −0.671 |
The 4-class chance level is 0.25, and this model's macro_f1 falls to that level when the frame context is removed. Per-class F1 on head_only: poor 0.052, ok 0.013, good 0.298, excellent 0.634. The poor and ok classes are therefore identified almost entirely from head-area fraction, not from face content.
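As a quick consistency check, the head_only macro_f1 can be recomputed from the per-class F1 values quoted above. The sketch uses only the rounded numbers from this section, so it agrees with the table only up to rounding.

```python
# Per-class F1 on head_only, as reported above (rounded to 3 decimals).
per_class_f1 = {"poor": 0.052, "ok": 0.013, "good": 0.298, "excellent": 0.634}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Drop relative to the full-frame macro F1 from the table.
full_macro_f1 = 0.9197
delta = macro_f1 - full_macro_f1

print(f"macro_f1 = {macro_f1:.4f}, delta vs full = {delta:.3f}")
```

The mean lands within rounding error of the tabulated 0.2492, and the difference from the full setting is within rounding error of −0.671.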
What this means. This model is a strong in-distribution classifier (test macro_f1 = 0.920) but a fragile one: any preprocessing that changes the head-to-frame ratio (tighter framing, cropping, padding) will degrade predictions sharply. Recommended alternative: cls-bangumibase-face-quality.caformer_s36.head_crop, trained with head-aware random crops, holds macro_f1 ≈ 0.93 across all 7 crop settings with no loss of in-distribution accuracy.
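To make the head-to-frame ratio concrete, the sketch below computes the fraction of the frame covered by the head bbox before and after cropping or padding. It only illustrates the quantity that shifts under such preprocessing; it does not claim the model computes this value explicitly. The function name `head_area_fraction` and all dimensions are hypothetical.

```python
def head_area_fraction(frame_w, frame_h, head_w, head_h):
    """Fraction of the frame area covered by the head bbox."""
    return (head_w * head_h) / (frame_w * frame_h)


# Hypothetical 1920x1080 frame containing a 200x250 head.
original = head_area_fraction(1920, 1080, 200, 250)

# Cropping to a 640x540 window around the head leaves the head pixels
# unchanged but shrinks the frame, so the fraction rises.
cropped = head_area_fraction(640, 540, 200, 250)

# Padding the frame to twice its width and height has the opposite effect.
padded = head_area_fraction(3840, 2160, 200, 250)

print(original, cropped, padded)
```

In this example the crop multiplies the fraction by 6 and the padding divides it by 4, even though the face itself is identical in all three cases.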
Metrics
| Split | Acc / Top-2 | Macro (F1/P/R/AUC) | Micro (F1/P/R/AUC) |
|---|---|---|---|
| Validation | 93.07% / 99.81% | 0.931 / 0.931 / 0.931 / 0.992 | 0.931 / 0.931 / 0.931 / 0.993 |
| Test | 91.99% / 99.92% | 0.920 / 0.920 / 0.920 / 0.991 | 0.920 / 0.920 / 0.920 / 0.993 |
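The difference between the Macro and Micro columns can be illustrated with a small confusion matrix. The counts below are made up for illustration and are not this model's results; the helper `f1_scores` is a hypothetical name, written in plain Python to stay self-contained.

```python
def f1_scores(confusion):
    """Return (macro_f1, micro_f1) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    per_class = []
    tp_total = fp_total = fn_total = 0
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_total += tp
        fp_total += fp
        fn_total += fn
    macro = sum(per_class) / n  # unweighted mean over classes
    micro = 2 * tp_total / (2 * tp_total + fp_total + fn_total)  # pooled counts
    return macro, micro


# Made-up counts for illustration only (rows: true, columns: predicted).
toy = [
    [90, 10, 0, 0],
    [8, 85, 7, 0],
    [0, 6, 88, 6],
    [0, 0, 5, 95],
]
macro, micro = f1_scores(toy)
accuracy = sum(toy[i][i] for i in range(4)) / sum(map(sum, toy))
print(macro, micro, accuracy)
```

For single-label multiclass problems, micro F1 equals accuracy, which is consistent with the Test row above (91.99% accuracy, micro F1 0.920).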
Plots
How to Use
We provide a sample image for the code samples at https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp.
Use Transformers And Torch
Install dghs-imgutils, timm, and the other requirements with the following command:
pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'
After that, you can load this model with the transformers library and use it for training, validation, and testing with the following code:
import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel
processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36', trust_remote_code=True, use_infer_head=True)
model.eval()
image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']
with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32
print(dict(zip(classes, output[0].tolist())))
# {'poor': 3.6020756510879437e-07,
# 'ok': 4.579680899041705e-05,
# 'good': 0.9659607410430908,
# 'excellent': 0.03399312123656273}
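To turn the score dictionary into a prediction, one option is to rank the classes by score. The sketch below reuses the sample scores printed above rather than running the model, so it needs neither torch nor transformers. The sample scores sum to about 1, which suggests they are probabilities, though that is inferred from the printed values only.

```python
# Scores copied from the sample output above.
scores = {
    "poor": 3.6020756510879437e-07,
    "ok": 4.579680899041705e-05,
    "good": 0.9659607410430908,
    "excellent": 0.03399312123656273,
}

# Rank classes by score, highest first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
top_label, top_score = ranked[0]
top2_labels = [label for label, _ in ranked[:2]]

print(top_label, round(top_score, 4))
print(top2_labels)
```

The top-2 list corresponds to the Top-2 column in the metrics table: a sample counts as a top-2 hit when its true class is one of the two highest-scoring labels.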
Citation
@misc{cls_bangumibase_face_quality_caformer_s36,
title = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36},
author = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
year = {2026},
howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36}},
note = {An anime-style image classification model for a classification task with 4 classes (poor, ok, good, excellent), trained on the anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
license = {mit}
}
Model tree for deepghs-cv/cls-bangumibase-face-quality.caformer_s36
Base model
timm/caformer_s36.sail_in22k_ft_in1k_384





