Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop
Model Details
- Model Type: Image Classification
- Model Stats:
- Params: 37.3M
- FLOPs / MACs: 44.3G / 22.1G
- Image size: train = 384 x 384, test = 384 x 384
- Dataset: deepghs-cv/bangumibase-face-quality-cls
- Classes: poor, ok, good, excellent
Results
Robustness Analysis
This is the recommended model. Trained with head-aware random crop augmentation so the classifier does not exploit head-to-frame ratio as a shortcut for the face-quality label.
Why we measure this. Class labels in bangumibase-face-quality-cls are derived from face_geo_mean = sqrt(face_w · face_h) in the original frame's pixel coordinates. A baseline classifier can reach high accuracy by reading how much of the frame the head occupies instead of the face's intrinsic drawing quality. We trained this model with random crops that always fully contain the head, so every label has been seen at many different head-area fractions during training.
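As a minimal sketch of the quantity described above, the snippet below computes face_geo_mean and, for comparison, the share of the frame that the face box covers. The frame size and the two face boxes are hypothetical values chosen for illustration; the thresholds that map face_geo_mean to the four classes are not stated here and are therefore not shown.

```python
import math


def face_geo_mean(face_w: float, face_h: float) -> float:
    """Geometric mean of the face box sides, in original-frame pixels."""
    return math.sqrt(face_w * face_h)


def face_area_fraction(face_w: float, face_h: float,
                       frame_w: float, frame_h: float) -> float:
    """Share of the frame area covered by the face box."""
    return (face_w * face_h) / (frame_w * frame_h)


# Hypothetical frame size and face boxes, for illustration only.
FRAME_W, FRAME_H = 1920, 1080
small_face = (60, 75)
large_face = (240, 300)

for w, h in (small_face, large_face):
    print(face_geo_mean(w, h), face_area_fraction(w, h, FRAME_W, FRAME_H))
```

At a fixed frame size, a larger face_geo_mean always comes with a larger area fraction, which is the framing shortcut the crop augmentation is meant to remove.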
Test protocol. Same as the baseline counterpart: each of 8,800 test images is independently random-cropped to fully contain the head bbox; per-side margin uniform in [0, max_margin × head_dim]. Same seed across all 7 settings, paired with the baseline run.
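The crop sampling in this protocol could be sketched roughly as follows. This is not the evaluation code itself: the function name, the box format, the clamping to image bounds, and the choice of head width for horizontal margins and head height for vertical margins are all assumptions made for illustration.

```python
import math
import random
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def random_head_crop(image_size: Tuple[int, int], head_box: Box,
                     max_margin: float, rng: random.Random) -> Box:
    """Sample a crop box that always fully contains head_box (hypothetical helper)."""
    img_w, img_h = image_size
    x0, y0, x1, y1 = head_box
    head_w, head_h = x1 - x0, y1 - y0

    # One independent margin per side, uniform in [0, max_margin * head_dim].
    # Assumption: head_dim is the head width for left/right and the height for top/bottom.
    left = rng.uniform(0.0, max_margin * head_w)
    right = rng.uniform(0.0, max_margin * head_w)
    top = rng.uniform(0.0, max_margin * head_h)
    bottom = rng.uniform(0.0, max_margin * head_h)

    # Round outward so the head stays inside, then clamp to the image (assumption).
    return (
        max(0, math.floor(x0 - left)),
        max(0, math.floor(y0 - top)),
        min(img_w, math.ceil(x1 + right)),
        min(img_h, math.ceil(y1 + bottom)),
    )


# Hypothetical image size and head box; a fixed seed makes the crop reproducible.
head = (800, 300, 1000, 520)
crop = random_head_crop((1920, 1080), head, max_margin=1.0, rng=random.Random(0))
head_only = random_head_crop((1920, 1080), head, max_margin=0.0, rng=random.Random(0))
print(crop, head_only)
```

With max_margin set to 0.0 the crop collapses to the head box, which corresponds to the head_only setting; the full setting uses the uncropped image and needs no sampling.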
| crop setting | margin × head | macro_f1 | accuracy | Δ macro_f1 vs full |
|---|---|---|---|---|
| full | n/a | 0.9204 | 0.9205 | n/a |
| loose_4x | 4.0 | 0.9280 | 0.9282 | +0.008 |
| loose_2x | 2.0 | 0.9296 | 0.9298 | +0.009 |
| loose_1x | 1.0 | 0.9356 | 0.9358 | +0.015 |
| loose_0.5x | 0.5 | 0.9350 | 0.9353 | +0.015 |
| loose_0.25x | 0.25 | 0.9348 | 0.9351 | +0.014 |
| head_only | 0.0 | 0.9328 | 0.9332 | +0.012 |
macro_f1 varies by at most 0.015 across all 7 settings and is slightly higher under tight crops, so head-area fraction is no longer a signal the model relies on. Per-class F1 on head_only stays above 0.89 for all four classes (poor 0.947, ok 0.899, good 0.920, excellent 0.966), which indicates that the model identifies face quality from face content, not from framing.
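As a quick arithmetic check on the figures reported above, macro_f1 is the unweighted mean of the per-class F1 scores, and each Δ is the setting's macro_f1 minus the value on full. The snippet only re-derives numbers already listed in this section; the variable names are illustrative.

```python
# Per-class F1 on head_only, as reported above.
per_class_f1 = {"poor": 0.947, "ok": 0.899, "good": 0.920, "excellent": 0.966}

# Macro F1 is the unweighted mean over classes.
macro_f1_head_only = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1_head_only, 3))  # 0.933, consistent with the reported 0.9328

# Macro F1 per crop setting, as reported in the robustness table.
macro_f1 = {
    "full": 0.9204,
    "loose_4x": 0.9280,
    "loose_2x": 0.9296,
    "loose_1x": 0.9356,
    "loose_0.5x": 0.9350,
    "loose_0.25x": 0.9348,
    "head_only": 0.9328,
}

# Delta of each cropped setting against the uncropped baseline setting.
deltas = {
    name: round(value - macro_f1["full"], 3)
    for name, value in macro_f1.items()
    if name != "full"
}
print(deltas)
```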
Comparison with the framing-only baseline. Paired comparison against cls-bangumibase-face-quality.caformer_s36 on the same test rows and same crops:
| setting | baseline macro_f1 | this model | Δ |
|---|---|---|---|
| full (in-distribution) | 0.9197 | 0.9204 | +0.001 |
| head_only (framing removed) | 0.2492 | 0.9328 | +0.684 |
This model does not pay an in-distribution accuracy cost (Δ ≈ 0 on full) while gaining +68 pp macro_f1 robustness on head-only crops. Recommended for any downstream task where input framing may differ from the BangumiBase distribution.
Metrics
| Split | Acc / Top-2 | Macro (F1/P/R/AUC) | Micro (F1/P/R/AUC) |
|---|---|---|---|
| Validation | 92.12% / 99.81% | 0.921 / 0.921 / 0.921 / 0.990 | 0.921 / 0.921 / 0.921 / 0.991 |
| Test | 92.05% / 99.93% | 0.920 / 0.921 / 0.920 / 0.990 | 0.920 / 0.920 / 0.920 / 0.992 |
How to Use
We provide a sample image for the code samples: the sample.webp file in this model repository.
Use Transformers And Torch
Install dghs-imgutils, timm, and the other requirements with the following command:
pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow 'transformers>=4.57.6'
After that, you can load this model with the Transformers library and run inference with the following code:
import torch
from imgutils.data import load_image
from transformers import AutoImageProcessor, AutoModel
processor = AutoImageProcessor.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True)
model = AutoModel.from_pretrained('deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop', trust_remote_code=True, use_infer_head=True)
model.eval()
image = load_image('https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop/resolve/main/sample.webp', mode='RGB', force_background='white')
input_ = processor(image)['pixel_values']
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
classes = model.config.classes
# ['poor', 'ok', 'good', 'excellent']
with torch.no_grad():
    output = model(input_)
# output, shape: torch.Size([1, 4]), dtype: torch.float32
print(dict(zip(classes, output[0].tolist())))
# {'poor': 1.061463763107895e-06,
# 'ok': 0.0002213950065197423,
# 'good': 0.9341738224029541,
# 'excellent': 0.06560368835926056}
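To turn the printed scores into a prediction, one option is to rank the classes by score and take the first one or two entries. The sketch below reuses the example output shown above as a plain dictionary, so it runs without the model; the variable names are illustrative and not part of the model's API.

```python
# Scores copied from the example output above.
scores = {
    "poor": 1.061463763107895e-06,
    "ok": 0.0002213950065197423,
    "good": 0.9341738224029541,
    "excellent": 0.06560368835926056,
}

# Rank classes from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

top1_label, top1_score = ranked[0]
top2_labels = [label for label, _ in ranked[:2]]

print(top1_label)   # good
print(top2_labels)  # ['good', 'excellent']
```

The top-2 list is the notion behind the Top-2 column in the metrics table: a prediction counts as a top-2 hit when the true class is among the two highest-scoring classes.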
Citation
@misc{cls_bangumibase_face_quality_caformer_s36_head_crop,
title = {Anime Classifier deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop},
author = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
year = {2026},
howpublished = {\url{https://huggingface.co/deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop}},
note = {An anime-style image classification model for a classification task with 4 classes (poor, ok, good, excellent), trained on anime dataset bangumibase-face-quality (\url{https://huggingface.co/datasets/deepghs-cv/bangumibase-face-quality-cls}). Model parameters: 37.3M, FLOPs: 44.3G, input resolution: 384×384.},
license = {mit}
}
Model tree for deepghs-cv/cls-bangumibase-face-quality.caformer_s36.head_crop
Base model
timm/caformer_s36.sail_in22k_ft_in1k_384





