A lightweight YOLO26-nano model fine-tuned on Manga109-s for detecting panels and text bubbles in manga pages. Designed for on-device Android inference via TFLite/LiteRT.
| Metric | All | Panel | Text |
|---|---|---|---|
| mAP50 | 0.956 | 0.985 | 0.928 |
| mAP50-95 | 0.846 | 0.953 | 0.740 |
| Precision | 0.954 | 0.966 | 0.935 |
| Recall | 0.912 | 0.956 | 0.877 |

Accuracy before and after INT8 quantization (FP32 vs. INT8):

| Metric | FP32 | INT8 | Delta |
|---|---|---|---|
| mAP50 | 0.9569 | 0.9561 | -0.0008 |
| mAP50-95 | 0.8464 | 0.8458 | -0.0006 |
| Precision | 0.9507 | 0.9535 | +0.0028 |
| Recall | 0.9167 | 0.9124 | -0.0043 |
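
As a quick check on the quantization table above, the Delta column can be recomputed from the FP32 and INT8 values. The sketch below copies the numbers from the table; the helper name `quantization_deltas` and the relative-change column are illustrative additions, not part of the original evaluation.

```python
# Values copied from the FP32 vs. INT8 table: metric -> (fp32, int8).
METRICS = {
    "mAP50": (0.9569, 0.9561),
    "mAP50-95": (0.8464, 0.8458),
    "Precision": (0.9507, 0.9535),
    "Recall": (0.9167, 0.9124),
}


def quantization_deltas(metrics):
    """Return metric -> (absolute delta, relative change in percent)."""
    rows = {}
    for name, (fp32, int8) in metrics.items():
        delta = round(int8 - fp32, 4)
        relative_pct = round(100 * (int8 - fp32) / fp32, 2)
        rows[name] = (delta, relative_pct)
    return rows


for name, (delta, rel) in quantization_deltas(METRICS).items():
    print(f"{name:<10} delta={delta:+.4f} ({rel:+.2f}%)")
```

Every absolute change is below 0.005, consistent with the Delta column.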
Class IDs: 0 = panel, 1 = text

| File | Format | Size | Use case |
|---|---|---|---|
| `manga_panel_detector_fp32.pt` | PyTorch FP32 | ~15 MB | Fine-tuning, FP16 export, further training |
| `manga_panel_detector_int8.tflite` | TFLite INT8 | 2.71 MB | Android/mobile deployment |
from ultralytics import YOLO

model = YOLO("manga_panel_detector_fp32.pt")
results = model.predict("manga_page.jpg", conf=0.25)

for box in results[0].boxes:
    cls = int(box.cls)  # 0=panel, 1=text
    conf = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    label = "panel" if cls == 0 else "text"
    print(f"{label} ({conf:.2f}): [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
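
Downstream code often wants panels and text regions in separate lists. The following is a minimal sketch, assuming the detections have already been pulled out of the prediction loop as plain `(cls, conf, box)` tuples. The helper `group_by_class`, the per-class thresholds, and the sample values are hypothetical and only illustrate the idea.

```python
CLASS_NAMES = {0: "panel", 1: "text"}  # class IDs used by this model


def group_by_class(detections, min_conf=None):
    """Group (cls_id, conf, (x1, y1, x2, y2)) tuples by class name.

    min_conf optionally maps a class name to a minimum confidence;
    classes without an entry keep every detection.
    """
    min_conf = min_conf or {}
    grouped = {name: [] for name in CLASS_NAMES.values()}
    for cls_id, conf, box in detections:
        name = CLASS_NAMES[cls_id]
        if conf >= min_conf.get(name, 0.0):
            grouped[name].append((conf, box))
    # Most confident detections first within each class.
    for items in grouped.values():
        items.sort(key=lambda item: item[0], reverse=True)
    return grouped


# Made-up detections, for illustration only.
sample = [
    (0, 0.90, (0, 0, 100, 100)),
    (1, 0.30, (10, 10, 20, 20)),
    (1, 0.80, (30, 30, 50, 50)),
    (0, 0.95, (100, 0, 200, 100)),
]
grouped = group_by_class(sample, min_conf={"text": 0.5})
print(len(grouped["panel"]), "panels,", len(grouped["text"]), "text regions")
```

A stricter threshold for one class than the other is a design choice to tune per application, not a recommendation from the model's evaluation.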
Use manga_panel_detector_int8.tflite with the TensorFlow Lite interpreter or Google's LiteRT runtime.
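
The following is a hedged sketch of running the `.tflite` file from Python with the LiteRT interpreter. It assumes the `ai-edge-litert` and `numpy` packages are installed and that the caller has already resized and normalized the page to match the model's input tensor. Whether this export takes float or integer input is not stated here, so the code reads the tensor details at runtime and applies the standard affine quantization formula, `real = scale * (q - zero_point)`, only when needed. The output tensor layout is likewise not documented in this card, so the function returns the raw array for inspection. The function names are illustrative.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Inverse mapping: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def run_tflite(model_path, image):
    """Run one inference.

    image: float32 NumPy array scaled to [0, 1] (an assumption about the
    expected preprocessing) whose shape matches the model's input tensor.
    """
    import numpy as np
    from ai_edge_litert.interpreter import Interpreter  # tf.lite.Interpreter has the same interface

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    data = image
    scale, zero_point = inp["quantization"]
    if inp["dtype"] != np.float32 and scale:
        # Integer input tensor: quantize with the parameters stored in the model.
        info = np.iinfo(inp["dtype"])
        data = np.clip(np.round(image / scale) + zero_point, info.min, info.max)
    interpreter.set_tensor(inp["index"], data.astype(inp["dtype"]))
    interpreter.invoke()

    raw = interpreter.get_tensor(out["index"])
    scale, zero_point = out["quantization"]
    if out["dtype"] != np.float32 and scale:
        # Integer output tensor: map back to real values.
        raw = scale * (raw.astype(np.float32) - zero_point)
    return raw  # inspect raw.shape to determine the box/score/class layout
```

The pure-Python `quantize` and `dequantize` helpers mirror what the NumPy code does and can be used to sanity-check quantization parameters by hand.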
| Parameter | Value |
|---|---|
| Base model | YOLO26n pretrained on COCO |
| Dataset | Manga109-s (87 manga titles, ~18k pages, ~32k annotations) |
| Classes | panel (frame), text (speech/dialog) |
| Epochs | 66 (early stopping, patience=20) |
| Best epoch | 48 (mAP50 = 0.957) |
| Image size | 640x640 |
| Batch size | 16 |
| GPU | NVIDIA T4 (Google Colab) |
| Training time | ~4 hours |
| Augmentation | No hue/saturation shift (grayscale manga), no horizontal flip (RTL reading order), reduced mosaic (0.5), no rotation (axis-aligned panels) |
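
The settings in the table above map onto Ultralytics `train()` arguments roughly as sketched below. This is a reconstruction under assumptions, not the author's actual training script: the dataset config path `manga109s.yaml`, the checkpoint name `yolo26n.pt`, and the maximum epoch count are placeholders, and any setting not listed in the table is left at the library default.

```python
# Hyperparameters taken from the training table; keys follow Ultralytics train() argument names.
TRAIN_ARGS = {
    "data": "manga109s.yaml",  # hypothetical dataset config path
    "epochs": 100,             # assumed upper bound; the reported run stopped early at epoch 66
    "patience": 20,            # early-stopping patience
    "imgsz": 640,
    "batch": 16,
    "hsv_h": 0.0,              # no hue shift (grayscale manga)
    "hsv_s": 0.0,              # no saturation shift
    "fliplr": 0.0,             # no horizontal flip (RTL reading order)
    "mosaic": 0.5,             # reduced mosaic
    "degrees": 0.0,            # no rotation (axis-aligned panels)
}


def train(weights="yolo26n.pt"):  # assumed name of the COCO-pretrained checkpoint
    """Fine-tune with the settings above; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` starts a full training run, so it is left to the reader to invoke.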
If you use this model, please cite:
@misc{leoxs22_manga_panel_detector_2026,
author={Leandro Narosky},
title={{Manga Panel and Text Detector (YOLO26-nano)}},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/leoxs22/manga-panel-detector-yolo26n}
}
This model was trained on Manga109-s. Please cite:
@article{multimedia_aizawa_2020,
author={Kiyoharu Aizawa and Azuma Fujimoto and Atsushi Otsubo and Toru Ogawa and Yusuke Matsui and Koki Tsubota and Hikaru Ikuta},
title={Building a Manga Dataset ``Manga109'' with Annotations for Multimedia Applications},
journal={IEEE MultiMedia},
volume={27},
number={2},
pages={8--18},
doi={10.1109/mmul.2020.2987895},
year={2020}
}
@article{mtap_matsui_2017,
author={Yusuke Matsui and Kota Ito and Yuji Aramaki and Azuma Fujimoto and Toru Ogawa and Toshihiko Yamasaki and Kiyoharu Aizawa},
title={Sketch-based Manga Retrieval using Manga109 Dataset},
journal={Multimedia Tools and Applications},
volume={76},
number={20},
pages={21811--21838},
doi={10.1007/s11042-016-4020-z},
year={2017}
}
This model is released under the Apache 2.0 License.
The training data (Manga109-s) has its own license terms. Per condition 5 of the Manga109-s license, results obtained from machine learning experiments (including pre-trained models) may be used for commercial purposes, provided that the use of the dataset is clearly indicated.