Manga panel & text detector (YOLO26-nano)

A lightweight YOLO26-nano model fine-tuned on Manga109-s for detecting panels and text bubbles in manga pages. Designed for on-device Android inference via TFLite/LiteRT.

Performance

INT8 TFLite (2.71 MB)

Metric	All	Panel	Text
mAP50	0.956	0.985	0.928
mAP50-95	0.846	0.953	0.740
Precision	0.954	0.966	0.935
Recall	0.912	0.956	0.877

Quantization impact (FP32 vs INT8)

Metric	FP32	INT8	Delta
mAP50	0.9569	0.9561	-0.0008
mAP50-95	0.8464	0.8458	-0.0006
Precision	0.9507	0.9535	+0.0028
Recall	0.9167	0.9124	-0.0043
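
The Delta column is simply INT8 minus FP32. As a quick check, the sketch below recomputes it from the table values and adds a relative change in percent. The percentage column is an illustrative addition, not part of the original evaluation. The largest drop, recall, is under half a percent in relative terms.

```python
# Values copied from the quantization table above.
FP32 = {"mAP50": 0.9569, "mAP50-95": 0.8464, "Precision": 0.9507, "Recall": 0.9167}
INT8 = {"mAP50": 0.9561, "mAP50-95": 0.8458, "Precision": 0.9535, "Recall": 0.9124}


def quantization_deltas(fp32, int8):
    """Return {metric: (absolute delta, relative change in percent)}."""
    deltas = {}
    for name, base in fp32.items():
        diff = int8[name] - base
        deltas[name] = (round(diff, 4), round(100.0 * diff / base, 2))
    return deltas


if __name__ == "__main__":
    for name, (delta, pct) in quantization_deltas(FP32, INT8).items():
        print(f"{name:<10} {delta:+.4f} ({pct:+.2f}%)")
```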

Model details

Architecture: YOLO26-nano (2.57M parameters)
Input size: 640x640
Classes: 0: panel, 1: text
INT8 TFLite size: 2.71 MB
Inference speed: ~100-180 ms per image on CPU

Files

File	Format	Size	Use case
`manga_panel_detector_fp32.pt`	PyTorch FP32	~15 MB	Fine-tuning, re-export (e.g. FP16)
`manga_panel_detector_int8.tflite`	TFLite INT8	2.71 MB	Android/mobile deployment
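
A TFLite file can in principle be regenerated from the PyTorch checkpoint with the Ultralytics exporter. The sketch below is one plausible way to do it, not the exact command used for the published file. The helper names `export_args` and `export_tflite` and the calibration file `manga109s.yaml` are hypothetical. `format`, `imgsz`, `int8`, `half` and `data` are standard Ultralytics export arguments.

```python
def export_args(precision, calibration_yaml=None):
    """Build keyword arguments for an Ultralytics TFLite export."""
    args = {"format": "tflite", "imgsz": 640}
    if precision == "int8":
        args["int8"] = True
        if calibration_yaml is not None:
            # INT8 export calibrates on images from a dataset YAML.
            args["data"] = calibration_yaml
    elif precision == "fp16":
        args["half"] = True
    elif precision != "fp32":
        raise ValueError(f"unsupported precision: {precision}")
    return args


def export_tflite(weights, precision, calibration_yaml=None):
    """Export a checkpoint to TFLite; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the helper above stays dependency-free

    model = YOLO(weights)
    return model.export(**export_args(precision, calibration_yaml))


if __name__ == "__main__":
    # "manga109s.yaml" is a hypothetical dataset file, shown for illustration only.
    print(export_args("int8", "manga109s.yaml"))
```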

Usage

Python (ultralytics)

from ultralytics import YOLO

model = YOLO("manga_panel_detector_fp32.pt")
results = model.predict("manga_page.jpg", conf=0.25)

for box in results[0].boxes:
    cls = int(box.cls)  # 0=panel, 1=text
    conf = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    label = "panel" if cls == 0 else "text"
    print(f"{label} ({conf:.2f}): [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

Android (TFLite / LiteRT)

Use `manga_panel_detector_int8.tflite` with the TensorFlow Lite interpreter or Google's LiteRT runtime.

Input: 640x640 RGB image normalized to [0, 1]
Output: Up to 300 detections, each with [x1, y1, x2, y2, confidence, class_id]
Recommended confidence threshold: 0.25
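
The post-processing implied by this output format is small: drop rows below the confidence threshold, map `class_id` to a label, and rescale boxes to the original page. The sketch below shows that logic in plain Python so it can be ported to Kotlin or Java. It assumes that coordinates are pixels in the 640x640 input space and that the page was resized without letterboxing. Neither is stated above, so verify both against the actual model output. The function name `decode_detections` is hypothetical.

```python
CLASS_NAMES = {0: "panel", 1: "text"}
MODEL_SIZE = 640  # model input is 640x640


def decode_detections(rows, orig_width, orig_height, conf_threshold=0.25):
    """Filter raw detections and rescale boxes to the original image.

    Each row is [x1, y1, x2, y2, confidence, class_id].
    ASSUMPTION: coordinates are pixels in the 640x640 input space and the
    page was stretched to 640x640 (no letterbox padding).
    """
    scale_x = orig_width / MODEL_SIZE
    scale_y = orig_height / MODEL_SIZE
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # below the recommended threshold
        detections.append({
            "label": CLASS_NAMES.get(int(class_id), "unknown"),
            "confidence": float(conf),
            "box": (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y),
        })
    return detections


if __name__ == "__main__":
    # Two made-up rows: one confident panel, one low-confidence text box.
    raw = [[64, 32, 320, 480, 0.91, 0], [100, 100, 200, 150, 0.10, 1]]
    for det in decode_detections(raw, orig_width=1280, orig_height=1920):
        print(det)
```

If the page is letterboxed instead of stretched, the padding offset has to be subtracted before scaling.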

Training

Parameter	Value
Base model	YOLO26n pretrained on COCO
Dataset	Manga109-s (87 manga titles, ~18k pages, ~32k annotations)
Classes	panel (frame), text (speech/dialog)
Epochs	66 (early stopping, patience=20)
Best epoch	48 (mAP50 = 0.957)
Image size	640x640
Batch size	16
GPU	NVIDIA T4 (Google Colab)
Training time	~4 hours
Augmentation	No hue/saturation shift (grayscale manga), no horizontal flip (RTL reading order), reduced mosaic (0.5), no rotation (axis-aligned panels)
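
The settings in the table map onto standard Ultralytics training arguments roughly as sketched below. This is a reconstruction from the table, not the original training script. The weights name `yolo26n.pt`, the function `train_detector` and the dataset YAML path are assumptions, and the maximum epoch count is left as a parameter because the table only reports where early stopping ended the run.

```python
# Hyperparameters taken from the training table above.
TRAIN_ARGS = {
    "imgsz": 640,
    "batch": 16,
    "patience": 20,   # early-stopping patience
    "hsv_h": 0.0,     # no hue shift (grayscale manga)
    "hsv_s": 0.0,     # no saturation shift
    "fliplr": 0.0,    # no horizontal flip (RTL reading order)
    "mosaic": 0.5,    # reduced mosaic
    "degrees": 0.0,   # no rotation (axis-aligned panels)
}


def train_detector(data_yaml, epochs, weights="yolo26n.pt"):
    """Fine-tune with the settings above; requires the ultralytics package.

    ASSUMPTION: "yolo26n.pt" is the name of the COCO-pretrained checkpoint.
    """
    from ultralytics import YOLO  # imported lazily so TRAIN_ARGS can be inspected without it

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, **TRAIN_ARGS)


if __name__ == "__main__":
    for key, value in TRAIN_ARGS.items():
        print(f"{key}={value}")
```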

Model citation

If you use this model, please cite:

@misc{leoxs22_manga_panel_detector_2026,
    author={Leandro Narosky},
    title={{Manga Panel and Text Detector (YOLO26-nano)}},
    year={2026},
    publisher={Hugging Face},
    url={https://huggingface.co/leoxs22/manga-panel-detector-yolo26n}
}

Dataset citation

This model was trained on Manga109-s. Please cite:

@article{multimedia_aizawa_2020,
    author={Kiyoharu Aizawa and Azuma Fujimoto and Atsushi Otsubo and Toru Ogawa and Yusuke Matsui and Koki Tsubota and Hikaru Ikuta},
    title={Building a Manga Dataset ``Manga109'' with Annotations for Multimedia Applications},
    journal={IEEE MultiMedia},
    volume={27},
    number={2},
    pages={8--18},
    doi={10.1109/mmul.2020.2987895},
    year={2020}
}

@article{mtap_matsui_2017,
    author={Yusuke Matsui and Kota Ito and Yuji Aramaki and Azuma Fujimoto and Toru Ogawa and Toshihiko Yamasaki and Kiyoharu Aizawa},
    title={Sketch-based Manga Retrieval using Manga109 Dataset},
    journal={Multimedia Tools and Applications},
    volume={76},
    number={20},
    pages={21811--21838},
    doi={10.1007/s11042-016-4020-z},
    year={2017}
}

License

This model is released under the Apache 2.0 License.

The training data (Manga109-s) has its own license terms. Per condition 5 of the Manga109-s license, results obtained from machine learning experiments (including pre-trained models) may be used for commercial purposes, provided that the use of the dataset is clearly indicated.
