Manga panel & text detector (YOLO26-nano)

A lightweight YOLO26-nano model fine-tuned on Manga109-s for detecting panels and text bubbles in manga pages. Designed for on-device Android inference via TFLite/LiteRT.

Performance

INT8 TFLite (2.71 MB)

Metric     All    Panel  Text
mAP50      0.956  0.985  0.928
mAP50-95   0.846  0.953  0.740
Precision  0.954  0.966  0.935
Recall     0.912  0.956  0.877

Quantization impact (FP32 vs INT8)

Metric     FP32    INT8    Delta
mAP50      0.9569  0.9561  -0.0008
mAP50-95   0.8464  0.8458  -0.0006
Precision  0.9507  0.9535  +0.0028
Recall     0.9167  0.9124  -0.0043
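
To put the deltas in perspective, the short sketch below recomputes them from the FP32 and INT8 values in the table and adds the relative change in percent. The helper name `quantization_deltas` is made up for this illustration; only the numbers come from the table.

```python
# Values copied from the quantization table above.
fp32 = {"mAP50": 0.9569, "mAP50-95": 0.8464, "Precision": 0.9507, "Recall": 0.9167}
int8 = {"mAP50": 0.9561, "mAP50-95": 0.8458, "Precision": 0.9535, "Recall": 0.9124}


def quantization_deltas(fp32_metrics, int8_metrics):
    """Return {metric: (absolute delta, relative change in percent)}."""
    out = {}
    for name, base in fp32_metrics.items():
        delta = int8_metrics[name] - base
        out[name] = (round(delta, 4), round(100.0 * delta / base, 2))
    return out


deltas = quantization_deltas(fp32, int8)
for name, (abs_delta, rel_pct) in deltas.items():
    print(f"{name:<10} {abs_delta:+.4f} ({rel_pct:+.2f}%)")
```

Recall shows the largest drop, and it stays under half a percent relative to FP32.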

Training curves

[Figure: validation metrics]

[Figure: training loss]

Model details

  • Architecture: YOLO26-nano (2.57M parameters)
  • Input size: 640x640
  • Classes: 0: panel, 1: text
  • INT8 TFLite size: 2.71 MB
  • Inference speed: ~100-180 ms per image on CPU

Files

File                              Format        Size     Use case
manga_panel_detector_fp32.pt      PyTorch FP32  ~15 MB   Fine-tuning, FP16 export, further training
manga_panel_detector_int8.tflite  TFLite INT8   2.71 MB  Android/mobile deployment

Usage

Python (ultralytics)

from ultralytics import YOLO

model = YOLO("manga_panel_detector_fp32.pt")
results = model.predict("manga_page.jpg", conf=0.25)

for box in results[0].boxes:
    cls = int(box.cls)  # 0=panel, 1=text
    conf = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    label = "panel" if cls == 0 else "text"
    print(f"{label} ({conf:.2f}): [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

Android (TFLite / LiteRT)

Use manga_panel_detector_int8.tflite with the TensorFlow Lite interpreter or Google's LiteRT runtime.

  • Input: 640x640 RGB image normalized to [0, 1]
  • Output: Up to 300 detections, each with [x1, y1, x2, y2, confidence, class_id]
  • Recommended confidence threshold: 0.25
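
The output layout above can be decoded with a few lines of post-processing. The following is a minimal Python sketch of that logic (the same steps would apply in Kotlin or Java on Android). It assumes the box coordinates are either normalized to [0, 1] or expressed in 640x640 input pixels, and that the page was stretched to 640x640 rather than letterboxed; neither detail is stated above, so verify both against the actual model output. `Detection` and `decode_detections` are hypothetical names, not part of any library.

```python
from typing import List, NamedTuple, Sequence, Tuple

CLASS_NAMES = {0: "panel", 1: "text"}  # class ids as listed under "Model details"
INPUT_SIZE = 640.0


class Detection(NamedTuple):
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in page pixels


def decode_detections(
    rows: Sequence[Sequence[float]],
    image_width: int,
    image_height: int,
    conf_threshold: float = 0.25,
    normalized: bool = True,
) -> List[Detection]:
    """Filter raw [x1, y1, x2, y2, confidence, class_id] rows and rescale boxes.

    ASSUMPTION: normalized=True means coordinates are in [0, 1];
    normalized=False means they are in 640x640 input pixels.
    """
    sx = image_width if normalized else image_width / INPUT_SIZE
    sy = image_height if normalized else image_height / INPUT_SIZE
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # drop low-confidence and padding rows
        label = CLASS_NAMES.get(int(round(class_id)))
        if label is None:
            continue  # ignore unexpected class ids
        detections.append(
            Detection(label, float(conf), (x1 * sx, y1 * sy, x2 * sx, y2 * sy))
        )
    detections.sort(key=lambda d: d.confidence, reverse=True)
    return detections


# Usage with three made-up rows standing in for the model output.
raw = [
    [0.25, 0.5, 0.375, 0.625, 0.60, 1.0],
    [0.125, 0.25, 0.875, 0.5, 0.97, 0.0],
    [0.0, 0.0, 0.125, 0.125, 0.10, 1.0],  # below the 0.25 threshold
]
dets = decode_detections(raw, image_width=1000, image_height=1600)
for det in dets:
    print(det.label, round(det.confidence, 2), det.box)
```

If the app letterboxes pages to preserve aspect ratio, the padding offset has to be subtracted before rescaling, which this sketch does not do.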

Training

Parameter      Value
Base model     YOLO26n pretrained on COCO
Dataset        Manga109-s (87 manga titles, ~18k pages, ~32k annotations)
Classes        panel (frame), text (speech/dialog)
Epochs         66 (early stopping, patience=20)
Best epoch     48 (mAP50 = 0.957)
Image size     640x640
Batch size     16
GPU            NVIDIA T4 (Google Colab)
Training time  ~4 hours
Augmentation   No hue/saturation shift (grayscale manga), no horizontal flip (RTL reading order), reduced mosaic (0.5), no rotation (axis-aligned panels)
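
For readers who want to reproduce a similar run, the settings above map roughly onto Ultralytics training arguments as sketched below. This is not the author's training script: the weights name `yolo26n.pt`, the dataset file `manga109s.yaml`, and the epoch budget of 100 are placeholders, since the table reports only the 66 epochs actually run before early stopping.

```python
# Overrides mirroring the training table; anything not listed keeps its default.
TRAIN_ARGS = {
    "imgsz": 640,     # image size 640x640
    "batch": 16,      # batch size
    "patience": 20,   # early-stopping patience
    "hsv_h": 0.0,     # no hue shift (grayscale manga)
    "hsv_s": 0.0,     # no saturation shift
    "fliplr": 0.0,    # no horizontal flip (RTL reading order)
    "mosaic": 0.5,    # reduced mosaic
    "degrees": 0.0,   # no rotation (axis-aligned panels)
    "epochs": 100,    # ASSUMPTION: maximum epoch budget is not given in the table
}


def train(weights="yolo26n.pt", data="manga109s.yaml"):
    """Start fine-tuning; both default file names are hypothetical placeholders."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected alone

    model = YOLO(weights)
    return model.train(data=data, **TRAIN_ARGS)
```

Calling `train()` needs the `ultralytics` package and a dataset YAML prepared from Manga109-s with the two classes `panel` and `text`.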

Model citation

If you use this model, please cite:

@misc{leoxs22_manga_panel_detector_2026,
    author={Leandro Narosky},
    title={{Manga Panel and Text Detector (YOLO26-nano)}},
    year={2026},
    publisher={Hugging Face},
    url={https://huggingface.co/leoxs22/manga-panel-detector-yolo26n}
}

Dataset citation

This model was trained on Manga109-s. Please cite:

@article{multimedia_aizawa_2020,
    author={Kiyoharu Aizawa and Azuma Fujimoto and Atsushi Otsubo and Toru Ogawa and Yusuke Matsui and Koki Tsubota and Hikaru Ikuta},
    title={Building a Manga Dataset ``Manga109'' with Annotations for Multimedia Applications},
    journal={IEEE MultiMedia},
    volume={27},
    number={2},
    pages={8--18},
    doi={10.1109/mmul.2020.2987895},
    year={2020}
}

@article{mtap_matsui_2017,
    author={Yusuke Matsui and Kota Ito and Yuji Aramaki and Azuma Fujimoto and Toru Ogawa and Toshihiko Yamasaki and Kiyoharu Aizawa},
    title={Sketch-based Manga Retrieval using Manga109 Dataset},
    journal={Multimedia Tools and Applications},
    volume={76},
    number={20},
    pages={21811--21838},
    doi={10.1007/s11042-016-4020-z},
    year={2017}
}

License

This model is released under the Apache 2.0 License.

The training data (Manga109-s) has its own license terms. Per condition 5 of the Manga109-s license, results obtained from machine learning experiments (including pre-trained models) may be used for commercial purposes, provided that the use of the dataset is clearly indicated.
