YOLOv4-Leaky-416 INT8 (ONNX, MIT)

Post-training INT8 quantization of YOLOv4-Leaky-416 (Bochkovskiy et al., 2020), exported to ONNX QOperator format. Calibrated on 1,000 COCO val2017 images.

Files

File	Size	SHA-256
`yolov4-leaky-416_float.onnx`	257,388,314 B	`d7277fc1c6522cb063999d2d72058fb15de6f15900c66d0093d535df0bcf200f`
`yolov4-leaky-416_int8_qop.onnx`	64,655,943 B	`ca31b2c53227518f1e29cb50e59294e758b69de26f33e374f1e65c922d338da4`
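
The digests above can be checked after download with a few lines of standard-library Python. This is a minimal sketch: the helper names `sha256_of` and `verify` are illustrative and not part of this repository, and the files are assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the Files table.
EXPECTED_SHA256 = {
    "yolov4-leaky-416_float.onnx":
        "d7277fc1c6522cb063999d2d72058fb15de6f15900c66d0093d535df0bcf200f",
    "yolov4-leaky-416_int8_qop.onnx":
        "ca31b2c53227518f1e29cb50e59294e758b69de26f33e374f1e65c922d338da4",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {filename: True/False/None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = sha256_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

A mismatch on the INT8 file is expected if the model was regenerated locally rather than downloaded, since requantization is not guaranteed to be bit-exact.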

Architecture


Layers	110 Conv2D, 23 Shortcut, multiple Route, 3 YOLO heads
Backbone	CSPDarknet53 with Leaky ReLU (α = 0.1)
Activation	LeakyReLU on 107/110 convs; remaining 3 are linear (pre-head)
Input	1×3×416×416, RGB, [0, 1], NCHW, letterbox-padded with 114
Output	3 raw conv tensors at strides 8, 16, 32 (decoder external)
Anchors	(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)
Quantization	Per-tensor INT8 (W symmetric, A asymmetric); bias INT32
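
Because the decoder is external, the three raw tensors have to be turned into boxes by the caller. The sketch below decodes a single anchor at a single grid cell using the standard YOLO formulation. Several things here are assumptions rather than facts from this card: the per-anchor value order (`tx, ty, tw, th, objectness, classes`), the assignment of the three smallest anchors to the stride-8 head, and the `scale_xy` parameter, which stands in for darknet's `scale_x_y` and should be set per head from the original `.cfg`. `decode_cell` is a hypothetical helper name.

```python
import math

# Anchors from the Architecture table, grouped three per head.
# Assumption: the smallest anchors belong to the stride-8 head.
ANCHORS = {
    8: [(10, 13), (16, 30), (33, 23)],
    16: [(30, 61), (62, 45), (59, 119)],
    32: [(116, 90), (156, 198), (373, 326)],
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, col, row, stride, anchor, scale_xy=1.0):
    """Decode one anchor's raw logits at grid cell (col, row).

    raw: [tx, ty, tw, th, objectness, class_0, ..., class_N].
    Returns (cx, cy, w, h, score, class_id) in letterboxed input pixels.
    """
    tx, ty, tw, th, tobj = raw[:5]
    # scale_xy stretches the sigmoid around 0.5; 1.0 leaves it unchanged.
    ox = sigmoid(tx) * scale_xy - 0.5 * (scale_xy - 1.0)
    oy = sigmoid(ty) * scale_xy - 0.5 * (scale_xy - 1.0)
    cx = (col + ox) * stride
    cy = (row + oy) * stride
    # Width and height scale the anchor prior exponentially.
    w = anchor[0] * math.exp(tw)
    h = anchor[1] * math.exp(th)
    class_probs = [sigmoid(v) for v in raw[5:]]
    class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
    score = sigmoid(tobj) * class_probs[class_id]
    return cx, cy, w, h, score, class_id
```

A full decoder would loop this over every cell and anchor of each head (52×52, 26×26, and 13×13 grids for a 416×416 input), then pass the results to score filtering and NMS.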

Performance

Metric	FP32	INT8	Reference (AlexeyAB)
AP @ IoU=0.5:0.95	0.4428	0.3449	0.407
AP @ IoU=0.5	0.6863	0.6662	0.627
AP_small	0.234	0.183	—
AP_medium	0.500	0.386	—
AP_large	0.620	0.492	—
Size	245.46 MiB	61.66 MiB	—

The INT8 model preserves AP@0.5 well (-2.0 points, 0.6863 to 0.6662) but drops more at the stricter AP@0.5:0.95 metric (-9.8 points, 0.4428 to 0.3449). This is consistent with the deliberate choice of per-tensor quantization (symmetric weights, asymmetric activations) and the QOperator format (no QDQ wrapping), which is the hardware-friendly configuration for an INT8 FPGA DPU target. Per-channel quantization or the QDQ format would typically recover 2-4 AP points at the cost of a more complex datapath.
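
To make the per-tensor scheme concrete, the following sketch computes quantization parameters the way such schemes are commonly defined. It is an illustration, not the exact arithmetic of ONNX Runtime: the symmetric weight range of [-127, 127], the unsigned [0, 255] activation range, and all function names are assumptions.

```python
def symmetric_weight_params(weights):
    """Per-tensor symmetric: one scale for the whole tensor, zero-point 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_activation_params(lo, hi, qmin=0, qmax=255):
    """Per-tensor asymmetric: the range is widened so 0.0 is representable."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin, qmax):
    """Round to the nearest level and clamp to the integer range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


if __name__ == "__main__":
    # One large weight sets the scale for the whole tensor, so a small
    # weight in the same tensor is left with very few usable levels.
    weights = [2.0, -1.0, 0.02]
    scale, zp = symmetric_weight_params(weights)
    for w in weights:
        q = quantize(w, scale, zp, -127, 127)
        print(w, q, dequantize(q, scale, zp))
```

The demo shows why per-tensor scales are coarse: the small weight is rounded with a relative error far larger than that of the large one. Per-channel scales reduce this effect, which is the trade-off described above.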

Evaluation protocol


Dataset	MS COCO val2017 (5,000 images, 36,781 annotated objects, 80 classes)
Annotations	`instances_val2017.json` from `annotations_trainval2017.zip` (CC BY 4.0)
Tool	`pycocotools.cocoeval.COCOeval` (bbox IoU type)
Score threshold	0.001 (low to populate the PR curve correctly)
NMS	greedy, per-class, IoU threshold 0.45
Detections per image	top-100 (matches `params.maxDets[2]`)
Image preprocessing	letterbox to 416×416, padding value 114, RGB, [0, 1], NCHW
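
The post-processing settings in this table (score threshold 0.001, greedy per-class NMS at IoU 0.45, top-100 per image) can be sketched in pure Python as follows. This is an illustrative version, not the code in `inference.py`; the function names and the `(box, score, class_id)` tuple layout with corner-format boxes are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets, score_thr=0.001, iou_thr=0.45, max_dets=100):
    """dets: iterable of (box, score, class_id). Returns kept dets, best first."""
    # 1. Drop very low scores and group the rest by class.
    by_class = {}
    for det in dets:
        if det[1] >= score_thr:
            by_class.setdefault(det[2], []).append(det)

    # 2. Greedy NMS inside each class: keep a box only if it does not
    #    overlap an already kept, higher-scoring box by more than iou_thr.
    kept = []
    for cls_dets in by_class.values():
        cls_dets.sort(key=lambda d: d[1], reverse=True)
        survivors = []
        for det in cls_dets:
            if all(iou(det[0], k[0]) <= iou_thr for k in survivors):
                survivors.append(det)
        kept.extend(survivors)

    # 3. Keep the top max_dets detections across all classes.
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_dets]
```

Because suppression is per class, two heavily overlapping boxes with different class labels both survive.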

The +3.6 AP delta of the FP32 model over the AlexeyAB darknet reference (0.4428 vs 0.407) is the well-known gap between darknet's internal mAP routine (more conservative) and pycocotools with proper letterbox preservation. Tianxiaomo/pytorch-YOLOv4 reports 0.471 on the same weights using a similar PyTorch+pycocotools setup.
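
Letterbox preservation means that predicted boxes must be mapped back through the same scale and padding that were applied to the input. A minimal sketch of that geometry is shown below. It assumes centered padding (some implementations pad only the bottom and right edges), and both function names are hypothetical.

```python
def letterbox_params(src_w, src_h, dst=416):
    """Scale and padding that fit a src_w x src_h image into a dst x dst canvas.

    The aspect ratio is preserved; the remaining border is filled with the
    padding value (114 in this card) before the image is scaled to [0, 1].
    """
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Assumption: padding is split evenly between the two sides.
    pad_x, pad_y = (dst - new_w) // 2, (dst - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def unletterbox_box(box, scale, pad_x, pad_y, src_w, src_h):
    """Map (x1, y1, x2, y2) from canvas pixels back to original image pixels."""
    x1, y1, x2, y2 = box
    x1, x2 = (x1 - pad_x) / scale, (x2 - pad_x) / scale
    y1, y2 = (y1 - pad_y) / scale, (y2 - pad_y) / scale
    # Clip to the image, since predictions may extend into the padding.
    clip = lambda v, hi: max(0.0, min(float(hi), v))
    return clip(x1, src_w), clip(y1, src_h), clip(x2, src_w), clip(y2, src_h)
```

Skipping this inverse mapping, or resizing without preserving aspect ratio, shifts boxes relative to the COCO ground truth and lowers AP, most visibly at the stricter IoU thresholds.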

Calibration protocol (for the INT8 model)


Dataset	MS COCO val2017 (1,000 images sampled)
Sampling	uniform random with `random.Random(42).sample(...)` (deterministic)
Preprocessing	identical to evaluation (letterbox 416, padding 114, RGB, /255, NCHW)
Quantizer	`onnxruntime.quantization.quantize_static` (MIT)
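
The deterministic sampling step can be sketched as below. The `random.Random(42).sample(...)` call comes from the table above; sorting the file names first is an added assumption that makes the result independent of directory listing order, and `sample_calibration_set` is a hypothetical helper name.

```python
import random


def sample_calibration_set(image_names, k=1000, seed=42):
    """Pick k image names uniformly at random, reproducibly.

    Sorting first fixes the population order, so the same seed yields the
    same subset regardless of how the file system lists the directory.
    """
    ordered = sorted(image_names)
    return random.Random(seed).sample(ordered, k)
```

With an unsorted listing, the same seed can select a different subset on another machine, which is one way the activation scales can drift between runs.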

Visual comparison (FP32 vs INT8)

Side-by-side detections on COCO val2017 / classic darknet test images. Left: FP32 ONNX. Right: INT8 ONNX (same input, same Python decoder).

Reproducibility

python quantize_float_to_int8.py
python inference.py --onnx yolov4-leaky-416_int8_qop.onnx

The quantization script produces a near-identical, though not necessarily bit-exact, INT8 model from `yolov4-leaky-416_float.onnx`. Differences in calibration sampling order may shift activation scales by a few LSBs.

Provenance

AlexeyAB/darknet  yolov4-leaky-416.weights      public domain (YOLO License v2)
        │
        │  parse_config + load_weights from gwinndr/YOLOv4-Pytorch (MIT, used as tool)
        │  + DarknetRaw wrapper to capture pre-YoloLayer outputs
        ▼
yolov4-leaky-416_float.onnx                     MIT (this repository)
        │
        │  onnxruntime.quantize_static (MIT, used as tool)
        │  + COCO val2017 calibration (CC BY 4.0, 1,000 images)
        ▼
yolov4-leaky-416_int8_qop.onnx                  MIT (this repository)

Neither Vitis-AI nor Apache-2.0 components are bundled. Tools (PyTorch, ONNX Runtime, gwinndr) are used to produce the artifacts but are not redistributed. See NOTICE.md for full attribution.

Citation

@article{bochkovskiy2020yolov4,
  author  = {Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark},
  title   = {YOLOv4: Optimal Speed and Accuracy of Object Detection},
  journal = {arXiv:2004.10934},
  year    = {2020}
}

Author of the INT8 derivative: Pablo Mendoza (@thefalley), 2026.

