metadata
tags:
  - ultralytics
  - yolov8
  - object-detection
  - person-detection
  - beach-surveillance
  - aerial-view
  - deepstream
  - edge-deployment
  - nvidia
  - onnx
  - visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
  - banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
  - mshamrai/yolov8m-visdrone
  - mshamrai/yolov8n-visdrone
model-index:
  - name: YOLOv8m-VisDrone (person classes)
    results:
      - task:
          type: object-detection
        dataset:
          name: VisDrone2019-DET val
          type: banu4prasad/VisDrone-Dataset
        metrics:
          - type: recall
            value: 0.456
            name: Recall (pedestrian, conf=0.15)
          - type: precision
            value: 0.586
            name: Precision (pedestrian, conf=0.15)
          - type: map50
            value: 0.386
            name: mAP@50 (all 10 classes, conf=0.10)

πŸ–οΈ Beach Person Detector (YOLOv8)

Single-class person detection models for beach surveillance from a 30ft elevated camera, ready for NVIDIA DeepStream edge deployment.

🎯 Design Priorities

  1. HIGH RECALL: miss no persons. False detections are acceptable; missed detections are NOT.
  2. Aerial/elevated view: trained on VisDrone drone footage, whose top-down perspective approximates the 30ft camera view.
  3. Edge-ready: ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.

✅ Verified Evaluation Results

All numbers measured on VisDrone2019-DET val set (531 images, 13,969 person boxes).

YOLOv8n-person (single-class, fine-tuned)

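| conf | Precision | Recall | mAP@50 | mAP@50-95 |
|------|-----------|--------|--------|-----------|
| 0.05 | 0.535 | 0.378 | 0.355 | 0.111 |
| 0.10 | 0.535 | 0.378 | 0.349 | 0.110 |
| 0.15 | 0.535 | 0.378 | 0.341 | 0.108 |
| 0.25 | 0.535 | 0.378 | 0.320 | 0.103 |
| 0.50 | 0.727 | 0.255 | 0.215 | 0.075 |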

YOLOv8m-VisDrone (10-class, person classes extracted)

Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).

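| conf | P (pedestrian) | R (pedestrian) | P (people) | R (people) | mAP@50 (all) |
|------|----------------|----------------|------------|------------|--------------|
| 0.10 | 0.603 | 0.447 | 0.621 | 0.331 | 0.386 |
| 0.15 | 0.586 | 0.456 | 0.610 | 0.339 | 0.368 |
| 0.25 | 0.680 | 0.408 | 0.693 | 0.290 | 0.337 |
| 0.50 | 0.883 | 0.278 | 0.849 | 0.162 | 0.263 |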

Recommendation: Use the YOLOv8m-VisDrone model at conf=0.10–0.15 for maximum recall on person classes. On pedestrians, the larger model reaches 0.447–0.456 recall versus 0.378 for YOLOv8n, a gain of roughly 7 to 8 percentage points.
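
The recall gap quoted above can be checked directly from the two tables. The following is a minimal sketch that copies the conf=0.10 and conf=0.15 rows and computes the difference in percentage points, plus F1 as a rough single-number summary. The dictionary layout and function names are illustrative only, and the comparison is approximate because the two models were evaluated on slightly different image sets and class definitions.

```python
# Precision/recall pairs copied from the tables above.
# YOLOv8n values are for the single "person" class; YOLOv8m values are for "pedestrian".
YOLOV8N = {0.10: (0.535, 0.378), 0.15: (0.535, 0.378)}
YOLOV8M = {0.10: (0.603, 0.447), 0.15: (0.586, 0.456)}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_gain_points(conf: float) -> float:
    """Recall of YOLOv8m minus recall of YOLOv8n, in percentage points."""
    return round((YOLOV8M[conf][1] - YOLOV8N[conf][1]) * 100, 1)


for conf in (0.10, 0.15):
    p, r = YOLOV8M[conf]
    print(f"conf={conf:.2f}: recall gain {recall_gain_points(conf)} pts, "
          f"YOLOv8m F1 {f1(p, r):.3f}")
```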

ONNX Runtime Verification

| ONNX File | Size | Input | Output | onnx.checker | Opset |
|-----------|------|-------|--------|--------------|-------|
| yolov8n_person_640.onnx | 12.2 MB | 1×3×640×640 | 1×5×8400 | ✅ | 12 |
| yolov8m_visdrone_640.onnx | 103.6 MB | 1×3×640×640 | 1×14×8400 | ✅ | 12 |
| yolov8m_visdrone_1280.onnx | 104.1 MB | 1×3×1280×1280 | 1×14×33600 | ✅ | 12 |
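
The output shapes in this table follow from the YOLOv8 detection head: one prediction per grid cell at strides 8, 16 and 32, each carrying 4 box coordinates plus one score per class. The sketch below reproduces the three shapes as a sanity check; the helper name `yolov8_output_shape` is made up for illustration and assumes a batch size of 1 and a square input.

```python
def yolov8_output_shape(imgsz: int, num_classes: int, strides=(8, 16, 32)):
    """Expected (batch, channels, anchors) for a batch-1 YOLOv8 detection export."""
    # One anchor point per grid cell at each stride.
    anchors = sum((imgsz // s) ** 2 for s in strides)
    # 4 box coordinates + one score per class.
    channels = 4 + num_classes
    return (1, channels, anchors)


print(yolov8_output_shape(640, 1))    # (1, 5, 8400)
print(yolov8_output_shape(640, 10))   # (1, 14, 8400)
print(yolov8_output_shape(1280, 10))  # (1, 14, 33600)
```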

Model Variants

| Model | Params | ONNX File | Classes | Best For | ~FPS (Jetson Orin) |
|-------|--------|-----------|---------|----------|--------------------|
| YOLOv8n-person | 3.0M | yolov8n_person_640.onnx | 1 (person) | Real-time edge | ~200 FPS |
| YOLOv8m-VisDrone ⭐ | 25.9M | yolov8m_visdrone_640.onnx | 10 (filter to person) | Best accuracy | ~60 FPS |
| YOLOv8m-VisDrone-1280 | 25.9M | yolov8m_visdrone_1280.onnx | 10 (filter to person) | Maximum recall | ~25 FPS |

Quick Start

Python (Ultralytics)

from ultralytics import YOLO

# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)

# === Option B: Full VisDrone model (higher recall, recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
    "beach.jpg",
    conf=0.10,          # Low threshold for max recall
    imgsz=1280,         # Higher resolution for small persons
    max_det=300,
    classes=[0, 1],     # 0=pedestrian, 1=people
)

for r in results:
    print(f"Detected {len(r.boxes)} persons")
    r.save("result.jpg")

SAHI Inference (Maximum Recall for Large/Wide Images)

For wide-angle cameras at 30ft, SAHI sliced inference can add roughly 12–14% mAP for small-person detection:

# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="weights/yolov8m_visdrone.pt",
    confidence_threshold=0.10,
    device="cuda:0",
)

result = get_sliced_prediction(
    "beach_wide_angle.jpg",
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
    perform_standard_pred=True,
    postprocess_match_threshold=0.5,
)
# Filter to person classes only:
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
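
To get a feel for the cost of sliced inference, the sketch below enumerates slice windows for a frame using the same slice size and overlap as above. It is a simplified re-implementation written for illustration, not SAHI's own code, and SAHI's exact window placement may differ at the image borders. The function name `slice_windows` and the 1920×1080 frame size are assumptions.

```python
def slice_windows(img_w, img_h, slice_w=640, slice_h=640,
                  overlap_w=0.25, overlap_h=0.25):
    """Return (x_min, y_min, x_max, y_max) windows covering the image."""
    step_x = slice_w - int(slice_w * overlap_w)
    step_y = slice_h - int(slice_h * overlap_h)
    windows = []
    y = 0
    while True:
        # Clamp the last row to the bottom edge instead of running past it.
        y_max = min(y + slice_h, img_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            # Clamp the last column to the right edge in the same way.
            x_max = min(x + slice_w, img_w)
            x_min = max(0, x_max - slice_w)
            windows.append((x_min, y_min, x_max, y_max))
            if x + slice_w >= img_w:
                break
            x += step_x
        if y + slice_h >= img_h:
            break
        y += step_y
    return windows


windows = slice_windows(1920, 1080)
print(len(windows))  # 8
```

With `perform_standard_pred=True`, one full-image pass is run in addition to the slices, so under these assumptions a 1920×1080 frame costs roughly nine forward passes instead of one.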

🚀 Complete DeepStream Deployment Guide

Prerequisites

Step 1: Download Model Files

# Install huggingface CLI
pip install huggingface_hub

# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
    onnx/yolov8m_visdrone_640.onnx \
    config/nvinfer_config.txt \
    config/labels.txt \
    --local-dir ./beach-person-detector

Step 2: Build TensorRT Engine

cd beach-person-detector

# For the recommended YOLOv8m model (best accuracy):
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
        --saveEngine=yolov8m_visdrone_640.engine \
        --fp16 \
        --workspace=4096

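# ⚠️ Use FP16, not INT8: INT8 degrades recall on small objects
# Note: --workspace is deprecated since TensorRT 8.4; on newer releases use --memPoolSize=workspace:4096 instead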

# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
        --saveEngine=yolov8n_person_640.engine \
        --fp16 \
        --workspace=4096

Step 3: Install DeepStream-Yolo Plugin

git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo

# Build the custom parser library
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
# Adjust CUDA_VER to match your system (check with nvcc --version)
# Note: NvDsInferParseYolo expects the output layout produced by DeepStream-Yolo's own
# export script (utils/export_yoloV8.py). If the ONNX files here were exported with the
# standard Ultralytics exporter, they may need to be re-exported with that script first.

Step 4: Create DeepStream Config Files

config_infer_primary.txt (nvinfer config for the person detector):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

# ─── Choose ONE model ───
# Option A: Single-class person model (simplest)
#onnx-file=yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt

# Option B: Full VisDrone model (higher accuracy) ← RECOMMENDED
onnx-file=yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt

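batch-size=1
# network-mode=2: FP16
network-mode=2
# interval=0: process every frame (critical for surveillance)
interval=0
gie-unique-id=1
# process-mode=1: primary detector
process-mode=1
# network-type=0: detector
network-type=0
# cluster-mode=2: NMS
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

# ─── HIGH RECALL SETTINGS ───
# Comments sit on their own lines because the config parser does not strip inline comments.
# Thresholds are per-class attributes, so they are set under [class-attrs-*] rather than [property].
# A very low pre-cluster threshold keeps low-confidence person detections.
[class-attrs-all]
pre-cluster-threshold=0.10
post-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5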

# For Option B: suppress non-person classes.
# Setting pre-cluster-threshold=1.0 for classes 2-9 filters them out.
[class-attrs-0]
pre-cluster-threshold=0.10

[class-attrs-1]
pre-cluster-threshold=0.10

[class-attrs-2]
pre-cluster-threshold=1.0

[class-attrs-3]
pre-cluster-threshold=1.0

[class-attrs-4]
pre-cluster-threshold=1.0

[class-attrs-5]
pre-cluster-threshold=1.0

[class-attrs-6]
pre-cluster-threshold=1.0

[class-attrs-7]
pre-cluster-threshold=1.0

[class-attrs-8]
pre-cluster-threshold=1.0

[class-attrs-9]
pre-cluster-threshold=1.0
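
The ten per-class sections above follow a simple pattern: a low threshold for the classes to keep and 1.0 for everything else. As a convenience, they could be generated rather than typed by hand. The sketch below is one way to do that; the function name `class_attrs_sections` and its parameters are hypothetical and not part of DeepStream.

```python
def class_attrs_sections(num_classes, keep_ids, keep_threshold=0.10,
                         drop_threshold=1.0):
    """Build [class-attrs-N] sections: low threshold for kept classes, high for the rest."""
    keep = set(keep_ids)
    blocks = []
    for class_id in range(num_classes):
        threshold = keep_threshold if class_id in keep else drop_threshold
        blocks.append(
            f"[class-attrs-{class_id}]\n"
            f"pre-cluster-threshold={threshold:.2f}"
        )
    return "\n\n".join(blocks) + "\n"


# Keep pedestrian (0) and people (1); suppress the other eight VisDrone classes.
config_text = class_attrs_sections(10, keep_ids=[0, 1])
print(config_text)
```

The printed text can be pasted at the end of config_infer_primary.txt in place of the hand-written sections.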

labels_person.txt (for YOLOv8n single-class):

person

labels_visdrone.txt (for YOLOv8m 10-class):

pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor

deepstream_app_config.txt (main DeepStream app config):

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
# type=2: EGL window
type=2
# sync=0: render asynchronously for maximum throughput
sync=0
gpu-id=0

[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# live-source: 1 for RTSP/USB, 0 for file
live-source=1

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

Step 5: Run DeepStream

# Run the pipeline
deepstream-app -c deepstream_app_config.txt

# Or use Python bindings for custom logic:
python3 deepstream_python_app.py

Step 6 (Optional): Add Tracker for Person Counting

# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml

βš™οΈ Confidence Threshold Guide

| Scenario | conf | imgsz | Model | Notes |
|----------|------|-------|-------|-------|
| Max Recall (recommended) | 0.10 | 1280 | yolov8m | Beach surveillance: miss nothing |
| Balanced | 0.20 | 640 | yolov8m | Good recall + precision |
| Edge Real-time | 0.15 | 640 | yolov8n | ~200 FPS on Jetson Orin |
| SAHI Wide FOV | 0.10 | 640 slices | yolov8m | Wide-angle cameras |

πŸ“ Production Tips

Camera & Environment

  • The 30ft (9m) elevation is lower than typical VisDrone drone altitudes (60–130m), so persons appear larger in frame than in the training data, which should help detection.
  • Beach lighting varies dramatically (sunrise, sunset, glare); HSV augmentation during training is meant to make the model robust to this.
  • For best results, collect 200+ labeled frames from your actual camera and fine-tune further (see below).

Key DeepStream Settings

  1. FP16 only: INT8 quantization degrades recall on small objects.
  2. interval=0: process every frame to avoid missing fast-moving persons.
  3. pre-cluster-threshold=0.10: a low threshold maximizes person detection.
  4. Use the NvDCF tracker: it smooths detections across frames and can improve effective recall.

Further Fine-tuning on Your Beach Data

from ultralytics import YOLO

model = YOLO("weights/yolov8m_visdrone.pt")

model.train(
    data="your_beach_data.yaml",  # YOLO format: images/ + labels/ with 'person' class
    epochs=25,
    imgsz=1280,
    batch=4,
    lr0=0.001,       # Low LR for fine-tuning
    freeze=10,       # Freeze backbone, train neck+head only
    conf=0.001,
    patience=10,
    device=0,
)
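
The `data=` argument above points to a dataset description file in the Ultralytics YAML format (`path`, `train`, `val`, `names`). As a minimal sketch, the helper below builds such a file for a single `person` class. The directory layout (`datasets/beach`, `images/train`, `images/val`) and the function name are placeholders, not paths that ship with this repository.

```python
def beach_dataset_yaml(root, class_names=("person",)):
    """Return dataset YAML text in the Ultralytics detection format."""
    lines = [
        f"path: {root}",        # dataset root directory (placeholder)
        "train: images/train",  # training images, relative to path
        "val: images/val",      # validation images, relative to path
        "names:",
    ]
    # Class index -> class name, one entry per line.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = beach_dataset_yaml("datasets/beach")
print(yaml_text)
```

Saving the printed text as your_beach_data.yaml gives the training call a dataset definition to read; label files are expected in a parallel labels/ directory next to images/.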

Training Details

YOLOv8n-person (fine-tuned, 3.0M params)

  • Base: mshamrai/yolov8n-visdrone (YOLOv8n trained on VisDrone 10-class)
  • Fine-tuned: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
  • Resolution: 640×640, Optimizer: Adam (lr=0.001)
  • Augmentation: Mosaic, rotation ±10°, HSV jitter, vertical flip

YOLOv8m-VisDrone (25.9M params)

  • Source: mshamrai/yolov8m-visdrone
  • Trained: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
  • Usage: Filter classes 0 (pedestrian) + 1 (people) via classes=[0,1] or DeepStream threshold filtering

Literature & References