Shashank022002

tags:
  - ultralytics
  - yolov8
  - object-detection
  - person-detection
  - beach-surveillance
  - aerial-view
  - deepstream
  - edge-deployment
  - nvidia
  - onnx
  - visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
  - banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
  - mshamrai/yolov8m-visdrone
  - mshamrai/yolov8n-visdrone
model-index:
  - name: YOLOv8m-VisDrone (person classes)
    results:
      - task:
          type: object-detection
        dataset:
          name: VisDrone2019-DET val
          type: banu4prasad/VisDrone-Dataset
        metrics:
          - type: recall
            value: 0.456
            name: Recall (pedestrian, conf=0.15)
          - type: precision
            value: 0.586
            name: Precision (pedestrian, conf=0.15)
          - type: map50
            value: 0.386
            name: mAP@50 (all 10 classes, conf=0.10)

🏖️ Beach Person Detector (YOLOv8)

Single-class person detection models for beach surveillance from a 30ft elevated camera, ready for NVIDIA DeepStream edge deployment.

🎯 Design Priorities

HIGH RECALL: miss no persons. False detections are acceptable; missed detections are not.
Aerial/elevated view: trained on VisDrone drone footage, whose top-down perspective approximates the 30 ft camera view.
Edge-ready: ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.

✅ Verified Evaluation Results

All numbers measured on VisDrone2019-DET val set (531 images, 13,969 person boxes).

YOLOv8n-person (single-class, fine-tuned)

conf	Precision	Recall	mAP@50	mAP@50-95
0.05	0.535	0.378	0.355	0.111
0.10	0.535	0.378	0.349	0.110
0.15	0.535	0.378	0.341	0.108
0.25	0.535	0.378	0.320	0.103
0.50	0.727	0.255	0.215	0.075

YOLOv8m-VisDrone (10-class, person classes extracted)

Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).

conf	P (pedestrian)	R (pedestrian)	P (people)	R (people)	mAP@50 (all)
0.10	0.603	0.447	0.621	0.331	0.386
0.15	0.586	0.456	0.610	0.339	0.368
0.25	0.680	0.408	0.693	0.290	0.337
0.50	0.883	0.278	0.849	0.162	0.263

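Recommendation: Use the YOLOv8m-VisDrone model at conf=0.10–0.15 for maximum recall on person classes. Its pedestrian recall (0.447–0.456) is roughly 7–8 percentage points higher than the YOLOv8n-person recall (0.378). The two results are not strictly comparable: YOLOv8n is scored on a single person class over 531 images, while YOLOv8m is scored on the pedestrian class of the full 548-image val set.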
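To make the comparison concrete, the short calculation below recomputes the recall gap and an F1 score from the table rows above. The names `YOLOV8M_PEDESTRIAN`, `YOLOV8N_RECALL`, `f1`, and `summarize` are illustrative, the numbers are copied from the tables, and F1 is an added summary that the evaluation above does not report.

```python
# Pedestrian (precision, recall) for YOLOv8m-VisDrone, copied from the table above.
YOLOV8M_PEDESTRIAN = {
    0.10: (0.603, 0.447),
    0.15: (0.586, 0.456),
    0.25: (0.680, 0.408),
    0.50: (0.883, 0.278),
}
# YOLOv8n-person recall as reported for conf <= 0.25.
YOLOV8N_RECALL = 0.378


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summarize(rows: dict, baseline_recall: float) -> list:
    """Return (conf, recall gain in percentage points, F1) for each threshold."""
    return [
        (conf, round((r - baseline_recall) * 100, 1), round(f1(p, r), 3))
        for conf, (p, r) in sorted(rows.items())
    ]


if __name__ == "__main__":
    for conf, gain_pp, score in summarize(YOLOV8M_PEDESTRIAN, YOLOV8N_RECALL):
        print(f"conf={conf:.2f}  recall gain={gain_pp:+.1f} pp  F1={score:.3f}")
```

The recall gain is positive only at the two lowest thresholds, which is why those are the suggested operating range for a recall-first deployment.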

ONNX Runtime Verification

ONNX File	Size	Input	Output	onnx.checker	Opset
`yolov8n_person_640.onnx`	12.2 MB	1×3×640×640	1×5×8400	✅	12
`yolov8m_visdrone_640.onnx`	103.6 MB	1×3×640×640	1×14×8400	✅	12
`yolov8m_visdrone_1280.onnx`	104.1 MB	1×3×1280×1280	1×14×33600	✅	12
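The output shapes in the table follow from the YOLOv8 detection head, which predicts on three feature maps with strides 8, 16, and 32 and emits 4 box values plus one score per class for each grid cell. The sketch below reproduces the shapes under that assumption; `yolov8_output_shape` is a hypothetical helper, not part of Ultralytics or ONNX.

```python
def yolov8_output_shape(imgsz: int, num_classes: int, strides=(8, 16, 32)) -> tuple:
    """Expected (batch, channels, anchors) shape for a square input of side imgsz."""
    # One prediction per cell of each feature map.
    anchors = sum((imgsz // stride) ** 2 for stride in strides)
    # 4 box coordinates followed by one score per class.
    channels = 4 + num_classes
    return (1, channels, anchors)


if __name__ == "__main__":
    print(yolov8_output_shape(640, 1))    # single-class person model
    print(yolov8_output_shape(640, 10))   # 10-class VisDrone model
    print(yolov8_output_shape(1280, 10))  # 10-class VisDrone model at 1280
```

Comparing these tuples with the shapes reported by your ONNX tooling is a quick way to confirm that you loaded the file you intended.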

Model Variants

Model	Params	ONNX File	Classes	Best For	~FPS (Jetson Orin)
YOLOv8n-person	3.0M	`yolov8n_person_640.onnx`	1 (person)	Real-time edge	~200 FPS
YOLOv8m-VisDrone ⭐	25.9M	`yolov8m_visdrone_640.onnx`	10 (filter to person)	Best accuracy	~60 FPS
YOLOv8m-VisDrone-1280	25.9M	`yolov8m_visdrone_1280.onnx`	10 (filter to person)	Maximum recall	~25 FPS

Quick Start

Python (Ultralytics)

from ultralytics import YOLO

# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)

# === Option B: Full VisDrone model (higher recall — recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
    "beach.jpg",
    conf=0.10,          # Low threshold for max recall
    imgsz=1280,         # Higher resolution for small persons
    max_det=300,
    classes=[0, 1],     # 0=pedestrian, 1=people
)

for r in results:
    print(f"Detected {len(r.boxes)} persons")
    r.save("result.jpg")

SAHI Inference (Maximum Recall for Large/Wide Images)

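For wide-angle cameras at 30 ft, SAHI sliced inference can recover small persons that a single full-frame pass misses. The SAHI paper reports AP gains of roughly 5–7 points from sliced inference alone, and 12.7–14.5 points when it is combined with slicing-aided fine-tuning; these gains have not been measured for this model: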

# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="weights/yolov8m_visdrone.pt",
    confidence_threshold=0.10,
    device="cuda:0",
)

result = get_sliced_prediction(
    "beach_wide_angle.jpg",
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
    perform_standard_pred=True,
    postprocess_match_threshold=0.5,
)
# Filter to person classes only:
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
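Sliced inference trades recall for compute, so it helps to estimate how many forward passes one frame costs before deploying it. The sketch below counts overlapping windows for the slice size and overlap ratio used above. It is an approximation written for this guide, not SAHI's own slicing code, and `slices_along_axis` and `forward_passes` are hypothetical names.

```python
def slices_along_axis(length: int, slice_size: int, overlap_ratio: float) -> int:
    """Count windows of slice_size needed to cover length with the given overlap."""
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    step = slice_size - int(slice_size * overlap_ratio)
    count, start = 1, 0
    # Keep adding windows until the last one reaches the end of the axis.
    while start + slice_size < length:
        start += step
        count += 1
    return count


def forward_passes(width: int, height: int, slice_size: int = 640,
                   overlap_ratio: float = 0.25, full_image_pass: bool = True) -> int:
    """Model invocations per frame: all slices plus an optional full-image pass."""
    cols = slices_along_axis(width, slice_size, overlap_ratio)
    rows = slices_along_axis(height, slice_size, overlap_ratio)
    return cols * rows + (1 if full_image_pass else 0)


if __name__ == "__main__":
    print(forward_passes(1920, 1080))  # 1080p frame
    print(forward_passes(3840, 2160))  # 4K frame
```

With `perform_standard_pred=True` the full-frame pass is included, so per-frame throughput drops roughly in proportion to the returned count.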

🚀 Complete DeepStream Deployment Guide

Prerequisites

NVIDIA Jetson (Orin / Xavier) or dGPU with DeepStream 6.2+ (the original Jetson Nano is limited to DeepStream 6.0.x and needs a matching older setup)
TensorRT 8.x+
DeepStream-Yolo plugin

Step 1: Download Model Files

# Install huggingface CLI
pip install huggingface_hub

# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
    onnx/yolov8m_visdrone_640.onnx \
    config/nvinfer_config.txt \
    config/labels.txt \
    --local-dir ./beach-person-detector

Step 2: Build TensorRT Engine

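cd beach-person-detector

# Build the engine on the target device: TensorRT engines are tied to the
# GPU architecture and TensorRT version they were built with.
# Note: --workspace is deprecated since TensorRT 8.4 in favor of
# --memPoolSize=workspace:4096 (size in MiB); newer releases may reject it.

# For the recommended YOLOv8m model (best accuracy):
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
        --saveEngine=yolov8m_visdrone_640.engine \
        --fp16 \
        --workspace=4096

# ⚠️ Use FP16, not INT8: INT8 can degrade recall on small objects

# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
        --saveEngine=yolov8n_person_640.engine \
        --fp16 \
        --workspace=4096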

Step 3: Install DeepStream-Yolo Plugin

git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo

# Build the custom parser library
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
# Adjust CUDA_VER to match your system (check with nvcc --version)

Step 4: Create DeepStream Config Files

config_infer_primary.txt — nvinfer config for the person detector:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

# ─── Choose ONE model ───
# Option A: Single-class person model (simplest)
#onnx-file=yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt

# Option B: Full VisDrone model (higher accuracy) ← RECOMMENDED
onnx-file=yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt

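# Comments are kept on their own lines: the config is a GLib key file, where
# "#" starts a comment only at the beginning of a line.
batch-size=1
# 2 = FP16
network-mode=2
# 0 = run inference on EVERY frame (critical for surveillance)
interval=0
gie-unique-id=1
# 1 = primary detector
process-mode=1
# 0 = detector
network-type=0
# 2 = NMS
cluster-mode=2
maintain-aspect-ratio=1
# NvDsInferParseYolo expects the output layout produced by the DeepStream-Yolo
# export scripts; confirm that the ONNX file you load matches it.
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

# ─── HIGH RECALL SETTINGS ───
# Thresholds are per-class attributes, so they belong in [class-attrs-*]
# groups rather than in [property].
[class-attrs-all]
# Very low threshold to catch as many persons as possible
pre-cluster-threshold=0.10
post-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5

# For Option B: suppress non-person classes
# Set pre-cluster-threshold=1.0 for classes 2-9 so that none of their detections pass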
[class-attrs-0]
pre-cluster-threshold=0.10

[class-attrs-1]
pre-cluster-threshold=0.10

[class-attrs-2]
pre-cluster-threshold=1.0

[class-attrs-3]
pre-cluster-threshold=1.0

[class-attrs-4]
pre-cluster-threshold=1.0

[class-attrs-5]
pre-cluster-threshold=1.0

[class-attrs-6]
pre-cluster-threshold=1.0

[class-attrs-7]
pre-cluster-threshold=1.0

[class-attrs-8]
pre-cluster-threshold=1.0

[class-attrs-9]
pre-cluster-threshold=1.0

labels_person.txt (for YOLOv8n single-class):

person

labels_visdrone.txt (for YOLOv8m 10-class):

pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor
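Writing ten near-identical `[class-attrs-N]` groups by hand is error-prone, so they can be generated from the label list instead. The sketch below emits the per-class groups shown earlier, keeping classes 0 and 1 at the low threshold and setting the rest to 1.0. The function name and its parameters are hypothetical; only the `class-attrs-N` group names and the `pre-cluster-threshold` key come from the config above.

```python
# VisDrone label order, matching labels_visdrone.txt above.
VISDRONE_LABELS = [
    "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor",
]
PERSON_IDS = {0, 1}


def class_attr_sections(labels, keep_ids, keep_threshold=0.10, drop_threshold=1.0) -> str:
    """Render one [class-attrs-N] group per class as nvinfer config text."""
    lines = []
    for class_id, name in enumerate(labels):
        threshold = keep_threshold if class_id in keep_ids else drop_threshold
        lines += [
            f"# {name}",  # comment on its own line
            f"[class-attrs-{class_id}]",
            f"pre-cluster-threshold={threshold}",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(class_attr_sections(VISDRONE_LABELS, PERSON_IDS))
```

Generating the groups from the same list that feeds `labels_visdrone.txt` keeps class IDs and label names from drifting apart.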

deepstream_app_config.txt — main DeepStream app config:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
# 2 = EGL window
type=2
# 0 = do not sync to the clock (maximum throughput)
sync=0
gpu-id=0

[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# 1 for live sources (RTSP/USB), 0 for files
live-source=1

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

Step 5: Run DeepStream

# Run the pipeline
deepstream-app -c deepstream_app_config.txt

# Or use Python bindings for custom logic:
python3 deepstream_python_app.py
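Before launching, a quick sanity check on the config files can save a debugging round. DeepStream configs are GLib key files, where `#` starts a comment only at the beginning of a line, so text after a value becomes part of that value and may fail to parse depending on the value type. The sketch below flags such lines; `find_inline_comments` is a hypothetical helper and not part of DeepStream.

```python
def find_inline_comments(config_text: str) -> list:
    """Return (line_number, line) pairs where a value is followed by '#'."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines, full-line comments, and group headers are fine.
        if not stripped or stripped.startswith("#") or stripped.startswith("["):
            continue
        _key, separator, value = stripped.partition("=")
        if separator and "#" in value:
            problems.append((number, stripped))
    return problems


if __name__ == "__main__":
    sample = "[property]\nnetwork-mode=2   # FP16\n# a full-line comment\ninterval=0\n"
    for number, line in find_inline_comments(sample):
        print(f"line {number}: move the comment to its own line -> {line}")
```

To check a real file, pass its contents, for example `find_inline_comments(open("config_infer_primary.txt").read())`.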

Step 6: Optional — Add Tracker for Person Counting

# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
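With a low detection threshold, the tracker will also assign IDs to short-lived false positives, so a person count should ignore tracks that appear in only a few frames. The sketch below shows one way to do that on `(frame_index, track_id)` pairs. How you collect those pairs from the pipeline metadata is up to your application; `count_persons` and its `min_frames` parameter are hypothetical.

```python
from collections import Counter


def count_persons(observations, min_frames: int = 3) -> int:
    """Count track IDs that were observed in at least min_frames distinct frames."""
    frames_per_track = Counter()
    seen = set()
    for frame_index, track_id in observations:
        # Count each (frame, track) pair once, even if it is reported twice.
        if (frame_index, track_id) not in seen:
            seen.add((frame_index, track_id))
            frames_per_track[track_id] += 1
    return sum(1 for frames in frames_per_track.values() if frames >= min_frames)


if __name__ == "__main__":
    observations = [
        (0, 1), (1, 1), (2, 1),          # track 1: three frames
        (0, 2), (1, 2),                  # track 2: two frames, likely noise
        (5, 3), (6, 3), (7, 3), (8, 3),  # track 3: four frames
    ]
    print(count_persons(observations, min_frames=3))
```

Raising `min_frames` reduces false counts at the cost of missing people who are visible only briefly, so tune it against the frame rate of your camera.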

⚙️ Confidence Threshold Guide

Scenario	conf	imgsz	Model	Notes
Max Recall (recommended)	0.10	1280	yolov8m	Beach surveillance — miss nothing
Balanced	0.20	640	yolov8m	Good recall + precision
Edge Real-time	0.15	640	yolov8n	~200 FPS on Jetson Orin
SAHI Wide FOV	0.10	640 slices	yolov8m	Wide-angle cameras

📝 Production Tips

Camera & Environment

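30 ft (9 m) elevation is lower than typical VisDrone drone altitudes (60–130 m), so persons appear larger than in the training data. This should help detection, but the viewing angle also differs, so validate on footage from your own camera.
Beach lighting varies dramatically (sunrise, sunset, glare). HSV jitter during training gives some robustness, but it is not a guarantee; check the low-sun hours specifically.
For best results, collect 200+ labeled frames from your actual camera and fine-tune further (see below).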

Key DeepStream Settings

FP16 only — INT8 quantization degrades recall on small objects
interval=0 — process every frame to avoid missing fast-moving persons
pre-cluster-threshold=0.10 — low threshold maximizes person detection
Use NvDCF tracker — smooths detections across frames, improves effective recall

Further Fine-tuning on Your Beach Data

from ultralytics import YOLO

model = YOLO("weights/yolov8m_visdrone.pt")

model.train(
    data="your_beach_data.yaml",  # YOLO format: images/ + labels/ with 'person' class
    epochs=25,
    imgsz=1280,
    batch=4,
    lr0=0.001,       # Low LR for fine-tuning
    freeze=10,       # Freeze backbone, train neck+head only
    conf=0.001,
    patience=10,
    device=0,
)
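The `data` argument points to an Ultralytics dataset YAML with a dataset root, train and val image folders, and a class-name mapping. A minimal sketch of such a file for a single `person` class follows. The directory names are placeholders for your own layout, and `render_data_yaml` is a hypothetical helper.

```python
from pathlib import Path


def render_data_yaml(root: str = "datasets/beach",
                     train: str = "images/train",
                     val: str = "images/val") -> str:
    """Return the text of a single-class Ultralytics dataset YAML."""
    return (
        f"path: {root}\n"    # dataset root (placeholder)
        f"train: {train}\n"  # training images, relative to path
        f"val: {val}\n"      # validation images, relative to path
        "names:\n"
        "  0: person\n"      # single class, ID 0
    )


if __name__ == "__main__":
    Path("your_beach_data.yaml").write_text(render_data_yaml())
    print(render_data_yaml())
```

Label files are expected next to the images under a parallel `labels/` directory, with one `class x_center y_center width height` row per box in normalized coordinates.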

Training Details

YOLOv8n-person (fine-tuned, 3.0M params)

Base: mshamrai/yolov8n-visdrone (YOLOv8n trained on VisDrone 10-class)
Fine-tuned: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
Resolution: 640×640, Optimizer: Adam (lr=0.001)
Augmentation: Mosaic, rotation ±10°, HSV jitter, vertical flip

YOLOv8m-VisDrone (25.9M params)

Source: mshamrai/yolov8m-visdrone
Trained: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
Usage: Filter classes 0 (pedestrian) + 1 (people) via classes=[0,1] or DeepStream threshold filtering
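When running the 10-class ONNX file outside Ultralytics or DeepStream, the same person filtering has to be applied to the raw output tensor. The sketch below assumes the layout listed in the ONNX table, `(1, 4 + num_classes, anchors)`, with boxes as center x, center y, width, height in input pixels and class scores already in the 0–1 range. `decode_person_boxes` is a hypothetical helper, and non-maximum suppression is still required afterwards.

```python
import numpy as np

PERSON_IDS = (0, 1)  # pedestrian, people in the VisDrone label order


def decode_person_boxes(output: np.ndarray, conf: float = 0.10):
    """Return (boxes_xyxy, scores) for anchors whose best person score >= conf."""
    preds = output[0].T                    # (anchors, 4 + num_classes)
    boxes_cxcywh = preds[:, :4]
    class_scores = preds[:, 4:]
    # Best score among the person classes only; other classes are ignored.
    person_scores = class_scores[:, list(PERSON_IDS)].max(axis=1)
    keep = person_scores >= conf
    cx, cy, w, h = boxes_cxcywh[keep].T
    boxes_xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes_xyxy, person_scores[keep]


if __name__ == "__main__":
    output = np.zeros((1, 14, 8400), dtype=np.float32)
    output[0, :4, 0] = [100, 100, 20, 40]  # a pedestrian candidate
    output[0, 4, 0] = 0.9
    output[0, :4, 1] = [300, 300, 50, 30]  # a car candidate (class 3)
    output[0, 4 + 3, 1] = 0.95
    boxes, scores = decode_person_boxes(output, conf=0.10)
    print(boxes, scores)
```

This keeps an anchor whenever a person class clears the threshold, even if another class scores higher, which favors recall over precision in line with the design priorities above.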

Literature & References

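VisDrone2019-DET: aerial drone detection benchmark
SAHI: Slicing Aided Hyper Inference (the paper reports +12.7–14.5 AP for small objects with sliced inference plus slicing-aided fine-tuning, and about +5–7 AP with sliced inference alone)
DeepStream-Yolo: community project (marcoslucianops) providing YOLO parsers and configs for NVIDIA DeepStream
Ultralytics YOLOv8: model documentation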