---
tags:
  - ultralytics
  - yolov8
  - object-detection
  - person-detection
  - beach-surveillance
  - aerial-view
  - deepstream
  - edge-deployment
  - nvidia
  - onnx
  - visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
  - banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
  - mshamrai/yolov8m-visdrone
  - mshamrai/yolov8n-visdrone
model-index:
  - name: YOLOv8m-VisDrone (person classes)
    results:
      - task:
          type: object-detection
        dataset:
          name: VisDrone2019-DET val
          type: banu4prasad/VisDrone-Dataset
        metrics:
          - type: recall
            value: 0.456
            name: Recall (pedestrian, conf=0.15)
          - type: precision
            value: 0.586
            name: Precision (pedestrian, conf=0.15)
          - type: map50
            value: 0.386
            name: mAP@50 (all 10 classes, conf=0.10)
---

# 🏖️ Beach Person Detector (YOLOv8)

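**Person detection** models for **beach surveillance** from a **30ft elevated camera**, ready for **NVIDIA DeepStream** edge deployment. There are two models: a single-class YOLOv8n, and a 10-class YOLOv8m of which only the two person classes are used.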

## 🎯 Design Priorities
1. **HIGH RECALL** — miss no persons. False detections are acceptable, missed detections are NOT.
2. **Aerial/elevated view** — trained on VisDrone drone footage, whose downward-looking perspective is much closer to a 30ft camera than ground-level datasets are.
3. **Edge-ready** — ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.

---

## ✅ Verified Evaluation Results

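All numbers are measured on the **VisDrone2019-DET val set**. Precision and recall are the values reported by Ultralytics `val`, which picks the confidence that maximizes F1 among detections above `conf`. They are therefore not exactly what you would get by thresholding at that `conf`, and this is why several rows repeat.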

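### YOLOv8n-person (single-class, fine-tuned)

Evaluated on the person-only val subset (531 images, 13,969 boxes), with VisDrone's `pedestrian` and `people` classes merged into one `person` class.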

| conf | Precision | Recall | mAP@50 | mAP@50-95 |
|------|-----------|--------|--------|-----------|
| 0.05 | 0.535 | **0.378** | 0.355 | 0.111 |
| 0.10 | 0.535 | **0.378** | 0.349 | 0.110 |
| 0.15 | 0.535 | **0.378** | 0.341 | 0.108 |
| 0.25 | 0.535 | **0.378** | 0.320 | 0.103 |
| 0.50 | 0.727 | 0.255 | 0.215 | 0.075 |

### YOLOv8m-VisDrone (10-class, person classes extracted)

Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).

| conf | P (pedestrian) | R (pedestrian) | P (people) | R (people) | mAP@50 (all) |
|------|----------------|----------------|------------|------------|--------------|
| 0.10 | 0.603 | **0.447** | 0.621 | **0.331** | 0.386 |
| 0.15 | 0.586 | **0.456** | 0.610 | **0.339** | 0.368 |
| 0.25 | 0.680 | 0.408 | 0.693 | 0.290 | 0.337 |
| 0.50 | 0.883 | 0.278 | 0.849 | 0.162 | 0.263 |

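> **Recommendation:** For maximum recall on person classes, use the **YOLOv8m-VisDrone** model at **conf=0.10–0.15**. Its pedestrian recall (0.447–0.456) is about **7–8 percentage points higher** than YOLOv8n's person recall (0.378). The comparison is not exact, though. The YOLOv8n figures cover `pedestrian` and `people` merged into one class, and YOLOv8m's recall on `people` alone is lower (0.331–0.339).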
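The YOLOv8n rows score `pedestrian` and `people` as a single class. A box-weighted average of YOLOv8m's two person classes gives a rough like-for-like figure. The sketch below does that arithmetic under two assumptions. First, the per-class box counts (8,844 pedestrian, 5,125 people) are taken from a standard Ultralytics VisDrone `val` run; they were chosen because they sum to the 13,969 person boxes quoted above. Second, it ignores pedestrian/people confusions, which would count as hits once the classes are merged.

```python
# Box-weighted recall across VisDrone's two person classes.
# ASSUMPTION: per-class ground-truth box counts on VisDrone2019-DET val (sum = 13,969).
GT_BOXES = {"pedestrian": 8844, "people": 5125}

# YOLOv8m-VisDrone per-class recall from the evaluation table, keyed by conf.
RECALL = {
    0.10: {"pedestrian": 0.447, "people": 0.331},
    0.15: {"pedestrian": 0.456, "people": 0.339},
}


def merged_recall(per_class_recall: dict[str, float], gt_boxes: dict[str, int]) -> float:
    """Approximate recall if both classes were scored as one 'person' class."""
    # Recovered boxes per class = recall * number of ground-truth boxes in that class.
    hits = sum(per_class_recall[cls] * n for cls, n in gt_boxes.items())
    return hits / sum(gt_boxes.values())


if __name__ == "__main__":
    for conf, rec in RECALL.items():
        print(f"conf={conf:.2f}: merged person recall ~ {merged_recall(rec, GT_BOXES):.3f}")
```

This comes out at roughly 0.40 to 0.41. On a like-for-like basis, YOLOv8m's advantage over YOLOv8n's 0.378 is therefore closer to 3 points than to 7–8.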

### ONNX Runtime Verification

| ONNX File | Size | Input | Output | onnx.checker | Opset |
|-----------|------|-------|--------|-------------|-------|
| `yolov8n_person_640.onnx` | 12.2 MB | 1×3×640×640 | 1×5×8400 | ✅ | 12 |
| `yolov8m_visdrone_640.onnx` | 103.6 MB | 1×3×640×640 | 1×14×8400 | ✅ | 12 |
| `yolov8m_visdrone_1280.onnx` | 104.1 MB | 1×3×1280×1280 | 1×14×33600 | ✅ | 12 |
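The output shapes in this table follow from the YOLOv8 head layout. For every anchor point on the stride-8, -16 and -32 feature maps, the head emits 4 box values and one score per class, with no objectness term. The sketch below reproduces the shapes, assuming a square input whose side is divisible by 32.

```python
# YOLOv8 detection-head output shape: (batch, 4 + num_classes, num_anchors).
STRIDES = (8, 16, 32)


def yolov8_output_shape(imgsz: int, num_classes: int, batch: int = 1) -> tuple[int, int, int]:
    if imgsz % max(STRIDES):
        raise ValueError(f"imgsz={imgsz} must be divisible by {max(STRIDES)}")
    # One anchor point per cell of each feature map (80x80 + 40x40 + 20x20 at 640).
    num_anchors = sum((imgsz // s) ** 2 for s in STRIDES)
    # 4 box values (cx, cy, w, h) followed by one score per class.
    return (batch, 4 + num_classes, num_anchors)


if __name__ == "__main__":
    print(yolov8_output_shape(640, 1))    # → (1, 5, 8400)    yolov8n_person_640.onnx
    print(yolov8_output_shape(640, 10))   # → (1, 14, 8400)   yolov8m_visdrone_640.onnx
    print(yolov8_output_shape(1280, 10))  # → (1, 14, 33600)  yolov8m_visdrone_1280.onnx
```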

---

## Model Variants

| Model | Params | ONNX File | Classes | Best For | ~FPS (Jetson Orin) |
|-------|--------|-----------|---------|----------|--------------------|
| **YOLOv8n-person** | 3.0M | `yolov8n_person_640.onnx` | 1 (person) | Real-time edge | ~200 FPS |
| **YOLOv8m-VisDrone** ⭐ | 25.9M | `yolov8m_visdrone_640.onnx` | 10 (filter to person) | **Best accuracy** | ~60 FPS |
| **YOLOv8m-VisDrone-1280** | 25.9M | `yolov8m_visdrone_1280.onnx` | 10 (filter to person) | Maximum recall | ~25 FPS |

---

## Quick Start

### Python (Ultralytics)
```python
from ultralytics import YOLO

# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)

# === Option B: Full VisDrone model (higher recall — recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
    "beach.jpg",
    conf=0.10,          # Low threshold for max recall
    imgsz=1280,         # Higher resolution for small persons
    max_det=300,
    classes=[0, 1],     # 0=pedestrian, 1=people
)

for r in results:
    print(f"Detected {len(r.boxes)} persons")
    r.save("result.jpg")
```
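If you run the ONNX files outside Ultralytics (for example with ONNX Runtime), you have to decode the raw `1×(4+nc)×N` tensor yourself. The NumPy sketch below assumes the Ultralytics layout: `cx, cy, w, h` in input-pixel units, followed by per-class scores. It keeps only the person classes and applies a simple greedy, class-agnostic NMS, so overlapping `pedestrian` and `people` boxes collapse into one. It does not undo the letterbox and does not handle batches. `decode_yolov8` and `nms` are illustrative names, not library functions.

```python
import numpy as np


def _area(b: np.ndarray) -> np.ndarray:
    """Area of xyxy box(es); works on a single box or an (N, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float) -> np.ndarray:
    """Greedy class-agnostic NMS over xyxy boxes; returns kept indices, best first."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Overlap between the current best box and every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (_area(boxes[i]) + _area(boxes[rest]) - inter + 1e-9)
        order = rest[iou <= iou_thr]  # drop boxes that overlap the kept one too much
    return np.array(keep, dtype=np.int64)


def decode_yolov8(output: np.ndarray, conf: float = 0.10, iou_thr: float = 0.5,
                  keep_classes: tuple[int, ...] = (0, 1)):
    """Decode a (1, 4 + nc, N) YOLOv8 output into xyxy boxes, scores and class ids."""
    pred = output[0].T                        # (N, 4 + nc): cx, cy, w, h, class scores...
    class_ids = pred[:, 4:].argmax(axis=1)
    scores = pred[:, 4:].max(axis=1)
    mask = (scores >= conf) & np.isin(class_ids, keep_classes)
    cx, cy, w, h = pred[mask, :4].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    keep = nms(boxes, scores[mask], iou_thr)
    return boxes[keep], scores[mask][keep], class_ids[mask][keep]


# Usage sketch (needs onnxruntime and a preprocessed 1x3x640x640 float32 RGB blob in [0, 1]):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("yolov8m_visdrone_640.onnx", providers=["CPUExecutionProvider"])
#   out = sess.run(None, {sess.get_inputs()[0].name: blob})[0]
#   boxes, scores, classes = decode_yolov8(out, conf=0.10)
```

For `yolov8n_person_640.onnx`, pass `keep_classes=(0,)`.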

### SAHI Inference (Maximum Recall for Large/Wide Images)
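For wide-angle cameras at 30ft, SAHI sliced inference can recover small persons that full-frame inference misses. For its own detectors, the SAHI paper reports AP gains of roughly 5–7% from sliced inference alone, and 12.7–14.5% cumulatively when combined with slicing-aided fine-tuning. The gain for these models has not been measured here.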

```python
# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="weights/yolov8m_visdrone.pt",
    confidence_threshold=0.10,
    device="cuda:0",
)

result = get_sliced_prediction(
    "beach_wide_angle.jpg",
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
    perform_standard_pred=True,
    postprocess_match_threshold=0.5,
)
# Filter to person classes only:
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
```
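Sliced inference runs one forward pass per tile, plus one full-frame pass when `perform_standard_pred=True`, so its cost grows quickly with frame size. The rough estimator below assumes tiles are laid out on a fixed stride, with the last row and column shifted flush to the image edge. That is how we understand SAHI's slicing to behave; check `sahi.slicing.get_slice_bboxes` for exact counts. `tile_origins` and `sahi_forward_passes` are illustrative helpers, not SAHI functions.

```python
def tile_origins(length: int, tile: int, overlap_ratio: float) -> list[int]:
    """Start offsets of tiles along one axis; the last tile is shifted to end at the edge."""
    if length <= tile:
        return [0]
    step = tile - int(tile * overlap_ratio)
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # final tile flush with the image border
    return origins


def sahi_forward_passes(width: int, height: int, tile: int = 640,
                        overlap_ratio: float = 0.25, standard_pred: bool = True) -> int:
    cols = len(tile_origins(width, tile, overlap_ratio))
    rows = len(tile_origins(height, tile, overlap_ratio))
    return cols * rows + (1 if standard_pred else 0)


if __name__ == "__main__":
    # A 1080p wide-angle frame with the settings used above.
    print(sahi_forward_passes(1920, 1080))  # → 9 (4 columns × 2 rows + 1 full-frame pass)
```

At around 9 passes per 1080p frame, SAHI is likely better suited to offline analysis or spot checks than to a per-frame real-time path.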

---

## 🚀 Complete DeepStream Deployment Guide

### Prerequisites
- NVIDIA Jetson (Orin, including Orin Nano, or Xavier) or dGPU with DeepStream 6.2+ (the original Jetson Nano only supports up to DeepStream 6.0.1)
- TensorRT 8.x+
- [DeepStream-Yolo plugin](https://github.com/marcoslucianops/DeepStream-Yolo)

### Step 1: Download Model Files
```bash
# Install huggingface CLI
pip install huggingface_hub

# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
    onnx/yolov8m_visdrone_640.onnx \
    config/nvinfer_config.txt \
    config/labels.txt \
    --local-dir ./beach-person-detector
```

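### Step 2: Build TensorRT Engine

> ⚠️ **ONNX output format:** DeepStream-Yolo's `NvDsInferParseYolo` parser expects ONNX files exported with that repo's own YOLOv8 export script (`utils/export_yoloV8.py`). The output layout of those files differs from the raw Ultralytics `1×(4+nc)×N` tensor listed above. If detections come out missing or garbled, re-export from the `.pt` weights with that script and build the engine from the result.

```bash
cd beach-person-detector

# For the recommended YOLOv8m model (best accuracy).
# Build on the target device: TensorRT engines are specific to the GPU and TensorRT version.
# --memPoolSize replaces the deprecated --workspace flag (TensorRT 8.4+).
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
        --saveEngine=yolov8m_visdrone_640.engine \
        --fp16 \
        --memPoolSize=workspace:4096

# ⚠️ Use FP16, not INT8: INT8 without careful calibration can degrade recall on small objects

# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
        --saveEngine=yolov8n_person_640.engine \
        --fp16 \
        --memPoolSize=workspace:4096
```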

### Step 3: Install DeepStream-Yolo Plugin
```bash
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo

# Build the custom parser library.
# Set CUDA_VER to match your system (check with nvcc --version).
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo

# Return to beach-person-detector/, where the configs below live
cd ..
```
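If you script the build, you can read the value for `CUDA_VER` from `nvcc --version`, whose output includes a line like `Cuda compilation tools, release 12.2, V12.2.140`. The sketch below assumes that output format; `cuda_ver_from_nvcc` is a hypothetical helper.

```python
import re
import shutil
import subprocess


def cuda_ver_from_nvcc(nvcc_output: str) -> str:
    """Extract 'MAJOR.MINOR' (e.g. '12.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return match.group(1)


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not on PATH (on Jetson it is usually under /usr/local/cuda/bin)")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True).stdout
        print(f"CUDA_VER={cuda_ver_from_nvcc(out)} make -C nvdsinfer_custom_impl_Yolo")
```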

### Step 4: Create DeepStream Config Files

**`config_infer_primary.txt`** — nvinfer config for the person detector:
```ini
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

# ─── Choose ONE model ───
# Option A: Single-class person model (simplest)
#onnx-file=onnx/yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt

# Option B: Full VisDrone model (higher accuracy) ← RECOMMENDED
onnx-file=onnx/yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt

batch-size=1
# FP16
network-mode=2
# Process EVERY frame (critical for surveillance)
interval=0
gie-unique-id=1
# Primary detector
process-mode=1
# Detector
network-type=0
# NMS
cluster-mode=2
maintain-aspect-ratio=1
# Pad the letterbox equally on both sides, as Ultralytics preprocessing does
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

# ─── HIGH RECALL SETTINGS ───
# Detection thresholds are nvinfer class attributes, so they belong in the
# [class-attrs-*] groups rather than in [property].
[class-attrs-all]
# Very low → catch all persons
pre-cluster-threshold=0.10
post-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5

# For Option B: suppress non-person classes by setting
# pre-cluster-threshold=1.0 for classes 2-9 (listing them in
# filter-out-class-ids=2;3;4;5;6;7;8;9 under [property] is an alternative)
[class-attrs-0]
pre-cluster-threshold=0.10

[class-attrs-1]
pre-cluster-threshold=0.10

[class-attrs-2]
pre-cluster-threshold=1.0

[class-attrs-3]
pre-cluster-threshold=1.0

[class-attrs-4]
pre-cluster-threshold=1.0

[class-attrs-5]
pre-cluster-threshold=1.0

[class-attrs-6]
pre-cluster-threshold=1.0

[class-attrs-7]
pre-cluster-threshold=1.0

[class-attrs-8]
pre-cluster-threshold=1.0

[class-attrs-9]
pre-cluster-threshold=1.0
```
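Writing ten near-identical `[class-attrs-N]` groups by hand is error-prone. The short generator below produces them, assuming you want the person classes at a low threshold and every other class effectively disabled at `1.0`. Paste its output below `[class-attrs-all]`. `class_attrs_sections` is an illustrative helper.

```python
def class_attrs_sections(num_classes: int, keep: set[int], keep_thr: float = 0.10,
                         drop_thr: float = 1.0) -> str:
    """Build nvinfer [class-attrs-N] groups: keep_thr for classes in `keep`, drop_thr otherwise."""
    groups = []
    for cls_id in range(num_classes):
        thr = keep_thr if cls_id in keep else drop_thr
        groups.append(f"[class-attrs-{cls_id}]\npre-cluster-threshold={thr:.2f}\n")
    return "\n".join(groups)


if __name__ == "__main__":
    # VisDrone: 0=pedestrian and 1=people are kept; 2-9 (vehicles etc.) are suppressed.
    print(class_attrs_sections(10, keep={0, 1}))
```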

**`labels_person.txt`** (for YOLOv8n single-class):
```
person
```

**`labels_visdrone.txt`** (for YOLOv8m 10-class):
```
pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor
```
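For a detector, nvinfer expects one label per line, and the number of lines should match `num-detected-classes`. A mismatch is an easy mistake when switching between Option A and Option B. The pre-flight check below is a sketch that assumes the flat `key=value` layout used above: lines starting with `#` are skipped, and relative paths are resolved from the config's directory. `read_property` and `check_labels` are hypothetical helpers.

```python
import sys
from pathlib import Path
from typing import Optional


def read_property(config_text: str, key: str) -> Optional[str]:
    """Value of `key` in a key=value config, ignoring full-line comments (last one wins)."""
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        k, v = line.split("=", 1)
        if k.strip() == key:
            value = v.strip()
    return value


def check_labels(config_path: str) -> int:
    """Raise if the label file's line count differs from num-detected-classes."""
    cfg = Path(config_path)
    text = cfg.read_text()
    num = read_property(text, "num-detected-classes")
    label_path = read_property(text, "labelfile-path")
    if num is None or label_path is None:
        raise ValueError("config lacks num-detected-classes or labelfile-path")
    labels_file = cfg.parent / label_path
    labels = [ln for ln in labels_file.read_text().splitlines() if ln.strip()]
    if len(labels) != int(num):
        raise ValueError(f"{labels_file} has {len(labels)} labels, config expects {num}")
    return len(labels)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {check_labels(path)} labels OK")
```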

**`deepstream_app_config.txt`** — main DeepStream app config:
```ini
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
# EGL window
type=2
# Async for max throughput
sync=0

[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# Set to 1 for RTSP/USB, 0 for file
live-source=1

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt
```
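DeepStream reads these configs with GLib's key-file parser, which recognizes `#` comments only on their own lines. Any trailing `# ...` text after a value becomes part of that value, and whether it is tolerated then depends on how each key is parsed. The small lint pass below catches such lines before launch. It is an illustrative sketch: `find_inline_comments` is not a DeepStream tool, and it assumes values do not legitimately contain `#`.

```python
import re
import sys

# key=value followed by '#' on the same line (value must start with a non-'#' character)
_INLINE_COMMENT = re.compile(r"^\s*[A-Za-z0-9_.-]+\s*=\s*[^#\s][^#]*#")


def find_inline_comments(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs where a comment trails a key=value entry."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        if _INLINE_COMMENT.match(line):
            hits.append((lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for lineno, line in find_inline_comments(f.read()):
                print(f"{path}:{lineno}: move comment to its own line: {line}")
```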

### Step 5: Run DeepStream
```bash
# Run the pipeline
deepstream-app -c deepstream_app_config.txt

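# Or drive the pipeline from your own script built on the DeepStream Python
# bindings (pyds) for custom logic; deepstream_python_app.py is a placeholder name:
python3 deepstream_python_app.py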
```

### Step 6: Optional — Add Tracker for Person Counting
```ini
# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
```
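With the tracker enabled, each object carries a persistent ID. A running person count is therefore the number of distinct IDs seen for the person classes, not the per-frame box count. The framework-agnostic sketch below assumes you have already pulled `(track_id, class_id)` pairs out of the pipeline, for example from `obj_meta.object_id` and `obj_meta.class_id` in a pyds probe. `PersonCounter` and its `min_frames` filter are hypothetical. Tracker ID switches will still inflate the count, and raising `min_frames` trades missed short-lived persons for fewer spurious tracks.

```python
from dataclasses import dataclass, field
from typing import Iterable, Tuple

# VisDrone pedestrian (0) + people (1); use frozenset({0}) for the single-class model.
PERSON_CLASSES = frozenset({0, 1})


@dataclass
class PersonCounter:
    """Counts distinct tracker IDs of person-class objects across frames."""

    # Tracks seen in fewer frames than this are not counted (filters flicker).
    min_frames: int = 3
    _frames_seen: dict = field(default_factory=dict, init=False, repr=False)

    def update(self, frame_objects: Iterable[Tuple[int, int]]) -> None:
        """frame_objects: (track_id, class_id) pairs for one frame."""
        for track_id, class_id in frame_objects:
            if class_id in PERSON_CLASSES:
                self._frames_seen[track_id] = self._frames_seen.get(track_id, 0) + 1

    @property
    def unique_persons(self) -> int:
        return sum(1 for n in self._frames_seen.values() if n >= self.min_frames)
```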

---

## ⚙️ Confidence Threshold Guide

| Scenario | conf | imgsz | Model | Notes |
|----------|------|-------|-------|-------|
| **Max Recall** (recommended) | 0.10 | 1280 | yolov8m | Beach surveillance — miss nothing |
| **Balanced** | 0.20 | 640 | yolov8m | Good recall + precision |
| **Edge Real-time** | 0.15 | 640 | yolov8n | ~200 FPS on Jetson Orin |
| **SAHI Wide FOV** | 0.10 | 640 slices | yolov8m | Wide-angle cameras |
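Rather than picking a row from this table, you can choose `conf` from a sweep on your own labelled validation frames. Take the highest threshold that still meets a recall target; this admits as few false positives as the recall priority allows. In the sketch below, the YOLOv8m pedestrian numbers from the evaluation table serve only as example input, and `pick_conf` is an illustrative helper.

```python
from typing import Optional


def pick_conf(sweep: dict[float, float], min_recall: float) -> Optional[float]:
    """Highest conf whose measured recall is >= min_recall, or None if none qualifies."""
    ok = [conf for conf, recall in sweep.items() if recall >= min_recall]
    return max(ok) if ok else None


# Example input: YOLOv8m-VisDrone pedestrian recall by conf (evaluation table).
PEDESTRIAN_RECALL = {0.10: 0.447, 0.15: 0.456, 0.25: 0.408, 0.50: 0.278}

if __name__ == "__main__":
    print(pick_conf(PEDESTRIAN_RECALL, min_recall=0.40))  # → 0.25
    print(pick_conf(PEDESTRIAN_RECALL, min_recall=0.45))  # → 0.15
```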

---

## 📝 Production Tips

### Camera & Environment
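- **30ft (9m) elevation** is *lower* than typical VisDrone drone altitudes (60–130m), so persons usually appear *larger* than in the training data, which should help. The more oblique viewing angle still differs from VisDrone, so validate on your own footage.
- Beach lighting varies dramatically (sunrise, sunset, glare). HSV augmentation during training helps with color and brightness shifts, but strong glare and backlit scenes are worth checking on real frames.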
- For **best results**: collect 200+ labeled frames from your actual camera and fine-tune further (see below)

### Key DeepStream Settings
1. **FP16 over INT8** — INT8 quantization without careful calibration can degrade recall on small objects
2. **`interval=0`** — process every frame to avoid missing fast-moving persons
3. **`pre-cluster-threshold=0.10`** — low threshold maximizes person detection
4. **Use NvDCF tracker** — smooths detections across frames, improves effective recall

### Further Fine-tuning on Your Beach Data
```python
from ultralytics import YOLO

model = YOLO("weights/yolov8m_visdrone.pt")

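model.train(
    # A 1-class dataset replaces the 10-class head, so the result is a single-class
    # model: deploy it with the Option A (single-class) DeepStream settings.
    data="your_beach_data.yaml",  # YOLO format: images/ + labels/ with a 'person' class
    epochs=25,
    imgsz=1280,
    batch=4,
    optimizer="AdamW",  # set explicitly: with the default optimizer="auto", lr0 is ignored
    lr0=0.001,       # Low LR for fine-tuning
    freeze=10,       # Freeze the first 10 layers (the backbone); train neck+head only
    conf=0.001,      # Confidence threshold for the validation runs during training
    patience=10,
    device=0,
)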
```
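`your_beach_data.yaml` follows the standard Ultralytics dataset format. The minimal sketch below builds one for a single `person` class. It assumes a `beach_data/` root containing `images/train` and `images/val`, with matching `labels/` folders that Ultralytics finds by swapping `images` for `labels` in the path. The paths and the `dataset_yaml` helper are placeholders to adapt, and plain string formatting is used so no YAML library is needed.

```python
from pathlib import Path


def dataset_yaml(root: str, names: list[str]) -> str:
    """Ultralytics-style dataset YAML text for the given root and class names."""
    lines = [
        f"path: {Path(root).resolve()}",
        "train: images/train",
        "val: images/val",
        "names:",
        *(f"  {i}: {name}" for i, name in enumerate(names)),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = dataset_yaml("beach_data", ["person"])
    print(text)
    # Path("your_beach_data.yaml").write_text(text)  # uncomment to save
```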

---

## Training Details

### YOLOv8n-person (fine-tuned, 3.0M params)
- **Base**: [mshamrai/yolov8n-visdrone](https://huggingface.co/mshamrai/yolov8n-visdrone) (YOLOv8n trained on VisDrone 10-class)
- **Fine-tuned**: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
- **Resolution**: 640×640, Optimizer: Adam (lr=0.001)
- **Augmentation**: Mosaic, rotation ±10°, HSV jitter, vertical flip
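A person-only subset like this can be derived from the raw VisDrone annotations. Each line there is `bbox_left,bbox_top,bbox_width,bbox_height,score,object_category,truncation,occlusion`, and categories 1 and 2 are `pedestrian` and `people`. The conversion below is a sketch, not necessarily the exact script used to build the subset above. It assumes both categories are merged into class 0 and drops boxes with `score=0` (VisDrone's ignored regions), as Ultralytics' own VisDrone converter does. `visdrone_to_yolo_person` is an illustrative name.

```python
PERSON_CATEGORIES = {1, 2}  # VisDrone object_category: 1 = pedestrian, 2 = people


def visdrone_to_yolo_person(annotation_text: str, img_w: int, img_h: int) -> list[str]:
    """Convert one VisDrone-DET annotation file to YOLO label lines with a single class 0."""
    yolo_lines = []
    for raw in annotation_text.splitlines():
        raw = raw.strip().rstrip(",")
        if not raw:
            continue
        row = raw.split(",")
        left, top, w, h = map(int, row[:4])
        score, category = row[4].strip(), int(row[5])
        if score == "0" or category not in PERSON_CATEGORIES:
            continue  # ignored region or non-person object
        # YOLO format: class cx cy w h, all normalized to [0, 1].
        cx = (left + w / 2) / img_w
        cy = (top + h / 2) / img_h
        yolo_lines.append(f"0 {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return yolo_lines
```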

### YOLOv8m-VisDrone (25.9M params)
- **Source**: [mshamrai/yolov8m-visdrone](https://huggingface.co/mshamrai/yolov8m-visdrone)
- **Trained**: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
- **Usage**: Filter classes 0 (pedestrian) + 1 (people) via `classes=[0,1]` or DeepStream threshold filtering

---

## Literature & References
- [VisDrone2019-DET](https://arxiv.org/abs/2001.06303) — Aerial drone detection benchmark
- [SAHI](https://arxiv.org/abs/2202.06934) — Slicing Aided Hyper Inference (small-object AP gains on aerial imagery; the largest reported gains, 12.7–14.5% AP, include slicing-aided fine-tuning)
- [DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo) — community NVIDIA DeepStream integration for YOLO models, including YOLOv8
- [Ultralytics YOLOv8](https://docs.ultralytics.com/) — Model documentation