---
tags:
  - ultralytics
  - yolov8
  - object-detection
  - person-detection
  - beach-surveillance
  - aerial-view
  - deepstream
  - edge-deployment
  - nvidia
  - onnx
  - visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
  - banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
  - mshamrai/yolov8m-visdrone
  - mshamrai/yolov8n-visdrone
model-index:
  - name: YOLOv8m-VisDrone (person classes)
    results:
      - task:
          type: object-detection
        dataset:
          name: VisDrone2019-DET val
          type: banu4prasad/VisDrone-Dataset
        metrics:
          - type: recall
            value: 0.456
            name: Recall (pedestrian, conf=0.15)
          - type: precision
            value: 0.586
            name: Precision (pedestrian, conf=0.15)
          - type: map50
            value: 0.386
            name: mAP@50 (all 10 classes, conf=0.10)
---

# 🏖️ Beach Person Detector (YOLOv8)

Single-class **person detection** models for **beach surveillance** from a **30ft elevated camera**, ready for **NVIDIA DeepStream** edge deployment.

## 🎯 Design Priorities
1. **HIGH RECALL**: miss no persons. False detections are acceptable; missed detections are NOT.
2. **Aerial/elevated view**: trained on VisDrone drone footage, whose viewpoint resembles the 30ft camera perspective.
3. **Edge-ready**: ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.

---

## ✅ Verified Evaluation Results

All numbers measured on **VisDrone2019-DET val set** (531 images, 13,969 person boxes).

### YOLOv8n-person (single-class, fine-tuned)

| conf | Precision | Recall | mAP@50 | mAP@50-95 |
|------|-----------|--------|--------|-----------|
| 0.05 | 0.535 | **0.378** | 0.355 | 0.111 |
| 0.10 | 0.535 | **0.378** | 0.349 | 0.110 |
| 0.15 | 0.535 | **0.378** | 0.341 | 0.108 |
| 0.25 | 0.535 | **0.378** | 0.320 | 0.103 |
| 0.50 | 0.727 | 0.255 | 0.215 | 0.075 |

### YOLOv8m-VisDrone (10-class, person classes extracted)

Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).

| conf | P (pedestrian) | R (pedestrian) | P (people) | R (people) | mAP@50 (all) |
|------|----------------|----------------|------------|------------|--------------|
| 0.10 | 0.603 | **0.447** | 0.621 | **0.331** | 0.386 |
| 0.15 | 0.586 | **0.456** | 0.610 | **0.339** | 0.368 |
| 0.25 | 0.680 | 0.408 | 0.693 | 0.290 | 0.337 |
| 0.50 | 0.883 | 0.278 | 0.849 | 0.162 | 0.263 |

> **Recommendation:** Use the **YOLOv8m-VisDrone** model at **conf=0.10–0.15** for maximum recall on person classes. Its pedestrian recall of 0.447–0.456 is about **7–8 percentage points higher** than the 0.378 measured for YOLOv8n.
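
One way to sanity-check the recommended threshold is to score each row of the YOLOv8m pedestrian table with a recall-weighted F-beta metric. The sketch below uses F2 (beta = 2), which is an assumption chosen for illustration and not a metric used in the evaluation itself; the precision and recall values are copied from the table above.

```python
# (conf, precision, recall) for the pedestrian class, copied from the YOLOv8m table
ROWS = [
    (0.10, 0.603, 0.447),
    (0.15, 0.586, 0.456),
    (0.25, 0.680, 0.408),
    (0.50, 0.883, 0.278),
]


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


scores = {conf: f_beta(p, r) for conf, p, r in ROWS}
best_conf = max(scores, key=scores.get)

for conf, score in scores.items():
    print(f"conf={conf:.2f}  F2={score:.3f}")
print("best conf by F2:", best_conf)
```

Under this weighting the 0.10 and 0.15 rows score well above 0.25 and 0.50, which is consistent with the low-threshold recommendation.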

### ONNX Runtime Verification

| ONNX File | Size | Input | Output | onnx.checker | Opset |
|-----------|------|-------|--------|-------------|-------|
| `yolov8n_person_640.onnx` | 12.2 MB | 1×3×640×640 | 1×5×8400 | ✅ | 12 |
| `yolov8m_visdrone_640.onnx` | 103.6 MB | 1×3×640×640 | 1×14×8400 | ✅ | 12 |
| `yolov8m_visdrone_1280.onnx` | 104.1 MB | 1×3×1280×1280 | 1×14×33600 | ✅ | 12 |
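
The output shapes in the table follow from the YOLOv8 detection head layout: 4 box values plus one score per class, predicted at strides 8, 16 and 32. A minimal sketch of that arithmetic, assuming a square input and batch size 1 (the helper name is hypothetical):

```python
def yolov8_output_shape(imgsz: int, num_classes: int, strides=(8, 16, 32)):
    """Expected (batch, channels, anchors) of a YOLOv8 detection export at batch size 1."""
    anchors = sum((imgsz // s) ** 2 for s in strides)  # one prediction per grid cell per stride
    channels = 4 + num_classes                         # 4 box values + one score per class
    return (1, channels, anchors)


print(yolov8_output_shape(640, 1))    # (1, 5, 8400)
print(yolov8_output_shape(640, 10))   # (1, 14, 8400)
print(yolov8_output_shape(1280, 10))  # (1, 14, 33600)
```

This is handy for checking that an exported file matches the model variant you expect before building a TensorRT engine from it.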

---

## Model Variants

| Model | Params | ONNX File | Classes | Best For | ~FPS (Jetson Orin) |
|-------|--------|-----------|---------|----------|--------------------|
| **YOLOv8n-person** | 3.0M | `yolov8n_person_640.onnx` | 1 (person) | Real-time edge | ~200 FPS |
| **YOLOv8m-VisDrone** ⭐ | 25.9M | `yolov8m_visdrone_640.onnx` | 10 (filter to person) | **Best accuracy** | ~60 FPS |
| **YOLOv8m-VisDrone-1280** | 25.9M | `yolov8m_visdrone_1280.onnx` | 10 (filter to person) | Maximum recall | ~25 FPS |

---

## Quick Start

### Python (Ultralytics)
```python
from ultralytics import YOLO

# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)

# === Option B: Full VisDrone model (higher recall, recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
    "beach.jpg",
    conf=0.10,          # Low threshold for max recall
    imgsz=1280,         # Higher resolution for small persons
    max_det=300,
    classes=[0, 1],     # 0=pedestrian, 1=people
)

for r in results:
    print(f"Detected {len(r.boxes)} persons")
    r.save("result.jpg")
```

### SAHI Inference (Maximum Recall for Large/Wide Images)
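For wide-angle cameras at 30ft, SAHI sliced inference can improve small-person detection. The SAHI paper reports AP gains of roughly 5–7 points from sliced inference alone and about 12.7–14.5 points cumulatively with slicing-aided fine-tuning, measured on other detectors: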

```python
# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="weights/yolov8m_visdrone.pt",
    confidence_threshold=0.10,
    device="cuda:0",
)

result = get_sliced_prediction(
    "beach_wide_angle.jpg",
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
    perform_standard_pred=True,
    postprocess_match_threshold=0.5,
)
# Filter to person classes only:
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
```
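
Sliced inference costs one forward pass per slice, so it helps to estimate the slice count before committing to it on an edge device. The sketch below approximates the tiling for the settings above (640×640 slices, 25% overlap); it is a simplified stand-in and not SAHI's own slicing code, and the 1920×1080 frame size is an assumption.

```python
def slices_along(length: int, slice_size: int, overlap_ratio: float) -> int:
    """Count the overlapping windows needed to cover `length` pixels."""
    overlap = int(slice_size * overlap_ratio)
    count, start = 0, 0
    while True:
        count += 1
        if start + slice_size >= length:  # this window reaches the edge
            break
        start += slice_size - overlap     # next window starts inside the previous one
    return count


def slice_count(width: int, height: int, slice_size: int = 640,
                overlap_ratio: float = 0.25) -> int:
    """Total number of slices for one frame."""
    return (slices_along(width, slice_size, overlap_ratio)
            * slices_along(height, slice_size, overlap_ratio))


n = slice_count(1920, 1080)  # assumed frame size
print(n, "slices, plus 1 full-frame pass when perform_standard_pred=True")
```

For a 1920×1080 frame this gives 8 slices, so throughput drops to roughly one ninth of single-pass inference once the full-frame pass is included.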

---

## 🚀 Complete DeepStream Deployment Guide

### Prerequisites
- NVIDIA Jetson (Orin / Xavier / Nano) or dGPU with DeepStream 6.2+
- TensorRT 8.x+
- [DeepStream-Yolo plugin](https://github.com/marcoslucianops/DeepStream-Yolo)

### Step 1: Download Model Files
```bash
# Install huggingface CLI
pip install huggingface_hub

# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
    onnx/yolov8m_visdrone_640.onnx \
    config/nvinfer_config.txt \
    config/labels.txt \
    --local-dir ./beach-person-detector
```

### Step 2: Build TensorRT Engine
```bash
cd beach-person-detector

# For the recommended YOLOv8m model (best accuracy):
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
        --saveEngine=yolov8m_visdrone_640.engine \
        --fp16 \
        --workspace=4096
# Note: newer TensorRT releases deprecate --workspace in favor of --memPoolSize=workspace:4096

# ⚠️ Use FP16, NOT INT8: INT8 degrades recall on small objects

# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
        --saveEngine=yolov8n_person_640.engine \
        --fp16 \
        --workspace=4096
```

### Step 3: Install DeepStream-Yolo Plugin
```bash
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo

# Build the custom parser library
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
# Adjust CUDA_VER to match your system (check with nvcc --version)
```

### Step 4: Create DeepStream Config Files

**`config_infer_primary.txt`** (nvinfer config for the person detector):
```ini
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

# ─── Choose ONE model ───
# Option A: Single-class person model (simplest)
#onnx-file=yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt

# Option B: Full VisDrone model (higher accuracy) ← RECOMMENDED
onnx-file=yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt

batch-size=1
# Comments are kept on their own lines; trailing comments may be read as part of the value
# FP16
network-mode=2
# Process EVERY frame (critical for surveillance)
interval=0
gie-unique-id=1
# Primary detector
process-mode=1
# Detector
network-type=0
# NMS
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

# ─── HIGH RECALL SETTINGS ───
# Very low threshold to catch as many persons as possible
pre-cluster-threshold=0.10
post-cluster-threshold=0.10

[class-attrs-all]
pre-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5

# For Option B: suppress non-person classes
# Set threshold=1.0 for classes 2-9 to filter them out
[class-attrs-0]
pre-cluster-threshold=0.10

[class-attrs-1]
pre-cluster-threshold=0.10

[class-attrs-2]
pre-cluster-threshold=1.0

[class-attrs-3]
pre-cluster-threshold=1.0

[class-attrs-4]
pre-cluster-threshold=1.0

[class-attrs-5]
pre-cluster-threshold=1.0

[class-attrs-6]
pre-cluster-threshold=1.0

[class-attrs-7]
pre-cluster-threshold=1.0

[class-attrs-8]
pre-cluster-threshold=1.0

[class-attrs-9]
pre-cluster-threshold=1.0
```
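
The ten `[class-attrs-N]` sections above are repetitive, so they can be generated instead of typed by hand. A minimal sketch, assuming the VisDrone class order from `labels_visdrone.txt`; the function name is hypothetical and the output is meant to be pasted into `config_infer_primary.txt`:

```python
VISDRONE_LABELS = [
    "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor",
]
PERSON_CLASS_IDS = {0, 1}  # pedestrian, people


def class_attr_sections(person_threshold: float = 0.10) -> str:
    """Build [class-attrs-N] sections: person classes keep a low threshold,
    every other class gets 1.0 so its detections are filtered out."""
    blocks = []
    for class_id, _label in enumerate(VISDRONE_LABELS):
        threshold = person_threshold if class_id in PERSON_CLASS_IDS else 1.0
        blocks.append(
            f"[class-attrs-{class_id}]\npre-cluster-threshold={threshold:.2f}"
        )
    return "\n\n".join(blocks) + "\n"


print(class_attr_sections())
```

Changing `PERSON_CLASS_IDS` is then the only edit needed if you later want to keep another class, such as bicycles.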

**`labels_person.txt`** (for YOLOv8n single-class):
```
person
```

**`labels_visdrone.txt`** (for YOLOv8m 10-class):
```
pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor
```

**`deepstream_app_config.txt`** (main DeepStream app config):
```ini
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
# EGL window
type=2
# Async for max throughput
sync=0
gpu-id=0

[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# Set to 1 for RTSP/USB, 0 for file
live-source=1

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt
```

### Step 5: Run DeepStream
```bash
# Run the pipeline
deepstream-app -c deepstream_app_config.txt

# Or use Python bindings for custom logic:
python3 deepstream_python_app.py
```

### Step 6 (Optional): Add Tracker for Person Counting
```ini
# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
```
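
Once the tracker assigns stable IDs, counting persons reduces to counting distinct track IDs among the person classes. The sketch below shows only that counting logic on plain Python data; the `(track_id, class_id)` tuples are a hypothetical stand-in for whatever your DeepStream probe extracts from the object metadata, and no DeepStream API is used here.

```python
from typing import Iterable, List, Set, Tuple

PERSON_CLASS_IDS = {0, 1}  # pedestrian, people


def count_unique_persons(frames: Iterable[List[Tuple[int, int]]]) -> int:
    """Each frame is a list of (track_id, class_id) pairs; count distinct person tracks."""
    seen: Set[int] = set()
    for detections in frames:
        for track_id, class_id in detections:
            if class_id in PERSON_CLASS_IDS:
                seen.add(track_id)
    return len(seen)


# Hypothetical tracker output for three consecutive frames
frames = [
    [(1, 0), (2, 0), (7, 3)],  # two persons and a car
    [(1, 0), (2, 1)],          # the same two tracks, one now labeled "people"
    [(2, 0), (3, 0)],          # track 3 enters the scene
]
print(count_unique_persons(frames))  # 3
```

Counting by track ID rather than by class label keeps a person from being counted twice when the label flips between `pedestrian` and `people`.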

---

## ⚙️ Confidence Threshold Guide

| Scenario | conf | imgsz | Model | Notes |
|----------|------|-------|-------|-------|
| **Max Recall** (recommended) | 0.10 | 1280 | yolov8m | Beach surveillance: miss nothing |
| **Balanced** | 0.20 | 640 | yolov8m | Good recall + precision |
| **Edge Real-time** | 0.15 | 640 | yolov8n | ~200 FPS on Jetson Orin |
| **SAHI Wide FOV** | 0.10 | 640 slices | yolov8m | Wide-angle cameras |

---

## 📝 Production Tips

### Camera & Environment
- **30ft (9m) elevation** is *lower* than typical VisDrone drone altitudes (60–130m), so persons appear *larger* than in the training data, which should help the model generalize
- Beach lighting varies dramatically (sunrise/sunset/glare); HSV augmentation during training is meant to make the model more robust to it
- For **best results**: collect 200+ labeled frames from your actual camera and fine-tune further (see below)

### Key DeepStream Settings
1. **FP16 only**: INT8 quantization degrades recall on small objects
2. **`interval=0`**: process every frame to avoid missing fast-moving persons
3. **`pre-cluster-threshold=0.10`**: a low threshold maximizes person detection
4. **Use NvDCF tracker**: smooths detections across frames, which can improve effective recall
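
The first three settings can be checked automatically before deployment. The sketch below is a small lint built on Python's standard `configparser`, which happens to read the same `key=value` layout; it is an illustration and not part of DeepStream, and the function name and the 0.15 ceiling are assumptions. It also flags comments placed on the same line as a value, since key-file parsers generally treat those as part of the value.

```python
import configparser


def lint_nvinfer_config(text: str, max_threshold: float = 0.15) -> list:
    """Return a list of human-readable problems found in an nvinfer config."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    problems = []

    def value_of(section: str, key: str) -> str:
        raw = parser.get(section, key, fallback="")
        return raw.split("#", 1)[0].strip()  # ignore any trailing comment text

    for section in parser.sections():
        for key, raw in parser.items(section):
            if "#" in raw:
                problems.append(f"[{section}] {key}: comment on the same line as the value")

    if value_of("property", "network-mode") != "2":
        problems.append("network-mode should be 2 (FP16)")
    if value_of("property", "interval") != "0":
        problems.append("interval should be 0 (infer on every frame)")
    threshold = value_of("class-attrs-all", "pre-cluster-threshold")
    if not threshold or float(threshold) > max_threshold:
        problems.append(f"pre-cluster-threshold should be set and <= {max_threshold}")
    return problems


GOOD = "[property]\nnetwork-mode=2\ninterval=0\n\n[class-attrs-all]\npre-cluster-threshold=0.10\n"
BAD = "[property]\nnetwork-mode=1   # INT8\ninterval=2\n\n[class-attrs-all]\npre-cluster-threshold=0.50\n"

print(lint_nvinfer_config(GOOD))
for problem in lint_nvinfer_config(BAD):
    print("-", problem)
```

Running it against `config_infer_primary.txt` in CI is a cheap way to catch a threshold or precision setting that was changed during debugging and never reverted.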

### Further Fine-tuning on Your Beach Data
```python
from ultralytics import YOLO

model = YOLO("weights/yolov8m_visdrone.pt")

model.train(
    data="your_beach_data.yaml",  # YOLO format: images/ + labels/ with 'person' class
    epochs=25,
    imgsz=1280,
    batch=4,
    lr0=0.001,       # Low LR for fine-tuning
    freeze=10,       # Freeze backbone, train neck+head only
    conf=0.001,
    patience=10,
    device=0,
)
```
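
The `data` argument points at a dataset YAML in the Ultralytics format. A minimal sketch that writes one for a single `person` class; the `datasets/beach` root and the `images/train` / `images/val` folder names are hypothetical and should be adapted to your layout, with label files in matching `labels/` folders.

```python
from pathlib import Path

# Dataset description in Ultralytics YAML format; all paths are placeholders.
DATASET_YAML = """\
path: datasets/beach      # hypothetical dataset root
train: images/train       # training images, relative to path
val: images/val           # validation images, relative to path
names:
  0: person
"""


def write_dataset_yaml(target: str = "your_beach_data.yaml") -> Path:
    """Write the dataset YAML and return its path."""
    out = Path(target)
    out.write_text(DATASET_YAML, encoding="utf-8")
    return out


if __name__ == "__main__":
    print("wrote", write_dataset_yaml())
```

Because the dataset declares one class while the checkpoint has ten, expect the detection head to be re-initialized for a single output class when training starts.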

---

## Training Details

### YOLOv8n-person (fine-tuned, 3.0M params)
- **Base**: [mshamrai/yolov8n-visdrone](https://huggingface.co/mshamrai/yolov8n-visdrone) (YOLOv8n trained on VisDrone 10-class)
- **Fine-tuned**: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
- **Resolution**: 640×640, Optimizer: Adam (lr=0.001)
- **Augmentation**: Mosaic, rotation ±10°, HSV jitter, vertical flip

### YOLOv8m-VisDrone (25.9M params)
- **Source**: [mshamrai/yolov8m-visdrone](https://huggingface.co/mshamrai/yolov8m-visdrone)
- **Trained**: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
- **Usage**: Filter classes 0 (pedestrian) + 1 (people) via `classes=[0,1]` or DeepStream threshold filtering
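
When running the ONNX file directly, without Ultralytics or DeepStream, the same class filter has to be applied to the raw output tensor. The sketch below assumes the 1×14×8400 layout listed earlier (4 box values followed by 10 class scores per anchor) with the batch axis removed, and uses plain Python lists so it runs without extra packages; in practice you would likely use NumPy. The function name is hypothetical, and NMS is deliberately left out.

```python
PERSON_CLASS_IDS = (0, 1)  # pedestrian, people


def person_candidates(output, conf=0.10):
    """`output` holds 4 + num_classes rows with one value per anchor.

    Returns [(cx, cy, w, h, score), ...] for anchors whose best person-class
    score reaches `conf`. Overlapping boxes are NOT merged (no NMS here).
    """
    num_anchors = len(output[0])
    kept = []
    for i in range(num_anchors):
        # Rows 0-3 are box values, so class c lives in row 4 + c
        score = max(output[4 + c][i] for c in PERSON_CLASS_IDS)
        if score >= conf:
            cx, cy, w, h = (output[k][i] for k in range(4))
            kept.append((cx, cy, w, h, score))
    return kept


# Tiny synthetic tensor with 3 anchors instead of 8400, for illustration only
fake = [[0.0] * 3 for _ in range(14)]
fake[0][0], fake[1][0], fake[2][0], fake[3][0] = 100.0, 50.0, 20.0, 40.0
fake[4][0] = 0.6  # anchor 0: pedestrian
fake[5][1] = 0.3  # anchor 1: people
fake[7][2] = 0.9  # anchor 2: car (class 3), ignored
print(person_candidates(fake))
```

Taking the maximum over the two person classes merges `pedestrian` and `people` into one score, which matches the single-class intent of this model card.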

---

## Literature & References
- [VisDrone2019-DET](https://arxiv.org/abs/2001.06303): Aerial drone detection benchmark
- [SAHI](https://arxiv.org/abs/2202.06934): Slicing Aided Hyper Inference (reports about 12.7–14.5 AP points gained on small-object benchmarks when sliced inference is combined with slicing-aided fine-tuning)
- [DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo): NVIDIA DeepStream YOLOv8 integration
- [Ultralytics YOLOv8](https://docs.ultralytics.com/): Model documentation