tags:
- ultralytics
- yolov8
- object-detection
- person-detection
- beach-surveillance
- aerial-view
- deepstream
- edge-deployment
- nvidia
- onnx
- visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
- banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
- mshamrai/yolov8m-visdrone
- mshamrai/yolov8n-visdrone
model-index:
- name: YOLOv8m-VisDrone (person classes)
  results:
  - task:
      type: object-detection
    dataset:
      name: VisDrone2019-DET val
      type: banu4prasad/VisDrone-Dataset
    metrics:
    - type: recall
      value: 0.456
      name: Recall (pedestrian, conf=0.15)
    - type: precision
      value: 0.586
      name: Precision (pedestrian, conf=0.15)
    - type: map50
      value: 0.386
      name: mAP@50 (all 10 classes, conf=0.10)
Beach Person Detector (YOLOv8)
Single-class person detection models for beach surveillance from a 30ft elevated camera, ready for NVIDIA DeepStream edge deployment.
Design Priorities
- HIGH RECALL: miss no persons. False detections are acceptable; missed detections are NOT.
- Aerial/elevated view: trained on VisDrone drone footage matching the 30ft camera perspective.
- Edge-ready: ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.
Verified Evaluation Results
All numbers measured on VisDrone2019-DET val set (531 images, 13,969 person boxes).
YOLOv8n-person (single-class, fine-tuned)
| conf | Precision | Recall | mAP@50 | mAP@50-95 |
|---|---|---|---|---|
| 0.05 | 0.535 | 0.378 | 0.355 | 0.111 |
| 0.10 | 0.535 | 0.378 | 0.349 | 0.110 |
| 0.15 | 0.535 | 0.378 | 0.341 | 0.108 |
| 0.25 | 0.535 | 0.378 | 0.320 | 0.103 |
| 0.50 | 0.727 | 0.255 | 0.215 | 0.075 |
YOLOv8m-VisDrone (10-class, person classes extracted)
Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).
| conf | P (pedestrian) | R (pedestrian) | P (people) | R (people) | mAP@50 (all) |
|---|---|---|---|---|---|
| 0.10 | 0.603 | 0.447 | 0.621 | 0.331 | 0.386 |
| 0.15 | 0.586 | 0.456 | 0.610 | 0.339 | 0.368 |
| 0.25 | 0.680 | 0.408 | 0.693 | 0.290 | 0.337 |
| 0.50 | 0.883 | 0.278 | 0.849 | 0.162 | 0.263 |
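Recommendation: Use the YOLOv8m-VisDrone model at conf=0.10–0.15 for maximum recall on person classes. Its pedestrian recall (0.447–0.456) is roughly 7 to 8 percentage points higher than the YOLOv8n-person recall (0.378) at the same thresholds. The two models were evaluated on slightly different subsets of the val set (531 vs. 548 images), so the comparison is approximate.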
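One way to make the recall-first trade-off explicit is to score each threshold with an F-beta measure that weights recall more heavily than precision. The sketch below applies F2 (beta = 2) to the pedestrian columns of the YOLOv8m-VisDrone table above. The choice of F2 and the helper name `f_beta` are illustrative assumptions, not part of the original evaluation.

```python
# Pedestrian precision/recall per confidence threshold, copied from the
# YOLOv8m-VisDrone table above.
PEDESTRIAN_PR = {
    0.10: (0.603, 0.447),
    0.15: (0.586, 0.456),
    0.25: (0.680, 0.408),
    0.50: (0.883, 0.278),
}


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Score every threshold and pick the one with the highest F2.
scores = {conf: f_beta(p, r) for conf, (p, r) in PEDESTRIAN_PR.items()}
best_conf = max(scores, key=scores.get)

for conf, score in sorted(scores.items()):
    print(f"conf={conf:.2f}  F2={score:.3f}")
print("best threshold by F2:", best_conf)
```

Under this assumed weighting the two low thresholds score close together and conf=0.50 falls well behind, which is consistent with the recommendation above.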
ONNX Runtime Verification
| ONNX File | Size | Input | Output | onnx.checker | Opset |
|---|---|---|---|---|---|
| yolov8n_person_640.onnx | 12.2 MB | 1×3×640×640 | 1×5×8400 | ✓ | 12 |
| yolov8m_visdrone_640.onnx | 103.6 MB | 1×3×640×640 | 1×14×8400 | ✓ | 12 |
| yolov8m_visdrone_1280.onnx | 104.1 MB | 1×3×1280×1280 | 1×14×33600 | ✓ | 12 |
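The output shapes in the table follow the usual YOLOv8 detection-head layout: 4 box values (cx, cy, w, h) plus one score per class, for every prediction point on the stride-8, 16, and 32 grids. The NumPy sketch below illustrates that layout, assuming an unmodified Ultralytics ONNX export. The function names are illustrative, and NMS is deliberately left out.

```python
import numpy as np


def num_anchors(imgsz, strides=(8, 16, 32)):
    """Number of prediction points for a square input of side imgsz."""
    return sum((imgsz // s) ** 2 for s in strides)


def decode_persons(output, conf_thres=0.10, person_ids=(0, 1)):
    """Turn a raw (1, 4 + nc, N) tensor into person boxes in xyxy format.

    Returns (boxes, scores, class_ids). No NMS is applied here.
    """
    preds = output[0].T                      # (N, 4 + nc)
    boxes_cxcywh, class_scores = preds[:, :4], preds[:, 4:]
    # Drop IDs the model does not have (the 1-class model only has ID 0).
    ids = [i for i in person_ids if i < class_scores.shape[1]]
    person_scores = class_scores[:, ids]     # keep person columns only
    best = person_scores.argmax(axis=1)
    scores = person_scores[np.arange(len(best)), best]
    keep = scores >= conf_thres
    cx, cy, w, h = boxes_cxcywh[keep].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, scores[keep], np.asarray(ids)[best[keep]]


# Synthetic check: one confident "people" (class 1) detection at point 0.
fake = np.zeros((1, 14, num_anchors(640)), dtype=np.float32)
fake[0, :4, 0] = [320, 320, 40, 80]          # cx, cy, w, h in pixels
fake[0, 4 + 1, 0] = 0.9                      # score for class 1
boxes, scores, class_ids = decode_persons(fake)
print(num_anchors(640), num_anchors(1280), boxes, class_ids)
```

The same function accepts the 1×5×8400 output of the single-class model, since class IDs that the model does not have are ignored.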
Model Variants
| Model | Params | ONNX File | Classes | Best For | ~FPS (Jetson Orin) |
|---|---|---|---|---|---|
| YOLOv8n-person | 3.0M | yolov8n_person_640.onnx | 1 (person) | Real-time edge | ~200 FPS |
| YOLOv8m-VisDrone (recommended) | 25.9M | yolov8m_visdrone_640.onnx | 10 (filter to person) | Best accuracy | ~60 FPS |
| YOLOv8m-VisDrone-1280 | 25.9M | yolov8m_visdrone_1280.onnx | 10 (filter to person) | Maximum recall | ~25 FPS |
Quick Start
Python (Ultralytics)
from ultralytics import YOLO
# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)
# === Option B: Full VisDrone model (higher recall, recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
"beach.jpg",
conf=0.10, # Low threshold for max recall
imgsz=1280, # Higher resolution for small persons
max_det=300,
classes=[0, 1], # 0=pedestrian, 1=people
)
for r in results:
print(f"Detected {len(r.boxes)} persons")
r.save("result.jpg")
SAHI Inference (Maximum Recall for Large/Wide Images)
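For wide-angle cameras at 30ft, SAHI sliced inference is reported to add +12–14% AP for small objects. That figure comes from the SAHI literature listed under References, not from the evaluation above: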
# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction
model = AutoDetectionModel.from_pretrained(
model_type="ultralytics",
model_path="weights/yolov8m_visdrone.pt",
confidence_threshold=0.10,
device="cuda:0",
)
result = get_sliced_prediction(
"beach_wide_angle.jpg",
model,
slice_height=640,
slice_width=640,
overlap_height_ratio=0.25,
overlap_width_ratio=0.25,
perform_standard_pred=True,
postprocess_match_threshold=0.5,
)
# Filter to person classes only:
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
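Sliced inference multiplies the number of forward passes per frame, which matters when budgeting throughput. The helper below estimates the slice grid for the settings above. It is a simplified re-implementation for illustration, not SAHI's own code, and the 1920×1080 frame size is an assumed example.

```python
def slice_starts(length, slice_size, overlap_ratio):
    """Start offsets of slices along one axis; the last slice is clamped to the edge."""
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    if length <= slice_size:
        return [0]
    step = slice_size - int(slice_size * overlap_ratio)
    starts, pos = [], 0
    while True:
        if pos + slice_size >= length:
            starts.append(length - slice_size)  # shift the final slice inward
            break
        starts.append(pos)
        pos += step
    return starts


def count_slices(img_w, img_h, slice_w=640, slice_h=640,
                 overlap_w_ratio=0.25, overlap_h_ratio=0.25):
    """Number of slices for one image with the given slicing settings."""
    cols = slice_starts(img_w, slice_w, overlap_w_ratio)
    rows = slice_starts(img_h, slice_h, overlap_h_ratio)
    return len(cols) * len(rows)


n_slices = count_slices(1920, 1080)
# perform_standard_pred=True adds one full-frame pass on top of the slices.
print(f"{n_slices} slices + 1 full-frame pass = {n_slices + 1} forward passes")
```

Dividing the single-image FPS of the chosen model by this number gives a rough upper bound on sliced throughput, before any batching.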
Complete DeepStream Deployment Guide
Prerequisites
- NVIDIA Jetson (Orin / Xavier / Nano) or dGPU with DeepStream 6.2+
- TensorRT 8.x+
- DeepStream-Yolo plugin
Step 1: Download Model Files
# Install huggingface CLI
pip install huggingface_hub
# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
onnx/yolov8m_visdrone_640.onnx \
config/nvinfer_config.txt \
config/labels.txt \
--local-dir ./beach-person-detector
Step 2: Build TensorRT Engine
cd beach-person-detector
# For the recommended YOLOv8m model (best accuracy):
# (Newer TensorRT releases deprecate --workspace in favor of --memPoolSize=workspace:4096.)
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
--saveEngine=yolov8m_visdrone_640.engine \
--fp16 \
--workspace=4096
# Use FP16, NOT INT8: INT8 degrades recall on small objects
# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
--saveEngine=yolov8n_person_640.engine \
--fp16 \
--workspace=4096
Step 3: Install DeepStream-Yolo Plugin
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo
# Build the custom parser library
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
# Adjust CUDA_VER to match your system (check with nvcc --version)
Step 4: Create DeepStream Config Files
config_infer_primary.txt (nvinfer config for the person detector):
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
# --- Choose ONE model ---
# Option A: Single-class person model (simplest)
#onnx-file=yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt
# Option B: Full VisDrone model (higher accuracy), RECOMMENDED
onnx-file=yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt
batch-size=1
# Comments are kept on their own lines: the key-file format does not
# strip trailing "# ..." text from a value.
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
# interval=0: process EVERY frame (critical for surveillance)
interval=0
gie-unique-id=1
# process-mode=1: primary detector
process-mode=1
# network-type=0: detector
network-type=0
# cluster-mode=2: NMS
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# --- HIGH RECALL SETTINGS ---
# Thresholds are per-class keys, so they belong under [class-attrs-*], not [property].
[class-attrs-all]
# Very low threshold to catch all persons
pre-cluster-threshold=0.10
post-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5
# For Option B: suppress non-person classes
# Set pre-cluster-threshold=1.0 for classes 2-9 to filter them out
[class-attrs-0]
pre-cluster-threshold=0.10
[class-attrs-1]
pre-cluster-threshold=0.10
[class-attrs-2]
pre-cluster-threshold=1.0
[class-attrs-3]
pre-cluster-threshold=1.0
[class-attrs-4]
pre-cluster-threshold=1.0
[class-attrs-5]
pre-cluster-threshold=1.0
[class-attrs-6]
pre-cluster-threshold=1.0
[class-attrs-7]
pre-cluster-threshold=1.0
[class-attrs-8]
pre-cluster-threshold=1.0
[class-attrs-9]
pre-cluster-threshold=1.0
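The per-class sections above are repetitive, so they can be generated rather than typed by hand. The small sketch below emits the same `[class-attrs-N]` blocks for any set of classes to keep. The function name and its defaults are illustrative, not part of DeepStream or of this repository.

```python
def class_attrs_sections(num_classes=10, keep_ids=(0, 1),
                         keep_threshold=0.10, suppress_threshold=1.0):
    """Build [class-attrs-N] sections: low threshold for kept classes,
    threshold 1.0 for every other class so its detections are dropped."""
    lines = []
    for class_id in range(num_classes):
        threshold = keep_threshold if class_id in keep_ids else suppress_threshold
        lines.append(f"[class-attrs-{class_id}]")
        lines.append(f"pre-cluster-threshold={threshold:.2f}")
    return "\n".join(lines) + "\n"


config_fragment = class_attrs_sections()
print(config_fragment)
```

The printed text can be pasted at the end of config_infer_primary.txt in place of the hand-written sections.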
labels_person.txt (for YOLOv8n single-class):
person
labels_visdrone.txt (for YOLOv8m 10-class):
pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor
deepstream_app_config.txt (main DeepStream app config):
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
[sink0]
enable=1
# type=2: EGL window
type=2
# sync=0: render asynchronously for max throughput
sync=0
gpu-id=0
[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1
[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# live-source: 1 for RTSP/USB, 0 for file
live-source=1
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt
Step 5: Run DeepStream
# Run the pipeline
deepstream-app -c deepstream_app_config.txt
# Or use Python bindings for custom logic:
python3 deepstream_python_app.py
Step 6 (Optional): Add Tracker for Person Counting
# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
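With the tracker enabled, each detection carries a persistent ID, so counting persons reduces to counting distinct IDs rather than summing per-frame detections. The sketch below works on plain Python data; reading the IDs out of DeepStream metadata is not shown. The `min_frames` filter is an assumed heuristic for discarding short-lived false tracks, not something the tracker config provides.

```python
from collections import Counter


def count_unique_persons(frames, min_frames=3):
    """Count tracker IDs that appear in at least min_frames frames.

    frames: iterable of per-frame lists of tracker IDs.
    """
    frames_seen = Counter()
    for ids in frames:
        frames_seen.update(set(ids))  # count each ID at most once per frame
    return sum(1 for n in frames_seen.values() if n >= min_frames)


# Hypothetical tracker output for six consecutive frames.
frames = [[1, 2], [1, 2, 7], [1, 2], [1, 3], [1, 3], [3]]
print("persons:", count_unique_persons(frames))
print("persons (no filtering):", count_unique_persons(frames, min_frames=1))
```

Given the recall-first priority of this model, `min_frames=1` keeps every track; raising it trades some recall for fewer spurious counts.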
Confidence Threshold Guide
| Scenario | conf | imgsz | Model | Notes |
|---|---|---|---|---|
| Max Recall (recommended) | 0.10 | 1280 | yolov8m | Beach surveillance: miss nothing |
| Balanced | 0.20 | 640 | yolov8m | Good recall + precision |
| Edge Real-time | 0.15 | 640 | yolov8n | ~200 FPS on Jetson Orin |
| SAHI Wide FOV | 0.10 | 640 slices | yolov8m | Wide-angle cameras |
Production Tips
Camera & Environment
- 30ft (9m) elevation is lower than typical VisDrone drone altitudes (60–130m), so persons appear larger than in the training data, which should help generalization
- Beach lighting varies dramatically (sunrise/sunset/glare); HSV augmentation during training is intended to make the model robust to this
- For best results: collect 200+ labeled frames from your actual camera and fine-tune further (see below)
Key DeepStream Settings
- FP16 only: INT8 quantization degrades recall on small objects
- `interval=0`: process every frame to avoid missing fast-moving persons
- `pre-cluster-threshold=0.10`: low threshold maximizes person detection
- Use NvDCF tracker: smooths detections across frames, improves effective recall
Further Fine-tuning on Your Beach Data
from ultralytics import YOLO
model = YOLO("weights/yolov8m_visdrone.pt")
model.train(
data="your_beach_data.yaml", # YOLO format: images/ + labels/ with 'person' class
epochs=25,
imgsz=1280,
batch=4,
lr0=0.001, # Low LR for fine-tuning
freeze=10, # Freeze backbone, train neck+head only
conf=0.001,
patience=10,
device=0,
)
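The `data` argument points at a dataset description file. The sketch below writes one in the layout Ultralytics documents for detection datasets. The `datasets/beach` root and the `images/train` and `images/val` directory names are placeholders, not paths from this repository.

```python
from pathlib import Path


def write_dataset_yaml(out_path, root, class_names=("person",),
                       train_dir="images/train", val_dir="images/val"):
    """Write a minimal Ultralytics-style dataset YAML and return its text."""
    lines = [
        f"path: {root}",        # dataset root directory
        f"train: {train_dir}",  # training images, relative to path
        f"val: {val_dir}",      # validation images, relative to path
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    Path(out_path).write_text(text)
    return text


yaml_text = write_dataset_yaml("your_beach_data.yaml", "datasets/beach")
print(yaml_text)
```

Label files are expected in matching `labels/train` and `labels/val` directories, one `.txt` file per image, using class ID 0 for person.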
Training Details
YOLOv8n-person (fine-tuned, 3.0M params)
- Base: mshamrai/yolov8n-visdrone (YOLOv8n trained on VisDrone 10-class)
- Fine-tuned: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
- Resolution: 640×640, Optimizer: Adam (lr=0.001)
- Augmentation: Mosaic, rotation ±10°, HSV jitter, vertical flip
YOLOv8m-VisDrone (25.9M params)
- Source: mshamrai/yolov8m-visdrone
- Trained: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
- Usage: Filter classes 0 (pedestrian) + 1 (people) via `classes=[0,1]` or DeepStream threshold filtering
Literature & References
- VisDrone2019-DET: Aerial drone detection benchmark
- SAHI: Slicing Aided Hyper Inference (+12–14% AP for small objects)
- DeepStream-Yolo: NVIDIA DeepStream YOLOv8 integration
- Ultralytics YOLOv8: Model documentation