---
tags:
- ultralytics
- yolov8
- object-detection
- person-detection
- beach-surveillance
- aerial-view
- deepstream
- edge-deployment
- nvidia
- onnx
- visdrone
library_name: ultralytics
license: agpl-3.0
datasets:
- banu4prasad/VisDrone-Dataset
pipeline_tag: object-detection
base_model:
- mshamrai/yolov8m-visdrone
- mshamrai/yolov8n-visdrone
model-index:
- name: YOLOv8m-VisDrone (person classes)
  results:
  - task:
      type: object-detection
    dataset:
      name: VisDrone2019-DET val
      type: banu4prasad/VisDrone-Dataset
    metrics:
    - type: recall
      value: 0.456
      name: Recall (pedestrian, conf=0.15)
    - type: precision
      value: 0.586
      name: Precision (pedestrian, conf=0.15)
    - type: map50
      value: 0.386
      name: mAP@50 (all 10 classes, conf=0.10)
---

# 🏖️ Beach Person Detector (YOLOv8)

Single-class **person detection** models for **beach surveillance** from a **30ft elevated camera**, ready for **NVIDIA DeepStream** edge deployment.

## 🎯 Design Priorities

1. **HIGH RECALL**: miss no persons. False detections are acceptable; missed detections are NOT.
2. **Aerial/elevated view**: trained on VisDrone drone footage matching the 30ft camera perspective.
3. **Edge-ready**: ONNX exports optimized for TensorRT/DeepStream on Jetson and dGPU.

---

## ✅ Verified Evaluation Results

All numbers measured on **VisDrone2019-DET val set** (531 images, 13,969 person boxes).

### YOLOv8n-person (single-class, fine-tuned)

| conf | Precision | Recall | mAP@50 | mAP@50-95 |
|------|-----------|--------|--------|-----------|
| 0.05 | 0.535 | **0.378** | 0.355 | 0.111 |
| 0.10 | 0.535 | **0.378** | 0.349 | 0.110 |
| 0.15 | 0.535 | **0.378** | 0.341 | 0.108 |
| 0.25 | 0.535 | **0.378** | 0.320 | 0.103 |
| 0.50 | 0.727 | 0.255 | 0.215 | 0.075 |

### YOLOv8m-VisDrone (10-class, person classes extracted)

Evaluated on full VisDrone val (548 images, 38,759 boxes across 10 classes).

| conf | P (pedestrian) | R (pedestrian) | P (people) | R (people) | mAP@50 (all) |
|------|----------------|----------------|------------|------------|--------------|
| 0.10 | 0.603 | **0.447** | 0.621 | **0.331** | 0.386 |
| 0.15 | 0.586 | **0.456** | 0.610 | **0.339** | 0.368 |
| 0.25 | 0.680 | 0.408 | 0.693 | 0.290 | 0.337 |
| 0.50 | 0.883 | 0.278 | 0.849 | 0.162 | 0.263 |

> **Recommendation:** Use the **YOLOv8m-VisDrone** model at **conf=0.10–0.15** for maximum recall on person classes. Its pedestrian recall (0.447–0.456) is roughly **7–8 percentage points higher** than the YOLOv8n-person recall (0.378) at the same thresholds.

### ONNX Runtime Verification

| ONNX File | Size | Input | Output | onnx.checker | Opset |
|-----------|------|-------|--------|-------------|-------|
| `yolov8n_person_640.onnx` | 12.2 MB | 1×3×640×640 | 1×5×8400 | ✅ | 12 |
| `yolov8m_visdrone_640.onnx` | 103.6 MB | 1×3×640×640 | 1×14×8400 | ✅ | 12 |
| `yolov8m_visdrone_1280.onnx` | 104.1 MB | 1×3×1280×1280 | 1×14×33600 | ✅ | 12 |

---

## Model Variants

| Model | Params | ONNX File | Classes | Best For | ~FPS (Jetson Orin) |
|-------|--------|-----------|---------|----------|--------------------|
| **YOLOv8n-person** | 3.0M | `yolov8n_person_640.onnx` | 1 (person) | Real-time edge | ~200 FPS |
| **YOLOv8m-VisDrone** ⭐ | 25.9M | `yolov8m_visdrone_640.onnx` | 10 (filter to person) | **Best accuracy** | ~60 FPS |
| **YOLOv8m-VisDrone-1280** | 25.9M | `yolov8m_visdrone_1280.onnx` | 10 (filter to person) | Maximum recall | ~25 FPS |

---

## Quick Start

### Python (Ultralytics)

```python
from ultralytics import YOLO

# === Option A: Single-class person model (simpler) ===
model = YOLO("weights/yolov8n_person.pt")
results = model.predict("beach.jpg", conf=0.15, imgsz=640, max_det=300)

# === Option B: Full VisDrone model (higher recall, recommended) ===
model = YOLO("weights/yolov8m_visdrone.pt")
results = model.predict(
    "beach.jpg",
    conf=0.10,       # Low threshold for max recall
    imgsz=1280,      # Higher resolution for small persons
    max_det=300,
    classes=[0, 1],  # 0=pedestrian, 1=people
)

for r in results:
    print(f"Detected {len(r.boxes)} persons")
    r.save("result.jpg")
```

### SAHI Inference (Maximum Recall for Large/Wide Images)

For wide-angle cameras at 30ft, SAHI sliced inference helps recover small persons that a single full-frame pass can miss. The SAHI paper reports AP gains of about 5–7 points from sliced inference alone, and 12.7–14.5 points cumulatively when combined with slicing-aided fine-tuning (measured with FCOS, VFNet, and TOOD detectors, not with YOLOv8):

```python
# pip install sahi ultralytics
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="weights/yolov8m_visdrone.pt",
    confidence_threshold=0.10,
    device="cuda:0",
)

result = get_sliced_prediction(
    "beach_wide_angle.jpg",
    model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
    perform_standard_pred=True,
    postprocess_match_threshold=0.5,
)

# Filter to person classes only (0=pedestrian, 1=people):
person_preds = [p for p in result.object_prediction_list if p.category.id in [0, 1]]
print(f"Detected {len(person_preds)} persons via SAHI")
```

---

## 🚀 Complete DeepStream Deployment Guide

### Prerequisites

- NVIDIA Jetson (Orin / Xavier / Nano) or dGPU with DeepStream 6.2+
- TensorRT 8.x+
- [DeepStream-Yolo plugin](https://github.com/marcoslucianops/DeepStream-Yolo)

### Step 1: Download Model Files

```bash
# Install the Hugging Face Hub CLI
pip install huggingface_hub

# Download the model you want
huggingface-cli download Shashank022002/beach-person-detector-yolov8m \
    onnx/yolov8m_visdrone_640.onnx \
    config/nvinfer_config.txt \
    config/labels.txt \
    --local-dir ./beach-person-detector
```

### Step 2: Build TensorRT Engine

```bash
cd beach-person-detector

# For the recommended YOLOv8m model (best accuracy):
trtexec --onnx=onnx/yolov8m_visdrone_640.onnx \
    --saveEngine=yolov8m_visdrone_640.engine \
    --fp16 \
    --workspace=4096
# ⚠️ Use FP16, NOT INT8: INT8 degrades recall on small objects
# Note: --workspace is deprecated in newer TensorRT releases; if trtexec
# rejects it, use --memPoolSize=workspace:4096 instead.

# For the lightweight YOLOv8n model (fastest):
trtexec --onnx=onnx/yolov8n_person_640.onnx \
    --saveEngine=yolov8n_person_640.engine \
    --fp16 \
    --workspace=4096
```

### Step 3: Install DeepStream-Yolo Plugin

```bash
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo

# Build the custom parser library
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
# Adjust CUDA_VER to match your system (check with nvcc --version)
```

### Step 4: Create DeepStream Config Files

**`config_infer_primary.txt`**: nvinfer config for the person detector. Comments sit on their own lines because the key-file format used by these configs does not strip trailing `#` comments from values.

```ini
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

# ─── Choose ONE model ───
# Option A: Single-class person model (simplest)
#onnx-file=yolov8n_person_640.onnx
#model-engine-file=yolov8n_person_640.engine
#num-detected-classes=1
#labelfile-path=labels_person.txt

# Option B: Full VisDrone model (higher accuracy), RECOMMENDED
onnx-file=yolov8m_visdrone_640.onnx
model-engine-file=yolov8m_visdrone_640.engine
num-detected-classes=10
labelfile-path=labels_visdrone.txt

batch-size=1
# network-mode=2 selects FP16
network-mode=2
# interval=0 processes EVERY frame (critical for surveillance)
interval=0
gie-unique-id=1
# process-mode=1 selects primary detector
process-mode=1
# network-type=0 selects detector
network-type=0
# cluster-mode=2 selects NMS
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

# ─── HIGH RECALL SETTINGS ───
# Thresholds are per-class attributes, so they live in the class-attrs groups.
# A very low threshold catches as many persons as possible.
[class-attrs-all]
pre-cluster-threshold=0.10
post-cluster-threshold=0.10
topk=300
nms-iou-threshold=0.5

# For Option B: suppress non-person classes
# Set pre-cluster-threshold=1.0 for classes 2-9 to filter them out
[class-attrs-0]
pre-cluster-threshold=0.10

[class-attrs-1]
pre-cluster-threshold=0.10

[class-attrs-2]
pre-cluster-threshold=1.0

[class-attrs-3]
pre-cluster-threshold=1.0

[class-attrs-4]
pre-cluster-threshold=1.0

[class-attrs-5]
pre-cluster-threshold=1.0

[class-attrs-6]
pre-cluster-threshold=1.0

[class-attrs-7]
pre-cluster-threshold=1.0

[class-attrs-8]
pre-cluster-threshold=1.0

[class-attrs-9]
pre-cluster-threshold=1.0
```

**`labels_person.txt`** (for YOLOv8n
single-class):

```
person
```

**`labels_visdrone.txt`** (for YOLOv8m 10-class):

```
pedestrian
people
bicycle
car
van
truck
tricycle
awning-tricycle
bus
motor
```

**`deepstream_app_config.txt`**: main DeepStream app config. As in the nvinfer config, comments are kept on their own lines.

```ini
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

[source0]
enable=1
# For RTSP camera:
type=4
uri=rtsp://your_camera_ip:554/stream
# For USB camera:
#type=1
#camera-v4l2-dev-node=0
# For file:
#type=3
#uri=file:///path/to/beach_video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
# type=2 selects the EGL window sink
type=2
# sync=0 renders asynchronously for max throughput
sync=0
gpu-id=0

[osd]
enable=1
text-size=15
border-width=2
border-color=0;1;0;1

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
width=1280
height=720
enable-padding=0
# Set live-source to 1 for RTSP/USB, 0 for file
live-source=1

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt
```

### Step 5: Run DeepStream

```bash
# Run the pipeline
deepstream-app -c deepstream_app_config.txt

# Or use Python bindings for custom logic:
python3 deepstream_python_app.py
```

### Step 6 (Optional): Add Tracker for Person Counting

```ini
# Add to deepstream_app_config.txt for tracking across frames:
[tracker]
enable=1
tracker-width=640
tracker-height=480
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
```

---

## ⚙️ Confidence Threshold Guide

| Scenario | conf | imgsz | Model | Notes |
|----------|------|-------|-------|-------|
| **Max Recall** (recommended) | 0.10 | 1280 | yolov8m | Beach surveillance: miss nothing |
| **Balanced** | 0.20 | 640 | yolov8m | Good recall + precision |
| **Edge Real-time** | 0.15 | 640 | yolov8n | ~200 FPS on Jetson Orin |
| **SAHI Wide FOV** | 0.10 | 640 slices | yolov8m | Wide-angle cameras |

---

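
One way to sanity-check the thresholds in the guide above is to score each evaluated operating point with a recall-weighted F-beta measure. The sketch below is an illustration, not part of the original evaluation: the choice of F2 (recall weighted more heavily than precision) and the helper names `f_beta` and `rank_thresholds` are assumptions, while the precision/recall pairs are the YOLOv8m-VisDrone pedestrian values from the evaluation table.

```python
# Precision/recall pairs copied from the YOLOv8m-VisDrone pedestrian columns
# of the evaluation table: conf -> (precision, recall).
PEDESTRIAN_PR = {
    0.10: (0.603, 0.447),
    0.15: (0.586, 0.456),
    0.25: (0.680, 0.408),
    0.50: (0.883, 0.278),
}


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def rank_thresholds(pr_by_conf, beta=2.0):
    """Return (conf, score) pairs sorted from best to worst F-beta."""
    scored = [(conf, f_beta(p, r, beta)) for conf, (p, r) in pr_by_conf.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # beta=2 is an assumed weighting, chosen only to reflect the
    # "missed detections are worse than false alarms" priority.
    for conf, score in rank_thresholds(PEDESTRIAN_PR, beta=2.0):
        print(f"conf={conf:.2f}  F2={score:.3f}")
```

With these values, F2 ranks conf=0.15 first and conf=0.10 a close second, which is consistent with the recommended 0.10–0.15 range. Raising `beta` moves the ranking further toward pure recall.
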
## 📝 Production Tips

### Camera & Environment

- **30ft (9m) elevation** is *lower* than typical VisDrone drone altitudes (60–130m), so persons appear *larger* than in the training data, which should work in the detector's favor
- Beach lighting varies dramatically (sunrise, sunset, glare); HSV augmentation during training is meant to make the model more robust to these shifts
- For **best results**: collect 200+ labeled frames from your actual camera and fine-tune further (see below)

### Key DeepStream Settings

1. **FP16 only**: INT8 quantization degrades recall on small objects
2. **`interval=0`**: process every frame to avoid missing fast-moving persons
3. **`pre-cluster-threshold=0.10`**: low threshold maximizes person detection
4. **Use NvDCF tracker**: smooths detections across frames, improves effective recall

### Further Fine-tuning on Your Beach Data

```python
from ultralytics import YOLO

model = YOLO("weights/yolov8m_visdrone.pt")
model.train(
    data="your_beach_data.yaml",  # YOLO format: images/ + labels/ with 'person' class
    epochs=25,
    imgsz=1280,
    batch=4,
    lr0=0.001,    # Low LR for fine-tuning
    freeze=10,    # Freeze backbone, train neck+head only
    conf=0.001,
    patience=10,
    device=0,
)
```

---

## Training Details

### YOLOv8n-person (fine-tuned, 3.0M params)

- **Base**: [mshamrai/yolov8n-visdrone](https://huggingface.co/mshamrai/yolov8n-visdrone) (YOLOv8n trained on VisDrone 10-class)
- **Fine-tuned**: Head-only (backbone frozen) on person-only VisDrone subset (5,684 images, 106K boxes)
- **Resolution**: 640×640
- **Optimizer**: Adam (lr=0.001)
- **Augmentation**: Mosaic, rotation ±10°, HSV jitter, vertical flip

### YOLOv8m-VisDrone (25.9M params)

- **Source**: [mshamrai/yolov8m-visdrone](https://huggingface.co/mshamrai/yolov8m-visdrone)
- **Trained**: On full VisDrone2019-DET (6,471 images, 10 classes, 343K boxes)
- **Usage**: Filter classes 0 (pedestrian) + 1 (people) via `classes=[0,1]` or DeepStream threshold filtering

---

## Literature & References

- [VisDrone2019-DET](https://arxiv.org/abs/2001.06303): Aerial drone detection benchmark
- [SAHI](https://arxiv.org/abs/2202.06934): Slicing Aided Hyper Inference (paper reports about +5–7 AP from sliced inference alone, +12.7–14.5 AP cumulatively with slicing-aided fine-tuning)
- [DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo): NVIDIA DeepStream YOLOv8 integration
- [Ultralytics YOLOv8](https://docs.ultralytics.com/): Model documentation