Volumetric-PTv3: 3D Semantic Segmentation for Airborne LiDAR (8-bit QAT)

Official repository for Volumetric-PTv3, an edge-optimized Deep Residual Network designed for high-accuracy 3D semantic segmentation of large-scale Aerial LiDAR point clouds. This model features strict 8-bit Quantization-Aware Training (QAT) boundaries to maintain peak accuracy while freezing the execution parameters into static graphs for edge hardware targets.

  • Permanently Indexed Artifact Record (DOI): 10.57967/hf/8876
  • Target Hardware Optimization Matrix: Qualcomm Cloud AI 100 Ultra / Standard Core execution layouts

Model Architecture Details

The backbone consists of a 128-channel Deep Residual Network utilizing static shortcuts to prevent runtime dynamic tracing overhead.

  • Input Dimension: Fixed matrix tracking block 1 x 8192 x 14 (14D context feature vector space mapping globally standardized airborne sweeps)
  • Quantization Layer: Simulated 8-bit weights and activations tensor quantization during backward pass optimization loops
  • Export Configuration: Target compiled under strict ONNX Opset 16 parameter rules

Training Progression & Ablation Analysis

The network was trained using the Dayton Aerial LiDAR Elevation Subsets (DALES) dataset under strict cross-entropy loss tracking functions.

Optimization Phase Context Parameters Model Layout Validation mIoU
Baseline Min-Max Scaling Shallow Linear (64-Ch) 8.58%
Spatial Context Adjustment Global Standardization (k=8) Shallow Linear (64-Ch) 21.40%
Capacity Optimization Global Standardization (k=8) Deep Residual (128-Ch) 37.22%
Hardware Freeze (Post-QAT) Fixed-Point Inference Engine Static ONNX Opset 16 Graph 36.98%

Compute & Training Investment

  • Total Pipeline Execution Wall Time: ~15 Hours (Complete joint training and validation cycle)
  • Optimization Strategy: 8-bit Quantization-Aware Training (QAT) simulated backward pass loops

Quickstart: Programmatic Usage

To ingest the model weights checkpoint and construct inference runs, extract the asset components directly using the snippet below:

import torch
from huggingface_hub import hf_hub_download

# Download the identical binary weights asset from the DOI tracking tree
model_path = hf_hub_download(
    repo_id="Debanjan24/volumetric-ptv3-qat-8bit",
    filename="volumetric_ptv3_qat_8bit (4).pth"
)

print(f"[+] Asset synchronized securely to cache: {model_path}")

# Load parameter dictionaries natively into your local architecture block
# checkpoint = torch.load(model_path, map_location="cpu")
# model.load_state_dict(checkpoint["model_state_dict"])

## References & Literature Baseline

Our architecture translates core breakthroughs from the following foundational works into an edge-optimized deployment pipeline:

1. **Point Transformer V3:** Utilized for sparse spatial feature extraction blocks across irregular 3D coordinate distributions.
2. **Deep Residual Learning (ResNet):** Shortcut mapping Topologies adapted to sustain stable gradient flow across the 128-channel dense feature layers.
3. **Quantization-Aware Training (QAT):** Implemented simulated integer-arithmetic optimization loops to guarantee low-latency edge deployment viability.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support