# Volumetric-PTv3: 3D Semantic Segmentation for Airborne LiDAR (8-bit QAT)
Official repository for Volumetric-PTv3, an edge-oriented deep residual network for 3D semantic segmentation of large-scale airborne LiDAR point clouds. The model is trained with 8-bit Quantization-Aware Training (QAT) to limit the accuracy lost to quantization, and is then exported as a static graph for edge hardware targets.
- DOI: 10.57967/hf/8876
- Target hardware: Qualcomm Cloud AI 100 Ultra / Standard Core execution layouts
## Model Architecture Details
The backbone is a 128-channel deep residual network with static shortcut connections, which avoids dynamic tracing overhead at runtime.
- Input dimension: fixed block of shape 1 x 8192 x 14 (a 14-dimensional context feature vector per point, taken from globally standardized airborne sweeps)
- Quantization: simulated (fake) 8-bit quantization of weights and activations during training
- Export: ONNX, opset 16
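
As a rough illustration of what "simulated 8-bit quantization" means, the sketch below rounds floating-point values onto a signed 8-bit grid and maps them back to floats. In QAT this round trip is typically applied in the forward pass, with gradients passed through the rounding step by a straight-through estimator. The per-tensor asymmetric scheme and the name `fake_quantize` are assumptions made for illustration; the source does not state which quantizer this repository uses.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers, then map them back to floats.

    Illustrative per-tensor affine scheme (an assumption, not the repo's quantizer).
    Returns the dequantized values and the step size (scale).
    """
    qmin = -(2 ** (num_bits - 1))        # -128 for 8 bits
    qmax = 2 ** (num_bits - 1) - 1       # 127 for 8 bits
    lo = min(min(values), 0.0)           # the range must contain zero
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)      # integer that represents 0.0

    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point      # project onto the integer grid
        q = max(qmin, min(qmax, q))            # clamp to the 8-bit range
        dequantized.append((q - zero_point) * scale)
    return dequantized, scale


weights = [-1.0, -0.25, 0.0, 0.3, 1.0]
approx, step = fake_quantize(weights)
for w, a in zip(weights, approx):
    print(f"{w:+.4f} -> {a:+.4f} (error {abs(w - a):.5f}, step {step:.5f})")
```

Each value moves by at most about half a step, which is the rounding error the network learns to tolerate during QAT.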
## Training Progression & Ablation Analysis
The network was trained on the Dayton Annotated LiDAR Earth Scan (DALES) dataset with a cross-entropy loss.
| Optimization Phase | Context Parameters | Model Layout | Validation mIoU |
|---|---|---|---|
| Baseline | Min-Max Scaling | Shallow Linear (64-Ch) | 8.58% |
| Spatial Context Adjustment | Global Standardization (k=8) | Shallow Linear (64-Ch) | 21.40% |
| Capacity Optimization | Global Standardization (k=8) | Deep Residual (128-Ch) | 37.22% |
| Hardware Freeze (Post-QAT) | Fixed-Point Inference Engine | Static ONNX Opset 16 Graph | 36.98% |
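
Going by the table, the post-QAT graph gives up 0.24 percentage points of validation mIoU relative to the floating-point Deep Residual model (37.22% to 36.98%), roughly 0.6% in relative terms. For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed from a confusion matrix. The function name, the row/column convention, and the toy matrix are illustrative assumptions; they are not this repository's evaluation code or DALES results.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    Assumed convention: rows are ground-truth classes, columns are predictions.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed points of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # points wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:        # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (made-up counts, for illustration only)
toy = [[8, 2],
       [1, 9]]
print(f"mIoU = {mean_iou(toy):.4f}")
```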
## Compute & Training Investment
- Total wall time: ~15 hours (full joint training and validation cycle)
- Optimization strategy: 8-bit Quantization-Aware Training (QAT), with quantization simulated in the training loop
## Quickstart: Programmatic Usage
To download the checkpoint and prepare it for inference, use the snippet below:
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub repository behind the DOI
model_path = hf_hub_download(
    repo_id="Debanjan24/volumetric-ptv3-qat-8bit",
    filename="volumetric_ptv3_qat_8bit (4).pth",
)
print(f"[+] Asset synchronized securely to cache: {model_path}")

# Load the weights into your own instance of the architecture.
# `model` must be constructed first; it is not defined in this snippet.
# checkpoint = torch.load(model_path, map_location="cpu")
# model.load_state_dict(checkpoint["model_state_dict"])
```
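
Because the model expects a fixed 1 x 8192 x 14 input, a point cloud tile has to be brought to exactly that shape before inference. The sketch below shows one possible way to do this in plain Python. Zero-padding short tiles and truncating long ones is an assumption made for illustration, as is the helper name `to_fixed_block`; the repository may sample or tile points differently.

```python
NUM_POINTS = 8192    # points per block, from the stated input shape
NUM_FEATURES = 14    # features per point, from the stated input shape


def to_fixed_block(points):
    """Return a nested list of shape 1 x NUM_POINTS x NUM_FEATURES.

    Hypothetical helper: truncates extra points and pads missing ones with zero rows.
    """
    for p in points:
        if len(p) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features per point, got {len(p)}")
    block = [list(p) for p in points[:NUM_POINTS]]                        # truncate
    block += [[0.0] * NUM_FEATURES for _ in range(NUM_POINTS - len(block))]  # pad
    return [block]                                                        # batch axis


tile = [[0.5] * NUM_FEATURES for _ in range(100)]   # a small made-up tile
batch = to_fixed_block(tile)
print(len(batch), len(batch[0]), len(batch[0][0]))
```

With PyTorch installed, `torch.tensor(batch, dtype=torch.float32)` would then produce a tensor of shape `(1, 8192, 14)`.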
## References & Literature Baseline
The architecture adapts ideas from the following works for an edge deployment pipeline:
1. **Point Transformer V3:** Sparse spatial feature extraction blocks for irregular 3D coordinate distributions.
2. **Deep Residual Learning (ResNet):** Shortcut connections adapted to keep gradient flow stable across the 128-channel dense feature layers.
3. **Quantization-Aware Training (QAT):** Integer arithmetic simulated during training, used to prepare the model for low-latency edge deployment.