Volumetric-PTv3: 3D Semantic Segmentation for Airborne LiDAR (8-bit QAT)

Official repository for Volumetric-PTv3, an edge-optimized Deep Residual Network designed for high-accuracy 3D semantic segmentation of large-scale Aerial LiDAR point clouds. This model features strict 8-bit Quantization-Aware Training (QAT) boundaries to maintain peak accuracy while freezing the execution parameters into static graphs for edge hardware targets.

Permanently Indexed Artifact Record (DOI): 10.57967/hf/8876
Target Hardware Optimization Matrix: Qualcomm Cloud AI 100 Ultra / Standard Core execution layouts

Model Architecture Details

The backbone consists of a 128-channel Deep Residual Network utilizing static shortcuts to prevent runtime dynamic tracing overhead.

Input Dimension: Fixed matrix tracking block 1 x 8192 x 14 (14D context feature vector space mapping globally standardized airborne sweeps)
Quantization Layer: Simulated 8-bit weights and activations tensor quantization during backward pass optimization loops
Export Configuration: Target compiled under strict ONNX Opset 16 parameter rules

Training Progression & Ablation Analysis

The network was trained using the Dayton Aerial LiDAR Elevation Subsets (DALES) dataset under strict cross-entropy loss tracking functions.

Optimization Phase	Context Parameters	Model Layout	Validation mIoU
Baseline	Min-Max Scaling	Shallow Linear (64-Ch)	8.58%
Spatial Context Adjustment	Global Standardization (k=8)	Shallow Linear (64-Ch)	21.40%
Capacity Optimization	Global Standardization (k=8)	Deep Residual (128-Ch)	37.22%
Hardware Freeze (Post-QAT)	Fixed-Point Inference Engine	Static ONNX Opset 16 Graph	36.98%

Compute & Training Investment

Total Pipeline Execution Wall Time: ~15 Hours (Complete joint training and validation cycle)
Optimization Strategy: 8-bit Quantization-Aware Training (QAT) simulated backward pass loops

Quickstart: Programmatic Usage

To ingest the model weights checkpoint and construct inference runs, extract the asset components directly using the snippet below:

import torch
from huggingface_hub import hf_hub_download

# Download the identical binary weights asset from the DOI tracking tree
model_path = hf_hub_download(
    repo_id="Debanjan24/volumetric-ptv3-qat-8bit",
    filename="volumetric_ptv3_qat_8bit (4).pth"
)

print(f"[+] Asset synchronized securely to cache: {model_path}")

# Load parameter dictionaries natively into your local architecture block
# checkpoint = torch.load(model_path, map_location="cpu")
# model.load_state_dict(checkpoint["model_state_dict"])

## References & Literature Baseline

Our architecture translates core breakthroughs from the following foundational works into an edge-optimized deployment pipeline:

1. **Point Transformer V3:** Utilized for sparse spatial feature extraction blocks across irregular 3D coordinate distributions.
2. **Deep Residual Learning (ResNet):** Shortcut mapping Topologies adapted to sustain stable gradient flow across the 128-channel dense feature layers.
3. **Quantization-Aware Training (QAT):** Implemented simulated integer-arithmetic optimization loops to guarantee low-latency edge deployment viability.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support