rowandwhelan, commit eb5799c (verified): Add TensorRT ScatterElements memory-corruption PoC
metadata
library_name: tensorrt
tags:
  - security-research
  - vulnerability-reproduction
  - tensorrt
  - triton-inference-server

TensorRT ScatterElements GPU and mapped-host OOB write PoC

This repository demonstrates that the stock TensorRT ScatterElements plugin uses runtime index values directly in GPU pointer arithmetic, without normalizing negative indices or validating bounds.
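For contrast, the ONNX ScatterElements specification expects every index to lie in [-s, s-1], where s is the size of the scatter axis, and treats negative values as counting back from the end. The following is a minimal CPU-side Python sketch of that contract for the rank-1 case with reduction "none". It is not TensorRT code, and normalize_index and scatter_elements_1d are hypothetical helpers. It shows the two steps missing from the plugin's address computation: normalizing negative indices and rejecting out-of-range ones.

```python
from typing import List, Sequence


def normalize_index(index: int, dim: int) -> int:
    """Map an index in [-dim, dim - 1] onto [0, dim - 1]; reject anything else."""
    if not -dim <= index < dim:
        raise IndexError(f"index {index} is out of range for an axis of size {dim}")
    # Negative indices count back from the end, so -1 selects the last element.
    return index + dim if index < 0 else index


def scatter_elements_1d(
    data: Sequence[float], indices: Sequence[int], updates: Sequence[float]
) -> List[float]:
    """Rank-1 ScatterElements, reduction='none' (hypothetical reference helper)."""
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    dim = len(data)
    # Validate every index first, so a bad request is rejected before any write.
    targets = [normalize_index(i, dim) for i in indices]
    out = list(data)  # the output starts as a copy of data
    for target, value in zip(targets, updates):
        out[target] = value
    return out


if __name__ == "__main__":
    # -1 selects the last element instead of writing before the buffer.
    print(scatter_elements_1d([0.0, 0.0, 0.0, 0.0], [-1], [5.0]))  # [0.0, 0.0, 0.0, 5.0]
    try:
        # 4 is one past the end of a size-4 axis and must be rejected.
        scatter_elements_1d([0.0, 0.0, 0.0, 0.0], [4], [5.0])
    except IndexError as exc:
        print("rejected:", exc)
```

Validating every index before the first write means a rejected request leaves the output untouched, which is simpler to reason about than checking inside the write loop.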

Tested configurations:

  • TensorRT 11.1.0.106 on Windows
  • TensorRT 10.16.1.11 on Linux
  • Triton Inference Server 2.69.0 with TensorRT 10.16.1.11
  • NVIDIA RTX 3080 Laptop GPU

The artifacts are synthetic. They do not execute host code or access files.

Local primitive

Build the engine and run the local proof:

python make_engine.py
python run_probe.py

The proof checks three cases:

  1. An out-of-range positive index updates a separate CUDA allocation.
  2. The documented-valid index -1 writes before the output buffer instead of selecting its last element.
  3. An in-range control leaves the marker allocation unchanged.

host_pinned_probe.py additionally targets a benign marker buffer allocated with cudaHostAllocPortable, writing to it through the buffer's CUDA device mapping:

python host_pinned_probe.py

Expected result:

CROSS_ALLOCATION_WRITE=True
VALID_NEGATIVE_INDEX_OOB_WRITE=True
CONTROL_CLEAN=True
MAPPED_HOST_WRITE=True
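To check a saved run log mechanically, a small parser over the KEY=True/False lines is enough. This sketch assumes the probes print each flag on its own line exactly as in the expected result above; parse_flags and missing_or_false are hypothetical helpers, not part of this repository.

```python
from typing import Dict, Iterable, List

# Flag names taken from the expected result of run_probe.py and host_pinned_probe.py.
EXPECTED_FLAGS = (
    "CROSS_ALLOCATION_WRITE",
    "VALID_NEGATIVE_INDEX_OOB_WRITE",
    "CONTROL_CLEAN",
    "MAPPED_HOST_WRITE",
)


def parse_flags(log_text: str) -> Dict[str, bool]:
    """Collect KEY=True / KEY=False lines; every other line is ignored."""
    flags: Dict[str, bool] = {}
    for line in log_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value in ("True", "False"):
            flags[key] = value == "True"
    return flags


def missing_or_false(
    flags: Dict[str, bool], expected: Iterable[str] = EXPECTED_FLAGS
) -> List[str]:
    """Return expected flags that are absent or False; an empty list means all passed."""
    return [name for name in expected if not flags.get(name, False)]


if __name__ == "__main__":
    sample = (
        "CROSS_ALLOCATION_WRITE=True\n"
        "VALID_NEGATIVE_INDEX_OOB_WRITE=True\n"
        "CONTROL_CLEAN=True\n"
    )
    # MAPPED_HOST_WRITE is absent from this sample, so it is reported.
    print(missing_or_false(parse_flags(sample)))  # ['MAPPED_HOST_WRITE']
```

The same parser also picks up the TARGET_CHANGED line from the Triton probe output; passing ("TARGET_CHANGED",) as expected checks an attack run.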

Triton cross-model proof

The Linux plans were built with TensorRT 10.16.1.11:

  • scatter_elements_trt10.plan
  • victim_slow_trt10.plan

start_triton.sh expects the official Triton 2.69.0 standalone bundle under ~/triton-2.69/server/tritonserver and the Python environment under ~/triton-2.69/venv.
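Before running start_triton.sh, a quick preflight check can confirm that the two expected locations exist. This is a hypothetical helper, not part of the repository; it checks only that the paths exist, not the bundle version or the packages installed in the environment.

```python
from pathlib import Path
from typing import Iterable, List

# Layout as described for start_triton.sh; adjust if your installation differs.
TRITON_ROOT = Path.home() / "triton-2.69"
REQUIRED_PATHS = (
    TRITON_ROOT / "server" / "tritonserver",  # official 2.69.0 standalone bundle
    TRITON_ROOT / "venv",  # Python environment
)


def missing_paths(paths: Iterable[Path] = REQUIRED_PATHS) -> List[Path]:
    """Return the required paths that do not exist on this machine."""
    return [p for p in paths if not p.exists()]


if __name__ == "__main__":
    missing = missing_paths()
    for p in missing:
        print(f"missing: {p}")
    if not missing:
        print("layout matches what start_triton.sh expects")
```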

Run the control, then the attack, restarting Triton before each:

./start_triton.sh
python triton_setup.py
python triton_race_probe.py --skip-load 0 200 0.0

./start_triton.sh
python triton_setup.py
python triton_race_probe.py --skip-load 740 200 0.0

triton_setup.py uses Triton's model-load API to automate the setup a server operator would otherwise perform. With --skip-load, triton_race_probe.py sends no model-control requests: the attack phase consists solely of ordinary inference requests to the deployed scatter_writer model while a co-resident model is executing.

Expected result:

# control
victim_before=31343.25
victim_after=31343.25
TARGET_CHANGED=False

# attack
victim_before=31343.25
victim_after=32680.5
TARGET_CHANGED=True

The attack with the fixed index 740 was reproduced against fresh, uninstrumented Triton processes. cuda_trace.c is included only to document how the relative GPU offset was initially measured; it is not loaded during the final control or attack runs.

The same primitive reaches Triton's CUDA-pinned host pool. triton_host_metadata_output.txt records a remote inference request changing a live pool metadata word at 0x204c00030 from 0x10000000 to 0x44a72800. GDB was used only to read the before/after bytes; neither the server nor the attack request was instrumented.
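The two recorded words are easier to interpret when decoded. The sketch below shows that 0x44a72800, read as an IEEE-754 float32, is 1337.25, the same amount by which victim_after exceeds victim_before in the Triton attack output, which suggests the written word is a float32 update value. It also shows that 0x10000000 read as an integer is 268435456 bytes (256 MiB), which appears to match Triton's default --pinned-memory-pool-byte-size. Both readings are interpretations of the recorded values, not statements from triton_host_metadata_output.txt; as_float32 is a hypothetical helper.

```python
import struct

POOL_WORD_BEFORE = 0x10000000
POOL_WORD_AFTER = 0x44A72800


def as_float32(word: int) -> float:
    """Reinterpret a 32-bit word as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


if __name__ == "__main__":
    # Before: read as an unsigned integer byte count.
    print(POOL_WORD_BEFORE, POOL_WORD_BEFORE // (1 << 20), "MiB")  # 268435456 256 MiB
    # After: read as float32.
    print(as_float32(POOL_WORD_AFTER))  # 1337.25
    # Change in the victim output during the Triton attack run.
    print(32680.5 - 31343.25)  # 1337.25
```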

Evidence

  • windows_stock_output.txt: local TensorRT 11.1 proof
  • linux_stock_output.txt: local TensorRT 10.16 proof
  • triton_control_output.txt: 200 in-range inference requests
  • triton_attack_repeatability.txt: three fresh Triton attack runs
  • windows_host_pinned_output.txt and linux_host_pinned_output.txt: mapped host-marker writes
  • triton_host_metadata_output.txt: remote write into live Triton pinned-pool metadata

SHA-256

02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd  scatter_elements.engine
37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae  scatter_elements_trt10.plan
1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0  victim_slow_trt10.plan
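The list above uses the two-space "digest  filename" layout that sha256sum -c accepts, so on Linux it can be checked directly. For a portable check that also runs on Windows, a minimal Python sketch follows. It assumes the artifacts sit in the current directory; sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Digests copied from the SHA-256 list above.
EXPECTED_SHA256 = {
    "scatter_elements.engine": "02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd",
    "scatter_elements_trt10.plan": "37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae",
    "victim_slow_trt10.plan": "1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large engine files need not fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root: Path, expected: Dict[str, str] = EXPECTED_SHA256) -> Dict[str, str]:
    """Map each artifact name to 'OK', 'MISMATCH', or 'MISSING'."""
    results: Dict[str, str] = {}
    for name, want in expected.items():
        path = root / name
        if not path.is_file():
            results[name] = "MISSING"
        elif sha256_of(path) == want:
            results[name] = "OK"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in verify(Path(".")).items():
        print(f"{status:8} {name}")
```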