	---
	library_name: tensorrt
	tags:
	- security-research
	- vulnerability-reproduction
	- tensorrt
	- triton-inference-server
	---

	# TensorRT ScatterElements GPU and mapped-host OOB write PoC

	This repository demonstrates that the stock TensorRT `ScatterElements` plugin uses runtime indices in GPU pointer arithmetic without normalization or bounds validation.
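For contrast, the following is a minimal NumPy sketch of the index handling that the ONNX `ScatterElements` operator specifies: indices must lie in `[-dim, dim - 1]`, and negative values count from the end of the axis. It is not TensorRT's code. The helper name `scatter_elements_checked` is hypothetical and serves only to illustrate the normalization and bounds check described above as missing.

```python
import numpy as np


def scatter_elements_checked(data, indices, updates, axis=0):
    """Reference ScatterElements with index normalization and bounds checks.

    Hypothetical helper for illustration; not part of TensorRT or this repository.
    """
    data = np.asarray(data)
    indices = np.asarray(indices, dtype=np.int64)
    updates = np.asarray(updates)
    if indices.shape != updates.shape:
        raise ValueError("indices and updates must have the same shape")

    dim = data.shape[axis]
    # Bounds validation: ONNX allows indices in [-dim, dim - 1] only.
    if indices.size and (indices.min() < -dim or indices.max() >= dim):
        raise IndexError(f"index out of range for axis of size {dim}")

    # Normalization: map negative indices to their positive equivalents,
    # so that -1 selects the last element along the axis.
    normalized = np.where(indices < 0, indices + dim, indices)

    out = data.copy()
    np.put_along_axis(out, normalized, updates, axis=axis)
    return out


if __name__ == "__main__":
    base = np.zeros(4, dtype=np.float32)
    # Index -1 updates the last element of the output.
    print(scatter_elements_checked(base, [-1], [7.0]))
    # An index equal to the axis size is rejected instead of being used.
    try:
        scatter_elements_checked(base, [4], [7.0])
    except IndexError as exc:
        print("rejected:", exc)
```

Under these semantics an index of `-1` updates the last element, and any index outside the axis bounds is rejected before it is used to address memory.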

	Tested configurations:

	- TensorRT `11.1.0.106` on Windows
	- TensorRT `10.16.1.11` on Linux
	- Triton Inference Server `2.69.0` with TensorRT `10.16.1.11`
	- NVIDIA RTX 3080 Laptop GPU

	The artifacts are synthetic. They do not execute host code or access files.

	## Local primitive

	Build the engine and run the local proof:

	```bash
	python make_engine.py
	python run_probe.py
	```

	The proof checks three cases:

	1. An out-of-range positive index updates a separate CUDA allocation.
	2. The documented-valid index `-1` writes before the output buffer instead of selecting its last element.
	3. An in-range control leaves the marker allocation unchanged.

	`host_pinned_probe.py` additionally targets a benign `cudaHostAllocPortable` marker through its CUDA device mapping:

	```bash
	python host_pinned_probe.py
	```

	Expected result:

	```text
	CROSS_ALLOCATION_WRITE=True
	VALID_NEGATIVE_INDEX_OOB_WRITE=True
	CONTROL_CLEAN=True
	MAPPED_HOST_WRITE=True
	```

	## Triton cross-model proof

	The Linux plans were built with TensorRT `10.16.1.11`:

	- `scatter_elements_trt10.plan`
	- `victim_slow_trt10.plan`

	`start_triton.sh` expects the official Triton `2.69.0` standalone bundle under `~/triton-2.69/server/tritonserver` and the Python environment under `~/triton-2.69/venv`.

	Run the control (index `0`), then the attack (index `740`), starting a fresh Triton server for each:

	```bash
	./start_triton.sh
	python triton_setup.py
	python triton_race_probe.py --skip-load 0 200 0.0

	./start_triton.sh
	python triton_setup.py
	python triton_race_probe.py --skip-load 740 200 0.0
	```

	`triton_setup.py` uses Triton's model-load API to automate operator setup. With `--skip-load`, `triton_race_probe.py` makes no model-control request; the attack phase consists solely of ordinary inference requests to the deployed `scatter_writer` model while a co-resident model is executing.

	Expected result:

	```text
	# control
	victim_before=31343.25
	victim_after=31343.25
	TARGET_CHANGED=False

	# attack
	victim_before=31343.25
	victim_after=32680.5
	TARGET_CHANGED=True
	```
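When comparing several runs, the `key=value` lines can be checked mechanically. The sketch below assumes that saved output follows exactly the format shown in the expected result; `parse_probe_output` and `target_changed` are hypothetical helpers, not scripts shipped in this repository.

```python
def parse_probe_output(text):
    """Collect key=value lines into a dict, skipping blanks and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def target_changed(text):
    """Return the reported TARGET_CHANGED flag after cross-checking it
    against the victim_before and victim_after values."""
    fields = parse_probe_output(text)
    before = float(fields["victim_before"])
    after = float(fields["victim_after"])
    reported = fields["TARGET_CHANGED"] == "True"
    if reported != (before != after):
        raise ValueError("TARGET_CHANGED disagrees with the victim values")
    return reported


if __name__ == "__main__":
    control = "victim_before=31343.25\nvictim_after=31343.25\nTARGET_CHANGED=False\n"
    attack = "victim_before=31343.25\nvictim_after=32680.5\nTARGET_CHANGED=True\n"
    print("control changed:", target_changed(control))
    print("attack changed:", target_changed(attack))
```

The cross-check raises an error if the flag and the two values ever disagree, which guards against a truncated or hand-edited log.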

	The attack with the fixed index `740` was reproduced against fresh, uninstrumented Triton processes. `cuda_trace.c` is included only to document how the relative GPU offset was initially measured; it is not loaded for the final control or attack runs.

	The same primitive reaches Triton's CUDA-pinned host pool. `triton_host_metadata_output.txt` records a remote inference request changing a live pool metadata word at `0x204c00030` from `0x10000000` to `0x44a72800`. GDB was used only to read the before/after bytes; the server and attack request were uninstrumented.

	## Evidence

	- `windows_stock_output.txt`: local TensorRT 11.1 proof
	- `linux_stock_output.txt`: local TensorRT 10.16 proof
	- `triton_control_output.txt`: 200 in-range inference requests
	- `triton_attack_repeatability.txt`: three fresh Triton attack runs
	- `windows_host_pinned_output.txt` and `linux_host_pinned_output.txt`: mapped host-marker writes
	- `triton_host_metadata_output.txt`: remote write into live Triton pinned-pool metadata

	## SHA-256

	```text
	02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd scatter_elements.engine
	37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae scatter_elements_trt10.plan
	1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0 victim_slow_trt10.plan
	```
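The digests above can be checked with Python's standard `hashlib` before any artifact is loaded. This is a minimal sketch, assuming the three files sit in the current directory; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Digests copied from the SHA-256 list above.
EXPECTED = {
    "scatter_elements.engine": "02cc96cfa0c55c21058c0266ec85f72326f0c19cf245f01dac95f858d20c16fd",
    "scatter_elements_trt10.plan": "37640bd4f3d153cb23e3e147f8127d7ce0558065fd09de21593d9a26f52bc4ae",
    "victim_slow_trt10.plan": "1f51adc48f98615d548a5a3a7f0f320e3c1b23e7f48205b21554ba850379d7b0",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large plans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True only if it exists and matches."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```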