Multi-GPU Quantum Circuit Simulation and the Impact of Network Performance
Abstract
Multi-GPU quantum simulation performance improvements are primarily driven by enhanced interconnect technology rather than GPU architecture advancements, with over 16X faster time-to-solution achieved through better inter-node communication.
As is intrinsic to the fundamental goal of quantum computing, classical simulation of quantum algorithms is notoriously demanding in resource requirements. Nonetheless, simulation is critical to the success of the field and a requirement for algorithm development and validation, as well as hardware design. GPU-acceleration has become standard practice for simulation, and due to the exponential scaling inherent in classical methods, multi-GPU simulation can be required to achieve representative system sizes. In this case, inter-GPU communications can bottleneck performance. In this work, we present the introduction of MPI into the QED-C Application-Oriented Benchmarks to facilitate benchmarking on HPC systems. We review the advances in interconnect technology and the APIs for multi-GPU communication. We benchmark using a variety of interconnect paths, including the recent NVIDIA Grace Blackwell NVL72 architecture that represents the first product to expand high-bandwidth GPU-specialized interconnects across multiple nodes. We show that while improvements to GPU architecture have led to speedups of over 4.5X across the last few generations of GPUs, advances in interconnect performance have had a larger impact with over 16X performance improvements in time to solution for multi-GPU simulations.
Get this paper in your agent:
hf papers read 2511.14664 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper