Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/nxd_model/token_generation_model/_tp0_bk0/model.MODULE_0a59544d8a22ab5b06a0+747527b0.hlo_module.pb', '--output', '/tmp/nxd_model/token_generation_model/_tp0_bk0/model.MODULE_0a59544d8a22ab5b06a0+747527b0.neff', '--target=trn2', '--auto-cast=none', '--model-type=transformer', '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma ', '-O2', '--lnc=2', '--logfile=/tmp/nxd_model/token_generation_model/_tp0_bk0/log-neuron-cc.txt', '--enable-internal-neff-wrapper', '--verbose=35']: 2025-10-17T07:13:35Z 2025-10-17 07:13:34.954102: F hilo/hlo_passes/NeuronHloVerifier.cc:504] [ERROR] [NCC_VRF009] Memory requirement exceeds target architecture's HBM limit. Needed 17187432532 bytes (16 GB) vs. available 17179869184 bytes (16 GB). TIP: Consider using smaller batches or applying model parallelism