Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/nxd_model/encoding/_tp0_bk0/model.MODULE_03ed4b3598344bc45191+fb4cc044.hlo_module.pb', '--output', '/tmp/nxd_model/encoding/_tp0_bk0/model.MODULE_03ed4b3598344bc45191+fb4cc044.neff', '--target=trn1', '--auto-cast=none', '--model-type=transformer', '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma ', '-O2', '--lnc=1', '--logfile=/tmp/nxd_model/encoding/_tp0_bk0/log-neuron-cc.txt', '--verbose=35']:

2026-02-09T16:40:38Z Pre-Partition Pre-Opt Histogram: total HLO instructions: 5611
convert           1055  18.80% ################################################################
reshape            766  13.65% ##############################################
transpose          687  12.24% #########################################
slice              543   9.68% ################################
broadcast          478   8.52% ############################
multiply           363   6.47% ######################
parameter          328   5.85% ###################
get-tuple-element  324   5.77% ###################
constant           223   3.97% #############
call               217   3.87% #############
dot                181   3.23% ##########
add                145   2.58% ########
concatenate         74   1.32% ####
tuple               73   1.30% ####
negate              72   1.28% ####
all-reduce          72   1.28% ####
gather               3   0.05%
iota                 3   0.05%
sine                 1   0.02%
all-gather           1   0.02%
cosine               1   0.02%
reduce               1   0.02%

Pre-Partition Post-Op Histogram: total HLO instructions: 4293
convert            911  21.22% ################################################################
reshape            794  18.50% #######################################################
transpose          398   9.27% ###########################
parameter          328   7.64% #######################
constant           258   6.01% ##################
slice              252   5.87% #################
multiply           218   5.08% ###############
custom-call        217   5.05% ###############
broadcast          184   4.29% ############
dot                180   4.19% ############
get-tuple-element  180   4.19% ############
add                144   3.35% ##########
concatenate         74   1.72% #####
negate              72   1.68% #####
all-reduce          72   1.68% #####
gather               3   0.07%
iota                 3   0.07%
sine                 1   0.02%
tuple                1   0.02%
all-gather           1   0.02%
cosine               1   0.02%
reduce               1   0.02%

Potential split-points stats: #CC 73 #AR 72 #AG 1 #BN 0 nClamp 0
ModuleSplitter initial partitioning... #parts 73
ModuleSplitter initial partitioning... Done.
0 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 72
New disjoint wave: start 2 len 70 NumReps: 35 macs 616929102397440
First non-zero-mac/used part from the end is 72
Not enough zero-mac parts. skip
ModuleSplitter initial partitioning... #parts 37
ModuleSplitter initial partitioning... Done.
Remat: gather-iota 0 matches, 0 ops rematted
Wrote HLO netlist to hlo_netlist.json
Wrote graph partitions in debug_info_hlo_partitions.json
Processing partition 0
Replaced 0 dropout sequences with OffloadedDropout
HLO Ops used in computation: add all-gather all-reduce broadcast concatenate constant convert cosine custom-call dot gather get-tuple-element multiply negate parameter reshape sine slice transpose tuple
Invoking RemoveOptimizationBarriers pass
Processing partition 1
2026-02-09 16:40:38.383906: F hilo/hlo_passes/NeuronHloVerifier.cc:504] [ERROR] [NCC_VRF009] Memory requirement exceeds target architecture's HBM limit. Needed 22036949508 bytes (20 GB) vs. available 17179869184 bytes (16 GB). TIP: Consider using smaller batches or applying model parallelism
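The byte counts in the NCC_VRF009 error can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the log's "GB" means binary gigabytes (2^30 bytes), which fits the fact that 17179869184 is exactly 16 × 2^30. The helper `min_shards` is hypothetical and idealized: it assumes the memory requirement splits perfectly evenly across devices, so its result is at best a necessary minimum for the "model parallelism" route in the TIP, not a guarantee that that degree will compile.

```python
# Byte counts copied from the NCC_VRF009 error in the log.
NEEDED_BYTES = 22_036_949_508
AVAILABLE_BYTES = 17_179_869_184
GIB = 2**30  # assumption: the log's "GB" is binary (GiB)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def min_shards(needed: int, available: int) -> int:
    """Hypothetical helper: smallest integer k with needed / k <= available.

    Idealized: assumes memory divides perfectly evenly across k devices,
    ignoring replicated weights, activations, and collective buffers.
    """
    return -(-needed // available)  # integer ceiling division


shortfall = NEEDED_BYTES - AVAILABLE_BYTES

print(f"needed    {to_gib(NEEDED_BYTES):.2f} GiB")
print(f"available {to_gib(AVAILABLE_BYTES):.2f} GiB")
print(f"shortfall {shortfall} bytes ({to_gib(shortfall):.2f} GiB)")
print(f"ratio     {NEEDED_BYTES / AVAILABLE_BYTES:.3f}")
print(f"min even split: {min_shards(NEEDED_BYTES, AVAILABLE_BYTES)}")
```

Run as-is, this reports roughly 20.52 GiB needed against 16.00 GiB available, a shortfall of about 4.52 GiB (needed is about 1.28× available). So under the even-split assumption, memory per device would have to drop by at least half, whether through smaller batches or a higher parallelism degree.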