Starting Qwen3.5-2B Metamath training
Output: /data/pretrained_models/Qwen3.5-2B-metamath
Effective batch size: 16
/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
`torch_dtype` is deprecated! Use `dtype` instead!
Loading weights: 0%| | 0/320 [00:00 main()
  File "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/tools/train_qwen35_metamath.py", line 339, in main
    trainer.train()
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1934, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/home/lg/.local/lib/python3.12/site-packages/accelerate/accelerator.py", line 2329, in backward
    loss.backward(**kwargs)
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/_tensor.py", line 625, in backward
    torch.autograd.backward(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.02 GiB. GPU 0 has a total capacity of 79.14 GiB of which 18.98 GiB is free. Including non-PyTorch memory, this process has 60.14 GiB memory in use. Of the allocated memory 56.74 GiB is allocated by PyTorch, and 2.36 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
  0%| | 3/966 [00:45<4:00:55, 15.01s/it]
Exception ignored in:
Traceback (most recent call last):
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 80, in __del__
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 89, in _stop
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 102, in _stop_locked
AttributeError: '_thread.RLock' object has no attribute '_recursion_count'
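
The OOM message above suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. The sketch below is one possible way to apply that setting from Python and to check, using only the figures printed in the message, whether reclaiming fragmented memory could cover the failed request. The helper names (reclaimable_shortfall, enable_expandable_segments) are hypothetical and are not part of train_qwen35_metamath.py; the sketch assumes the variable is set before torch initializes CUDA.

```python
import os

# Figures copied from the OOM message above, in GiB.
REQUESTED = 25.02
FREE = 18.98
RESERVED_UNALLOCATED = 2.36


def reclaimable_shortfall(requested, free, reserved_unallocated):
    """GiB still missing even if every reserved-but-unallocated byte were reused."""
    return max(0.0, requested - (free + reserved_unallocated))


def enable_expandable_segments(env=os.environ):
    """Apply the allocator option named in the error message.

    Uses setdefault so an allocator config the user already exported is kept.
    Call this before the first CUDA allocation (safest: before importing torch).
    """
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env["PYTORCH_CUDA_ALLOC_CONF"]


if __name__ == "__main__":
    enable_expandable_segments()
    gap = reclaimable_shortfall(REQUESTED, FREE, RESERVED_UNALLOCATED)
    print(f"shortfall after reclaiming fragmentation: {gap:.2f} GiB")
```

With these figures the request exceeds free plus reserved-but-unallocated memory, so the allocator setting on its own may not be enough for this run. Lowering the memory needed per step (for example a smaller per-device batch with more gradient accumulation, shorter sequences, or gradient checkpointing) is probably also required; which of these applies depends on settings that this log does not show.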