| Lmod has detected the following error: The following module(s) are unknown: |
| "buildenv-gcccuda/12.1.1-gcc12.3.0" |
|
|
| Please check the spelling or version number. Also try "module spider ..." |
| It is also possible your cache file is out-of-date; it may help to try: |
| $ module --ignore_cache load "buildenv-gcccuda/12.1.1-gcc12.3.0" |
|
|
| Also make sure that all modulefiles written in TCL start with the string #%Module |
|
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:19,804][pytorch_lightning.utilities.rank_zero][INFO] - Using 16bit Automatic Mixed Precision (AMP) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| [2026-03-11 19:10:19,926][pytorch_lightning.utilities.rank_zero][INFO] - GPU available: True (cuda), used: True |
| [2026-03-11 19:10:19,926][pytorch_lightning.utilities.rank_zero][INFO] - TPU available: False, using: 0 TPU cores |
| [2026-03-11 19:10:19,926][pytorch_lightning.utilities.rank_zero][INFO] - IPU available: False, using: 0 IPUs |
| [2026-03-11 19:10:19,926][pytorch_lightning.utilities.rank_zero][INFO] - HPU available: False, using: 0 HPUs |
| [2026-03-11 19:10:19,927][pytorch_lightning.utilities.rank_zero][INFO] - `Trainer(limit_val_batches=1)` was configured so 1 batch will be used. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py:74: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| ckpt = torch.load(checkpoint_path, map_location=torch.device('cpu')) |
| [2026-03-11 19:10:21,692][pytorch_lightning.utilities.rank_zero][INFO] - Model weights loaded. |
| INFO: Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:27,411][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:27,563][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:27,647][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:27,760][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| [2026-03-11 19:10:27,891][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:28,117][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:28,205][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Will load checkpoint from /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt |
| Executing task: training out of ['training'] |
| [2026-03-11 19:10:28,486][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| [2026-03-11 19:10:32,855][pytorch_lightning.utilities.rank_zero][INFO] - ---------------------------------------------------------------------------------------------------- |
| distributed_backend=nccl |
| All distributed processes registered. Starting with 8 processes |
| ---------------------------------------------------------------------------------------------------- |
|
|
| wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id stage_b_offline. |
| wandb: Tracking run with wandb version 0.17.9 |
| wandb: W&B syncing is set to `offline` in this directory. |
| wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. |
| INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:10:47,256][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: |
|   | Name                   | Type                                  | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model        | DiffusionMamba                        | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae                    | AutoencoderKL                         | 229 M |
| 3 | mamba_memory           | BiMambaMemory                         | 4.5 M |
| 4 | pose_prediction_model  | PosePredictionNet                     | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M     Trainable params |
| 236 M     Non-trainable params |
| 846 M     Total params |
| 3,384.157 Total estimated model params size (MB) |
| [2026-03-11 19:10:49,730][lightning.pytorch.callbacks.model_summary][INFO] - |
|   | Name                   | Type                                  | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model        | DiffusionMamba                        | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae                    | AutoencoderKL                         | 229 M |
| 3 | mamba_memory           | BiMambaMemory                         | 4.5 M |
| 4 | pose_prediction_model  | PosePredictionNet                     | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M     Trainable params |
| 236 M     Non-trainable params |
| 846 M     Total params |
| 3,384.157 Total estimated model params size (MB) |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,787][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,787][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,787][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,787][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,787][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,788][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,788][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:10:50,788][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs53b29da5af195896000002b6' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs882f64acc10a0ef1000002b7' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsa8e6dab3b752a107000002b8' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsba38b4f40f4714d1000002b9' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsb91cc4a09cf6e99d000002ba' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs6b92336da45a5c8a000002bb' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsa9c70a3d6ad255ef000002bd' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs7a8f466ece31debf000002bc' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs41ffd7a467d377ed000002be' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs4d8841d5f4a12275000002bf' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs65bc92a9ef8359a9000002c0' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs8b9c4e24d7623df2000002c1' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs6d6a1d30d78b9b40000002c2' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs1074dc4f03a07269000002c3' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs4adc9f3da03a6c76000002c4' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsafa07ac932f72342000002c5' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsd5c554b98bfe3bf5000002c6' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs0f0a74752cdff2bb000002c7' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsedc297dae670e45a000002c8' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs320ed0974425537a000002c9' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsc6538662702932fc000002ca' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs0cbe62abfd9de375000002cb' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsfc39db7b541e8b92000002cc' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs3cffa4b77a5a93d5000002cd' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfse45ab7cb9f42a894000002ce' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsd7470719afe76e80000002cf' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsc71c3b699433fe49000002d0' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs1b7bed37db55e66c000002d1' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank4]: Traceback (most recent call last): |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank4]: return _run_code(code, main_globals, None, |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank4]: exec(code, run_globals) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank4]: run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank4]: _run_hydra( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank4]: _run_app( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank4]: run_and_report( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank4]: raise ex |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank4]: return func() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank4]: lambda: hydra.run( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank4]: _ = ret.return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank4]: raise self._return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank4]: ret.return_value = task_function(task_cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank4]: run_local(cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank4]: experiment.exec_task(task) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank4]: getattr(self, task)() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank4]: trainer.fit( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank4]: call._call_and_handle_interrupt( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank4]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank4]: return function(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank4]: self._run(model, ckpt_path=ckpt_path) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank4]: results = self._run_stage() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank4]: self.fit_loop.run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank4]: self.advance() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank4]: self.epoch_loop.run(self._data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank4]: self.advance(data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank4]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank4]: self._optimizer_step(batch_idx, closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank4]: call._call_lightning_module_hook( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank4]: optimizer.step(closure=optimizer_closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank4]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank4]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank4]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank4]: closure_result = closure() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank4]: self._result = self.closure(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank4]: return func(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank4]: step_output = self._step_fn() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank4]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank4]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank4]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank4]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank4]: return self.module(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank4]: out = method(*_args, **_kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank4]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank4]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank7]: Traceback (most recent call last): |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank7]: return _run_code(code, main_globals, None, |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank7]: exec(code, run_globals) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank7]: run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank7]: _run_hydra( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank7]: _run_app( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank7]: run_and_report( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank7]: raise ex |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank7]: return func() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank7]: lambda: hydra.run( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank7]: _ = ret.return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank7]: raise self._return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank7]: ret.return_value = task_function(task_cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank7]: run_local(cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank7]: experiment.exec_task(task) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank7]: getattr(self, task)() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank7]: trainer.fit( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank7]: call._call_and_handle_interrupt( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank7]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank7]: return function(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank7]: self._run(model, ckpt_path=ckpt_path) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank7]: results = self._run_stage() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank7]: self.fit_loop.run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank7]: self.advance() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank7]: self.epoch_loop.run(self._data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank7]: self.advance(data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank7]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank7]: self._optimizer_step(batch_idx, closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank7]: call._call_lightning_module_hook( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank7]: optimizer.step(closure=optimizer_closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank7]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank7]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank7]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank7]: closure_result = closure() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank7]: self._result = self.closure(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank7]: return func(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank7]: step_output = self._step_fn() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank7]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank7]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank7]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank7]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank7]: return self.module(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank7]: out = method(*_args, **_kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank7]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank7]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsd62f175f2b87b4b7000002d2' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs14c3854683fc8e21000002d3' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs43121d8c4ab15ad1000002d4' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank2]: Traceback (most recent call last): |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank2]: return _run_code(code, main_globals, None, |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank2]: exec(code, run_globals) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank2]: run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank2]: _run_hydra( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank2]: _run_app( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank2]: run_and_report( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank2]: raise ex |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank2]: return func() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank2]: lambda: hydra.run( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank2]: _ = ret.return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank2]: raise self._return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank2]: ret.return_value = task_function(task_cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank2]: run_local(cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank2]: experiment.exec_task(task) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank2]: getattr(self, task)() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank2]: trainer.fit( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank2]: call._call_and_handle_interrupt( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank2]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank2]: return function(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank2]: self._run(model, ckpt_path=ckpt_path) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank2]: results = self._run_stage() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank2]: self.fit_loop.run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank2]: self.advance() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank2]: self.epoch_loop.run(self._data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank2]: self.advance(data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank2]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank2]: self._optimizer_step(batch_idx, closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank2]: call._call_lightning_module_hook( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank2]: optimizer.step(closure=optimizer_closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank2]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank2]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank2]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank2]: closure_result = closure() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank2]: self._result = self.closure(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank2]: return func(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank2]: step_output = self._step_fn() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank2]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank2]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank2]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank2]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank2]: return self.module(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank2]: out = method(*_args, **_kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank2]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank2]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank6]: Traceback (most recent call last): |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank6]: return _run_code(code, main_globals, None, |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank6]: exec(code, run_globals) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank6]: run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank6]: _run_hydra( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank6]: _run_app( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank6]: run_and_report( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank6]: raise ex |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank6]: return func() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank6]: lambda: hydra.run( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank6]: _ = ret.return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank6]: raise self._return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank6]: ret.return_value = task_function(task_cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank6]: run_local(cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank6]: experiment.exec_task(task) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank6]: getattr(self, task)() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank6]: trainer.fit( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank6]: call._call_and_handle_interrupt( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank6]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank6]: return function(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank6]: self._run(model, ckpt_path=ckpt_path) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank6]: results = self._run_stage() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank6]: self.fit_loop.run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank6]: self.advance() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank6]: self.epoch_loop.run(self._data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank6]: self.advance(data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank6]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank6]: self._optimizer_step(batch_idx, closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank6]: call._call_lightning_module_hook( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank6]: optimizer.step(closure=optimizer_closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank6]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank6]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank6]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank6]: closure_result = closure() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank6]: self._result = self.closure(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank6]: return func(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank6]: step_output = self._step_fn() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank6]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank6]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank6]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank6]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank6]: return self.module(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank6]: out = method(*_args, **_kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank6]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank6]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank5]: Traceback (most recent call last): |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank5]: return _run_code(code, main_globals, None, |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank5]: exec(code, run_globals) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank5]: run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank5]: _run_hydra( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank5]: _run_app( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank5]: run_and_report( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank5]: raise ex |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank5]: return func() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank5]: lambda: hydra.run( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank5]: _ = ret.return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank5]: raise self._return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank5]: ret.return_value = task_function(task_cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank5]: run_local(cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank5]: experiment.exec_task(task) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank5]: getattr(self, task)() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank5]: trainer.fit( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank5]: call._call_and_handle_interrupt( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank5]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank5]: return function(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank5]: self._run(model, ckpt_path=ckpt_path) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank5]: results = self._run_stage() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank5]: self.fit_loop.run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank5]: self.advance() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank5]: self.epoch_loop.run(self._data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank5]: self.advance(data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank5]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank5]: self._optimizer_step(batch_idx, closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank5]: call._call_lightning_module_hook( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank5]: optimizer.step(closure=optimizer_closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank5]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank5]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank5]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank5]: closure_result = closure() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank5]: self._result = self.closure(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank5]: return func(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank5]: step_output = self._step_fn() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank5]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank5]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank5]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank5]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank5]: return self.module(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank5]: out = method(*_args, **_kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank5]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank5]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank1]: Traceback (most recent call last): |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank1]: return _run_code(code, main_globals, None, |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank1]: exec(code, run_globals) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank1]: run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank1]: _run_hydra( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank1]: _run_app( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank1]: run_and_report( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank1]: raise ex |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank1]: return func() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank1]: lambda: hydra.run( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank1]: _ = ret.return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank1]: raise self._return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank1]: ret.return_value = task_function(task_cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank1]: run_local(cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank1]: experiment.exec_task(task) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank1]: getattr(self, task)() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank1]: trainer.fit( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank1]: call._call_and_handle_interrupt( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank1]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank1]: return function(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank1]: self._run(model, ckpt_path=ckpt_path) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank1]: results = self._run_stage() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank1]: self.fit_loop.run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank1]: self.advance() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank1]: self.epoch_loop.run(self._data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank1]: self.advance(data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank1]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank1]: self._optimizer_step(batch_idx, closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank1]: call._call_lightning_module_hook( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank1]: optimizer.step(closure=optimizer_closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank1]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank1]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank1]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank1]: closure_result = closure() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank1]: self._result = self.closure(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank1]: return func(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank1]: step_output = self._step_fn() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank1]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank1]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank1]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank1]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank1]: return self.module(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank1]: out = method(*_args, **_kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank1]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank1]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank3]: Traceback (most recent call last): |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank3]: return _run_code(code, main_globals, None, |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank3]: exec(code, run_globals) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank3]: run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank3]: _run_hydra( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank3]: _run_app( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank3]: run_and_report( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank3]: raise ex |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank3]: return func() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank3]: lambda: hydra.run( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank3]: _ = ret.return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank3]: raise self._return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank3]: ret.return_value = task_function(task_cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank3]: run_local(cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank3]: experiment.exec_task(task) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank3]: getattr(self, task)() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank3]: trainer.fit( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank3]: call._call_and_handle_interrupt( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank3]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank3]: return function(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank3]: self._run(model, ckpt_path=ckpt_path) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank3]: results = self._run_stage() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank3]: self.fit_loop.run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank3]: self.advance() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank3]: self.epoch_loop.run(self._data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank3]: self.advance(data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank3]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank3]: self._optimizer_step(batch_idx, closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank3]: call._call_lightning_module_hook( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank3]: optimizer.step(closure=optimizer_closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank3]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank3]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank3]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank3]: closure_result = closure() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank3]: self._result = self.closure(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank3]: return func(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank3]: step_output = self._step_fn() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank3]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank3]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank3]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank3]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank3]: return self.module(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank3]: out = method(*_args, **_kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank3]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank3]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', '+customized_load=true', '+seperate_load=false', 'experiment.num_nodes=1', 'load=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_a_128/checkpoints/epoch0_step2000.ckpt', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=2', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=30000', 'experiment.validation.val_every_n_step=2500', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| return _run_code(code, main_globals, None, |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| exec(code, run_globals) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| _run_hydra( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| _run_app( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| run_and_report( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| raise ex |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| return func() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| lambda: hydra.run( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| _ = ret.return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| raise self._return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| ret.return_value = task_function(task_cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| run_local(cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| experiment.exec_task(task) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| getattr(self, task)() |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| trainer.fit( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| call._call_and_handle_interrupt( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| return function(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| results = self._run_stage() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| self.fit_loop.run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| self.advance() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| self.epoch_loop.run(self._data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| self.advance(data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| self._optimizer_step(batch_idx, closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| call._call_lightning_module_hook( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| optimizer.step(closure=optimizer_closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| closure_result = closure() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| self._result = self.closure(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| return func(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| step_output = self._step_fn() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| wrapper_output = wrapper_module(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| else self._run_ddp_forward(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| return self.module(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| out = method(*_args, **_kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| self.log("training/curriculum_phase", float(phase_idx)) |
| TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank0]: Traceback (most recent call last): |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank0]: return _run_code(code, main_globals, None, |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank0]: exec(code, run_globals) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank0]: run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank0]: _run_hydra( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank0]: _run_app( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank0]: run_and_report( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank0]: raise ex |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank0]: return func() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank0]: lambda: hydra.run( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank0]: _ = ret.return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank0]: raise self._return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank0]: ret.return_value = task_function(task_cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank0]: run_local(cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank0]: experiment.exec_task(task) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank0]: getattr(self, task)() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 357, in training |
| [rank0]: trainer.fit( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank0]: call._call_and_handle_interrupt( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank0]: return function(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank0]: self._run(model, ckpt_path=ckpt_path) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank0]: results = self._run_stage() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank0]: self.fit_loop.run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank0]: self.advance() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank0]: self.epoch_loop.run(self._data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank0]: self.advance(data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank0]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank0]: self._optimizer_step(batch_idx, closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank0]: call._call_lightning_module_hook( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank0]: optimizer.step(closure=optimizer_closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank0]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank0]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank0]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank0]: closure_result = closure() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank0]: self._result = self.closure(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank0]: return func(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank0]: step_output = self._step_fn() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank0]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank0]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank0]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank0]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank0]: return self.module(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank0]: out = method(*_args, **_kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank0]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank0]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| srun: error: node109: task 4: Exited with exit code 1 |
| srun: Terminating StepId=7382.0 |
| [2026-03-11T19:11:21.584] error: *** STEP 7382.0 ON node109 CANCELLED AT 2026-03-11T19:11:21 DUE TO TASK FAILURE *** |
| srun: error: node109: task 7: Exited with exit code 1 |
| srun: error: node109: task 2: Terminated |
| srun: error: node109: task 3: Terminated |
| srun: error: node109: task 5: Terminated |
| srun: error: node109: task 1: Terminated |
| srun: error: node109: task 6: Terminated |
| wandb: You can sync this run to the cloud by running: |
| wandb: wandb sync /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191040-stage_b_offline |
| wandb: Find logs at: ./checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191040-stage_b_offline/logs |
| wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information. |
| Epoch 0: 0%| | 0/203307 [00:00<?, ?it/s] |
| srun: error: node109: task 0: Terminated |
| srun: Force Terminated StepId=7382.0 |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:11:55,669][pytorch_lightning.utilities.rank_zero][INFO] - Using 16bit Automatic Mixed Precision (AMP) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| [2026-03-11 19:11:55,820][pytorch_lightning.utilities.rank_zero][INFO] - GPU available: True (cuda), used: True |
| [2026-03-11 19:11:55,820][pytorch_lightning.utilities.rank_zero][INFO] - TPU available: False, using: 0 TPU cores |
| [2026-03-11 19:11:55,820][pytorch_lightning.utilities.rank_zero][INFO] - IPU available: False, using: 0 IPUs |
| [2026-03-11 19:11:55,820][pytorch_lightning.utilities.rank_zero][INFO] - HPU available: False, using: 0 HPUs |
| [2026-03-11 19:11:55,821][pytorch_lightning.utilities.rank_zero][INFO] - `Trainer(limit_val_batches=1)` was configured so 1 batch will be used. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| INFO: Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:02,278][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| [2026-03-11 19:12:02,371][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:02,381][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:02,614][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:02,672][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:03,366][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:03,439][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:12:03,752][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| [2026-03-11 19:12:07,569][pytorch_lightning.utilities.rank_zero][INFO] - ---------------------------------------------------------------------------------------------------- |
| distributed_backend=nccl |
| All distributed processes registered. Starting with 8 processes |
| ---------------------------------------------------------------------------------------------------- |
|
|
| wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id stage_b_offline. |
| wandb: Tracking run with wandb version 0.17.9 |
| wandb: W&B syncing is set to `offline` in this directory. |
| wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. |
| INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:12:22,730][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: |
| | Name | Type | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model | DiffusionMamba | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae | AutoencoderKL | 229 M |
| 3 | mamba_memory | BiMambaMemory | 4.5 M |
| 4 | pose_prediction_model | PosePredictionNet | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M Trainable params |
| 236 M Non-trainable params |
| 846 M Total params |
| 3,384.157 Total estimated model params size (MB) |
| [2026-03-11 19:12:23,905][lightning.pytorch.callbacks.model_summary][INFO] - |
| | Name | Type | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model | DiffusionMamba | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae | AutoencoderKL | 229 M |
| 3 | mamba_memory | BiMambaMemory | 4.5 M |
| 4 | pose_prediction_model | PosePredictionNet | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M Trainable params |
| 236 M Non-trainable params |
| 846 M Total params |
| 3,384.157 Total estimated model params size (MB) |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:12:24,732][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs72af35385ecb9fff000002d5' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsa440793508d6484c000002d6' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsf74a18fc5e2ba664000002d7' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsb25fd419c3ebe4cc000002d9' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs45d14a918f9ffa8f000002da' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs025bed1fc09137d1000002d8' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsc925b8edc25a9dc0000002db' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsbee0d927da295b62000002dc' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsfcac062be1803dd5000002dd' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsf44c107934afd91a000002de' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsf572c29902b92951000002df' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsf6b1d6ca5f4f28c2000002e0' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs4a24e43bd590b8c1000002e1' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs3636c722fe7fe90e000002e2' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs3ece2e1747ccccfa000002e3' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs6f064bd6011ebe0f000002e4' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs85abee116e0ed750000002e5' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsd1c96769a4eb48c8000002e6' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs04d7a3dc2fe48ae8000002e7' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs92fae5d65201b30c000002e8' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs44031481554af9c0000002e9' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs2981e3b72a4f6e83000002ea' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs674350548c0d757c000002eb' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs67df74a00b4828f1000002ec' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs77cb47e392f6a612000002ed' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfse2d70c56003ebb4d000002ee' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsf6a64622c75972fd000002ef' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsb85766b23051ad0e000002f0' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| return _run_code(code, main_globals, None, |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| exec(code, run_globals) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| _run_hydra( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| _run_app( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| run_and_report( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| raise ex |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| return func() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| lambda: hydra.run( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| _ = ret.return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| raise self._return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| ret.return_value = task_function(task_cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| run_local(cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| experiment.exec_task(task) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| getattr(self, task)() |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| trainer.fit( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| call._call_and_handle_interrupt( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| return function(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| results = self._run_stage() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| self.fit_loop.run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| self.advance() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| self.epoch_loop.run(self._data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| self.advance(data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| self._optimizer_step(batch_idx, closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| call._call_lightning_module_hook( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| optimizer.step(closure=optimizer_closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| closure_result = closure() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| self._result = self.closure(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| return func(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| step_output = self._step_fn() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| wrapper_output = wrapper_module(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| else self._run_ddp_forward(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| return self.module(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| out = method(*_args, **_kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| self.log("training/curriculum_phase", float(phase_idx)) |
| TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank0]: Traceback (most recent call last): |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank0]: return _run_code(code, main_globals, None, |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank0]: exec(code, run_globals) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank0]: run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank0]: _run_hydra( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank0]: _run_app( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank0]: run_and_report( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank0]: raise ex |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank0]: return func() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank0]: lambda: hydra.run( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank0]: _ = ret.return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank0]: raise self._return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank0]: ret.return_value = task_function(task_cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank0]: run_local(cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank0]: experiment.exec_task(task) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank0]: getattr(self, task)() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank0]: trainer.fit( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank0]: call._call_and_handle_interrupt( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank0]: return function(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank0]: self._run(model, ckpt_path=ckpt_path) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank0]: results = self._run_stage() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank0]: self.fit_loop.run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank0]: self.advance() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank0]: self.epoch_loop.run(self._data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank0]: self.advance(data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank0]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank0]: self._optimizer_step(batch_idx, closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank0]: call._call_lightning_module_hook( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank0]: optimizer.step(closure=optimizer_closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank0]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank0]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank0]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank0]: closure_result = closure() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank0]: self._result = self.closure(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank0]: return func(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank0]: step_output = self._step_fn() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank0]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank0]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank0]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank0]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank0]: return self.module(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank0]: out = method(*_args, **_kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank0]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank0]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs81c923980eb4d684000002f1' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank6]: Traceback (most recent call last): |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank6]: return _run_code(code, main_globals, None, |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank6]: exec(code, run_globals) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank6]: run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank6]: _run_hydra( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank6]: _run_app( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank6]: run_and_report( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank6]: raise ex |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank6]: return func() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank6]: lambda: hydra.run( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank6]: _ = ret.return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank6]: raise self._return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank6]: ret.return_value = task_function(task_cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank6]: run_local(cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank6]: experiment.exec_task(task) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank6]: getattr(self, task)() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank6]: trainer.fit( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank6]: call._call_and_handle_interrupt( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank6]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank6]: return function(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank6]: self._run(model, ckpt_path=ckpt_path) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank6]: results = self._run_stage() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank6]: self.fit_loop.run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank6]: self.advance() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank6]: self.epoch_loop.run(self._data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank6]: self.advance(data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank6]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank6]: self._optimizer_step(batch_idx, closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank6]: call._call_lightning_module_hook( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank6]: optimizer.step(closure=optimizer_closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank6]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank6]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank6]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank6]: closure_result = closure() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank6]: self._result = self.closure(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank6]: return func(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank6]: step_output = self._step_fn() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank6]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank6]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank6]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank6]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank6]: return self.module(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank6]: out = method(*_args, **_kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank6]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank6]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs9ad9301b65f10564000002f2' |
| [rank3]: Traceback (most recent call last): |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank3]: return _run_code(code, main_globals, None, |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank3]: exec(code, run_globals) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank3]: run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank3]: _run_hydra( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank3]: _run_app( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank3]: run_and_report( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank3]: raise ex |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank3]: return func() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank3]: lambda: hydra.run( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank3]: _ = ret.return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank3]: raise self._return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank3]: ret.return_value = task_function(task_cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank3]: run_local(cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank3]: experiment.exec_task(task) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank3]: getattr(self, task)() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank3]: trainer.fit( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank3]: call._call_and_handle_interrupt( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank3]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank3]: return function(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank3]: self._run(model, ckpt_path=ckpt_path) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank3]: results = self._run_stage() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank3]: self.fit_loop.run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank3]: self.advance() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank3]: self.epoch_loop.run(self._data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank3]: self.advance(data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank3]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank3]: self._optimizer_step(batch_idx, closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank3]: call._call_lightning_module_hook( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank3]: optimizer.step(closure=optimizer_closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank3]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank3]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank3]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank3]: closure_result = closure() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank3]: self._result = self.closure(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank3]: return func(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank3]: step_output = self._step_fn() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank3]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank3]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank3]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank3]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank3]: return self.module(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank3]: out = method(*_args, **_kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank3]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank3]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs6f4d583e4f241199000002f3' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| wandb: - 0.000 MB of 0.000 MB uploaded |
| wandb: You can sync this run to the cloud by running: |
| wandb: wandb sync /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191215-stage_b_offline |
| wandb: Find logs at: ./checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191215-stage_b_offline/logs |
| wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information. |
| [rank4]: Traceback (most recent call last): |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank4]: return _run_code(code, main_globals, None, |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank4]: exec(code, run_globals) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank4]: run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank4]: _run_hydra( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank4]: _run_app( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank4]: run_and_report( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank4]: raise ex |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank4]: return func() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank4]: lambda: hydra.run( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank4]: _ = ret.return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank4]: raise self._return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank4]: ret.return_value = task_function(task_cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank4]: run_local(cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank4]: experiment.exec_task(task) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank4]: getattr(self, task)() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank4]: trainer.fit( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank4]: call._call_and_handle_interrupt( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank4]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank4]: return function(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank4]: self._run(model, ckpt_path=ckpt_path) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank4]: results = self._run_stage() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank4]: self.fit_loop.run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank4]: self.advance() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank4]: self.epoch_loop.run(self._data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank4]: self.advance(data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank4]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank4]: self._optimizer_step(batch_idx, closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank4]: call._call_lightning_module_hook( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank4]: optimizer.step(closure=optimizer_closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank4]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank4]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank4]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank4]: closure_result = closure() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank4]: self._result = self.closure(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank4]: return func(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank4]: step_output = self._step_fn() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank4]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank4]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank4]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank4]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank4]: return self.module(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank4]: out = method(*_args, **_kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank4]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank4]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank2]: Traceback (most recent call last): |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank2]: return _run_code(code, main_globals, None, |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank2]: exec(code, run_globals) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank2]: run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank2]: _run_hydra( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank2]: _run_app( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank2]: run_and_report( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank2]: raise ex |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank2]: return func() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank2]: lambda: hydra.run( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank2]: _ = ret.return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank2]: raise self._return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank2]: ret.return_value = task_function(task_cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank2]: run_local(cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank2]: experiment.exec_task(task) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank2]: getattr(self, task)() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank2]: trainer.fit( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank2]: call._call_and_handle_interrupt( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank2]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank2]: return function(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank2]: self._run(model, ckpt_path=ckpt_path) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank2]: results = self._run_stage() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank2]: self.fit_loop.run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank2]: self.advance() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank2]: self.epoch_loop.run(self._data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank2]: self.advance(data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank2]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank2]: self._optimizer_step(batch_idx, closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank2]: call._call_lightning_module_hook( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank2]: optimizer.step(closure=optimizer_closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank2]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank2]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank2]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank2]: closure_result = closure() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank2]: self._result = self.closure(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank2]: return func(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank2]: step_output = self._step_fn() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank2]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank2]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank2]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank2]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank2]: return self.module(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank2]: out = method(*_args, **_kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank2]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank2]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs8dc2a2bf5c4b6424000002f4' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank5]: Traceback (most recent call last): |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank5]: return _run_code(code, main_globals, None, |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank5]: exec(code, run_globals) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank5]: run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank5]: _run_hydra( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank5]: _run_app( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank5]: run_and_report( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank5]: raise ex |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank5]: return func() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank5]: lambda: hydra.run( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank5]: _ = ret.return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank5]: raise self._return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank5]: ret.return_value = task_function(task_cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank5]: run_local(cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank5]: experiment.exec_task(task) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank5]: getattr(self, task)() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank5]: trainer.fit( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank5]: call._call_and_handle_interrupt( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank5]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank5]: return function(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank5]: self._run(model, ckpt_path=ckpt_path) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank5]: results = self._run_stage() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank5]: self.fit_loop.run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank5]: self.advance() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank5]: self.epoch_loop.run(self._data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank5]: self.advance(data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank5]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank5]: self._optimizer_step(batch_idx, closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank5]: call._call_lightning_module_hook( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank5]: optimizer.step(closure=optimizer_closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank5]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank5]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank5]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank5]: closure_result = closure() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank5]: self._result = self.closure(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank5]: return func(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank5]: step_output = self._step_fn() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank5]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank5]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank5]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank5]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank5]: return self.module(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank5]: out = method(*_args, **_kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank5]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank5]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank1]: Traceback (most recent call last): |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank1]: return _run_code(code, main_globals, None, |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank1]: exec(code, run_globals) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank1]: run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank1]: _run_hydra( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank1]: _run_app( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank1]: run_and_report( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank1]: raise ex |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank1]: return func() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank1]: lambda: hydra.run( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank1]: _ = ret.return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank1]: raise self._return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank1]: ret.return_value = task_function(task_cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank1]: run_local(cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank1]: experiment.exec_task(task) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank1]: getattr(self, task)() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank1]: trainer.fit( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank1]: call._call_and_handle_interrupt( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank1]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank1]: return function(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank1]: self._run(model, ckpt_path=ckpt_path) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank1]: results = self._run_stage() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank1]: self.fit_loop.run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank1]: self.advance() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank1]: self.epoch_loop.run(self._data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank1]: self.advance(data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank1]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank1]: self._optimizer_step(batch_idx, closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank1]: call._call_lightning_module_hook( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank1]: optimizer.step(closure=optimizer_closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank1]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank1]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank1]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank1]: closure_result = closure() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank1]: self._result = self.closure(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank1]: return func(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank1]: step_output = self._step_fn() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank1]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank1]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank1]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank1]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank1]: return self.module(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank1]: out = method(*_args, **_kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank1]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank1]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=60000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank7]: Traceback (most recent call last): |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank7]: return _run_code(code, main_globals, None, |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank7]: exec(code, run_globals) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank7]: run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank7]: _run_hydra( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank7]: _run_app( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank7]: run_and_report( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank7]: raise ex |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank7]: return func() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank7]: lambda: hydra.run( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank7]: _ = ret.return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank7]: raise self._return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank7]: ret.return_value = task_function(task_cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank7]: run_local(cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank7]: experiment.exec_task(task) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank7]: getattr(self, task)() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank7]: trainer.fit( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank7]: call._call_and_handle_interrupt( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank7]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank7]: return function(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank7]: self._run(model, ckpt_path=ckpt_path) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank7]: results = self._run_stage() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank7]: self.fit_loop.run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank7]: self.advance() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank7]: self.epoch_loop.run(self._data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank7]: self.advance(data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank7]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank7]: self._optimizer_step(batch_idx, closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank7]: call._call_lightning_module_hook( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank7]: optimizer.step(closure=optimizer_closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank7]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank7]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank7]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank7]: closure_result = closure() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank7]: self._result = self.closure(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank7]: return func(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank7]: step_output = self._step_fn() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank7]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank7]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank7]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank7]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank7]: return self.module(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank7]: out = method(*_args, **_kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank7]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank7]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| srun: error: node109: task 6: Exited with exit code 1 |
| srun: Terminating StepId=7382.1 |
| [2026-03-11T19:12:56.307] error: *** STEP 7382.1 ON node109 CANCELLED AT 2026-03-11T19:12:56 DUE TO TASK FAILURE *** |
| srun: error: node109: task 4: Exited with exit code 1 |
| srun: error: node109: task 3: Exited with exit code 1 |
| Training: | | 0/? [00:00<?, ?it/s] |
| Training: 0%| | 0/203307 [00:00<?, ?it/s] |
| Epoch 0: 0%| | 0/203307 [00:00<?, ?it/s] |
| srun: error: node109: task 7: Terminated |
| srun: error: node109: task 2: Terminated |
| srun: error: node109: tasks 1,5: Terminated |
| srun: error: node109: task 0: Terminated |
| srun: Force Terminated StepId=7382.1 |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'training': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information |
| warnings.warn(msg, UserWarning) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/fabric/__init__.py:40: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. |
| warnings.warn( |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights. |
| warnings.warn(msg) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lpips/lpips.py:107: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md |
| self.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:29,746][pytorch_lightning.utilities.rank_zero][INFO] - Using 16bit Automatic Mixed Precision (AMP) |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| [2026-03-11 19:13:29,895][pytorch_lightning.utilities.rank_zero][INFO] - GPU available: True (cuda), used: True |
| [2026-03-11 19:13:29,895][pytorch_lightning.utilities.rank_zero][INFO] - TPU available: False, using: 0 TPU cores |
| [2026-03-11 19:13:29,895][pytorch_lightning.utilities.rank_zero][INFO] - IPU available: False, using: 0 IPUs |
| [2026-03-11 19:13:29,895][pytorch_lightning.utilities.rank_zero][INFO] - HPU available: False, using: 0 HPUs |
| [2026-03-11 19:13:29,896][pytorch_lightning.utilities.rank_zero][INFO] - `Trainer(limit_val_batches=1)` was configured so 1 batch will be used. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| /proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py:54: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead. |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:35,682][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:35,736][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:36,014][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| [2026-03-11 19:13:36,039][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:36,122][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:36,377][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:37,037][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8 |
| INFO: Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| Outputs will be saved to: /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b |
| Executing task: training out of ['training'] |
| [2026-03-11 19:13:37,081][lightning.fabric.utilities.distributed][INFO] - Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8 |
| [2026-03-11 19:13:40,890][pytorch_lightning.utilities.rank_zero][INFO] - ---------------------------------------------------------------------------------------------------- |
| distributed_backend=nccl |
| All distributed processes registered. Starting with 8 processes |
| ---------------------------------------------------------------------------------------------------- |
|
|
| wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id stage_b_offline. |
| wandb: Tracking run with wandb version 0.17.9 |
| wandb: W&B syncing is set to `offline` in this directory. |
| wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. |
| INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,222][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| [2026-03-11 19:13:56,223][lightning.pytorch.accelerators.cuda][INFO] - LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] |
| INFO: |
| | Name | Type | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model | DiffusionMamba | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae | AutoencoderKL | 229 M |
| 3 | mamba_memory | BiMambaMemory | 4.5 M |
| 4 | pose_prediction_model | PosePredictionNet | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M Trainable params |
| 236 M Non-trainable params |
| 846 M Total params |
| 3,384.157 Total estimated model params size (MB) |
| [2026-03-11 19:13:58,723][lightning.pytorch.callbacks.model_summary][INFO] - |
| | Name | Type | Params |
| --------------------------------------------------------------------------------- |
| 0 | diffusion_model | DiffusionMamba | 609 M |
| 1 | validation_lpips_model | LearnedPerceptualImagePatchSimilarity | 2.5 M |
| 2 | vae | AutoencoderKL | 229 M |
| 3 | mamba_memory | BiMambaMemory | 4.5 M |
| 4 | pose_prediction_model | PosePredictionNet | 200 K |
| --------------------------------------------------------------------------------- |
| 609 M Trainable params |
| 236 M Non-trainable params |
| 846 M Total params |
| 3,384.157 Total estimated model params size (MB) |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| [2026-03-11 19:13:58,924][lightning.pytorch.trainer.connectors.signal_connector][INFO] - SLURM auto-requeueing enabled. Setting signal handlers. |
| INFO: SLURM auto-requeueing enabled. Setting signal handlers. |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| /proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/models/mamba_memory.py:173: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. |
| with torch.cuda.amp.autocast(enabled=False): |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs9d365dff2de99f3d000002f5' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfseb545fc65b28ad08000002f6' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs7a6277f3dd99bed5000002f7' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs0fe6ea92d75a0eb8000002f8' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs1386bda6e252e2ca000002f9' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs2353dc075b9c99c6000002fa' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsdf2933162189a325000002fb' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsca00bbfef7641f10000002fc' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsc5666ec1f88a8205000002fd' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank3]: Traceback (most recent call last): |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank3]: return _run_code(code, main_globals, None, |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank3]: exec(code, run_globals) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank3]: run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank3]: _run_hydra( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank3]: _run_app( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank3]: run_and_report( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank3]: raise ex |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank3]: return func() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank3]: lambda: hydra.run( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank3]: _ = ret.return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank3]: raise self._return_value |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank3]: ret.return_value = task_function(task_cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank3]: run_local(cfg) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank3]: experiment.exec_task(task) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank3]: getattr(self, task)() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank3]: trainer.fit( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank3]: call._call_and_handle_interrupt( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank3]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank3]: return function(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank3]: self._run(model, ckpt_path=ckpt_path) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank3]: results = self._run_stage() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank3]: self.fit_loop.run() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank3]: self.advance() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank3]: self.epoch_loop.run(self._data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank3]: self.advance(data_fetcher) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank3]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank3]: self._optimizer_step(batch_idx, closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank3]: call._call_lightning_module_hook( |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank3]: optimizer.step(closure=optimizer_closure) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank3]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank3]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank3]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank3]: closure_result = closure() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank3]: self._result = self.closure(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank3]: return func(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank3]: step_output = self._step_fn() |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank3]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank3]: output = fn(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank3]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank3]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank3]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank3]: return self.module(*inputs, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank3]: return self._call_impl(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank3]: return forward_call(*args, **kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank3]: out = method(*_args, **_kwargs) |
| [rank3]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank3]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank3]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs951f1e300f55a554000002fe' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfse088d6b4e1f9aead000002ff' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsfd7795ed81f092f900000300' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs3e945a9669e9ce9a00000301' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsb49412da2ae7fef600000303' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs25efc39f9888a1f200000302' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsd696dd4acbab45c400000304' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs72672062820b00e400000305' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs43facad8025e4ec900000306' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsfc1cd3d0ddef284100000307' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsb0116cb4f59b393100000308' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs8bccb6146f87903000000309' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs2fab6ba82e6afd0a0000030a' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs2f013097f85abe080000030b' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs440182216121e5b80000030c' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfse6d490efff902e300000030d' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs5f328fe42087769d0000030e' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs97044d8b9d1cc1aa0000030f' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs2d62f34d94bd1e7500000310' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| return _run_code(code, main_globals, None, |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| exec(code, run_globals) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| _run_hydra( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| _run_app( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| run_and_report( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| raise ex |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| return func() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| lambda: hydra.run( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| _ = ret.return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| raise self._return_value |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| ret.return_value = task_function(task_cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| run_local(cfg) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| experiment.exec_task(task) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| getattr(self, task)() |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| trainer.fit( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| call._call_and_handle_interrupt( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| return function(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| results = self._run_stage() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| self.fit_loop.run() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| self.advance() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| self.epoch_loop.run(self._data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| self.advance(data_fetcher) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| self._optimizer_step(batch_idx, closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| call._call_lightning_module_hook( |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| optimizer.step(closure=optimizer_closure) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| closure_result = closure() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| self._result = self.closure(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| return func(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| step_output = self._step_fn() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| output = fn(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| wrapper_output = wrapper_module(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| else self._run_ddp_forward(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| return self.module(*inputs, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| out = method(*_args, **_kwargs) |
| File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| self.log("training/curriculum_phase", float(phase_idx)) |
| TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank0]: Traceback (most recent call last): |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank0]: return _run_code(code, main_globals, None, |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank0]: exec(code, run_globals) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank0]: run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank0]: _run_hydra( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank0]: _run_app( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank0]: run_and_report( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank0]: raise ex |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank0]: return func() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank0]: lambda: hydra.run( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank0]: _ = ret.return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank0]: raise self._return_value |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank0]: ret.return_value = task_function(task_cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank0]: run_local(cfg) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank0]: experiment.exec_task(task) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank0]: getattr(self, task)() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank0]: trainer.fit( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank0]: call._call_and_handle_interrupt( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank0]: return function(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank0]: self._run(model, ckpt_path=ckpt_path) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank0]: results = self._run_stage() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank0]: self.fit_loop.run() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank0]: self.advance() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank0]: self.epoch_loop.run(self._data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank0]: self.advance(data_fetcher) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank0]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank0]: self._optimizer_step(batch_idx, closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank0]: call._call_lightning_module_hook( |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank0]: optimizer.step(closure=optimizer_closure) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank0]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank0]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank0]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank0]: closure_result = closure() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank0]: self._result = self.closure(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank0]: return func(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank0]: step_output = self._step_fn() |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank0]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank0]: output = fn(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank0]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank0]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank0]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank0]: return self.module(*inputs, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank0]: return self._call_impl(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank0]: return forward_call(*args, **kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank0]: out = method(*_args, **_kwargs) |
| [rank0]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank0]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank0]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs90df101c141fa00c00000311' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsca22b33c7e37fd3900000312' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfs32ec70a740f41e1e00000313' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank7]: Traceback (most recent call last): |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank7]: return _run_code(code, main_globals, None, |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank7]: exec(code, run_globals) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank7]: run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank7]: _run_hydra( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank7]: _run_app( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank7]: run_and_report( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank7]: raise ex |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank7]: return func() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank7]: lambda: hydra.run( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank7]: _ = ret.return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank7]: raise self._return_value |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank7]: ret.return_value = task_function(task_cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank7]: run_local(cfg) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank7]: experiment.exec_task(task) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank7]: getattr(self, task)() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank7]: trainer.fit( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank7]: call._call_and_handle_interrupt( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank7]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank7]: return function(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank7]: self._run(model, ckpt_path=ckpt_path) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank7]: results = self._run_stage() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank7]: self.fit_loop.run() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank7]: self.advance() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank7]: self.epoch_loop.run(self._data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank7]: self.advance(data_fetcher) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank7]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank7]: self._optimizer_step(batch_idx, closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank7]: call._call_lightning_module_hook( |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank7]: optimizer.step(closure=optimizer_closure) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank7]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank7]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank7]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank7]: closure_result = closure() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank7]: self._result = self.closure(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank7]: return func(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank7]: step_output = self._step_fn() |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank7]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank7]: output = fn(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank7]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank7]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank7]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank7]: return self.module(*inputs, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank7]: return self._call_impl(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank7]: return forward_call(*args, **kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank7]: out = method(*_args, **_kwargs) |
| [rank7]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank7]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank7]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank4]: Traceback (most recent call last): |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank4]: return _run_code(code, main_globals, None, |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank4]: exec(code, run_globals) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank4]: run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank4]: _run_hydra( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank4]: _run_app( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank4]: run_and_report( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank4]: raise ex |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank4]: return func() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank4]: lambda: hydra.run( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank4]: _ = ret.return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank4]: raise self._return_value |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank4]: ret.return_value = task_function(task_cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank4]: run_local(cfg) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank4]: experiment.exec_task(task) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank4]: getattr(self, task)() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank4]: trainer.fit( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank4]: call._call_and_handle_interrupt( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank4]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank4]: return function(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank4]: self._run(model, ckpt_path=ckpt_path) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank4]: results = self._run_stage() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank4]: self.fit_loop.run() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank4]: self.advance() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank4]: self.epoch_loop.run(self._data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank4]: self.advance(data_fetcher) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank4]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank4]: self._optimizer_step(batch_idx, closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank4]: call._call_lightning_module_hook( |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank4]: optimizer.step(closure=optimizer_closure) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank4]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank4]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank4]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank4]: closure_result = closure() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank4]: self._result = self.closure(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank4]: return func(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank4]: step_output = self._step_fn() |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank4]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank4]: output = fn(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank4]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank4]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank4]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank4]: return self.module(*inputs, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank4]: return self._call_impl(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank4]: return forward_call(*args, **kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank4]: out = method(*_args, **_kwargs) |
| [rank4]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank4]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank4]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank1]: Traceback (most recent call last): |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank1]: return _run_code(code, main_globals, None, |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank1]: exec(code, run_globals) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank1]: run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank1]: _run_hydra( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank1]: _run_app( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank1]: run_and_report( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank1]: raise ex |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank1]: return func() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank1]: lambda: hydra.run( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank1]: _ = ret.return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank1]: raise self._return_value |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank1]: ret.return_value = task_function(task_cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank1]: run_local(cfg) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank1]: experiment.exec_task(task) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank1]: getattr(self, task)() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank1]: trainer.fit( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank1]: call._call_and_handle_interrupt( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank1]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank1]: return function(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank1]: self._run(model, ckpt_path=ckpt_path) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank1]: results = self._run_stage() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank1]: self.fit_loop.run() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank1]: self.advance() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank1]: self.epoch_loop.run(self._data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank1]: self.advance(data_fetcher) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank1]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank1]: self._optimizer_step(batch_idx, closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank1]: call._call_lightning_module_hook( |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank1]: optimizer.step(closure=optimizer_closure) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank1]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank1]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank1]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank1]: closure_result = closure() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank1]: self._result = self.closure(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank1]: return func(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank1]: step_output = self._step_fn() |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank1]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank1]: output = fn(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank1]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank1]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank1]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank1]: return self.module(*inputs, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank1]: return self._call_impl(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank1]: return forward_call(*args, **kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank1]: out = method(*_args, **_kwargs) |
| [rank1]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank1]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank1]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Traceback (most recent call last): |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers |
| finalizer() |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 224, in __call__ |
| res = self._callback(*self._args, **self._kwargs) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/multiprocessing/util.py", line 133, in _remove_temp_dir |
| rmtree(tempdir) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 725, in rmtree |
| _rmtree_safe_fd(fd, path, onerror) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd |
| onerror(os.unlink, fullname, sys.exc_info()) |
| File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd |
| os.unlink(entry.name, dir_fd=topfd) |
| OSError: [Errno 16] Device or resource busy: '.nfsa5f3c5809fcbd7d300000314' |
| wandb: - 0.000 MB of 0.000 MB uploaded |
| wandb: You can sync this run to the cloud by running: |
| wandb: wandb sync /proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191349-stage_b_offline |
| wandb: Find logs at: ./checkpoints/bimamba_stage_b/wandb/offline-run-20260311_191349-stage_b_offline/logs |
| wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information. |
| srun: error: node109: task 3: Exited with exit code 1 |
| srun: Terminating StepId=7382.2 |
| [2026-03-11T19:14:28.754] error: *** STEP 7382.2 ON node109 CANCELLED AT 2026-03-11T19:14:28 DUE TO TASK FAILURE *** |
| INFO: [rank: 2] Received SIGTERM: 15 |
| [2026-03-11 19:14:28,755][lightning.pytorch.trainer.connectors.signal_connector][INFO] - [rank: 2] Received SIGTERM: 15 |
| INFO: [rank: 5] Received SIGTERM: 15 |
| INFO: [rank: 6] Received SIGTERM: 15 |
| [2026-03-11 19:14:28,755][lightning.pytorch.trainer.connectors.signal_connector][INFO] - [rank: 5] Received SIGTERM: 15 |
| [2026-03-11 19:14:28,755][lightning.pytorch.trainer.connectors.signal_connector][INFO] - [rank: 6] Received SIGTERM: 15 |
| INFO: Bypassing SIGTERM: 15 |
| [2026-03-11 19:14:28,756][lightning.pytorch.trainer.connectors.signal_connector][INFO] - Bypassing SIGTERM: 15 |
| INFO: Bypassing SIGTERM: 15 |
| [2026-03-11 19:14:28,757][lightning.pytorch.trainer.connectors.signal_connector][INFO] - Bypassing SIGTERM: 15 |
| INFO: Bypassing SIGTERM: 15 |
| [2026-03-11 19:14:28,757][lightning.pytorch.trainer.connectors.signal_connector][INFO] - Bypassing SIGTERM: 15 |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank6]: Traceback (most recent call last): |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank6]: return _run_code(code, main_globals, None, |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank6]: exec(code, run_globals) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank6]: run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank6]: _run_hydra( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank6]: _run_app( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank6]: run_and_report( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank6]: raise ex |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank6]: return func() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank6]: lambda: hydra.run( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank6]: _ = ret.return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank6]: raise self._return_value |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank6]: ret.return_value = task_function(task_cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank6]: run_local(cfg) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank6]: experiment.exec_task(task) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank6]: getattr(self, task)() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank6]: trainer.fit( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank6]: call._call_and_handle_interrupt( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank6]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank6]: return function(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank6]: self._run(model, ckpt_path=ckpt_path) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank6]: results = self._run_stage() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank6]: self.fit_loop.run() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank6]: self.advance() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank6]: self.epoch_loop.run(self._data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank6]: self.advance(data_fetcher) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank6]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank6]: self._optimizer_step(batch_idx, closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank6]: call._call_lightning_module_hook( |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank6]: optimizer.step(closure=optimizer_closure) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank6]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank6]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank6]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank6]: closure_result = closure() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank6]: self._result = self.closure(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank6]: return func(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank6]: step_output = self._step_fn() |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank6]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank6]: output = fn(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank6]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank6]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank6]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank6]: return self.module(*inputs, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank6]: return self._call_impl(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank6]: return forward_call(*args, **kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank6]: out = method(*_args, **_kwargs) |
| [rank6]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank6]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank6]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| [rank2]: Traceback (most recent call last): |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank2]: return _run_code(code, main_globals, None, |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank2]: exec(code, run_globals) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank2]: run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank2]: _run_hydra( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank2]: _run_app( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank2]: run_and_report( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank2]: raise ex |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank2]: return func() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank2]: lambda: hydra.run( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank2]: _ = ret.return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank2]: raise self._return_value |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank2]: ret.return_value = task_function(task_cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank2]: run_local(cfg) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank2]: experiment.exec_task(task) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank2]: getattr(self, task)() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank2]: trainer.fit( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank2]: call._call_and_handle_interrupt( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank2]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank2]: return function(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank2]: self._run(model, ckpt_path=ckpt_path) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank2]: results = self._run_stage() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank2]: self.fit_loop.run() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank2]: self.advance() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank2]: self.epoch_loop.run(self._data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank2]: self.advance(data_fetcher) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank2]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank2]: self._optimizer_step(batch_idx, closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank2]: call._call_lightning_module_hook( |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank2]: optimizer.step(closure=optimizer_closure) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank2]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank2]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank2]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank2]: closure_result = closure() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank2]: self._result = self.closure(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank2]: return func(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank2]: step_output = self._step_fn() |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank2]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank2]: output = fn(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank2]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank2]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank2]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank2]: return self.module(*inputs, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank2]: return self._call_impl(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank2]: return forward_call(*args, **kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank2]: out = method(*_args, **_kwargs) |
| [rank2]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank2]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank2]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| srun: error: node109: tasks 0-1,4,7: Terminated |
| Error executing job with overrides: ['+name=train_stage_b_mamba', 'algorithm=df_video_mamba3stage', 'experiment.num_nodes=1', 'dataset.save_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/datasets/minecraft', 'dataset.n_frames=200', '+dataset.n_frames_valid=200', '+dataset.angle_range=110', '+dataset.pos_range=8', '+dataset.wo_updown=false', '+dataset.customized_validation=true', '+dataset.add_timestamp_embedding=true', '+dataset.use_explicit_memory_frames=false', 'algorithm.training_stage=stage_b_diffusion_frozen_memory', 'algorithm.use_mamba_memory_pipeline=true', 'algorithm.use_oracle_pose_eval=false', 'algorithm.enable_memory_noise_curriculum=false', '+algorithm.require_pose_prediction=false', '+algorithm.use_memory_attention=false', '+algorithm.relative_embedding=false', '+algorithm.memory_retrieval_topk=32', 'algorithm.diff_window_size=8', 'algorithm.memory_condition_length=0', 'algorithm.context_frames=100', '+algorithm.n_tokens=8', 'experiment.training.batch_size=8', 'experiment.training.checkpointing.every_n_train_steps=2500', 'experiment.training.max_steps=175000', 'experiment.validation.val_every_n_step=2500', 'resume=stage_b_offline', '+output_dir=/proj/cvl/users/x_fahkh2/WorldMem_Repro/checkpoints/bimamba_stage_b/'] |
| [rank5]: Traceback (most recent call last): |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 196, in _run_module_as_main |
| [rank5]: return _run_code(code, main_globals, None, |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/runpy.py", line 86, in _run_code |
| [rank5]: exec(code, run_globals) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 202, in <module> |
| [rank5]: run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| [rank5]: _run_hydra( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| [rank5]: _run_app( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| [rank5]: run_and_report( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| [rank5]: raise ex |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| [rank5]: return func() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| [rank5]: lambda: hydra.run( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| [rank5]: _ = ret.return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| [rank5]: raise self._return_value |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| [rank5]: ret.return_value = task_function(task_cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 198, in run |
| [rank5]: run_local(cfg) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/main.py", line 122, in run_local |
| [rank5]: experiment.exec_task(task) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 172, in exec_task |
| [rank5]: getattr(self, task)() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/experiments/exp_base.py", line 371, in training |
| [rank5]: trainer.fit( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit |
| [rank5]: call._call_and_handle_interrupt( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt |
| [rank5]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch |
| [rank5]: return function(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl |
| [rank5]: self._run(model, ckpt_path=ckpt_path) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run |
| [rank5]: results = self._run_stage() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage |
| [rank5]: self.fit_loop.run() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run |
| [rank5]: self.advance() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance |
| [rank5]: self.epoch_loop.run(self._data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run |
| [rank5]: self.advance(data_fetcher) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 240, in advance |
| [rank5]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 187, in run |
| [rank5]: self._optimizer_step(batch_idx, closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 265, in _optimizer_step |
| [rank5]: call._call_lightning_module_hook( |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_base.py", line 65, in optimizer_step |
| [rank5]: optimizer.step(closure=optimizer_closure) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 151, in step |
| [rank5]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 265, in optimizer_step |
| [rank5]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 230, in optimizer_step |
| [rank5]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 77, in optimizer_step |
| [rank5]: closure_result = closure() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__ |
| [rank5]: self._result = self.closure(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context |
| [rank5]: return func(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure |
| [rank5]: step_output = self._step_fn() |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 315, in _training_step |
| [rank5]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values()) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook |
| [rank5]: output = fn(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 381, in training_step |
| [rank5]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 633, in __call__ |
| [rank5]: wrapper_output = wrapper_module(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward |
| [rank5]: else self._run_ddp_forward(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward |
| [rank5]: return self.module(*inputs, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl |
| [rank5]: return self._call_impl(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl |
| [rank5]: return forward_call(*args, **kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/envs/worldmem/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 626, in wrapped_forward |
| [rank5]: out = method(*_args, **_kwargs) |
| [rank5]: File "/proj/cvl/users/x_fahkh2/WorldMem_Repro/algorithms/worldmem/df_video_mamba3stage.py", line 1073, in training_step |
| [rank5]: self.log("training/curriculum_phase", float(phase_idx)) |
| [rank5]: TypeError: float() argument must be a string or a real number, not 'NoneType' |
| Epoch 0: 0%| | 0/203307 [00:00<?, ?it/s] |
| srun: error: node109: task 6: Exited with exit code 1 |
| srun: error: node109: task 2: Exited with exit code 1 |
| srun: error: node109: task 5: Exited with exit code 1 |
| srun: Force Terminated StepId=7382.2 |
|
|