RuntimeError when multiplying tensors a and b at non-singleton dimension 4 while calculating the loss
By: shirshak.acharya on Aug. 14, 2024, 4:25 p.m.
While trying to train the FracSegNet model available on GitHub (https://github.com/YzzLiu/FracSegNet/issues/6), I get the following error message:
epoch: 0
Traceback (most recent call last):
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/bin/nnUNet_train", line 8, in <module>
    sys.exit(main())
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/run/run_training.py", line 164, in main
    trainer.run_training()
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 431, in run_training
    ret = super().run_training()
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/network_training/nnUNetTrainer.py", line 306, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/network_training/network_trainer.py", line 447, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 237, in run_iteration
    l = self.loss(output, target, disMap, self.epoch)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/loss_functions/deep_supervision.py", line 29, in forward
    l = weights[0] * self.loss(x[0], y[0],disMap[0],epoch)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/loss_functions/dice_loss.py", line 373, in forward
    dc_loss = self.dc(net_output, target, disMap_weight, loss_mask=mask, current_epoch = epoch) if self.weight_dice != 0 else 0
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/loss_functions/dice_loss.py", line 198, in forward
    tp, fp, fn, _ = get_tp_fp_fn_tn(x, y,disMap,axes, loss_mask, False,current_epoch)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/nnunet/training/loss_functions/dice_loss.py", line 150, in get_tp_fp_fn_tn
    tp = torch.mul(tp, disMap2onehot)
RuntimeError: The size of tensor a (112) must match the size of tensor b (221) at non-singleton dimension 4
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the print"
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-5 (results_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/Enterprise2/shirshak/NewFracSegNet/FracSegNet/.venv/lib/python3.10/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the print"
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
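The call that actually fails is the element-wise torch.mul(tp, disMap2onehot) at the bottom of the main traceback; the two "background workers are no longer alive" errors are just the data-loader threads shutting down after it. A minimal sketch that reproduces the same error in isolation is below. The 5-D shapes (batch, classes, x, y, z) are made up for illustration; only the last axis (dimension 4) is set to 112 vs 221 to mirror the log.

```python
import torch

# Hypothetical shapes (batch, classes, x, y, z). Only dimension 4 differs,
# matching the 112 vs 221 reported in the traceback.
tp = torch.ones(2, 3, 8, 8, 112)
dis_map_onehot = torch.ones(2, 3, 8, 8, 221)

message = ""
try:
    torch.mul(tp, dis_map_onehot)
except RuntimeError as err:
    message = str(err)

print(message)
# prints: The size of tensor a (112) must match the size of tensor b (221) at non-singleton dimension 4
```

Element-wise multiplication only broadcasts when each pair of sizes is equal or one of them is 1, so 112 and 221 cannot be reconciled. Since tp is built from the network output, this suggests that the distance map passed to the loss (disMap[0]) does not have the same spatial size as the prediction x[0].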
I think someone here might be able to solve this. What could be causing the problem? Any help would be appreciated; I really need to train this FracSegNet model and make my submission soon.
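One generic way to make the shapes agree is to resample the distance map to the network output's spatial size with torch.nn.functional.interpolate before the multiplication. This sketch assumes the map already covers the same field of view as the output and differs only in resolution; the helper name match_spatial_size is hypothetical and this is not FracSegNet's own code. If the 221-voxel map was never cropped to the training patch in the first place, interpolation would hide the misalignment rather than fix it, and the data-loading/cropping code is the better place to look.

```python
import torch
import torch.nn.functional as F


def match_spatial_size(dis_map: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: resample a 5-D map (N, C, X, Y, Z) to the
    spatial size of `reference` using trilinear interpolation."""
    if dis_map.shape[2:] == reference.shape[2:]:
        return dis_map  # already aligned, return unchanged
    return F.interpolate(
        dis_map.float(),
        size=tuple(reference.shape[2:]),
        mode="trilinear",
        align_corners=False,
    )


# Made-up shapes for illustration only.
net_output = torch.rand(2, 3, 8, 8, 112)
dis_map = torch.rand(2, 1, 8, 8, 221)  # single-channel map, wrong last axis

aligned = match_spatial_size(dis_map, net_output)
weighted = net_output * aligned  # channel axis broadcasts from 1 to 3
print(tuple(weighted.shape))
```

With deep supervision, the same check would need to hold at every output scale, not just the first one.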