The algorithm failed on one or more cases.

  By: jesus.alzateg on July 10, 2023, 8:34 a.m.

Dear organizers,

Could you please tell me what the error of my latest submission (July 7) was? I have tested the algorithm on the Grand Challenge server and it works; however, when I upload a large image, the memory runs out. I would like to know whether it is the same error.

Best Regards,

Alejandro.

Re: The algorithm failed on one or more cases.  

  By: giansteve on July 10, 2023, 8:50 a.m.

Dear Alejandro,

The submission created on July 7, 2023, 12:20 p.m. failed with the following message: RuntimeError: DataLoader worker (pid(s) 37) exited unexpectedly. The stdout and stderr are the following:

  • Stdout:
2023-07-07T10:27:27.044000+00:00 Data statistics:
2023-07-07T10:27:27.044000+00:00 Type: <class 'monai.data.meta_tensor.MetaTensor'> torch.float32
2023-07-07T10:27:27.044000+00:00 Shape: torch.Size([512, 512, 399])
2023-07-07T10:27:27.044000+00:00 Value range: (-1024.0, 3071.0)
  • Stderr:
2023-07-07T10:27:21.043000+00:00 monai.transforms.io.dictionary LoadImaged.__init__:image_only: Current default value of argument `image_only=False` has been deprecated since version 1.1. It will be changed to `image_only=True` in version 1.3.
2023-07-07T10:27:31.045000+00:00 ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
2023-07-07T10:27:36.046000+00:00 Traceback (most recent call last):
2023-07-07T10:27:36.046000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1132, in _try_get_data
2023-07-07T10:27:36.046000+00:00     data = self._data_queue.get(timeout=timeout)
2023-07-07T10:27:36.046000+00:00   File "/opt/conda/lib/python3.10/multiprocessing/queues.py", line 113, in get
2023-07-07T10:27:36.046000+00:00     if not self._poll(timeout):
2023-07-07T10:27:36.046000+00:00   File "/opt/conda/lib/python3.10/multiprocessing/connection.py", line 257, in poll
2023-07-07T10:27:36.046000+00:00     return self._poll(timeout)
2023-07-07T10:27:36.046000+00:00   File "/opt/conda/lib/python3.10/multiprocessing/connection.py", line 424, in _poll
2023-07-07T10:27:36.047000+00:00     r = wait([self], timeout)
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/multiprocessing/connection.py", line 931, in wait
2023-07-07T10:27:36.047000+00:00     ready = selector.select(timeout)
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/selectors.py", line 416, in select
2023-07-07T10:27:36.047000+00:00     fd_event_list = self._selector.poll(timeout)
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
2023-07-07T10:27:36.047000+00:00     _error_if_any_worker_fails()
2023-07-07T10:27:36.047000+00:00 RuntimeError: DataLoader worker (pid 37) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
2023-07-07T10:27:36.047000+00:00 
2023-07-07T10:27:36.047000+00:00 The above exception was the direct cause of the following exception:
2023-07-07T10:27:36.047000+00:00 
2023-07-07T10:27:36.047000+00:00 Traceback (most recent call last):
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-07-07T10:27:36.047000+00:00     return _run_code(code, main_globals, None,
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
2023-07-07T10:27:36.047000+00:00     exec(code, run_globals)
2023-07-07T10:27:36.047000+00:00   File "/opt/app/process.py", line 202, in <module>
2023-07-07T10:27:36.047000+00:00     Segaalgorithm().process()
2023-07-07T10:27:36.047000+00:00   File "/home/user/.local/lib/python3.10/site-packages/evalutils/evalutils.py", line 183, in process
2023-07-07T10:27:36.047000+00:00     self.process_cases()
2023-07-07T10:27:36.047000+00:00   File "/home/user/.local/lib/python3.10/site-packages/evalutils/evalutils.py", line 191, in process_cases
2023-07-07T10:27:36.047000+00:00     self._case_results.append(self.process_case(idx=idx, case=case))
2023-07-07T10:27:36.047000+00:00   File "/opt/app/process.py", line 88, in process_case
2023-07-07T10:27:36.047000+00:00     input_image=first(test_org_loader)
2023-07-07T10:27:36.047000+00:00   File "/opt/app/MONAI/monai/utils/misc.py", line 118, in first
2023-07-07T10:27:36.047000+00:00     for i in iterable:
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
2023-07-07T10:27:36.047000+00:00     data = self._next_data()
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1328, in _next_data
2023-07-07T10:27:36.047000+00:00     idx, data = self._get_data()
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1294, in _get_data
2023-07-07T10:27:36.047000+00:00     success, data = self._try_get_data()
2023-07-07T10:27:36.047000+00:00   File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1145, in _try_get_data
2023-07-07T10:27:36.047000+00:00     raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
2023-07-07T10:27:36.047000+00:00 RuntimeError: DataLoader worker (pid(s) 37) exited unexpectedly
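For context, the "Bus error" in the stderr means the DataLoader's worker processes ran out of shared memory (/dev/shm), which is often capped at a small size inside containers. A minimal sketch of one common mitigation, assuming a standard PyTorch DataLoader (the tiny dataset here is just a stand-in): with `num_workers=0` the data is loaded in the main process, so no shared-memory segments are needed between workers at all, at the cost of slower loading.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 4 samples of shape (2,). In the real pipeline this
# would be the MONAI dataset that loads the CT volumes.
dataset = TensorDataset(torch.zeros(4, 2))

# num_workers=0 keeps loading in the main process and avoids the
# "insufficient shared memory (shm)" bus error seen in the stderr above.
loader = DataLoader(dataset, batch_size=2, num_workers=0)

batches = [b[0].shape for b in loader]
print(batches)  # two batches of shape [2, 2]
```

Alternatively, the container's shared-memory limit can be raised where the platform allows it; `num_workers=0` is simply the change that requires no infrastructure access.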

We hope this helps.

Best SEG.A. Team

Re: The algorithm failed on one or more cases.  

  By: jesus.alzateg on July 11, 2023, 1:09 p.m.

Thanks for your answer. I have made a new submission; could you tell me what the error is (July 11)?

Best Regards,

Alejandro.

Re: The algorithm failed on one or more cases.  

  By: giansteve on July 11, 2023, 1:39 p.m.

Hi Alejandro,

From your new submission I can only see the stderr, which I paste here:

2023-07-11T13:05:28.227000+00:00 Traceback (most recent call last):
2023-07-11T13:05:28.227000+00:00   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
2023-07-11T13:05:28.227000+00:00     return _run_code(code, main_globals, None,
2023-07-11T13:05:28.227000+00:00   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
2023-07-11T13:05:28.227000+00:00     exec(code, run_globals)
2023-07-11T13:05:28.227000+00:00   File "/opt/app/evaluation.py", line 227, in <module>
2023-07-11T13:05:28.227000+00:00     Segaeval().evaluate()
2023-07-11T13:05:28.227000+00:00   File "/home/user/.local/lib/python3.8/site-packages/evalutils/evalutils.py", line 416, in evaluate
2023-07-11T13:05:28.227000+00:00     self.score()
2023-07-11T13:05:28.227000+00:00   File "/opt/app/evaluation.py", line 119, in score
2023-07-11T13:05:28.227000+00:00     [self.score_case(case=case)]
2023-07-11T13:05:28.227000+00:00   File "/opt/app/evaluation.py", line 167, in score_case
2023-07-11T13:05:28.227000+00:00     hausdorff_distance.Execute(gt, pred)
2023-07-11T13:05:28.227000+00:00   File "/home/user/.local/lib/python3.8/site-packages/SimpleITK/SimpleITK.py", line 31430, in Execute
2023-07-11T13:05:28.227000+00:00     return _SimpleITK.HausdorffDistanceImageFilter_Execute(self, image1, image2)
2023-07-11T13:05:28.227000+00:00 RuntimeError: Exception thrown in SimpleITK HausdorffDistanceImageFilter_Execute: /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.3/itkDirectedHausdorffDistanceImageFilter.hxx:144:
2023-07-11T13:05:28.227000+00:00 ITK ERROR: pixelcount is equal to 0

It appears that the evaluation step failed because one of the images has a pixel count of 0, i.e., one of the masks contains no foreground voxels.
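For reference, SimpleITK's HausdorffDistanceImageFilter raises this error when either input mask is entirely background, since the distance is undefined for an empty object. A minimal sketch of a pre-check on the mask arrays (the `has_foreground` helper and the example arrays are hypothetical, not part of the challenge's evaluation code):

```python
import numpy as np

def has_foreground(mask: np.ndarray) -> bool:
    """Return True if the mask contains at least one nonzero voxel."""
    return int(np.count_nonzero(mask)) > 0

# An all-background prediction reproduces the "pixelcount is equal to 0"
# condition; the ground truth here has foreground everywhere.
pred = np.zeros((4, 4, 4), dtype=np.uint8)
gt = np.ones((4, 4, 4), dtype=np.uint8)

if not (has_foreground(pred) and has_foreground(gt)):
    print("skip Hausdorff: empty mask")
```

Running the same check locally on each predicted mask would reveal which test case the algorithm segments as empty.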

Best Gian Marco

 Last edited by: apepe on Aug. 15, 2023, 12:58 p.m., edited 1 time in total.

Re: The algorithm failed on one or more cases.  

  By: jesus.alzateg on July 11, 2023, 3:55 p.m.

Thank you. Does that mean the algorithm doesn't predict anything for one of the test images? When I evaluate it locally, it works just as well as my previous successful submissions.

Best Regards,

Alejandro.

Re: The algorithm failed on one or more cases.  

  By: jesus.alzateg on July 13, 2023, 5:27 p.m.

Hi, could you tell me what the errors of my last submission (July 13) are?

Best regards,

Alejandro.

Re: The algorithm failed on one or more cases.  

  By: giansteve on July 14, 2023, 6:45 a.m.

Hello Jesus,

Your last submission produced the following error message: RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!

with the following traceback:

Traceback (most recent call last):

2023-07-13T16:31:27.242000+00:00   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-07-13T16:31:27.242000+00:00     return _run_code(code, main_globals, None,
2023-07-13T16:31:27.242000+00:00   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
2023-07-13T16:31:27.242000+00:00     exec(code, run_globals)
2023-07-13T16:31:27.242000+00:00   File "/opt/app/process.py", line 299, in <module>
2023-07-13T16:31:27.242000+00:00     Segaalgorithm().process()
2023-07-13T16:31:27.242000+00:00   File "/home/user/.local/lib/python3.10/site-packages/evalutils/evalutils.py", line 183, in process
2023-07-13T16:31:27.242000+00:00     self.process_cases()
2023-07-13T16:31:27.242000+00:00   File "/home/user/.local/lib/python3.10/site-packages/evalutils/evalutils.py", line 191, in process_cases
2023-07-13T16:31:27.242000+00:00     self._case_results.append(self.process_case(idx=idx, case=case))
2023-07-13T16:31:27.242000+00:00   File "/opt/app/process.py", line 68, in process_case
2023-07-13T16:31:27.242000+00:00     predictions = self.predict(input_image_path=input_image_file_path)
2023-07-13T16:31:27.242000+00:00   File "/opt/app/process.py", line 232, in predict
2023-07-13T16:31:27.242000+00:00     predictor.predict_from_files(input_folder,
2023-07-13T16:31:27.242000+00:00   File "/opt/app/nnUNet/nnunetv2/inference/predict_from_raw_data.py", line 249, in predict_from_files
2023-07-13T16:31:27.242000+00:00     return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
2023-07-13T16:31:27.242000+00:00   File "/opt/app/nnUNet/nnunetv2/inference/predict_from_raw_data.py", line 342, in predict_from_data_iterator
2023-07-13T16:31:27.242000+00:00     for preprocessed in data_iterator:
2023-07-13T16:31:27.242000+00:00   File "/opt/app/nnUNet/nnunetv2/inference/data_iterators.py", line 109, in preprocessing_iterator_fromfiles
2023-07-13T16:31:27.242000+00:00     raise RuntimeError('Background workers died. Look for the error message further up! If there is '
2023-07-13T16:31:27.242000+00:00 RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!
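As the traceback shows, predict_from_files is invoked with a worker count for parallel preprocessing/export (num_processes_segmentation_export appears in the call), so the usual remedy is to lower those counts until peak RAM fits within the container's limit. A generic sketch of the idea with a hypothetical helper (this is not nnU-Net API, just the capping logic):

```python
import os

def pick_num_workers(requested: int, max_safe: int = 1) -> int:
    """Cap the number of background workers; fall back to a single
    worker when memory is tight, since each worker holds a full
    preprocessed volume in RAM."""
    return max(1, min(requested, max_safe))

# Even if the machine has many cores, a memory-constrained container
# should run preprocessing nearly serially.
print(pick_num_workers(os.cpu_count() or 1))
```

Reducing the workers trades inference speed for a lower peak memory footprint, which is typically the right trade-off inside a fixed-RAM evaluation container.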

Best Gian Marco