Reason of failed submission on final test

  By: Jarartur on Aug. 15, 2024, 7:55 p.m.

Hello, my final submission seems to have failed, but I do not have any logs telling me why. Is it possible to know whether it was due to the time limit, OOM, or some other reason?

Is it possible to view the logs so that I can see which stage needs optimization?

From the statistics on the algorithm's page, I calculated that the algorithm succeeded in 28/30 cases. Is that right? Is it possible to know at which stage the remaining cases failed?

Kind regards, Artur

 Last edited by: Jarartur on Aug. 15, 2024, 8:12 p.m., edited 3 times in total.

Re: Reason of failed submission on final test  

  By: YSang on Aug. 16, 2024, 1:28 a.m.

It was due to the time limit being exceeded in a few cases.

https://grand-challenge.org/algorithms/new-model/jobs/9bff7a4a-9d24-4a76-bc0d-e01618ce0452/

Please find it from: your profile page ➡️ your algorithm ➡️ results ➡️ ℹ️ ➡️ logs.

Re: Reason of failed submission on final test  

  By: Jarartur on Aug. 16, 2024, 6:51 a.m.

Thank you for this swift reply!

Unfortunately, I cannot open the link you sent. It says:

Forbidden: You do not have permission to access this content.

I think this is due to GC policy. Nevertheless, thank you for pointing me in the right direction.

Re: Reason of failed submission on final test  

  By: Jarartur on Aug. 16, 2024, 4:26 p.m.

Hello, may I bother you again for information as to why my submission failed? All the preliminary cases finished in under 6 minutes, so I no longer know what the problem could be.

Any view into the logs of the failed cases would be really helpful, as all the cases I trained on and all the preliminary cases are fine and well under the limit.

Kind regards, Artur

 Last edited by: Jarartur on Aug. 16, 2024, 4:27 p.m., edited 1 time in total.

Re: Reason of failed submission on final test  

  By: YSang on Aug. 16, 2024, 4:31 p.m.

Sure. This happened to several cases:

Error Message

Output directory 'images/pelvic-fracture-ct-segmentation' is empty

Stdout

2024-08-16T16:10:55.510000+00:00 
2024-08-16T16:10:55.510000+00:00 
2024-08-16T16:10:55.510000+00:00 Please cite the following paper when using FracSegNet:
2024-08-16T16:10:55.510000+00:00 
2024-08-16T16:10:55.510000+00:00  Y. Liu, S. yibulayimu, Y. Sang, et al.,"Pelvic Fracture Segmentation Using a Multi-scale Distance-Weighted Neural Network." MICCAI 2023. https://doi.org/10.1007/978-3-031-43996-4_30
2024-08-16T16:10:55.510000+00:00 
2024-08-16T16:10:55.510000+00:00 FracSegNet Detail in: https://github.com/YzzLiu/FracSegNet
2024-08-16T16:10:55.510000+00:00 Thanks to Febian, et al.'s excellent work on nn-UNet. Detail in: https://github.com/MIC-DKFZ/nnUNet
2024-08-16T16:10:55.510000+00:00 
2024-08-16T16:10:56.510000+00:00 nnUNet_raw_data_base is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
2024-08-16T16:10:56.510000+00:00 nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
2024-08-16T16:10:56.510000+00:00 empty cuda cache...
2024-08-16T16:10:56.510000+00:00 weights_tranerV2 =  [0.53333333 0.26666667 0.13333333 0.06666667 0.        ]
2024-08-16T16:10:58.511000+00:00 begin preprocessing...
2024-08-16T16:10:58.511000+00:00 [PosixPath('/tmp/images/pelvic-fracture-ct-segmentation/input/0aa4b544-df8f-49c5-8490-66ac3b266709_0000.mha')]
2024-08-16T16:10:58.511000+00:00 using preprocessor GenericPreprocessor
2024-08-16T16:11:03.514000+00:00 before crop: (1, 350, 512, 512) after crop: (1, 350, 512, 512) spacing: [0.7989502  0.97600001 0.97600001] 
2024-08-16T16:11:03.514000+00:00 
2024-08-16T16:11:04.514000+00:00 cupy!
2024-08-16T16:12:13.531000+00:00 no separate z, order 3
2024-08-16T16:12:15.532000+00:00 cupy!
2024-08-16T16:12:16.532000+00:00 no separate z, order 1
2024-08-16T16:12:19.533000+00:00 before: {'spacing': array([0.7989502 , 0.97600001, 0.97600001]), 'spacing_transposed': array([0.7989502 , 0.97600001, 0.97600001]), 'data.shape (data is transposed)': (1, 350, 512, 512)} 
2024-08-16T16:12:19.533000+00:00 after:  {'spacing': array([1.77334511, 1.73180876, 1.73180876]), 'data.shape (data is resampled)': (1, 158, 289, 289)} 
2024-08-16T16:12:19.533000+00:00 
2024-08-16T16:12:19.533000+00:00 begin prediction...
2024-08-16T16:12:19.533000+00:00 debug: mirroring False mirror_axes (0, 1, 2)
2024-08-16T16:12:19.534000+00:00 step_size: 0.5
2024-08-16T16:12:19.534000+00:00 do mirror: False
2024-08-16T16:12:19.534000+00:00 data shape: (1, 158, 289, 289)
2024-08-16T16:12:19.534000+00:00 patch size: [ 96 160 128]
2024-08-16T16:12:19.534000+00:00 steps (x, y, and z): [[0, 31, 62], [0, 64, 129], [0, 54, 107, 161]]
2024-08-16T16:12:19.534000+00:00 number of tiles: 36
2024-08-16T16:12:19.534000+00:00 computing Gaussian
2024-08-16T16:12:31.537000+00:00 prediction done
2024-08-16T16:12:31.537000+00:00 force_separate_z: None interpolation order: 1
2024-08-16T16:12:31.537000+00:00 separate z: False lowres axis None
2024-08-16T16:12:31.537000+00:00 cupy!
2024-08-16T16:12:31.537000+00:00 no separate z, order 1
2024-08-16T16:12:37.539000+00:00 empty cuda cache...
2024-08-16T16:12:37.539000+00:00 begin preprocessing...
2024-08-16T16:12:37.539000+00:00 [PosixPath('/tmp/images/pelvic-fracture-ct-segmentation/input/0aa4b544-df8f-49c5-8490-66ac3b266709_0000.mha')]
2024-08-16T16:12:37.539000+00:00 using preprocessor GenericPreprocessor
2024-08-16T16:12:42.540000+00:00 before crop: (1, 350, 512, 512) after crop: (1, 350, 512, 512) spacing: [0.7989502  0.97600001 0.97600001] 
2024-08-16T16:12:42.540000+00:00 
2024-08-16T16:12:43.540000+00:00 cupy!
2024-08-16T16:12:43.541000+00:00 no separate z, order 3
2024-08-16T16:12:44.541000+00:00 cupy!
2024-08-16T16:12:45.541000+00:00 no separate z, order 1
2024-08-16T16:12:46.542000+00:00 before: {'spacing': array([0.7989502 , 0.97600001, 0.97600001]), 'spacing_transposed': array([0.7989502 , 0.97600001, 0.97600001]), 'data.shape (data is transposed)': (1, 350, 512, 512)} 
2024-08-16T16:12:46.542000+00:00 after:  {'spacing': array([0.79998779, 0.78125   , 0.78125   ]), 'data.shape (data is resampled)': (1, 350, 640, 640)} 
2024-08-16T16:12:46.542000+00:00 
2024-08-16T16:12:48.542000+00:00 cupy!
2024-08-16T16:12:52.543000+00:00 begin prediction...
2024-08-16T16:12:52.543000+00:00 debug: mirroring False mirror_axes (0, 1, 2)
2024-08-16T16:12:52.543000+00:00 step_size: 0.5
2024-08-16T16:12:52.543000+00:00 do mirror: False
2024-08-16T16:12:52.543000+00:00 data shape: (5, 350, 640, 640)
2024-08-16T16:12:52.543000+00:00 patch size: [ 96 160 128]
2024-08-16T16:12:52.543000+00:00 steps (x, y, and z): [[0, 42, 85, 127, 169, 212, 254], [0, 80, 160, 240, 320, 400, 480], [0, 64, 128, 192, 256, 320, 384, 448, 512]]
2024-08-16T16:12:52.544000+00:00 number of tiles: 441
2024-08-16T16:12:52.544000+00:00 computing Gaussian
2024-08-16T16:15:13.580000+00:00 prediction done
2024-08-16T16:15:13.580000+00:00 force_separate_z: None interpolation order: 1
2024-08-16T16:15:13.580000+00:00 separate z: False lowres axis None
2024-08-16T16:15:13.580000+00:00 cupy!
2024-08-16T16:15:16.581000+00:00 no separate z, order 1
2024-08-16T16:15:22.583000+00:00 nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
2024-08-16T16:15:22.583000+00:00 nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
2024-08-16T16:15:22.583000+00:00 START
2024-08-16T16:15:22.583000+00:00 Start prediction
2024-08-16T16:15:22.583000+00:00 nnUNet segmentation starting!
2024-08-16T16:15:24.584000+00:00 Namespace(predictions=PosixPath('/tmp/images/pelvic-fracture-ct-segmentation/fractures'), anatomical_seg=PosixPath('/tmp/images/pelvic-fracture-ct-segmentation/anatomical'), output=PosixPath('/output/images/pelvic-fracture-ct-segmentation'), threshold=0.0001)
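
The two "steps" lines in the stdout above can be reproduced with a little arithmetic. The sketch below is an assumption about how the sliding-window start positions are derived: it was written to match the logged values and is not copied from nnU-Net, and `sliding_window_steps` and `tile_count` are hypothetical names. It shows why the second run, at 350 x 640 x 640, needs 441 tiles while the low-resolution run needs only 36.

```python
import math


def sliding_window_steps(image_size, patch_size, step_size=0.5):
    """Start offsets per axis for overlapping patches (illustrative sketch)."""
    steps = []
    for size, patch in zip(image_size, patch_size):
        target = patch * step_size  # desired stride in voxels
        num = max(1, math.ceil((size - patch) / target) + 1)
        max_start = size - patch  # last valid start offset
        # Spread the starts evenly so the last patch ends at the image border.
        actual = max_start / (num - 1) if num > 1 else 0.0
        steps.append([round(actual * i) for i in range(num)])
    return steps


def tile_count(steps):
    """Number of patches is the product of start positions per axis."""
    return math.prod(len(axis) for axis in steps)


patch = (96, 160, 128)
low_res = sliding_window_steps((158, 289, 289), patch)
full_res = sliding_window_steps((350, 640, 640), patch)
print(low_res)   # [[0, 31, 62], [0, 64, 129], [0, 54, 107, 161]]
print(tile_count(low_res), tile_count(full_res))  # 36 441
```

Under these assumptions, the full-resolution stage processes about twelve times as many patches as the low-resolution stage, which matches the longer gap between "computing Gaussian" and "prediction done" in the second run.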

Stderr

2024-08-16T16:10:58.511000+00:00 /home/user/.local/lib/python3.11/site-packages/nnunet/training/model_restore.py:213: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-08-16T16:10:58.511000+00:00   params = torch.load((join(folder, model_file)), map_location=torch.device('cuda', torch.cuda.current_device()))
2024-08-16T16:11:05.515000+00:00 /home/user/.local/lib/python3.11/site-packages/cupy/cuda/compiler.py:233: PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...
2024-08-16T16:11:05.515000+00:00   jitify._init_module()
2024-08-16T16:12:19.533000+00:00 /home/user/.local/lib/python3.11/site-packages/nnunet/training/network_training/network_trainer.py:391: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
2024-08-16T16:12:19.533000+00:00   self.amp_grad_scaler = GradScaler()
2024-08-16T16:12:19.534000+00:00 /home/user/.local/lib/python3.11/site-packages/nnunet/network_architecture/neural_network.py:128: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2024-08-16T16:12:19.534000+00:00   with context():
2024-08-16T16:12:37.539000+00:00 /home/user/.local/lib/python3.11/site-packages/nnunet/training/model_restore.py:213: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-08-16T16:12:37.539000+00:00   params = torch.load((join(folder, model_file)), map_location=torch.device('cuda', torch.cuda.current_device()))
2024-08-16T16:15:17.581000+00:00 Traceback (most recent call last):
2024-08-16T16:15:17.581000+00:00   File "/opt/app/resources/src/fracSegNet/inference/predict_single.py", line 197, in <module>
2024-08-16T16:15:17.581000+00:00     predict_single_case(str(model_folder), file_name, output_file_name_cascade, cascade=True)
2024-08-16T16:15:17.581000+00:00   File "/opt/app/resources/src/fracSegNet/inference/predict_single.py", line 175, in predict_single_case
2024-08-16T16:15:17.581000+00:00     return remove_component_and_save(softmax, output_filename, dct, 1, None, None, None, npz_file, rm_components=clean_up, save_stl=save_stl)
2024-08-16T16:15:17.581000+00:00            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-16T16:15:17.581000+00:00   File "/home/user/.local/lib/python3.11/site-packages/nnunet/inference/segmentation_export.py", line 124, in remove_component_and_save
2024-08-16T16:15:17.581000+00:00     seg_old_spacing = resample_data_or_seg(segmentation_softmax, shape_original_after_cropping, is_seg=False,
2024-08-16T16:15:17.581000+00:00                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-16T16:15:17.581000+00:00   File "/home/user/.local/lib/python3.11/site-packages/nnunet/preprocessing/preprocessing.py", line 219, in resample_data_or_seg
2024-08-16T16:15:17.581000+00:00     reshaped_final_data = np.vstack(reshaped)
2024-08-16T16:15:17.581000+00:00                           ^^^^^^^^^^^^^^^^^^^
2024-08-16T16:15:17.582000+00:00   File "/home/user/.local/lib/python3.11/site-packages/cupy/_manipulation/join.py", line 129, in vstack
2024-08-16T16:15:17.582000+00:00     return concatenate([cupy.atleast_2d(m) for m in tup], 0,
2024-08-16T16:15:17.582000+00:00            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-16T16:15:17.582000+00:00   File "/home/user/.local/lib/python3.11/site-packages/cupy/_manipulation/join.py", line 60, in concatenate
2024-08-16T16:15:17.582000+00:00     return _core.concatenate_method(tup, axis, out, dtype, casting)
2024-08-16T16:15:17.582000+00:00            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-16T16:15:17.582000+00:00   File "cupy/_core/_routines_manipulation.pyx", line 586, in cupy._core._routines_manipulation.concatenate_method
2024-08-16T16:15:17.582000+00:00   File "cupy/_core/_routines_manipulation.pyx", line 658, in cupy._core._routines_manipulation.concatenate_method
2024-08-16T16:15:17.582000+00:00   File "cupy/_core/core.pyx", line 135, in cupy._core.core.ndarray.__new__
2024-08-16T16:15:17.582000+00:00   File "cupy/_core/core.pyx", line 223, in cupy._core.core._ndarray_base._init
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 738, in cupy.cuda.memory.alloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1424, in cupy.cuda.memory.MemoryPool.malloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1445, in cupy.cuda.memory.MemoryPool.malloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1116, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1137, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1382, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
2024-08-16T16:15:17.582000+00:00   File "cupy/cuda/memory.pyx", line 1385, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
2024-08-16T16:15:17.582000+00:00 cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 3,670,016,000 bytes (allocated so far: 10,505,422,848 bytes).
2024-08-16T16:15:19.582000+00:00 Traceback (most recent call last):
2024-08-16T16:15:19.583000+00:00   File "/opt/app/resources/src/move_file.py", line 17, in <module>
2024-08-16T16:15:19.583000+00:00     convert(input_path, output_path, name)
2024-08-16T16:15:19.583000+00:00   File "/opt/app/resources/src/move_file.py", line 6, in convert
2024-08-16T16:15:19.583000+00:00     input_path = list(input_path.glob('*.mha'))[0]
2024-08-16T16:15:19.583000+00:00                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
2024-08-16T16:15:19.583000+00:00 IndexError: list index out of range
2024-08-16T16:15:22.583000+00:00 Traceback (most recent call last):
2024-08-16T16:15:22.583000+00:00   File "/opt/app/resources/src/fracture-surfaces/process.py", line 75, in <module>
2024-08-16T16:15:22.583000+00:00     Pengwin_baseline(args.input, args.output, args.model).process()
2024-08-16T16:15:22.584000+00:00   File "/opt/app/resources/src/fracture-surfaces/process.py", line 62, in process
2024-08-16T16:15:22.584000+00:00     self.predict()
2024-08-16T16:15:22.584000+00:00   File "/opt/app/resources/src/fracture-surfaces/process.py", line 34, in predict
2024-08-16T16:15:22.584000+00:00     ct_mha, seg_mha = subfiles(self.input_path, suffix='.mha')
2024-08-16T16:15:22.584000+00:00     ^^^^^^^^^^^^^^^
2024-08-16T16:15:22.584000+00:00 ValueError: not enough values to unpack (expected 2, got 1)
2024-08-16T16:15:24.584000+00:00 
2024-08-16T16:15:24.584000+00:00 0it [00:00, ?it/s]
2024-08-16T16:15:24.584000+00:00 0it [00:00, ?it/s]
 Last edited by: YSang on Aug. 16, 2024, 4:35 p.m., edited 4 times in total.
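
The second and third tracebacks above (the IndexError in move_file.py and the ValueError in process.py) appear to be follow-on failures: once the first stage ran out of memory it wrote no .mha file, so the later scripts found fewer files than they expected. A minimal sketch of a guard that fails with a clearer message is shown below. `require_mha_files` is a hypothetical helper and is not part of the challenge baseline or of nnU-Net.

```python
import tempfile
from pathlib import Path


def require_mha_files(folder, expected):
    """Return the sorted .mha files in `folder`, or raise if the count is wrong."""
    files = sorted(Path(folder).glob("*.mha"))
    if len(files) != expected:
        raise FileNotFoundError(
            f"expected {expected} .mha file(s) in {folder}, found {len(files)}; "
            "an earlier stage may have failed before writing its output"
        )
    return files


# Small demonstration on a temporary directory (file name is made up).
with tempfile.TemporaryDirectory() as tmp:
    try:
        require_mha_files(tmp, 1)
    except FileNotFoundError as err:
        print("guard triggered:", err)
    (Path(tmp) / "case_0000.mha").touch()
    print([p.name for p in require_mha_files(tmp, 1)])
```

A check like this at the start of each stage would point straight at the stage that produced no output, rather than at the scripts that ran after it.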

Re: Reason of failed submission on final test  

  By: Jarartur on Aug. 16, 2024, 5:18 p.m.

Thank you, that is really helpful. Sure enough, it is a CUDA OOM error, and now I know where to look. Thanks!
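
For reference, the size of the failed allocation in the traceback is consistent with stacking one float64 volume per class at the original cropped shape. The sketch below assumes 5 classes (from the logged data shape (5, 350, 640, 640)), a target shape of 350 x 512 x 512 (the logged shape after cropping), and 8 bytes per value. The dtype is inferred from the byte count; the log does not state it. `array_bytes` is a hypothetical helper.

```python
def array_bytes(shape, bytes_per_value):
    """Memory needed for a dense array of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


# Assumed: 5 class channels resampled back to the 350 x 512 x 512 volume.
softmax_shape = (5, 350, 512, 512)
as_float64 = array_bytes(softmax_shape, 8)
as_float32 = array_bytes(softmax_shape, 4)

print(f"{as_float64:,}")  # 3,670,016,000 (matches the failed allocation)
print(f"{as_float32:,}")  # 1,835,008,000
```

If that reading is right, keeping the resampled softmax in float32 would roughly halve this allocation, and doing the final resampling on the CPU would avoid it on the GPU altogether. Both are suggestions to check against the actual code, not confirmed fixes.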