Hi Sakina,
Let's assume that you're using the baseline U-Net (semi-supervised) GC algorithm template as your base. In that case, when you execute the script 'picai_unet_semi_supervised_gc_algorithm/test.sh' on Linux/macOS, your system expects all the files in 'picai_unet_semi_supervised_gc_algorithm/test/' (i.e. the input images + the expected output predictions for the given algorithm) to exist for testing purposes. The same applies to Windows-based systems when executing 'picai_unet_semi_supervised_gc_algorithm/test.bat'.
Here, the test files cspca_detection_map.mha and cspca-case-level-likelihood.json are the expected output predictions for our provided trained model weights only. That's why these expected outputs don't match those of your independently trained model, and that is essentially what the error message is indicating. If you want to test the container for your own trained model, you need to replace those expected output prediction files with ones produced by your own model for that given input case (imaging + clinical information). In any case, testing before exporting your algorithm container is helpful for debugging, but it isn't a mandatory step and can be skipped.
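For reference, here is a minimal sketch of how you could swap in your own model's predictions as the new expected outputs. The 'test/' location and the two file names follow the template mentioned above; the 'my_model_output/' folder is a hypothetical placeholder for wherever your own inference run writes its results.

```python
import shutil
from pathlib import Path

# Folder with the predictions produced by *your* trained model for the test case
# (hypothetical path -- adjust to wherever your inference run writes its outputs).
my_outputs = Path("my_model_output")

# Expected-output location used by test.sh / test.bat in the GC algorithm template.
test_dir = Path("picai_unet_semi_supervised_gc_algorithm/test")

# Overwrite the expected outputs that shipped with the baseline weights, so the
# container test now validates against your own model's predictions instead.
shutil.copyfile(my_outputs / "cspca_detection_map.mha",
                test_dir / "cspca_detection_map.mha")
shutil.copyfile(my_outputs / "cspca-case-level-likelihood.json",
                test_dir / "cspca-case-level-likelihood.json")

print("Replaced expected outputs in", test_dir)
```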
"Also when using your baseline models only and training, there is still a huge gap between our performances. I would appreciate your help in understanding this too."
Assuming that you're using the same number of cases [1295 cases with human annotations (supervised), or 1295 cases with human annotations + 205 cases with AI annotations (semi-supervised)], preprocessed the same way, and have trained (default command with the same number of epochs, data augmentation, hyperparameters and model selection), 5-fold cross-validated (same splits as provided) and ensembled (using member models from all 5 folds) the baseline AI models the exact same way as indicated in the latest iteration of picai_baseline, your performance on the leaderboard should be similar to ours. Deviations may still exist owing to the stochasticity of optimizing deep learning models at train time: the same AI architecture, trained on the same data for the same number of training steps, can typically exhibit slightly different performance each run (Frankle et al., 2019).
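As a rough illustration of the ensembling step, the sketch below averages per-fold detection maps into one ensemble prediction. It assumes the five fold models have already written their detection maps as .mha files (the 'fold_*/cspca_detection_map.mha' paths are hypothetical), and it uses SimpleITK/numpy directly rather than the exact utilities in picai_baseline; deriving the case-level likelihood as the maximum of the averaged map is likewise a simplification.

```python
import numpy as np
import SimpleITK as sitk

# Hypothetical per-fold outputs for one case; adjust to your own directory layout.
fold_paths = [f"fold_{i}/cspca_detection_map.mha" for i in range(5)]

# Read each fold's detection map and stack them into one array.
images = [sitk.ReadImage(p) for p in fold_paths]
arrays = np.stack([sitk.GetArrayFromImage(img) for img in images])

# Simple member averaging across the 5 folds.
ensemble = arrays.mean(axis=0)

# Case-level likelihood, here taken as the maximum of the averaged map
# (a simplification; your pipeline may derive it differently).
case_level_likelihood = float(ensemble.max())

# Write the ensembled detection map with the original geometry preserved.
out = sitk.GetImageFromArray(ensemble)
out.CopyInformation(images[0])
sitk.WriteImage(out, "cspca_detection_map.mha")

print("Case-level likelihood:", case_level_likelihood)
```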
You should also observe a substantial difference between your 5-fold cross-validation metrics on the training dataset of 1500 cases and your/our performance on the leaderboard using the hidden validation cohort of 100 cases. This is to be expected, due to the factors discussed here.
Hope this helps.