Submission Failure: Error Analysis and Possibility of Modifying requirements.txt


  By: chpark on April 14, 2025, 4:24 a.m.

I have built an initial version of the training pipeline and successfully registered an algorithm for the submission sanity check. However, the submission failed with the message: "The algorithm failed on one or more cases." (Before submitting, I verified that the algorithm runs without any issues using do_test_run.sh in the provided environment.)

I suspect the most likely cause is that, when building the Docker image, I modified the provided requirements.txt file to match the library versions used in my local training environment, as shown below:

--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1
torchvision==0.19.1
numpy==2.2.4
scipy==1.15.2
pandas==2.2.3
SimpleITK==2.4.1
tqdm

In this regard, I would like to kindly ask the following:

1) Is it possible to obtain a more detailed error log or analysis to identify the exact cause of the failure?

2) Am I allowed to modify the fixed versions in the provided requirements.txt file, or is there any flexibility to adjust specific library versions if necessary?

I would appreciate your guidance on this matter. Please let me know if you need any additional information from my side.

Thank you.

 Last edited by: chpark on April 14, 2025, 4:50 a.m., edited 6 times in total.

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: drepeeters on April 14, 2025, 6:41 a.m.

Hi Chpark,

You can get a more detailed error log on the results page of your algorithm. You can find the results of your latest submission here.

The latest submission fails with the following error: RuntimeError: Attempting to deserialize object on CUDA device 4 but torch.cuda.device_count() is 1. This happens when a model checkpoint was saved on a machine with multiple GPUs (here with its tensors on cuda:4), while in our submission environment you only have access to 1 GPU (cuda:0). In this scenario, PyTorch cannot automatically map the checkpoint's tensors to an available device.

You could modify the line where the checkpoint is loaded to include map_location and use something like:
ckpt = torch.load(os.path.join(self.model_root, self.model_name, "model.pth"), map_location="cuda:0")
Please check the PyTorch documentation for specific information on the use of map_location here.
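A slightly fuller sketch of the same idea, assuming the checkpoint is a plain dict of tensors (the helper names load_checkpoint and roundtrip_demo, and the temporary model.pth file, are hypothetical). It passes a torch.device to map_location so that every CUDA tensor, whatever GPU index it was saved from, lands on the single visible GPU, and it falls back to the CPU when no GPU is present, so the same code also runs in a local test without CUDA:

```python
import os
import tempfile

import torch


def load_checkpoint(path: str) -> dict:
    """Load a checkpoint regardless of which GPU index it was saved from.

    map_location remaps every saved CUDA tensor (e.g. one on cuda:4) to
    cuda:0 when a GPU is visible, or to the CPU otherwise.
    """
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    return torch.load(path, map_location=device)


def roundtrip_demo() -> dict:
    """Save a tiny hypothetical checkpoint to a temp file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pth")
        torch.save({"weight": torch.ones(2, 2)}, path)
        # The loaded tensors live in memory, so they stay valid after tmp is removed.
        return load_checkpoint(path)


if __name__ == "__main__":
    ckpt = roundtrip_demo()
    print(ckpt["weight"].device)
```

Passing the string "cpu" and moving the model to the GPU afterwards is an equally valid choice; which one fits better depends on how the rest of the inference code handles devices.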

Regarding the changes to requirements.txt, I do not see an issue with adjusting it to specific library versions. The current requirements.txt only ensures compatibility with our baseline algorithms.
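If you do pin your own versions, it can help to confirm inside the built image that the installed packages actually match the pins before submitting. The sketch below is a hypothetical helper (parse_pins and check_pins are not part of the challenge tooling); it reads only name==version lines, ignores pip options such as --extra-index-url and unpinned names such as tqdm, and uses importlib.metadata from the Python standard library. Local version suffixes such as +cu118, which wheels from the PyTorch index carry, are ignored in the comparison:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text: str) -> dict[str, str]:
    """Collect `name==version` pins, skipping comments, pip options and unpinned names."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers such as "; python_version>'3.8'".
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(text: str) -> list[str]:
    """Return one message per pin that the current environment does not satisfy."""
    problems = []
    for name, pinned in parse_pins(text).items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {pinned})")
            continue
        # Compare without local suffixes, e.g. "2.4.1+cu118" vs "2.4.1".
        if installed.split("+", 1)[0] != pinned.split("+", 1)[0]:
            problems.append(f"{name}: installed {installed}, expected {pinned}")
    return problems


if __name__ == "__main__":
    with open("requirements.txt") as fh:
        for problem in check_pins(fh.read()):
            print(problem)
```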

Hopefully this answers your questions. Feel free to reach out again if necessary.

Kind regards,
Dre Peeters

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: chpark on April 14, 2025, 7:22 a.m.

Thank you very much for your kind and detailed response.

I would like to let you know that I am unable to access the first link provided as "here" in your reply (the link for checking the detailed error log on the results page). It seems I do not have the necessary permission or access rights to view that page.

In order to handle similar issues more efficiently in future submissions, I would like to ask if there is any way for me to directly access or download the detailed error logs for my submission.

Additionally, I appreciate your guidance regarding the use of map_location in the checkpoint loading process. I will proceed accordingly based on your suggestions.

Thank you for your support.

Best regards, Changhyun

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: chpark on April 14, 2025, 9:01 a.m.

I just learned how to use the Algorithms and Results pages, and I was able to check the details regarding the issue I asked about. Thank you for your support.

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: sriramgs on May 5, 2025, 10:50 a.m.

Dear Organising Team,

I uploaded the Docker container during the Sanity Check phase and it ran without errors, so I submitted the same image to the Open Development Phase. However, that submission failed with the error "The algorithm failed on one or more cases." I cannot see the error logs in the Results section. Could you please advise?

Thanks, Sriram.

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: drepeeters on May 7, 2025, 6:57 a.m.

Dear Sriram,

Sorry for the late reply. Here is the error message:

2025-05-05T08:56:29.371000+00:00 Traceback (most recent call last):  
2025-05-05T08:56:29.371000+00:00   File "/opt/app/inference.py", line 214, in <module>  
2025-05-05T08:56:29.371000+00:00     raise SystemExit(run())  
2025-05-05T08:56:29.371000+00:00   File "/opt/app/inference.py", line 156, in run  
2025-05-05T08:56:29.371000+00:00     malignancy_risks = processor.process()  
2025-05-05T08:56:29.371000+00:00   File "/opt/app/inference.py", line 110, in process  
2025-05-05T08:56:29.372000+00:00     output = self.predict(image, coords)  
2025-05-05T08:56:29.372000+00:00   File "/opt/app/inference.py", line 75, in predict  
2025-05-05T08:56:29.372000+00:00     file_path = [os.path.join(os.path.join('/tmp', 'our_tmp'), os.listdir(os.path.join('/tmp', 'our_tmp'))[0])]
2025-05-05T08:56:29.372000+00:00 IndexError: list index out of range
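The IndexError is raised because os.listdir(os.path.join('/tmp', 'our_tmp')) returned an empty list, so [0] has nothing to index; the log alone does not show why that directory was empty in this run. A minimal defensive sketch, assuming the goal is simply to pick one file from that directory (first_file_in is a hypothetical helper name), that fails with a message naming the directory instead of a bare IndexError:

```python
import os
import tempfile


def first_file_in(directory: str) -> str:
    """Return the path of the alphabetically first regular file in `directory`.

    Indexing os.listdir(...)[0] directly raises a bare IndexError when the
    directory is empty; this raises FileNotFoundError naming the directory.
    Sorting also makes the choice deterministic, since os.listdir order is arbitrary.
    """
    files = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))  # skip subdirectories
    )
    if not files:
        raise FileNotFoundError(
            f"no files in {directory!r}; check the step that should write them"
        )
    return os.path.join(directory, files[0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        try:
            first_file_in(scratch)
        except FileNotFoundError as err:
            print("empty directory:", err)
        open(os.path.join(scratch, "case.mha"), "w").close()  # hypothetical input file
        print(first_file_in(scratch))
```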

You can also view these logs yourself in the Results section of your algorithm page.
If you need further help, please let me know.

Kind regards,
Dre

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: sriramgs on May 8, 2025, 5:43 a.m.

Dear Dre,

Thanks for providing the logs. I did check the Results section of my algorithm, but the latest entry there is the 'Sanity Check' submission. The submission I made afterwards to the 'Open Development Phase' is not displayed. Could you please tell me where to find its logs in the portal? I have attached a picture for your reference.

Thanks,

Sriram.

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: drepeeters on May 8, 2025, 6:49 a.m.

Dear Sriram,

Could you try refreshing your Results page? I am seeing many more items in the table. See image:

If I click on the Result button of the last item in the above screenshot (the one that failed), I am forwarded to the Algorithm Result page, where I can open the Logs to view the error message:

Please let me know if this does not work for you, and I will look into it further.

Kind regards,
Dre

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: sriramgs on May 9, 2025, 1:14 p.m.

Dear Dre,

I checked again but could not see the entry on the results page. Also, I manually entered the job number in the URL and tried to load the results, but I am getting a forbidden error.

Thanks,

Sriram.

Re: Submission Failure: Error Analysis and Possibility of Modifying requirements.txt  

  By: drepeeters on May 9, 2025, 3:03 p.m.

Dear Sriram,

I've asked for clarification from the Grand-Challenge support team. I will let you know once I hear from them.

Kind regards, Dre