question of "The algorithm failed on one or more cases"error

  By: xiehuhan on March 27, 2024, 10:01 a.m.

Hello, when I submit and evaluate my algorithm, it shows "The algorithm failed on one or more cases." However, I couldn't find out which error caused this. Because the algorithm runs without problems locally using ./test_run.sh, I suspect that the runtime of individual cases may exceed the time limit specified in the rules. Could you please help me understand the reason for the evaluation failure? Or, if other participants have experienced the same problem with their submissions, could you share how you dealt with it?

 Last edited by: xiehuhan on March 27, 2024, 10:01 a.m., edited 1 time in total.

Re: question of "The algorithm failed on one or more cases"error  

  By: gsaxner on March 27, 2024, 10:38 a.m.

Hi!

For the preliminary test phase, participants have access to all error and output messages the algorithm produces. Right now, it looks like there are no outputs (error messages or debugging messages) of your algorithm. Unfortunately, I also do not have any more information than those outputs about what might have caused the error.

To find out more about what causes the problem, could you

  • Let me know the approximate runtime of your algorithm on your system? Currently, we have a 10-minute time limit per case. If your algorithm needs substantially longer than that, it might be that it just times out.

  • Include some debugging outputs (e.g., print statements, try-catch blocks) throughout your algorithm, and submit again? This should help us narrow down where the error occurs.
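The debugging outputs suggested above could look roughly like the following. This is only a minimal sketch: the helper names log and run_with_diagnostics are hypothetical and not part of the challenge's example code. Flushing each message is a precaution so that output is not lost in a buffer if the container is stopped at the time limit.

```python
import sys
import traceback


def log(message):
    # flush=True pushes the line out immediately instead of leaving it in a buffer
    print(message, flush=True)


def run_with_diagnostics(step_name, func, *args, **kwargs):
    """Run one pipeline step and log its start, its end, and any exception."""
    log(f"[start] {step_name}")
    try:
        result = func(*args, **kwargs)
    except Exception:
        # Print the full traceback to stdout so it shows up in the algorithm output
        log(f"[error] {step_name} failed")
        traceback.print_exc(file=sys.stdout)
        sys.stdout.flush()
        raise
    log(f"[done] {step_name}")
    return result
```

Each stage (data loading, preprocessing, inference, writing results) can then be wrapped, for example run_with_diagnostics("load frames", load_frames, input_path), where load_frames and input_path stand in for your own function and argument. The last "[start]" line without a matching "[done]" shows where the run stopped.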

Thank you!

Re: question of "The algorithm failed on one or more cases"error  

  By: xiehuhan on March 28, 2024, 4:22 a.m.

Thank you very much for your reply. Inference on one case (960 images) takes about nine minutes on my machine. Can you see the running time of my submission? If time really is the problem, I will compress my model.

Re: question of "The algorithm failed on one or more cases"error  

  By: gsaxner on March 28, 2024, 9:45 a.m.

I can see that your algorithm times out after the 10-minute time limit. Since there are no error messages, and your model seems to run close to the time limit, it could be that the algorithm simply runs into the time cap.

It could also be that it somehow got caught in an infinite loop somewhere.

If you can speed up your inference and try again that would be great. Also, as per my previous post, maybe you can include some debug/print statements throughout the code to check if it is running as expected. You can, for example, use tqdm to display the progress of your algorithm or simply print the current frame number every 10 frames or so. This would also give us an intuition on whether and how much more runtime is needed for your algorithm.
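As a rough illustration of the "print the current frame number every 10 frames" idea, the sketch below also projects the total runtime from the frames processed so far. The function name report_progress and its parameters are hypothetical; the default of 600 seconds reflects the current 10-minute limit. With 960 frames per case, that limit leaves about 0.625 seconds per frame.

```python
import time


def report_progress(frame_idx, total_frames, start_time,
                    time_limit_s=600.0, every=10, now=None):
    """Print progress every `every` frames and on the last frame.

    Returns the projected total runtime in seconds, or None when nothing is printed.
    `start_time` should come from time.monotonic(); `now` exists only to make
    the function easy to test.
    """
    done = frame_idx + 1  # frame_idx is zero-based
    if done % every != 0 and done != total_frames:
        return None
    now = time.monotonic() if now is None else now
    elapsed = now - start_time
    # Linear extrapolation: assumes every frame takes about the same time
    projected = elapsed / done * total_frames
    print(
        f"frame {done}/{total_frames} elapsed={elapsed:.1f}s "
        f"projected_total={projected:.1f}s limit={time_limit_s:.0f}s",
        flush=True,
    )
    return projected
```

Inside the inference loop this would be called as report_progress(i, len(frames), start), with start = time.monotonic() taken before the loop. If the projected total in the output is already close to the limit after the first few dozen frames, the case is likely to time out.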

Since runtime seems to be a concern for participants, we might consider increasing the 10-minute time limit to e.g. 15 minutes.

Thank you!

Re: question of "The algorithm failed on one or more cases"error  

  By: xiehuhan on March 30, 2024, 1:30 p.m.

Thank you very much for your suggestions. Regarding debug statements, I already use the tqdm library to show the inference progress, and there are two print outputs in the data loading part (from the competition's GitHub code). However, my image does not seem to output anything when it runs, which may indicate that an error occurs even before the preprocessing stage is reached. As for the runtime, to check that the image works, I loaded the packaged image with docker load on another workstation (RTX 3090), and it ran successfully with an inference time of 6 minutes for one case. This is my first time using Docker, and I suspect there may be some environment conflict, which is causing me a lot of confusion.

Re: question of "The algorithm failed on one or more cases"error  

  By: xiehuhan on March 31, 2024, 4:25 p.m.

I seem to have identified an issue. Is your test.sh script hardcoded to run the image with the name DOCKER_TAG="example-algorithm-preliminary-development-phase"? Because I noticed that in my initial submission, I modified the Docker tag to simple-exp1.

I want to test if this is the issue by re-uploading the image after changing the image name. However, the newly submitted image seems to be identified as the same image, as it shows "A submission for this algorithm container image for this phase already exists."

Additionally, I have uploaded a short video of my image running on the workstation: https://drive.google.com/file/d/1YZrVDktg54vo5T6ihDGgLpeTitgOWi9L/view

 Last edited by: xiehuhan on April 1, 2024, 12:21 a.m., edited 3 times in total.

Re: question of "The algorithm failed on one or more cases"error  

  By: gsaxner on April 2, 2024, 11:45 a.m.

Hi!

Yes, this indeed sounds like an issue with the Docker image. Since you changed the tag, did you also change the tag accordingly when selecting the image to save for uploading? I.e., in the docker save ... command? Otherwise, the .tar.gz will contain the old image, using the previous tag.
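To make the point about matching tags concrete, here is a small sketch that derives the export command from a single tag variable, so the build tag and the saved image cannot drift apart. The helper build_save_command and the archive name are hypothetical; simple-exp1 is the tag mentioned above. The command it produces is the usual docker save piped through gzip.

```python
import shlex


def build_save_command(image_tag, archive_path):
    """Return the shell command that exports `image_tag` to a gzipped tarball."""
    # shlex.quote guards against spaces or shell metacharacters in either value
    return (
        f"docker save {shlex.quote(image_tag)} "
        f"| gzip -c > {shlex.quote(archive_path)}"
    )


if __name__ == "__main__":
    tag = "simple-exp1"  # must be the same tag that was used for docker build
    print(build_save_command(tag, f"{tag}.tar.gz"))
```

Running the printed command in a shell creates the archive for upload. If the tag passed here still names the old example image, the archive will contain that old image.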

Best regards,

Christina

Re: question of "The algorithm failed on one or more cases"error  

  By: xiehuhan on April 3, 2024, 4:52 a.m.

Thank you for your patient response. After debugging the code, I suspect that the issue is most likely due to the runtime exceeding the time limit.

Could you please tell me which GPU platform is used for inference? This would give me a reference point for sizing my model.

Re: question of "The algorithm failed on one or more cases"error  

  By: gsaxner on April 3, 2024, 2:36 p.m.

Sure! All algorithms are run on grand-challenge.org's AWS infrastructure. This uses an NVIDIA T4 with 16 GB of GPU memory, and 4 or 8 CPUs with either 16 GB or 32 GB of CPU memory. You can find all the details about the runtime environment here.

Maybe you can also find some hints in the submission tips. Let me know if you require further help. Upon reasonable request, we can increase the runtime limit for algorithms (which is currently 10 minutes), but we need to know approximately by how much.

Re: question of "The algorithm failed on one or more cases"error  

  By: xiehuhan on April 4, 2024, 6:27 a.m.

Thank you for your understanding. Based on my current local runtime, I would like to request an additional five minutes of runtime per case. Of course, a larger margin would give us more room to try out additional models.

Re: question of "The algorithm failed on one or more cases"error  

  By: gsaxner on April 4, 2024, 1:36 p.m.

Thanks for the feedback!

We have now increased the time limit for both phases from 10 to 20 minutes. I hope this helps you in completing your submission to the challenge.

Looking forward to your submission!