Unable to access large gpu even when they are free and people are using them

  By: agaldran on March 6, 2024, 9:15 p.m.

Hello!

So this has been happening to me all day: I launch a jobman process requesting large-gpu, and it stays waiting forever. However, when I check the queue, there are only two or three users of that resource, so I should be able to access it, right? The attached screenshot shows my current situation. The most annoying thing is that all day I have seen that resource being used by a varying number of users, so does that mean the problem is on my end?

I have been emailing the organizers all day, asking what could be happening, but have had no answer. I thought that maybe some charitable soul in the forum has encountered the same situation and knows how to resolve it? It's very frustrating that, so close to the deadline, I am only able to train toy models...

Re: Unable to access large gpu even when they are free and people are using them  

  By: lWM on March 6, 2024, 9:57 p.m.

I think that there are only 2 large-gpu resources available. The real numbers are lower than the ones mentioned on GitHub.

Re: Unable to access large gpu even when they are free and people are using them  

  By: agaldran on March 6, 2024, 11:33 p.m.

Oooohhh that makes sense! It would have been a time saver had I known that. Well, this (together with the ranking changing chaotically with a couple of mistaken predictions) adds to the fun of this competition :) Thanks lWM!

Re: Unable to access large gpu even when they are free and people are using them  

  By: lWM on March 7, 2024, 9:33 a.m.

Indeed - the challenge really is a grand challenge. All these tasks are enormously difficult, and reaching even the minimum score (0.6) requires a lot of work and computational resources. It seems that all solutions, except the ones for lung, are more or less random noise, since the difference between a score of 0.6 and 0.7 can come down to one or two cases.

In terms of the computational resources - I think they were reduced on purpose, since the final number of participants in the championship phase seems to be lower than expected.

 Last edited by: lWM on March 7, 2024, 10:24 a.m., edited 1 time in total.