Useless hack of the leaderboard

  By: simon.j on Jan. 30, 2022, 10:06 p.m.

Given binary labels Y and continuous predictions X in [0, 1], the ROC-AUC can be computed as AUC = p / (N+ * N-), where N+ is the number of positives (Y=1), N- the number of negatives (Y=0), and p the number of well-ranked pairs, i.e. pairs (X1, X2) such that X1 > X2, Y1 = 1 and Y2 = 0 (pairs with tied scores counting for half).
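This pair-counting definition can be checked on a toy example (a minimal sketch; `pairwise_auc` is just an illustrative name, with ties counted for half a pair as usual):

```python
def pairwise_auc(y, x):
    """ROC-AUC as the fraction of well-ranked (positive, negative) pairs.

    A pair is well ranked when the positive's score exceeds the negative's;
    tied scores contribute half a pair.
    """
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    p = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return p / (len(pos) * len(neg))

# 2 positives x 3 negatives = 6 pairs, of which 5 are well ranked -> AUC = 5/6
y = [1, 1, 0, 0, 0]
x = [0.9, 0.4, 0.5, 0.3, 0.3]
print(pairwise_auc(y, x))  # → 0.8333333333333334
```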

By looking at the different results on the leaderboard, we can deduce that among the 200 patients there are 118 patients with COVID and 30 severe cases.
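One purely arithmetic way such counts can leak (a sketch of one possible approach, not necessarily the deduction used above; `candidate_counts` is a hypothetical helper): every exact AUC is a rational number p / (N+ * N-), so if the leaderboard publishes AUCs to, say, four decimals, only some splits of the 200 patients are compatible with the observed values.

```python
def candidate_counts(observed_aucs, n=200, decimals=4):
    """Return the positive-class counts n_pos for which every observed AUC
    could equal p / (n_pos * n_neg) for some integer pair count p."""
    tol = 0.5 * 10 ** -decimals  # rounding distance for the published AUCs
    candidates = []
    for n_pos in range(1, n):
        denom = n_pos * (n - n_pos)
        # A rounded AUC is compatible with this split if some multiple of
        # 1/denom lies within rounding distance of it.
        if all(abs(round(a * denom) / denom - a) <= tol for a in observed_aucs):
            candidates.append(n_pos)
    return candidates
```

In practice many splits remain compatible at first; more observed AUC values (or more published decimals) narrow the candidate list down.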

Re: Useless hack of the leaderboard  

  By: simon.j on Jan. 30, 2022, 10:26 p.m.

I am very surprised that my last submission gets exactly the same AUC on the COVID prediction task as the proposed baseline algorithm. Strictly speaking it only means that both models produce the same number of well-ranked pairs, which suggests they rank the patients in a very similar way. For both tasks, the maximum achievable performance is not known, and is certainly not 100% AUC. Does this coincidence indicate that ~80% AUC may be the maximum achievable on this leaderboard dataset? Or that both models are too similar in their approach to the problem? I would be interested to hear your thoughts :)

Re: Useless hack of the leaderboard  

  By: simon.j on Jan. 31, 2022, 8:21 a.m.

Interestingly, the performance of radiologists at diagnosing COVID from CT scans on the STOIC database is also 80% (see Table 2 in https://pubs.rsna.org/doi/10.1148/radiol.2021210384). This reinforces the hypothesis that it is the maximum achievable on this cohort.

Other interesting facts from the paper for the severity prediction task (not linked to this thread):
- CT acquisitions were performed without contrast material administration, except when pulmonary embolism was suspected as a confounding diagnosis to COVID-19 pneumonia at presentation. Contrast-enhanced CTs are quite easy to detect on images, so this may be an indirect way to add the feature "suspected pulmonary embolism".
- In Table 3 we see a list of variables significantly associated with a severe outcome: age, sex, oxygen supplementation, diabetes, coronary artery disease, hypertension, coronary calcification score, emphysema, and lung disease extent.
- In Figure 5, we see that a logistic regression gets 69% AUC for predicting severity based on some of these variables. I find it surprisingly low compared to the literature and the performances obtained on the leaderboard (~80%).

Re: Useless hack of the leaderboard  

  By: LuukBoulogne on Jan. 31, 2022, 4:29 p.m.

Thank you for thinking along with the organization of the challenge. We agree and think that having the patient label distribution for the leaderboard as public information is not an issue for this challenge, as we are using the AUC metric and algorithms are evaluated on each patient independently. As you have probably also seen, the label distribution could also largely be deduced from the information in the info tab on this web page and the Radiology publication about the STOIC study.

The maximum achievable performance is indeed not known yet. We hope to get a good idea of what is possible with the current state-of-the-art through this challenge, especially for the detection of severe COVID-19. For RT-PCR result prediction, the ~80% AUC is indeed very similar to the AUC that the readers in the STOIC study obtained on the full dataset of 10K cases. For severe COVID-19 prediction, the results in the Radiology publication were computed for the patients that were positive for both RT-PCR and CT. This might be a more difficult subset of the full cohort than the test set for this challenge.

Re: Useless hack of the leaderboard  

  By: simon.j on Jan. 31, 2022, 6:25 p.m.

Thank you for your reply! Using 10-fold cross-validation on the 2000 patients, I get an average AUC of 81% for predicting severity. Restricted to the 1205 patients with COVID, it goes down to 79% AUC. It is still far above the 69% mentioned in the paper 🤔

Re: Useless hack of the leaderboard  

  By: LuukBoulogne on Feb. 1, 2022, 1:10 p.m.

That's indeed interesting. The latter way of computing AUC is how the AUC for severity is computed for this challenge. This has now been clearly added to the challenge overview. Severity classification may or may not be more difficult for the subset of PCR-positive patients that were also judged positive at CT by the readers in the STOIC study than for the random subset of PCR-positive patients in the test set.

Re: Useless hack of the leaderboard  

  By: simon.j on Feb. 1, 2022, 4:26 p.m.

Thank you for the clarification, it makes quite a difference regarding the training set we should consider for each task :)

Re: Useless hack of the leaderboard  

  By: simon.j on April 13, 2022, 4:23 p.m.

Useless update: based on the first results on the Qualification (last submission) Leaderboard, there are 469 COVID patients and 118 severe cases among the n=800 patients.

An upper bound on the variance of the ROC-AUC is AUC * (1 - AUC) / min(n-positives, n-negatives) (source). For a ROC-AUC of 80%, taking the square root of this bound gives σ-max = 3.7% for the severity prediction task (118 severe cases out of 800), to be compared with 7.3% for the n=200 patients of the first leaderboard.
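Plugging the numbers into this bound (a small sketch; `sigma_max` is just an illustrative helper):

```python
from math import sqrt

def sigma_max(auc, n_pos, n_neg):
    """Upper bound on the standard deviation of the ROC-AUC:
    sqrt(AUC * (1 - AUC) / min(n_pos, n_neg))."""
    return sqrt(auc * (1 - auc) / min(n_pos, n_neg))

# Severity, Qualification Leaderboard: 118 severe among 800 patients.
print(round(sigma_max(0.80, 118, 800 - 118), 3))  # → 0.037
# Severity, first leaderboard: 30 severe among 200 patients.
print(round(sigma_max(0.80, 30, 200 - 30), 3))    # → 0.073
```

The bound shrinks with the square root of the minority-class count, which is why the larger qualification set roughly halves the uncertainty.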