Can you predict who will develop severe COVID-19 from a chest CT scan?

Published 16 Dec. 2021

Last week, we opened STOIC2021: a COVID-19 AI challenge with 10,000 CT scans. In total, $20,000 in AWS credits will be awarded to the winning teams. Together with the participants, we aim to find the best solution for predicting who will develop severe COVID-19 from a CT scan, and to make these solutions easily accessible to everyone.

The STOIC project

Many COVID-19 datasets have been collected, and many AI algorithms have been trained to detect COVID-19 infection from CT. The STOIC project, led by Assistance Publique - Hôpitaux de Paris, sets itself apart by collecting one of the largest COVID-19 datasets to date, comprising CT scans of more than 10,000 individuals. What sets this study apart even further is that it also recorded which patients had been intubated or had died at a one-month follow-up. A reliable indication of which patients are at high risk of severe COVID-19 could help healthcare professionals make treatment decisions. Predicting who will develop severe COVID-19 is not easy for human experts, so AI may play an assisting role. This is why we are organizing STOIC2021.


In this challenge, we are trying a new approach. Traditionally, challenge participants download the test set (without labels), run their algorithm on the test data, and submit their answers. Modern challenges instead ask participants to submit functional algorithms. This makes challenges fairer and less prone to overfitting on the test set, and it also ensures that the algorithm results are reproducible. Most importantly, publishing algorithms in a ready-to-use form makes them available for further research.

For STOIC2021, we want to take reusability a step further and provide the winning solutions not only in a ready-to-evaluate form, but in a ready-to-train form as well. Challenge participants selected for the final phase (details below) will be asked to submit training code instead of an already functional algorithm. This code will be used to generate an algorithm securely on the platform. The code of the winning solutions will also be made publicly available after the challenge. As many AI researchers can attest, having code published on GitHub doesn't guarantee that it's easy to get running. Training the submitted solutions on the platform guarantees that the training code of the winning solutions actually runs and can be deployed. It also ensures that the training code can be reused to solve other research problems.

Training on private data

Medical images are sensitive data, and it is usually not easy to make large sets of them publicly available. STOIC2021 would not have been possible at its full scale had we not kept a large portion of the data private. Generating the algorithms from submitted training code is what makes training on this private data possible.

No development in the dark

Of course, not having access to any training data would severely impede development. A good understanding of the data is key to coming up with good AI solutions. For this reason, a substantial part of the data, 2,000 CT scans, can be downloaded from the Registry of Open Data on AWS. These data serve as the development set for STOIC2021.

The structure of STOIC2021

The challenge consists of two phases. The first phase is a qualification round, in which you use the 2,000 public CT scans to develop your algorithm. During this phase, you can submit versions of your trained algorithm and receive feedback through a public leaderboard, which shows performance for classifying COVID-19 presence and, more importantly, severity. To avoid overfitting on the test data, this feedback is based on only part of the test set. After development, you can submit one final version of your algorithm, which will be evaluated on the complete test set of 1,000 CT scans. We will invite the creators of the best-performing algorithms to the final round.

In the second phase, the final round, you will submit the code for training your algorithm. The challenge organizers will run this code in a secure environment on the full training set of 9,000+ CT scans. If your training code produces one of the best-performing algorithms on the test set, you are one of the winners of STOIC2021.


The winning algorithms will live on after the challenge, publicly available for use around the world. The code for training and inference of the winning algorithms will also be made publicly available. Furthermore, the qualification round will be reopened after the challenge for future research. We plan to publish the findings of STOIC2021 in a peer-reviewed article, written in collaboration with the best-performing teams of the challenge.

How to join

Interested in participating in STOIC2021? Or eager to learn more?