Frequently Asked Questions¶
Setting up an algorithm submission challenge¶
The following file formats are supported for algorithm inputs and outputs:
- .mha and .tiff for images
- .json for annotations and metadata, as well as for number, string, and boolean data types
- .obj
- .mp4
⚠️ Please note the restriction on supported image formats. Other image formats, such as nifti files, are fine for uploading data to Grand Challenge, but they will be converted to .mha format (or .tiff) by our backend. For the algorithms submitted to a challenge this means that they will need to read image inputs in either .mha or .tiff format and will need to output images in these file types as well (if the task requires image outputs). Image outputs in any other format will be rejected and the corresponding algorithm jobs will be marked as failed.
Although we support the upload of other formats, we generally recommend converting the files yourself to .mha (or .tiff respectively) to ensure correct conversion.
Please note that .mha and nifti are equivalent formats and can both be read using SimpleITK.ReadImage(). Check out the SimpleITK library for more information.
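For instance, a minimal conversion from nifti to .mha with SimpleITK could look like this (the file names are placeholders):

```python
import SimpleITK as sitk

# ReadImage handles nifti and MetaImage (.mha) transparently
image = sitk.ReadImage("scan.nii.gz")  # placeholder input file

# Write the image back out as .mha, the format Grand Challenge expects
sitk.WriteImage(image, "scan.mha", useCompression=True)
```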
⚠️ Exception: For the evaluation container, the ground truth can be in any format you like, since that data is internal to your container and is not stored or processed in any way by Grand Challenge. The algorithm outputs that your evaluation container reads will, however, again be in .mha format, since those do get validated and stored by our platform upon creation.
We mount a file system with the secret test data for your container at /input/. How the test data is structured is defined by the algorithm interface(s) configured for your phase.
Consider an algorithm that requires a ct-image as input and outputs a covid-19-lesion-segmentation as well as a covid-19-probability-score. The mounted file system would look as follows (the output directory will initially be empty, of course, but the algorithm will need to follow this structure when saving its outputs):
```
|── input
|   |── images
|   |   |── ct
|   |   |   |── randomly-generated-uuid.mha
|── output
|   |── images
|   |   |── covid-19-lesion-segmentation
|   |   |   |── randomly-generated-uuid.mha
|   |── probability-covid-19.json
```
Each of the subfolders in the input directory contains exactly 1 file, since we start one algorithm job per archive item (== case / patient). The algorithm in this case thus reads a single image from /input/images/ct/ and writes its outputs to /output/images/covid-19-lesion-segmentation/ and /output/probability-covid-19.json.
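To make this concrete, here is a minimal sketch of what the container's inference script could look like for this example; the prediction step is a hypothetical placeholder for your own model code:

```python
import json
from pathlib import Path

import SimpleITK as sitk

INPUT_DIR = Path("/input")
OUTPUT_DIR = Path("/output")


def run():
    # Exactly one file is present in the input socket's folder;
    # its name is a randomly generated UUID, so glob for it
    ct_path = next((INPUT_DIR / "images" / "ct").glob("*.mha"))
    ct_image = sitk.ReadImage(str(ct_path))

    # segmentation, probability = predict(ct_image)  # your model goes here
    segmentation = ct_image  # placeholder output
    probability = 0.5        # placeholder output

    # Write outputs to the paths defined by the output sockets;
    # the output file name itself can be anything
    seg_dir = OUTPUT_DIR / "images" / "covid-19-lesion-segmentation"
    seg_dir.mkdir(parents=True, exist_ok=True)
    sitk.WriteImage(segmentation, str(seg_dir / "output.mha"), useCompression=True)

    with open(OUTPUT_DIR / "probability-covid-19.json", "w") as f:
        json.dump(probability, f)


if __name__ == "__main__":
    run()
```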
Behind the scenes, the structure of these directories is determined by the sockets defined for each of the configured algorithm interface(s) for your phase. You can read more about algorithm interfaces here. The algorithm described above has 1 interface with the specified input and output sockets. Each socket defines the relative path to its value (i.e., the image or file), and that determines the directory structure shown above. You can take a look at the list of available sockets and their relative paths here. The support team will help you choose the right sockets for your use case and configure the algorithm interfaces for your challenge phase accordingly. Algorithms created for your challenge phase inherit the interfaces and thus their sockets.
A final note on UUIDs: the file names of the input and output files are random, but that is irrelevant, because each input and output is uniquely identified by its socket name and relative path.
An algorithm gets one (set of) input(s) (i.e., one archive item) at a time. This could be just one image, or a combination of an image with a segmentation mask, with some metadata, or with a secondary image. Consider a challenge set-up where the algorithms need access to a patient's ct-image as well as a lung-volume measure. Each archive item in your archive would then consist of two files: one for the ct scan and one containing the lung volume measure.
The algorithms that run on the archive data will need to implement 1 interface with input and output sockets for those two types of data. Check here for a list of available sockets and contact support if you need a new socket. Each file in an archive item needs to be linked to one specific socket. You cannot have the same socket twice in one archive item; they need to be unique on the archive-item level.
In the above case, we would make use of the existing sockets 'ct-image' and 'lung-volume' and upload the cases through the API.
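As an illustration, uploading such a case with the gcapi Python client could look roughly like the sketch below. The token, archive slug, and file name are placeholders, and the exact signature of upload_cases may differ between gcapi versions, so treat this as a starting point rather than a definitive recipe:

```python
import gcapi

# Authenticate with a personal API token from your Grand Challenge profile
client = gcapi.Client(token="YOUR-API-TOKEN")  # placeholder token

# Upload the CT image for one case; "ct-image" is the socket slug and
# "my-challenge-archive" is a placeholder archive slug
client.upload_cases(
    archive="my-challenge-archive",
    interface="ct-image",
    files=["patient-a-ct.mha"],  # placeholder file name
)

# Non-image values such as the lung-volume measure are attached to the
# archive item separately; see the gcapi documentation for the details
```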
If you have optional inputs, for example if you don't have a lung volume for each CT image, it is possible to have archive items with and without the lung volume measure. Algorithms that run on the data in the archive will then need to have two algorithm interfaces configured: one with just the CT image as input socket, and one with both the CT image and the lung-volume sockets as inputs.
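Inside the algorithm container, handling such an optional input can then be as simple as checking whether the file exists. A minimal sketch, assuming a hypothetical lung-volume.json relative path (check the socket list for the real one):

```python
import json
from pathlib import Path

INPUT_DIR = Path("/input")

# Assumed relative path for the lung-volume socket; verify it against
# the socket list before relying on it
lung_volume_path = INPUT_DIR / "lung-volume.json"

if lung_volume_path.exists():
    # Interface with both the ct-image and lung-volume inputs
    lung_volume = json.loads(lung_volume_path.read_text())
else:
    # Interface with only the ct-image input
    lung_volume = None
```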
Rest assured that the support team will help you choose the appropriate sockets and will configure the necessary interfaces for you. Once the interfaces are configured for your challenge phase, algorithms created through your challenge page will automatically inherit the correct and necessary algorithm interfaces.
Algorithm inputs are mounted at /input/socket-slug-1/, /input/socket-slug-2/, etc., where each folder contains exactly 1 file (see the earlier question). The exact paths depend on the interfaces (and particularly their sockets) configured for your challenge phase. In a typical challenge this means that an algorithm only ever gets one patient's data at a time (e.g., patient A's CT image and the corresponding mask in the first run, patient B's CT image and mask in the second run, etc.).
No. GC generates random UUIDs for each image you upload to our website and uses those as the file names. Your algorithm container should hence not expect to be reading your-original-file-name.mha. Likewise, the outputs your algorithm writes will be checked, validated, and subsequently stored with their own unique UUIDs (again different from the file names you give the outputs when saving them).
Since an algorithm only ever gets one image set (i.e., one patient's data) at a time, the algorithm doesn't need to worry about the original file names, or about matching, for example, an input ct image with its corresponding metadata or segmentation file – there will only be one ct file and one metadata or segmentation file available to the algorithm at any given time. The algorithm simply reads each input from its specified path (a folder containing exactly 1 file; the exact path is defined by the socket, see the earlier question) and writes its outputs to the designated output folders (again with a different path for each output, defined by the output sockets of the algorithm interface(s) for your phase).
For the evaluation container, you will get all outputs in one go, so there you do need to match the inputs to their respective outputs. To do that, your evaluation container also gets a predictions.json file that contains the information needed for the matching. For each algorithm job that was part of the submission, the predictions.json file lists the inputs and outputs with their specific file names. This is explained in more detail here.
On the leaderboard page of your challenge you will find a download button. Clicking it downloads the evaluations, or just the metrics, as a CSV file. In the output column, you will find the content of the metrics.json file.
An evaluation is run once for a given submission, after all the algorithm jobs for that submission have finished successfully. The outputs produced by the algorithm jobs are then provided to the evaluation container at the following path: /input/<job_pk>/output/<socket_relative_path>. To match the algorithm output filenames with the original algorithm input filenames, we provide a json file at /input/predictions.json. This json file lists the inputs and outputs for each algorithm job along with that job's primary key. The above path can then be constructed by replacing:
- <job_pk> with the primary key (pk) of each algorithm job, i.e., the top-level "pk" entry for each JSON object in the predictions.json file
- <socket_relative_path> with the relative_path of each of the output sockets of a job. The relative paths for each socket can be found here.

If the algorithms output a ct-image and a nodule-locations json file, you would read the corresponding files for the first algorithm job from /input/<job_pk>/output/images/ct/ and /input/<job_pk>/output/nodule-locations.json.
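Putting this together, an evaluation script could walk predictions.json roughly as follows. The sketch assumes the example sockets above and simplifies everything beyond the documented top-level "pk" field, so verify the structure against a real predictions.json:

```python
import json
from pathlib import Path

import SimpleITK as sitk

INPUT_DIR = Path("/input")


def load_outputs():
    with open(INPUT_DIR / "predictions.json") as f:
        predictions = json.load(f)

    for job in predictions:
        job_pk = job["pk"]  # top-level primary key of the algorithm job

        # Construct the output paths from the job pk and the sockets'
        # relative paths (using the example sockets from above)
        ct_dir = INPUT_DIR / job_pk / "output" / "images" / "ct"
        ct_path = next(ct_dir.glob("*.mha"))  # exactly one file per socket folder
        ct_image = sitk.ReadImage(str(ct_path))

        locations_path = INPUT_DIR / job_pk / "output" / "nodule-locations.json"
        nodule_locations = json.loads(locations_path.read_text())

        # ... match these outputs against the ground truth bundled in
        # your container and compute your metrics here
```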
Challenge costs¶
There are numerous ways for you to control your compute costs. Measures you can take include:
- Limiting the number of participants. If you enable "manual participant review", you decide who gets to submit solutions to your challenge, and if you have reached a certain number, you can stop accepting people to your challenge.
- Limiting the number of submissions that participants can make during a specified submission window.
- Putting a reasonable upper limit on algorithm job run times. We enforce this limit on the single job level, i.e., for the processing of a single set of inputs. Regardless of the costs, limiting algorithm run times is desirable since truly clinically useful algorithms will benefit from being fast, so forcing your participants to develop efficient solutions is a good thing to do.
- If your test data set consists of a large number of very small images, you might be better off batching your inputs. The reason for this is that GC starts one algorithm job per input image (i.e., archive item), so the more images you have, the more jobs need to be started which increases costs. The downside to this approach is that the resulting algorithms will not be directly useful for clinicians, who will usually want to process a single (unbatched) image input. The integrated web viewer on Grand Challenge is also not equipped to read and display batched images, and hence algorithm result viewing will not be possible with such a design.