Challenge setup

In a modern challenge, both the test data and the test labels are hidden. Participants submit an algorithm as a solution to the challenge. This algorithm is then run on the hidden test set (which the challenge admins must upload as an archive) on the Grand Challenge platform. The results that the algorithm produces are then evaluated using a custom evaluation method provided by the challenge admins. The evaluation produces a set of metrics, which are displayed on the leaderboard and used to rank submissions on specific criteria. See below for details on the underlying compute infrastructure.

In the simplest, standard case, a challenge has one task and is carried out in two phases. The first phase is usually a preliminary phase where participants familiarize themselves with the algorithm submission system and test their algorithms on a small subset of images. From experience, we know that it takes participants a few attempts to get their algorithm containers right, so we strongly recommend having such a preliminary sanity-check phase. The second phase is the final test phase, often with a single-submission policy, which evaluates the submitted algorithms on a larger test set. You could also think of the two phases as a qualification and a final phase, where you use the qualification phase to select participants for the second, final test phase, as was done by STOIC.

To set up a challenge after your request has been accepted, you as a challenge organizer need to take the steps summarized in the flowchart below. For more information on each of these steps, check out our documentation for finding a training data hosting platform and handling your secret test data, as well as for building an evaluation method. For more general help on organizing a challenge, take a look at our tips for organizing a challenge page.

Organizing a challenge

💡 Enabling algorithm submission to a phase currently needs to be requested from

Please provide the following information in your email:

  • Link to the phase for which you would like to enable algorithm submissions.
  • Link to the archive you would like to link to that phase so that it is used as the test set.
  • What inputs the algorithms submitted to the phase should take.
  • What outputs the algorithms submitted to the phase must produce.

Algorithm specifications

All algorithms that your participants submit need to read the same inputs and produce the same outputs for the evaluation to work. Grand Challenge uses so-called input and output interfaces to formally define what an algorithm's inputs and outputs should be. Let us know which inputs and outputs you require. If they are not yet listed here, we will add them for you. Once this is set up, make sure to communicate the required inputs and outputs clearly to your participants. In our experience, providing an example algorithm (and Docker container) works best.

Some good examples of these are MIDOG, Airogs, CONIC, Tiger and Node21. MIDOG and Tiger have good example videos that show how this works from a participant's perspective.
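As a rough illustration of the fixed input/output contract described above, the sketch below shows the shape an example algorithm could take: it reads every declared input from an input directory, runs a prediction function, and writes every declared output to an output directory. The interface names (`patient-age`, `risk-score`), the file names, and the `run_algorithm` helper are all hypothetical and chosen only to keep the example small; real interfaces are defined together with the Grand Challenge team and usually involve images rather than plain JSON values.

```python
import json
from pathlib import Path

# Hypothetical interface definitions: interface name -> relative file path.
# The real names and paths come from the interfaces agreed for your phase.
INPUTS = {"patient-age": "patient-age.json"}
OUTPUTS = {"risk-score": "risk-score.json"}


def run_algorithm(input_dir, output_dir, predict):
    """Read all declared inputs, call predict, and write all declared outputs."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)

    # Load every declared input so that all submissions see the same data.
    inputs = {
        name: json.loads((input_dir / rel_path).read_text())
        for name, rel_path in INPUTS.items()
    }

    results = predict(inputs)

    # Fail early if the algorithm did not produce every required output,
    # because the evaluation relies on all of them being present.
    missing = set(OUTPUTS) - set(results)
    if missing:
        raise ValueError(f"Missing outputs: {sorted(missing)}")

    output_dir.mkdir(parents=True, exist_ok=True)
    for name, rel_path in OUTPUTS.items():
        (output_dir / rel_path).write_text(json.dumps(results[name]))
```

A participant would only replace the `predict` function; the reading and writing code stays identical across submissions, which is what makes a shared example algorithm useful.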

Evaluation method

You will need to upload your own evaluation container that reads a predictions.json file and the outputs from the algorithm, evaluates them appropriately and writes a metrics.json file.
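The core of such an evaluation container could look roughly like the sketch below, which reads predictions.json, scores each case against the ground truth, and writes metrics.json. This is a minimal sketch under stated assumptions, not the platform's actual schema: the `case_id` and `output_file` keys, the layout of metrics.json, and the `evaluate` and `score_case` names are hypothetical and used only for illustration. Check the evaluation method documentation for the real structure of predictions.json.

```python
import json
from pathlib import Path
from statistics import mean


def evaluate(input_dir, output_dir, ground_truth, score_case):
    """Score every case listed in predictions.json and write metrics.json."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)

    # Assumption: predictions.json is a JSON list with one entry per case.
    jobs = json.loads((input_dir / "predictions.json").read_text())

    per_case = {}
    for job in jobs:
        case_id = job["case_id"]  # hypothetical key
        output_file = input_dir / job["output_file"]  # hypothetical key
        prediction = json.loads(output_file.read_text())
        per_case[case_id] = score_case(prediction, ground_truth[case_id])

    # Assumed layout: per-case scores plus one aggregate for the leaderboard.
    metrics = {
        "case": per_case,
        "aggregates": {"mean_score": mean(list(per_case.values()))},
    }

    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

Passing `score_case` as a function keeps the file handling separate from the metric itself, so the same skeleton can serve a classification accuracy, a Dice score, or any other per-case measure.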


If you host a challenge on our platform, all algorithm and evaluation containers will be run on our AWS infrastructure, where storage and compute scale elastically on demand. The algorithms that participants submit to your challenge are then run on each image in the archive that you linked to the respective phase. We use a g4dn.2xlarge instance (Nvidia T4, 16GB GPU memory, 8 CPUs, 30GB CPU memory) for running the container images. To prevent exfiltration of the test set, participants do not get access to the internet or to the logs. You as a challenge admin get access to the results and logs of each algorithm so you can help your participants if their submissions fail.

💡 Please read the tips for running a challenge.