Create Evaluation Method¶
Every challenge requires a unique and fair way to evaluate incoming submissions. In many cases, these evaluation scripts depend on specific libraries or computational environments that are difficult to replicate directly on our host servers.
To ensure consistency, reproducibility, and flexibility, we require all challenge organizers to provide a Docker container image that packages their evaluation code and its dependencies.
This container will be executed on our infrastructure to automatically evaluate participant submissions for each phase of your challenge.
📦 By containerizing your evaluation method, you gain full control over the runtime environment, including software versions, libraries, and custom tools — ensuring that your metrics are computed exactly as intended.
Deadline: 6 weeks post-challenge acceptance
Build your evaluation container¶
To make the process easier, you can download the Challenge phase starter kit, which contains an example evaluation customized for each of your challenge phases. It becomes available for download once your challenge has been accepted and the support team has set up your challenge phases. Use the starter-kit example as a starting point for creating your custom evaluation container.
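For orientation, the sketch below shows the general shape of such an evaluation script: it loads the participant's predictions and the ground truth, computes a metric, and writes the results to a JSON file. The file names, paths, and metric here are illustrative assumptions only; follow the starter-kit example for your phase for the exact input and output layout used on Grand Challenge.

```python
# Minimal evaluation sketch (illustrative only). Adapt the paths, file
# names, and metric to your own challenge; the starter kit shows the
# exact input/output layout used for your phases.
import json
from pathlib import Path

PREDICTIONS_FILE = Path("/input/predictions.json")      # assumption: submitted predictions
GROUND_TRUTH_FILE = Path("/opt/app/ground_truth.json")  # assumption: ground truth bundled in the image
OUTPUT_FILE = Path("/output/metrics.json")              # assumption: where metrics are written


def main() -> None:
    predictions = json.loads(PREDICTIONS_FILE.read_text())
    ground_truth = json.loads(GROUND_TRUTH_FILE.read_text())

    # Example metric: fraction of cases whose predicted label matches the
    # reference label. Replace with your own evaluation logic.
    correct = sum(
        1 for case_id, label in ground_truth.items()
        if predictions.get(case_id) == label
    )
    metrics = {"accuracy": correct / len(ground_truth)}

    OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_FILE.write_text(json.dumps(metrics, indent=2))


if __name__ == "__main__":
    main()
```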
Upload your evaluation container¶
After building, testing, and exporting the Docker container using the scripts provided in the Challenge starter kit (see above), you should have a .tar.gz file containing the evaluation Docker image. Upload it to your challenge by navigating to Admin ⟶ [Phase name] ⟶ Evaluation Methods and clicking Add a new method:
Grand Challenge will then verify your new evaluation Docker image. Once verification has succeeded, the image is marked as Active Method for this Phase, indicating that it is now the active evaluation method for this phase.
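The starter kit's export script already produces this archive for you. Purely as an illustration of what that step does, the sketch below saves a local image to a gzipped tarball using the Docker SDK for Python; the image tag and archive name are placeholders.

```python
# Sketch: export a local Docker image to a .tar.gz archive using the
# Docker SDK for Python (pip install docker). The tag and file name are
# placeholders; use the tag you built your evaluation image with.
import gzip

import docker

IMAGE_TAG = "example-evaluation:latest"  # placeholder image tag
ARCHIVE = "example-evaluation.tar.gz"    # archive to upload to Grand Challenge

client = docker.from_env()
image = client.images.get(IMAGE_TAG)

with gzip.open(ARCHIVE, "wb") as archive:
    # image.save() streams the image as tarball chunks; named=True keeps
    # the repository tag inside the archive.
    for chunk in image.save(named=True):
        archive.write(chunk)
```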
Provide Ground Truth¶
You can include the ground truth in your evaluation container. For phases with a large ground truth, you can instead upload it separately from the container. This lets you update the ground truth and the container independently, which saves time and reduces data transfer.
You can upload and manage your ground truth in the phase settings:
The ground truth will then be extracted to /opt/ml/input/data/ground_truth/.
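Inside your evaluation script you can then read the separately uploaded ground truth from that directory. A minimal sketch, assuming the uploaded archive contains a single JSON file (the file name is hypothetical; the directory simply mirrors the contents of your archive):

```python
# Sketch: read a separately uploaded ground truth from the directory it
# is extracted to. The JSON file name is a hypothetical example.
import json
from pathlib import Path

GROUND_TRUTH_DIR = Path("/opt/ml/input/data/ground_truth")


def load_ground_truth() -> dict:
    reference_file = GROUND_TRUTH_DIR / "reference.json"  # hypothetical file name
    return json.loads(reference_file.read_text())
```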
Configure Evaluation Settings¶
To configure the leaderboard and submission details, navigate to Admin ⟶ [name of phase] ⟶ Settings:
Under Settings and Phase, you can set the title of the selected phase. This defines the leaderboard name presented to the participants:
Configure Submission Settings¶
Under the Submission tab, the submission mechanism for the selected phase can be configured. You can:
- Indicate opening and closing dates for submissions
- Allow submissions from only verified participants
- Provide instructions to the participants on how to make submissions
- Limit the number of submissions and the time period within which a participant can make submissions
- Request that participants provide supplementary files when they make submissions (such as an arXiv link)
More instructions on how to configure this mechanism are available under Admin ⟶ [name of phase] ⟶ Settings ⟶ Submission