Automated evaluation for Type 2 challenges

💡 Prerequisite: Please first read how to set up automated evaluation for Type 1 challenges.

Evaluation containers for Type 2 challenges are similar to those for Type 1 challenges, but there is one important difference. Because Grand Challenge automatically runs the submitted algorithms on a private test set for Type 2 challenges, it assigns unique, randomly generated filenames to the outputs of the algorithm inference jobs. Grand Challenge produces a JSON file that tells you how these random output filenames map to the original input filenames and where to read the output files from. As a challenge organizer, you must read /input/predictions.json to map the prediction filenames back to the input filenames; this is necessary to evaluate the predictions correctly. Here is an example of how that can be done. We define a load_predictions_json function that loads the JSON, loops through the inputs and outputs of each entry, and builds a dictionary that maps each output filename to the name of the corresponding input image.

import json
from pathlib import Path


def load_predictions_json(fname: Path):
    """Map the randomly named algorithm outputs to the original input filenames."""
    cases = {}

    with open(fname, "r") as f:
        entries = json.load(f)

    if isinstance(entries, float):
        raise TypeError(f"entries of type float for file: {fname}")

    for e in entries:
        # Find the case name through the input file name
        inputs = e["inputs"]
        name = None
        for inp in inputs:
            if inp["interface"]["slug"] == "generic-medical-image":
                name = str(inp["image"]["name"])
                break  # expecting only a single input
        if name is None:
            raise ValueError(f"No filename found for entry: {e}")

        # Find the output filename (the image pk) for this case
        outputs = e["outputs"]
        for output in outputs:
            if output["interface"]["slug"] == "generic-medical-image":
                pk = output["image"]["pk"]
                if ".mha" not in pk:
                    pk += ".mha"
                cases[pk] = name

    return cases
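
To make the structure that load_predictions_json expects more concrete, here is a heavily abbreviated, hypothetical entry of /input/predictions.json, written out as the Python object that json.load would return. Only the keys used in this section are shown; the pk values, image name, and relative_path are made up.

entries = [
    {
        "pk": "7a1b2c3d-0000-0000-0000-000000000001",  # pk of the algorithm job (made up)
        "inputs": [
            {
                "interface": {"slug": "generic-medical-image"},
                "image": {"name": "case_001.mha"},
            }
        ],
        "outputs": [
            {
                "interface": {
                    "slug": "generic-medical-image",
                    "relative_path": "images/generic-medical-image",  # hypothetical
                },
                "image": {"pk": "9e8f7a6b-0000-0000-0000-000000000002"},  # made up
            }
        ],
    }
]

For this entry, load_predictions_json would return {"9e8f7a6b-0000-0000-0000-000000000002.mha": "case_001.mha"}.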



We then use the resulting mapping_dict to map the outputs to the actual filenames when computing the metrics in the evaluation script. This is done by updating self._predictions_cases["ground_truth_path"] with the contents of mapping_dict:

# Map each prediction file to its ground-truth file using the filenames from predictions.json
self.mapping_dict = load_predictions_json(Path("/input/predictions.json"))
self._predictions_cases["ground_truth_path"] = [
    self._ground_truth_path / self.mapping_dict[Path(path).name]
    for path in self._predictions_cases.path
]


In addition to the predictions.json file, the evaluation container also has access to the algorithm's output files. You may want to read these files directly, for example when your algorithm produces large JSON files, heatmaps, or segmentation outputs. The outputs of the algorithm containers are provided to the evaluation container at the following path:

/input/<job_pk>/output/<interface_relative_path>

where:

  • job_pk corresponds to the "pk" of each algorithm job, i.e. the top-level "pk" entry of each JSON object in the predictions.json file
  • interface_relative_path corresponds to the relative_path of each output of a job. For the first output of the first algorithm job this is [0]["outputs"][0]["interface"]["relative_path"]; if your algorithm produces more than one output, loop over the outputs to get their respective relative paths, as in the sketch below.
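
As an illustration of how these paths can be assembled, here is a minimal sketch that loops over all jobs and their outputs in predictions.json and builds the path of every output file. The function name load_output_paths is our own; the keys it reads are the ones described above.

import json
from pathlib import Path


def load_output_paths(fname: Path):
    """Collect the path of every output file of every algorithm job."""
    with open(fname, "r") as f:
        entries = json.load(f)

    output_paths = {}
    for e in entries:
        job_pk = e["pk"]  # top-level pk of the algorithm job
        for output in e["outputs"]:
            relative_path = output["interface"]["relative_path"]
            # Outputs are provided at /input/<job_pk>/output/<interface_relative_path>
            output_paths.setdefault(job_pk, []).append(
                Path("/input") / job_pk / "output" / relative_path
            )

    return output_paths

Each collected path can then be opened with whatever reader fits the output type, for example json.load for JSON outputs or an image reader such as SimpleITK for .mha files.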