Automated evaluation for Type 2 challenges

💡Prerequisite: Please read more about how to set up automated evaluation for Type 1 challenges first.

Evaluation containers for Type 2 challenges are similar to those for Type 1 challenges, with one subtle but important difference. Because Grand Challenge automatically runs submitted algorithms on a private test set for Type 2 challenges, it assigns unique, randomly generated filenames to the outputs of the algorithm inference jobs. If you build your evaluation container the way you would for a Type 1 challenge, the prediction filenames will not line up with the ground-truth filenames, and your evaluation metrics will likely come out far too low.

Grand Challenge, however, produces a JSON file that tells you how to map the random output filenames back to the original input filenames. As a challenge organizer, you must read /input/predictions.json to map the prediction filenames to the input filenames; otherwise the predictions cannot be evaluated correctly. Here's an example of how that can be done. We define a load_predictions_json function that loads the JSON, loops over the inputs and outputs of each entry, and builds a dictionary mapping each output filename to the corresponding input filename.

import json
from pathlib import Path

def load_predictions_json(fname: Path):
    """Map randomly generated output filenames to the original input filenames."""
    cases = {}

    with open(fname, "r") as f:
        entries = json.load(f)

    if isinstance(entries, float):
        raise TypeError(f"entries of type float for file: {fname}")

    for e in entries:
        # Find the case name through the input file name
        name = None
        for inp in e["inputs"]:
            if inp["interface"]["slug"] == "generic-medical-image":
                name = str(inp["image"]["name"])
                break  # expecting only a single input
        if name is None:
            raise ValueError(f"No filename found for entry: {e}")

        # Find the output filename (pk) for this case
        for output in e["outputs"]:
            if output["interface"]["slug"] == "generic-medical-image":
                pk = output["image"]["pk"]
                if not pk.endswith(".mha"):
                    pk += ".mha"
                cases[pk] = name

    return cases
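To make the mapping concrete, here is a hypothetical, minimal example of what an entry in /input/predictions.json might look like (the pk and filename values below are invented), with the same input-to-output matching logic written inline:

```python
entries = [
    {
        "inputs": [
            {"interface": {"slug": "generic-medical-image"},
             "image": {"name": "case_001.mha"}},
        ],
        "outputs": [
            {"interface": {"slug": "generic-medical-image"},
             "image": {"pk": "1d2e3f4a-0000-0000-0000-abcdef012345"}},
        ],
    },
]

# Build the same mapping the function above produces: output pk -> input name
cases = {}
for e in entries:
    name = next(
        inp["image"]["name"]
        for inp in e["inputs"]
        if inp["interface"]["slug"] == "generic-medical-image"
    )
    for out in e["outputs"]:
        if out["interface"]["slug"] == "generic-medical-image":
            pk = out["image"]["pk"]
            if not pk.endswith(".mha"):
                pk += ".mha"
            cases[pk] = name

print(cases)
```

For this entry, the resulting mapping is {"1d2e3f4a-0000-0000-0000-abcdef012345.mha": "case_001.mha"}: the key is the random output filename and the value is the original input filename.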

We then use this mapping, stored as self.mapping_dict, to match the outputs to the actual filenames when computing the metrics in the evaluation script. This is done by rebuilding self._predictions_cases["ground_truth_path"] from the contents of the mapping:

self.mapping_dict = load_predictions_json(Path("/input/predictions.json"))
self._predictions_cases["ground_truth_path"] = [
    self._ground_truth_path / self.mapping_dict[Path(path).name]
    for path in self._predictions_cases.path
]
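The same lookup can be exercised outside the evaluation class. In this sketch, the ground-truth directory and prediction path are invented placeholders, but the pattern is identical: key the mapping by Path(path).name to turn each randomly named prediction into the path of its ground-truth file.

```python
from pathlib import Path

# Hypothetical mapping as returned by load_predictions_json
mapping_dict = {"1d2e3f4a.mha": "case_001.mha"}

ground_truth_path = Path("/opt/evaluation/ground-truth")  # assumed location
prediction_paths = ["/input/1d2e3f4a.mha"]  # invented prediction path

# Replace each random prediction filename with its ground-truth path
ground_truth = [
    ground_truth_path / mapping_dict[Path(path).name]
    for path in prediction_paths
]
print(ground_truth[0])  # /opt/evaluation/ground-truth/case_001.mha
```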