Dear participants,

First of all, thanks again for your interest in the TIGER challenge and for actively participating and interacting with us while developing your algorithms!

We also apologize for having temporarily closed submissions to leaderboard 1 and leaderboard 2, which we have now reopened. The reason is that in the past weeks we have received an increasing number of submissions, which have allowed us to identify general trends in how submitted algorithms perform, to spot common issues, and to test the evaluation scripts we had prepared under several (sometimes unexpected) conditions. We therefore decided to take a few days to make a “bulk update” across several aspects of the challenge, which we think improve it. The changes are listed here:

  1. We have updated the way the FROC score is computed. As announced at the start of the challenge, the exact operating points used to compute the score might be fine-tuned during TIGER to best evaluate the performance of submitted methods. Based on the initial results, we observed that some operating points were rarely reached (e.g., 5 FP/mm^2, 500 FP/mm^2) and were therefore either redundant or unduly penalizing the FROC score. We have consequently adjusted those points based on the submissions received so far. As a result, the FROC scores on leaderboard 1 have changed slightly and, in our opinion, now better reflect the actual performance of the submitted methods. Please check the Evaluation section for more details; a minimal sketch of the computation is also included after this list.

  2. Regarding FROC analysis, we encourage participants to submit detected TILs with a likelihood score that varies per prediction. We have received some submissions where all TILs were given a score of 1.0 (or, in general, a single fixed value for all predictions). This is in principle a valid submission, but it makes FROC analysis largely meaningless and can penalize the method during evaluation. Based on this experience, we have made the FROC evaluation more robust to such cases, but we still strongly recommend submitting an actual score for each detection (see the output sketch after this list).

  3. We have updated the way we compute the Dice score, mostly by making it more robust to predictions with a label equal to zero. This label was not initially expected, as class labels range from 1 to 7, but we noticed that some submitted algorithms also produce zeros, so we adapted the script to handle these cases as well. However, we encourage participants to check that their algorithms do not produce any zeros, as this label is not foreseen in the challenge and might introduce ambiguity in the predicted classes (a small Dice sketch follows this list).

  4. We have released the evaluation scripts for both L1 and L2, which participants can use to run local tests. They can be found here.

  5. We have detected that several methods produce incorrect segmentation output, probably due to problems with stitching during inference; some examples from the test set are these: example 1, example 2. We strongly recommend that participants test their algorithms on example slides from the training set, using slides from JB, RUMC, and TCGA, before submitting their methods, and contact us should they encounter any problems, such as this type of artifact, so that we can work together on fixing them.

  6. We have updated the FAQs section to address some questions that we recently received, for example: What happens after the end of the TIGER challenge? Can we use the TIGER training data for other purposes outside of the challenge, including publications? Can participants submit to only L1 or only L2? We will keep updating this section in the future, so we encourage participants to look for answers there first and, if an answer is not present, to ask via the forum.

  7. We have **increased the time limit to process one slide in L2: it is now 2 hours per slide**.

  8. We have added the weights of the models in the baseline algorithm, so participants can now rebuild it, apply it, and use it as a base for their own development. You will find the necessary documentation in the GitHub repository of the baseline algorithm at this link.

  9. Given the delays due to the several technical issues encountered so far, we will likely extend the end date of TIGER; stay tuned!
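
As a companion to point 1, below is a minimal sketch of how a FROC-style score can be computed as the average sensitivity at a set of FP/mm^2 operating points. The operating points used here are placeholders, not the official ones, and the released evaluation scripts remain the reference.

```python
import numpy as np

def froc_score(fp_per_mm2, sensitivity, operating_points=(10, 20, 50, 100, 200, 300)):
    """Average sensitivity of a FROC curve at a set of FP/mm^2 operating points.

    fp_per_mm2:       false positives per mm^2 along the curve (increasing).
    sensitivity:      corresponding sensitivities (same length).
    operating_points: placeholder values; see the Evaluation section for the official ones.
    """
    fp_per_mm2 = np.asarray(fp_per_mm2, dtype=float)
    sensitivity = np.asarray(sensitivity, dtype=float)
    # Interpolate the curve at each operating point and average the sensitivities.
    sens_at_points = np.interp(operating_points, fp_per_mm2, sensitivity)
    return float(np.mean(sens_at_points))

# Toy example: a FROC curve sampled at a few FP rates.
fps = [1, 5, 10, 50, 100, 300, 500]
sens = [0.20, 0.45, 0.55, 0.70, 0.78, 0.85, 0.86]
print(froc_score(fps, sens))
```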
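
Related to point 2, here is a hypothetical example of writing detections with a per-prediction likelihood instead of a constant 1.0. The file name and JSON layout (a list of points, each carrying a probability) are assumptions for illustration only; please follow the exact output format documented for the challenge and in the baseline repository.

```python
import json

# Hypothetical detections: (x, y) in pixels plus the model's confidence.
# Keep the raw likelihoods instead of collapsing them all to 1.0, so the
# FROC analysis can sweep over confidence thresholds.
detections = [
    {"x": 1024.0, "y": 2048.0, "probability": 0.92},
    {"x": 1100.5, "y": 1990.2, "probability": 0.41},
    {"x": 980.0, "y": 2100.7, "probability": 0.73},
]

# Assumed layout and file name; check the challenge documentation for the exact schema.
output = {
    "type": "Multiple points",
    "points": [
        {"point": [d["x"], d["y"]], "probability": d["probability"]}
        for d in detections
    ],
    "version": {"major": 1, "minor": 0},
}

with open("detected-lymphocytes.json", "w") as f:
    json.dump(output, f, indent=2)
```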
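
Finally, as mentioned in point 3, this is a minimal sketch of a per-class Dice computation over labels 1 to 7 in which a stray label 0 in the prediction is simply never counted as a class. It is illustrative only; the released evaluation scripts are authoritative.

```python
import numpy as np

def per_class_dice(pred, gt, labels=range(1, 8)):
    """Dice score per class for labels 1..7 on integer label maps of equal shape.

    A label 0 in `pred` never matches any class, so unexpected zeros do not
    break the computation; they simply count as missed pixels.
    Returns a dict label -> Dice (NaN if the label is absent from both maps).
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    scores = {}
    for label in labels:
        p = pred == label
        g = gt == label
        denom = p.sum() + g.sum()
        scores[label] = float("nan") if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom
    return scores

# Example: a prediction containing (unexpected) zeros is still handled.
gt = np.array([[1, 1, 2], [2, 3, 3]])
pred = np.array([[1, 0, 2], [2, 3, 0]])
print(per_class_dice(pred, gt))
```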

As a final comment, we are very happy with how the interaction with participants is going, and we encourage you to keep it up: reach out to us with any question or issue, and in particular test your algorithms before submission, look at the visual results in the web viewer or download the results and inspect them locally, and discuss with us any source of artifacts, such as the ones mentioned in point 5 of this announcement.

Regards, Mart, Witali, Francesco, on behalf of the TIGER organizing team