Use of pre-trained models  

  By: a.martel on Jan. 21, 2022, 3:40 p.m.

In the rules it states: "Models pre-trained on ImageNet or other natural images datasets are allowed. Models pre-trained on digital pathology data are not allowed, even if pre-trained using publicly available data." I can understand excluding models pretrained on public data - pretraining requires a lot of compute time and disk space, so it would give large groups a significant advantage. Unfortunately, it also excludes pre-trained models for digital pathology data that are publicly available. I would be interested to know why these are excluded - ImageNet models are not ideal, and if a pretrained model is publicly available then it does not unfairly advantage anyone. Examples are the pretrained HoVer-Net model or our SimCLR model - both of which are likely to help with training on a limited dataset.

Re: Use of pre-trained models  

  By: f.ciompi on Jan. 25, 2022, 6:06 a.m.

Sorry for the late reply! We decided to introduce this rule because the test set(s) are partly built using publicly available data. Therefore, models pretrained on (potentially) the same test slides might be positively biased, having learned a representation (even if unsupervised) of the test data.

One option that we discussed was to allow people to pre-train models on private data, under the condition that such data would be made publicly available. This might work, but if people released the data close to the end of the challenge, or even after it, that would not be fair to the other participants and would favour teams with lots of private data available in house. In the end, we decided not to introduce this option.

Note that the TIGER training set contains 371 whole-slide images, with thousands of densely annotated regions of interest, but also several unannotated regions, which could be used to (re)train self-supervised approaches.
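
For reference, here is a minimal sketch of what SimCLR-style self-supervised pretraining on those unannotated regions could look like. This is not part of the challenge code: PyTorch/torchvision are assumed, the data-loading side (a loader yielding two augmented views per patch) is a placeholder, and the augmentation and hyperparameter choices are illustrative only.

```python
# Minimal SimCLR-style pretraining sketch (PyTorch/torchvision assumed).
# The loader of unlabeled patches from unannotated TIGER regions is
# hypothetical; only the contrastive objective is shown here.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

# Two random augmentations of the same patch form a positive pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)  # from scratch, per the rules
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.encoder = backbone
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, proj_dim),
        )

    def forward(self, x):
        # L2-normalized projections, as used by the NT-Xent loss.
        return F.normalize(self.projector(self.encoder(x)), dim=1)

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i])."""
    z = torch.cat([z1, z2], dim=0)            # (2N, D)
    sim = z @ z.t() / temperature             # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))     # exclude self-similarity
    # The positive for sample i is its other view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Training loop sketch; `loader` yields two augmented views per patch.
model = SimCLRModel().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# for view1, view2 in loader:
#     loss = nt_xent_loss(model(view1.cuda()), model(view2.cuda()))
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

After pretraining, the projector is typically discarded and the encoder is fine-tuned on the annotated regions of interest.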