CoNIC Challenge: Pushing state-of-the-art for automatic nuclear recognition
Published 17 Nov. 2022
We recently concluded the CoNIC Challenge and presented the results at the International Symposium on Biomedical Imaging (ISBI), where we gave an overview of the challenge along with key insights into the best-performing solutions. In this blog post, we provide a summary of the challenge and present some of our main findings. We also describe the Grand Challenge submission format, which required participants to submit algorithms rather than predictions - keeping the test set completely unseen.
Automatic Nuclear Recognition in Histology Images
Digitised images of histology slides are routinely analysed in clinical practice, where they can contain tens of thousands of nuclei of various types. Assessment of these nuclei is needed to provide an accurate diagnosis, but manual quantification is not feasible due to the sheer number present in a single slide. Instead, automatic recognition of nuclei using machine learning enables the extraction of extensive cell-based statistics across the entire tissue sample. These statistics, along with other features, can then be used in downstream models for computational pathology (CPath), such as predicting patient outcome. To help drive forward research and innovation for automatic nuclear recognition in CPath, we organised the Colon Nuclei Identification and Counting (CoNIC) Challenge. The challenge encouraged researchers to develop algorithms for segmentation, classification and counting of nuclei using the largest known publicly available nucleus-level dataset in CPath, containing around half a million labelled nuclei.
Tissue samples are analysed routinely each day by pathologists. Here, the tissue is prepared and cut into thin sections before being placed on a glass slide and stained with specific dyes. For example, Haematoxylin and Eosin stains are widely used, which turn the connective tissue pink and the DNA dark blue/purple. The DNA is found within the cell nuclei, which therefore enables visual assessment of nuclear shape and texture. This assessment is traditionally done using the microscope, but with the advent of digital slide scanners pathologists can now do this on the computer monitor. The interaction between various types of nuclei, such as epithelial cells, different inflammatory cells and fibroblasts, is an important indicator of the state of a disease. For example, Cancer-Associated Fibroblasts (CAFs) and Tumour-Infiltrating Lymphocytes (TILs) are two particularly well-studied biomarkers. As a result, we considered the following types of nuclei in the challenge:
- Epithelial cells
- Lymphocytes
- Plasma cells
- Neutrophils
- Eosinophils
- Connective tissue cells
In the figure below we show some illustrations and cropped regions from the dataset, illustrating the difference in appearance between the various inflammatory cells considered.
We believe that algorithms trained to detect the above types of nuclei can help uncover additional important cell-based biomarkers for assessment of prognosis.
To encourage the development of models to automatically detect the above nuclei, we designed two tasks for CoNIC:
- Segmentation and classification
- Cellular composition
The first task required teams to automatically localise the boundaries of each nuclear type in the input image, whereas the second task required teams to predict the counts of the different nuclei. Task two was independent of task one, so participants did not necessarily need to use the output of task one to perform counting. Take a look at the challenge overview below. As you can see, the segmentation and classification task was ranked according to the multi-class panoptic quality, and the cellular composition task was ranked by the multi-class coefficient of determination. For both of these metrics, we calculate the statistic for each class and then take the average, so that the metrics aren't biased towards a particular class.
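To make the ranking criteria concrete, the sketch below shows how per-class statistics are averaged for both metrics. It is a simplified illustration, not the official evaluation code: a full panoptic quality computation must also match predicted and ground-truth instances (pairs with IoU above 0.5), which we take as already given here.

```python
import numpy as np

def panoptic_quality(matched_ious, n_fp, n_fn):
    """PQ for one class: sum of IoUs of matched (true positive) instance
    pairs, divided by |TP| + 0.5*|FP| + 0.5*|FN|."""
    n_tp = len(matched_ious)
    denom = n_tp + 0.5 * n_fp + 0.5 * n_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0

def multiclass_r2(y_true, y_pred):
    """Coefficient of determination computed per class, then averaged.
    y_true, y_pred: (n_images, n_classes) arrays of nucleus counts."""
    r2s = []
    for c in range(y_true.shape[1]):
        t, p = y_true[:, c], y_pred[:, c]
        ss_res = np.sum((t - p) ** 2)
        ss_tot = np.sum((t - t.mean()) ** 2)
        r2s.append(1.0 - ss_res / ss_tot)
    return float(np.mean(r2s))

# Multi-class PQ: average the per-class PQ values in the same way
mpq = np.mean([panoptic_quality([0.8, 0.9], n_fp=1, n_fn=0),
               panoptic_quality([0.7], n_fp=0, n_fn=1)])
```

Averaging per class rather than pooling all instances is what keeps abundant classes (such as epithelial cells) from dominating the score.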
You can also see from the figure above where our registered participants were from. At the time of the competition, we had 481 registered participants. Many of these were from China, India, the USA, Canada and the UK.
Our overall challenge dataset includes 535 thousand different nuclei. For this, we used data from the Lizard Dataset along with around 40 thousand additional labelled nuclei from an internal colon biopsy dataset. We released 4,981 patches of size 256x256 with full segmentation and classification ground truth, along with the counts of each type of nucleus. Here, we performed counting within the central 224x224 region so that spurious pixels at the patch borders were not considered. We also allowed participants to extract their own patches from the original Lizard Dataset if they desired; for example, they may have wanted to use a higher degree of overlap during patch extraction. Below, you can see an example of the ground truth that we provided for segmentation and classification.
Here, you can see that the instance segmentation map labels nuclei uniquely from 1 to N and the class map labels nuclei from 1 to C, where N is the number of nuclei and C is the number of classes.
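Given maps in this format, the counting ground truth can be derived directly from the segmentation ground truth. The sketch below is our own illustration, not the official challenge code, and assumes one plausible counting rule: a nucleus contributes to the counts if any of its pixels fall inside the central 224x224 window.

```python
import numpy as np

def count_nuclei(inst_map, class_map, n_classes, crop=224):
    """Count nuclei of each class within the central crop x crop window.

    inst_map : (H, W) int array, nuclei labelled 1..N, background 0
    class_map: (H, W) int array, classes labelled 1..C, background 0
    Returns a length-n_classes array of counts.
    """
    h, w = inst_map.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    centre = inst_map[top:top + crop, left:left + crop]

    counts = np.zeros(n_classes, dtype=int)
    for inst_id in np.unique(centre):
        if inst_id == 0:  # background
            continue
        # class of this nucleus: majority class label under its full mask
        cls = np.bincount(class_map[inst_map == inst_id]).argmax()
        if cls > 0:
            counts[cls - 1] += 1
    return counts
```

Taking the majority class under each instance mask makes the count robust to the odd mislabelled pixel at a nucleus boundary.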
Challenge Structure and Results
Unlike previous challenges for nuclear recognition in CPath, we ran an algorithm-based competition, enabled by the Grand Challenge platform. This meant that participants were required to submit their code rather than their results. Doing this allowed us to keep the test images completely hidden from participants, helping to ensure a reliable evaluation. Like other challenges using the platform, we conducted a preliminary test phase before the final submission. Here, participants followed our step-by-step instructions on how to submit to the challenge and saw their corresponding results appear on the public leaderboards. You can see from the plots below that over the course of the preliminary phase, results generally improved, potentially driven by the competition among teams.
Upon conclusion of the preliminary phase, participants were confident in how to submit their models and were ready to make their final submissions. Below you can see a snapshot of the top 10 results on each task.
Workshop and Submission Summary
During the challenge workshop we were joined by the following individuals, who each presented a summary of their winning solution:
- Martin Weigert (EPFL | StarDist)
- Wenhua Zhang (Pathology AI)
- Josef Lorenz Rumberger (MDC Berlin | IFP Bern)
- Sen Yang (AI_medical)
We found that most participants preferred an encoder-decoder network architecture over region-proposal models. We also found that the top submissions used a strategy to deal with the class imbalance present in the dataset: participants either used a weighted loss function, patch over-sampling, or a combination of the two. We also saw some interesting techniques, such as copy-and-paste augmentation, to help incorporate minority classes into the training set more often. Most submissions used a strong instance segmentation target, such as radial distance maps (used in StarDist) or horizontal and vertical distance maps (used in HoVer-Net). Only one team predicted the counts directly from the input image, but this approach did not perform as well as those that used the segmentation output to infer the counts. However, the advantage of predicting the counts directly is that it requires less effort in providing ground truth for model training. This strategy was used recently in the LYSTO challenge. You can find a full summary of all methods by checking out the method description papers here.
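As an illustration of one such instance segmentation target, the sketch below computes HoVer-Net-style horizontal and vertical distance maps from an instance map. This is a simplified version for illustration only; the normalisation details in the official HoVer-Net implementation may differ.

```python
import numpy as np

def _normalise(d):
    """Scale negative values to [-1, 0) and positive values to (0, 1]."""
    d = d.astype(np.float32)
    neg, pos = d < 0, d > 0
    if neg.any():
        d[neg] = d[neg] / -d.min()
    if pos.any():
        d[pos] = d[pos] / d.max()
    return d

def hv_maps(inst_map):
    """Horizontal/vertical distance of each nucleus pixel to the centre
    of mass of its nucleus, normalised per nucleus.

    inst_map: (H, W) int array, nuclei labelled 1..N, background 0.
    Returns (h_map, v_map) float32 arrays of the same shape.
    """
    h_map = np.zeros(inst_map.shape, dtype=np.float32)
    v_map = np.zeros(inst_map.shape, dtype=np.float32)
    for inst_id in np.unique(inst_map):
        if inst_id == 0:  # background
            continue
        ys, xs = np.nonzero(inst_map == inst_id)
        h_map[ys, xs] = _normalise(xs - xs.mean())
        v_map[ys, xs] = _normalise(ys - ys.mean())
    return h_map, v_map
```

A regression head trained on these maps produces sharp gradients at nucleus boundaries, which is what makes separating touching nuclei possible at inference time.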
Now that the challenge has finished, we are still allowing submissions. We have created post-challenge leaderboards for each of the tasks, where you can see how you fare against the results obtained during the competition. Take a look at the final instructional video and have a go at pushing forward the state-of-the-art for automated nuclear recognition!