Only 1000 labels are available? Not 1295?

By: melhzy on June 1, 2022, 9:15 p.m.

According to your annotation data description, there should be 1295 annotations each for csPCa (original) and csPCa (resampled). However, I can only find 1000 downloadable items in each of the original and resampled folders, so 295 entries appear to be missing from your delineation folder. Where can we download the remaining 295 labels?

Another question: we have both "original" and "resampled" labels. Which ones should we use?

Last edited by: melhzy on Aug. 15, 2023, 12:56 p.m., edited 3 times in total.

Re: Only 1000 labels are available? Not 1295?

By: anindo on June 1, 2022, 10:03 p.m.

Hi Z. Huang,

Thanks for your query.

Regarding the "missing" 295 annotations:

All 1295 annotations are available; GitHub simply limits the number of items it displays per folder on its website. By cloning or downloading the full repository, you should be able to access all files (including all 1295 csPCa annotations).
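After cloning the picai_labels repository, a quick way to confirm that every annotation arrived is to count the files locally instead of relying on the web view. The sketch below assumes each annotation is a `.nii.gz` file stored directly inside `csPCa_lesion_delineations/human_expert/original` and `.../resampled` (no sub-folders); `count_annotations` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def count_annotations(folder: str, suffix: str = ".nii.gz") -> int:
    """Count files in `folder` whose names end with `suffix`.

    Assumes one annotation file per case, stored directly in `folder`.
    Path.suffix would only return ".gz" for "x.nii.gz", so the full
    file name is checked instead.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    return sum(1 for p in root.iterdir() if p.is_file() and p.name.endswith(suffix))


if __name__ == "__main__":
    # Assumed location relative to the directory the repository was cloned into.
    base = Path("picai_labels/csPCa_lesion_delineations/human_expert")
    for variant in ("original", "resampled"):
        folder = base / variant
        if folder.is_dir():
            print(variant, count_annotations(str(folder)))
```

Run from the directory containing the clone, each folder should report 1295 files if the repository was fetched completely.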

Regarding the "original" and "resampled" labels:

All axial bpMRI sequences (T2W, DWI/HBV, ADC) of each case were used to localize and annotate csPCa lesions. However, depending on the annotator/center and their preference, some annotations have been mapped or created at the spatial resolution of the T2W image, while others have been created at the resolution of the ADC or DWI/HBV images. These original annotations are available in: picai_labels/blob/main/csPCa_lesion_delineations/human_expert/original
For a given case, we expect your AI model to predict a csPCa detection map with the same spatial dimensions and resolution as the T2W image. Hence, we have also converted and provided all original annotations at the same dimensions and spatial resolution as their corresponding T2W images, here: picai_labels/blob/main/csPCa_lesion_delineations/human_expert/resampled
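Conceptually, bringing an annotation onto the T2W grid is a resampling step: for every T2W voxel, look up the annotation value at the same physical position, using nearest-neighbour interpolation so that lesion labels stay integers. The sketch below is a simplified pure-Python illustration, not the code used to produce the `resampled` folder. It assumes both grids are axis-aligned (identity direction matrices), are described only by shape, spacing and origin, and that the origin is the centre of the first voxel; `resample_nearest` is a hypothetical name. With real image files one would typically use a library such as SimpleITK (`sitk.Resample` with `sitk.sitkNearestNeighbor`), which also handles direction cosines.

```python
import math
from typing import List, Sequence

Volume = List[List[List[int]]]  # indexed as volume[z][y][x]


def resample_nearest(
    label: Volume,
    src_spacing: Sequence[float],  # (z, y, x) voxel size in mm
    src_origin: Sequence[float],   # physical centre of voxel [0][0][0], (z, y, x)
    dst_shape: Sequence[int],      # (nz, ny, nx) of the target (e.g. T2W) grid
    dst_spacing: Sequence[float],
    dst_origin: Sequence[float],
) -> Volume:
    """Nearest-neighbour resampling between two axis-aligned grids.

    Target voxels that fall outside the source grid are set to 0 (background).
    """
    src_shape = (len(label), len(label[0]), len(label[0][0]))

    def src_index(axis: int, dst_idx: int) -> int:
        # Physical coordinate of the target voxel centre along this axis.
        pos = dst_origin[axis] + dst_idx * dst_spacing[axis]
        # Continuous source index, rounded half-up to the nearest voxel.
        return math.floor((pos - src_origin[axis]) / src_spacing[axis] + 0.5)

    # Axis-aligned grids let us precompute one index lookup per axis.
    lookups = [[src_index(axis, i) for i in range(dst_shape[axis])] for axis in range(3)]

    out: Volume = []
    for kz in lookups[0]:
        plane = []
        for ky in lookups[1]:
            row = []
            for kx in lookups[2]:
                inside = (0 <= kz < src_shape[0] and 0 <= ky < src_shape[1]
                          and 0 <= kx < src_shape[2])
                row.append(label[kz][ky][kx] if inside else 0)
            plane.append(row)
        out.append(plane)
    return out
```

Nearest-neighbour is the natural choice here because linear interpolation between, say, a lesion value of 2 and background 0 would produce voxel values that are not valid labels.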
You can choose to directly use the "resampled" annotations, or preprocess and incorporate the "original" ones, depending on your overall preprocessing and training strategy.

Next week, we plan to release picai_baseline: a GitHub repo of baseline AI models that you can use to kickstart your development cycle. Its goal is to help developers get familiar with the end-to-end pipeline of preprocessing prostate bpMRI data, training an AI model for csPCa detection/diagnosis in 3D, and encapsulating the trained AI model in a Docker container for submission to the leaderboard. You can also refer to those models and their source code to inform your strategy on which data to use and how to use it.

Hope this helps.

Last edited by: anindo on Aug. 15, 2023, 12:56 p.m., edited 6 times in total.

Re: Only 1000 labels are available? Not 1295?

By: JMitura on June 17, 2022, 4:54 p.m.

An additional question: it seems that most of the label images (of the 1295 available) are empty (all zeros). Am I correct?

Secondly, when the algorithm is evaluated, will the PSA level and patient age also be available to the algorithm, in addition to the images?

Re: Only 1000 labels are available? Not 1295?

By: anindo on June 17, 2022, 5:05 p.m.

Out of the 1500 cases shared in the Public Training and Development Dataset, 1075 cases have benign tissue or indolent PCa (i.e. their labels should be empty or full of 0s) and 425 cases have csPCa (i.e. their labels should have lesion blobs of value 2, 3, 4 or 5). Out of these 425 positive cases, only 220 cases carry an annotation derived by a human expert. The remaining 205 positive cases have not been annotated. In other words, only 17% (220/1295) of the annotations provided in picai_labels/csPCa_lesion_delineations/human_expert should contain csPCa lesions, while the remaining 83% (1075/1295) should be empty.
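The counts above fit together as follows: the 1295 label files are the 1075 negative cases plus the 220 annotated positive cases. The sketch below simply reproduces that arithmetic; `label_breakdown` and `is_empty_label` are hypothetical helpers, and `is_empty_label` assumes you have already loaded a label file and flattened its voxel values (for example with an image library plus numpy's `.ravel()`).

```python
from typing import Dict, Iterable


def label_breakdown(total_cases: int, negative_cases: int, annotated_positive: int) -> Dict[str, float]:
    """Derive the quoted dataset counts from the three numbers given in the reply."""
    positive_cases = total_cases - negative_cases                # csPCa cases
    unannotated_positive = positive_cases - annotated_positive   # positives without a human delineation
    label_files = negative_cases + annotated_positive            # files in human_expert/
    return {
        "positive_cases": positive_cases,
        "unannotated_positive": unannotated_positive,
        "label_files": label_files,
        "pct_with_lesions": 100 * annotated_positive / label_files,
        "pct_empty": 100 * negative_cases / label_files,
    }


def is_empty_label(voxels: Iterable[int]) -> bool:
    """True if a flattened label volume contains only background (0)."""
    return all(v == 0 for v in voxels)


stats = label_breakdown(total_cases=1500, negative_cases=1075, annotated_positive=220)
print(stats["positive_cases"], stats["unannotated_positive"], stats["label_files"])  # 425 205 1295
print(f"{stats['pct_with_lesions']:.0f}% with lesions, {stats['pct_empty']:.0f}% empty")  # 17% with lesions, 83% empty
```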

For more details, please check out the following page where this has been documented more extensively: https://pi-cai.grand-challenge.org/DATA/

Indeed. During evaluation, PSA (if reported during clinical routine), PSA density (if reported during clinical routine), prostate volume (if reported during clinical routine), patient age (always), MRI scanner manufacturer (always), MRI scanner model name (always) and diffusion b-value of the high b-value DWI/HBV scan (always), will be available to every AI algorithm per validation/testing case.
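Because PSA, PSA density and prostate volume may be missing for some cases, while age, scanner details and b-value are always present, one possible way to feed these variables to a model is to pair each optional value with a 0/1 "missing" flag. The sketch below is only an assumption about how a developer might structure this; `ClinicalInputs`, its field names and `to_features` are hypothetical and do not describe the challenge's actual input format. Scanner manufacturer and model are strings and would need a categorical encoding, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClinicalInputs:
    # Always available per case.
    patient_age: float
    scanner_manufacturer: str
    scanner_model: str
    hbv_b_value: float
    # Only available if reported during clinical routine.
    psa: Optional[float] = None
    psa_density: Optional[float] = None
    prostate_volume: Optional[float] = None


def to_features(c: ClinicalInputs, fill: float = 0.0) -> List[float]:
    """Numeric features, with a 0/1 'missing' flag after each optional value."""
    features = [c.patient_age, c.hbv_b_value]
    for value in (c.psa, c.psa_density, c.prostate_volume):
        features.append(fill if value is None else value)
        features.append(1.0 if value is None else 0.0)
    return features


case = ClinicalInputs(patient_age=66.0, scanner_manufacturer="VendorA",
                      scanner_model="ModelX", hbv_b_value=1400.0, psa=7.2)
print(to_features(case))  # [66.0, 1400.0, 7.2, 0.0, 0.0, 1.0, 0.0, 1.0]
```

The explicit flag lets a model distinguish "not reported" from a genuine value equal to the fill constant.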

For the Public Training and Development Dataset, these clinical variables can be found in the marksheet.
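A minimal sketch for loading one of those variables with Python's standard `csv` module. It assumes the marksheet is a CSV file with a header row containing columns named `patient_id`, `study_id` and `psa`, and that unreported values appear as empty or `NaN` cells; check the actual header in picai_labels before relying on these names. `load_psa` and `parse_optional_float` are hypothetical helpers.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple


def parse_optional_float(text: str) -> Optional[float]:
    """Return None for empty/NaN cells (variable not reported), else a float."""
    text = text.strip()
    if text == "" or text.lower() == "nan":
        return None
    return float(text)


def load_psa(lines: Iterable[str]) -> Dict[Tuple[str, str], Optional[float]]:
    """Map (patient_id, study_id) to PSA, or None if PSA was not reported."""
    reader = csv.DictReader(lines)
    return {
        (row["patient_id"], row["study_id"]): parse_optional_float(row["psa"])
        for row in reader
    }
```

Usage, with the file path being an assumption: `with open("marksheet.csv", newline="") as f: psa = load_psa(f)`, after which `sum(v is not None for v in psa.values())` gives the number of cases with a reported PSA.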

Hope this helps.

Last edited by: anindo on Aug. 15, 2023, 12:56 p.m., edited 3 times in total.

Re: Only 1000 labels are available? Not 1295?

By: JMitura on June 18, 2022, 7:43 a.m.

Thank you!