Imaging data missing

Imaging data missing  

  By: sakina on May 17, 2022, 4:49 p.m.

Hi,

On downloading the five folds for the imaging data, I observed that a label is available for "10403", but the imaging data for that case is not in any fold. Thus, only 1499 imaging cases are uploaded. Am I missing something, or is this an error?

My other questions are:

  1. There are 23 imaging data folders with more than one T2 and other data. I believe each of them also counts towards the 1500 MRI dataset. Are they in the same folder to indicate they are from the same patient? Why are those specific cases alone grouped like that?

  2. Only 1295 labels are available. Will that be the total truth provided? Or will the rest be uploaded later?

Re: Imaging data missing  

  By: anindo on May 17, 2022, 9:13 p.m.

Hey. Thanks for your queries.

> On downloading the five folds for the imaging data, I observed that a label is available for "10403", but the imaging data for that case is not in any fold. Thus, only 1499 imaging cases are uploaded. Am I missing something, or is this an error?

One other team is also facing issues with downloading imaging for patient 10403. We're looking into this and will keep you posted.

> There are 23 imaging data folders with more than one T2 and other data. I believe each of them also counts towards the 1500 MRI dataset. Are they in the same folder to indicate they are from the same patient? Why are those specific cases alone grouped like that?

These 23 folders represent patients with multiple prostate MRI studies/exams (essentially MRI exams of the same prostate gland or patient, at different timepoints). For instance, patient 10417 has two studies (1000424, 1000425) in our dataset. So in terms of the folder structure, you should see a patient-level folder, with imaging for all studies belonging to that given patient in the same folder, as follows:

images (root folder with all patients, and in turn, all 1500 studies)
|
10417 (patient-level folder, including all studies for given patient)
|--------------10417_1000424_t2w.mha (imaging for study 1000424)
|--------------10417_1000424_adc.mha (imaging for study 1000424)
|--------------10417_1000424_hbv.mha (imaging for study 1000424)
...
|--------------10417_1000425_t2w.mha (imaging for study 1000425)
|--------------10417_1000425_adc.mha (imaging for study 1000425)
|--------------10417_1000425_hbv.mha (imaging for study 1000425)
...
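
For anyone who wants to inspect this layout programmatically, here is a minimal sketch. It assumes only the naming pattern shown above (`<patient>_<study>_<modality>.mha` inside a patient-level folder); the function names `index_studies` and `patients_with_multiple_studies` are made up for illustration and are not part of any official tooling.

```python
from collections import defaultdict
from pathlib import Path


def index_studies(images_root):
    """Return {patient_id: {study_id: {modality: path}}} for all .mha files.

    Assumes files are stored as <images_root>/<patient>/<patient>_<study>_<modality>.mha.
    """
    index = defaultdict(lambda: defaultdict(dict))
    for path in sorted(Path(images_root).glob("*/*.mha")):
        parts = path.stem.split("_")
        if len(parts) != 3:
            # Skip anything that does not match the assumed naming pattern.
            continue
        patient_id, study_id, modality = parts
        index[patient_id][study_id][modality] = path
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {patient: dict(studies) for patient, studies in index.items()}


def patients_with_multiple_studies(index):
    """List patient IDs that have more than one study in the index."""
    return sorted(p for p, studies in index.items() if len(studies) > 1)
```

Calling `patients_with_multiple_studies(index_studies("images"))` on a local copy (the path `images` is a placeholder for wherever the folds were extracted) should list the patient-level folders that hold more than one study.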

The Public Training and Development Dataset includes 1500 studies or cases (not 1500 patients), and multiple studies from the same patient do count towards this number. In clinical routine, not all patients undergo multiple exams. Furthermore, not all exams from the same patient were necessarily sampled into the Public Training and Development Dataset; some may have been sampled into the Private Training Dataset. Hence, you see this for only 23 patients. For more details on the distribution of patients/cases in the Public Training and Development Dataset, please check out the README or marksheet.csv included in the picai_labels repo, or Table 1 in our preregistered study protocol.

> Only 1295 labels are available. Will that be the total truth provided? Or will the rest be uploaded later?

Teams with one of the top 5 AI algorithms in the Open Development Phase of the challenge will have the chance to train their models using 6000-8000 additional cases from our Private Training Dataset in the Closed Testing Phase. A substantial fraction of that dataset will lack expert-derived annotations as well (a common scenario in real-life practice, due to the annotation burden at scale). Hence, we encourage participants to develop methods that can make use of non-annotated cases in the Public Training and Development Dataset.

At Radboudumc, we deal with such cases using a semi-supervised learning strategy (Bosma et al., 2022). We will provide AI-derived annotations for all 1500 public training cases and all cases in the Private Training Dataset (later in the challenge), using this method. Participants can choose to use these AI-derived annotations for non-annotated cases, or apply their own methodology instead. To stay up-to-date on when these AI-derived annotations are released, please check out the README included in the picai_labels repo, or the dedicated forum thread for all updates on the Public Training and Development Dataset.
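
As a rough illustration of the "use AI-derived annotations only where no expert annotation exists" option, the selection logic could look like the sketch below. This is not the Bosma et al. method itself, only one possible way to decide per case which annotation to train on; the function name `pick_annotation_source` and the idea of passing case IDs as plain strings are assumptions made for this example.

```python
def pick_annotation_source(case_ids, human_annotated, ai_annotated):
    """Choose an annotation source per case, preferring expert annotations.

    case_ids:        iterable of case identifiers (e.g. "<patient>_<study>")
    human_annotated: identifiers that have an expert-derived annotation
    ai_annotated:    identifiers that have an AI-derived annotation
    """
    human = set(human_annotated)
    ai = set(ai_annotated)
    sources = {}
    for case_id in case_ids:
        if case_id in human:
            sources[case_id] = "human"
        elif case_id in ai:
            sources[case_id] = "ai"
        else:
            # No annotation of either kind is available for this case.
            sources[case_id] = "none"
    return sources
```

Cases marked "none" could then be left out of supervised training or handled with whatever method a team prefers.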

Hope this helps.

 Last edited by: anindo on Aug. 15, 2023, 12:56 p.m., edited 23 times in total.

Re: Imaging data missing  

  By: anindo on May 18, 2022, 5:53 p.m.

Hey.

We can confirm that patient folder 10403 (including all imaging files for study 1000409) is missing from our uploaded dataset. Thanks for bringing this to our attention! We'll address this along with all other pending fixes in a scheduled update on 31 May 2022.
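
If you want to check your own download for gaps like this one, a small sketch along the following lines may help. It assumes the file naming shown earlier in this thread (`<patient>/<patient>_<study>_<modality>.mha`) and that you already have a list of expected case IDs in the form `<patient>_<study>`; the helper `find_missing_cases` is hypothetical and not part of our released code.

```python
from pathlib import Path

# Modalities named in the folder listing earlier in this thread.
REQUIRED_MODALITIES = ("t2w", "adc", "hbv")


def find_missing_cases(expected_case_ids, images_root, modalities=REQUIRED_MODALITIES):
    """Return {case_id: [missing modalities]} for cases with absent image files."""
    root = Path(images_root)
    missing = {}
    for case_id in expected_case_ids:
        # Case IDs are assumed to look like "<patient>_<study>".
        patient_id, _study_id = case_id.split("_")
        absent = [
            modality
            for modality in modalities
            if not (root / patient_id / f"{case_id}_{modality}.mha").is_file()
        ]
        if absent:
            missing[case_id] = absent
    return missing
```

With the dataset as currently uploaded, a case such as `10403_1000409` would be expected to show up in the returned dictionary, since its patient folder is absent.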

For all updates/fixes regarding this dataset, please follow our dedicated forum post on this topic.

Thanks again!

 Last edited by: anindo on Aug. 15, 2023, 12:56 p.m., edited 1 time in total.