Number of files per Patient ID

  By: melhzy on June 1, 2022, 11:17 p.m.

Hi everyone,

I am reviewing the dataset and trying to understand the different files associated with each patient ID (if I understand it correctly).

According to the summary paper of this challenge, the imaging modalities available in our dataset include:
1. Axial, sagittal and coronal T2-weighted imaging (T2W);
2. Axial computed high b-value (≥ 1400 s/mm²) diffusion-weighted imaging (DWI);
3. Axial apparent diffusion coefficient maps (ADC).
4. All cases used for the reader study will also include dynamic contrast-enhanced (DCE) sequences.

I expected to see 4 images per patient and per study. Instead, I find that patient IDs (10941, 10776) have 3 mha files each, patient IDs (10691, 10116, 11217) have 4 mha files each, and all remaining patient IDs have 5 mha files or a multiple of 5 (5*n) each.

I have printed a few examples below.

  • Patient ID with 3 mha files:

  • Patient ID with 4 mha files:

  • Patient ID with 5 mha files:

  • Patient ID with 10 mha files:

  • Patient ID with 15 mha files:

I have a question regarding file names and image modalities.

How should I map _adc.mha, _cor.mha, _hbv.mha, _sag.mha, and _t2w.mha to the imaging modalities described in our summary paper or challenge description paper?

Also, could we have an official definition or description regarding the _adc.mha, _cor.mha, _hbv.mha, _sag.mha, and _t2w.mha files?

 Last edited by: melhzy on Aug. 15, 2023, 12:56 p.m., edited 4 times in total.

Re: Number of files per Patient ID  

  By: joeran.bosma on June 2, 2022, 9:24 a.m.

Hi melhzy,

Thank you for your query. You are correct that the number of files per patient is not constant across the dataset. You are also correct about which sequences are included for each case. One small difference, however, is that the sagittal and coronal T2-weighted sequences (_sag.mha and _cor.mha) are optional. They are available for most cases, but we did not exclude cases where one or both of these sequences were missing. This explains why some cases have 3 or 4 imaging files.

For each of the 1500 cases in the PI-CAI: Public Training and Development Dataset, the filename has the form [patient_id]_[study_id]_[modality].mha. Each patient can have multiple studies (prostate MRI examinations at different moments in time), which are grouped in the patient's folder to indicate that these studies are from the same patient. This explains why some patients have 10 (2 studies) or 15 (3 studies) imaging files.
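As a rough illustration of the naming scheme above, here is a minimal sketch that splits a filename into its three parts and groups files by patient and study. The function names and the example IDs are hypothetical, and it assumes that neither the patient ID nor the study ID contains an underscore.

```python
from collections import defaultdict
from pathlib import Path


def parse_name(filename):
    """Split '[patient_id]_[study_id]_[modality].mha' into its three parts."""
    path = Path(filename)
    if path.suffix != ".mha":
        raise ValueError(f"not an .mha file: {filename}")
    # Assumption: the IDs themselves contain no underscores, so the stem
    # splits into exactly three fields (otherwise unpacking raises ValueError).
    patient_id, study_id, modality = path.stem.split("_")
    return patient_id, study_id, modality


def group_by_patient(filenames):
    """Return {patient_id: {study_id: set of modality suffixes}}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for name in filenames:
        patient_id, study_id, modality = parse_name(name)
        grouped[patient_id][study_id].add(modality)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {pid: dict(studies) for pid, studies in grouped.items()}


# Made-up file names for illustration only (not real dataset entries).
example_files = [
    "10000_1000000_t2w.mha",
    "10000_1000000_hbv.mha",
    "10000_1000000_adc.mha",
    "10000_1000001_t2w.mha",
    "10000_1000001_hbv.mha",
    "10000_1000001_adc.mha",
]
grouped = group_by_patient(example_files)
for patient_id, studies in grouped.items():
    n_files = sum(len(mods) for mods in studies.values())
    print(patient_id, "studies:", len(studies), "files:", n_files)
```

Counting the studies per patient this way should make it easy to see whether a file count of 10 or 15 comes from 2 or 3 studies.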

The sequences are mapped to the filenames in the following way:
1. Axial T2-weighted imaging (T2W): [patient_id]_[study_id]_t2w.mha
2. Axial computed high b-value (≥ 1400 s/mm²) diffusion-weighted imaging (DWI): [patient_id]_[study_id]_hbv.mha
3. Axial apparent diffusion coefficient maps (ADC): [patient_id]_[study_id]_adc.mha
4. Sagittal T2-weighted imaging: [patient_id]_[study_id]_sag.mha
5. Coronal T2-weighted imaging: [patient_id]_[study_id]_cor.mha
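The mapping above could be written down as a small lookup table, for example as follows. This is only a sketch: the dictionary name, the helper function, and the example filename are hypothetical and not part of any official PI-CAI tooling.

```python
from pathlib import Path

# Suffix -> sequence description, as listed in this post.
SEQUENCE_BY_SUFFIX = {
    "t2w": "Axial T2-weighted imaging (T2W)",
    "hbv": "Axial computed high b-value diffusion-weighted imaging (DWI)",
    "adc": "Axial apparent diffusion coefficient map (ADC)",
    "sag": "Sagittal T2-weighted imaging",
    "cor": "Coronal T2-weighted imaging",
}


def describe(filename):
    """Return the sequence description for a '..._[modality].mha' filename."""
    # The modality suffix is whatever follows the last underscore in the stem.
    suffix = Path(filename).stem.rsplit("_", 1)[-1]
    # Raises KeyError for a suffix that is not in the table above.
    return SEQUENCE_BY_SUFFIX[suffix]


# Made-up file name for illustration only.
print(describe("10000_1000000_hbv.mha"))
```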

Kind regards, Joeran

Re: Number of files per Patient ID  

  By: anindo on June 2, 2022, 9:53 a.m.

Also note the following distinction. For the Public Training and Development Dataset and the Private Training Dataset:

  • Every patient case will have at least three imaging sequences: axial T2W, axial DWI and axial ADC scans (i.e. files ending in _t2w.mha, _hbv.mha, _adc.mha, respectively). Additionally, they can have one, both or neither of these optional imaging sequences: sagittal and coronal T2W scans (i.e. files ending in _sag.mha, _cor.mha, respectively). None of them will have DCE sequences.

But for the Hidden Validation and Tuning Cohort and the Hidden Testing Cohort:

  • Every patient case will have exactly five imaging sequences: axial, sagittal and coronal T2W; axial DWI and axial ADC scans (i.e. files ending in _t2w.mha, _sag.mha, _cor.mha, _hbv.mha, _adc.mha, respectively). For part of the Hidden Testing Cohort, DCE sequences will be available only to radiologists participating in the reader study. They will not be available to AI algorithms at any stage within the context of this grand challenge.
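For anyone who wants to sanity-check their local copy against the rules above, a minimal sketch could look like this. The names REQUIRED, OPTIONAL and check_study are hypothetical, and the input is assumed to be the set of modality suffixes found for a single study.

```python
# Required for every case in all datasets, per the description above.
REQUIRED = {"t2w", "hbv", "adc"}
# Optional in the public/private training data; always present in the hidden cohorts.
OPTIONAL = {"sag", "cor"}


def check_study(modalities):
    """Report which sequences are missing or unexpected for one study."""
    present = set(modalities)
    return {
        "missing_required": sorted(REQUIRED - present),
        "missing_optional": sorted(OPTIONAL - present),
        "unexpected": sorted(present - REQUIRED - OPTIONAL),
    }


# A study with only the three axial sequences is still valid training data.
report = check_study({"t2w", "hbv", "adc"})
print(report)
```

A study with a non-empty "missing_required" list would not match the description above, while a non-empty "missing_optional" list is expected for some training cases.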

We will clarify the file mapping in the README.md associated with the Public Training and Development Dataset.

If you want to dive deeper into what these different imaging sequences are and why they are useful for csPCa detection/diagnosis, you can have a look at:

Hope this helps.

 Last edited by: anindo on Aug. 15, 2023, 12:56 p.m., edited 3 times in total.