About the Dataset

  By: f.bolelli on June 13, 2024, 3:16 p.m.

Dear Challenge Participants,

Thank you for your interest in our challenge. We would like to share with everyone some feedback and questions we received regarding the dataset used in our challenge.

Annotation Process: First, we would like to stress that the dataset has been annotated semi-automatically, with complete manual supervision. Multiple clinical experts were involved in the process, but due to time constraints they did not review each other's work, meaning that the annotation of each volume has been supervised by a single expert only. Despite our best efforts, we acknowledge that minor inconsistencies may still be present in the labels.

Changelog and Updates: Please note that our webpage (ditto.ing.unimore.it/toothfairy2) provides a changelog listing every dataset update performed since its release. We encourage you to review it for the latest information. However, please avoid downloading the entire dataset at each minor change. We guarantee that the dataset will not be modified from the beginning of the Preliminary Phase to the end of the Challenge (Final Phase).

If you find any potential issue, please contact us mentioning both the “patient” and the “potential issue”, and our clinical experts will promptly review it.

Model Generalization: We acknowledge that even minor inconsistencies can potentially impact model generalization. However, we would like to emphasize that all challenge participants face the same conditions, and tackling potential annotation issues is part of the challenge.

Dataset Metadata: When converting the CBCT volumes from DICOM to .mha format, we accidentally dropped the voxel spacing information, which resulted in the default spacing of 1 mm. The information reported in our structured challenge submission is the correct one: "the spacing is 0.3mm isotropic". The current version of the dataset (13th June 2024) reports the correct metadata.
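
For anyone unsure which version of the data they have on disk, the spacing can be checked directly from the .mha header, which is plain text placed before the voxel data. The following is only a minimal sketch using the Python standard library: the helper names (parse_mha_header, read_mha_header, spacing_mm, has_expected_spacing) are hypothetical, and it assumes the files carry the usual MetaImage "ElementSpacing" field. An imaging library such as SimpleITK would report the same value through its own API.

```python
def parse_mha_header(raw: bytes) -> dict:
    """Parse the 'key = value' text header at the start of MetaImage bytes."""
    header = {}
    for line in raw.split(b"\n"):
        key, sep, value = line.decode("latin-1").partition("=")
        if not sep:
            break  # not a header line any more
        header[key.strip()] = value.strip()
        if key.strip() == "ElementDataFile":
            break  # by convention the last header field; voxel data follows
    return header


def read_mha_header(path, max_bytes=65536):
    """Read only the first bytes of a file, enough to contain the header."""
    with open(path, "rb") as f:
        return parse_mha_header(f.read(max_bytes))


def spacing_mm(header: dict) -> tuple:
    """Voxel spacing as a tuple of floats, one entry per axis."""
    return tuple(float(v) for v in header["ElementSpacing"].split())


def has_expected_spacing(header: dict, expected=0.3, tol=1e-6) -> bool:
    """True if every axis has the expected isotropic spacing (0.3 mm here)."""
    return all(abs(s - expected) <= tol for s in spacing_mm(header))


# Synthetic header mimicking a file converted with the default 1 mm spacing.
demo = (b"ObjectType = Image\nNDims = 3\nDimSize = 4 4 2\n"
        b"ElementSpacing = 1 1 1\nElementType = MET_SHORT\n"
        b"ElementDataFile = LOCAL\n" + bytes(64))
demo_header = parse_mha_header(demo)
print(spacing_mm(demo_header), has_expected_spacing(demo_header))
# prints: (1.0, 1.0, 1.0) False
```

On real data, something like has_expected_spacing(read_mha_header(path)) returning False would suggest the file predates the metadata fix.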

Thank you for your understanding and cooperation. We appreciate your active participation and valuable feedback, which helps us improve the challenge experience for everyone.

Best regards,
Federico Bolelli on behalf of the ToothFairy2 challenge team.

Re: About the Dataset  

  By: qqqwwwee on June 17, 2024, 12:23 a.m.

Dear organizers,

We cannot open the webpage (ditto.ing.unimore.it/toothfairy2) to download the data now. Could you help us with it?

Best regards, Jin

Re: About the Dataset  

  By: f.bolelli on June 17, 2024, 8:31 a.m.

The service should be back online. Our entire infrastructure was under maintenance last weekend.

Re: About the Dataset  

  By: jh.han on June 20, 2024, 2:17 a.m.

Thank you for building such a large dataset. It is very helpful for this field.

Re: About the Dataset  

  By: mmilutinovic on June 21, 2024, 1:49 p.m.

Set B Images and Segmentation Masks Geometry Mismatch

Dear organizers,

While preprocessing the .mha files from imagesTr and labelsTr, we noticed that for 43 of 63 cases the image and its segmentation mask do not share the same geometry, which does not follow the nnU-Net dataset format.

Since Set B contains volumes with a broader field of view, do we need to interpolate the "missing" slices in the segmentation masks (labels) so that each segmentation has the same geometry as its corresponding image?

The list of files with mismatched shapes is the following: ToothFairy2F_021_0000.mha ToothFairy2F_020_0000.mha ToothFairy2F_057_0000.mha ToothFairy2F_063_0000.mha ToothFairy2F_014_0000.mha ToothFairy2F_037_0000.mha ToothFairy2F_008_0000.mha ToothFairy2F_009_0000.mha ToothFairy2F_041_0000.mha ToothFairy2F_002_0000.mha ToothFairy2F_003_0000.mha ToothFairy2F_030_0000.mha ToothFairy2F_005_0000.mha ToothFairy2F_004_0000.mha ToothFairy2F_026_0000.mha ToothFairy2F_027_0000.mha ToothFairy2F_051_0000.mha ToothFairy2F_050_0000.mha ToothFairy2F_018_0000.mha ToothFairy2F_065_0000.mha ToothFairy2F_064_0000.mha ToothFairy2F_012_0000.mha ToothFairy2F_048_0000.mha ToothFairy2F_001_0000.mha ToothFairy2F_043_0000.mha ToothFairy2F_028_0000.mha ToothFairy2F_029_0000.mha ToothFairy2F_061_0000.mha ToothFairy2F_060_0000.mha ToothFairy2F_055_0000.mha ToothFairy2F_054_0000.mha ToothFairy2F_066_0000.mha ToothFairy2F_067_0000.mha ToothFairy2F_011_0000.mha ToothFairy2F_010_0000.mha ToothFairy2F_058_0000.mha ToothFairy2F_059_0000.mha ToothFairy2F_025_0000.mha ToothFairy2F_052_0000.mha ToothFairy2F_053_0000.mha ToothFairy2F_038_0000.mha ToothFairy2F_039_0000.mha ToothFairy2F_007_0000.mha
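
A check of this kind can be reproduced without loading the voxel data, by comparing the "DimSize" field of each image header with that of its label. This is a minimal sketch, not the script that produced the list above: the helper names are hypothetical, and it assumes the nnU-Net naming seen in this thread (image "case_0000.mha" in imagesTr, label "case.mha" in labelsTr).

```python
from pathlib import Path


def parse_mha_header(raw: bytes) -> dict:
    """Parse the 'key = value' text header at the start of MetaImage bytes."""
    header = {}
    for line in raw.split(b"\n"):
        key, sep, value = line.decode("latin-1").partition("=")
        if not sep:
            break
        header[key.strip()] = value.strip()
        if key.strip() == "ElementDataFile":
            break  # voxel data follows this field
    return header


def read_mha_header(path, max_bytes=65536):
    """Read only the first bytes of a file, enough to contain the header."""
    with open(path, "rb") as f:
        return parse_mha_header(f.read(max_bytes))


def dim_size(header: dict) -> tuple:
    """Volume size in voxels, one entry per axis."""
    return tuple(int(v) for v in header["DimSize"].split())


def label_name_for(image_name: str) -> str:
    """Map an image name 'case_0000.mha' to its label name 'case.mha'."""
    case_id = Path(image_name).stem.rpartition("_")[0]
    return f"{case_id}.mha"


def find_size_mismatches(images_dir, labels_dir):
    """Return (image name, image size, label size) for each mismatching pair.

    A missing label file is reported with a label size of None.
    """
    mismatches = []
    for image_path in sorted(Path(images_dir).glob("*_0000.mha")):
        label_path = Path(labels_dir) / label_name_for(image_path.name)
        image_size = dim_size(read_mha_header(image_path))
        label_size = (dim_size(read_mha_header(label_path))
                      if label_path.exists() else None)
        if image_size != label_size:
            mismatches.append((image_path.name, image_size, label_size))
    return mismatches


print(label_name_for("ToothFairy2F_021_0000.mha"))  # prints: ToothFairy2F_021.mha
```

Calling find_size_mismatches("imagesTr", "labelsTr") on a local copy would then list the affected cases; an empty list means all image and label sizes agree.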


Best regards, Marija Milutinovic

 Last edited by: mmilutinovic on June 21, 2024, 3 p.m., edited 1 time in total.
Reason: Question rephrasing

Re: About the Dataset  

  By: f.bolelli on June 22, 2024, 6:31 a.m.

Dear Marija Milutinovic,

Thank you very much for your feedback. No, the geometry mismatch between labels and data was not intentional. When converting the raw data of the "F" volumes to HU (the latest dataset update), we mistakenly uploaded the wrong data into the zip file.

We have already fixed the issue, and the current version on our server provides "F" data and labels with coherent geometries.

I am writing a specific email to all of the participants who downloaded the data in the last week and experienced such an issue.

Thank you again,
Best regards,
Federico.

 Last edited by: f.bolelli on June 22, 2024, 6:32 a.m., edited 1 time in total.

Re: About the Dataset  

  By: mmilutinovic on June 26, 2024, 12:57 p.m.

Dear Federico Bolelli,

Thank you very much for your prompt reply and for fixing the issue!

We managed to preprocess all files; however, we noticed an origin and direction mismatch for a few volumes in set B.


Example: ToothFairy2F_049.mha

Warning: Origin mismatch between segmentation and corresponding images.

Origin images: (122.69999999999999, 0.0, 0.0). Origin seg: (0.0, 0.0, 0.0).

Warning: Direction mismatch between segmentation and corresponding images.

Direction images: (-1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0). Direction seg: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0).

The following volumes are inconsistent in terms of origin and direction: 2F_{11, 41, 43, 44, 45, 46, 47, 48, 49, 50}.
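
Warnings like the ones above can be reproduced from the headers alone. This is a minimal sketch with hypothetical helper names; it assumes the files store origin and direction in the standard MetaImage fields "Offset" and "TransformMatrix", and it uses the values reported for ToothFairy2F_049 as synthetic input.

```python
import math


def parse_mha_header(raw: bytes) -> dict:
    """Parse the 'key = value' text header at the start of MetaImage bytes."""
    header = {}
    for line in raw.split(b"\n"):
        key, sep, value = line.decode("latin-1").partition("=")
        if not sep:
            break
        header[key.strip()] = value.strip()
        if key.strip() == "ElementDataFile":
            break  # voxel data follows this field
    return header


def floats(header: dict, key: str) -> tuple:
    """Parse a space-separated numeric header field into a tuple of floats."""
    return tuple(float(v) for v in header[key].split())


def geometry_warnings(image_header: dict, seg_header: dict, tol=1e-3) -> list:
    """Return one message per geometry field that differs beyond tol."""
    warnings = []
    for key, label in (("Offset", "Origin"), ("TransformMatrix", "Direction")):
        img, seg = floats(image_header, key), floats(seg_header, key)
        same = len(img) == len(seg) and all(
            math.isclose(a, b, abs_tol=tol) for a, b in zip(img, seg))
        if not same:
            warnings.append(f"{label} mismatch: image {img}, seg {seg}")
    return warnings


# Synthetic headers carrying the values reported in the example above.
image_header = parse_mha_header(
    b"TransformMatrix = -1 0 0 0 1 0 0 0 1\nOffset = 122.7 0 0\n"
    b"ElementDataFile = LOCAL\n")
seg_header = parse_mha_header(
    b"TransformMatrix = 1 0 0 0 1 0 0 0 1\nOffset = 0 0 0\n"
    b"ElementDataFile = LOCAL\n")
for message in geometry_warnings(image_header, seg_header):
    print(message)
```

The tolerance is there because origins often carry floating-point noise (122.69999999999999 rather than 122.7), so an exact comparison would raise false alarms.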

Best regards,

Marija

 Last edited by: mmilutinovic on June 26, 2024, 1:14 p.m., edited 2 times in total.
Reason: Added the IDs of actual inconsistent volumes in terms of origin and direction

Re: About the Dataset  

  By: f.bolelli on June 28, 2024, 9:11 a.m.

Dear Marija,

Thank you for your feedback! You are right, there are a few "F" volumes with inconsistent origin and direction metadata. I am going to update that information in the next few days. In any case, this issue only affects the .mha metadata: the data and labels of those volumes are correctly aligned. (Just to let you know, the correct metadata is the one associated with the data.)
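
Until the updated files are available, one possible local workaround is to copy the geometry fields from each image header into the header of its label. This is only a sketch under the assumptions stated in this reply (voxel arrays already aligned, image metadata correct), not an official fix. The helper names are hypothetical, and it assumes the voxel data is stored in the same file right after the "ElementDataFile" line. With SimpleITK, calling CopyInformation on the label image with the data image as its source should have a comparable effect.

```python
GEOMETRY_KEYS = ("Offset", "TransformMatrix")


def split_mha(raw: bytes):
    """Split MetaImage bytes into (list of header lines, binary payload)."""
    marker = raw.find(b"ElementDataFile")
    if marker < 0:
        raise ValueError("no ElementDataFile field found")
    end = raw.index(b"\n", marker) + 1  # header ends with this line
    return raw[:end].decode("latin-1").splitlines(), raw[end:]


def header_dict(lines) -> dict:
    """Turn 'key = value' lines into a dict of stripped strings."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def copy_geometry(label_raw: bytes, image_raw: bytes) -> bytes:
    """Return the label bytes with origin/direction taken from the image.

    Only fields already present in the label header are replaced; the
    voxel payload is passed through untouched.
    """
    image_fields = header_dict(split_mha(image_raw)[0])
    label_lines, payload = split_mha(label_raw)
    patched = []
    for line in label_lines:
        key = line.partition("=")[0].strip()
        if key in GEOMETRY_KEYS and key in image_fields:
            line = f"{key} = {image_fields[key]}"
        patched.append(line)
    return ("\n".join(patched) + "\n").encode("latin-1") + payload


# Synthetic example: the label header picks up the image origin/direction.
image_raw = (b"NDims = 3\nTransformMatrix = -1 0 0 0 1 0 0 0 1\n"
             b"Offset = 122.7 0 0\nElementDataFile = LOCAL\n" + bytes(4))
label_raw = (b"NDims = 3\nTransformMatrix = 1 0 0 0 1 0 0 0 1\n"
             b"Offset = 0 0 0\nElementDataFile = LOCAL\n" + b"\x01\x02\x03")
print(copy_geometry(label_raw, image_raw).split(b"ElementDataFile")[0].decode())
```

Writing the result to a new file rather than overwriting the original label keeps the download intact in case the assumption about alignment does not hold for a given volume.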

Best regards,
Federico Bolelli

 Last edited by: f.bolelli on June 28, 2024, 1:13 p.m., edited 1 time in total.