Problems such as inconsistencies or errors in the training data

  By: hebingdou on Nov. 1, 2024, 3:46 p.m.

1. The XML annotations of the ROI regions and the tissue-mask regions are badly misaligned. The mismatch is visible in at least the following images: ['D_P000012_mask.tif', 'D_P000006_mask.tif', 'D_P000015_mask.tif', 'D_P000014_mask.tif', 'D_P000013_mask.tif', 'D_P000018_mask.tif', 'D_P000003_mask.tif', 'D_P000011_mask.tif', 'C_P000038_mask.tif', 'C_P000030_mask.tif', 'D_P000010_mask.tif', 'D_P000009_mask.tif', 'D_P000019_mask.tif', 'D_P000016_mask.tif', 'D_P000017_mask.tif'].
2. Two different cell types are annotated at the same cell location, for example in A_P000006_PAS_CPG.tif. This error occurs quite often, but checking for it by hand is very time-consuming. It could be tracked down with code, but given the time constraints I would like to hand the issue over to you for fixing.
3. Some labeled cell locations fall outside the ROI region.
4. The number of labeled cells in the JSON and the number of labels in the XML differ greatly: ['B_P000005', 'B_P000010'].
5. The XY positions of the labels in the XML deviate seriously from the XY positions in the JSON, so the score is almost 0 when evaluating with the JSON files: ['A_P000033', 'A_P000022'].

I have listed only some of the errors above. They appeared in the training set, so they may also be present in the validation and test sets. Some of them are acceptable, but others will seriously mislead training and evaluation. I therefore ask the organizers to review the errors above carefully and correct them in the training, validation, and test data.
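For anyone who wants to screen their own copy of the data for items 2, 4 and 5, here is a minimal sketch. It assumes the XML and JSON annotations have already been parsed into plain Python lists of points in the same coordinate system. The parsing step, the function names, the tolerance value and the label strings are all hypothetical and not part of the challenge code.

```python
from collections import defaultdict
from math import hypot


def conflicting_labels(cells):
    """Return locations that carry more than one distinct label.

    cells: iterable of (x, y, label) tuples (assumed format).
    """
    by_xy = defaultdict(set)
    for x, y, label in cells:
        # Round so that tiny floating-point differences do not hide duplicates.
        by_xy[(round(x, 3), round(y, 3))].add(label)
    return {xy: labels for xy, labels in by_xy.items() if len(labels) > 1}


def unmatched(points_a, points_b, tol):
    """Return the points of points_a with no point of points_b within tol.

    Brute force, O(len(a) * len(b)); fine for a quick check on a few ROIs.
    """
    return [
        p for p in points_a
        if not any(hypot(p[0] - q[0], p[1] - q[1]) <= tol for q in points_b)
    ]


def compare_annotations(xml_points, json_points, tol=2.0):
    """Summarise count and position differences between two point sets."""
    return {
        "xml_count": len(xml_points),
        "json_count": len(json_points),
        "xml_without_json_match": unmatched(xml_points, json_points, tol),
        "json_without_xml_match": unmatched(json_points, xml_points, tol),
    }
```

A case whose two counts differ, or whose unmatched lists are long, would be a candidate for items 4 and 5. A non-empty result from `conflicting_labels` would be a candidate for item 2.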

Re: Problems such as inconsistencies or errors in the training data  

  By: LindaSt on Nov. 4, 2024, 11:08 a.m.

Hi! We have done an additional QC check, corrected many double annotations, and updated the data on AWS on October 1st (see the announcements). Another update with the additional IHC data is coming this week.

We will examine the ROI issues, the double cell locations, and the deviation between the JSON and XML files. Thanks for letting us know.

There should be very few cells outside of the ROI, and they are disregarded during evaluation.
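To see how many of your own points this affects, a rough sketch like the one below can split points by an ROI mask. It is not the evaluation code. It assumes the mask is a 2D array indexed as `mask[row][col]` with non-zero values inside the ROI, and that point coordinates are in the same pixel grid as the mask. The function name is hypothetical.

```python
def split_by_roi(points, mask):
    """Split (x, y) points into those inside and outside a binary ROI mask.

    mask: 2D sequence (list of lists or a NumPy array), non-zero inside the ROI.
    Points outside the mask bounds count as outside the ROI.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    inside, outside = [], []
    for x, y in points:
        # x is the column index and y is the row index.
        col, row = int(round(x)), int(round(y))
        in_bounds = 0 <= row < height and 0 <= col < width
        if in_bounds and mask[row][col]:
            inside.append((x, y))
        else:
            outside.append((x, y))
    return inside, outside
```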

 Last edited by: DvanMidden on Nov. 4, 2024, 11:54 a.m., edited 3 times in total.

Re: Problems such as inconsistencies or errors in the training data  

  By: LindaSt on Nov. 5, 2024, 3:53 p.m.

Hi! Just to give you a short update, we have isolated the issue with the ROIs. Our pathologist is doing an additional quality check to try and eliminate all the double annotations. We hope to update the dataset by the end of the week. When I regenerate the JSON files, I'll also double-check to make sure there are no missing annotations.

Re: Problems such as inconsistencies or errors in the training data  

  By: hebingdou on Nov. 21, 2024, 7:28 a.m.

Can you check whether the tissue-masks correspond one-to-one with pas-cpg? I found at least the following files that have no corresponding file in the other folder. In pas-cpg but with no mask .tif: C_P000021_PAS_CPG.tif, C_P000022_PAS_CPG.tif, C_P000023_PAS_CPG.tif, C_P000024_PAS_CPG.tif, C_P000025_PAS_CPG.tif, C_P000026_PAS_CPG.tif, C_P000027_PAS_CPG.tif, C_P000028_PAS_CPG.tif, C_P000029_PAS_CPG.tif, C_P000030_PAS_CPG.tif, C_P000031_PAS_CPG.tif, C_P000032_PAS_CPG.tif, C_P000033_PAS_CPG.tif, C_P000034_PAS_CPG.tif, C_P000035_PAS_CPG.tif, C_P000036_PAS_CPG.tif, C_P000037_PAS_CPG.tif, C_P000038_PAS_CPG.tif, C_P000039_PAS_CPG.tif, C_P000040_PAS_CPG.tif
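A small sketch of how such a one-to-one check could be done, assuming the file names follow the `<case>_PAS_CPG.tif` and `<case>_mask.tif` patterns seen in this thread. The folder paths in the usage comment and the function names are hypothetical.

```python
from pathlib import Path


def list_names(folder):
    """Return the file names (not full paths) found directly in a folder."""
    return [p.name for p in Path(folder).iterdir() if p.is_file()]


def case_ids(names, suffix):
    """Strip the given suffix and return the set of case identifiers."""
    return {n[: -len(suffix)] for n in names if n.endswith(suffix)}


def missing_pairs(image_names, mask_names,
                  image_suffix="_PAS_CPG.tif", mask_suffix="_mask.tif"):
    """Return (images without a mask, masks without an image), both sorted."""
    images = case_ids(image_names, image_suffix)
    masks = case_ids(mask_names, mask_suffix)
    return sorted(images - masks), sorted(masks - images)


# Example usage with hypothetical folder paths:
# no_mask, no_image = missing_pairs(list_names("pas-cpg"), list_names("tissue-masks"))
```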

 Last edited by: hebingdou on Nov. 21, 2024, 7:29 a.m., edited 1 time in total.