XML Annotations comparison to PNG Masks

XML Annotations comparison to PNG Masks  

  By: arian.arab on Feb. 3, 2022, 4:13 p.m.

Thanks for organizing the challenge. I have a question regarding the annotations in the PNG files compared to the annotations in the XML files.

First, I believe one annotation is missing for the "TC" files and 27 annotations are missing for the "TCGA" files (the TIFF image is available but the XML file is not). Please correct me if I am wrong.

Second, I have looked at the data by loading the annotations both from the XML files and from the JSON files and PNG masks.

In the XML annotations there are 9 labels corresponding to: "ROI", "Invasive Tumor", "Tumor Associated Stroma", "In-situ Tumor", "Healthy Glands", "Necrosis not in situ", "Inflamed Stroma", "Rest" and "Lymphocytes and plasma cells".

From the PNG masks there are 8 different labels corresponding to: "ROI", "Invasive Tumor", "Tumor Associated Stroma", "In-situ Tumor", "Healthy Glands", "Necrosis not in situ", "Inflamed Stroma", "Rest"

The "Lymphocytes and plasma cells" locations are stored in the JSON file correctly.

However, I noticed that for some TIFF files the XML file annotates only part of the ROI as "tumor-associated stroma", and this is the only annotation for that specific ROI. Yet when I look at the PNG file for the same file, I can see labels for both "invasive tumor" and "tumor-associated stroma". It seems that the ROI box is assigned to invasive tumor. Why is that, and how is it done? How does one decide which class the unlabeled regions are assigned to? From the XML annotations, it seems that the ROI is the larger bounding box that marks the region of interest to be annotated further.

Are these XML annotations converted into PNG files manually, or is there an algorithm?

The PNG masks have a class with the value zero, which I guess comes from the ROI box.

I am not sure how the PNG masks for the class zero are created, because there is also another class as "Rest".

I have prepared several examples in a PowerPoint file; I could share it if that helps.

Thanks,

Arian

Re: XML Annotations comparison to PNG Masks  

  By: mart.vanrijthoven on Feb. 4, 2022, 6:31 a.m.

Dear Arian,

Thanks for your message. To answer your first question: there are 27 images in the TCGA subset that only contain tissue annotations. You can find the annotation files in the s3://tiger-training/wsirois/wsi-level-annotations/annotations-tissue-bcss-xmls/ directory. For these images, only large regions of interest are available, without cell annotations. These annotation files are therefore indeed absent from the directory s3://tiger-training/wsirois/wsi-level-annotations/annotations-tissue-cells-xmls, because no cell annotations are available. For one image of TC, there was indeed no annotation file. Please see this topic: https://grand-challenge.org/forums/forum/tiger-601/topic/no-annotation-file-for-tc_s01_p000091_c0001_b101-303/

I am unsure if I follow you entirely for your second question. I will be happy to check your PowerPoint file. Please also point us to example annotation files and corresponding png masks. It may help us understand the issue better.

To convert the XML files to masks, we used OpenCV-python, in particular cv2.fillPoly.

Best wishes, Mart

 Last edited by: mart.vanrijthoven on Aug. 15, 2023, 12:55 p.m., edited 2 times in total.
Reason: create clickable link

Re: XML Annotations comparison to PNG Masks  

  By: arian.arab on Feb. 4, 2022, 2:01 p.m.

Thanks for your prompt response. Your reply answers my first question. Thanks for that.

Regarding the second question: if you look, for example, at the file named "TCGA-A2-A04Q-01Z-00-DX1.DF7ED6B6-7701-486D-9007-F26B6F0682C4.tif", the XML annotations contain 4 classes: ROI, inflamed stroma, invasive tumor, and tumor associated stroma.

Loading the PNG mask for the same file, the mask contains 4 labels as well: 0, 1, 2, 6.

I believe the zeroth class is assigned to whatever is not labeled in the ROI box. If we include the zeroth class, the total number of classes becomes 8 instead of 7.

As another example, the file 'TCGA-A2-A04T-01Z-00-DX1.71444266-BD56-4183-9603-C7AC20C9DA1E.tif' contains these classes in the XML file: ROI, inflamed stroma, invasive tumor, necrosis not in situ, rest, and tumor associated stroma. The PNG mask contains 0, 1, 2, 5, 6, 7.

Should we just discard the zeroth class? This class is clearer in the rotated ROIs. For example, the file named "TCGA-EW-A1OV-01Z-00-DX1.93698123-5B34-4163-848B-2D75A5F7B001.tif" has a rotated ROI, which then translates into class 0 in the PNG mask.

Best, Arian

Re: XML Annotations comparison to PNG Masks  

  By: mart.vanrijthoven on Feb. 5, 2022, 9:16 a.m.

Dear Arian,

Thank you for clarifying your second question. You are indeed correct. The png masks contain zero values for pixels that are not labeled. For example, some pixels between two adjacent tissue types are not labeled. Non-labeled pixels can also exist in TCGA slides because an 'exclude' label was present in the original annotations. Furthermore, as you have noticed in the TCGA dataset, some ROIs are rotated. Therefore, the extracted rectangle region also includes pixels that are not labeled.

The zeros in the png masks should not be interpreted as another class but should be treated as non-labeled pixels.
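One way to act on this in practice is to exclude zero-valued pixels when computing a loss or a metric. The sketch below shows the idea for plain pixel accuracy with NumPy. The function name `pixel_accuracy`, the constant `UNLABELED`, and the toy arrays are hypothetical and only illustrate the masking step.

```python
import numpy as np

UNLABELED = 0  # value of non-labeled pixels in the PNG masks


def pixel_accuracy(pred, target, ignore_value=UNLABELED):
    """Accuracy over labeled pixels only; unlabeled pixels are skipped."""
    valid = target != ignore_value
    if not valid.any():
        return float("nan")  # nothing labeled, accuracy is undefined
    return float((pred[valid] == target[valid]).mean())


# Toy example: two unlabeled pixels (value 0) in the target.
target = np.array([[0, 1, 1],
                   [2, 2, 0]])
pred = np.array([[7, 1, 2],
                 [2, 2, 3]])

# Whatever is predicted on the unlabeled pixels does not affect the score.
print(pixel_accuracy(pred, target))
```

The same boolean mask can be applied before computing per-class metrics or a training loss, so that predictions on unlabeled pixels are neither rewarded nor penalized.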

Best wishes, Mart