Updated calculation of evaluation metrics

By: evihuijben on July 3, 2023, 1:42 p.m.

Dear participant,

A participant alerted us to an error in our evaluation pipeline for calculating PSNR and SSIM. Both metrics use a population-wide data range, which was originally set to [-1000, 2000], but many scans contain values outside this range. We have redefined the data range as [-1024, 3000], and both CT and sCT scans are now clipped to this range before PSNR and SSIM are calculated.

The updated script for calculating these metrics can be found on our GitHub.
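For readers who want to see what the change amounts to, here is a minimal sketch of clipping followed by PSNR with a fixed data range. It is not the organizers' script: the function names `clip_to_range` and `psnr` are hypothetical, and only the range [-1024, 3000] is taken from the announcement above.

```python
import numpy as np

# Population-wide data range from the announcement (assumed to be applied
# identically to the ground-truth CT and the synthetic CT).
RANGE_MIN, RANGE_MAX = -1024.0, 3000.0


def clip_to_range(img, lo=RANGE_MIN, hi=RANGE_MAX):
    """Clip an image to [lo, hi] and return it as float64."""
    return np.clip(np.asarray(img, dtype=np.float64), lo, hi)


def psnr(gt, pred, lo=RANGE_MIN, hi=RANGE_MAX):
    """PSNR in dB, using hi - lo as the peak signal value (hypothetical helper)."""
    gt = clip_to_range(gt, lo, hi)
    pred = clip_to_range(pred, lo, hi)
    mse = np.mean((gt - pred) ** 2)
    if mse == 0:
        return float("inf")  # identical images after clipping
    data_range = hi - lo
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Because both images are clipped first, a voxel that exceeds the range in both scans contributes no error, which is one reason the choice of range can shift scores.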

All validation submissions for tasks 1 and 2 made before this bug was fixed have already been re-evaluated. We analyzed the change in ranking position for each submission: on average, positions changed by 0.29 for task 1 and 0.27 for task 2. The maximum shift was 3 places, and this occurred only once. All shifts of more than one position occurred in the bottom half of the leaderboard.

Best,

Evi & SynthRAD Organizers

Re: Updated calculation of evaluation metrics

By: whisneyzyw on July 6, 2023, 7:11 a.m.

Dear Evi & SynthRAD Organizers,

I noticed that only the pixels inside the mask are considered when calculating the MAE and PSNR metrics, while the whole image is included when calculating the SSIM metric. This is understandable: SSIM needs the image to keep its shape, so the in-mask pixels cannot simply be extracted with something like "gt = gt[mask==1]".
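The shape issue can be illustrated with a small sketch. The helper `masked_mae` is hypothetical and only mirrors the "gt[mask==1]" pattern quoted above; it is not taken from the evaluation script.

```python
import numpy as np


def masked_mae(gt, pred, mask):
    """Mean absolute error over in-mask voxels only (hypothetical helper)."""
    keep = np.asarray(mask) == 1
    gt_in = np.asarray(gt, dtype=np.float64)[keep]      # flattened to 1-D
    pred_in = np.asarray(pred, dtype=np.float64)[keep]  # flattened to 1-D
    return float(np.mean(np.abs(gt_in - pred_in)))


# Toy volume: a 2x2x2 block of in-mask voxels inside a 4x4x4 image.
gt = np.zeros((4, 4, 4))
pred = np.full((4, 4, 4), 10.0)
mask = np.zeros((4, 4, 4), dtype=int)
mask[1:3, 1:3, 1:3] = 1

print(gt[mask == 1].shape)  # (8,)
```

Boolean indexing returns a flat vector, which is fine for voxel-wise metrics such as MAE and PSNR but discards the spatial neighbourhoods that a windowed metric like SSIM relies on.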

However, I noticed that the pixel values outside the mask in the CT images of the training set lie roughly in the range [-1024, -980]. These values seem unpredictable, and they are not clinically relevant. I am unsure how much the values outside the mask affect the SSIM metric (I think they have some influence, albeit limited). Would it be possible to set the pixels outside the mask of the ground-truth CT image to a fixed value when calculating SSIM, e.g., "gt[mask==0] = -1000" or "gt[mask==0] = -1024", so that their influence on the results is completely eliminated?
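A rough sketch of this idea follows, under several assumptions: the fill value is -1024, the same fill is applied to the prediction as well as the ground truth (otherwise the predicted background would still affect the score), and SSIM is computed with a simple uniform window in plain NumPy. The names `local_mean`, `ssim_uniform` and `masked_ssim` are hypothetical, and this is not the implementation used in the challenge.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(x, win):
    """Mean over every win-sized cubic window (valid positions only)."""
    windows = sliding_window_view(x, (win,) * x.ndim)
    return windows.mean(axis=tuple(range(x.ndim, 2 * x.ndim)))


def ssim_uniform(gt, pred, data_range, win=7):
    """Simplified SSIM with a uniform window; suited to small arrays only."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = local_mean(gt, win), local_mean(pred, win)
    var_x = local_mean(gt * gt, win) - mu_x ** 2
    var_y = local_mean(pred * pred, win) - mu_y ** 2
    cov = local_mean(gt * pred, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))


def masked_ssim(gt, pred, mask, fill=-1024.0, lo=-1024.0, hi=3000.0, win=7):
    """Clip, overwrite out-of-mask voxels in BOTH images, then compute SSIM."""
    keep = np.asarray(mask) > 0
    gt = np.clip(np.asarray(gt, dtype=np.float64), lo, hi)
    pred = np.clip(np.asarray(pred, dtype=np.float64), lo, hi)
    gt = np.where(keep, gt, fill)      # assumed fill value, see lead-in
    pred = np.where(keep, pred, fill)
    return ssim_uniform(gt, pred, data_range=hi - lo, win=win)
```

With this approach the result no longer depends on what either image contains outside the mask. Windows that lie entirely outside the mask still enter the average with a score of 1, so restricting the average to in-mask positions would be a different design choice with different numbers.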

Best regards, Yiwen

Re: Updated calculation of evaluation metrics

By: evihuijben on July 11, 2023, 8:22 a.m.

Dear Yiwen,

Thank you for your suggestion. We discussed it and will calculate a masked version of SSIM at test time. Once the test phase opens, you will find the exact implementation on our GitHub.

Best regards, Evi