evaluation script  

  By: tanbobo on Dec. 3, 2021, 7:10 a.m.

Dear Organizer, would it be possible to provide the evaluation script? The evaluation description is confusing.

Re: evaluation script  

  By: coendevente on Dec. 7, 2021, 9:22 a.m.

Dear tanbobo,

We will release the script for the final evaluation soon; keep an eye on the updates that we post on the challenge page.

Are there any specific parts of the evaluation description that are not clear?

Best regards, Coen de Vente

Re: evaluation script  

  By: tanbobo on Dec. 7, 2021, 9:59 a.m.

Dear Coen de Vente, good job, thank you for your time.

Re: evaluation script  

  By: Tomas on Dec. 12, 2021, 8:27 a.m.

Dear coendevente,

I want to clarify some things regarding the evaluation. The evaluation page says "likelihood score for referable glaucoma at eye level (O1), a binary decision on referable glaucoma presence (O2)".

From this I understand that we need to predict referable glaucoma for each eye of the patient (O1) and then predict whether the patient has referable glaucoma at all (thus, based on the results of both eyes). However, the dataset does not contain patient-level information: the train_labels.csv file only lists the name of a single eye image and whether it is referable or not. It therefore seems that the dataset is constructed for single-eye predictions rather than patient-wise predictions. Or am I missing some information relating dataset images to specific patients?

Re: evaluation script  

  By: coendevente on Dec. 13, 2021, 8:49 a.m.

Dear Tomas,

You do not need to predict anything at patient level. During evaluation, you will be given an image of one eye at a time, and you are expected to produce the four described outputs for each image individually. It may have been a bit confusing that we only wrote "at eye level" when referring to the first output, so we have updated the text.
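For illustration, here is a minimal sketch of producing the four outputs for a single image. The function name, output keys, and placeholder inference below are only illustrative, not a required interface:

    import numpy as np

    def predict(image: np.ndarray) -> dict:
        # Placeholder inference; replace with a real model.
        rg_likelihood = float(np.random.rand())        # O1: referable glaucoma likelihood
        rg_binary = rg_likelihood > 0.5                # O2: binary referable glaucoma decision
        ungradability_score = float(np.random.rand())  # O4: higher means more likely ungradable
        ungradable_binary = ungradability_score > 0.5  # O3: True (positive) if ungradable
        return {
            "referable-glaucoma-likelihood": rg_likelihood,  # O1
            "referable-glaucoma-binary": bool(rg_binary),    # O2
            "ungradable-binary": bool(ungradable_binary),    # O3
            "ungradability-score": ungradability_score,      # O4
        }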

Best regards, Coen

Re: evaluation script  

  By: SKJP on Dec. 31, 2021, 1:05 a.m.

Dear Coen,

I would like to clarify the evaluation metrics O3 and O4. The challenge description says "a binary decision on whether an image is gradable (O3), and a non-thresholded scalar value that is positively correlated with the likelihood for ungradability (O4)".

In my understanding, O3 takes 1 (positive) when an image is gradable and 0 (negative) when it is ungradable. But what about O4? Does O4 take a smaller or a larger value when the gradability is high? I'm confused by the wording "the likelihood for ungradability".

Best regards, Satoshi

Re: evaluation script  

  By: coendevente on Jan. 3, 2022, 11:02 a.m.

Dear Satoshi,

O3 is expected to be positive (true/1) for ungradable images and negative (false/0) for gradable images. https://airogs.grand-challenge.org/data-and-challenge/ has been updated to improve clarity.

A higher value for O4 should indicate an ungradable image. Therefore, when calculating the AUC for ungradability (δ), ungradable cases will have positive labels and gradable cases will have negative labels.
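As a small sketch of this orientation (assuming scikit-learn; the labels and scores below are made up for illustration):

    from sklearn.metrics import roc_auc_score

    # 1 = ungradable (positive class), 0 = gradable (negative class)
    labels = [1, 0, 0, 1, 0]
    # O4 values: higher scores should indicate more likely ungradable images
    scores = [0.9, 0.2, 0.4, 0.7, 0.1]

    # AUC computed with ungradable as the positive class
    print(roc_auc_score(labels, scores))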

https://github.com/qurAI-amsterdam/airogs-example-algorithm/blob/master/process.py has also been updated with comments on lines 66-69, aiming to remove any ambiguity about the matter.

Best regards, Coen de Vente

Re: evaluation script  

  By: coendevente on Jan. 31, 2022, 12:36 p.m.

The evaluation code is now available here.