About the Evaluation Metric

  By: haibo.nick.jin on Nov. 28, 2021, 6:48 a.m.

Hi,

I noticed that the evaluation metric for the detection task is a combination of AUC and FROC, where AUC actually measures whole-image classification performance, and its weight is even larger than FROC's. As far as I understand, you would like the model to pay more attention to whole-image classification? Or are there other reasons? By the way, would it be possible to provide the code for the evaluation metrics? Thank you.

Re: About the Evaluation Metric  

  By: ecemsogancioglu on Nov. 28, 2021, 7:16 p.m.

Hi,

Thanks for your question! Indeed, you are right: we weight AUC higher because it is clinically the most relevant metric.

In a clinical setting, we would like an AI system that can identify a CXR image with a potential nodule, so that those patients can get a CT scan for further investigation. An AI system that can classify an image as nodule or no-nodule is therefore the most important feature, hence the AUC score is weighted higher. Of course, localization is also very important for building an explainable system, but very precise localization is not the most important feature.

Regarding the evaluation metrics, we do not release our evaluation code, since it contains the ground-truth data and labels for the test set. However, our AUC implementation simply uses sklearn, and our FROC implementation is based on the Camelyon challenge evaluation code.
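For reference, here is a minimal sketch of what such a pipeline might look like, assuming sklearn's roc_auc_score for the image-level AUC and a simple FROC helper in the spirit of the Camelyon code. The froc_score helper, the per-candidate inputs, and the 0.75/0.25 weights are illustrative assumptions, not the official implementation:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def froc_score(is_hit, scores, total_lesions, num_images,
                   fps_per_image=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
        # Sort candidates by confidence, highest first, and sweep the
        # threshold down, accumulating lesion hits and false positives.
        order = np.argsort(scores)[::-1]
        is_hit = np.asarray(is_hit)[order]
        sens = np.cumsum(is_hit) / total_lesions        # lesion sensitivity
        avg_fps = np.cumsum(1 - is_hit) / num_images    # mean FPs per image
        # Average sensitivity at the requested false-positive rates
        # (step interpolation; 0 if a rate is never reached).
        per_rate = [sens[avg_fps <= r][-1] if np.any(avg_fps <= r) else 0.0
                    for r in fps_per_image]
        return float(np.mean(per_rate))

    # Hypothetical inputs: image-level labels/scores, plus per-candidate
    # detections already matched against the ground-truth lesions.
    y_true = [0, 1, 1, 0]              # does the image contain a nodule?
    y_score = [0.1, 0.8, 0.6, 0.3]     # model's image-level probability
    auc = roc_auc_score(y_true, y_score)

    cand_hits = [1, 0, 1]              # candidate matched a lesion?
    cand_scores = [0.9, 0.7, 0.5]      # candidate confidence
    froc = froc_score(cand_hits, cand_scores, total_lesions=2, num_images=4)

    final = 0.75 * auc + 0.25 * froc   # placeholder weights: AUC > FROC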

Hope this clarifies things a bit; please let us know if you have any other questions!

Best, Ecem
