FROC evaluation  

  By: qinghezeng29 on March 28, 2022, 8:32 p.m.

Dear organizers,

We have some questions about the FROC evaluation.

  1. We used a threshold to suppress false detections. Is this recommended?

  2. We observed a shift from the origin of the FROC curve in our results. Is this normal? What could be the possible reasons, given that this didn't happen when we computed the curve locally on the training set using your code (assuming we did it correctly)?

  3. Could you please kindly explain to us the differences between the refined FROC code (here) and the previous code (here)?

Thank you very much in advance!

Best regards, Qinghe

Re: FROC evaluation  

  By: crunch on March 29, 2022, 10:09 a.m.

Hi,

  1. The threshold cutoff corresponds to the right-most point on the FROC curve, so a threshold that suppresses too much might indeed hurt the evaluated performance. Using no threshold at all should be fine (unless your model generates hundreds of thousands of predictions, which might pose a technical problem). If your current threshold already yields a curve extending beyond the right-most evaluated point at 300 FPs per mm², lowering it further won't change the result.

  2. The first evaluated point on the FROC curve is the true-positive rate at 10 false positives per mm², and so far all models have a TPR of 0 at this value. Currently submitted models start at around 30 FPs (the first non-zero TP). Previously we plotted the FROC curve from the first to the last non-zero TP value; now plotting starts from FP = 0 and ensures that all evaluated FROC points (10, 20, 50, 100, 200, and 300 FPs) are visible.

  3. The difference in the FROC code lies in the step-wise interpolation of the missing FP values. Previously, the next-highest value was taken; now the next-lowest value is taken (see bisect in the code).
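The step-wise interpolation described in point 3 can be sketched as follows. This is a minimal illustration of "take the next-lowest measured value" using the standard-library bisect module, not the actual challenge code; the function name and the example numbers are made up:

```python
from bisect import bisect_right

def interpolate_froc_point(fps, tps, target_fp):
    """Step-wise interpolation at a target FP value not present in `fps`
    (sorted ascending): return the sensitivity at the next-lowest measured
    FP value (the refined behaviour), rather than the next-highest one
    (the old behaviour)."""
    # Index of the right-most measured FP value <= target_fp.
    idx = bisect_right(fps, target_fp) - 1
    if idx < 0:
        return 0.0  # target lies below the first measured point
    return tps[idx]

# Hypothetical measured curve: FPs per mm2 and sensitivities.
fps = [3, 30, 80, 250]
tps = [0.1, 0.4, 0.6, 0.8]
print(interpolate_froc_point(fps, tps, 100))  # -> 0.6 (next-lowest value)
```

With the old behaviour, the same query at 100 FPs would have returned 0.8, the sensitivity at the next-highest measured point (250).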

I hope this answers your questions :-)

Re: FROC evaluation  

  By: qinghezeng29 on March 31, 2022, 3:19 p.m.

Dear Witali,

Thank you for your helpful answer!

Just a few more questions.

  1. When we perform the evaluation locally, shall we modify the line

    target_fps=[5, 10, 20, 50, 100, 200, 500] to target_fps=[10, 20, 50, 100, 200, 300]

    in order to simulate the real evaluation?

  2. If there is more than one predicted "hit" for one annotated lymphocyte, will they all be counted as TPs?

  3. Just out of curiosity: why do we submit results in mm, given that the detection model normally produces pixel predictions and the FROC could also be computed in pixels? Thank you so much!

Yours sincerely, Qinghe

 Last edited by: qinghezeng29 on Aug. 15, 2023, 12:56 p.m., edited 1 time in total.

Re: FROC evaluation  

  By: crunch on April 1, 2022, 8:40 a.m.

Hi Qinghe,

  1. Thanks for noticing; I actually still had the old version of the FROC code pushed. It is now fixed.
  2. No, if there are several hits for one lymphocyte (in the sense that they are very close), one of them will be counted as a TP and the rest as FPs. You should use e.g. non-maximum suppression to avoid such cases.
  3. The format of the detection coordinates is determined by the grand-challenge API. For the FROC curve we normalize the FPs by the tissue area in mm² for better interpretability of the results (300 lymphocytes per mm² is much clearer than per X pixels, which would also depend on the spacing).
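The non-maximum suppression mentioned in point 2 can be sketched for point detections as a greedy procedure: keep the highest-scoring point, drop all remaining points closer than some minimum distance, and repeat. The function name and distance threshold below are illustrative assumptions, not the challenge code:

```python
import numpy as np

def nms_points(points, scores, min_dist):
    """Greedy non-maximum suppression for point detections.
    Returns the (sorted) original indices of the kept points:
    higher-scoring points suppress nearby lower-scoring ones."""
    order = np.argsort(scores)[::-1]          # indices, best score first
    pts = np.asarray(points, dtype=float)[order]
    suppressed = np.zeros(len(pts), dtype=bool)
    keep = []
    for i in range(len(pts)):
        if suppressed[i]:
            continue
        keep.append(order[i])
        # Suppress all lower-scoring points within min_dist of this one.
        d = np.linalg.norm(pts[i + 1:] - pts[i], axis=1)
        suppressed[i + 1:] |= d < min_dist
    return sorted(keep)

# Two detections 1 unit apart and one far away: the weaker of the
# close pair is suppressed.
print(nms_points([[0, 0], [1, 0], [10, 0]], [0.9, 0.8, 0.7], 2.0))  # -> [0, 2]
```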
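The per-mm² normalisation described in point 3 only requires the pixel spacing. A hypothetical helper (all names and numbers are illustrative, not the challenge API):

```python
def fps_per_mm2(n_false_positives, width_px, height_px, spacing_um):
    """Normalise a false-positive count by the region area in mm².
    `spacing_um` is the pixel spacing in micrometres per pixel."""
    # Convert each side from pixels to millimetres, then multiply.
    area_mm2 = (width_px * spacing_um / 1000.0) * (height_px * spacing_um / 1000.0)
    return n_false_positives / area_mm2

# A 2000 x 2000 px region at 0.5 um/px covers exactly 1 mm^2.
print(fps_per_mm2(300, 2000, 2000, 0.5))  # -> 300.0
```

This is why the same pixel count corresponds to different per-mm² rates at different spacings, which is the interpretability point made above.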

Re: FROC evaluation  

  By: qinghezeng29 on April 4, 2022, 3:50 p.m.

Things are clear now, thank you so much for your answer Witali! Best, Qinghe

Re: FROC evaluation  

  By: qinghezeng29 on April 8, 2022, 4:35 p.m.

Dear Witali,

Are there any overlapped bounding boxes (or annotated lymphocytes closer than 4 microns) in the training and test set? Thank you in advance!

Best, Qinghe

Re: FROC evaluation  

  By: mart.vanrijthoven on April 12, 2022, 12:48 p.m.

Dear Qinghe,

Yes, there are bounding boxes closer to each other than 4 microns in the training and test sets (about 5%).

Best wishes, Mart

Re: FROC evaluation  

  By: qinghezeng29 on April 12, 2022, 1:56 p.m.

Thank you for your answer Mart!

Best wishes, Qinghe