Large differences in baseline results between predictions and preliminary/final phases

Dear organizers of the DENTEX challenge,

Our team's submissions to the predictions and preliminary phases resulted in similar evaluation metrics.

For the baseline method, however, the metrics differ greatly between the predictions and preliminary phases.

Could you explain why these large differences occur?

Since the Average Recall is consistently low, do you limit the number of bounding boxes per image when submitting to the predictions phase?

Thank you for your time!

Best,

Niels van Nistelrooij