How is the score being evaluated?

How is the score being evaluated?  

  By: bhamm on June 17, 2025, 8:51 a.m.

Dear Organizing Team,

We just noticed your test run on the leaderboard showing a score of 0.500000. The associated metrics JSON contains the following:

{
  "results": {
    "AUC": 0.5,
    "Sensitivity": 0.07999999999999999,
    "Specificity": 0.125
  }
}

According to the guidelines, the final score should be the average of Sensitivity, Specificity, and AUC, with AUC used as a tie-breaker.

Given that, we are wondering how the 0.500000 score was calculated based on the above values.
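For reference, a plain arithmetic mean of the three values above (assuming that is what "average" means here) gives roughly 0.235, not 0.5:

# Unweighted mean of the three reported metrics
# (assuming "average" means a plain arithmetic mean).
metrics = {"AUC": 0.5, "Sensitivity": 0.08, "Specificity": 0.125}

expected_score = sum(metrics.values()) / len(metrics)
print(f"{expected_score:.6f}")  # prints 0.235000, not 0.500000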

Could you please clarify?

Best regards, Benjamin Hamm

Re: How is the score being evaluated?  

  By: NickPayne on June 17, 2025, 9:55 a.m.

Hi Benjamin

You are correct that the overall score will be the average of the AUC, sensitivity, and specificity.

I have amended the evaluation script and scoreboard to display the average as "score" along with the individual metrics.
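For illustration, the ranking then amounts to something like the sketch below. This is not the actual leaderboard code; the team names and values are made up, and AUC is assumed to break ties as described in the guidelines:

# Sketch of the ranking logic: mean of the three metrics as "score",
# with AUC breaking ties (illustrative values only).
submissions = [
    {"team": "A", "AUC": 0.80, "Sensitivity": 0.70, "Specificity": 0.75},
    {"team": "B", "AUC": 0.85, "Sensitivity": 0.65, "Specificity": 0.75},
]

def score(s):
    return (s["AUC"] + s["Sensitivity"] + s["Specificity"]) / 3

# Both teams score 0.75 here; team B ranks first on the higher AUC.
ranking = sorted(submissions, key=lambda s: (score(s), s["AUC"]), reverse=True)
for s in ranking:
    print(s["team"], round(score(s), 6), s["AUC"])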

Edit: just to clarify, the "score" of 0.5 previously displayed was the AUC under an unhelpful column title.

Best regards

Nick


Re: How is the score being evaluated?  

  By: YannickKirchhoff on June 20, 2025, 4:54 p.m.

Hi Nick,

Somewhat related to the earlier question: is the evaluation code for the challenge publicly available? We would like to evaluate our training runs with the same implementation used for the final evaluation.

Thanks, Yannick

Re: How is the score being evaluated?  

  By: NickPayne on June 23, 2025, 10:44 a.m.

Hi Yannick

I've added the evaluation script and accompanying files to our HuggingFace repository: https://huggingface.co/datasets/ODELIA-AI/ODELIA-Challenge-2025/tree/main/evaluation-method
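If you just want a quick local sanity check, something along these lines should get you close (a minimal sketch using scikit-learn, assuming a binary task with probabilistic outputs thresholded at 0.5; the script in the repository above remains the reference):

# Rough local approximation of the three metrics; illustrative labels and
# predictions only. The linked evaluation script is the reference.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])              # hypothetical ground truth
y_prob = np.array([0.2, 0.6, 0.7, 0.4, 0.9, 0.1])  # hypothetical model outputs
y_pred = (y_prob >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_prob)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

print("AUC", auc, "Sensitivity", sensitivity, "Specificity", specificity)
print("score", (auc + sensitivity + specificity) / 3)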

Best regards

Nick

Re: How is the score being evaluated?  

  By: YannickKirchhoff on June 23, 2025, 3:58 p.m.

Thanks a lot!