My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset

My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: gpio on Dec. 3, 2024, 4:34 a.m.

For example, the debugging phase used "P000002_A" and other datasets and returned a score. When I run 'evaluate.py' on my local computer with the same dataset, I get a different score. I use the JSON file in the annotations folder as my ground truth. Has anyone faced similar issues?

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: LindaSt on Dec. 3, 2024, 10:47 a.m.

Hi! This is most likely because I did not update the JSON files here on Grand Challenge after the additional QC. I'll do that today, and hopefully you'll then get the same results.

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: LindaSt on Dec. 3, 2024, 12:33 p.m.

I've updated the JSON files for the two cases (P000002_A and P000003_A) in the debugging phase now. Let me know if the issue persists. I've also updated the evaluation container; the updated scripts are on GitHub.
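One way to confirm that your local ground truth matches the freshly updated files is to compare checksums before re-running the evaluation. A minimal sketch — the `annotations_old`/`annotations_new` directory names are placeholders for wherever the old and newly downloaded ground-truth JSONs live on your machine, not paths from the challenge repo:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Hypothetical local paths: adjust to your own directory layout.
old_dir = Path("annotations_old")
new_dir = Path("annotations_new")

for new_file in sorted(new_dir.glob("*.json")):
    old_file = old_dir / new_file.name
    if not old_file.exists():
        print(f"{new_file.name}: new file (no old counterpart)")
    elif sha256_of(old_file) != sha256_of(new_file):
        print(f"{new_file.name}: CHANGED after QC update")
    else:
        print(f"{new_file.name}: identical")
```

If any case shows as CHANGED, the local score was computed against a stale ground truth and should be recomputed.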

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: gpio on Dec. 10, 2024, 4:42 a.m.

Hi. I see the update to 'evaluate.py' on GitHub. However, the FROC score is still different. In my local environment, I generate the prediction JSON file and then use 'evaluate.py' to calculate the FROC score. When I upload the model, however, the FROC score on the same dataset is lower on the debugging platform.

I use the files in the 'json_pixel' folder as the ground truth, a slightly modified '3_inference.ipynb' to generate the predicted JSON, and 'evaluate.py' to compute the FROC score.
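For reference, a point-based FROC score is typically computed by matching predictions to ground-truth lesions within a distance threshold and averaging sensitivity at fixed false-positives-per-image rates. The sketch below illustrates that general scheme only; the greedy matching, the distance threshold, and the FP rates here are assumptions, not the actual logic of the challenge's 'evaluate.py'. Small differences in any of these (or in coordinate units, e.g. pixels vs. millimetres) can shift the score between two implementations:

```python
import numpy as np

def froc_score(gt_points, pred_points, pred_scores, n_images,
               fp_rates=(0.25, 0.5, 1, 2, 4), dist_thresh=10.0):
    """Average sensitivity at fixed false-positive-per-image rates.

    gt_points: list of (x, y) ground-truth lesion centres.
    pred_points / pred_scores: predicted centres and confidences.
    The greedy nearest-match scheme, dist_thresh, and fp_rates are
    illustrative assumptions, not the challenge's actual settings.
    """
    order = np.argsort(pred_scores)[::-1]   # highest confidence first
    matched_gt = set()
    hits = []                               # 1 = TP, 0 = FP, in score order
    for i in order:
        px, py = pred_points[i]
        best, best_d = None, dist_thresh
        for j, (gx, gy) in enumerate(gt_points):
            if j in matched_gt:
                continue
            d = ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            matched_gt.add(best)
            hits.append(1)
        else:
            hits.append(0)
    hits = np.array(hits)
    tps = np.cumsum(hits)        # cumulative true positives per cutoff
    fps = np.cumsum(1 - hits)    # cumulative false positives per cutoff
    sens = []
    for rate in fp_rates:
        ok = fps <= rate * n_images          # cutoffs within the FP budget
        sens.append(tps[ok][-1] / max(len(gt_points), 1) if ok.any() else 0.0)
    return float(np.mean(sens))
```

If the platform and the local script disagree on any one of these knobs, identical predictions will yield different scores, which is worth ruling out before suspecting the model itself.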

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: LindaSt on Dec. 10, 2024, 11:56 a.m.

Hi! This is indeed quite strange. Could you send me the output and metrics files you get locally, plus a link to the run in the debugging session? I'll run it locally on my end as well and compare.

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: gpio on Dec. 11, 2024, 8:28 a.m.

https://drive.google.com/drive/folders/1gRRhxRJ56HedX6dyAMba1Kmy03jak4O6?usp=drive_link

I have uploaded my prediction (in the P_0000002A folder) and my result (metrics). I also uploaded my inference.py in case you need it. I'm truly grateful that you took the time to help me verify this.

Re: My Model Performs Differently on Local and Online Debugging Scores for the Same Dataset  

  By: LindaSt on Dec. 23, 2024, 11:04 a.m.

Hi gpio

I just wanted to let you know that I have not forgotten about you. I've been very busy because we have some persistent issues with three cases in the live leaderboard phase that we've been trying to fix, and I've also had other deadlines. Sorry about that; I'll get back to you as soon as I can.