Potential Errors in Evaluation Code

  By: wildsquirrel on Nov. 4, 2024, 1:23 p.m.

I have noticed two potential errors in your evaluation script. They are in the match_coordinates() function. https://github.com/computationalpathologygroup/monkey-challenge/blob/c29b5d7d0e3e8f63c3a092b76df822111c8497cf/evaluation/evaluate.py#L163

The first error is here: https://github.com/computationalpathologygroup/monkey-challenge/blob/c29b5d7d0e3e8f63c3a092b76df822111c8497cf/evaluation/evaluate.py#L181

    if len(ground_truth) == 0 or len(predictions) == 0:
        return 0, 0, 0, np.array([]), np.array([])

If there is no ground truth but there are predictions, the number of false positives shouldn't be zero. I would suggest changing `or` to `and` here.
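Here is a minimal sketch of the counts the empty-input branch would need to report. It assumes coordinates arrive as sequences of points and that counts are returned as (tp, fp, fn). The helper name `count_empty_case` and that return order are hypothetical, not the evaluate.py signature. Swapping `or` for `and` only makes the early return handle the both-empty case. When exactly one list is empty, execution falls through to the matching code, which then has to cope with an empty array.

```python
def count_empty_case(ground_truth, predictions):
    """Hypothetical helper: (tp, fp, fn) when either point set is empty.

    Returns None when both sets are non-empty, i.e. real matching is needed.
    The (tp, fp, fn) ordering is an assumption, not the evaluate.py API.
    """
    n_gt, n_pred = len(ground_truth), len(predictions)
    if n_gt == 0 and n_pred == 0:
        return 0, 0, 0  # nothing to match and nothing to count
    if n_gt == 0:
        return 0, n_pred, 0  # every prediction is a false positive
    if n_pred == 0:
        return 0, 0, n_gt  # every ground-truth point is a false negative
    return None


print(count_empty_case([], [(10, 20), (30, 40)]))  # (0, 2, 0)
print(count_empty_case([(10, 20)], []))            # (0, 0, 1)
```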

The second error is here: https://github.com/computationalpathologygroup/monkey-challenge/blob/c29b5d7d0e3e8f63c3a092b76df822111c8497cf/evaluation/evaluate.py#L197

In this loop, once a matching coordinate has been found, it should be removed from the predictions. Otherwise the same predicted coordinate can be matched to more than one ground-truth point. When this happens, matched_gt can exceed the actual number of predictions, and the number of false positives becomes negative. One fix is to set the distance from the matched prediction to every ground-truth coordinate to infinity, which prevents it from being matched again:

    dist_matrix[:, closest_pred_idx] = np.inf
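The suggested fix can be sketched as a greedy one-to-one matcher. This is a hypothetical sketch, not the challenge code. It assumes (x, y) coordinates and a distance threshold `max_dist`, a placeholder name since the actual radius is not stated here. It builds `dist_matrix` with NumPy broadcasting and walks the ground-truth points in order. After each match, the matched prediction's column is set to `np.inf`, as proposed above.

```python
import numpy as np


def greedy_match(ground_truth, predictions, max_dist):
    """Greedy one-to-one point matching; returns (tp, fp, fn).

    Hypothetical sketch, not the challenge's evaluate.py. Coordinates are
    assumed to be (x, y) pairs and max_dist is a placeholder threshold.
    """
    gt = np.asarray(ground_truth, dtype=float).reshape(-1, 2)
    pred = np.asarray(predictions, dtype=float).reshape(-1, 2)
    # With either set empty there is nothing to match: every prediction
    # is a false positive and every ground-truth point is a false negative.
    if len(gt) == 0 or len(pred) == 0:
        return 0, len(pred), len(gt)

    # dist_matrix[i, j] = Euclidean distance from gt[i] to pred[j].
    dist_matrix = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)

    tp = 0
    for gt_idx in range(len(gt)):
        closest_pred_idx = int(np.argmin(dist_matrix[gt_idx]))
        if dist_matrix[gt_idx, closest_pred_idx] <= max_dist:
            tp += 1
            # Proposed fix: a matched prediction can never be matched again.
            dist_matrix[:, closest_pred_idx] = np.inf

    # tp <= len(pred) is now guaranteed, so fp cannot go negative.
    return tp, len(pred) - tp, len(gt) - tp


print(greedy_match([(0, 0), (1, 0)], [(0.5, 0)], max_dist=1.0))  # (1, 0, 1)
```

In the example, both ground-truth points are 0.5 units from the single prediction. Without the `np.inf` line, both would claim it, giving tp = 2 and fp = 1 - 2 = -1. With it, the second point finds only `inf` and counts as a miss. Matching in ground-truth order is greedy, not globally optimal. Processing pairs by increasing distance, or using `scipy.optimize.linear_sum_assignment`, are alternatives if that matters.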

 Last edited by: wildsquirrel on Nov. 4, 2024, 7:43 p.m., edited 5 times in total.

Re: Potential Errors in Evaluation Code  

  By: LindaSt on Nov. 4, 2024, 2:50 p.m.

Hi! Thanks for pointing this out. I'll look into it and push a fix asap.

Re: Potential Errors in Evaluation Code  

  By: LindaSt on Nov. 5, 2024, 12:01 p.m.

Hi wildsquirrel,

I've updated the evaluation code to make the point assignment a greedy matching with a unique assignment. I also changed the `or` to an `and`, as you suggested. The updated code is on GitHub, and I've also updated the evaluation Docker images here on Grand Challenge. I've re-run all the existing submissions. Unfortunately, yours gave an error (all the others ran through without a problem). Here's the log:

2024-11-05T11:45:21.471000+00:00 ValueError: XB must be a 2-dimensional array.
2024-11-05T11:45:21.471000+00:00 
2024-11-05T11:45:21.471000+00:00 The above exception was the direct cause of the following exception:
2024-11-05T11:45:21.471000+00:00 
2024-11-05T11:45:21.471000+00:00 Traceback (most recent call last):
2024-11-05T11:45:21.471000+00:00   File "/opt/app/evaluate.py", line 385, in <module>
2024-11-05T11:45:21.471000+00:00     raise SystemExit(main())
2024-11-05T11:45:21.471000+00:00                      ^^^^^^
2024-11-05T11:45:21.471000+00:00   File "/opt/app/evaluate.py", line 238, in main
2024-11-05T11:45:21.471000+00:00     results = run_prediction_processing(fn=process, predictions=predictions)
2024-11-05T11:45:21.471000+00:00               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-05T11:45:21.471000+00:00   File "/opt/app/helpers.py", line 76, in run_prediction_processing
2024-11-05T11:45:21.471000+00:00     raise PredictionProcessingError(
2024-11-05T11:45:21.471000+00:00 helpers.PredictionProcessingError: Error for prediction {'pk': '0905120d-e4d0-46fa-b8f7-d821d29c4fc6', 'url': 'https://grand-challenge.org/algorithms/tiakong_detect/jobs/0905120d-e4d0-46fa-b8f7-d821d29c4fc6/', 'inputs': [{'pk': 2498930, 'file': None, 'image': {'pk': 'f1fdc557-ecbe-42ab-88e0-6633f815d2f0', 'name': 'E_P000015_mask.tif'}, 'value': None, 'interface': {'pk': 238, 'kind': 'Segmentation', 'slug': 'tissue-mask', 'title': 'Tissue Mask', 'super_kind': 'Image', 'description': 'Segmentation of the tissue in the slide. 0: Background 1: Tissue', 'default_value': None, 'look_up_table': None, 'relative_path': 'images/tissue-mask', 'overlay_segments': []}}, {'pk': 2499105, 'file': None, 'image': {'pk': 'e87c7799-99ca-48a0-bc08-775540c4f341', 'name': 'E_P000015_PAS_CPG.tif'}, 'value': None, 'interface': {'pk': 502, 'kind': 'Image', 'slug': 'kidney-transplant-biopsy', 'title': 'Kidney Transplant Biopsy', 'super_kind': 'Image', 'description': 'Whole-slide image of a PAS-stained kidney transplant biopsy', 'default_value': None, 'look_up_table': None, 'relative_path': 'images/kidney-transplant-biopsy-wsi-pas', 'overlay_segments': []}}], 'status': 'Succeeded', 'api_url': 'https://grand-challenge.org/api/v1/algorithms/jobs/0905120d-e4d0-46fa-b8f7-d821d29c4fc6/', 'outputs': [{'pk': 2598095, 'file': 'https://grand-challenge.org/media/components/componentinterfacevalue/a4/cf/2598095/detected-lymphocytes.json', 'image': None, 'value': None, 'interface': {'pk': 24, 'kind': 'Multiple points', 'slug': 'detected-lymphocytes', 'title': 'Detected Lymphocytes', 'super_kind': 'File', 'description': 'Lymphocytes in stromal regions', 'default_value': None, 'look_up_table': None, 'relative_path': 'detected-lymphocytes.json', 'overlay_segments': []}}, {'pk': 2598096, 'file': 'https://grand-challenge.org/media/components/componentinterfacevalue/a4/d0/2598096/detected-monocytes.json', 'image': None, 'value': None, 'interface': {'pk': 503, 'kind': 'Multiple points', 'slug': 'detected-monocytes', 'title': 'Detected monocytes', 'super_kind': 'File', 'description': 'Location of detected monocytes', 'default_value': None, 'look_up_table': None, 'relative_path': 'detected-monocytes.json', 'overlay_segments': []}}, {'pk': 2598097, 'file': 'https://grand-challenge.org/media/components/componentinterfacevalue/a4/d1/2598097/detected-inflammatory-cells.json', 'image': None, 'value': None, 'interface': {'pk': 504, 'kind': 'Multiple points', 'slug': 'detected-inflammatory-cells', 'title': 'Detected Inflammatory Cells', 'super_kind': 'File', 'description': 'Location of detected inflammatory cells', 'default_value': None, 'look_up_table': None, 'relative_path': 'detected-inflammatory-cells.json', 'overlay_segments': []}}], 'started_at': '2024-10-31T17:50:44.970920Z', 'completed_at': '2024-10-31T17:55:10.086920Z', 'view_content': {}, 'algorithm_image': 'Algorithm Image b9cc07c3 (SHA256: a214c3ac, comment: Overall detection baseline using )', 'hanging_protocol': None, 'rendered_result_text': '', 'optional_hanging_protocols': []}: XB must be a 2-dimensional array.
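The final message matches the one `scipy.spatial.distance.cdist` raises when its second argument (`XB`) is not a 2-D array. This reproduction rests on two assumptions. First, that the updated matching step calls `cdist`. Second, that with `or` changed to `and`, an empty point list (for example a slide with no detections) now reaches it as `np.array([])`, which has shape `(0,)`. `as_points` and `cdist_error_message` are hypothetical helpers for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist


def as_points(coords):
    """Hypothetical helper: coerce coordinates to an (n, 2) float array.

    np.array([]) has shape (0,), which cdist rejects; reshaping gives (0, 2).
    """
    return np.asarray(coords, dtype=float).reshape(-1, 2)


def cdist_error_message(ground_truth, predictions):
    """Return the ValueError text cdist raises for these inputs, or None."""
    try:
        cdist(np.array(ground_truth), np.array(predictions))
    except ValueError as err:
        return str(err)
    return None


gt = [(100.0, 200.0), (150.0, 250.0)]
# An empty prediction list (hypothetical input: a slide with no detections).
print(cdist_error_message(gt, []))              # XB must be a 2-dimensional array.
print(np.array([]).shape, as_points([]).shape)  # (0,) (0, 2)
```

Reshaping to `(0, 2)` gets past the shape check. If the loop then takes an `np.argmin` over each distance row (again an assumption about evaluate.py), an empty row raises its own `ValueError`. Counting the one-empty cases before any distance computation therefore avoids both failures.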