Panoptica Metrics SIGTERM issues  

  By: minanessiem on Aug. 14, 2024, 1:04 p.m.

Hello ISLES24 Team,

I have just started integrating your provided Panoptica-based script to add the official metrics to my training pipeline. Currently, I cache all of my predictions and their associated labels, and then pass them case by case to a metrics calculator that encapsulates the code snippets provided in the GitHub repo.
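Schematically, the setup looks something like this (just a sketch: compute_official_metrics is a placeholder standing in for the panoptica-based snippet from your repo, not a real function name):

# Sketch of the caching setup described above.
# compute_official_metrics() is a hypothetical placeholder for the
# panoptica-based snippet from the ISLES24 GitHub repo.
import numpy as np

def compute_official_metrics(pred: np.ndarray, label: np.ndarray) -> dict:
    """Placeholder: wraps the panoptica evaluation code from the repo snippet."""
    raise NotImplementedError

def evaluate_cached_cases(cases):
    """Run the official metrics case by case on cached (prediction, label) pairs."""
    return [compute_official_metrics(pred, label) for pred, label in cases]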

However, I keep running into the following error:

[rank: 0] Received SIGTERM: 15

When I set verbose=True, I get the following log:

Panoptic: Start Evaluation
-- Got SemanticPair, will approximate instances
-- Got UnmatchedInstancePair, will match instances
(this is where it hangs; when I press Ctrl+C, I see the error again)
^C[rank: 0] Received SIGTERM: 15

I have tried multiple options: one is to compute the metrics after each batch, which breaks the training pipeline mid-epoch; the other is to cache the epoch results (or pass them through a multiprocessing Queue) and compute them on_epoch_end, but that only pushes the problem down the line.
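The on_epoch_end variant is structured roughly like this (again a sketch; compute_official_metrics is the same hypothetical per-case wrapper as above, and the binarization threshold is just an example):

# Sketch of the "cache and compute on_epoch_end" variant,
# using the PyTorch Lightning 2.x hooks.
import pytorch_lightning as pl
import torch

class SegModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self._val_cases = []

    def validation_step(self, batch, batch_idx):
        x, y = batch
        pred = (torch.sigmoid(self.model(x)) > 0.5).long()
        # cache binarized predictions and labels on CPU for later evaluation
        for p, t in zip(pred, y):
            self._val_cases.append((p.cpu().numpy(), t.cpu().numpy()))

    def on_validation_epoch_end(self):
        results = [compute_official_metrics(p, t) for p, t in self._val_cases]
        self._val_cases.clear()
        # e.g. log the mean F1 across cases
        if results and "f1" in results[0]:
            self.log("val/f1", sum(r["f1"] for r in results) / len(results))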

This issue hits me whether I run with workers=0 or workers > 1.

Is there maybe something I don't understand about using the panoptica library?

Thank you in advance.

Re: Panoptica Metrics SIGTERM issues  

  By: ezequieldlrosa on Aug. 14, 2024, 4:12 p.m.

Hello, which version of panoptica are you using?

Re: Panoptica Metrics SIGTERM issues  

  By: minanessiem on Aug. 14, 2024, 5:36 p.m.

I'm using version 1.0.1, the latest version on PyPI.

Re: Panoptica Metrics SIGTERM issues  

  By: ezequieldlrosa on Aug. 14, 2024, 6:42 p.m.

Hi Mina,

I checked, and it looks like we’re using the same version. Unfortunately, I can’t provide a precise answer right now; our focus with these scripts was more on computing challenge performance metrics than on integrating them into training models.

I’ve reached out to the package developers for further insights. In the meantime, a few things to consider:

1) Are you using binary masks? These metrics don’t support soft predictions.

2) It could potentially be a multiprocessing issue, as panoptica and some DL packages parallelize many processes.

3) As a workaround, you could use the ISLES22 challenge metrics. They compute the exact same metrics, though the F1-score and lesion count difference might vary slightly (overall, they are very similar). The difference is just in the implementation: in ISLES22, we consider a lesion 'detected' if at least one voxel overlaps, whereas panoptica uses a more refined approach based on the IoU (see the sketch below).
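To illustrate point 3, here is a simplified sketch of the two detection rules on binary masks. This is not the actual ISLES22 or panoptica code, and the IoU threshold value is just an example:

# Simplified illustration of the two lesion-detection rules, using
# connected components on binary masks (not the real implementations).
import numpy as np
from scipy import ndimage

def detection_counts(gt: np.ndarray, pred: np.ndarray, iou_thresh: float = 0.5):
    """Count GT lesions detected under a one-voxel-overlap rule vs an IoU rule."""
    gt_lab, n_gt = ndimage.label(gt)
    pred_lab, n_pred = ndimage.label(pred)
    overlap_hits, iou_hits = 0, 0
    for i in range(1, n_gt + 1):
        lesion = gt_lab == i
        best_iou, any_overlap = 0.0, False
        for j in range(1, n_pred + 1):
            inst = pred_lab == j
            inter = np.logical_and(lesion, inst).sum()
            if inter == 0:
                continue
            any_overlap = True
            union = np.logical_or(lesion, inst).sum()
            best_iou = max(best_iou, inter / union)
        overlap_hits += any_overlap       # ISLES22-style: any overlapping voxel
        iou_hits += best_iou > iou_thresh  # IoU-threshold-style rule
    return overlap_hits, iou_hits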

Best, Ezequiel


Re: Panoptica Metrics SIGTERM issues  

  By: minanessiem on Aug. 14, 2024, 6:58 p.m.

To make my issue clearer, these are the relevant libraries I'm using:

pytorch-lightning==2.4.0
torch==2.4.0
torchmetrics==1.4.1

Relevant objects in this case are the Train_DataLoader and the Validation_DataLoader (which are both assigned the num_workers argument), the PyTorch Lightning Trainer, and the custom PyTorch Lightning torchmetric.

I have tried all combinations of constructing the evaluator object (inside the Dice torchmetric, inside the PyTorch Lightning model instantiation, inside the main class), as well as having the Validation_DataLoader run the calculations (while using num_workers=0). There is no configuration in which multiprocessing workers exist in my script that does not lead the panoptica library to the SIGTERM issue.

The only configuration that does not crash the codebase is num_workers=0 for both the Train_DataLoader and the Validation_DataLoader. This, as you might imagine, takes my pipeline from 30 s to more than 3 minutes.
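For reference, the torchmetric variant is structured roughly like this (only the torchmetrics scaffolding is the real API; the evaluator argument and the "f1" key are placeholders for the object built from your panoptica snippet):

# Rough sketch of the "evaluator inside a custom torchmetric" variant.
import torch
from torchmetrics import Metric

class OfficialMetrics(Metric):
    def __init__(self, evaluator):
        super().__init__()
        self.evaluator = evaluator  # placeholder: built from the repo's panoptica snippet
        self.add_state("f1_sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("n_cases", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # evaluate case by case on binarized CPU arrays
        for p, t in zip(preds, target):
            result = self.evaluator(p.cpu().numpy(), t.cpu().numpy())
            self.f1_sum += float(result["f1"])
            self.n_cases += 1

    def compute(self) -> torch.Tensor:
        return self.f1_sum / self.n_cases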

Re: Panoptica Metrics SIGTERM issues  

  By: minanessiem on Aug. 14, 2024, 6:59 p.m.

Thank you for your reply, I'll be looking into it ASAP.

Re: Panoptica Metrics SIGTERM issues  

  By: minanessiem on Aug. 15, 2024, 2:10 p.m.

Thank you, this resolved my issues.