About the result of your training code.

  By: hnefa335 on Dec. 25, 2021, 6:29 a.m.

Hi. Thanks for your training code and I have one question.

I ran your training code without any modification and got an mPQ score of 44.04 on the test dataset (FOLD_IDX = 0; 44.04 at the 100th epoch, 43.66 at the 98th, 43.15 at the 99th).

But on the baseline code you provided before, the mPQ is 49.98.

I checked that I was able to get the same result (mPQ 49.98) when I used the pretrained weights you provided on the test dataset of the 0th fold, but when I used the weights I trained with your training code, I got 44.04 on the same test dataset.

I just wonder how you were able to get the 49.98 mPQ score on the baseline.

Thanks!

 Last edited by: hnefa335 on Aug. 15, 2023, 12:55 p.m., edited 1 time in total.

Re: About the result of your training code.  

  By: dangvuquoc1993 on Dec. 25, 2021, 2:29 p.m.

Hi, you can try selecting a different epoch and see.

We did not do anything fancy when selecting the model checkpoint. The weights we provide were taken from the epoch with the highest average dice for typing.

There is a file within the model checkpoints called stats.dat (it should be under exp_output/local/models/baseline/00/model/01) where you can retrieve the validation dice scores for each epoch of the 2nd training phase. To find the correct keywords, check the debug.log file for the printed output format, for example infer-valid-tp_dice_0 to infer-valid-tp_dice_6. Retrieving these scores and taking their average will allow you to rank the epochs and select the checkpoint.

Edited: make sure that only checkpoints within exp_output/local/models/baseline/00/model/01 are selected (whether you run the entire model training or only phase 2 training).
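The selection procedure above can be sketched as follows. This is only an illustration: it assumes stats.dat deserializes into a dict mapping epoch to a metrics dict, and the exact file format and metric names should be confirmed against debug.log.

```python
import statistics

def rank_epochs(stats, num_types=7):
    """Rank epochs by the average per-type validation dice (best first).

    `stats` is assumed to be a dict: epoch -> {metric_name: value},
    with metric names like those printed in debug.log
    (infer-valid-tp_dice_0 ... infer-valid-tp_dice_6).
    """
    keys = [f"infer-valid-tp_dice_{i}" for i in range(num_types)]
    avg_dice = {
        epoch: statistics.mean(metrics[k] for k in keys)
        for epoch, metrics in stats.items()
    }
    return sorted(avg_dice.items(), key=lambda kv: kv[1], reverse=True)

# Toy example with made-up numbers:
stats = {
    49: {f"infer-valid-tp_dice_{i}": 0.70 for i in range(7)},
    50: {f"infer-valid-tp_dice_{i}": 0.75 for i in range(7)},
}
best_epoch, best_dice = rank_epochs(stats)[0]
print(best_epoch)  # 50
```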

 Last edited by: dangvuquoc1993 on Aug. 15, 2023, 12:55 p.m., edited 2 times in total.

Re: About the result of your training code.  

  By: dangvuquoc1993 on Dec. 26, 2021, 3:32 p.m.

Following up on my previous post, we have dumped the mPQ validation results for the other epochs as a sanity check. You can view them here.

You can see that there are better checkpoints than the one we provided. However, you can also observe that the resulting mPQ ranges from 0.44 to 0.50, and that from epoch 25 onward (after dropping the learning rate from 1.0e-4 to 1.0e-5) the results mainly hover around 0.49. Although we have not yet clearly understood the cause of your results, perhaps we had better random seeds than yours. Notice that although we try as best as we can to make the training deterministic by seeding the random generators (here and here), determinism is hard to achieve in practice when running things in parallel. You can try retraining and see if things improve.
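For reference, a minimal seeding setup along these lines might look like the sketch below. This is not the exact code from the repository, and, as noted above, even with these seeds, parallel data loading and some GPU kernels can remain nondeterministic.

```python
import os
import random

import numpy as np

def seed_everything(seed: int = 5) -> None:
    """Seed the common sources of randomness in a training run."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # In a PyTorch training script you would additionally call:
    #   torch.manual_seed(seed)
    #   torch.cuda.manual_seed_all(seed)
    # and, for stricter (but slower) behaviour:
    #   torch.backends.cudnn.deterministic = True
    #   torch.backends.cudnn.benchmark = False
    # Multi-worker data loaders and some CUDA ops can still introduce
    # nondeterminism even with all of the above.

seed_everything(5)
a = np.random.rand(3)
seed_everything(5)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: same seed reproduces the same draws
```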

Re: About the result of your training code.  

  By: hnefa335 on Dec. 27, 2021, 9:28 a.m.

Thanks for your reply!

But after retraining your code without any modification, I again got the same score (43~44) at the epoch with the highest validation dice for fold 0, and I can't get over 44 across several epochs (though I got 49.98 with your pretrained weights on the same test dataset with the same test process). Maybe there are some mistakes in my process; I will check again.

Are you planning to upload your test code too?

Thanks!

 Last edited by: hnefa335 on Aug. 15, 2023, 12:55 p.m., edited 6 times in total.

Re: About the result of your training code.  

  By: simongraham73 on Dec. 27, 2021, 12:11 p.m.

At test time, we use the notebook that we created here.

Did you not use this notebook at test time? Also, can you confirm that you are using checkpoints from the second phase? i.e. the directory should end in 01/

Best,

Simon

 Last edited by: simongraham73 on Aug. 15, 2023, 12:55 p.m., edited 1 time in total.

Re: About the result of your training code.  

  By: hnefa335 on Dec. 27, 2021, 3:32 p.m.

Yes. I used the checkpoint from the second phase with the best validation dice, and I used the test code from the baseline with a little modification (regarding the functions model._proc_np_hv and model._get_instance_info, which are not in the model in the training code).

I was just curious about the log for mPQ you provided above and wanted to implement as it is.

I will check again. Thank you for your reply!

 Last edited by: simongraham73 on Aug. 15, 2023, 12:55 p.m., edited 1 time in total.

Re: About the result of your training code.  

  By: winniezhangcoding on Dec. 29, 2021, 4:16 a.m.

Hi,

I also tried to train a model with the released training code and found that I could not reach 49 either. I could only reach around 43 and was unable to get anywhere near 49.

Also, I could only reach around 75 in terms of regression score, which is far from the performance of the released checkpoint (around 86).

Have you found out the reason? Or should we apply some modifications to the released code?

Thanks in advance!

 Last edited by: winniezhangcoding on Aug. 15, 2023, 12:55 p.m., edited 1 time in total.

Re: About the result of your training code.  

  By: simongraham73 on Dec. 29, 2021, 11:06 a.m.

We have dug into the training code to try and understand why participants could not quite replicate our baseline results. We then realised that the released code did not appropriately load the ImageNet pretrained weights. This has now been accounted for in the recent PR, and the code should now work as expected.

Re: About the result of your training code.  

  By: joangibert14 on Jan. 12, 2022, 3:28 p.m.

Hi,

I tried the new release but am still not getting the same results. Actually, I'm getting far worse (PQ = 0.527; mPQ = 0.336). Is anyone facing the same issue, or did you obtain similar results?

I tried git cloning it twice and downloaded the weights and dataset a couple of times too...

Re: About the result of your training code.  

  By: simongraham73 on Jan. 13, 2022, 4:03 p.m.

@joangibert14 - we are currently retraining the model so that we can provide a more considered response. We will reach out tomorrow.

Re: About the result of your training code.  

  By: winniezhangcoding on Jan. 13, 2022, 4:06 p.m.

Hi,

I tried the new code and got PQ and mPQ results similar to the released checkpoint (PQ: 0.60429818, mPQ: 0.483305). Though they are not quite as good as the scores of the baseline checkpoint, they are close.

However, in the case of multi r2, I am also far from the baseline checkpoint. I got 0.789007, which is around 7 points behind the baseline. I have tried checkpoints from all the epochs, but none of them could reach a score as high as 0.86. I am still trying to figure out what went wrong here.

Re: About the result of your training code.  

  By: simongraham73 on Jan. 13, 2022, 9:32 p.m.

We are pleased to hear that you have recovered the PQ and mPQ. We will investigate the R^2 over the next day.

Note, we are using a fairly small subset of the data for validation, so the stats may be prone to fluctuation. The class imbalance may also make the stats sensitive to fluctuation. The important thing is to get a good result on the test set.