Question about runtime evaluation

Question about runtime evaluation  

  By: XA on July 14, 2022, 3:17 a.m.

Hi,

In the released evaluation code, the runtime measured for each test case starts at docker container initialization and ends when the container is closed. This means the environment initialization cost (e.g., docker deployment and GPU warm-up) is counted repeatedly for every case, which may not reflect the real inference runtime of the test case.

In a practical deployment scenario, we usually initialize the docker container once and then execute the inference tasks sequentially: the first task often takes longer, while later tasks show stable runtime. Therefore, a better solution for inference runtime evaluation might be to start the docker container once, run inference on all test cases, measure the total time, and take the average, as in the sketch below.
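For concreteness, a rough sketch of the suggested protocol might look like the following. The image name, mounted paths, and the predict.py entry point are hypothetical placeholders, not the actual challenge interface:

```python
import subprocess
import time

# Hypothetical image, mounts, and per-case entry point; adjust to the real
# submission interface of the challenge.
IMAGE = "team_algorithm:latest"
CASES = ["case_0001", "case_0002", "case_0003"]

# Start the container once and keep it alive (assumes the image has `sleep`).
subprocess.run(
    ["docker", "run", "-d", "--name", "infer", "--gpus", "all",
     "-v", "/data/inputs:/inputs", "-v", "/data/outputs:/outputs",
     IMAGE, "sleep", "infinity"],
    check=True,
)

try:
    per_case = []
    for case in CASES:
        start = time.perf_counter()
        # Hypothetical inference command executed inside the running container.
        subprocess.run(
            ["docker", "exec", "infer", "python", "predict.py",
             "--input", f"/inputs/{case}.nii.gz",
             "--output", f"/outputs/{case}.nii.gz"],
            check=True,
        )
        per_case.append(time.perf_counter() - start)

    # The first case typically absorbs GPU warm-up; later cases are stable.
    print("per-case runtimes (s):", [round(t, 2) for t in per_case])
    print("mean runtime (s):", sum(per_case) / len(per_case))
finally:
    subprocess.run(["docker", "rm", "-f", "infer"], check=True)
```

In this scheme, the container start-up and warm-up cost is paid once and amortized over the whole test set instead of being charged to every case.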

Re: Question about runtime evaluation  

  By: junma on July 17, 2022, 6:42 p.m.

Thanks for raising this great point. Yes, the running time includes the docker start time because we need to obtain the resource metrics for each case, so we cannot infer all the cases by starting the docker container once. This point will be mentioned in our summary paper.
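For illustration, per-case resource measurement roughly corresponds to running each case in its own container and polling resources while it is alive. The sketch below is only a simplified approximation, not the actual evaluation code: the image name, entry point, and the use of whole-GPU nvidia-smi output as a proxy for the container's footprint are all assumptions.

```python
import subprocess
import time

def run_case_with_monitoring(image: str, case: str, interval: float = 1.0):
    """Run one case in its own container and poll GPU memory while it runs,
    so the recorded resource metrics belong to this case alone.
    Returns (wall-clock seconds, peak GPU memory in MiB)."""
    name = f"infer_{case}"
    start = time.perf_counter()
    # Hypothetical image and entry point.
    subprocess.run(
        ["docker", "run", "-d", "--name", name, "--gpus", "all",
         "-v", "/data/inputs:/inputs", "-v", "/data/outputs:/outputs",
         image, "python", "predict.py", "--case", case],
        check=True,
    )
    peak_gpu_mib = 0
    while True:
        running = subprocess.run(
            ["docker", "inspect", "-f", "{{.State.Running}}", name],
            capture_output=True, text=True,
        ).stdout.strip()
        # nvidia-smi reports whole-GPU usage; on a dedicated evaluation
        # machine this serves as a proxy for the container's usage.
        used = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout.split()
        values = [int(v) for v in used if v.isdigit()]
        if values:
            peak_gpu_mib = max(peak_gpu_mib, max(values))
        if running != "true":
            break
        time.sleep(interval)
    elapsed = time.perf_counter() - start
    subprocess.run(["docker", "rm", "-f", name], check=True)
    return elapsed, peak_gpu_mib
```

Because the metrics are attributed per container, each case necessarily pays its own container start and warm-up cost under this design.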

Re: Question about runtime evaluation  

  By: XA on July 18, 2022, 2:51 a.m.

Thanks for your reply. Given the docker evaluation constraints, it might be more appropriate to disentangle the Dice/NSD evaluation from the resource evaluation: evaluate Dice/NSD at the case level, while evaluating resource cost at the whole-set level. For this challenge it may be difficult to update the evaluation rules at the current stage; I just wanted to raise a point that deserves discussion.
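As a toy illustration of what I mean, the two kinds of metrics could simply be reported at different granularities. The masks and the wall-clock value below are synthetic placeholders:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient for binary masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

# Toy data standing in for per-case predictions and labels.
rng = np.random.default_rng(0)
cases = [(rng.integers(0, 2, (8, 8)).astype(bool),
          rng.integers(0, 2, (8, 8)).astype(bool)) for _ in range(5)]

# Case level: one accuracy score per test case.
case_dice = [dice(p, g) for p, g in cases]

# Whole-set level: a single wall-clock measurement for the full set,
# divided by the number of cases (60.0 s is a placeholder value).
total_wall_clock = 60.0
mean_runtime = total_wall_clock / len(cases)

print("per-case Dice:", [round(d, 3) for d in case_dice])
print("mean Dice:", round(float(np.mean(case_dice)), 3))
print("set-level average runtime (s/case):", mean_runtime)
```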