The annotation format of the test set is similar to that of the training set, i.e., only some of the slices in the test set are labeled. During the evaluation phase, only the slices with ground-truth annotations will be used to calculate the quantitative metrics, e.g., the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (95 HD). We will not specify the annotated slices in advance; therefore, participants should provide results for every slice.
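
To illustrate how a metric can be restricted to the annotated slices, here is a minimal sketch in Python. It is not the official evaluation code. It assumes that a prediction and its ground truth are binary NumPy arrays of shape `(slices, height, width)`, that a boolean vector `annotated` marks which slices carry ground truth, and that DSC is averaged per slice; the function names `dice` and `mean_dice_on_annotated` are hypothetical, and the organizers may aggregate differently (for example, per volume).

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom


def mean_dice_on_annotated(pred_vol, gt_vol, annotated):
    """Average per-slice DSC over annotated slices only.

    `annotated` is a hypothetical boolean vector with one entry per slice;
    predictions on unannotated slices are ignored.
    """
    indices = np.flatnonzero(np.asarray(annotated, dtype=bool))
    if indices.size == 0:
        raise ValueError("no annotated slices to evaluate")
    scores = [dice(pred_vol[i], gt_vol[i]) for i in indices]
    return float(np.mean(scores))
```

Because the annotated slices are not known at submission time, a prediction volume has to cover every slice; in this sketch the unannotated ones simply do not contribute to the score.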