Additional Information about dataset && participation policies

  By: MungoMeng on June 13, 2025, 5:52 a.m.

Dear organizers,

I am confused about the dataset description and participation policies.

As stated in the Dataset description, the total number of cases is more than 1200 from at least 11 centers: approximately 700 training cases from 9 different centers and approximately 400 test cases from at least 3 centers. However, only 10 centers are listed in the table. It seems that CHUV, which was included in HECKTOR 2022, is missing this year. Moreover, only 7 centers are listed on the data download page (not 9 as stated).

In addition, could you please specify the inputs for each task? For example, is HPV status available as a clinical indicator for Task 2 (Prognosis), and which clinical indicators are available during testing? Are RTDose and PlanningCT available as input for Task 3 (HPV prediction)? Are RTDose and PlanningCT available for all test cases or only a subset? This information is crucial for designing methods and preparing Docker containers.

Regarding the participation policies, I would like to know whether pretrained foundation models are allowed. Is it allowed to use pretrained weights and then fine-tune them using only the HECKTOR training data?

Many thanks!

Re: Additional Information about dataset && participation policies  

  By: SalmaHassan73 on June 16, 2025, 7:15 a.m.

Dear Mingyuan Meng,

For the finalized list of training centers, please refer to the dataset download page, as it contains the breakdown of cases per center, task, and modality. We are in the process of finalizing the test centers and will be sharing that information soon.

Regarding your questions about available modalities:

  • Please note that the input for each task will mirror the information available in the corresponding task folder of the training dataset.
  • RTDose and PlanningCT are not available as input for Task 3 (HPV prediction). These modalities are also not available for all test cases in Task 2.
  • Detailed instructions and sample Docker containers per task will be provided closer to the validation phase to clarify input availability.
  • For Task 2 (Prognosis), HPV status will not be available during testing. Please refer to the EHR files in each case subfolder for available features per task.
  • A summary of accessible clinical indicators will be available per phase and task when submission opens.

Participation policy:

Please note that methods using weights pretrained on external data will not be eligible for prizes. This aligns with the policy stated on the website:

“Participants can use the training data in any way they wish for training the models. Using additional (public or not) data for training or unsupervised training on the test data is not eligible for prizes since we want to compare models trained on the same data with a held-out test set.”

All the best!