Clarification on Provided Data for Validation and Test Sets (WSI Embeddings)

  By: LukeBe on June 26, 2025, 2:35 p.m.

Hi there,

I would like to clarify whether the validation and test datasets include precomputed WSI embeddings, as the training data does, or whether only the raw WSIs (along with the corresponding masks) and clinical data are provided.

Thank you in advance!

Best regards, Luke

Re: Clarification on Provided Data for Validation and Test Sets (WSI Embeddings)  

  By: Rospaans on June 27, 2025, 9:40 a.m.

Hi Luke,

Good question!

We provided the feature embeddings for the training set as additional bonus data; you are not obligated to use them during training. The validation and test phases therefore do not include the feature embeddings, only the raw WSIs + tissue masks. The features were calculated using the UNI model from the Mahmood Lab. More specifically, we used the slide2vec repository to perform the feature extraction: https://github.com/clemsgrs/slide2vec
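
Since the embeddings are absent at validation and test time, an algorithm that relies on features has to recompute them from the raw slide and its mask. As a rough illustration of the first step only (choosing tile positions that fall inside tissue), here is a minimal sketch. It is not how slide2vec does it: the function name, the 50% tissue threshold, and the assumption that the mask is already loaded as a 2D grid of 0/1 values at the tiling resolution are all hypothetical.

```python
def tissue_tile_coords(mask, tile_size, min_tissue_fraction=0.5):
    """Return (row, col) top-left corners of tiles that contain enough tissue.

    mask: 2D list of 0/1 values (hypothetical in-memory tissue mask).
    tile_size: tile edge length in mask pixels.
    min_tissue_fraction: assumed threshold, not taken from the challenge.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    coords = []
    # Walk over non-overlapping tiles; partial tiles at the border are skipped.
    for r in range(0, n_rows - tile_size + 1, tile_size):
        for c in range(0, n_cols - tile_size + 1, tile_size):
            tissue_pixels = sum(
                1
                for row in mask[r:r + tile_size]
                for value in row[c:c + tile_size]
                if value
            )
            if tissue_pixels / (tile_size * tile_size) >= min_tissue_fraction:
                coords.append((r, c))
    return coords
```

The selected coordinates would then be used to read the matching regions from the WSI and pass them through an encoder of your choice.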

If you go to a submission page (e.g. https://chimera.grand-challenge.org/evaluation/brs-prediction-final-debug/submissions/create/), you can see which type of data the algorithm is fed. For each item, clicking the blue "i" shows how it is loaded.

Good to know: for task 1 there are multiple possible inputs, 10 to be precise. The only difference between them is the number of H&E slides per interface (interf0 = 1 tif file + mask, interf1 = 2 tif files + mask, etc.).
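
To make the interface numbering concrete, here is a small sketch of the mapping described above. It assumes the pattern continues unchanged through all 10 interfaces (interf0 to interf9, each with one more tif file than the last); the helper names are hypothetical and not part of the challenge tooling.

```python
import re

NUM_INTERFACES = 10  # number of task 1 interfaces mentioned above


def expected_slide_count(interface_name):
    """Return how many H&E tif files an interface such as 'interf3' carries."""
    match = re.fullmatch(r"interf(\d+)", interface_name)
    if match is None:
        raise ValueError(f"unrecognised interface name: {interface_name!r}")
    index = int(match.group(1))
    if not 0 <= index < NUM_INTERFACES:
        raise ValueError(f"interface index out of range: {index}")
    # Assumed pattern: interfN provides N + 1 slides (plus a mask).
    return index + 1


def describe_interfaces():
    """Map every interface name to its expected number of tif files."""
    return {f"interf{i}": i + 1 for i in range(NUM_INTERFACES)}
```

An algorithm for task 1 therefore has to cope with anywhere from 1 to 10 slides per case rather than a fixed number.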

Hope this helps! Good luck with training!

Kind regards,

On behalf of the CHIMERA Team