Hi, thank you for your questions.
Questions regarding the images: the image metadata does not contain the following information:
Base Magnification (Magnification of image at Level 0)
Pixel-to-mm ratio for image
Scanner used for scanning the image
Can this information be provided for the images?
- The images are provided at a maximum resolution of 0.25 microns per pixel (the highest-resolution level), which corresponds to 0.00025 mm per pixel.
- To reduce image size, we removed the intermediate resolutions, so the image pyramids contain the following resolutions: 0.25, 1.0, 4.0, 16.0, ... microns per pixel.
- The images were scanned with a 3DHISTECH digital scanner.
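In case it helps with the pixel-to-mm conversion, here is a minimal sketch. It assumes that level 0 is the 0.25 microns-per-pixel level and that each further pyramid level is 4x coarser than the previous one, as the listed resolutions suggest. The function names are only illustrative and do not come from any particular library.

```python
BASE_MPP = 0.25    # microns per pixel at the highest-resolution level (stated above)
LEVEL_FACTOR = 4   # assumption: each level is 4x coarser (0.25, 1.0, 4.0, 16.0, ...)


def mpp_at_level(level: int) -> float:
    """Microns per pixel at a given pyramid level (level 0 = highest resolution)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return BASE_MPP * LEVEL_FACTOR ** level


def pixels_to_mm(n_pixels: float, level: int = 0) -> float:
    """Physical length in millimetres covered by n_pixels at the given level."""
    return n_pixels * mpp_at_level(level) / 1000.0  # 1 mm = 1000 microns


if __name__ == "__main__":
    for level in range(4):
        print(f"level {level}: {mpp_at_level(level)} microns/pixel")
    print(f"4000 pixels at level 0 span {pixels_to_mm(4000)} mm")
```

Under these assumptions, 4000 pixels at the highest resolution cover 1 mm of tissue.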
Another question concerned the problem definition. The data section states: "Your task is to estimate the time to biochemical recurrence (in years, as a continuous variable e.g. 1.23 years or 17.68 years)."
Does that mean we are only concerned with predicting time-to-event for biochemical recurrence events? If so, since the time-to-event for patients without biochemical recurrence corresponds to the time of last follow-up, should we also predict the time-to-event for those patients, or should they be discarded during inference?
Yes, your understanding is correct. The primary task is to estimate the time to biochemical recurrence (BCR) in years, which means we are specifically focused on predicting the time-to-event for BCR events only.
For patients who do not experience a BCR event, the time-to-event corresponds to the time of the last follow-up, and these instances are considered censored data. In survival analysis, censored data indicates that the event of interest (BCR in this case) has not occurred by the time of the last follow-up.
During inference, you should include all patients in your model, both those who have experienced BCR and those who have not (censored). However, the prediction should be focused on estimating the time to BCR. The non-BCR events (censored data) provide valuable information about the patients who have not experienced recurrence up to their last follow-up and help in accurately modeling the survival function.
So, to clarify:
- Include both BCR and non-BCR (censored) data in your model training and inference.
- The model should predict the time to BCR for all patients, understanding that for censored data, the prediction is an estimate beyond their last follow-up.
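To illustrate how censored patients still contribute information, here is a minimal sketch of a Harrell-style concordance index in plain Python. This is only an example of a censoring-aware measure and is not necessarily the metric used for evaluation in this challenge. The function name, variable names, and the small data set are made up for illustration.

```python
def concordance_index(times, events, predicted_times):
    """Fraction of comparable patient pairs whose predicted order matches the observed order.

    times           -- observed time in years (time to BCR, or last follow-up if censored)
    events          -- 1 if BCR was observed, 0 if the patient is censored
    predicted_times -- predicted time to BCR in years for every patient
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # Comparable pair: patient i had BCR before patient j's observed time.
            # Patient j may be censored; we still know j was recurrence-free at times[i].
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5  # ties in prediction count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data: two patients with BCR, two censored at last follow-up.
    times = [1.23, 17.68, 5.0, 8.0]
    events = [1, 0, 1, 0]
    predicted = [2.0, 15.0, 6.0, 4.0]
    print(concordance_index(times, events, predicted))
```

In this toy example the censored patients appear in four of the five comparable pairs, which is why discarding them would throw away most of the usable ordering information.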