DataScientX (N. Debs, A. Routier, et al.; France) algorithm trained on PI-CAI: Private and Public Training Dataset
About
- Coronal T2 Prostate MRI (Coronal T2 MRI of the Prostate)
- Transverse T2 Prostate MRI (Transverse T2 MRI of the Prostate)
- Sagittal T2 Prostate MRI (Sagittal T2 MRI of the Prostate)
- Transverse HBV Prostate MRI (Transverse High B-Value Prostate MRI)
- Transverse ADC Prostate MRI (Transverse Apparent Diffusion Coefficient Prostate MRI)
- Clinical Information Prostate MRI (Clinical information to support clinically significant prostate cancer detection in prostate MRI. Provided information: patient age in years at the time of examination (patient_age), PSA level in ng/mL as reported (PSA_report or PSA), PSA density in ng/mL^2 as reported (PSAD_report), prostate volume as reported (prostate_volume_report), prostate volume derived from automatic whole-gland segmentation (prostate_volume_automatic), scanner manufacturer (scanner_manufacturer), scanner model name (scanner_model_name), diffusion b-value of (calculated) high b-value diffusion map (diffusion_high_bvalue), Malignant Neoplasm Histotype (histology_type), Prostate Imaging-Reporting and Data System (PIRADS), Neural invasion (neural_invasion, yes/no), Vascular invasion (vascular_invasion, yes/no), Lymphatic invasion (lymphatic_invasion, yes/no). Values acquired from radiology reports will be missing, if not reported.)
- Case-level Cancer Likelihood Prostate MRI (Case-level likelihood of harboring clinically significant prostate cancer, in range [0,1].)
- Transverse Cancer Detection Map Prostate MRI (Single-class, detection map of clinically significant prostate cancer lesions in 3D, where each voxel represents a floating point in range [0,1].)
Challenge Performance
Date | Challenge | Phase | Rank |
---|---|---|---|
Jan. 16, 2025 | PI-CAI | Closed Testing Phase - Tuning | 5 |
Jan. 16, 2025 | PI-CAI | Closed Testing Phase - Testing (Final Ranking) | 1 |
Model Facts
Summary
This algorithm represents the submission from the DataScientX team (N. Debs, A. Routier, et al.; France) to the PI-CAI challenge [1]. We independently retrained this algorithm using the PI-CAI Private and Public Training Dataset (9107 cases from 8028 patients, including a sequestered dataset of 7607 cases and the public dataset of 1500 cases). This algorithm will perform two tasks: localize and classify each lesion with clinically significant prostate cancer (if any) using a 0–100% likelihood score and classify the overall case using a 0–100% likelihood score for clinically significant prostate cancer diagnosis.
To this end, this model uses biparametric MRI data and several clinical variables associated with the examination. Specifically, this algorithm uses the axial T2-weighted MRI scan, the axial apparent diffusion coefficient map, and the calculated or acquired axial high b-value scan. It also uses the prostate-specific antigen (PSA) level, normalized by the total volume of the gland as measured by the prostate segmentation method, to inform its predictions.
- A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1
Mechanism
Team: Guerbet
Noëlie Debs (1,#), Alexandre Routier (1,#), Clément Abi-Nader (1), Arnaud Marcoux (1), François Nicolas (1), Alexandre Bône (1), Marc-Michel Rohé (1)
(1) Guerbet Research, Villepinte, France
(#) These authors contributed equally to this work
Contact: noelie.debs@guerbet.com, alexandre.routier@guerbet.com, marc-michel.rohe@guerbet.com.
Code availability: Private source code.
Abstract: We developed a method for automated 3D detection and diagnosis of clinically significant prostate cancer in bi-parametric MRI. The method involves four sequential steps, including training neural networks for prostate gland segmentation, cancer lesion segmentation, and box-based lesion detection. An ensembling method is then applied that combines the probabilities from the different models with Prostate-Specific Antigen (PSA) density. All models were trained using 5-fold cross-validation.
Data preparation: Data used for the proposed method included bpMRI scans (T2w, high b-value DWI, ADC map), as well as PSA level variables. No registration was performed; only the DWI/ADC sequences were resampled into the T2w space. For annotation, we used both human- and AI-derived lesion masks, as well as prostate masks, as detailed below.
Training setup: Our pipeline, depicted in Fig. A1, is decomposed into 4 steps. For each step, the corresponding model was trained using 5-fold cross-validation.
1. nnU-Net for prostate gland segmentation. A baseline nnU-Net [1] was trained to segment the prostate gland. The network took as input 3 modalities (T2w, ADC, DWI), and the ground truth was AI-derived prostate masks. The training parameters were chosen by the nnU-Net heuristic. Among others, the chosen loss was a combination of Cross-Entropy (CE) and Dice.
2. nnU-Net for lesion segmentation. Our objective was to create a network that tends to over-segment cancer lesions. We considered 3 modalities (T2w, ADC, DWI) as inputs, and the ground truth was lesion masks annotated by experts if available, otherwise masks from an AI algorithm. Based on the prostate segmentation algorithm trained in 1), we cropped the images along the xy-axes, thus allowing the network to focus on the prostate while accelerating the training of the segmentation algorithm. Finally, we relied on a modified version of the nnU-Net segmenting both the prostate gland and lesions, in which we used a different loss function depending on the segmented structure. We used the common combination of Dice and CE to segment the prostate gland, and a combination of Tversky loss [2] and CE to segment the lesions and force the network to over-segment them. The β parameter of the Tversky loss was arbitrarily set to 0.7. In inference mode, an ensemble prediction of these 5 models is used to provide a lesion map of voxel-wise softmax outputs. Raw output softmax maps were then post-processed into probabilistic lesion detection maps using the following approach: (i) the softmax map was uncropped back to native T2w space; (ii) voxel-level probabilities ≤ 0.1 were set to 0; (iii) the resulting probabilistic map was decomposed into connected components; (iv) a single lesion-level probability was assigned to each connected component by averaging the corresponding voxel-level probabilities.
3. nnDetection for lesion detection. A baseline nnDetection [3], based on the Retina U-Net model [4], was trained to detect cancer lesions. The network took as input 3 modalities (T2w, ADC, DWI), and the ground truth was lesion masks annotated by experts if available, otherwise masks from an AI algorithm. The nnDetection algorithm returned bounding box outputs around suspicious regions of the image. Only bounding boxes that spatially matched both the nnU-Net lesion outputs (with a minimum IoU of 0.1) and the prostate gland mask (at least 10% within the prostate gland) were retained as output of the nnDetection model.
4. Final ensembling. Final detection maps were created by combining the results of all pipeline components, as well as complementary biological information. The spatial localization of lesions was entirely determined by the lesion segmentation method detailed in 2). The corresponding probabilities were computed by ensembling: (i) the average softmax score from the lesion segmentation method detailed in 2); (ii) the detection score from the lesion detection method detailed in 3); (iii) the PSA measurement normalized by the total volume of the gland as measured by the prostate segmentation method detailed in 1). In case of missing PSA information, we performed imputation with the mean PSA value from the training dataset.
Once the averaged softmax score, the box prediction, and the PSA density had been associated as a trio, the corresponding lesion presence probabilities were combined with a logistic regression model.
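The post-processing of the lesion softmax map described in the lesion segmentation step (thresholding, connected components, per-lesion averaging) can be sketched as follows. This is a minimal illustration and not the team's code, which is private: the function name `softmax_to_detection_map` is hypothetical, the uncropping step is omitted, and the connected-component connectivity (SciPy's default, face-adjacent voxels) is an assumption because the source does not state it.

```python
import numpy as np
from scipy import ndimage


def softmax_to_detection_map(softmax, threshold=0.1):
    """Convert a voxel-wise lesion softmax map into a lesion detection map.

    Voxels with probability <= threshold are set to 0, the remaining voxels
    are grouped into connected components, and every voxel of a component
    receives the mean probability of that component.
    """
    softmax = np.asarray(softmax, dtype=np.float64)
    kept = np.where(softmax > threshold, softmax, 0.0)

    # Assumption: default SciPy connectivity (face-adjacent voxels only).
    labels, num_lesions = ndimage.label(kept > 0)

    detection_map = np.zeros_like(kept)
    for lesion_id in range(1, num_lesions + 1):
        mask = labels == lesion_id
        detection_map[mask] = kept[mask].mean()
    return detection_map
```

Each connected component in the returned map carries a single value, so the map can be read as one candidate lesion per component with one likelihood score.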
Figure: Overview of the proposed ensembling method, using PSA density, voxel-based semantic segmentation, and box-based object detection model predictions.
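As a rough illustration of the ensembling idea, the sketch below computes PSA density with mean imputation and passes the three lesion-level inputs through a logistic function. It is not the team's model: the function names, the weights, and the bias are hypothetical placeholders, and this simplified form has 4 parameters, whereas the source reports 9 parameters (3 per input) without detailing their structure.

```python
import math


def psa_density(psa, prostate_volume_ml, mean_training_psa):
    """PSA (ng/mL) divided by the gland volume (mL) from the segmentation.

    A missing PSA value (None or NaN) is replaced by the mean PSA of the
    training dataset, as described in the text.
    """
    if psa is None or math.isnan(psa):
        psa = mean_training_psa
    return psa / prostate_volume_ml


def lesion_probability(seg_score, det_score, psad, weights, bias):
    """Combine the three inputs with a plain logistic model.

    `weights` and `bias` are hypothetical; fitted values are not published.
    """
    z = bias + weights[0] * seg_score + weights[1] * det_score + weights[2] * psad
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative call with made-up numbers (not taken from the source).
psad = psa_density(psa=None, prostate_volume_ml=50.0, mean_training_psa=10.0)
prob = lesion_probability(0.6, 0.7, psad, weights=(2.0, 2.0, 3.0), bias=-2.5)
```

With positive weights, a higher segmentation score, detection score, or PSA density each raises the combined lesion probability, which is the qualitative behaviour one would expect from such an ensemble.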
Model parameters:
- Total parameters for prostate gland segmentation with nnU-Net: 44,797,984 x5
- Total parameters for lesion segmentation with nnU-Net: 30,560,704 x5
- Total parameters for lesion detection with nnDetection: 24,720,004 x5
- Total parameters for logistic regression for final ensembling: 9 (3 parameters per input)
Inference setup: During inference, nnU-Net first produced a prostate gland segmentation. Second, the input sequences (T2w, DWI and ADC map) were cropped around the prostate gland and fed into the nnU-Net for lesion segmentation. The raw softmax outputs were then post-processed as described in the ‘Training setup’ section. Third, the input images (T2w, DWI and ADC map) were passed through nnDetection. The model only allowed cropping if the input images were larger than [81, 192, 192] mm. Below this size, no cropping was performed. If the input image was cropped, the model output was uncropped back into the native space. The detection algorithm returned scored bounding boxes around suspicious regions in the image. Final ensembling was performed using voxel-based predictions, box-based predictions, and PSA values to create a final detection map.
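The rule for retaining nnDetection boxes (minimum IoU of 0.1 with the nnU-Net lesion output, and at least 10% within the prostate gland) could look like the sketch below. Several details are assumptions because the source does not specify them: boxes are given as end-exclusive voxel indices `(z0, y0, x0, z1, y1, x1)`, the IoU is computed against the bounding box of each connected lesion component, and the 10% criterion is read as the fraction of box voxels inside the gland mask. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def box_iou(a, b):
    """IoU of two axis-aligned 3D boxes (z0, y0, x0, z1, y1, x1), end-exclusive."""
    a, b = np.asarray(a), np.asarray(b)
    overlap = np.clip(np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3]), 0, None)
    inter = overlap.prod()
    union = (a[3:] - a[:3]).prod() + (b[3:] - b[:3]).prod() - inter
    return inter / union if union > 0 else 0.0


def keep_box(box, lesion_mask, prostate_mask, min_iou=0.1, min_inside=0.1):
    """Return True if a detection box passes both spatial criteria."""
    z0, y0, x0, z1, y1, x1 = box
    prostate_mask = np.asarray(prostate_mask, dtype=bool)

    # Assumed reading of "at least 10% within the prostate gland":
    # fraction of the box's voxels that fall inside the gland mask.
    inside = prostate_mask[z0:z1, y0:y1, x0:x1].mean()
    if inside < min_inside:
        return False

    # Assumed IoU target: bounding box of each segmented lesion component.
    labels, _ = ndimage.label(np.asarray(lesion_mask, dtype=bool))
    for slices in ndimage.find_objects(labels):
        lesion_box = [s.start for s in slices] + [s.stop for s in slices]
        if box_iou(box, lesion_box) >= min_iou:
            return True
    return False
```

A box that lies inside the gland but overlaps no segmented lesion is rejected, which matches the text's statement that lesion localization is driven by the segmentation output.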
References:
1. F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211. doi:10.1038/s41592-020-01008-z
2. S. S. M. Salehi, D. Erdogmus, and A. Gholipour (2017, September). Tversky loss function for image segmentation using 3D fully convolutional deep networks. In International Workshop on Machine Learning in Medical Imaging (pp. 379-387). Cham: Springer International Publishing. doi:10.1007/978-3-319-67389-9_44
3. M. Baumgartner, P. F. Jäger, F. Isensee, and K. H. Maier-Hein (2021). nnDetection: a self-configuring method for medical object detection. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24 (pp. 530-539). Springer International Publishing. doi:10.1007/978-3-030-87240-3_51
4. P. F. Jaeger, S. A. Kohl, S. Bickelhaupt, F. Isensee, T. A. Kuder, H. P. Schlemmer, and K. H. Maier-Hein (2020, April). Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. In Machine Learning for Health Workshop (pp. 171-183). PMLR. https://proceedings.mlr.press/v116/jaeger20a
Validation and Performance
This algorithm was evaluated on the PI-CAI Testing Cohort. This hidden testing cohort included prostate MRI examinations from 1000 patients across four centers, including 197 cases from an external unseen center. Histopathology and a follow-up period of at least 3 years were used to establish the reference standard. See the PI-CAI paper for more information [1].
Patient-level diagnosis performance is evaluated using the Area Under Receiver Operating Characteristic (AUROC) metric. Lesion-level detection performance is evaluated using the Average Precision (AP) metric. The overall score used to rank each AI algorithm is the average of both task-specific metrics: Overall Ranking Score = (AP + AUROC) / 2.
This algorithm achieved an AUROC of 0.934, an AP of 0.693, and an Overall Ranking Score of 0.814.
The Free-Response Receiver Operating Characteristic (FROC) curve is used for secondary analysis of AI detections (as recommended in Penzkofer et al., 2022). We highlight the performance on the FROC curve using the SensX metric. SensX refers to the sensitivity of a given AI system at detecting clinically significant prostate cancer (i.e., Gleason grade group ≥ 2 lesions) on MRI, given that it generates the same number of false positives per examination as the PI-RADS ≥ X operating point of radiologists. Here, by radiologists, we refer to the radiology readings that were historically made for these cases during multidisciplinary routine practice. Across the PI-CAI testing leaderboards (Open Development Phase - Testing Leaderboard, Closed Testing Phase - Testing Leaderboard), SensX is computed at thresholds that are specific to the testing cohort (i.e., depending on the radiology readings and set of cases).
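As a quick arithmetic check of the ranking formula, the snippet below applies it to the rounded AUROC and AP values quoted above. The function name is hypothetical. Because the inputs are already rounded to three decimals, the result matches the reported Overall Ranking Score only up to rounding.

```python
def overall_ranking_score(ap, auroc):
    """Average of the lesion-level AP and the patient-level AUROC."""
    return (ap + auroc) / 2


score = overall_ranking_score(ap=0.693, auroc=0.934)
print(f"{score:.4f}")  # 0.8135, consistent with the reported 0.814 up to rounding
```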
This algorithm achieved a Sens3 of 0.799, a Sens4 of 0.779, and a Sens5 of 0.616.
Figure. Diagnostic performance of the top five AI algorithms (N. Debs et al. [Guerbet Research, France], Y. Yuan et al. [University of Sydney, Australia], H. Kan et al. [University of Science and Technology, China], C. Li et al. [Stanford University, United States], and A. Karagöz et al. [Istanbul Technical University, Turkey]), and the AI system ensembled from all five methods, across the 400 cases used in the reader study (left column) and the full hidden testing cohort of 1000 cases (right column). Case-level diagnosis performance was evaluated using receiver operating characteristic curves and the AUROC metric (top row), while lesion-level detection performance was evaluated using precision-recall curves and the AP metric (middle row). Secondary analysis of lesion-level detection performance was conducted using FROC curves (bottom row).
- A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1
Uses and Directions
- For research use only. This algorithm is intended to be used only on biparametric prostate MRI examinations of patients with raised PSA levels or clinical suspicion of prostate cancer. This algorithm should not be used in different patient demographics.
- Benefits: AI-based risk stratification for clinically significant prostate cancer using prostate MRI can potentially aid the diagnostic pathway of prostate cancer, reducing over-treatment and unnecessary biopsies.
- Target population: This algorithm was trained on patients with raised PSA levels or clinical suspicion of prostate cancer, without prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), without prior positive biopsies, without artifacts, and with reasonably well-aligned sequences.
- MRI scanner: This algorithm was trained and evaluated exclusively on prostate biparametric MRI scans acquired with various commercial 1.5 Tesla or 3 Tesla scanners using surface coils from Siemens Healthineers, Erlangen, Germany or Philips Medical Systems, Eindhoven, Netherlands. It does not account for vendor-neutral properties or domain adaptation, and in turn, the compatibility with scans derived using any other MRI scanner or those using endorectal coils is unknown.
- Sequence alignment and position of the prostate: While the input images (T2W, HBV, ADC) can be of different spatial resolutions, the algorithm assumes that they are co-registered or aligned reasonably well.
- General use: This model is intended to be used by radiologists for predicting clinically significant prostate cancer in biparametric MRI examinations. The model is not a diagnostic for cancer and is not meant to guide or drive clinical care. This model is intended to complement other pieces of patient information in order to determine the appropriate follow-up recommendation.
- Appropriate decision support: The model identifies lesion X as at a high risk of being malignant. The referring radiologist reviews the prediction along with other clinical information and decides the appropriate follow-up recommendation for the patient.
- Before using this model: Test the model retrospectively and prospectively on a diagnostic cohort that reflects the target population that the model will be used upon to confirm the validity of the model within a local setting.
- Safety and efficacy evaluation: To be determined in a clinical validation study.
Warnings
- Risks: Even if used appropriately, clinicians using this model can misdiagnose cancer. Delays in cancer diagnosis can lead to metastasis and mortality. Patients who are incorrectly treated for cancer can be exposed to risks associated with unnecessary interventions and treatment costs related to follow-ups.
- Inappropriate Settings: This model was not trained on MRI examinations of patients with prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), prior positive biopsies, artifacts or misalignment between sequences. Hence it is susceptible to faulty predictions and unintended behaviour when presented with such cases. Do not use the model in the clinic without further evaluation.
- Clinical rationale: The model is not interpretable and does not provide a rationale for high risk scores. Clinical end users are expected to place the model output in context with other clinical information to make the final determination of diagnosis.
- Inappropriate decision support: This model may not be accurate outside of the target population. This model is not designed to guide clinical diagnosis and treatment for prostate cancer.
- Generalizability: This model was developed with prostate MRI examinations from Radboud University Medical Center, Ziekenhuisgroep Twente, and Prostaat Centrum Noord-Nederland. Do not use this model in an external setting without further evaluation.
- Discontinue use if: Clinical staff raise concerns about the utility of the model for the intended use case or large, systematic changes occur at the data level that necessitate re-training of the model.
Common Error Messages
The inference progress for this algorithm is incorrectly captured as "error messages", resulting in each case being predicted with "Succeeded, with warnings". This does not mean actual warnings occurred, although this can still be the case.
Error message | Solution |
---|---|
Found different values for fold, will overwrite 0 with -1 | This warning can be ignored |
Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). doi:10.1038/s41746-020-0253-3