PIMed-Stanford (X. Li, S. Vesal, S. Saunders, et al.; USA) algorithm trained on PI-CAI: Private and Public Training Dataset



About

Creators:
Image Version:
77167cfd-03f6-41d0-aa80-109815e5767d
Last updated:
June 8, 2023, 7:49 a.m.
Inputs:
  • Coronal T2 Prostate MRI  (Coronal T2 MRI of the Prostate)
  • Transverse T2 Prostate MRI  (Transverse T2 MRI of the Prostate)
  • Sagittal T2 Prostate MRI  (Sagittal T2 MRI of the Prostate)
  • Transverse HBV Prostate MRI  (Transverse High B-Value Prostate MRI)
  • Transverse ADC Prostate MRI  (Transverse Apparent Diffusion Coefficient Prostate MRI)
  • Clinical Information Prostate MRI  (Clinical information to support clinically significant prostate cancer detection in prostate MRI. Provided information: patient age in years at the time of examination (patient_age), PSA level in ng/mL as reported (PSA_report or PSA), PSA density in ng/mL^2 as reported (PSAD_report), prostate volume as reported (prostate_volume_report), prostate volume derived from automatic whole-gland segmentation (prostate_volume_automatic), scanner manufacturer (scanner_manufacturer), scanner model name (scanner_model_name), diffusion b-value of (calculated) high b-value diffusion map (diffusion_high_bvalue), Malignant Neoplasm Histotype (histology_type), Prostate Imaging-Reporting and Data System (PIRADS), Neural invasion (neural_invasion, yes/no), Vascular invasion (vascular_invasion, yes/no), Lymphatic invasion (lymphatic_invasion, yes/no). Values acquired from radiology reports will be missing, if not reported.)
Outputs:
  • Case-level Cancer Likelihood Prostate MRI  (Case-level likelihood of harboring clinically significant prostate cancer, in range [0,1].)
  • Transverse Cancer Detection Map Prostate MRI  (Single-class, detection map of clinically significant prostate cancer lesions in 3D, where each voxel represents a floating point in range [0,1].)

Challenge Performance

Date | Challenge | Phase | Rank
Sept. 7, 2023 | PI-CAI | Closed Testing Phase - Testing (Final Ranking) | 4
June 13, 2024 | PI-CAI | Closed Testing Phase - Tuning | 7

Model Facts

Summary

This algorithm represents the submission from the PIMed team (X. Li, S. Vesal, S. Saunders, et al.; USA) to the PI-CAI challenge [1]. We independently retrained this algorithm using the PI-CAI Private and Public Training Dataset (9107 cases from 8028 patients, comprising a sequestered dataset of 7607 cases and the public dataset of 1500 cases). This algorithm performs two tasks: it localizes and classifies each clinically significant prostate cancer lesion (if any) with a 0–100% likelihood score, and it classifies the overall case with a 0–100% likelihood score for harboring clinically significant prostate cancer.

To this end, this model uses biparametric MRI data. Specifically, this algorithm uses the axial T2-weighted MRI scan, the axial apparent diffusion coefficient map, and the calculated or acquired axial high b-value scan.

  1. A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1

Mechanism

Team: PIMed-Stanford

Cynthia Xinran Li (1,3,#), Sulaiman Vesal (1,2,#), Sara Saunders (1,2,#), Simon John Christoph Soerensen (1,2), Hassan Jahanandish (1,2), Stefania Moroianu (1), Indrani Bhattacharya (1,2), Richard E. Fan (2), Geoffrey A. Sonn (2), Mirabela Rusu (1)

(1) Department of Radiology, Stanford University, Stanford CA 94305, United States

(2) Department of Urology, Stanford University, Stanford CA 94305, United States

(3) Institute for Computational and Mathematical Engineering, Stanford University, Stanford CA 94305, United States

(#) These authors contributed equally to this work

Contact: svesal@stanford.edu, mirabela.rusu@stanford.edu.

Code availability: Private source code.

Trained model availability: grand-challenge.org/algorithms/pi-cai-pubpriv-pimed/

Abstract: We developed an algorithm to detect and localize clinically significant prostate cancer using bi-parametric magnetic resonance images (bpMRI), which are commonly employed in prostate cancer detection, biopsy guidance, and treatment planning [1]. Our approach is an ensemble of three networks: a 3D-UNet [3] and two networks based on holistically nested edge detection (referred to here as the Stanford Prostate Cancer Network, or SPCNet). The two SPCNet variant architectures were improved relative to our prior study [4] to reduce false-positive lesion detections in a setting in which only one third of the subjects were in the positive class with clinically significant prostate cancer (csPCa, Gleason Grade Group >= 2). SPCNet-Decision features a prostate cancer detection head that evaluates the presence of cancer in individual MRI slices, while SPCNet-Clinical incorporates a trinary classification head that categorizes each slice into three classes: normal, containing clinically insignificant prostate cancer (ciPCa, Gleason Grade Group = 1), or containing csPCa. The ensemble leverages the strengths of each component to improve the accuracy and reliability of prostate cancer detection.

Data preparation: The following preprocessing steps were applied to the bpMRI (a code sketch follows the list):

  • Resampling to 0.5mm×0.5mm×3.0mm/voxel [5].
  • Cropping around the center of the bpMRI to 20 slices of 256 × 256 pixels [5].
  • Z-score normalization of the three MRI sequences (T2, ADC, DWI) independently, using intensities within the prostate, for the subjects in the PI-CAI: Public Training and Development Dataset [5].
  • Segmenting the prostate using our in-house model, ProGNet [6].
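
The following is a minimal sketch of this preprocessing pipeline, assuming SimpleITK and NumPy are available; the function names, the zero-padding behavior of the crop, and the epsilon in the normalization are illustrative assumptions rather than the team's actual code.

```python
# Hedged sketch of the preprocessing described above: resample to
# 0.5 x 0.5 x 3.0 mm/voxel, center-crop to 20 x 256 x 256, and z-score
# normalize each sequence using intensities inside the prostate.
import numpy as np
import SimpleITK as sitk

def resample(img: sitk.Image, spacing=(0.5, 0.5, 3.0)) -> sitk.Image:
    """Resample an image to the target voxel spacing (x, y, z in mm)."""
    old_spacing, old_size = img.GetSpacing(), img.GetSize()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(old_size, old_spacing, spacing)]
    return sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkLinear,
                         img.GetOrigin(), spacing, img.GetDirection(), 0.0,
                         img.GetPixelID())

def center_crop(arr: np.ndarray, shape=(20, 256, 256)) -> np.ndarray:
    """Crop (or zero-pad) a (z, y, x) array around its center to `shape`."""
    out = np.zeros(shape, dtype=arr.dtype)
    src, dst = [], []
    for a, t in zip(arr.shape, shape):
        lo = max((a - t) // 2, 0)
        src.append(slice(lo, lo + min(a, t)))
        lo = max((t - a) // 2, 0)
        dst.append(slice(lo, lo + min(a, t)))
    out[tuple(dst)] = arr[tuple(src)]
    return out

def zscore_in_prostate(arr: np.ndarray, prostate_mask: np.ndarray) -> np.ndarray:
    """Normalize using the mean/std of intensities inside the prostate only."""
    vals = arr[prostate_mask > 0]
    return (arr - vals.mean()) / (vals.std() + 1e-8)
```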

Training setup: Our approach ensembles three models: SPCNet-Decision, SPCNet-Clinical, and a 3D-UNet (see the Figure below). The SPCNet architectures are 2.5D models that take as input three consecutive slices from each of the three MRI sequences (T2w, ADC, and DWI) [4]. The architecture has multiple outputs at various image scales, which are then upsampled and fused to form the final output. To train ProGNet we used the provided AI-generated prostate segmentations (https://grand-challenge.org/algorithms/prostate-segmentation/).
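
As an illustration of the 2.5D input construction (not the team's code), the sketch below stacks slices i-1, i, i+1 from each of the three sequences into a 9-channel input; handling of edge slices by repetition is an assumption.

```python
# Illustrative 2.5D input assembly: for a target slice i, stack slices
# i-1, i, i+1 from T2w, ADC and DWI into a single 9-channel image.
import numpy as np

def make_25d_input(t2: np.ndarray, adc: np.ndarray, dwi: np.ndarray,
                   i: int) -> np.ndarray:
    """t2/adc/dwi are (slices, H, W) arrays; returns a (9, H, W) input."""
    idx = np.clip([i - 1, i, i + 1], 0, t2.shape[0] - 1)  # repeat edge slices
    return np.concatenate([t2[idx], adc[idx], dwi[idx]], axis=0)
```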

To reduce the false-positive predictions of SPCNet, we added a new classification head, turning training into a multi-task optimization problem. The classification head in SPCNet-Decision predicts whether a slice contains clinically significant prostate cancer (a binary task) [2]. It applies a 1×1 convolution layer and a fully connected layer to the probability maps generated by SPCNet and determines whether cancer is present in that slice. SPCNet-Decision outperforms the vanilla SPCNet model in both patient-level area under the receiver operating characteristic curve (AUROC) and lesion-level average precision (AP).
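
A minimal PyTorch sketch of a slice-level decision head of the kind described above is given below; the intermediate channel count and pooling size are assumptions, not the published architecture.

```python
# Hedged sketch of a decision head: a 1x1 convolution followed by a fully
# connected layer applied to the SPCNet cancer probability map, producing
# a single binary csPCa logit per slice.
import torch
import torch.nn as nn

class SliceDecisionHead(nn.Module):
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, 8, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(16)   # shrink the map before the FC layer
        self.fc = nn.Linear(8 * 16 * 16, 1)    # binary csPCa logit

    def forward(self, prob_map: torch.Tensor) -> torch.Tensor:
        # prob_map: (batch, 1, H, W) voxel-level cancer probabilities
        x = torch.relu(self.conv1x1(prob_map))
        x = self.pool(x).flatten(1)
        return self.fc(x)                      # train with BCEWithLogitsLoss
```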

We also addressed the challenge of distinguishing ciPCa from csPCa. We developed SPCNet-Clinical by using a trinary classification head to predict whether each slice contains only normal tissue, ciPCa, or csPCa. Since Gleason Grade Groups are only available at the patient level and not at the slice level, we generated slice-level ground-truth labels to supervise the classification head as follows: label 2 for slices with any cancer annotation in patients with csPCa, label 1 for slices with a cancer annotation in patients with ciPCa, and label 0 for all other slices. SPCNet-Clinical slightly outperformed SPCNet-Decision on AUROC and AP.
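
The slice-level labeling rule can be written compactly as below; the input names and array layout are illustrative assumptions.

```python
# Hedged sketch of the slice-level label rule described above: label 2 for
# annotated slices of csPCa patients, 1 for annotated slices of ciPCa
# patients, 0 otherwise.
import numpy as np

def slice_labels(annotation: np.ndarray, gleason_grade_group: int) -> np.ndarray:
    """annotation: (slices, H, W) lesion mask; returns one label per slice."""
    has_cancer = annotation.reshape(annotation.shape[0], -1).any(axis=1)
    labels = np.zeros(annotation.shape[0], dtype=np.int64)
    if gleason_grade_group >= 2:        # clinically significant PCa
        labels[has_cancer] = 2
    elif gleason_grade_group == 1:      # clinically insignificant PCa
        labels[has_cancer] = 1
    return labels
```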

We trained all variants of the SPCNet model with the Adam optimizer and a learning rate of 0.005, using a weighted cross-entropy loss computed within the prostate on the voxel-level predictions and a cross-entropy loss on the classification-head predictions. The prostate segmentation was used so that the gradient was computed only within the prostate boundaries. All models were trained for 25 epochs during the Open Development Phase and 50 epochs during the Closed Testing Phase. The epoch with the best validation performance was used for inference.
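
A minimal sketch of the masked voxel-level loss follows; the class weights, tensor shapes, and normalization are assumptions for illustration.

```python
# Hedged sketch: weighted cross-entropy computed only inside the prostate
# mask, plus the Adam setup with lr = 0.005 mentioned above.
import torch
import torch.nn.functional as F

def masked_weighted_ce(logits, target, prostate_mask, class_weights):
    # logits: (B, C, H, W), target: (B, H, W), prostate_mask: (B, H, W) in {0, 1}
    loss = F.cross_entropy(logits, target, weight=class_weights, reduction="none")
    loss = loss * prostate_mask                       # zero the loss outside the gland
    return loss.sum() / prostate_mask.sum().clamp(min=1)

# optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
```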

We trained several baseline 3D models locally; the standard 3D-UNet with residual connections outperformed the other models and was included in our approach. This network has 5 levels and was trained using a multi-class Dice plus cross-entropy loss with the Adam optimizer and an initial learning rate of 0.001. The models were trained for 200 epochs, and the learning rate was adjusted using a cosine annealing scheduler.
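
The 3D-UNet optimization setup can be sketched as below; the Dice implementation and smoothing constant are assumptions, and `unet` stands in for any 3D-UNet with residual connections.

```python
# Hedged sketch of the 3D-UNet training objective and schedule: multi-class
# Dice + cross-entropy, Adam with lr = 0.001, cosine annealing over 200 epochs.
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps: float = 1e-5):
    # logits: (B, C, D, H, W), target: (B, D, H, W) integer labels
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 4, 1, 2, 3).float()
    inter = (probs * onehot).sum(dim=(2, 3, 4))
    denom = probs.sum(dim=(2, 3, 4)) + onehot.sum(dim=(2, 3, 4))
    dice = 1 - ((2 * inter + eps) / (denom + eps)).mean()
    return ce + dice

# optimizer = torch.optim.Adam(unet.parameters(), lr=0.001)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```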

Model parameters:

  • Total number of parameters for SPCNet-Decision: 74,777,651 (×5 folds)

  • Total number of parameters for SPCNet-Clinical: 74,777,651 (×5 folds)

  • Total number of parameters for 3D-UNet: 47,991,551 (×5 folds)

Figure: Detection strategy: combine SPCNet-Decision, SPCNet-Clinical, and 3D-UNet. SPCNet-Decision is a holistically nested edge detection network combined with a slice-based binary classification head (csPCa vs. normal+ciPCa). SPCNet-Clinical has a similar architecture, with a trinary classification head (csPCa, ciPCa, normal). The 3D-UNet takes as input a 3-channel 3D MRI. At inference time, the probability maps output by all three models are averaged to produce a pixel-level probability of csPCa.

Inference setup: For inference, we first preprocessed the test cases as done for the training data. Next, we combined the prediction probability maps from the three models (SPCNet-Decision, SPCNet-Clinical, and 3D-UNet) using voxel-level averaging across the five folds of each model. For post-processing, we used the prostate gland segmentation from ProGNet [6] to restrict predictions to the prostate region.
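
A minimal sketch of this ensembling and post-processing is given below; the case-level score shown here (maximum of the masked map) is an assumption for illustration, not necessarily the team's rule.

```python
# Hedged sketch: average the voxel-level probability maps over all models
# and folds, then zero out predictions outside the ProGNet prostate mask.
import numpy as np

def ensemble_prediction(prob_maps: list, prostate_mask: np.ndarray):
    """prob_maps: 15 arrays (3 models x 5 folds), each (D, H, W) in [0, 1]."""
    detection_map = np.mean(prob_maps, axis=0) * (prostate_mask > 0)
    case_likelihood = float(detection_map.max())   # illustrative case-level score
    return detection_map, case_likelihood
```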

Acknowledgements: This work was supported by the Department of Radiology and the Department of Urology at Stanford University, National Institutes of Health, National Cancer Institute (R37CA260346). Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R37CA260346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References:

  1. Bhattacharya, I., Khandwala, Y.S., Vesal, S., Shao, W., Yang, Q., Soerensen, S.J., Fan, R.E., Ghanouni, P., Kunder, C.A., Brooks, J.D., Hu, Y., Rusu, M., Sonn, G.A.: A review of artificial intelligence in prostate cancer detection on imaging. Therapeutic Advances in Urology 14, 17562872221128791 (2022). doi:10.1177/17562872221128791

  2. Li, C.X., Bhattacharya, I., Vesal, S., Saunders, S., Soerensen, S.J., Fan, R.E., Sonn, G.A., Rusu, M.: Improving Automated Prostate Cancer Detection and Classification Accuracy with Multi-Scale Cancer Information. MICCAI MLMI workshop (2023). In Press. doi:10.1007/978-3-031-45673-2_34

  3. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. pp. 234–241. Springer International Publishing, Cham (2015) doi:10.1007/978-3-319-24574-4_28

  4. Seetharaman, A., Bhattacharya, I., Chen, L.C., Kunder, C.A., Shao, W., Soerensen, S.J.C., Wang, J.B., Teslovich, N.C., Fan, R.E., Ghanouni, P., Brooks, J.D., Too, K.J., Sonn, G.A., Rusu, M.: Automated detection of aggressive and indolent prostate cancer on magnetic resonance imaging. Medical Physics 48(6), 2960–2972 (2021). doi:10.1002/mp.14855

  5. A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1

  6. Soerensen, S.J.C., Fan, R.E., Seetharaman, A., Chen, L., Shao, W., Bhattacharya, I., Kim, Y.h., Sood, R., Borre, M., Chung, B.I., Sonn, G.A., Rusu, M.: Deep learning improves speed and accuracy of prostate gland segmentations on MRI for targeted biopsy. The Journal of Urology 206(3), 604–612 (2021). doi:10.1097/JU.0000000000001783

Validation and Performance

This algorithm was evaluated on the PI-CAI Testing Cohort. This hidden testing cohort included prostate MRI examinations from 1000 patients across four centers, including 197 cases from an external unseen center. Histopathology and a follow-up period of at least 3 years were used to establish the reference standard. See the PI-CAI paper for more information [1].

Patient-level diagnosis performance is evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC) metric. Lesion-level detection performance is evaluated using the Average Precision (AP) metric. The overall score used to rank each AI algorithm is the average of the two task-specific metrics: Overall Ranking Score = (AP + AUROC) / 2.
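
For concreteness, a simplified sketch of the ranking score follows. Note that the official PI-CAI evaluation computes lesion-level AP with the picai_eval package (which performs lesion extraction and hit matching); only the case-level AUROC and the final averaging are reproduced literally here.

```python
# Simplified sketch of Overall Ranking Score = (AP + AUROC) / 2.
from sklearn.metrics import roc_auc_score

def overall_ranking_score(case_labels, case_likelihoods, lesion_ap: float) -> float:
    auroc = roc_auc_score(case_labels, case_likelihoods)  # patient-level diagnosis
    return (lesion_ap + auroc) / 2                        # average of the two metrics
```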

This algorithm achieved an AUROC of 0.900, an AP of 0.659, and an Overall Ranking Score of 0.779.

The Free-Response Receiver Operating Characteristic (FROC) curve is used for secondary analysis of AI detections (as recommended in Penzkofer et al., 2022). We highlight performance on the FROC curve using the SensX metric. SensX refers to the sensitivity of a given AI system at detecting clinically significant prostate cancer (i.e., Gleason Grade Group ≥ 2 lesions) on MRI, given that it generates the same number of false positives per examination as the PI-RADS ≥ X operating point of radiologists. Here, by radiologists, we refer to the radiology readings that were historically made for these cases during multidisciplinary routine practice. Across the PI-CAI testing leaderboards (Open Development Phase - Testing Leaderboard, Closed Testing Phase - Testing Leaderboard), SensX is computed at thresholds that are specific to the testing cohort (i.e., depending on the radiology readings and the set of cases).
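
One plausible reading of the SensX definition is sketched below, assuming the AI's FROC curve has already been computed; the array names and the interpolation-free lookup are assumptions, not the official PI-CAI implementation.

```python
# Hedged sketch: read off the AI sensitivity at the largest false-positive
# rate per exam that does not exceed the radiologists' PI-RADS >= X
# operating point.
import numpy as np

def sens_at_matched_fp(ai_sensitivity: np.ndarray, ai_fp_per_exam: np.ndarray,
                       radiologist_fp_per_exam: float) -> float:
    """ai_sensitivity/ai_fp_per_exam: FROC curve samples for the AI system."""
    ok = ai_fp_per_exam <= radiologist_fp_per_exam
    return float(ai_sensitivity[ok].max()) if ok.any() else 0.0
```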

This algorithm achieved a Sens3 of 0.776, a Sens4 of 0.746, and a Sens5 of 0.543.

Figure. Diagnostic performance of the top five AI algorithms (N. Debs et al. [Guerbet Research, France], Y. Yuan et al. [University of Sydney, Australia], H. Kan et al. [University of Science and Technology, China], C. Li et al. [Stanford University, United States], and A. Karagöz et al. [Istanbul Technical University, Turkey]), and the AI system ensembled from all five methods, across the 400 cases used in the reader study (left column) and the full hidden testing cohort of 1000 cases (right column). Case-level diagnosis performance was evaluated using receiver operating characteristic curves and the AUROC metric (top row), while lesion-level detection performance was evaluated using precision-recall curves and the AP metric (middle row). Secondary analysis of lesion-level detection performance was conducted using FROC curves (bottom row).

  1. A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1

Uses and Directions

  • For research use only. This algorithm is intended to be used only on biparametric prostate MRI examinations of patients with raised PSA levels or clinical suspicion of prostate cancer. This algorithm should not be used in different patient demographics.

  • Benefits: AI-based risk stratification for clinically significant prostate cancer using prostate MRI can potentially aid the diagnostic pathway of prostate cancer, reducing over-treatment and unnecessary biopsies.

  • Target population: This algorithm was trained on patients with raised PSA levels or clinical suspicion of prostate cancer, without prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), without prior positive biopsies, without artifacts, and with reasonably-well aligned sequences.

  • MRI scanner: This algorithm was trained and evaluated exclusively on prostate biparametric MRI scans acquired with various commercial 1.5 Tesla or 3 Tesla scanners using surface coils from Siemens Healthineers (Erlangen, Germany) or Philips Medical Systems (Eindhoven, Netherlands). It does not account for vendor-neutral properties or domain adaptation, and in turn, its compatibility with scans derived using any other MRI scanner or those using endorectal coils is unknown.

  • Sequence alignment and position of the prostate: While the input images (T2W, HBV, ADC) can be of different spatial resolutions, the algorithm assumes that they are co-registered or aligned reasonably well.

  • General use: This model is intended to be used by radiologists for predicting clinically significant prostate cancer in biparametric MRI examinations. The model is not a diagnostic for cancer and is not meant to guide or drive clinical care. This model is intended to complement other pieces of patient information in order to determine the appropriate follow-up recommendation.

  • Appropriate decision support: The model identifies lesion X as being at high risk of malignancy. The referring radiologist reviews the prediction along with other clinical information and decides the appropriate follow-up recommendation for the patient.

  • Before using this model: Test the model retrospectively and prospectively on a diagnostic cohort that reflects the target population that the model will be used upon to confirm the validity of the model within a local setting.

  • Safety and efficacy evaluation: To be determined in a clinical validation study.

Warnings

  • Risks: Even if used appropriately, clinicians using this model can misdiagnose cancer. Delays in cancer diagnosis can lead to metastasis and mortality. Patients who are incorrectly treated for cancer can be exposed to risks associated with unnecessary interventions and treatment costs related to follow-ups.

  • Inappropriate Settings: This model was not trained on MRI examinations of patients with prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), prior positive biopsies, artifacts or misalignment between sequences. Hence it is susceptible to faulty predictions and unintended behaviour when presented with such cases. Do not use the model in the clinic without further evaluation.

  • Clinical rationale: The model is not interpretable and does not provide a rationale for high risk scores. Clinical end users are expected to place the model output in context with other clinical information to make the final determination of diagnosis.

  • Inappropriate decision support: This model may not be accurate outside of the target population. This model is not designed to guide clinical diagnosis and treatment for prostate cancer.

  • Generalizability: This model was developed with prostate MRI examinations from Radboud University Medical Center, Ziekenhuisgroep Twente, and Prostaat Centrum Noord-Nederland. Do not use this model in an external setting without further evaluation.

  • Discontinue use if: Clinical staff raise concerns about the utility of the model for the intended use case or large, systematic changes occur at the data level that necessitate re-training of the model.

Common Error Messages

Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). 10.1038/s41746-020-0253-3