BDAV_Y (Y. Yuan, et al.; Australia) algorithm trained on PI-CAI: Private and Public Training Dataset
About
- Coronal T2 Prostate MRI (Coronal T2 MRI of the Prostate)
- Transverse T2 Prostate MRI (Transverse T2 MRI of the Prostate)
- Sagittal T2 Prostate MRI (Sagittal T2 MRI of the Prostate)
- Transverse HBV Prostate MRI (Transverse High B-Value Prostate MRI)
- Transverse ADC Prostate MRI (Transverse Apparent Diffusion Coefficient Prostate MRI)
- Clinical Information Prostate MRI (Clinical information to support clinically significant prostate cancer detection in prostate MRI. Provided information: patient age in years at the time of examination (patient_age), PSA level in ng/mL as reported (PSA_report or PSA), PSA density in ng/mL^2 as reported (PSAD_report), prostate volume as reported (prostate_volume_report), prostate volume derived from automatic whole-gland segmentation (prostate_volume_automatic), scanner manufacturer (scanner_manufacturer), scanner model name (scanner_model_name), diffusion b-value of (calculated) high b-value diffusion map (diffusion_high_bvalue), Malignant Neoplasm Histotype (histology_type), Prostate Imaging-Reporting and Data System (PIRADS), Neural invasion (neural_invasion, yes/no), Vascular invasion (vascular_invasion, yes/no), Lymphatic invasion (lymphatic_invasion, yes/no). Values acquired from radiology reports will be missing, if not reported.)
- Case-level Cancer Likelihood Prostate MRI (Case-level likelihood of harboring clinically significant prostate cancer, in range [0,1].)
- Transverse Cancer Detection Map Prostate MRI (Single-class, detection map of clinically significant prostate cancer lesions in 3D, where each voxel represents a floating point in range [0,1].)
Challenge Performance
Date | Challenge | Phase | Rank |
---|---|---|---|
Sept. 7, 2023 | PI-CAI | Closed Testing Phase - Testing (Final Ranking) | 2 |
June 13, 2024 | PI-CAI | Closed Testing Phase - Tuning | 2 |
Model Facts
Summary
This algorithm represents the submission from the BDAV_Y team (Y. Yuan, et al.; Australia) to the PI-CAI challenge [1]. We independently retrained this algorithm using the PI-CAI Private and Public Training Dataset (9107 cases from 8028 patients, comprising a sequestered dataset of 7607 cases and the public dataset of 1500 cases). This algorithm performs two tasks: it localizes and classifies each clinically significant prostate cancer lesion (if any) with a 0–100% likelihood score, and it classifies the overall case with a 0–100% likelihood score for clinically significant prostate cancer diagnosis.
To this end, this model uses biparametric MRI data. Specifically, this algorithm uses the axial T2-weighted MRI scan, the axial apparent diffusion coefficient map, and the calculated or acquired axial high b-value scan.
- A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1
Mechanism
Team: BDAV_Y
Yuan Yuan (1), Euijoon Ahn (2), Dagan Feng (1,3), Mohamed Khadra (4), Jinman Kim (1)
(1) School of Computer Science, Faculty of Engineering, The University of Sydney, New South Wales, Australia
(2) College of Science & Engineering, James Cook University, Cairns, Australia
(3) Institute of Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
(4) Department of Urology, Nepean Hospital, Kingswood, Australia
Contact: yyua9990@uni.sydney.edu.au, euijoon.ahn@jcu.edu.au, dagan.feng@sydney.edu.au, mohamed.khadra@health.nsw.gov.au, jinman.kim@sydney.edu.au.
Code availability: Private source code.
Trained model availability: grand-challenge.org/algorithms/pi-cai-pubpriv-bday/
Abstract: We propose a deep learning framework designed to automate the detection of clinically significant prostate cancer (csPCa) in biparametric magnetic resonance imaging (bpMRI) scans. Our framework comprises 1) a mesh network that integrates 2D and 3D feature representations, 2) a self-supervised pre-training scheme, and 3) the use of anatomical prior (zonal) information. Our results show that our framework outperformed other baseline deep learning approaches.
Data preparation: We conducted pre-processing on the bpMRI scans and converted them to the nnU-Net Raw Data Archive format, adhering to the official pre-processing technique recommended by the organizers of the PI-CAI challenge (github.com/DIAGNijmegen/picai_prep). The images were then resampled to the spacing of 3 mm × 0.5 mm × 0.5 mm.
- Zonal mask generation. We generated prostate zonal segmentation masks (central gland (CG) and peripheral zone (PZ)) for the cases within the PI-CAI dataset, employing a 3D nnU-Net model [1]. This model was trained using T2-weighted (T2W) and apparent diffusion coefficient (ADC) images, along with manual labels. These data sources were derived from three publicly available datasets: the Prostate158 dataset [2], the ProstateX dataset [3], and the MSD prostate dataset [4]. In total, the training dataset consisted of 394 cases. We then post-processed the generated zonal masks to remove spurious segmentations outside the prostate region.
- Self-supervised pre-training. We cropped the images and masks with the region of the prostate’s bounding box (based on the generated zonal mask) expanding 2.5 cm outward in all directions as the region of interest (ROI). We randomly cropped 12 sub-volumes from different locations of each ROI to obtain the input 3D cubes sized 16×64×64.
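The preprocessing steps above (resampling to a fixed spacing, cropping a physical margin around the prostate, and sampling pre-training cubes) can be sketched as follows. This is a minimal illustration, not the team's implementation: the challenge pipeline uses picai_prep with proper per-modality interpolation, whereas this sketch uses nearest-neighbour lookup, and the function names and uniform sampling scheme are assumptions.

```python
import numpy as np

def resample_to_spacing(volume, src_spacing, dst_spacing=(3.0, 0.5, 0.5)):
    """Resample a (z, y, x) volume to a target voxel spacing in mm,
    using nearest-neighbour lookup (a simplification of picai_prep)."""
    src = np.asarray(src_spacing, dtype=float)
    dst = np.asarray(dst_spacing, dtype=float)
    new_shape = np.maximum(1, np.round(np.array(volume.shape) * src / dst)).astype(int)
    # Map each output voxel centre back to the nearest source voxel.
    idx = [np.minimum((np.arange(n) * dst[i] / src[i]).astype(int), volume.shape[i] - 1)
           for i, n in enumerate(new_shape)]
    return volume[np.ix_(*idx)]

def crop_prostate_roi(volume, zonal_mask, spacing, margin_mm=25.0):
    """Crop to the prostate bounding box (from the zonal mask), expanded by a
    fixed physical margin on all sides; 25 mm matches the 2.5 cm above."""
    coords = np.argwhere(zonal_mask > 0)
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    margin = np.ceil(margin_mm / np.asarray(spacing, dtype=float)).astype(int)
    lo = np.maximum(lo - margin, 0)
    hi = np.minimum(hi + margin, volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

def random_subvolumes(roi, n=12, size=(16, 64, 64), seed=None):
    """Sample n random sub-volumes from the ROI as pre-training inputs."""
    rng = np.random.default_rng(seed)
    cubes = []
    for _ in range(n):
        start = [int(rng.integers(0, s - c + 1)) for s, c in zip(roi.shape, size)]
        cubes.append(roi[start[0]:start[0] + size[0],
                         start[1]:start[1] + size[1],
                         start[2]:start[2] + size[2]])
    return cubes
```

Note that spacings are given in (z, y, x) order to match the 3 mm × 0.5 mm × 0.5 mm target spacing and the 16×64×64 cube size stated above.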
Training setup: We employed the mesh network architecture detailed in [5] as the backbone of our framework. First, an image-restoration pretext task was used to pre-train the network in a self-supervised fashion. Four data transformation techniques were used in the image restoration task: non-linear transformation, local shuffling, inner-cutout and outer-cutout (detailed descriptions of these transformations are available in [6]). We used the mean-squared-error (MSE) loss for model optimization and the stochastic gradient descent (SGD) optimizer with a momentum of 0.9. A batch size of 24 was employed for pre-training, and the initial learning rate of 0.1 was gradually decreased following a step learning-rate policy. Training halted if no decrease in the loss was observed within 20 epochs. After pre-training was completed, the network was fine-tuned on the annotated data for csPCa detection and diagnosis. Data normalization and augmentation adhered to the configurations specified in nnU-Net [1]. For fine-tuning, we adopted a composite loss function that averages focal loss and cross-entropy loss. We adhered to the five-fold cross-validation partitioning stipulated by the PI-CAI challenge organizers. During training, the learning rate, initialized at 0.01, gradually diminished in accordance with the "poly" learning-rate policy over a maximum of 500 epochs. Training used a patch size of 16 × 320 × 320 voxels and a batch size of 2. Our framework was implemented in PyTorch and ran on an NVIDIA GeForce RTX 3090 GPU.
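The composite fine-tuning loss described above (the unweighted average of focal loss and cross-entropy) can be sketched in NumPy as follows. The focusing parameter gamma=2 is an assumed default, as the description does not state it, and this sketch operates on flattened per-voxel class scores rather than the full nnU-Net training pipeline.

```python
import numpy as np

def composite_loss(logits, targets, gamma=2.0, eps=1e-8):
    """Unweighted average of focal loss and cross-entropy loss.
    logits: (N, C) raw class scores; targets: (N,) integer class labels.
    gamma is the focal-loss focusing parameter (assumed value)."""
    z = logits - logits.max(axis=1, keepdims=True)        # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_true = probs[np.arange(len(targets)), targets]      # probability of the true class
    ce = -np.log(p_true + eps)                            # cross-entropy term
    focal = (1.0 - p_true) ** gamma * ce                  # focal term down-weights easy voxels
    return 0.5 * (ce.mean() + focal.mean())               # average the two loss terms
```

The focal term suppresses the contribution of well-classified voxels, which helps with the heavy background/lesion class imbalance typical of csPCa detection.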
Model parameters:
- Total number of parameters for nnU-Net: 44,798,816
- Total number of parameters for SSL pre-training: 8,781,024
- Total number of parameters for fine-tuning: 8,780,608 × 5
Inference setup: During inference, the images were pre-processed with the same settings as in training. We cropped the images and masks to the prostate's bounding box (based on the generated zonal mask), expanded 2.5 cm outward in all directions, as the ROI for prediction. We generated detection maps from the SoftMax predictions using a dynamic lesion extraction methodology, as described in [7]. The highest voxel value within the detection map was then used as the patient-level diagnosis probability. An ensemble combined the outputs of the five models produced by five-fold cross-validation.
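The last two inference steps, ensembling the five folds and reducing the detection map to a case-level score, can be sketched as below. This is a simplification: the actual pipeline first extracts discrete lesion candidates via dynamic lesion extraction [7], while this sketch keeps only the voxel-wise averaging and global max-pooling.

```python
import numpy as np

def ensemble_case_score(fold_detection_maps):
    """Voxel-wise average of the folds' detection maps, followed by a global
    maximum as the case-level csPCa likelihood in [0, 1]."""
    mean_map = np.mean(np.stack(fold_detection_maps, axis=0), axis=0)
    return mean_map, float(mean_map.max())
```

With five cross-validation folds, `fold_detection_maps` would hold the five per-fold detection maps for one case.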
Acknowledgements: We would like to acknowledge the PI-CAI challenge organizers and sponsors for developing the challenge, offering the data and evaluating the algorithms comprehensively.
References:
1. F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation," Nature Methods, vol. 18, no. 2, pp. 203-211, 2021. doi:10.1038/s41592-020-01008-z
2. L. C. Adams, M. R. Makowski, G. Engel, M. Rattunde, F. Busch, P. Asbach, S. M. Niehues, S. Vinayahalingam, B. van Ginneken, G. Litjens, et al., "Prostate158 - an expert-annotated 3T MRI dataset and algorithm for prostate cancer detection," Computers in Biology and Medicine, vol. 148, p. 105817, 2022. doi:10.1016/j.compbiomed.2022.105817
3. S. G. Armato III, H. Huisman, K. Drukker, L. Hadjiiski, J. S. Kirby, N. Petrick, G. Redmond, M. L. Giger, K. Cha, A. Mamonov, et al., "PROSTATEx challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images," Journal of Medical Imaging, vol. 5, no. 4, p. 044501, 2018. doi:10.1117/1.JMI.5.4.044501
4. M. Antonelli, A. Reinke, S. Bakas, K. Farahani, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, O. Ronneberger, R. M. Summers, et al., "The Medical Segmentation Decathlon," Nature Communications, vol. 13, no. 1, p. 4128, 2022. doi:10.1038/s41467-022-30695-9
5. Z. Dong, Y. He, X. Qi, Y. Chen, H. Shu, J.-L. Coatrieux, G. Yang, and S. Li, "MNet: Rethinking 2D/3D networks for anisotropic medical image segmentation," arXiv preprint arXiv:2205.04846, 2022.
6. Z. Zhou, V. Sodha, J. Pang, M. B. Gotway, and J. Liang, "Models Genesis," Medical Image Analysis, vol. 67, p. 101840, 2021. doi:10.1016/j.media.2020.101840
7. J. S. Bosma, A. Saha, M. Hosseinzadeh, I. Slootweg, M. de Rooij, and H. Huisman, "Semi-supervised learning with report-guided pseudo labels for deep learning-based prostate cancer detection using biparametric MRI," Radiology: Artificial Intelligence, p. e230031, 2023. doi:10.1148/ryai.230031
Validation and Performance
This algorithm was evaluated on the PI-CAI Testing Cohort. This hidden testing cohort included prostate MRI examinations from 1000 patients across four centers, including 197 cases from an external unseen center. Histopathology and a follow-up period of at least 3 years were used to establish the reference standard. See the PI-CAI paper for more information [1].
Patient-level diagnosis performance is evaluated using the Area Under Receiver Operating Characteristic (AUROC) metric. Lesion-level detection performance is evaluated using the Average Precision (AP) metric. Overall score used to rank each AI algorithm is the average of both task-specific metrics: Overall Ranking Score = (AP + AUROC) / 2.
This algorithm achieved an AUROC of 0.909, an AP of 0.690, and an Overall Ranking Score of 0.800.
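For clarity, the ranking formula above is just the unweighted mean of the two task metrics:

```python
def overall_ranking_score(auroc, ap):
    """PI-CAI overall ranking score: the unweighted mean of case-level AUROC
    and lesion-level AP."""
    return (auroc + ap) / 2.0
```

Plugging in this algorithm's metrics, (0.909 + 0.690) / 2 = 0.7995, which rounds to the reported 0.800.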
The Free-Response Receiver Operating Characteristic (FROC) curve is used for secondary analysis of AI detections (as recommended in Penzkofer et al., 2022). We highlight the performance on the FROC curve using the SensX metric. SensX refers to the sensitivity of a given AI system at detecting clinically significant prostate cancer (i.e., Gleason grade group ≥ 2 lesions) on MRI, given that it generates the same number of false positives per examination as the PI-RADS ≥ X operating point of radiologists. Here, by radiologists, we refer to the radiology readings that were historically made for these cases during multidisciplinary routine practice. Across the PI-CAI testing leaderboards (Open Development Phase - Testing Leaderboard, Closed Testing Phase - Testing Leaderboard), SensX is computed at thresholds that are specific to the testing cohort (i.e., depending on the radiology readings and set of cases).
This algorithm achieved a Sens3 of 0.781, a Sens4 of 0.754, and a Sens5 of 0.568.
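Conceptually, SensX is read off the AI's FROC curve at the radiologists' matched false-positive rate. A minimal sketch, assuming the FROC curve is given as sampled (FP-per-case, sensitivity) pairs sorted by increasing FP rate, and using linear interpolation between samples (an assumption of this sketch, not a documented detail of the official evaluation):

```python
import numpy as np

def sens_at_matched_fp(froc_fp_per_case, froc_sensitivity, radiologist_fp_per_case):
    """SensX: the AI's sensitivity at the FROC operating point whose false
    positives per examination equal the radiologists' PI-RADS >= X operating
    point on the same cohort."""
    return float(np.interp(radiologist_fp_per_case,
                           froc_fp_per_case, froc_sensitivity))
```

For example, Sens3 would be evaluated at the FP rate of the historical PI-RADS ≥ 3 readings on the testing cohort.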
Figure. Diagnostic performance of the top five AI algorithms (N. Debs et al. [Guerbet Research, France], Y. Yuan et al. [University of Sydney, Australia], H. Kan et al. [University of Science and Technology of China, China], C. Li et al. [Stanford University, United States], and A. Karagöz et al. [Istanbul Technical University, Turkey]), and the AI system ensembled from all five methods, across the 400 cases used in the reader study (left column) and the full hidden testing cohort of 1000 cases (right column). Case-level diagnosis performance was evaluated using receiver operating characteristic curves and the AUROC metric (top row), while lesion-level detection performance was evaluated using precision-recall curves and the AP metric (middle row). Secondary analysis of lesion-level detection performance was conducted using FROC curves (bottom row).
- A. Saha, J. S. Bosma, J. J. Twilt, B. van Ginneken, A. Bjartell, A. R. Padhani, D. Bonekamp, G. Villeirs, G. Salomon, G. Giannarini, J. Kalpathy-Cramer, J. Barentsz, K. H. Maier-Hein, M. Rusu, O. Rouvière, R. van den Bergh, V. Panebianco, V. Kasivisvanathan, N. A. Obuchowski, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, and the PI-CAI consortium. “Artificial Intelligence and Radiologists in Prostate Cancer Detection on MRI (PI-CAI): An International, Paired, Non-Inferiority, Confirmatory Study”. The Lancet Oncology 2024; 25(7): 879-887. doi:10.1016/S1470-2045(24)00220-1
Uses and Directions
- For research use only. This algorithm is intended to be used only on biparametric prostate MRI examinations of patients with raised PSA levels or clinical suspicion of prostate cancer. This algorithm should not be used in other patient demographics.
- Benefits: AI-based risk stratification for clinically significant prostate cancer using prostate MRI can potentially aid the diagnostic pathway of prostate cancer, reducing over-treatment and unnecessary biopsies.
- Target population: This algorithm was trained on patients with raised PSA levels or clinical suspicion of prostate cancer, without prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), without prior positive biopsies, without artifacts, and with reasonably well-aligned sequences.
- MRI scanner: This algorithm was trained and evaluated exclusively on prostate biparametric MRI scans acquired with various commercial 1.5 Tesla or 3 Tesla scanners using surface coils from Siemens Healthineers (Erlangen, Germany) or Philips Medical Systems (Eindhoven, the Netherlands). It does not account for vendor-neutral properties or domain adaptation; in turn, its compatibility with scans acquired on any other MRI scanner, or with endorectal coils, is unknown.
- Sequence alignment and position of the prostate: While the input images (T2W, HBV, ADC) can be of different spatial resolutions, the algorithm assumes that they are co-registered or aligned reasonably well.
- General use: This model is intended to be used by radiologists for predicting clinically significant prostate cancer in biparametric MRI examinations. The model is not a diagnostic for cancer and is not meant to guide or drive clinical care. This model is intended to complement other pieces of patient information in order to determine the appropriate follow-up recommendation.
- Appropriate decision support: The model identifies lesion X as at high risk of being malignant. The referring radiologist reviews the prediction along with other clinical information and decides the appropriate follow-up recommendation for the patient.
- Before using this model: Test the model retrospectively and prospectively on a diagnostic cohort that reflects the target population that the model will be used upon, to confirm the validity of the model within a local setting.
- Safety and efficacy evaluation: To be determined in a clinical validation study.
Warnings
- Risks: Even if used appropriately, clinicians using this model can misdiagnose cancer. Delays in cancer diagnosis can lead to metastasis and mortality. Patients who are incorrectly treated for cancer can be exposed to risks associated with unnecessary interventions and treatment costs related to follow-ups.
- Inappropriate Settings: This model was not trained on MRI examinations of patients with prior treatment (e.g. radiotherapy, transurethral resection of the prostate (TURP), transurethral ultrasound ablation (TULSA), cryoablation, etc.), prior positive biopsies, artifacts or misalignment between sequences. Hence it is susceptible to faulty predictions and unintended behaviour when presented with such cases. Do not use the model in the clinic without further evaluation.
- Clinical rationale: The model is not interpretable and does not provide a rationale for high risk scores. Clinical end users are expected to place the model output in context with other clinical information to make the final determination of diagnosis.
- Inappropriate decision support: This model may not be accurate outside of the target population. This model is not designed to guide clinical diagnosis and treatment for prostate cancer.
- Generalizability: This model was developed with prostate MRI examinations from Radboud University Medical Center, Ziekenhuisgroep Twente, and Prostaat Centrum Noord-Nederland. Do not use this model in an external setting without further evaluation.
- Discontinue use if: Clinical staff raise concerns about the utility of the model for the intended use case, or large, systematic changes occur at the data level that necessitate re-training of the model.
Common Error Messages
Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). 10.1038/s41746-020-0253-3