TwinTrack: Calibrated PDAC CT Segmentation


About

Editor:
Tryzis
Image Version:
a3edabe1-9428-44c9-9acf-6e96f4d2d6ba — Aug. 29, 2025
Associated publication:
Kirscher T, Ertl A, Maier-Hein K, Coubez X, Meyer P, Faisan S. TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation. arXiv. Published online April 20, 2026.

Summary

TwinTrack is a pancreatic ductal adenocarcinoma (PDAC) segmentation algorithm for contrast-enhanced CT that was designed for ambiguous multi-rater settings, where disagreement between experts reflects genuine uncertainty rather than simple annotation noise. The method combines a coarse-to-fine segmentation pipeline with a post-hoc calibration step that maps predicted tumor probabilities to the mean human response (MHR), i.e. the fraction of expert annotators labeling a voxel as tumor. This makes the output easier to interpret in practice: calibrated probabilities can be read as the expected proportion of experts who would assign the tumor label at a given voxel.
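As a small worked example of the MHR definition above, the sketch below computes the voxel-wise fraction of raters who label each voxel as tumor. The function name `mean_human_response` and the flattened toy masks are illustrative assumptions, not part of the TwinTrack code base.

```python
def mean_human_response(rater_masks):
    """Voxel-wise fraction of raters labeling each voxel as tumor.

    rater_masks: list of equally long 0/1 lists, one flattened mask per rater
    (a hypothetical toy layout; real masks would be 3D arrays).
    """
    n_raters = len(rater_masks)
    if n_raters == 0:
        raise ValueError("need at least one rater mask")
    n_voxels = len(rater_masks[0])
    if any(len(mask) != n_voxels for mask in rater_masks):
        raise ValueError("all rater masks must have the same length")
    return [sum(mask[i] for mask in rater_masks) / n_raters
            for i in range(n_voxels)]


# Toy example: 5 raters, 4 voxels.
masks = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
mhr = mean_human_response(masks)
print(mhr)  # [1.0, 0.6, 0.2, 0.0]
```

A calibrated output of 0.6 at the second voxel would then read as "about three out of five experts are expected to label this voxel as tumor".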

If you use TwinTrack in your research, please cite:

@misc{kirscher2026twintrack,
      title={TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation}, 
      author={Tristan Kirscher and Alexandra Ertl and Klaus Maier-Hein and Xavier Coubez and Philippe Meyer and Sylvain Faisan},
      year={2026},
      eprint={2604.15950},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.15950}, 
}

Mechanism

Target population. TwinTrack targets patients with pancreatic ductal adenocarcinoma (PDAC) on contrast-enhanced CT.

Algorithm description. The method uses a two-stage segmentation pipeline followed by multi-rater-aware post-hoc calibration:

  1. A low-resolution nnU-Net first localizes the pancreas on the full CT and defines a high-recall region of interest (ROI) by dilation.
  2. Inside this ROI, an ensemble of 3 independently trained high-resolution nnU-Nets refines the prediction.
  3. The ensemble outputs are averaged and merged into a binary (tumor vs. non-tumor) probability map.
  4. A post-hoc isotonic regression calibration model is then applied to the tumor probabilities so that they align with the voxel-wise mean human response (MHR) observed in a small multi-rater calibration set.
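Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses flattened toy lists instead of CT volumes, a hand-written pool-adjacent-violators fit, and a step-function lookup without interpolation. The names `ensemble_mean`, `fit_isotonic`, and `apply_isotonic` are hypothetical.

```python
from bisect import bisect_left


def ensemble_mean(member_probs):
    """Average per-voxel tumor probabilities over ensemble members."""
    n_members = len(member_probs)
    return [sum(vals) / n_members for vals in zip(*member_probs)]


def fit_isotonic(scores, targets):
    """Pool-adjacent-violators fit of a non-decreasing map score -> target.

    Returns (xs, ys): the largest score covered by each block and the
    fitted value of that block.
    """
    # Pool targets that share the same score before running the algorithm.
    pooled = {}
    for x, y in zip(scores, targets):
        total, count = pooled.get(x, (0.0, 0))
        pooled[x] = (total + y, count + 1)

    blocks = []  # each block: [sum of targets, count, largest score covered]
    for x in sorted(pooled):
        total, count = pooled[x]
        blocks.append([total, count, x])
        # Merge backwards while the previous block's mean exceeds this one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            t, c, x_max = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = x_max
    xs = [b[2] for b in blocks]
    ys = [b[0] / b[1] for b in blocks]
    return xs, ys


def apply_isotonic(xs, ys, score):
    """Step-function lookup; scores above the last block are clamped."""
    idx = bisect_left(xs, score)
    return ys[min(idx, len(ys) - 1)]


# Hypothetical calibration set: raw tumor scores and their observed MHR.
cal_scores = [0.1, 0.3, 0.5, 0.6, 0.9]
cal_mhr = [0.0, 0.2, 0.6, 0.4, 1.0]
xs, ys = fit_isotonic(cal_scores, cal_mhr)

# Hypothetical outputs of three ensemble members for four voxels.
members = [
    [0.9, 0.6, 0.3, 0.1],
    [0.7, 0.5, 0.2, 0.0],
    [0.8, 0.55, 0.1, 0.05],
]
mean_probs = ensemble_mean(members)
calibrated = [apply_isotonic(xs, ys, p) for p in mean_probs]
```

Because the fitted map is monotone, calibration changes the values of the tumor scores but not their ordering, which is why the base segmenter does not need to be retrained.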

Inputs and outputs.

  • Input: contrast-enhanced CT volume.
  • Output: calibrated voxel-wise PDAC probability map and corresponding tumor segmentation.
  • Interpretation of the probability map: values close to 1 indicate strong expected agreement among expert annotators that the voxel belongs to tumor; intermediate values reflect regions of expert disagreement or boundary ambiguity.
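To illustrate how the two outputs relate, the sketch below derives a binary mask from a calibrated map and flags voxels with intermediate values. The 0.5 threshold and the 0.2 to 0.8 ambiguity band are assumptions chosen for illustration; the source does not state how the released segmentation is derived from the probability map.

```python
def segment_from_calibrated(prob_map, threshold=0.5):
    """Binary mask from a calibrated map.

    With an MHR-calibrated map, threshold=0.5 keeps voxels that at least
    half of the experts are expected to label as tumor (assumed choice).
    """
    return [1 if p >= threshold else 0 for p in prob_map]


def ambiguous_voxels(prob_map, low=0.2, high=0.8):
    """Flag voxels whose calibrated value lies strictly inside (low, high)."""
    return [low < p < high for p in prob_map]


# Toy calibrated map for four voxels.
calibrated_map = [0.98, 0.6, 0.4, 0.02]
segmentation = segment_from_calibrated(calibrated_map)
ambiguous = ambiguous_voxels(calibrated_map)
print(segmentation)  # [1, 1, 0, 0]
print(ambiguous)     # [False, True, True, False]
```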

Interfaces

This algorithm implements all of the following input-output combinations:

Interface 1

  • Input: Thoracic-Abdominal CT image
  • Outputs: PDAC Segmentation, PDAC Confidence

Validation and Performance

The segmentation models were trained on PANORAMA batch 4. Multi-rater annotations were used only for post-hoc calibration: in the paper, the calibration mapping was fit on the CURVAS-PDACVI training split (40 CT scans, each with 5 expert annotations) without retraining the base segmenter. The results reported below are bootstrap means on the CURVAS-PDACVI test set (n = 64).
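For readers unfamiliar with bootstrap means, here is a minimal sketch of case-level resampling. The number of resamples, the fixed seed, the percentile interval, and the per-case scores are all illustrative assumptions; the exact bootstrap protocol used in the paper is not described here.

```python
import random


def bootstrap_mean(values, n_resamples=1000, seed=0):
    """Mean of resampled means, plus a rough 95% percentile interval."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n cases with replacement and record the mean of the resample.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    estimate = sum(means) / n_resamples
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return estimate, (lower, upper)


# Made-up per-case scores, NOT values from the paper.
per_case_scores = [0.71, 0.42, 0.65, 0.58, 0.30, 0.77, 0.49, 0.61]
estimate, (lower, upper) = bootstrap_mean(per_case_scores)
```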

Ambiguity-aware segmentation and calibration metrics

Method                             TDSC ↑    ECE ↓     CRPS ↓
None (uncalibrated)                0.553     0.0156     6032.2
Single-rater calibration target    0.300     0.0209    10341.8
Hard-label calibration target      0.307     0.0209     9859.8
TwinTrack (MHR calibration)        0.569     0.0147     5924.4
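To give an intuition for the ECE column, the sketch below computes a generic binned expected calibration error against a soft target such as the MHR. It is an assumption-laden illustration: the challenge's exact ECE definition (binning scheme, voxel selection, aggregation across cases) may differ, and the function name is hypothetical.

```python
def expected_calibration_error(probs, targets, n_bins=10):
    """Binned ECE: count-weighted gap between mean prediction and mean target.

    targets may be soft values in [0, 1] (e.g. MHR), not only 0/1 labels.
    """
    # Per bin: [sum of predictions, sum of targets, count]
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, t in zip(probs, targets):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[b][0] += p
        bins[b][1] += t
        bins[b][2] += 1
    n = len(probs)
    # (count / n) * |mean_p - mean_t| simplifies to |sum_p - sum_t| / n.
    return sum(abs(sp - st) / n for sp, st, count in bins if count)


# Perfectly calibrated toy case: predictions equal the soft targets.
ece_perfect = expected_calibration_error([0.2, 0.6, 1.0], [0.2, 0.6, 1.0])

# Overconfident toy case: predicts 0.9 where only half the raters agree.
ece_overconfident = expected_calibration_error([0.9, 0.9], [0.5, 0.5])
```

In the first case the error is zero; in the second it is about 0.4, the gap between the predicted 0.9 and the observed agreement of 0.5.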

Challenge Performance

Date             Challenge        Phase                 Rank
Aug. 2, 2025     CURVAS-PDACVI    Validation Phase      17
Aug. 27, 2025    CURVAS-PDACVI    Sanity Check Phase    1
Aug. 29, 2025    CURVAS-PDACVI    Testing Phase         1

Uses and Directions

TwinTrack was developed for research use in uncertainty-aware PDAC segmentation on contrast-enhanced CT, especially in settings where multiple expert annotations are available or clinically relevant disagreement is expected.

Appropriate uses include:

  • generating calibrated PDAC probability maps that explicitly reflect inter-rater ambiguity,
  • supporting ambiguity-aware segmentation studies,
  • supporting downstream analyses in which uncertainty-aware tumor extent is useful, such as vascular invasion assessment derived from the predicted tumor extent.

TwinTrack should not be used as a standalone diagnostic system, should not replace expert radiological review or manual delineation, and has not been validated in this work for other organs, tumor types, imaging modalities, or non-contrast CT acquisitions.

Warnings

  • The calibrated output represents an estimate of expert agreement, not a direct probability of malignancy, pathology confirmation, or clinical outcome.
  • Calibration improves the interpretability of the score distribution, but it does not guarantee correct localization if the underlying segmenter fails.
  • Performance may degrade under dataset shift, including different scanners, contrast phases, reconstruction settings, institutions, annotation protocols, or patient populations.
  • Boundary regions with intermediate probabilities should be interpreted as ambiguous areas, not as labeling errors.
  • This algorithm was evaluated in a challenge/research setting and should be used accordingly.

Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). 10.1038/s41746-020-0253-3