TwinTrack: Calibrated PDAC CT Segmentation

About

Editor:

Tryzis

Contact email:

tristan.kirscher@unistra.fr

Image Version:

a3edabe1-9428-44c9-9acf-6e96f4d2d6ba — Aug. 29, 2025

Associated publication:

Kirscher T, Ertl A, Maier-Hein K, Coubez X, Meyer P, Faisan S. TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation. arXiv. Published online May 19, 2026.

Summary

TwinTrack is a pancreatic ductal adenocarcinoma (PDAC) segmentation algorithm for contrast-enhanced CT that was designed for ambiguous multi-rater settings, where disagreement between experts reflects genuine uncertainty rather than simple annotation noise. The method combines a coarse-to-fine segmentation pipeline with a post-hoc calibration step that maps predicted tumor probabilities to the mean human response (MHR), i.e. the fraction of expert annotators labeling a voxel as tumor. This makes the output easier to interpret in practice: calibrated probabilities can be read as the expected proportion of experts who would assign the tumor label at a given voxel.

If you use TwinTrack in your research, please cite:

@misc{kirscher2026twintrack,
      title={TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation}, 
      author={Tristan Kirscher and Alexandra Ertl and Klaus Maier-Hein and Xavier Coubez and Philippe Meyer and Sylvain Faisan},
      year={2026},
      eprint={2604.15950},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.15950}, 
}

Mechanism

Target population. TwinTrack targets patients with pancreatic ductal adenocarcinoma (PDAC) on contrast-enhanced CT.

Algorithm description. The method uses a two-stage segmentation pipeline followed by multi-rater-aware post-hoc calibration:

A low-resolution nnU-Net first localizes the pancreas on the full CT and defines a high-recall region of interest (ROI) by dilation.
Inside this ROI, an ensemble of 3 independently trained high-resolution nnU-Nets refines the prediction.
The ensemble outputs are averaged and merged into a binary tumor vs. non-tumor prediction.
A post-hoc isotonic regression calibration model is then applied to the tumor probabilities so that they align with the voxel-wise mean human response (MHR) observed in a small multi-rater calibration set.
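The last two steps (ensemble averaging, then monotone calibration against MHR targets) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `average_ensemble`, `fit_isotonic`, and `apply_isotonic` are hypothetical, voxels are represented as flat Python lists, and the calibrator is a plain pool-adjacent-violators step function rather than any specific library routine.

```python
from bisect import bisect_left


def average_ensemble(prob_maps):
    """Voxel-wise mean of equally sized, flattened probability maps."""
    n_models = len(prob_maps)
    return [sum(values) / n_models for values in zip(*prob_maps)]


def fit_isotonic(scores, targets):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    scores  : uncalibrated tumor probabilities (one per voxel)
    targets : mean human response at the same voxels (assumed calibration set)
    Returns (thresholds, values): block upper bounds and block means.
    """
    # Aggregate voxels that share the same score so ties form one block.
    grouped = {}
    for s, t in zip(scores, targets):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + t, count + 1)

    blocks = []  # each block: [sum of targets, voxel count, largest score]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, right = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def apply_isotonic(model, score):
    """Map one uncalibrated probability to its calibrated value."""
    thresholds, values = model
    idx = min(bisect_left(thresholds, score), len(values) - 1)
    return values[idx]


# Toy example: three ensemble members, then calibration on four voxels.
mean_probs = average_ensemble([[0.2, 0.8], [0.4, 0.6], [0.6, 1.0]])
model = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0.0, 0.4, 0.2, 1.0])
calibrated = [apply_isotonic(model, p) for p in (0.05, 0.25, 0.9)]
```

Because the fitted mapping is monotone, it preserves the ranking of voxels produced by the segmenter and only rescales their scores, which is why the base models do not need to be retrained.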

Inputs and outputs.

Input: contrast-enhanced CT volume.
Output: calibrated voxel-wise PDAC probability map and corresponding tumor segmentation.
Interpretation of the probability map: values close to 1 indicate strong expected agreement among expert annotators that the voxel belongs to tumor; intermediate values reflect regions of expert disagreement or boundary ambiguity.
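To make the reading of the probability map concrete, the mean human response can be computed from binary rater masks as follows. This is an illustrative sketch only: the function name `mean_human_response` and the toy masks are made up, and masks are flattened to lists of 0/1 values.

```python
def mean_human_response(rater_masks):
    """Fraction of raters labeling each voxel as tumor.

    rater_masks: one flattened binary mask (0 or 1 per voxel) per rater.
    """
    n_raters = len(rater_masks)
    return [sum(votes) / n_raters for votes in zip(*rater_masks)]


# Five hypothetical raters, four voxels.
masks = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
mhr = mean_human_response(masks)
```

In this toy case the first voxel has full agreement (1.0), the second is labeled tumor by three of five raters (0.6), the third by one (0.2), and the last by none (0.0). A calibrated output of 0.6 at a voxel is meant to be read the same way.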

Interfaces

This algorithm implements all of the following input-output combinations:

	Inputs	Outputs
1	Thoracic-Abdominal CT image	PDAC Segmentation, PDAC Confidence

Input: Thoracic-Abdominal CT image
Slug: `thoracic-abdominal-ct-image`
Description: A CT image of the thorax and abdomen
Kind: Image
Read from: `/input/images/thoracic-abdominal-ct/<uuid>.mha` or `/input/images/thoracic-abdominal-ct/<uuid>.tif`

Output: PDAC Segmentation
Slug: `pdac-segmentation`
Description: Pancreatic ductal adenocarcinoma segmentation with 0-background and 1-PDAC
Kind: Segmentation
Write to: `/output/images/pdac-segmentation/<uuid>.mha` or `/output/images/pdac-segmentation/<uuid>.tif`

Output: PDAC Confidence
Slug: `pdac-confidence`
Description: Probabilistic map (0.0 - 1.0) of the confidence of PDAC segmentation
Kind: Heat Map
Write to: `/output/images/pdac-confidence/<uuid>.mha` or `/output/images/pdac-confidence/<uuid>.tif`
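A container implementing this interface has to locate a single input file and derive the two output locations. The sketch below shows one way to do that with the standard library. The helper names `find_input_image` and `output_paths` are hypothetical, and reusing the input file name for the outputs is an assumption, since the interface only specifies a `<uuid>` placeholder.

```python
from pathlib import Path

INPUT_DIR = Path("/input/images/thoracic-abdominal-ct")
OUTPUT_SLUGS = ("pdac-segmentation", "pdac-confidence")
ALLOWED_SUFFIXES = {".mha", ".tif"}


def find_input_image(input_dir=INPUT_DIR):
    """Return the single .mha or .tif file found in the input directory."""
    candidates = sorted(
        p for p in Path(input_dir).iterdir()
        if p.suffix.lower() in ALLOWED_SUFFIXES
    )
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one image in {input_dir}, found {len(candidates)}"
        )
    return candidates[0]


def output_paths(input_path, output_root=Path("/output")):
    """Map each output slug to its target file.

    Assumption: outputs reuse the input file name (and therefore its suffix).
    """
    name = Path(input_path).name
    return {
        slug: Path(output_root) / "images" / slug / name
        for slug in OUTPUT_SLUGS
    }
```

Failing loudly when zero or several candidate files are present avoids silently segmenting the wrong volume.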

Validation and Performance

The segmentation models were trained on PANORAMA batch 4. Multi-rater annotations were used only for post-hoc calibration: in the paper, the calibration mapping was fit on the CURVAS-PDACVI training split (40 CT scans, each with 5 expert annotations) without retraining the base segmenter. The results reported below are bootstrap means on the CURVAS-PDACVI test set (n = 64).
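For readers unfamiliar with the term, a bootstrap mean over per-case scores can be sketched as below. The number of resamples, the seed, and the function name `bootstrap_mean` are illustrative assumptions and are not taken from the paper.

```python
import random
from statistics import mean


def bootstrap_mean(per_case_scores, n_resamples=1000, seed=0):
    """Average of the sample mean over resamples drawn with replacement."""
    rng = random.Random(seed)
    n_cases = len(per_case_scores)
    resampled_means = [
        mean(rng.choices(per_case_scores, k=n_cases))
        for _ in range(n_resamples)
    ]
    return mean(resampled_means)


# Hypothetical per-case metric values for four test cases.
estimate = bootstrap_mean([0.42, 0.61, 0.55, 0.70])
```

The same resampled means can also be used to derive percentile confidence intervals, which is the usual reason for bootstrapping a test set of this size.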

Ambiguity-aware segmentation and calibration metrics

Method	TDSC ↑	ECE ↓	CRPS ↓
None (uncalibrated)	0.553	0.0156	6032.2
Single-rater calibration target	0.300	0.0209	10341.8
Hard-label calibration target	0.307	0.0209	9859.8
TwinTrack (MHR calibration)	0.569	0.0147	5924.4
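As a rough illustration of what a calibration error against soft targets measures, the sketch below computes a generic equal-width binned expected calibration error (ECE) between predicted probabilities and MHR values. This is an assumption-laden example: the binning scheme, the bin count, and the function name are not taken from the paper, and the exact metric definitions used in the table may differ.

```python
def expected_calibration_error(probs, targets, n_bins=10):
    """Binned ECE: weighted mean gap between predictions and soft targets.

    probs   : predicted tumor probabilities in [0, 1]
    targets : mean human response at the same voxels
    """
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, targets):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, t))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        mean_target = sum(t for _, t in members) / len(members)
        ece += len(members) / total * abs(mean_prob - mean_target)
    return ece


# Overconfident toy predictions: 0.9 predicted where only half the raters agree.
gap = expected_calibration_error([0.9, 0.9], [0.5, 0.5])
```

A perfectly calibrated map, where each predicted value equals the observed rater fraction, yields an ECE of zero under this definition.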

Challenge Performance

Date	Challenge	Phase	Rank
Aug. 2, 2025	CURVAS-PDACVI	Validation Phase	17
Aug. 27, 2025	CURVAS-PDACVI	Sanity Check Phase	1
Aug. 29, 2025	CURVAS-PDACVI	Testing Phase	1

Uses and Directions

TwinTrack was developed for research use in uncertainty-aware PDAC segmentation on contrast-enhanced CT, especially in settings where multiple expert annotations are available or clinically relevant disagreement is expected.

Appropriate uses include:

generating calibrated PDAC probability maps that explicitly reflect inter-rater ambiguity,
supporting ambiguity-aware segmentation studies,
supporting downstream analyses in which uncertainty-aware tumor extent is useful, such as vascular invasion assessment derived from the predicted tumor extent.

TwinTrack should not be used as a standalone diagnostic system, should not replace expert radiological review or manual delineation, and has not been validated in this work for other organs, tumor types, imaging modalities, or non-contrast CT acquisitions.

Warnings

The calibrated output represents an estimate of expert agreement, not a direct probability of malignancy, pathology confirmation, or clinical outcome.
Calibration improves the interpretability of the score distribution, but it does not guarantee correct localization if the underlying segmenter fails.
Performance may degrade under dataset shift, including different scanners, contrast phases, reconstruction settings, institutions, annotation protocols, or patient populations.
Boundary regions with intermediate probabilities should be interpreted as ambiguous areas, not as labeling errors.
This algorithm was evaluated in a challenge/research setting and should be used accordingly.

Common Error Messages

Left empty by the Algorithm Editors

Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). doi:10.1038/s41746-020-0253-3