TwinTrack: Calibrated PDAC CT Segmentation
About
Summary
TwinTrack is a pancreatic ductal adenocarcinoma (PDAC) segmentation algorithm for contrast-enhanced CT that was designed for ambiguous multi-rater settings, where disagreement between experts reflects genuine uncertainty rather than simple annotation noise. The method combines a coarse-to-fine segmentation pipeline with a post-hoc calibration step that maps predicted tumor probabilities to the mean human response (MHR), i.e. the fraction of expert annotators labeling a voxel as tumor. This makes the output easier to interpret in practice: calibrated probabilities can be read as the expected proportion of experts who would assign the tumor label at a given voxel.
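To make the MHR definition concrete, the following is a minimal sketch of how a voxel-wise MHR could be computed from several binary rater masks. The function name `mean_human_response` and the flattened-list representation of a mask are assumptions made for illustration; they are not taken from the TwinTrack code.

```python
from typing import List, Sequence


def mean_human_response(rater_masks: Sequence[Sequence[int]]) -> List[float]:
    """Voxel-wise fraction of raters that labeled each voxel as tumor.

    rater_masks holds one flattened binary mask (0 or 1) per rater;
    all masks must have the same number of voxels.
    """
    if not rater_masks:
        raise ValueError("need at least one rater mask")
    n_raters = len(rater_masks)
    n_voxels = len(rater_masks[0])
    if any(len(mask) != n_voxels for mask in rater_masks):
        raise ValueError("all rater masks must have the same number of voxels")
    return [sum(mask[i] for mask in rater_masks) / n_raters for i in range(n_voxels)]


# Hypothetical toy example: 5 raters, 4 voxels.
masks = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
mhr = mean_human_response(masks)
print(mhr)  # [1.0, 0.6, 0.2, 0.0]
```

With 5 raters the MHR can only take the values 0, 0.2, 0.4, 0.6, 0.8, and 1, so a calibrated probability of 0.6 reads as "about 3 of 5 experts would label this voxel as tumor".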
If you use TwinTrack in your research, please cite:
@misc{kirscher2026twintrack,
  title={TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation},
  author={Tristan Kirscher and Alexandra Ertl and Klaus Maier-Hein and Xavier Coubez and Philippe Meyer and Sylvain Faisan},
  year={2026},
  eprint={2604.15950},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2604.15950}
}
Mechanism
Target population. TwinTrack targets patients with pancreatic ductal adenocarcinoma (PDAC) imaged with contrast-enhanced CT.
Algorithm description. The method uses a two-stage segmentation pipeline followed by multi-rater-aware post-hoc calibration:
- A low-resolution nnU-Net first localizes the pancreas on the full CT and defines a high-recall region of interest (ROI) by dilation.
- Inside this ROI, an ensemble of 3 independently trained high-resolution nnU-Nets refines the prediction.
- The ensemble outputs are averaged and merged into a binary tumor vs. non-tumor prediction.
- A post-hoc isotonic regression calibration model is then applied to the tumor probabilities so that they align with the voxel-wise mean human response (MHR) observed in a small multi-rater calibration set.
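The last two steps above (ensemble averaging and isotonic calibration towards the MHR) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, voxels are represented as flat lists, the isotonic fit uses a plain pool-adjacent-violators procedure, and the fitted mapping is applied as a step function (library implementations such as scikit-learn's `IsotonicRegression` interpolate linearly between knots instead).

```python
from bisect import bisect_left
from itertools import groupby
from typing import List, Sequence, Tuple

IsotonicModel = Tuple[List[float], List[float]]


def average_ensemble(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-voxel tumor probabilities over the ensemble members."""
    n_members = len(member_probs)
    return [sum(voxel) / n_members for voxel in zip(*member_probs)]


def fit_isotonic(scores: Sequence[float], targets: Sequence[float]) -> IsotonicModel:
    """Pool-adjacent-violators fit of a non-decreasing map from score to target.

    Returns the upper score bound and the fitted value of each pooled block.
    """
    pairs = sorted(zip(scores, targets), key=lambda p: p[0])
    blocks = []  # each block: [sum of targets, count, largest score]
    for score, group in groupby(pairs, key=lambda p: p[0]):
        vals = [t for _, t in group]  # tied scores share one block
        blocks.append([sum(vals), len(vals), score])
        # Pool with the previous block while monotonicity is violated.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            t_sum, count, s_max = blocks.pop()
            blocks[-1][0] += t_sum
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def apply_isotonic(model: IsotonicModel, scores: Sequence[float]) -> List[float]:
    """Map raw scores to calibrated values with the fitted step function."""
    upper, values = model
    last = len(values) - 1
    return [values[min(bisect_left(upper, s), last)] for s in scores]


# Hypothetical toy data: 3 ensemble members, 5 calibration voxels.
members = [
    [0.0, 0.25, 0.5, 0.75, 1.0],
    [0.0, 0.5, 0.5, 0.75, 1.0],
    [0.0, 0.0, 0.5, 0.75, 1.0],
]
avg = average_ensemble(members)            # raw tumor probabilities
cal_mhr = [0.0, 0.4, 0.2, 0.8, 1.0]        # MHR of the same voxels
model = fit_isotonic(avg, cal_mhr)         # fit on the calibration set only
calibrated = apply_isotonic(model, [0.25, 0.5, 0.9])
print(calibrated)
```

In the toy data the second and third voxels violate monotonicity (MHR 0.4 then 0.2), so they are pooled to a common value of about 0.3. The base segmenter is untouched throughout, which matches the post-hoc nature of the calibration step.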
Inputs and outputs.
- Input: contrast-enhanced CT volume.
- Output: calibrated voxel-wise PDAC probability map and corresponding tumor segmentation.
- Interpretation of the probability map: values close to 1 indicate strong expected agreement among expert annotators that the voxel belongs to tumor; intermediate values reflect regions of expert disagreement or boundary ambiguity.
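As an illustration of how such a map might be read, the sketch below derives a majority-vote mask, counts ambiguous voxels, and estimates an expected tumor volume. The threshold of 0.5, the ambiguity band (0.2, 0.8), the voxel volume, and the function name are all illustrative assumptions; the source does not specify how the binary segmentation is derived from the calibrated map. The volume estimate relies on one fact: if probabilities match the MHR, their sum equals the tumor voxel count averaged over raters.

```python
from typing import Dict, Sequence, Tuple


def summarize_probability_map(
    probs: Sequence[float],
    voxel_volume_mm3: float = 1.0,        # assumed voxel size
    threshold: float = 0.5,               # assumed "majority of experts" cut-off
    band: Tuple[float, float] = (0.2, 0.8),  # assumed ambiguity band
) -> Dict[str, object]:
    """Summarize a flattened, calibrated tumor probability map."""
    low, high = band
    mask = [int(p >= threshold) for p in probs]
    n_ambiguous = sum(1 for p in probs if low < p < high)
    # Sum of MHR-calibrated probabilities = rater-averaged tumor voxel count.
    expected_volume = sum(probs) * voxel_volume_mm3
    return {
        "mask": mask,
        "n_ambiguous": n_ambiguous,
        "expected_volume_mm3": expected_volume,
    }


# Hypothetical toy map with 5 voxels of 2 mm^3 each.
summary = summarize_probability_map([1.0, 0.75, 0.5, 0.25, 0.0], voxel_volume_mm3=2.0)
print(summary)
# {'mask': [1, 1, 1, 0, 0], 'n_ambiguous': 3, 'expected_volume_mm3': 5.0}
```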
Validation and Performance
The segmentation models were trained on PANORAMA batch 4. Multi-rater annotations were used only for post-hoc calibration: in the paper, the calibration mapping was fit on the CURVAS-PDACVI training split (40 CT scans, each with 5 expert annotations) without retraining the base segmenter. The results reported below are bootstrap means on the CURVAS-PDACVI test set (n = 64).
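For readers unfamiliar with bootstrap means, a generic sketch follows. It resamples per-case scores with replacement and averages the resampled means. The number of resamples, the seed, the percentile interval, and the per-case scores are placeholders chosen for illustration; they are not the protocol or data behind the reported numbers.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_mean(
    values: Sequence[float], n_resamples: int = 1000, seed: int = 0
) -> Tuple[float, Tuple[float, float]]:
    """Bootstrap estimate of the mean with a 95% percentile interval."""
    rng = random.Random(seed)
    n = len(values)
    # Each resample draws n cases with replacement and records their mean.
    means: List[float] = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[n_resamples * 25 // 1000]
    hi = means[n_resamples * 975 // 1000 - 1]
    return mean(means), (lo, hi)


# Made-up per-case scores, for illustration only (not TwinTrack results).
scores = [0.62, 0.48, 0.71, 0.55, 0.30, 0.66, 0.59, 0.44]
estimate, (lo, hi) = bootstrap_mean(scores)
print(f"bootstrap mean = {estimate:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```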
Ambiguity-aware segmentation and calibration metrics
| Method | TDSC ↑ | ECE ↓ | CRPS ↓ |
|---|---|---|---|
| None (uncalibrated) | 0.553 | 0.0156 | 6032.2 |
| Single-rater calibration target | 0.300 | 0.0209 | 10341.8 |
| Hard-label calibration target | 0.307 | 0.0209 | 9859.8 |
| TwinTrack (MHR calibration) | 0.569 | 0.0147 | 5924.4 |
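To give a sense of what a calibration error against soft targets measures, here is a sketch of one common binned formulation of the expected calibration error (ECE), using the MHR as the target. The equal-width binning, the bin count, and the function name are assumptions; the exact ECE, TDSC, and CRPS definitions used for the table above are those of the paper and the challenge and may differ from this sketch.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], targets: Sequence[float], n_bins: int = 10
) -> float:
    """Binned ECE: bin-size-weighted gap between mean prediction and mean target."""
    # Per bin: [sum of predictions, sum of targets, count]
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, t in zip(probs, targets):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b][0] += p
        bins[b][1] += t
        bins[b][2] += 1
    n = len(probs)
    # (count / n) * |mean_pred - mean_target| simplifies to |sum_pred - sum_target| / n.
    return sum(abs(p_sum - t_sum) / n for p_sum, t_sum, count in bins if count)


# Hypothetical toy cases: targets are MHR values, not hard labels.
well_calibrated = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0.0, 0.2, 1.0, 0.8])
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [0.4, 0.4, 0.4, 0.4])
print(well_calibrated, overconfident)  # close to 0.0 and close to 0.5
```

In the first toy case the predictions match the average MHR within each bin, so the error is essentially zero even though individual voxels differ; in the second, every prediction overshoots the MHR by 0.5.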
Challenge Performance
| Date | Challenge | Phase | Rank |
|---|---|---|---|
| Aug. 2, 2025 | CURVAS-PDACVI | Validation Phase | 17 |
| Aug. 27, 2025 | CURVAS-PDACVI | Sanity Check Phase | 1 |
| Aug. 29, 2025 | CURVAS-PDACVI | Testing Phase | 1 |
Uses and Directions
TwinTrack was developed for research use in uncertainty-aware PDAC segmentation on contrast-enhanced CT, especially in settings where multiple expert annotations are available or clinically relevant disagreement is expected.
Appropriate uses include:
- generating calibrated PDAC probability maps that explicitly reflect inter-rater ambiguity,
- supporting ambiguity-aware segmentation studies,
- supporting downstream analyses that benefit from an uncertainty-aware tumor extent, such as vascular invasion assessment.
TwinTrack should not be used as a standalone diagnostic system, should not replace expert radiological review or manual delineation, and has not been validated in this work for other organs, tumor types, imaging modalities, or non-contrast CT acquisitions.
Warnings
- The calibrated output represents an estimate of expert agreement, not a direct probability of malignancy, pathology confirmation, or clinical outcome.
- Calibration improves the interpretability of the score distribution, but it does not guarantee correct localization if the underlying segmenter fails.
- Performance may degrade under dataset shift, including different scanners, contrast phases, reconstruction settings, institutions, annotation protocols, or patient populations.
- Boundary regions with intermediate probabilities should be interpreted as ambiguous areas, not as labeling errors.
- This algorithm was evaluated only in a challenge and research setting and should be treated as a research tool.