3D N2V U Net DVT Inspired



About

Editor:
Contact email:
Image Version:
f2ec31a2-f60a-442a-8166-e33a6c3090ae — May 5, 2026

Summary

The algorithm is a 3D U-Net with a transformer bottleneck that adapts the Denoising Vision Transformers (DVT; Yang et al., 2024) framework to image-level denoising of calcium-imaging video stacks. The paper's core insight is that any ViT's output can be decomposed into three terms: a clean signal f(x), a position-dependent artifact g(E_pos), and a residual interaction h(x, E_pos). This decomposition transfers naturally to fluorescence microscopy, where the three terms map onto the true calcium signal, fixed sensor pattern noise, and signal-dependent shot/read noise, respectively. The network processes a noisy 3D patch through a conventional convolutional encoder (two downsampling stages) and passes the bottleneck features through a transformer block that explicitly implements the DVT decomposition: a learnable artifact field G, a 3-layer residual MLP, and a single-Transformer-block denoiser with new positional embeddings (the configuration the paper found best in Table 6, row d). A symmetric convolutional decoder with skip connections then reconstructs the output. The output is residual: the network predicts the noise component, which is subtracted from the input to yield the denoised stack.
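The decomposition and the residual output convention can be sketched numerically. In this illustrative numpy snippet, all arrays and magnitudes are synthetic stand-ins; the names are not the model's actual modules, and the "oracle" estimates stand in for what the artifact field, residual MLP, and denoiser block learn during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bottleneck tokens: [tokens, channels]
f_clean = rng.normal(size=(64, 32))            # clean signal f(x)
g_artifact = 0.1 * rng.normal(size=(64, 32))   # position artifact g(E_pos)
h_interact = 0.05 * rng.normal(size=(64, 32))  # interaction h(x, E_pos)

# DVT posits that a ViT's output decomposes as
#   y = f(x) + g(E_pos) + h(x, E_pos).
# The learnable artifact field absorbs g, the residual MLP models h, and
# the single-block denoiser recovers f. With oracle estimates the clean
# signal is recovered exactly:
y = f_clean + g_artifact + h_interact
f_hat = y - g_artifact - h_interact

# Residual output convention at the image level: predict the noise
# component, then subtract it from the noisy input.
predicted_noise = g_artifact + h_interact      # oracle stand-in for the net
denoised = y - predicted_noise
```

With perfect estimates `f_hat` and `denoised` both equal `f_clean`; in practice the network only approximates this separation.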

Mechanism

Target population

The algorithm targets researchers working with two-photon calcium-imaging recordings of neuronal populations (typically rodent cortex) used in systems neuroscience. It is a research tool for improving low-SNR fluorescence microscopy video — not for clinical use — so that downstream analyses such as ROI segmentation and spike inference become more reliable.

Algorithm description

A self-supervised 3D U-Net with a transformer bottleneck inspired by Denoising Vision Transformers (Yang et al., 2024). The bottleneck implements the paper's clean-signal / position-artifact / residual-noise decomposition via a learnable artifact field, a 3-layer residual MLP, and a single-Transformer-block denoiser with new positional embeddings. Training is zero-shot per stack: a 400-iteration temporal-median warmup is followed by 4000 iterations of 3D Noise2Void. Inference uses overlapping sliding windows with Gaussian blending.
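The Noise2Void iterations rely on blind-spot masking: a subset of voxels is replaced with neighboring values, and the loss is computed only at those voxels, so the network cannot learn the identity mapping. A minimal numpy sketch of 3D blind-spot masking follows; the voxel count and neighborhood radius here are assumed values, not the algorithm's published settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def n2v_mask_3d(patch, n_masked=64, radius=2):
    """Blind-spot masking for 3D Noise2Void (sketch): replace randomly
    chosen voxels with a random neighbor's value, and return the masked
    patch plus the masked coordinates (where the loss is evaluated)."""
    masked = patch.copy()
    F, H, W = patch.shape
    coords = np.stack([rng.integers(0, s, n_masked) for s in (F, H, W)], axis=1)
    for f, h, w in coords:
        # draw a nonzero offset within the blind-spot neighborhood
        while True:
            df, dh, dw = rng.integers(-radius, radius + 1, size=3)
            if (df, dh, dw) != (0, 0, 0):
                break
        masked[f, h, w] = patch[np.clip(f + df, 0, F - 1),
                                np.clip(h + dh, 0, H - 1),
                                np.clip(w + dw, 0, W - 1)]
    return masked, coords

patch = rng.normal(size=(16, 32, 32)).astype(np.float32)
masked, coords = n2v_mask_3d(patch)
```

During training, the reconstruction loss between the network output and the original patch would then be evaluated only at `coords`.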

Inputs and outputs

Input: a noisy calcium-imaging stack as a 3D TIFF of shape [F, H, W] (typically [1500, 490, 490]). Output: a denoised stack of identical shape and dtype, with calcium transients preserved and shot noise / fixed-pattern artifacts suppressed.
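A full [F, H, W] stack is too large for a single forward pass, so inference tiles it into overlapping windows and blends the per-window outputs with Gaussian weights, as noted in the Algorithm description. The following numpy sketch illustrates the scheme; the window size, stride, and Gaussian width are assumptions, and `model` is a placeholder for the trained network:

```python
import numpy as np

def gaussian_weight(shape, sigma_frac=0.125):
    """Separable Gaussian window that down-weights patch borders."""
    axes = []
    for s in shape:
        x = np.arange(s) - (s - 1) / 2.0
        axes.append(np.exp(-(x ** 2) / (2 * (sigma_frac * s) ** 2)))
    wf, wh, ww = axes
    return wf[:, None, None] * wh[None, :, None] * ww[None, None, :]

def sliding_window_denoise(stack, model, patch=(16, 64, 64), stride=(8, 32, 32)):
    """Overlapping sliding-window inference with Gaussian blending.
    Assumes every patch dimension is <= the matching stack dimension."""
    out = np.zeros(stack.shape, dtype=np.float64)
    norm = np.zeros(stack.shape, dtype=np.float64)
    weight = gaussian_weight(patch)

    def starts(size, p, s):
        # regular grid of window origins, plus one flush with the far edge
        return sorted(set(list(range(0, size - p + 1, s)) + [size - p]))

    for f0 in starts(stack.shape[0], patch[0], stride[0]):
        for h0 in starts(stack.shape[1], patch[1], stride[1]):
            for w0 in starts(stack.shape[2], patch[2], stride[2]):
                sl = (slice(f0, f0 + patch[0]),
                      slice(h0, h0 + patch[1]),
                      slice(w0, w0 + patch[2]))
                out[sl] += weight * model(stack[sl])
                norm[sl] += weight
    return (out / norm).astype(stack.dtype)

# Sanity check: with an identity "model", blending must reproduce the input.
stack = np.random.default_rng(1).normal(size=(20, 70, 70)).astype(np.float32)
restored = sliding_window_denoise(stack, lambda p: p)
```

Accumulating both the weighted outputs and the weights, then dividing, makes the blend a proper weighted average even where windows overlap unevenly near the stack edges.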


Interfaces

This algorithm implements all of the following input-output combinations:

1. Input: Stacked Synthetic Calcium Microscopy Images of Neurons, subject to noise
   Output: Stacked Synthetic Calcium Microscopy Images of Neurons, with reduced noise

Validation and Performance


Challenge Performance

Date: May 5, 2026
Challenge: AI4LIFE-CIDC25
Phase: Final Submission Phase: Content Generalization
Rank: 3

Uses and Directions

This algorithm was developed for research purposes only.

Warnings

Common Error Messages

Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts label guidelines from Sendak, M.P., Gao, M., Brajer, N., et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). doi:10.1038/s41746-020-0253-3