Restormer3D



About

Editor:
Contact email:
Image Version:
a3fbdb8d-667d-4fc4-b615-c458bbe1942d — May 26, 2026
Model Version:
d8d6e0c7-7a84-4f92-9624-6741937504e7 — May 26, 2026

Summary

A self-supervised 3D denoiser for two-photon calcium imaging stacks, submitted to the AI4Life Calcium Imaging Denoising Challenge (CIDC25). It is based on Restormer3D, a 3D extension of the Restormer transformer architecture (Zamir et al., CVPR 2022) adapted to volumetric data. Training uses a Noise2Void blind-spot objective (Krull et al., CVPR 2019), preceded by a temporal-median warmup stage that provides a structural prior. The model is pretrained on a multi-stack training set and then briefly fine-tuned on each input stack at inference time.

Mechanism

Target use: denoising 3D fluorescence microscopy stacks of shape [T, H, W] (frames × height × width) acquired by two-photon calcium imaging of neural activity.

Architecture: Restormer3D, a U-Net-shaped encoder/decoder with four levels. Each transformer block combines Multi-Dconv Head Transposed Attention (MDTA), which computes attention across channels rather than across voxels and applies depth-wise 3D convolutions to the queries, keys, and values, with a Gated-Dconv Feed-Forward Network (GDFN). Configuration: dim=32, num_blocks=(2,2,2,3), num_refinement_blocks=2, heads=(1,2,4,8), ffn_expansion_factor=2.0, bias-free convolutions throughout. The model has about 3.6M parameters and is trained in full fp32.
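The channel-wise ("transposed") attention at the core of MDTA can be sketched in a few lines of NumPy. This is an illustration only: the function name `transposed_attention` and the fixed `temperature` argument are hypothetical, and the sketch leaves out the 1x1 projections and depth-wise 3D convolutions that produce the queries, keys, and values. The L2 normalization of queries and keys follows the published 2D Restormer; that Restormer3D keeps this detail is an assumption.

```python
import numpy as np

def transposed_attention(q, k, v, temperature=1.0):
    """Channel-wise attention over flattened voxels (illustrative sketch).

    q, k, v: arrays of shape [heads, channels_per_head, N], where
    N = T * H * W is the number of voxels in the feature volume.
    Returns an array with the same shape as v.
    """
    # L2-normalize queries and keys along the voxel axis (assumed, as in 2D Restormer).
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-8)

    # The attention map has shape [heads, C, C]: channels attend to channels.
    attn = (q @ np.swapaxes(k, -1, -2)) * temperature

    # Numerically stable softmax over the last axis.
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)

    # Each output channel is a convex combination of the value channels.
    return attn @ v
```

Because the attention map is channels × channels, its size does not depend on the number of voxels, so the cost of this step grows linearly with the volume size rather than quadratically.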

Training: two-stage and self-supervised. Stage 0 (warmup) regresses each randomly sampled 3D patch against the per-stack temporal median (a 2D image computed across all frames) using an L1 loss. Stage 1 applies 3D Noise2Void blind-spot masking: randomly chosen voxels are replaced with values taken from neighbors within a small radius, and the model learns to predict the original values at the masked positions from spatial and temporal context. No clean ground truth is required at any stage.
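The two training targets described above can be sketched with NumPy as follows. The function names, the number of masked voxels, and the radius are illustrative assumptions and are not taken from the submission; the sketch also assumes every patch dimension has at least two voxels.

```python
import numpy as np

def median_target(median_img, h0, w0, patch_shape):
    """Stage 0 target: crop the 2D temporal median and repeat it over time.

    median_img is assumed to be np.median(stack, axis=0), computed once per stack.
    """
    pt, ph, pw = patch_shape
    crop = median_img[h0:h0 + ph, w0:w0 + pw]
    return np.broadcast_to(crop, (pt, ph, pw))

def n2v_mask(patch, n_masked=64, radius=2, rng=None):
    """Stage 1 input: replace random voxels with a nearby neighbor's value.

    Returns the masked patch and a boolean mask of the replaced voxels;
    a loss would be evaluated only where the mask is True.
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = np.array(patch.shape)

    # Choose distinct voxel positions to mask.
    flat = rng.choice(patch.size, size=n_masked, replace=False)
    coords = np.stack(np.unravel_index(flat, patch.shape), axis=1)

    # Draw integer offsets in [-radius, radius] and forbid the zero offset.
    offsets = rng.integers(-radius, radius + 1, size=coords.shape)
    offsets[~offsets.any(axis=1), -1] = 1

    # Clip neighbors to the patch; if clipping lands back on the voxel
    # itself, step in the opposite direction instead.
    neighbors = np.clip(coords + offsets, 0, shape - 1)
    stuck = (neighbors == coords).all(axis=1)
    neighbors[stuck] = np.clip(coords[stuck] - offsets[stuck], 0, shape - 1)

    masked = patch.copy()
    # Neighbor values are read from the original patch, not the masked copy.
    masked[tuple(coords.T)] = patch[tuple(neighbors.T)]

    mask = np.zeros(patch.shape, dtype=bool)
    mask[tuple(coords.T)] = True
    return masked, mask
```

Returning the mask alongside the masked patch keeps the loss restricted to the blind-spot positions, which is what prevents the network from learning the identity mapping.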

Preprocessing: robust per-stack normalization at the 0.5th and 99.5th intensity percentiles. The temporal median is computed once per stack and used only for warmup.
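A minimal NumPy sketch of percentile-based normalization and its inverse is shown here. The function names are hypothetical, and two details are assumptions rather than facts from the submission: normalized values are not clipped to [0, 1], and integer outputs are rounded and clipped to the valid range when the result is cast back to the input dtype.

```python
import numpy as np

def normalize_stack(stack, p_low=0.5, p_high=99.5):
    """Scale a [T, H, W] stack so the chosen percentiles map to 0 and 1."""
    x = stack.astype(np.float32)
    lo, hi = (float(p) for p in np.percentile(x, [p_low, p_high]))
    scale = max(hi - lo, 1e-8)  # guard against constant stacks
    return (x - lo) / scale, (lo, scale)

def denormalize_stack(x, stats, dtype):
    """Invert normalize_stack and cast back to the original dtype."""
    lo, scale = stats
    y = x * scale + lo
    if np.issubdtype(dtype, np.integer):
        # Round and clip so the cast cannot wrap around (assumed behavior).
        info = np.iinfo(dtype)
        y = np.clip(np.rint(y), info.min, info.max)
    return y.astype(dtype)
```

Using percentiles instead of the minimum and maximum keeps a few very bright or very dark voxels from compressing the rest of the intensity range.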

Inference: sliding-window prediction over the input stack. Temporal windows overlap by 50% and are blended with a Hann window, with mirror-padding along the time axis, to avoid the periodic frame-boundary artifacts that arise from non-overlapping temporal tiles. Spatial tiles also overlap by 50% and are blended with Gaussian weights.
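The temporal part of this blending can be sketched as an overlap-add loop. This is a simplified illustration: the function name, the `denoise_fn` callback, and the window length of 16 frames are assumptions, and spatial tiling with Gaussian weights is omitted. The sketch uses a periodic Hann window, whose half-overlapping copies sum to exactly one.

```python
import numpy as np

def denoise_with_temporal_overlap(stack, denoise_fn, win=16):
    """Apply denoise_fn to 50%-overlapping temporal windows of a [T, H, W] stack.

    denoise_fn maps a [win, H, W] float32 array to an array of the same shape.
    """
    if win % 2:
        raise ValueError("win must be even for 50% overlap")
    hop = win // 2
    T = stack.shape[0]

    # Mirror-pad the time axis so every original frame is covered by two windows.
    n_win = (T - 1) // hop + 2
    padded_len = (n_win + 1) * hop
    pad = ((hop, padded_len - hop - T), (0, 0), (0, 0))
    padded = np.pad(stack.astype(np.float32), pad, mode="reflect")

    # Periodic Hann window: w[n] + w[n + hop] == 1 for all n.
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win) / win)
    w = w.astype(np.float32)[:, None, None]

    out = np.zeros_like(padded)
    wsum = np.zeros((padded_len, 1, 1), dtype=np.float32)
    for i in range(n_win):
        s = i * hop
        out[s:s + win] += w * denoise_fn(padded[s:s + win])
        wsum[s:s + win] += w

    # Crop the padding and normalize by the accumulated weights.
    return out[hop:hop + T] / wsum[hop:hop + T]
```

For example, `denoise_with_temporal_overlap(stack, lambda x: x)` reproduces the input up to floating-point error, which is a quick check that the blending weights introduce no seams at window boundaries.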

Pretrain → fine-tune flow: the model is pretrained offline on a multi-stack training set (noisy stacks only, no clean targets). At submission time the pretrained checkpoint is loaded and the model is briefly fine-tuned (800 N2V iterations, lr=1e-4) on each input stack before inference. This adapts the model to the specific noise statistics of each input.

Inputs: one or more 3D TIFF stacks of noisy calcium imaging frames. Outputs: denoised 3D TIFF stacks with the same shape and dtype as the input.


Interfaces

This algorithm implements all of the following input-output combinations:

1
    Input: Stacked Synthetic Calcium Microscopy Images of Neurons, subject to noise
    Output: Stacked Synthetic Calcium Microscopy Images of Neurons, with reduced noise

Validation and Performance


Challenge Performance

Date          Challenge       Phase                                               Rank
May 27, 2026  AI4LIFE-CIDC25  Preliminary Phase: Content Generalisation           1
May 27, 2026  AI4LIFE-CIDC25  Final Submission Phase: Content Generalization      1
May 27, 2026  AI4LIFE-CIDC25  Preliminary Phase: Noise Level Generalization       1
May 27, 2026  AI4LIFE-CIDC25  Final Submission Phase: Noise Level Generalization  1

Uses and Directions

This algorithm was developed for research purposes only.

Warnings

Common Error Messages

Information on this algorithm has been provided by the Algorithm Editors, following the Model Facts labels guidelines from Sendak, M.P., Gao, M., Brajer, N. et al. Presenting machine learning model information to clinical end users with model facts labels. npj Digit. Med. 3, 41 (2020). doi:10.1038/s41746-020-0253-3