Dense features  

  By: Telcrome on May 27, 2025, 9:51 a.m.

Hi!

  • Is there documentation that explains how the algorithm should output features for the current segmentation adapters? The minimal example seems to use 1D embeddings for patches within regions; is that correct?
  • If it isn't possible already, could you also add the option for algorithms to provide spatial features, such as (channels, spatial dims...)? This layer and this layer would then need to be replaced by a resize; is that correct? (See the shape sketch below.)
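
To make this concrete, here is a minimal sketch of the two kinds of output I mean (all names and sizes below are just placeholders, not anything from the challenge code):

import torch

EMBEDDING_DIM, NUM_PATCHES, H, W = 42, 16, 256, 256    # placeholder sizes

# 1D embeddings: one vector per patch, as in the current minimal example
patch_embeddings = torch.randn(NUM_PATCHES, EMBEDDING_DIM)       # (patches, embedding_dim)

# Spatial features: a (possibly downsampled) grid of embeddings
spatial_features = torch.randn(EMBEDDING_DIM, H // 4, W // 4)    # (channels, H//4, W//4)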

Thanks in advance!

 Last edited by: Telcrome on May 27, 2025, 11:33 a.m., edited 1 time in total.

Re: Dense features  

  By: MichelleStegeman on May 27, 2025, 2:48 p.m.

Hi! Thanks for reaching out with your questions.

You're right, the minimal example currently demonstrates the use of 1D embeddings for patches within regions. This setup is intended primarily for demonstration purposes and is deliberately simplified. If you're interested in implementing your own adaptor strategy, you can check out the instructions in this repository: https://github.com/DIAGNijmegen/unicorn_eval. It explains how to submit a PR with your custom adaptor.

That said, we’re happy to look into making small adjustments to the simplified setup so that it accepts 2D embeddings, if you’d prefer to use this demonstrator with minimal modifications. Please feel free to share any specific use case or requirements you have in mind so we can better tailor the solution to your needs.

Regarding your question about spatial features, I’m not entirely sure I follow. Could you clarify what you mean?

Re: Dense features  

  By: Telcrome on May 27, 2025, 7:30 p.m.

Thanks for your quick reply!

For a 2D image input (e.g. channels=3, H=256, W=256), by spatial features I mean 2D embeddings that might be downsampled (e.g. embedding_size=42, H=256//4, W=256//4). For a 3D input such as (C=1, H, W, D), it could be (Z, H//4, W//4, D).

A suitable adapter could then be a 2D or 3D convolutional layer. Would you be interested in providing something like this as a modification to the current example?

import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms.functional import resize, InterpolationMode

EMBEDDING_DIM, NUM_CLASSES = 42, 3              # example values
foundation_model_output = ...                   # (EMBEDDING_DIM, H//4, W//4) tensor
ground_truth = ...                              # (NUM_CLASSES, H, W) one-hot tensor

# Adapter: 1x1 convolution mapping embeddings to per-class logits
adapter = nn.Conv2d(EMBEDDING_DIM, NUM_CLASSES, kernel_size=1, padding=0)
predictions = adapter(foundation_model_output.unsqueeze(0))      # add batch dim
ground_truth_downsampled = resize(ground_truth, list(predictions.shape[-2:]),
                                  interpolation=InterpolationMode.NEAREST)
loss = F.cross_entropy(predictions, ground_truth_downsampled.unsqueeze(0).float())
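
For 3D input, the analogous sketch could look like this (again, the shapes and the use of F.interpolate for nearest-neighbour downsampling are just my assumptions):

import torch.nn as nn
import torch.nn.functional as F

EMBEDDING_DIM, NUM_CLASSES = 42, 3              # placeholder sizes
foundation_model_output_3d = ...                # (EMBEDDING_DIM, H//4, W//4, D) tensor
ground_truth_3d = ...                           # (NUM_CLASSES, H, W, D) one-hot tensor

# 3D adapter: 1x1x1 convolution mapping embeddings to per-class logits
adapter_3d = nn.Conv3d(EMBEDDING_DIM, NUM_CLASSES, kernel_size=1, padding=0)
predictions_3d = adapter_3d(foundation_model_output_3d.unsqueeze(0))   # add batch dim

# Downsample the ground truth to the prediction resolution (nearest neighbour)
ground_truth_downsampled = F.interpolate(
    ground_truth_3d.unsqueeze(0).float(), size=predictions_3d.shape[-3:], mode="nearest"
)
loss = F.cross_entropy(predictions_3d, ground_truth_downsampled)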
 Last edited by: Telcrome on May 27, 2025, 7:35 p.m., edited 1 time in total.

Re: Dense features  

  By: clemsg on May 27, 2025, 8:51 p.m.

Hi, interesting question!

To give you a bit of context: when we framed the challenge, we focused on (vision) foundation models that output a 1D embedding per image (this is the case for the vast majority of models). This also lets us keep the total size of the file containing the features under control. If your model instead produces 2D embeddings, you can consider the following workaround:

If the output height and width are consistent across all image inputs, you can flatten the 2D tensor into 1D before saving, and simply reshape it back to the original 2D shape in your adaptor after loading the (flattened) features from disk.

It would look like this:

import torch.nn as nn

foundation_model_output_flattened = ...  # 1D tensor of length embedding_size * H//4 * W//4
foundation_model_output_2d = foundation_model_output_flattened.reshape(embedding_size, H // 4, W // 4)

# 1x1 convolution adapter on the restored 2D feature map (batch dim added)
adapter = nn.Conv2d(embedding_size, NUMBER_CLASSES, kernel_size=1, padding=0)
predictions = adapter(foundation_model_output_2d.unsqueeze(0))
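
On the feature-extraction side, the flattening step could look roughly like this (torch.save / torch.load and the file name are only placeholders for whatever serialization your algorithm container actually uses):

import torch

foundation_model_output_2d = ...    # (embedding_size, H//4, W//4) tensor

# Flatten to 1D before writing the features to disk
torch.save(foundation_model_output_2d.flatten(), "features.pt")    # placeholder path/format

# Later, in your adaptor, load the flat vector and restore the 2D shape
foundation_model_output_flattened = torch.load("features.pt")
foundation_model_output_2d = foundation_model_output_flattened.reshape(embedding_size, H // 4, W // 4)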

We strongly encourage participants to implement their own adaptor logic! Instructions for doing so are available in the evaluation toolkit repository: the README.md explains how to submit a PR with your custom adaptor.

Alternatively, we could consider extending the accepted feature format to support 2D grids natively. However, this would require internal discussion and may not be feasible in the short term. For now, we recommend using the flattening approach, which should be sufficient in most cases.

Let us know how this works out for you!

 Last edited by: clemsg on May 27, 2025, 9:13 p.m., edited 2 times in total.