Clarification about "one-to-many" model

  By: Raphael96 on June 23, 2025, 1:24 p.m.

Dear UNICORN-challenge organizer team,

Thank you for hosting this interesting challenge!

I have a question regarding the described "one-to-many" model: Would an agent-based approach be allowed? It resembles one model orchestrating the task, but it forwards the input to specialized models.

Further, the Zenodo document (page 10) says the following about Task 1:

"The foundation model should always remain the same, at least within tasks of the same time (e.g. one vision model for vision tasks, one language model for language tasks, one vision-language model for vision-language tasks;"

However, the vision tasks include both 2D and 3D tasks. Should one model be used to encode both 2D and 3D images?

Thank you for clarifying my questions!

Re: Clarification about "one-to-many" model  

  By: clemsg on June 26, 2025, 10:41 a.m.

Hi,

that’s a great question! Let me provide more details below.
We’ll also make a public announcement later today to ensure everyone is on the same page.

The goal of UNICORN is to benchmark foundation models: models that can handle diverse tasks within a domain or modality, rather than highly specialized, task-specific solutions. Allowing subspecialized models per task would defeat the purpose of the benchmark and reduce its value for assessing generalization.

That said, asking participants to submit a single model that handles all tasks across all modalities is not reasonable, as such models don’t yet exist to the best of our knowledge. To strike a good balance, we require that each submission uses a single model per modality group, as follows:

  • one model for pathology vision tasks
  • one model for radiology tasks using CT
  • one model for radiology tasks using MR
  • one model for language tasks
  • one model for the vision-language task

If your model can handle multiple modalities in a unified way, that’s even better! :)

Regarding agent-style approaches: yes, you may implement a controller or agent that routes inputs internally, as long as all tasks within the same modality group use the same model. Forwarding inputs to different specialized models for each task is not permitted.
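To make the distinction concrete, here is a minimal sketch of an allowed routing setup. All model names and the task-to-group mapping below are illustrative placeholders, not part of the challenge specification; the point is simply that every task resolves to the single model assigned to its modality group.

```python
# One foundation model per modality group (names are placeholders).
MODELS = {
    "pathology": "pathology-vision-fm",
    "radiology-ct": "ct-fm",
    "radiology-mr": "mr-fm",
    "language": "language-fm",
    "vision-language": "vl-fm",
}

# Tasks are mapped to modality groups, never to task-specific models.
TASK_TO_GROUP = {
    "tumor-grading-wsi": "pathology",
    "mitosis-counting-wsi": "pathology",
    "nodule-detection-ct": "radiology-ct",
    "lesion-segmentation-mr": "radiology-mr",
    "report-summarization": "language",
    "visual-question-answering": "vision-language",
}

def route(task: str) -> str:
    """Return the single model handling this task's modality group."""
    return MODELS[TASK_TO_GROUP[task]]
```

An agent may freely decide which group an input belongs to, but two tasks in the same group (e.g. the two pathology tasks above) must always resolve to the same model.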

The main reason for this constraint is that the developed models should be capable of handling new tasks within their modality group without retraining or structural changes. During the post-challenge evaluation study, we will likely run the best-performing models on new tasks to further assess generalization.

Let us know if anything remains unclear!