Validation - 2 submissions overall is very low  

  By: cdancette on June 23, 2025, 12:49 p.m.

Dear organizers,

I am currently in the process of evaluating one of our models for the UNICORN challenge. I noticed we are only allowed 2 submissions overall for the validation sets, which is very low. We have multiple versions of our model, so it is hard to know which one to submit. Usually, in ML challenges like this, the validation set allows many submissions so that multiple approaches can be benchmarked.

Is there a way to benchmark our models on a held-out validation set with a larger number of tries, to get an estimate of the performance of various algorithms before making the final submission?

Re: Validation - 2 submissions overall is very low  

  By: clemsg on June 26, 2025, 9:08 a.m.

Hi,

Thank you for your question, and great to hear you're actively testing your models for UNICORN!

You're right that submission limits are stricter than in some ML challenges. This is primarily due to the high cost of running inference with foundation models, especially for vision tasks. To manage compute resources fairly across all teams, we limit the number of submissions during the validation phase.

For vision tasks, the submission limit applies only to the encoder Docker. You are free to re-run the same encoder with different adaptor strategies, without it counting toward the submission limit. Instructions for this are available here.

We want to stress that running inference on the platform should be seen as complementary to local development, not as the main way to evaluate models. If you plan to try multiple versions of your algorithm, we strongly encourage setting up a proper local testing pipeline to benchmark performance before making a submission on the platform (a minimal sketch of such a pipeline is shown below). To support local development, we've released several resources.
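As an illustration of what such a local pipeline could look like, here is a minimal sketch in Python. It assumes you have already run your (expensive) encoder once over a local validation set and saved the embeddings and labels to disk; the file names, the scikit-learn adaptors (a linear probe and a k-NN classifier), and the 5-fold cross-validation are illustrative choices, not part of the official UNICORN tooling. The key point is that the costly encoder pass happens once, while the cheap adaptors can be compared as many times as you like.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Embeddings computed once by the (expensive) frozen encoder pass,
# e.g. saved with np.save after running your encoder locally.
embeddings = np.load("embeddings.npy")  # shape: (n_samples, feature_dim)
labels = np.load("labels.npy")          # shape: (n_samples,)

# Cheaply compare several adaptor strategies on the cached features;
# these two adaptors are just examples.
adaptors = {
    "linear_probe": LogisticRegression(max_iter=1000),
    "knn_20": KNeighborsClassifier(n_neighbors=20),
}

for name, adaptor in adaptors.items():
    scores = cross_val_score(adaptor, embeddings, labels, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```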

As a reminder, each team may submit up to 5 algorithms per task during the validation phase; more information is available here.

Depending on how the compute budget is used throughout the challenge, we may allow additional submissions toward the end, but we cannot guarantee this! So we highly recommend using the available resources to iterate locally.

Let us know if anything is unclear or if you have further questions!