Data storage and licenses
Storing the data for your challenge involves two steps: upload the public training data to a public data host, and upload the secret test data to a private archive on grand-challenge.org.
Storing training data
We recommend using Zenodo to make the training data available to participants in your challenge. Zenodo is open source, promotes Open Science, is free to use, and assigns a DOI to every dataset you upload, which improves data traceability. Please note, however, that Zenodo has a 50 GB limit per repository. We have created a grand-challenge.org Community on Zenodo. If you add data for your challenge to Zenodo, please list your record in this community. In the future, we plan to integrate Zenodo more closely with grand-challenge.org. We are also working on other solutions for datasets larger than 50 GB; feel free to leave a message in the Forum if you have any suggestions.
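Before uploading, it can help to check whether your training data fits within the Zenodo limit. The following is a minimal sketch, not an official tool: it assumes the 50 GB limit means 50 * 10**9 bytes, and the names ZENODO_LIMIT_BYTES, dataset_size_bytes, and fits_on_zenodo are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# Assumption: "50 GB" is interpreted as 50 * 10**9 bytes.
ZENODO_LIMIT_BYTES = 50 * 10**9


def dataset_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fits_on_zenodo(root: Path, limit: int = ZENODO_LIMIT_BYTES) -> bool:
    """Return True if the files under root total at most `limit` bytes."""
    return dataset_size_bytes(root) <= limit
```

For example, fits_on_zenodo(Path("training_data")) returns False when the folder (a hypothetical path) exceeds the assumed limit, in which case a different host is needed.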
We recently shared the first dataset associated with a challenge on grand-challenge.org through the AWS Open Data Registry, a registry for discovering and sharing datasets that are available via AWS resources. Data shared there can be accessed directly via AWS S3 and downloaded for free. The 50 GB per-repository limit that Zenodo imposes does not apply to the AWS Open Data Registry. If you want to add a dataset to this registry, or view an example of how to do so, please follow the instructions in the Registry of Open Data on AWS GitHub repository.
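To illustrate what direct access via S3 can look like for participants, here is a minimal sketch that builds the virtual-hosted-style HTTPS URL of an object in a publicly readable bucket. The bucket name and object key are hypothetical placeholders, public_s3_url is a helper introduced here, and the sketch assumes the bucket allows anonymous reads and is reachable through the global s3.amazonaws.com endpoint; some buckets may require a region-specific endpoint instead.

```python
from urllib.parse import quote


def public_s3_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style HTTPS URL for an object in a public bucket.

    The key is percent-encoded; forward slashes are kept as path separators.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


# Hypothetical bucket and key, for illustration only.
url = public_s3_url("example-challenge-data", "training/case 001.mha")
print(url)
```

The resulting URL can be fetched with any HTTP client. Participants who prefer the AWS CLI can typically read public buckets anonymously by passing the --no-sign-request option.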
Storing test data
The test data for your challenge must be uploaded to a private archive on grand-challenge.org. Make sure not to make this archive public, as it holds your private/hidden test set, which should not be available to participants. One of the site admins then links the archive to the respective phase of your challenge (send an email to firstname.lastname@example.org to set this up). Once linked, algorithm submissions run on this archive automatically; the website takes care of this without any further input from you. More instructions on creating an archive can be found here.
⚠️ Note that you need one separate archive for each of the phases of your challenge.
Choosing a data license
Lastly, we would like to touch upon choosing a license for your public training dataset. We recommend a permissive CC license for the training data you share for your challenge. We prefer a CC BY license, since it offers a lot of flexibility and is the least susceptible to different interpretations by different users. Such differences in interpretation can be problematic with a more restrictive CC BY-NC-ND license, which prohibits commercial use of the data and prohibits sharing any adapted/modified version of the data. Since (the weights of) a model can potentially be viewed as an adapted/modified version of the training data, training and publishing a model in a challenge would not be possible under a CC BY-NC-ND license, which would effectively prohibit participation in your challenge. If a CC BY license is not possible for your challenge, or you want to discuss license options, please contact us at email@example.com.
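The distinction above can be sketched as a simple check on the license identifier. This is a simplification that only inspects the element codes (BY, NC, ND, SA) in a CC identifier and does not interpret the license text; the function names are hypothetical helpers introduced for illustration.

```python
from __future__ import annotations

import re


def license_elements(license_id: str) -> set[str]:
    """Extract the CC element codes from an identifier such as 'CC BY-NC-ND'."""
    return set(re.findall(r"\b(BY|NC|ND|SA)\b", license_id.upper()))


def permits_sharing_adaptations(license_id: str) -> bool:
    """Return False when the identifier contains the NoDerivatives (ND) element."""
    return "ND" not in license_elements(license_id)


def permits_commercial_use(license_id: str) -> bool:
    """Return False when the identifier contains the NonCommercial (NC) element."""
    return "NC" not in license_elements(license_id)


for candidate in ("CC BY", "CC BY-NC-ND"):
    print(candidate, permits_sharing_adaptations(candidate), permits_commercial_use(candidate))
```

Under this check, CC BY passes both tests, while CC BY-NC-ND fails both, which matches the concern that model weights viewed as adaptations could not be shared.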