Public Training and Development Dataset: Updates and Fixes
By: drepeeters on Jan. 15, 2025, 10:50 a.m.
The LUNA25: Public Training and Development dataset, consisting of over 6000 cases, is now online! ... Please monitor this thread for all updates and fixes regarding this dataset.
- Imaging data has been released via: https://zenodo.org/records/14223624
- Annotations have been released via: https://zenodo.org/records/14673658
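The download script further down takes a bare numeric record id, not a URL. The images and the annotations live in two separate Zenodo records. The small helper below maps each link above to its id. `record_id_from_url` is a hypothetical name and is not part of any Zenodo client. Downloading both the images and the annotations would mean running the script once per id.

```python
from urllib.parse import urlparse


def record_id_from_url(url: str) -> str:
    """Return the numeric record id from a Zenodo record URL,
    e.g. https://zenodo.org/records/14223624 -> "14223624"."""
    # The path has the form "/records/<id>"; drop empty segments from slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[-2] != "records" or not segments[-1].isdigit():
        raise ValueError(f"Not a Zenodo record URL: {url!r}")
    return segments[-1]


IMAGES_RECORD = record_id_from_url("https://zenodo.org/records/14223624")
ANNOTATIONS_RECORD = record_id_from_url("https://zenodo.org/records/14673658")
print(IMAGES_RECORD, ANNOTATIONS_RECORD)  # 14223624 14673658
```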
Below is an example of how to download all the MHA CT images and the nodule blocks from Zenodo using Python. To get a Zenodo token, you must create an account on their website. The full dataset consists of 46 luna25_images.zip files and 2 luna25_nodule_blocks.zip files.
```python
import os
import sys

import requests

ACCESS_TOKEN = "YOUR ZENODO TOKEN"
record_id = "14223624"  # LUNA25 imaging record id

# Specify the output folder where files will be saved
output_folder = "YOUR-OUTPUT_PATH"
os.makedirs(output_folder, exist_ok=True)

# Get the metadata of the Zenodo record
r = requests.get(f"https://zenodo.org/api/records/{record_id}",
                 params={"access_token": ACCESS_TOKEN})
if r.status_code != 200:
    print("Error retrieving record:", r.status_code, r.text)
    sys.exit(1)

# Extract download URLs and filenames
files = r.json()["files"]
download_urls = [f["links"]["self"] for f in files]
filenames = [f["key"] for f in files]
print(f"Total files to download: {len(download_urls)}")

# Download each file (numbered from 1 so the last one reads N/N)
for index, (filename, url) in enumerate(zip(filenames, download_urls), start=1):
    file_path = os.path.join(output_folder, filename)
    print(f"Downloading file {index}/{len(download_urls)}: {filename} -> {file_path}")
    # Use a separate name so the record-metadata response `r` is not overwritten
    with requests.get(url, params={"access_token": ACCESS_TOKEN}, stream=True) as resp:
        resp.raise_for_status()  # Raise an error for failed requests
        with open(file_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):  # Download in chunks
                fh.write(chunk)
    print(f"Completed: {filename}")

print("All downloads completed successfully!")
```
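The helpers below are a rough sketch for checking a finished download. They test two things: that the archive counts stated above (46 image archives and 2 nodule-block archives) are all present, and that each file matches its checksum.

The sketch rests on a few assumptions:
- Every archive name starts with `luna25_images` or `luna25_nodule_blocks` and ends in `.zip`. The exact numbering scheme is a guess.
- Each entry in the record metadata's `files` list carries a `checksum` field of the form `md5:<hex>`.

The function names `missing_archives` and `checksum_matches` are hypothetical.

```python
import hashlib
from collections import Counter

# Archive counts stated in the announcement post.
EXPECTED_COUNTS = {"luna25_images": 46, "luna25_nodule_blocks": 2}


def missing_archives(filenames):
    """Return {prefix: shortfall} for every prefix with fewer .zip files than expected.

    Assumes each archive name starts with one of the EXPECTED_COUNTS prefixes
    (the exact naming scheme, e.g. numbered suffixes, is an assumption).
    """
    counts = Counter()
    for name in filenames:
        if not name.endswith(".zip"):
            continue
        for prefix in EXPECTED_COUNTS:
            if name.startswith(prefix):
                counts[prefix] += 1
    return {p: n - counts[p] for p, n in EXPECTED_COUNTS.items() if counts[p] < n}


def md5_of_file(path, chunk_size=8192):
    """MD5 hex digest of a file, read in chunks so large archives never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, checksum):
    """Compare a file against a checksum string of the assumed form "md5:<hex>"."""
    algo, _, expected = checksum.partition(":")
    if algo != "md5":
        raise ValueError(f"Unsupported checksum algorithm: {algo!r}")
    return md5_of_file(path) == expected.lower()
```

If the naming assumption holds, `missing_archives(os.listdir(output_folder))` should return `{}` after the download finishes.

To check checksums, first capture the record's file list right after the metadata request, e.g. `files = r.json()["files"]`. Then, for each entry `f`, call `checksum_matches(os.path.join(output_folder, f["key"]), f["checksum"])`.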