The HU value (-3024) in the mha file should be -1024

  By: biointell.felon on June 16, 2025, 7:53 a.m.

I found an HU value of -3024 in the raw MHA file. I think it should be -1024, because the image contains no values between -3024 and -1024.

Step 1. Read the image:

import os
import SimpleITK as sitk

id_img = "1.2.840.113654.2.55.100199553060090932939760477295777719301.mha"
ct_img = sitk.ReadImage(os.path.join("tmp", id_img))
ct_data_raw = sitk.GetArrayFromImage(ct_img)
print(f"ct_data_raw.min: \n{ct_data_raw.min()}")  # -3024
print(f"ct_data_raw.max: \n{ct_data_raw.max()}")  # 2221

Plot the HU distribution:

import matplotlib.pyplot as plt

plt.hist(ct_data_raw.flatten(), bins=50)
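
A 50-bin histogram only shows the gap roughly, so a direct count may be a useful cross-check. The following is a minimal sketch, assuming the image is already available as a NumPy array; the helper name `count_in_gap` and the small `demo` array are hypothetical stand-ins, not values taken from the dataset.

```python
import numpy as np

def count_in_gap(arr, low=-3024, high=-1024):
    """Count voxels strictly between low and high (both bounds excluded)."""
    return int(np.count_nonzero((arr > low) & (arr < high)))

# Hypothetical stand-in for ct_data_raw; replace it with the real array.
demo = np.array([-3024, -3024, -1024, -1000, 0, 2221], dtype=np.int16)

n_gap = count_in_gap(demo)
n_placeholder = int(np.count_nonzero(demo == -3024))
print(f"voxels strictly between -3024 and -1024: {n_gap}")
print(f"voxels equal to -3024: {n_placeholder}")
```

If the first count is zero on the real scan, that would support treating -3024 as a single placeholder value rather than as part of a continuous intensity range.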

Step 2. Replace -3024 with -1024:

import numpy as np

arr = np.where(ct_data_raw == -3024, -1024, ct_data_raw)
print(arr.min())  # -1024
plt.hist(arr.flatten(), bins=50)

Re: The HU value (-3024) in the mha file should be -1024  

  By: bogdanobreja on June 16, 2025, 9:39 a.m.

Dear biointell.felon,

Thanks for letting us know.

Indeed, a value of -3024 HU falls well outside the typical Hounsfield Unit range and is usually used as a background placeholder during DICOM-to-MHA conversion. As you correctly pointed out, we also recommend mapping such values to -1024 HU (air) during preprocessing to ensure consistency. That said, we leave the final decision to participants, depending on their specific pipeline.
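
For participants who want to apply this mapping, here is a minimal sketch of two possible variants, assuming the scan has already been loaded into a NumPy array. The function names `map_placeholder_to_air` and `clip_to_air` and the `demo` array are illustrative assumptions, not part of any official preprocessing code.

```python
import numpy as np

def map_placeholder_to_air(arr, placeholder=-3024, air=-1024):
    """Return a copy in which only the exact placeholder value becomes air."""
    out = arr.copy()
    out[out == placeholder] = air
    return out

def clip_to_air(arr, air=-1024):
    """Alternative: raise every value below air up to air."""
    return np.maximum(arr, arr.dtype.type(air))

# Hypothetical stand-in for a loaded scan; replace it with the real array.
demo = np.array([-3024, -1024, -500, 0, 2221], dtype=np.int16)

mapped = map_placeholder_to_air(demo)
clipped = clip_to_air(demo)
print(mapped.min(), clipped.min(), mapped.dtype)
```

The first variant touches only the placeholder, which keeps any other unexpected low values visible for inspection. The second is more aggressive and would also hide such values. Which one fits depends on the pipeline.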

Since the data has already been uploaded to Zenodo, we won’t be re-uploading or zipping the scans at this stage. However, we’ll keep this post visible for the benefit of other participants and may add a note about it to the dataset description in a future update.

We also noticed that four identical posts were created by you on the forum regarding this issue. To keep the discussion clear and centralized, we’ll leave this post and remove the duplicates.

Kind Regards, Bogdan Obreja.

 Last edited by: bogdanobreja on June 16, 2025, 9:41 a.m., edited 1 time in total.