Data centric configuration

Data centric configuration  

  By: tanya.chutani on Aug. 21, 2024, 9:07 a.m.

Hi Team,

I am training the data-centric model. Please let me know whether the following configuration is correct:

data:
  data_dir: 
  splits_file:
  fold: 0
  target_shape: [ 128, 160, 112 ]
  batch_size: 2
  suffix: .nii.gz
  num_workers_train: 0
  num_workers_val: 0

logger:
  experiment: lightning_logs
  name: test_example1

model:
  pretrained: false
  resume: false
  ckpt_path:
  lr: 0.001
  sw_batch_size: 4
  seed: 42

trainer:
  max_steps: 10
  check_val_every_n_epoch: 5
  precision: bf16-mixed #32, 16
  accelerator: gpu
  devices: 1
  deterministic: true
  strategy: auto
  sync_batchnorm: true #false

Also, can we change the batch size, or does it need to remain constant?

 Last edited by: tanya.chutani on Aug. 21, 2024, 11:24 a.m., edited 1 time in total.

Re: Data centric configuration  

  By: jdex on Aug. 21, 2024, 10:26 a.m.

Looks fine. You are allowed to change the batch size.

Re: Data centric configuration  

  By: tanya.chutani on Aug. 21, 2024, 10:54 a.m.

Can we also change the target shape/patch size?

Re: Data centric configuration  

  By: jdex on Aug. 21, 2024, 2:41 p.m.

No. See here for further details: https://github.com/ClinicalDataScience/datacentric-challenge?tab=readme-ov-file#the-fixed-model.

 Last edited by: jdex on Aug. 21, 2024, 2:48 p.m., edited 2 times in total.

Re: Data centric configuration  

  By: jdex on Aug. 21, 2024, 3:07 p.m.

To be more precise, the patch size is hardcoded in the model. If you call forward on the model, it will use this patch size during sliding window inference. What you could do, for example, is increase or reduce the size of the volumes by resampling them to some other resolution. Keep in mind that the goal of this category is to focus on data. Hyperparameter optimizations do not count as data-centric.
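To illustrate the resampling idea, here is a minimal sketch of how changing the voxel spacing changes the volume size relative to the fixed patch. The volume shape and spacings are hypothetical, both helper functions are made up for this example, and the patch count ignores window overlap, so it only gives a rough lower bound on the number of sliding windows.

```python
import math

# Patch size taken from target_shape in the config posted in this thread.
PATCH_SIZE = (128, 160, 112)


def resampled_shape(shape, spacing, new_spacing):
    """Voxel grid size after resampling to new_spacing (same physical extent)."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, new_spacing)
    )


def patches_to_cover(shape, patch_size=PATCH_SIZE):
    """Number of non-overlapping patches needed to tile the volume."""
    return math.prod(math.ceil(n / p) for n, p in zip(shape, patch_size))


# Hypothetical volume: 400 x 400 x 326 voxels at 2.0 x 2.0 x 3.0 mm spacing.
shape, spacing = (400, 400, 326), (2.0, 2.0, 3.0)

# Resample in-plane to 4.0 mm (assumed target spacing, for illustration only).
coarse = resampled_shape(shape, spacing, (4.0, 4.0, 3.0))

print(coarse)                    # (200, 200, 326)
print(patches_to_cover(shape))   # 36
print(patches_to_cover(coarse))  # 12
```

The patch itself stays at 128 x 160 x 112 voxels; only the amount of anatomy each patch covers changes.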

Re: Data centric configuration  

  By: tanya.chutani on Aug. 22, 2024, 5:21 a.m.

Oh. Thanks for clarifying that.

Re: Data centric configuration  

  By: shhhnn on Aug. 23, 2024, 2:28 p.m.

Hello, could you also please clarify whether we are allowed to change max_steps: 10 for the data-centric model? The datacentric baseline description mentions that the model was "trained for roughly 250k steps on all cases", so 10 steps seems very low.

 Last edited by: shhhnn on Aug. 23, 2024, 2:39 p.m., edited 1 time in total.

Re: Data centric configuration  

  By: jdex on Aug. 28, 2024, 10:19 a.m.

Certainly, the configuration can be adapted as you wish. The max_steps argument is included only for testing purposes. You can add parameters or remove anything you don't need. Apart from the files marked as fixed, everything in the repository can be customized. The overall goal of this category is to focus on the data, which is why the model architecture, optimizer, loss function, and learning rate schedule are fixed. However, convergence time (the number of steps) depends heavily on the amount of data you use (number of data points, batch size) and on the learning rate, so these parameters are not fixed. This allows participants to enlarge the dataset (e.g. through augmentation) or reduce it (e.g. by removing outliers) without being constrained by a fixed convergence time.
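As a rough illustration of why the step count has to stay flexible, the sketch below relates dataset size to optimizer steps. It assumes one optimizer step consumes batch_size x devices x accumulate_grad_batches samples and that the last partial batch is kept; the sample counts are hypothetical and the function names are made up for this example.

```python
import math


def steps_per_epoch(n_samples, batch_size, devices=1, accumulate_grad_batches=1):
    """Optimizer steps needed to see every sample once (assumes partial batches are kept)."""
    samples_per_step = batch_size * devices * accumulate_grad_batches
    return math.ceil(n_samples / samples_per_step)


def total_steps(n_samples, epochs, batch_size, devices=1, accumulate_grad_batches=1):
    """Total optimizer steps for a given number of epochs."""
    return epochs * steps_per_epoch(
        n_samples, batch_size, devices, accumulate_grad_batches
    )


# Hypothetical dataset sizes, batch_size 2 on a single device, 100 epochs each.
print(total_steps(1000, epochs=100, batch_size=2))  # 50000  (original data)
print(total_steps(800, epochs=100, batch_size=2))   # 40000  (200 outliers removed)
print(total_steps(2000, epochs=100, batch_size=2))  # 100000 (doubled by augmentation)
```

With the same number of epochs, the required max_steps scales with the dataset size, so a fixed step budget would penalize either pruning or augmenting the data.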

Re: Data centric configuration  

  By: jdex on Aug. 28, 2024, 10:42 a.m.

Just as an example, here is the training config, which can also be found in the download of the weights:

data:
  data_dir: **/data/2024-05-10_Autopet_v1.1
  data_dir_preprocessed: **/data/autopet3/preprocessed
  splits_file: **/data/2024-05-10_Autopet_v1.1/splits_final.json
  fold: all
  target_shape: [128, 160, 112]
  batch_size: 2
  suffix: .npz
  num_workers_train: 10
  num_workers_val: 4

logger:
  experiment: final_logs
  name: all

model:
  pretrained: false
  resume: false
  sw_batch_size: 8
  lr: 0.001
  seed: 42

trainer:
  max_epochs: 774
  precision: bf16-mixed 
  accelerator: gpu
  devices: 2
  deterministic: true
  strategy: auto
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  accumulate_grad_batches: 1
  limit_val_batches: 0
  num_sanity_val_steps: 0
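To compare a test config like the one at the top of this thread against the training config above, a small diff helper can be handy. This is only a sketch: the dictionaries below contain a hand-copied subset of the two posted configs, the helper names are made up, and treating data.target_shape as the only fixed key is an assumption based on this thread, not an official rule list.

```python
# Subset of the test config from the first post (values copied by hand).
test_cfg = {
    "data": {"target_shape": [128, 160, 112], "batch_size": 2, "suffix": ".nii.gz"},
    "trainer": {"max_steps": 10, "devices": 1},
}

# Subset of the training config shipped with the weights (values copied by hand).
train_cfg = {
    "data": {"target_shape": [128, 160, 112], "batch_size": 2, "suffix": ".npz"},
    "trainer": {"max_epochs": 774, "devices": 2},
}

# Assumption drawn from this thread: the patch size must not be changed.
FIXED_KEYS = {"data.target_shape"}


def flatten(cfg, prefix=""):
    """Turn a nested dict into {'section.key': value}."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def changed_keys(a, b):
    """Sorted list of keys whose values differ or that exist in only one config."""
    fa, fb = flatten(a), flatten(b)
    missing = object()
    return sorted(
        k for k in fa.keys() | fb.keys() if fa.get(k, missing) != fb.get(k, missing)
    )


diff = changed_keys(test_cfg, train_cfg)
print(diff)
# ['data.suffix', 'trainer.devices', 'trainer.max_epochs', 'trainer.max_steps']
print([k for k in diff if k in FIXED_KEYS])  # [] -> no fixed key was touched
```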

Re: Data centric configuration  

  By: shhhnn on Aug. 28, 2024, 1:57 p.m.

Of course, that makes sense now. Thank you for the clarification!

 Last edited by: shhhnn on Aug. 28, 2024, 1:58 p.m., edited 1 time in total.