Question about the valid.json file


  By: Gibok on June 26, 2023, 8:45 a.m.

Hello, I am a beginner in this challenge, and I would really appreciate any useful information or comments. I have some questions about training on the data.

  1. I am wondering where the valid.json file is located in the folders we get from the HierarchicalDet Git repository. There is one valid.json file in the pycocotools folder, but it is not the one I am looking for. (Are the JSON files in the pycocotools folder, custom_train.json and val.json, used for training?)

  2. In order to apply our custom data, I think we have to add the code below to detectron2/data/datasets/builtin.py.

    def register_my_coco_datasets():
        """The function to register custom coco datasets."""
        register_coco_instances(
            "custom_train",
            {},
            "path/to/train.json",  # we have 3 json files (quadrant, quadrant_enumeration, quadrant-enumeration-disease)
            "path/to/train_images",  # we have a lot of images in 3 folders (quadrant, quadrant_enumeration, quadrant-enumeration-disease)
        )
        register_coco_instances(
            "custom_valid",
            {},
            "path/to/valid.json",  # we don't have this file.
            "path/to/valid_images",  # we have 50 images in quadrant_enumeration_disease folder
        )

Thank you.

Re: Question about the valid.json file  

  By: sezginer on June 26, 2023, 9:53 a.m.

Hello @Gibok,

Thank you for your interest in our challenge and our HierarchicalDet framework. Unfortunately, we do not provide a valid.json file for the HierarchicalDet or the challenge itself (which are based on the same data), as the validation and test phases are conducted exclusively on the grand challenge platform. If you wish to use our code, there are a few options available:

  1. Omit validation: You can proceed without applying validation and exclude the valid.json file altogether.

  2. Split training data: Another option is to split your training data and use a portion of it as a validation set during training.
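As a rough illustration of the second option, a COCO-style annotation file can be split by image so that every annotation follows its image into the same subset. This is only a sketch: the function name `split_coco`, the 10% validation fraction, and the fixed seed are assumptions made for the example and are not part of HierarchicalDet.

```python
import random


def split_coco(coco, val_fraction=0.1, seed=0):
    """Split a COCO-style dict into (train, val) dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded, so the split is reproducible
    n_val = max(1, int(len(images) * val_fraction))
    val_ids = {img["id"] for img in images[:n_val]}

    def subset(want_val):
        # Copy shared fields (e.g. "categories") and filter the per-image ones.
        out = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
        out["images"] = [
            im for im in coco["images"] if (im["id"] in val_ids) == want_val
        ]
        out["annotations"] = [
            a for a in coco["annotations"] if (a["image_id"] in val_ids) == want_val
        ]
        return out

    return subset(False), subset(True)
```

The two returned dicts can be written out with `json.dump` as, for example, train.json and valid.json, and then registered like any other COCO-format dataset.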

If you are using an object detection model implemented with the detectron2 library (such as HierarchicalDet) and want to train it with custom data, you need to register your new JSON files. Here is how you can do it:

For training:

register_coco_instances('custom_train_class', {}, "../sorted/challenge/train_merged_disease_coco3class_onlyd_fixed.json", "../sorted/challenge/for_coco_disease_train")

For validation:

register_coco_instances('custom_validation_class', {}, "../sorted/challenge/test_merged_disease_coco3class.json", "../sorted/challenge/for_coco_disease_test")

Make sure to provide the correct file locations for the JSON files, such as "../sorted/challenge/train_merged_disease_coco3class_onlyd_fixed.json", and the corresponding input panoramic x-rays, such as "../sorted/challenge/for_coco_disease_train". You should add these lines before the training loop of your model. More information can be found here.
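Since wrong paths are a common source of errors at this step, it may help to sanity-check the annotation file and image folder before registering them. The sketch below uses only the standard library; the helper name `check_coco_dataset` is hypothetical, and it assumes the usual COCO layout in which each entry under "images" has a "file_name" relative to the image folder.

```python
import json
import os


def check_coco_dataset(json_file, image_root):
    """Return a list of problems found; an empty list means the paths look usable."""
    if not os.path.isfile(json_file):
        return [f"annotation file not found: {json_file}"]
    if not os.path.isdir(image_root):
        return [f"image folder not found: {image_root}"]

    with open(json_file) as f:
        coco = json.load(f)

    problems = []
    # A COCO detection file is expected to carry these three top-level keys.
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            problems.append(f"missing top-level key: {key}")
    # Every image listed in the JSON should exist under image_root.
    for img in coco.get("images", []):
        if not os.path.isfile(os.path.join(image_root, img["file_name"])):
            problems.append(f"image listed in JSON but missing on disk: {img['file_name']}")
    return problems
```

Calling it with the same two paths that are passed to register_coco_instances and printing the returned list should reveal a mistyped path or a missing image before training starts.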

After registering the datasets with detectron2, you also need to specify them in your training configuration file:

DATASETS:
  TRAIN: ("custom_train_class",)
  TEST: ("custom_validation_class",)

I hope this information is helpful to you.

Best regards, Sezgin

Re: Question about the valid.json file  

  By: Gibok on June 29, 2023, 9:08 a.m.

Thank you for answering me, Sezgin (@Sezginer),

According to your recommendations, I split the training data into training and validation files.

But when I try to run train_net.py in HierarchicalDet, I get the error below, raised from "HierarchicalDet/hierarchialdet/dataset_mapper.py":

    FileNotFoundError: [Errno 2] No such file or directory: 'ibrahim/Diseasedataset_base_enumeration_m_t_inference_train/inference/coco_instances_results.json'

Where do the JSON files referenced in dataset_mapper.py come from? I thought that coco_instances_results.json files originate from Detectron2, but those relate to general objects, people, or natural scenes, not to panoramic X-ray data.

After training detectron2 through train_net.py, I got a coco_instances_results.json file and instances_predictions.pth. If the train and val sets are used to produce one coco_instances_results.json file, do I have to train on other data to get the second one? Since I trained detectron2 on COCO (train2017, val2017), I am wondering whether I should produce the other coco_instances_results.json file using different data (LVIS).

If I don't have to use that data with detectron2 to get the coco_instances_results.json files, should I train directly on the challenge data (quadrant, quadrant_enumeration, quadrant_enumeration-disease, unlabelled)?

Even if I use the X-ray data to get the coco_instances_results.json files, I can't train on it, because I get the same message: "No such file or directory: 'ibrahim/Diseasedataset_base_enumeration_m_t_inference_train/inference/coco_instances_results.json'" from "HierarchicalDet/hierarchialdet/dataset_mapper.py".

I would appreciate it if you could give me an idea.

Sincerely, Gibok

In dataset_mapper.py:

    self.img_format = cfg.INPUT.FORMAT
    self.is_train = is_train

    boxes_train = "ibrahim/Diseasedataset_base_enumeration_m_t_inference_train/inference/coco_instances_results.json"
    boxes_valid = "ibrahim/Diseasedataset_base_enumeration_m_t_inference_val/inference/coco_instances_results.json"


    self.train_boxes=[]
    self.valid_boxes=[]
    f_train = open(boxes_train)
    dict_train = json.load(f_train)

    f_valid= open(boxes_valid)
    dict_valid=json.load(f_valid)

    for inference in dict_train:
      if inference["score"]>=0.5:
        self.train_boxes.append(inference)

    for inference in dict_valid:
      if inference["score"]>=0.5:
        self.valid_boxes.append(inference)

P.S. In the end, I decided to train on the quadrant training data, with a validation set split from it, using the DiffusionDet architecture. Once this training finishes, I expect to find a coco_instances_results.json file in the output folder. How can I then use this JSON file for boxes_train and boxes_valid in dataset_mapper.py?

 Last edited by: Gibok on Aug. 15, 2023, 12:58 p.m., edited 3 times in total.

Re: Question about the valid.json file  

  By: sezginer on July 3, 2023, 3:40 p.m.

Dear @Gibok,

Thank you for your interest in HierarchicalDet. I would like to provide you with a clearer explanation of how the model works.

HierarchicalDet operates in a hierarchical manner. To begin, you need to train the model for quadrant detection. Once it is trained, you can obtain the inference results from the quadrant model. These inferred bounding boxes are then used to manipulate the noisy boxes of the enumeration model, which is why the model requires coco_instances_results.json. The diagnosis model in turn relies on the inferences from the enumeration model, as in the line you quoted (boxes_train = "ibrahim/Diseasedataset_base_enumeration_m_t_inference_train/inference/coco_instances_results.json"). For a more detailed explanation, please refer to our preprint, available here.
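To make the role of these files concrete, here is a minimal sketch of reading one stage's coco_instances_results.json for use by the next stage. It assumes the file is the usual COCO-style results list (dicts with image_id, category_id, bbox, and score) and reuses the 0.5 score threshold from the dataset_mapper.py snippet quoted earlier in the thread. The function name `load_confident_boxes` and the grouping by image id are illustrative assumptions, not the repository's actual code.

```python
import json
from collections import defaultdict


def load_confident_boxes(results_path, score_threshold=0.5):
    """Load COCO-style detections and keep confident ones, grouped by image id."""
    try:
        with open(results_path) as f:
            results = json.load(f)
    except FileNotFoundError as err:
        # The file only exists after inference has been run with the earlier stage.
        raise FileNotFoundError(
            f"{results_path} not found; run inference with the previous-stage model first"
        ) from err

    by_image = defaultdict(list)
    for det in results:
        if det["score"] >= score_threshold:
            by_image[det["image_id"]].append(det)
    return dict(by_image)
```

With this layout, the boxes_train and boxes_valid paths would point at the inference output of the previous stage, run once on the training images and once on the validation images.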

Unfortunately, we are unable to provide all model weights at this time, as the preprint is still under review. However, you can train all models yourself, since we have provided all the necessary data. Alternatively, you can disable the bounding box manipulation part, but please be aware that the model's performance will suffer, as our approach relies on hierarchically labeled data. If you need assistance with removing this part, I am more than willing to help; simply open an issue on the GitHub repository. Note that bounding box manipulation must also be disabled when training the quadrant detection model, since it is the first stage and has no earlier model's inferences to draw on.

I would like to emphasize that we highly recommend utilizing all labels to achieve a higher accuracy model. I hope this information proves helpful to you.

Best regards, Sezgin