Hi,
The example images that you shared are still expected to be in the lung region, because there is still some lung in areas that do not look dark, such as behind the heart.
We used a lung segmentation algorithm to select a region in which we could pick random nodule locations, as in the example link.
So we select the area that is outlined by the blue line, and pick random nodule locations within this area.
I think the segmentation is not a perfect estimate of the lung area, because it is not easy to estimate the full lung volume from a 2D CXR, but it should still be good enough. There is a chance that this creates a few bounding boxes that are not in the lung region, but we expect that this will not harm the algorithm's performance.
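
To make the selection step more concrete, here is a minimal sketch of picking random locations inside a segmented region. It is not our actual code: the function name `sample_nodule_centers` is hypothetical, and it assumes the segmentation output is available as a 2D boolean mask where True marks pixels inside the blue outline.

```python
import numpy as np

def sample_nodule_centers(lung_mask, n, rng=None):
    """Pick n random (row, col) pixel positions inside a binary lung mask.

    lung_mask is assumed to be a 2D boolean array produced by some lung
    segmentation step (True = inside the segmented region).
    """
    rng = np.random.default_rng() if rng is None else rng
    # All pixel coordinates inside the mask, as a (K, 2) array of (row, col).
    candidates = np.argwhere(lung_mask)
    if len(candidates) == 0:
        raise ValueError("lung mask is empty")
    # Draw n candidates uniformly; the same pixel may be drawn more than once.
    idx = rng.choice(len(candidates), size=n, replace=True)
    return candidates[idx]

# Toy example: a small rectangular "lung" region in an 8x8 image.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 1:4] = True
centers = sample_nodule_centers(mask, n=5, rng=np.random.default_rng(0))
```

Every sampled center lies inside the mask, but a bounding box drawn around a center near the edge can still extend outside it, and the mask itself is only an approximation of the true lung area.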