Developing an Algorithm from a Template - Prepare your code for containerization

Developing an Algorithm from a Template¶

See the runtime-environment information for the restrictions that apply to containers.

If you would rather not worry about how data is loaded and written, and just want to get your Algorithm on the platform with as few lines of code as possible, then finalizing an (almost) working custom-made template might be the easiest way.

We'll be using a demo algorithm template for these instructions. For your own Algorithm, however, prefer downloading an up-to-date template that is tailored to it!

Algorithm editors can find the templates via Templates ⟶ Download Algorithm Image Template:

Reference Algorithm¶

In this tutorial, we will build an Algorithm image for a U-Net that segments retinal blood vessels from the DRIVE Challenge.

The image below shows the output of a very simple U-Net that segments vessels.

To start the process, let's clone the repository that contains the weights from a pre-trained model and the Python scripts to run inference on a new fundus image.

$ git clone https://github.com/DIAGNijmegen/drive-vessels-unet.git

Create a base repository using the algorithm template¶

The templates provide methods to wrap your algorithm in Docker containers. Just execute the following command in a terminal of your choice:

$ git clone https://github.com/DIAGNijmegen/demo-algorithm-template

This will create a templated repository with a Dockerfile and other files.

The scripts for your container were automatically generated by the platform. They include bash scripts for building, testing, and saving the algorithm image:

├── Dockerfile
├── README.md
├── do_build.sh
├── do_save.sh
├── do_test_run.sh
├── inference.py
├── requirements.txt
├── resources
│   └── some_resource.txt
└── test
    └── input
        ├── age-in-months.json
        └── images
            └── color-fundus
                └── 998dca01-2b74-4db5-802f-76ace545ec4b.mha

Running the test¶

It is informative to try running the algorithm image as a container on your local system. This allows for quick debugging without the (somewhat slow) saving and uploading of the image.

There is a helper script for this which has the correct docker calls:

$ ./do_test_run.sh

This should show the basic docker build steps and everything the template currently prints to stdout and stderr. Note that on the first run, the build process might take a while since it needs to download some large image layers.

Inserting the Algorithm¶

The next step is to edit inference.py. This is the file where you will insert the implementation of the reference algorithm.

In inference.py, a function run() has been created for you. It is called when the script is executed:

if __name__ == "__main__":
    raise SystemExit(run())

The default function run() generated by the platform does simple reading of the input and saving of the output. In between reading and writing, there is a clear point where we are to insert the reference algorithm:

def run():
    # Read the input
    input_color_fundus_image = load_image_file_as_array(
        location=INPUT_PATH / "images/color-fundus",
    )
    input_age_in_months = load_json_file(
        location=INPUT_PATH / "age-in-months.json",
    )  # Note: we'll be ignoring this input completely

    # Process the inputs: any way you'd like
    _show_torch_cuda_info()

    with open(RESOURCE_PATH / "some_resource.txt", "r") as f:
        print(f.read())

    # TODO: add your custom inference here

    # For now, let us make bogus predictions
    output_binary_vessel_segmentation = numpy.eye(4, 2)

    # Save your output
    write_array_as_image_file(
        location=OUTPUT_PATH / "images/binary-vessel-segmentation",
        array=output_binary_vessel_segmentation,
    )

    return 0

The reference algorithm is found in a similar file reference-algorithm/inference.py.

We'll copy over the relevant parts, including some pre- and post-processing of the images, and add imports at the top of our Python script as needed. First, we'll start with the torch device settings and initializing the model:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize MONAI UNet with updated arguments
model = monai.networks.nets.UNet(
    spatial_dims=2,
    in_channels=3,
    out_channels=1,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
).to(device)

Next, we'll load in the weights.

🔩 Copying your model weights into the image¶

Ensure that you copy all the files needed to run your scripts, including the model weights, into /opt/app/. This can be configured in the Dockerfile using the COPY command. If your model weights are stored in a resources/ folder, they are already copied into the image. This is done via this line of the Dockerfile:

COPY --chown=user:user resources /opt/app/resources

For now, we'll copy best_metric_model_segmentation2d_dict.pth from our reference Algorithm into the resources/ directory.

Of course, we'll still need to load the weights into our initialized model by adding the following line to inference.py:

model.load_state_dict(torch.load(RESOURCE_PATH / "best_metric_model_segmentation2d_dict.pth", map_location=device))
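
A mismatch between the file name in resources/ and the one used in inference.py (for example a different extension) only surfaces when the container runs. The following is a minimal sketch of a pre-flight check, not part of the template: the helper name resolve_weights is hypothetical, and it uses only the standard library.

```python
from pathlib import Path


def resolve_weights(resource_dir, filename):
    """Return the path to the weights file, or raise a descriptive error.

    Hypothetical helper: it only checks that the file exists and, if it
    does not, lists what is actually present in the resource directory.
    """
    resource_dir = Path(resource_dir)
    path = resource_dir / filename
    if not path.is_file():
        available = sorted(p.name for p in resource_dir.glob("*") if p.is_file())
        raise FileNotFoundError(
            f"{filename!r} not found in {resource_dir}; available files: {available}"
        )
    return path
```

Under these assumptions, the result of resolve_weights(RESOURCE_PATH, "best_metric_model_segmentation2d_dict.pth") could be passed to torch.load in place of the hard-coded path.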

🛠️ Processing Input and Output¶

The input is already read, but we generally need to convert it before our algorithm can use it. We'll hide that in a pre_process function. The same holds for the output: we are already writing a numpy array to an image, but we might need to perform some thresholding after our forward pass. We'll do that in a post_process function of our own design. In inference.py we'll combine the processing with the forward pass:

input_tensor = pre_process(image=input_color_fundus_image, device=device)

# Do the forward pass
with torch.no_grad():
    out = model(input_tensor).squeeze().detach().cpu().numpy()

output_binary_vessel_segmentation = post_process(image=out, shape=input_color_fundus_image.shape)

🏗️ Combining everything¶

Finally, we should end up with an updated inference.py that will look something like this:

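# Additional imports needed at the top of inference.py
import monai
import numpy as np
import torch
import torch.nn.functional as F
from scipy.special import expit
from skimage import transform


def run():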
    # Read the input
    input_color_fundus_image = load_image_file_as_array(
        location=INPUT_PATH / "images/color-fundus",
    )
    input_age_in_months = load_json_file(
        location=INPUT_PATH / "age-in-months.json",
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Initialize MONAI UNet with updated arguments
    model = monai.networks.nets.UNet(
        spatial_dims=2,
        in_channels=3,
        out_channels=1,
        channels=(16, 32, 64, 128, 256),
        strides=(2, 2, 2, 2),
        num_res_units=2,
    ).to(device)

    model.load_state_dict(
        torch.load(
            RESOURCE_PATH / "best_metric_model_segmentation2d_dict.pth",
            map_location=device,  # lets GPU-trained weights load on CPU-only hosts
        )
    )

    # Ensure model is in evaluation mode
    model.eval()

    input_tensor = pre_process(image=input_color_fundus_image, device=device)

    # Do the forward pass
    with torch.no_grad():
        out = model(input_tensor).squeeze().detach().cpu().numpy()

    output_binary_vessel_segmentation = post_process(image=out, shape=input_color_fundus_image.shape)

    # Save your output
    write_array_as_image_file(
        location=OUTPUT_PATH / "images/binary-vessel-segmentation",
        array=output_binary_vessel_segmentation,
    )

    return 0

def pre_process(image, device):
    # Step 1: Convert the input numpy array to a PyTorch tensor with float data type
    input_tensor = torch.from_numpy(image).float()

    # Step 2: Rearrange dimensions from [height, width, channels] to [channels, height, width]
    input_tensor = input_tensor.permute(2, 0, 1)

    # Step 3: Add a batch dimension to make it [1, channels, height, width]
    input_tensor = input_tensor.unsqueeze(0)

    # Step 4: Move the tensor to the device (CPU or GPU)
    input_tensor = input_tensor.to(device)

    # Calculate padding
    height, width = image.shape[:2]
    pad_height = (16 - (height % 16)) % 16
    pad_width = (16 - (width % 16)) % 16

    # Apply padding equally on all sides
    padding = (pad_width // 2, pad_width - pad_width // 2, pad_height // 2, pad_height - pad_height // 2)

    return F.pad(input_tensor, padding)


def post_process(image, shape):
    image = transform.resize(image, shape[:-1], order=3)
    image = (expit(image) > 0.80)
    return (image * 255).astype(np.uint8)
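
The padding arithmetic in pre_process can be checked in isolation. The sketch below is plain Python with a hypothetical helper name, compute_padding. It assumes that the multiple of 16 follows from the four stride-2 steps of the U-Net (2**4 = 16), which is our reading of the model configuration rather than something the template states. The 584 x 565 input size is only an example.

```python
def compute_padding(height, width, multiple=16):
    """Return (left, right, top, bottom) padding so that both sizes
    become divisible by `multiple`. This is the argument order that
    torch.nn.functional.pad expects for the last two dimensions."""
    pad_height = (multiple - height % multiple) % multiple
    pad_width = (multiple - width % multiple) % multiple
    left = pad_width // 2
    top = pad_height // 2
    # Any odd remainder goes to the right and bottom edges
    return (left, pad_width - left, top, pad_height - top)


# Assumption: four stride-2 downsampling steps, so sizes must divide by 2**4
n_strides = 4
multiple = 2 ** n_strides

print(compute_padding(584, 565, multiple))  # -> (5, 6, 4, 4)
print(compute_padding(512, 512, multiple))  # -> (0, 0, 0, 0)
```

With this padding the example image grows from 584 x 565 to 592 x 576, and post_process then resizes the prediction back to the original shape.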

There are a few things we still need to do before we can run the algorithm image.

Updating the Dockerfile¶

Ensure that you use the right base image in your Dockerfile. For our U-Net, we will build our Docker image with the official PyTorch image as the base. This should take care of installing PyTorch with the necessary CUDA environment inside your container. If you're using TensorFlow, build your image on the official TensorFlow base image instead. You can browse Docker Hub to find your preferred base image. The base image is specified in the first line of your Dockerfile:

FROM pytorch/pytorch

Here are some best practices for configuring your Dockerfile.

📝 Configuring requirements.txt¶

Ensure that all dependencies are listed in requirements.txt, ideally pinned to specific versions (as monai is in the example below):

SimpleITK
numpy
monai==1.4.0
scikit-learn
scipy
scikit-image

Note that we haven't included torch as it comes with the PyTorch base image included in our Dockerfile in the previous step.
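
To see which entries are still unpinned, a few lines of Python suffice. This is a minimal sketch with a hypothetical helper name, unpinned_requirements. It treats only an exact `==` pin as pinned and skips comments and option lines, which is a simplification of the full requirements file syntax.

```python
def unpinned_requirements(text):
    """Return the requirement lines that have no exact '==' version pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


example = """\
SimpleITK
numpy
monai==1.4.0
scikit-learn
scipy
scikit-image
"""

print(unpinned_requirements(example))
# -> ['SimpleITK', 'numpy', 'scikit-learn', 'scipy', 'scikit-image']
```

Pinning the remaining entries to the versions you developed against makes later rebuilds of the image more reproducible.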

🦾 Do a test run locally¶

Finally, we are near the end! Add a good example input image to test/input/images/color-fundus and run the local test:

$ ./do_test_run.sh

This should create a local Docker image, spawn a container, and do a forward pass on the input image. If all goes well, it should output a binary segmentation to test/output/images/binary-vessel-segmentation.
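
Whether the run actually produced a file can be checked with a short script. This is a sketch only: the helper name list_outputs is hypothetical and it uses just the standard library, without inspecting the image contents.

```python
from pathlib import Path


def list_outputs(output_dir):
    """Return all files found under output_dir, or raise if there are none."""
    files = sorted(p for p in Path(output_dir).rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"No output files found under {output_dir}")
    return files
```

For example, calling list_outputs("test/output/images/binary-vessel-segmentation") after the test run should return the path of the segmentation that was written.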

Once you are happy that things work locally, you can save the image and upload it, as documented in the next section.
