In Part 1 of this series, we gave an overview of this project and explained how we scaled down the images. Part 2 showed how we investigated image filters and determined a set of filters that can be used for effective tissue segmentation with our data set. Now in this article, we’ll explain morphology operators and how we combined filters and applied them to multiple images.

Morphology

Information about image morphology can be found at https://en.wikipedia.org/wiki/Mathematical_morphology. The primary morphology operators are erosion, dilation, opening, and closing. With erosion, pixels along the edges of an object are removed. With dilation, pixels along the edges of an object are added. Opening is erosion followed by dilation. Closing is dilation followed by erosion. With morphology operators, a structuring element (such as a square, circle, or cross) is passed along the edges of the objects to perform the operations. Morphology operators are typically performed on binary and grayscale images. In our examples, we apply morphology operators to binary images (2-dimensional arrays of 2 values, such as True/False, 1.0/0.0, and 255/0).
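The four operators can be sketched directly with scikit-image on a tiny synthetic binary image. This is a standalone illustration, separate from the wrapper functions used in the examples below:

```python
# Minimal sketch of the four morphology operators using scikit-image
# on a synthetic binary image (not the article's wrapper functions).
import numpy as np
from skimage.morphology import (binary_erosion, binary_dilation,
                                binary_opening, binary_closing, disk)

# A 5x5 square of True pixels inside a 9x9 image.
img = np.zeros((9, 9), dtype=bool)
img[2:7, 2:7] = True

selem = disk(1)  # structuring element: a radius-1 disk (3x3 cross)

eroded = binary_erosion(img, selem)    # removes pixels along the edges
dilated = binary_dilation(img, selem)  # adds pixels along the edges
opened = binary_opening(img, selem)    # erosion followed by dilation
closed = binary_closing(img, selem)    # dilation followed by erosion

print(eroded.sum(), img.sum(), dilated.sum())  # eroded < original < dilated
```

Erosion shrinks the square and dilation grows it; opening never adds pixels beyond the original object, and closing never removes pixels from it.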

Erosion

Let’s have a look at an erosion example. We create a binary image by calling the filter_grays() function on the original RGB image. The filter_binary_erosion() function uses a disk as the structuring element that erodes the edges of the “No Grays” binary image. We demonstrate erosion with disk structuring elements of radius 5 and radius 20.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
bin_erosion_5 = filter.filter_binary_erosion(no_grays, disk_size=5)
util.display_img(bin_erosion_5, "Binary Erosion (5)", bg=True)
bin_erosion_20 = filter.filter_binary_erosion(no_grays, disk_size=20)
util.display_img(bin_erosion_20, "Binary Erosion (20)", bg=True)
Original Slide No Grays
Binary Erosion (disk_size = 5) Binary Erosion (disk_size = 20)

Notice that increasing the structuring element radius increases the compute time.

RGB                  | Time: 0:00:00.171309  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.086484  Type: bool    Shape: (1385, 1810)
Binary Erosion       | Time: 0:00:00.167290  Type: uint8   Shape: (1385, 1810)
Binary Erosion       | Time: 0:00:00.765442  Type: uint8   Shape: (1385, 1810)

Dilation

The filter_binary_dilation() function uses a disk structuring element in a similar manner as the corresponding erosion function. We’ll utilize the same “No Grays” binary image from the previous example and dilate the image using a disk radius of 5 pixels followed by a disk radius of 20 pixels.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
bin_dilation_5 = filter.filter_binary_dilation(no_grays, disk_size=5)
util.display_img(bin_dilation_5, "Binary Dilation (5)", bg=True)
bin_dilation_20 = filter.filter_binary_dilation(no_grays, disk_size=20)
util.display_img(bin_dilation_20, "Binary Dilation (20)", bg=True)

We see that dilation expands the edges of the binary image, whereas erosion shrinks them.

Binary Dilation (disk_size = 5) Binary Dilation (disk_size = 20)

Console output:

RGB                  | Time: 0:00:00.176491  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.081817  Type: bool    Shape: (1385, 1810)
Binary Dilation      | Time: 0:00:00.096302  Type: uint8   Shape: (1385, 1810)
Binary Dilation      | Time: 0:00:00.538761  Type: uint8   Shape: (1385, 1810)

Opening

As mentioned, opening is erosion followed by dilation. Opening can be used to remove small foreground objects.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
bin_opening_5 = filter.filter_binary_opening(no_grays, disk_size=5)
util.display_img(bin_opening_5, "Binary Opening (5)", bg=True)
bin_opening_20 = filter.filter_binary_opening(no_grays, disk_size=20)
util.display_img(bin_opening_20, "Binary Opening (20)", bg=True)
Binary Opening (disk_size = 5) Binary Opening (disk_size = 20)

Opening is a fairly expensive operation because it is an erosion followed by a dilation. The compute time increases with the size of the structuring element. The 5-pixel disk radius for the structuring element results in a 0.25s operation, whereas the 20-pixel disk radius results in a 2.45s operation.

RGB                  | Time: 0:00:00.169241  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.085474  Type: bool    Shape: (1385, 1810)
Binary Opening       | Time: 0:00:00.248629  Type: uint8   Shape: (1385, 1810)
Binary Opening       | Time: 0:00:02.452089  Type: uint8   Shape: (1385, 1810)

Closing

Closing is a dilation followed by an erosion. Closing can be used to remove small background holes.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
bin_closing_5 = filter.filter_binary_closing(no_grays, disk_size=5)
util.display_img(bin_closing_5, "Binary Closing (5)", bg=True)
bin_closing_20 = filter.filter_binary_closing(no_grays, disk_size=20)
util.display_img(bin_closing_20, "Binary Closing (20)", bg=True)
Binary Closing (disk_size = 5) Binary Closing (disk_size = 20)

Like opening, closing is a fairly expensive operation because it performs both a dilation and an erosion. Compute time increases with structuring element size.

RGB                  | Time: 0:00:00.179190  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.079992  Type: bool    Shape: (1385, 1810)
Binary Closing       | Time: 0:00:00.241882  Type: uint8   Shape: (1385, 1810)
Binary Closing       | Time: 0:00:02.592515  Type: uint8   Shape: (1385, 1810)

Remove small objects

The scikit-image remove_small_objects() function removes connected objects smaller than a particular minimum size. The filter_remove_small_objects() function wraps this and adds additional functionality. This can be useful for removing small islands of noise from images. We’ll demonstrate it here with two minimum sizes, 100 pixels and 10,000 pixels, performed on the “No Grays” binary image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
remove_small_100 = filter.filter_remove_small_objects(no_grays, min_size=100)
util.display_img(remove_small_100, "Remove Small Objects (100)", bg=True)
remove_small_10000 = filter.filter_remove_small_objects(no_grays, min_size=10000)
util.display_img(remove_small_10000, "Remove Small Objects (10000)", bg=True)

Notice that the “No Grays” binary image contains many small, scattered objects.

Original Slide No Grays

After removing small objects with a connected size less than 100 pixels, we see that the smallest objects have been removed from the binary image. With a minimum size of 10,000 pixels, we see that many larger objects have also been removed from the binary image.

Remove Small Objects (100) Remove Small Objects (10000)

Removing small objects is quite fast, as the console output shows.

RGB                  | Time: 0:00:00.177367  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.081827  Type: bool    Shape: (1385, 1810)
Remove Small Objs    | Time: 0:00:00.053734  Type: uint8   Shape: (1385, 1810)
Remove Small Objs    | Time: 0:00:00.044924  Type: uint8   Shape: (1385, 1810)
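As a rough standalone sketch, the underlying scikit-image call can be used like this on a small synthetic mask (the filter_remove_small_objects() wrapper adds functionality not shown here):

```python
# Standalone sketch of scikit-image's remove_small_objects on a synthetic mask.
import numpy as np
from skimage.morphology import remove_small_objects

mask = np.zeros((10, 10), dtype=bool)
mask[0:5, 0:5] = True   # 25-pixel connected object (large enough to keep)
mask[8, 8] = True       # 1-pixel island (removed as noise)

cleaned = remove_small_objects(mask, min_size=10)
print(mask.sum(), cleaned.sum())  # 26 25
```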

Remove small holes

The scikit-image remove_small_holes() function is similar to the remove_small_objects() function except it removes holes rather than objects from binary images. Here we demonstrate this using the filter_remove_small_holes() function with sizes of 100 pixels and 10,000 pixels.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
util.display_img(no_grays, "No Grays", bg=True)
remove_small_100 = filter.filter_remove_small_holes(no_grays, min_size=100)
util.display_img(remove_small_100, "Remove Small Holes (100)", bg=True)
remove_small_10000 = filter.filter_remove_small_holes(no_grays, min_size=10000)
util.display_img(remove_small_10000, "Remove Small Holes (10000)", bg=True)

Notice that using a minimum size of 10,000 removes more holes than a size of 100, as we would expect.

Remove Small Holes (100) Remove Small Holes (10000)

Console output:

RGB                  | Time: 0:00:00.171669  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.081116  Type: bool    Shape: (1385, 1810)
Remove Small Holes   | Time: 0:00:00.043491  Type: uint8   Shape: (1385, 1810)
Remove Small Holes   | Time: 0:00:00.044550  Type: uint8   Shape: (1385, 1810)

Fill holes

The SciPy binary_fill_holes() function (in scipy.ndimage) is similar to the remove_small_holes() function. Using its default settings, it generates results similar to, but typically not identical to, the remove_small_holes() function with a high minimum size value.

In the following code, we’ll display the result of the filter_binary_fill_holes() function on the image after gray shades have been removed. After this, we’ll perform exclusive-or operations to look at the differences between “Fill holes” and “Remove small holes” with size values of 100 and 10,000.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
no_grays = filter.filter_grays(rgb, output_type="bool")
fill_holes = filter.filter_binary_fill_holes(no_grays)
util.display_img(fill_holes, "Fill Holes", bg=True)

remove_holes_100 = filter.filter_remove_small_holes(no_grays, min_size=100, output_type="bool")
util.display_img(fill_holes ^ remove_holes_100, "Differences between Fill Holes and Remove Small Holes (100)", bg=True)

remove_holes_10000 = filter.filter_remove_small_holes(no_grays, min_size=10000, output_type="bool")
util.display_img(fill_holes ^ remove_holes_10000, "Differences between Fill Holes and Remove Small Holes (10000)", bg=True)
Original Slide Fill Holes

In this example, increasing the minimum small hole size results in fewer differences between “Fill Holes” and “Remove Small Holes”.

Differences between Fill Holes and Remove Small Holes (100) Differences between Fill Holes and Remove Small Holes (10000)

Console output:

RGB                  | Time: 0:00:00.176696  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.082582  Type: bool    Shape: (1385, 1810)
Binary Fill Holes    | Time: 0:00:00.069583  Type: bool    Shape: (1385, 1810)
Remove Small Holes   | Time: 0:00:00.046232  Type: bool    Shape: (1385, 1810)
Remove Small Holes   | Time: 0:00:00.044539  Type: bool    Shape: (1385, 1810)
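As a minimal standalone sketch (assuming the filter_binary_fill_holes() wrapper builds on scipy.ndimage.binary_fill_holes), hole filling behaves like this on a synthetic ring:

```python
# Standalone sketch of hole filling with scipy.ndimage.binary_fill_holes.
import numpy as np
from scipy.ndimage import binary_fill_holes

# A 7x7 block of True pixels with a 3x3 hole in the middle.
ring = np.ones((7, 7), dtype=bool)
ring[2:5, 2:5] = False

filled = binary_fill_holes(ring)
print(ring.sum(), filled.sum())  # 40 49 (the 9-pixel interior hole is filled)
```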

Entropy

The scikit-image entropy() function allows us to filter images based on complexity. Because areas such as slide backgrounds are less complex than areas of interest such as cell nuclei, filtering on entropy offers interesting possibilities for tissue identification.

In the following code, we use the filter_entropy() function to filter the grayscale image based on entropy. We display the resulting binary image. After that, we mask the original image with the entropy mask and the inverse of the entropy mask.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original")
gray = filter.filter_rgb_to_grayscale(rgb)
util.display_img(gray, "Grayscale")
entropy = filter.filter_entropy(gray, output_type="bool")
util.display_img(entropy, "Entropy")
util.display_img(util.mask_rgb(rgb, entropy), "Original with Entropy Mask")
util.display_img(util.mask_rgb(rgb, ~entropy), "Original with Inverse of Entropy Mask")
Original Slide Grayscale
Entropy Filter

The results of the original image with the inverse of the entropy mask are particularly interesting. Notice that much of the white background, including the shadow region at the top of the slide, has been filtered out. Additionally, notice that for the stained regions, a significant amount of the pink eosin-stained area has been filtered out, while a smaller proportion of the purple hematoxylin-stained area has been filtered out. This makes sense because hematoxylin stains structures with significant complexity, such as cell nuclei. Therefore, entropy seems like a potential tool for identifying regions of interest where mitoses are occurring.

Original with Entropy Mask Original with Inverse of Entropy Mask

A drawback of the entropy filter is its computational cost; it takes over 3 seconds to run in this example.

RGB                  | Time: 0:00:00.177166  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.116245  Type: uint8   Shape: (1385, 1810)
Entropy              | Time: 0:00:03.306786  Type: bool    Shape: (1385, 1810)
Mask RGB             | Time: 0:00:00.010422  Type: uint8   Shape: (1385, 1810, 3)
Mask RGB             | Time: 0:00:00.006140  Type: uint8   Shape: (1385, 1810, 3)
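A standalone sketch of local entropy filtering with scikit-image’s rank entropy filter follows; the neighborhood size and threshold here are illustrative assumptions, not the article’s settings:

```python
# Standalone sketch of local entropy filtering with scikit-image.
# The disk radius and threshold are illustrative, not the article's values.
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

# Synthetic grayscale image: flat background on the left, noisy "tissue"
# on the right. Flat regions have near-zero local entropy.
rng = np.random.default_rng(0)
gray = np.full((64, 64), 128, dtype=np.uint8)
gray[:, 32:] = rng.integers(0, 256, size=(64, 32), dtype=np.uint8)

ent = entropy(gray, disk(9))   # local entropy (bits) in a radius-9 neighborhood
mask = ent > 5                 # keep only "complex" high-entropy regions

# The noisy half dominates the mask; the flat half is excluded.
print(mask[:, :20].mean(), mask[:, 40:].mean())
```

A real tissue slide is less clear-cut than this synthetic step, but the principle is the same: simple background yields low entropy, complex structures yield high entropy.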

Canny edge detection

Edges in images are areas where there is typically a significant, abrupt change in image brightness. The Canny edge detection algorithm is implemented in scikit-image. More information about edge detection can be found at https://en.wikipedia.org/wiki/Edge_detection. More information about Canny edge detection can be found at https://en.wikipedia.org/wiki/Canny_edge_detector.

The scikit-image canny() function returns a binary edge map for the detected edges in an input image. In the following example, we call the filter_canny() function on the grayscale image and display the resulting Canny edges. After this, we crop a 600×600 area of the original slide and display it. We then apply the inverse of the Canny mask to the cropped area and display it for comparison.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
gray = filter.filter_rgb_to_grayscale(rgb)
canny = filter.filter_canny(gray, output_type="bool")
util.display_img(canny, "Canny", bg=True)
rgb_crop = rgb[300:900, 300:900]
canny_crop = canny[300:900, 300:900]
util.display_img(rgb_crop, "Original", size=24, bg=True)
util.display_img(util.mask_rgb(rgb_crop, ~canny_crop), "Original with ~Canny Mask", size=24, bg=True)
Original Canny Edges

By applying the inverse of the canny edge mask to the original image, the detected edges are colored black. This visually accentuates the different structures in the slide.

Cropped Original Cropped Original with Inverse Canny Edges Mask

In the console output, we see that Canny edge detection is fairly expensive because its computation took over 1 second.

RGB                  | Time: 0:00:00.174458  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.116023  Type: uint8   Shape: (1385, 1810)
Canny Edges          | Time: 0:00:01.017241  Type: bool    Shape: (1385, 1810)
Mask RGB             | Time: 0:00:00.001443  Type: uint8   Shape: (600, 600, 3)
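A standalone sketch of scikit-image’s canny() on a synthetic step edge (the sigma value is an illustrative assumption, not the article’s setting):

```python
# Standalone sketch of scikit-image Canny edge detection on a synthetic image.
import numpy as np
from skimage.feature import canny

# Synthetic grayscale image: dark left half, bright right half,
# giving one sharp vertical brightness boundary.
gray = np.zeros((64, 64))
gray[:, 32:] = 1.0

edges = canny(gray, sigma=1.0)  # boolean edge map

# The detected edge runs along the vertical boundary near column 32;
# flat regions far from the boundary contain no edges.
print(edges.any(), edges[:, :16].any())
```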

Combining filters

Because our image filters use NumPy arrays, it is straightforward to combine our filters. For example, when we have filters that return boolean images for masking, we can use standard boolean algebra on our arrays to perform operations such as AND, OR, XOR, and NOT. We can also run filters on the results of other filters.
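For instance, NumPy’s element-wise boolean operators behave like this on small hypothetical masks:

```python
# Combining boolean masks with NumPy's element-wise operators
# (tiny hypothetical masks for illustration).
import numpy as np

mask_a = np.array([[True, True], [False, True]])
mask_b = np.array([[True, False], [False, True]])

both = mask_a & mask_b     # AND: pixels passed by both filters
either = mask_a | mask_b   # OR: pixels passed by either filter
diff = mask_a ^ mask_b     # XOR: pixels where the filters disagree
inverse = ~mask_a          # NOT: invert a mask

print(both.sum(), either.sum(), diff.sum(), inverse.sum())  # 2 3 1 1
```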

As an example, the following code runs our green pen and blue pen filters on the original RGB image to filter out the green and blue pen marks on the slide. We combine the resulting masks with a boolean AND (&) operation, and we display the resulting mask and this mask applied to the original image, masking out the green and blue pen marks from the image.

img_path = slide.get_training_image_path(74)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original")
no_green_pen = filter.filter_green_pen(rgb)
util.display_img(no_green_pen, "No Green Pen")
no_blue_pen = filter.filter_blue_pen(rgb)
util.display_img(no_blue_pen, "No Blue Pen")
no_gp_bp = no_green_pen & no_blue_pen
util.display_img(no_gp_bp, "No Green Pen, No Blue Pen")
util.display_img(util.mask_rgb(rgb, no_gp_bp), "Original with No Green Pen, No Blue Pen")
Original Slide
No Green Pen No Blue Pen
No Green Pen, No Blue Pen Original with No Green Pen, No Blue Pen

Console Output:

RGB                  | Time: 0:00:00.525283  Type: uint8   Shape: (2592, 3509, 3)
Filter Green Pen     | Time: 0:00:00.562343  Type: bool    Shape: (2592, 3509)
Filter Blue Pen      | Time: 0:00:00.414910  Type: bool    Shape: (2592, 3509)
Mask RGB             | Time: 0:00:00.054763  Type: uint8   Shape: (2592, 3509, 3)

Let’s try another combination of filters that should give us fairly good tissue segmentation for this slide, removing the slide background and the blue and green pen marks. We can do this by ANDing together the “No Grays” filter, the “Green Channel” filter, the “No Green Pen” filter, and the “No Blue Pen” filter. Additionally, we can use our “Remove Small Objects” filter to remove small islands from the mask. We display the resulting mask, then apply both the mask and its inverse to the original image to see which parts of the slide are passed through and which parts are masked out.

img_path = slide.get_training_image_path(74)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original")
mask = filter.filter_grays(rgb) & filter.filter_green_channel(rgb) & filter.filter_green_pen(rgb) & filter.filter_blue_pen(rgb)
mask = filter.filter_remove_small_objects(mask, min_size=100, output_type="bool")
util.display_img(mask, "No Grays, Green Channel, No Green Pen, No Blue Pen, No Small Objects")
util.display_img(util.mask_rgb(rgb, mask), "Original with No Grays, Green Channel, No Green Pen, No Blue Pen, No Small Objects")
util.display_img(util.mask_rgb(rgb, ~mask), "Original with Inverse Mask")
Original Slide No Grays, Green Channel, No Green Pen, No Blue Pen, No Small Objects

We see that this combination does a good job of selecting the most relevant tissue sections of this slide.

Original with No Grays, Green Channel, No Green Pen, No Blue Pen, No Small Objects Original with Inverse Mask

Console Output:

RGB                  | Time: 0:00:00.496920  Type: uint8   Shape: (2592, 3509, 3)
Filter Grays         | Time: 0:00:00.361576  Type: bool    Shape: (2592, 3509)
Filter Green Channel | Time: 0:00:00.020190  Type: bool    Shape: (2592, 3509)
Filter Green Pen     | Time: 0:00:00.488955  Type: bool    Shape: (2592, 3509)
Filter Blue Pen      | Time: 0:00:00.369501  Type: bool    Shape: (2592, 3509)
Remove Small Objs    | Time: 0:00:00.178179  Type: bool    Shape: (2592, 3509)
Mask RGB             | Time: 0:00:00.047400  Type: uint8   Shape: (2592, 3509, 3)
Mask RGB             | Time: 0:00:00.048710  Type: uint8   Shape: (2592, 3509, 3)

In the wsi/filter.py file, the apply_filters_to_image(slide_num, save=True, display=False) function is the primary way we apply a set of filters to an image with the goal of identifying the tissue in the slide. This function lets us see the results of each filter and the combined results of different filters. If the save parameter is True, the various filter results will be saved to the file system. If the display parameter is True, the filter results will be displayed on the screen. The function returns a tuple consisting of the resulting NumPy array image and a dictionary of information that is used elsewhere for generating an HTML page to view the various filter results for multiple slides, as we will see later.

The apply_filters_to_image() function calls the apply_image_filters() function, which creates green channel, grays, red pen, green pen, and blue pen masks and combines these into a single mask using boolean ANDs. After this, small objects are removed from the mask.

mask_not_green = filter_green_channel(rgb)
mask_not_gray = filter_grays(rgb)
mask_no_red_pen = filter_red_pen(rgb)
mask_no_green_pen = filter_green_pen(rgb)
mask_no_blue_pen = filter_blue_pen(rgb)
mask_gray_green_pens = mask_not_gray & mask_not_green & mask_no_red_pen & mask_no_green_pen & mask_no_blue_pen
mask_remove_small = filter_remove_small_objects(mask_gray_green_pens, min_size=500, output_type="bool")

After each of the above masks is created, it is applied to the original image and the resulting image is saved to the file system, displayed to the screen, or both.

Let’s try out this function. In this example, we run apply_filters_to_image() on slide #337 and display the results to the screen.

filter.apply_filters_to_image(337, display=True, save=False)

Note that this function uses the scaled-down .png image for slide #337. If we have not generated .png images for all of the slides (typically by calling slide.multiprocess_training_slides_to_images()), we can generate the individual scaled-down .png image and then apply the filters to this image.

slide.training_slide_to_image(337)
filter.apply_filters_to_image(337, display=True, save=False)

We see the original slide #337 and the green channel filter applied to it. The original slide is marked as 0.12% masked because a small number of pixels in the original image are black (0 values for the red, green, and blue channels). Notice that the green channel filter with a default threshold of 200 removes most of the white background but only a relatively small fraction of the green pen. The green channel filter masks 72.60% of the original slide.

Slide 337, F001 Slide 337, F002

Here, we see the results of the grays filter and the red pen filter. For this slide, the grays filter masks 68.01% of the slide, which is actually less than the green channel filter. The red pen filter masks only 0.18% of the slide, which makes sense because there are no red pen marks on the slide.

Slide 337, F003 Slide 337, F004

The green pen filter masks 3.81% of the slide. Visually, we see that it does a decent job of masking out the green pen marks on the slide. The blue pen filter masks 0.12% of the slide, which is accurate because there are no blue pen marks on the slide.

Slide 337, F005 Slide 337, F006

Combining the previous filters with a boolean AND results in 74.57% masking. Cleaning up these results by removing small objects results in a masking of 76.11%. This potentially gives a good tissue segmentation that we can use for deep learning.

Slide 337, F007 Slide 337, F008

In the console output, we see that processing slide #337 takes approximately 12.6 seconds in this example. The filtering itself is only a relatively small fraction of this time; if we set display to False, processing takes only approximately 2.3 seconds.

Processing slide #337
RGB                  | Time: 0:00:00.568235  Type: uint8   Shape: (2515, 3149, 3)
Filter Green Channel | Time: 0:00:00.017670  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.037547  Type: uint8   Shape: (2515, 3149, 3)
Filter Grays         | Time: 0:00:00.323861  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.032874  Type: uint8   Shape: (2515, 3149, 3)
Filter Red Pen       | Time: 0:00:00.253547  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.035073  Type: uint8   Shape: (2515, 3149, 3)
Filter Green Pen     | Time: 0:00:00.395172  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.032597  Type: uint8   Shape: (2515, 3149, 3)
Filter Blue Pen      | Time: 0:00:00.314914  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.034853  Type: uint8   Shape: (2515, 3149, 3)
Mask RGB             | Time: 0:00:00.034556  Type: uint8   Shape: (2515, 3149, 3)
Remove Small Objs    | Time: 0:00:00.160241  Type: bool    Shape: (2515, 3149)
Mask RGB             | Time: 0:00:00.030854  Type: uint8   Shape: (2515, 3149, 3)
Slide #337 processing time: 0:00:12.576835

Because the apply_filters_to_image() function returns the resulting image as a NumPy array, we can perform further processing on the image. If we look at the apply_filters_to_image() results for slide #337, we can see that some grayish-green pen marks remain on the slide. We can filter out some of these using our filter_green() function with different threshold values and our filter_grays() function with an increased tolerance value.

We’ll compare the results by cropping two regions of the slide before and after this additional processing and displaying all four of these regions together.

rgb, _ = filter.apply_filters_to_image(337, display=False, save=False)

not_greenish = filter.filter_green(rgb, red_upper_thresh=125, green_lower_thresh=30, blue_lower_thresh=30, display_np_info=True)
not_grayish = filter.filter_grays(rgb, tolerance=30)
rgb_new = util.mask_rgb(rgb, not_greenish & not_grayish)

row1 = np.concatenate((rgb[1200:1800, 150:750], rgb[1150:1750, 2050:2650]), axis=1)
row2 = np.concatenate((rgb_new[1200:1800, 150:750], rgb_new[1150:1750, 2050:2650]), axis=1)
result = np.concatenate((row1, row2), axis=0)
util.display_img(result)

After the additional processing, we see that the pen marks in the displayed regions have been significantly reduced.

Remove More Green and More Gray

As another example, here we can see a summary of filters applied to a slide by apply_filters_to_image() and the resulting masked image.

Filter Example

Applying filters to multiple images

When designing our set of tissue-selecting filters, one very important requirement is the ability to visually inspect the filter results across multiple slides. Ideally, we should easily be able to alternate between displaying the results for a single image, a select subset of our training image data set, and our entire data set. Additionally, multiprocessing can result in a significant performance boost, so we should be able to multiprocess our image processing if desired.

A simple, powerful way to visually inspect our filter results is to generate an HTML page for a set of images.

The following functions in the wsi/filter.py file can be used to apply filters to multiple images:

apply_filters_to_image_list(image_num_list, save, display)
apply_filters_to_image_range(start_ind, end_ind, save, display)
singleprocess_apply_filters_to_images(save=True, display=False, html=True, image_num_list=None)
multiprocess_apply_filters_to_images(save=True, display=False, html=True, image_num_list=None)

The apply_filters_to_image_list() function takes a list of image numbers for processing. It does not generate an HTML page, but it does generate information that other functions can use to generate one. If the save parameter is True, the generated images are saved to the file system. If the display parameter is True, the generated images are displayed to the screen. If several slides are being processed, display should be set to False.

The apply_filters_to_image_range() function is similar to the apply_filters_to_image_list() function except that rather than taking a list of image numbers, it takes a starting index number and ending index number for the slides in the training set. Like the apply_filters_to_image_list() function, the apply_filters_to_image_range() function has save and display parameters.

The singleprocess_apply_filters_to_images() and multiprocess_apply_filters_to_images() functions are the primary functions that should be called to apply filters to multiple images. Both feature save and display parameters. If the additional html parameter is True, an HTML page is generated for displaying the filter results on the image set. Both functions also feature an image_num_list parameter that specifies a list of image numbers to process. If image_num_list is not supplied, all training images are processed.

As an example, let’s use a single process to apply our filters to images 1, 2, and 3. We can accomplish this with the following code:

filter.singleprocess_apply_filters_to_images(image_num_list=[1, 2, 3])

In addition to saving the filtered images to the file system, this creates a filters.html file that displays all of the filtered slide images. If we open the filters.html file in a browser, we can see 8 images displayed for each slide. Each separate slide is displayed as a separate row. In the following images, we see the filter results for slides #1, #2, and #3 displayed in a browser.

Filters for Slides 1, 2, 3

To apply all filters to all images in the training set using multiprocessing, we can use the multiprocess_apply_filters_to_images() function. Because there are 9 generated images per slide (8 of which are shown in the HTML summary) and 500 slides, this results in a total of 4,500 images and 4,500 thumbnails. Generating .png images and .jpg thumbnails, this takes approximately 24 minutes on my MacBook Pro.

filter.multiprocess_apply_filters_to_images()

If we display the filters.html file in a browser, we see that the filter results for the first 50 slides are displayed. By default, results are paginated at 50 slides per page. Pagination can be turned on and off using the FILTER_PAGINATE constant. The pagination size can be adjusted using the FILTER_PAGINATION_SIZE constant.

One useful action we can take is to group similar slides into categories. For example, we can group slides into sets that have red, green, and blue pen marks on them.

red_pen_slides = [4, 15, 24, 48, 63, 67, 115, 117, 122, 130, 135, 165, 166, 185, 209, 237, 245, 249, 279, 281, 282, 289, 336, 349, 357, 380, 450, 482]
green_pen_slides = [51, 74, 84, 86, 125, 180, 200, 337, 359, 360, 375, 382, 431]
blue_pen_slides = [7, 28, 74, 107, 130, 140, 157, 174, 200, 221, 241, 318, 340, 355, 394, 410, 414, 457, 499]

We can run our filters on the list of red pen slides in the following manner:

filter.multiprocess_apply_filters_to_images(image_num_list=red_pen_slides)

In this way, we can tweak specific filters or combinations of filters and see how the changes affect the relevant subset of training images without reprocessing the entire training data set.

Red Pen Slides with Filter Results

Overmask avoidance

When developing filters and filter settings to perform tissue segmentation on the entire training set, we have to handle a great deal of variation in the slide samples. Some slides have a large amount of tissue on them, while others have only a minimal amount. There is also considerable variation in tissue staining, and some slides present additional issues such as pen marks and shadows.

Slide #498 is an example of a slide with a large tissue sample. After filtering, the slide is 46% masked.

Slide with Large Tissue Sample | Slide with Large Tissue Sample after Filtering

Slide #127 is an example of a small tissue sample. After filtering, the slide is 93% masked. With such a small tissue sample to begin with, we need to be careful that our filters don’t overmask this slide.

Slide with Small Tissue Sample | Slide with Small Tissue Sample after Filtering

Aggressive filtering might generate excellent results for many slides but overmask others, masking out tissue along with the background. For example, if 99% of a slide is masked, it has been overmasked.

Avoiding overmasking across the entire training data set can be difficult. For example, suppose a slide starts with only a proportionally small amount of tissue on it, say 10%. If this tissue sample has been poorly stained so that it is a light purplish-gray color, applying our grays or green channel filters might mask out a significant portion of the tissue. This can also leave small islands of non-masked tissue, and because we use a filter to remove small objects, those islands may be masked out as well. In such a situation, masking of 95% to 100% of the slide is possible.

Slide #424 has a small tissue sample and its staining is a very faint lavender color. Slide #424 is at risk for overmasking with our given combination of filters.

Slide with Small Tissue Sample and Faint Staining

Therefore, rather than using fixed settings, we can have our filters automatically adjust their parameter values to avoid overmasking. As examples, the filter_green_channel() and filter_remove_small_objects() functions have this ability. If masking exceeds a specified overmasking threshold, a parameter value is changed to lower the amount of masking, and this repeats until the masking falls below the threshold.

filter.filter_green_channel(np_img, green_thresh=200, avoid_overmask=True, overmask_thresh=90, output_type="bool")
filter.filter_remove_small_objects(np_img, min_size=3000, avoid_overmask=True, overmask_thresh=95, output_type="uint8")

For the filter_green_channel() function, if a green_thresh value of 200 results in masking over 90%, the function retries with a higher green_thresh value (228, halfway between the current value and 255) and checks the masking level again. This continues until the masking no longer exceeds the 90% overmask threshold or green_thresh reaches 255.
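This retry logic can be sketched as follows. This is a simplified stand-in for the project's filter_green_channel(), not its actual implementation; the mask_percent() helper is assumed, and the threshold is moved halfway toward 255 on each retry:

```python
import numpy as np

def mask_percent(mask):
    """Percentage of pixels that are masked out (False)."""
    return 100.0 * (1.0 - np.count_nonzero(mask) / mask.size)

def green_channel_mask(rgb, green_thresh=200, avoid_overmask=True,
                       overmask_thresh=90.0):
    """Keep pixels whose green channel value is below green_thresh.
    If too much of the image is masked, retry with a threshold
    halfway between the current value and 255."""
    mask = rgb[:, :, 1] < green_thresh
    if (avoid_overmask and mask_percent(mask) >= overmask_thresh
            and green_thresh < 255):
        # e.g. 200 -> 228 -> 242 -> ... -> 255
        new_thresh = int(np.ceil((255 - green_thresh) / 2)) + green_thresh
        mask = green_channel_mask(rgb, new_thresh, avoid_overmask,
                                  overmask_thresh)
    return mask
```

Each retry loosens the filter, so less of the image is masked out; the loop terminates once the masking falls below the threshold or the threshold hits 255.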

For the filter_remove_small_objects() function, if a min_size value of 3000 results in a masking level over 95%, the function retries with the min_size value halved (1500) and checks the masking level again. These min_size halvings continue until the masking level is no longer over 95% or the minimum size reaches 0. For the image filtering specified in the apply_image_filters() function, a starting min_size value of 500 is used for filter_remove_small_objects().
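The halving loop can be sketched similarly. The remove_small_objects() helper below is a minimal pure-NumPy stand-in for scikit-image's skimage.morphology.remove_small_objects(), included only to keep the example self-contained; it is not the project's implementation:

```python
import numpy as np

def mask_percent(mask):
    """Percentage of pixels that are masked out (False)."""
    return 100.0 * (1.0 - np.count_nonzero(mask) / mask.size)

def remove_small_objects(mask, min_size):
    """Remove connected regions (4-connectivity) smaller than min_size.
    Minimal stand-in for skimage.morphology.remove_small_objects."""
    out = mask.copy()
    seen = np.zeros(mask.shape, dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and not seen[i, j]:
                stack, region = [(i, j)], []
                seen[i, j] = True
                while stack:  # flood-fill one connected region
                    y, x = stack.pop()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(region) < min_size:
                    for y, x in region:
                        out[y, x] = False
    return out

def filter_remove_small_objects(mask, min_size=3000, avoid_overmask=True,
                                overmask_thresh=95.0):
    """If removal masks too much of the image, halve min_size and retry
    (e.g. 500 -> 250 -> 125 -> 62 ...), stopping at min_size 0."""
    result = remove_small_objects(mask, min_size)
    if (avoid_overmask and mask_percent(result) >= overmask_thresh
            and min_size >= 1):
        return filter_remove_small_objects(mask, min_size // 2,
                                           avoid_overmask, overmask_thresh)
    return result
```

Because each halving keeps smaller tissue islands, the masking level decreases monotonically toward that of the input mask, so the loop is guaranteed to terminate.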

Examining our full set of images using the multiprocess_apply_filters_to_images() function, we can identify slides that are at risk for overmasking. We can create a list of these slide numbers and use multiprocess_apply_filters_to_images() with this list of slide numbers to generate the filters.html page that allows us to visually inspect the filters applied to this set of slides.

overmasked_slides = [1, 21, 29, 37, 43, 88, 116, 126, 127, 142, 145, 173, 196, 220, 225, 234, 238, 284, 292, 294, 304,
                     316, 401, 403, 424, 448, 452, 472, 494]
filter.multiprocess_apply_filters_to_images(image_num_list=overmasked_slides)

Let’s have a look at how we reduce overmasking on slide #21, which has very faint staining.

Slide 21

We’ll run our filters on slide #21.

filter.singleprocess_apply_filters_to_images(image_num_list=[21])

If we set the filter_green_channel() and filter_remove_small_objects() avoid_overmask parameters to False, 97.69% of the original image is masked by the “green channel” filter and 99.92% of the original image is masked by the subsequent “remove small objects” filter. This is significant overmasking.

Overmasked by Green Channel Filter (97.69%) | Overmasked by Remove Small Objects Filter (99.92%)

If we set avoid_overmask to True for the filter_remove_small_objects() function, we see that the “remove small objects” filter does not perform any further masking because the 97.69% masking from the previous “green channel” filter already exceeds its overmasking threshold of 95%.

Overmasked by Green Channel Filter (97.69%) | Avoid Overmask by Remove Small Objects Filter (97.69%)

If we set avoid_overmask back to False for the filter_remove_small_objects() function and we set avoid_overmask to True for the filter_green_channel() function, we see that 87.91% of the original image is masked by the “green channel” filter (under the 90% overmasking threshold for the filter) and 97.40% of the image is masked by the subsequent “remove small objects” filter.

Avoid Overmask by Green Channel Filter (87.91%) | Overmask by Remove Small Objects Filter (97.40%)

If we set avoid_overmask to True for both the filter_green_channel() and filter_remove_small_objects() functions, we see that the resulting masking after the “remove small objects” filter has been reduced to 94.88%, which is under its overmasking threshold of 95%.

Avoid Overmask by Green Channel Filter (87.91%) | Avoid Overmask by Remove Small Objects Filter (94.88%)

Thus, in this example, we’ve reduced the masking from 99.92% to 94.88%.

We can see the filter adjustments being made in the console output.

Processing slide #21
RGB                  | Time: 0:00:00.095414  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.617039  Name: ../data/filter_png/TUPAC-TR-021-001-rgb.png
Save Thumbnail       | Time: 0:00:00.019557  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-001-rgb.jpg
Mask percentage 97.69% >= overmask threshold 90.00% for Remove Green Channel green_thresh=200, so try 228
Filter Green Channel | Time: 0:00:00.005335  Type: bool    Shape: (1496, 1576)
Filter Green Channel | Time: 0:00:00.010499  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.009980  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.322629  Name: ../data/filter_png/TUPAC-TR-021-002-rgb-not-green.png
Save Thumbnail       | Time: 0:00:00.018244  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-002-rgb-not-green.jpg
Filter Grays         | Time: 0:00:00.072200  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.010461  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.295995  Name: ../data/filter_png/TUPAC-TR-021-003-rgb-not-gray.png
Save Thumbnail       | Time: 0:00:00.017668  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-003-rgb-not-gray.jpg
Filter Red Pen       | Time: 0:00:00.055296  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.008704  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.595753  Name: ../data/filter_png/TUPAC-TR-021-004-rgb-no-red-pen.png
Save Thumbnail       | Time: 0:00:00.016758  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-004-rgb-no-red-pen.jpg
Filter Green Pen     | Time: 0:00:00.088633  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.008860  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.585474  Name: ../data/filter_png/TUPAC-TR-021-005-rgb-no-green-pen.png
Save Thumbnail       | Time: 0:00:00.016964  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-005-rgb-no-green-pen.jpg
Filter Blue Pen      | Time: 0:00:00.069669  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.009665  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.589634  Name: ../data/filter_png/TUPAC-TR-021-006-rgb-no-blue-pen.png
Save Thumbnail       | Time: 0:00:00.016736  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-006-rgb-no-blue-pen.jpg
Mask RGB             | Time: 0:00:00.009115  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.294103  Name: ../data/filter_png/TUPAC-TR-021-007-rgb-no-gray-no-green-no-pens.png
Save Thumbnail       | Time: 0:00:00.017540  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-007-rgb-no-gray-no-green-no-pens.jpg
Mask percentage 97.40% >= overmask threshold 95.00% for Remove Small Objs size 500, so try 250
Mask percentage 96.83% >= overmask threshold 95.00% for Remove Small Objs size 250, so try 125
Mask percentage 95.87% >= overmask threshold 95.00% for Remove Small Objs size 125, so try 62
Remove Small Objs    | Time: 0:00:00.031198  Type: bool    Shape: (1496, 1576)
Remove Small Objs    | Time: 0:00:00.062300  Type: bool    Shape: (1496, 1576)
Remove Small Objs    | Time: 0:00:00.095616  Type: bool    Shape: (1496, 1576)
Remove Small Objs    | Time: 0:00:00.128008  Type: bool    Shape: (1496, 1576)
Mask RGB             | Time: 0:00:00.007214  Type: uint8   Shape: (1496, 1576, 3)
Save Image           | Time: 0:00:00.235025  Name: ../data/filter_png/TUPAC-TR-021-008-rgb-not-green-not-gray-no-pens-remove-small.png
Save Thumbnail       | Time: 0:00:00.016905  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-008-rgb-not-green-not-gray-no-pens-remove-small.jpg
Save Image           | Time: 0:00:00.232206  Name: ../data/filter_png/TUPAC-TR-021-32x-50432x47872-1576x1496-filtered.png
Save Thumbnail       | Time: 0:00:00.017285  Name: ../data/filter_thumbnail_jpg/TUPAC-TR-021-32x-50432x47872-1576x1496-filtered.jpg
Slide #021 processing time: 0:00:04.596086

Summary

This third article in our series on the automatic identification of tissues from big whole-slide images explained morphology operators and how we combined filters and applied them to multiple images. In Part 4, we’ll end the series with a discussion of tiling and top tile retrieval.