In the first article in this series, we gave an overview of this project and explained how we scaled down the images. Next, we will investigate image filters and determine a set of filters that can be used for effective tissue segmentation with our data set. We will mask out non-tissue by setting the red, green, and blue channels of non-tissue pixels to 0. For our particular data set, our mask will AND together a green channel mask, a grays mask, a red pen mask, a green pen mask, and a blue pen mask. Following this, we will mask out small objects from the images.

The filtering approach that we develop here has several benefits. All relevant filters are centralized in a single file, wsi/filter.py, for convenience. Filters return results in a standard format and the returned datatype can easily be changed (boolean, uint8, float). Critical filter debug information (for example, shape, type, and processing time) is output to the console. Filter results can be easily viewed across the entire data set or subsets of the data set. Multi-processing is used for increased performance. Additionally, filters can easily be combined, strung together, or otherwise modified.

Filtering our scaled-down 500-image .png training set and generating 4,500 .png filter preview images and 4,500 .jpg thumbnails takes about 23 minutes and 30 seconds on my MacBook Pro. Filtering the 500-image training set without saving files takes approximately 6 minutes.

Filters

Let’s take a look at several ways that our images can be filtered. Filters are represented by functions in the wsi/filter.py file and have filter_ prepended to the function names.

RGB to grayscale

A very common task in image processing is to convert an RGB image to a grayscale image. In this process, the three color channels are replaced by a single grayscale channel. The grayscale pixel value is computed by combining the red, green, and blue values in set percentages. The filter_rgb_to_grayscale() function multiplies the red value by 21.25%, the green value by 71.54%, and the blue value by 7.21%, and these values are added together to obtain the grayscale pixel value.

Although the PIL Image convert("L") function can also be used to convert an RGB image to a grayscale image, we instead use the filter_rgb_to_grayscale() function because having a reference to the RGB image as a NumPy array can often be very useful during image processing.
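As a sketch of what a weighted-sum conversion like this computes (an illustrative stand-in, not the project's filter_rgb_to_grayscale() implementation), the three channels can be combined directly in NumPy:

```python
import numpy as np

def rgb_to_grayscale(rgb):
    # Weighted sum of the channels: 21.25% red, 71.54% green, 7.21% blue.
    weights = np.array([0.2125, 0.7154, 0.0721])
    return (rgb @ weights).astype(np.uint8)

# A 2x2 pure-red patch: 200 * 0.2125 = 42.5, truncated to 42 by the uint8 cast.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 200
gray = rgb_to_grayscale(rgb)
```

The matrix multiplication collapses the trailing channel axis, so a (height, width, 3) array becomes a (height, width) array, matching the console output shapes shown below.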

The following code opens a slide as a PIL image, converts this to an RGB NumPy array, and then converts this to a grayscale NumPy array.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
util.display_img(grayscale, "Grayscale")

Now you can see the displayed grayscale image.

Grayscale filter
Grayscale Filter

In the console, we see that the grayscale image is a two-dimensional NumPy array because the 3 color channels have been combined into a single grayscale channel. The data type is uint8 and pixels are represented by integer values between 0 and 255.

RGB                  | Time: 0:00:00.159974  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.101953  Type: uint8   Shape: (1385, 1810)

Complement

In our whole-slide image training set, the slide backgrounds are illuminated by white light, which means that a uint8 pixel in the background of a grayscale image is usually close to or equal to 255. However, conceptually and mathematically it is often useful to have background values close to or equal to 0. For example, this is useful in thresholding, where we might ask if a pixel value is above a particular threshold value. This can also be useful in masking out a background of 0 values from an image.

The filter_complement() function inverts the values, and therefore, the colors in the NumPy array representation of an image. In the following code, we use the filter_complement() function to invert the previously obtained grayscale image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
complement = filter.filter_complement(grayscale)
util.display_img(complement, "Complement")
Complement filter
Complement Filter

In the console output, we see that computing the complement is a very fast operation.

RGB                  | Time: 0:00:00.177398  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.105015  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.001439  Type: uint8   Shape: (1385, 1810)
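The complement is fast because, for uint8 data, it amounts to subtracting each value from 255. A minimal NumPy sketch (not the project's filter_complement() code) of the inversion:

```python
import numpy as np

def complement(img):
    # uint8 values invert as 255 minus the value: 0 <-> 255, 100 <-> 155.
    return 255 - img

gray = np.array([[0, 100, 255]], dtype=np.uint8)
inv = complement(gray)
```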

Thresholding

Basic threshold

With basic thresholding, a binary image is generated, where each value in the resulting NumPy array indicates whether the corresponding pixel in the original image is above a particular threshold value. So, a pixel with a value of 160 with a threshold of 150 would generate a True (or 255, or 1.0), and a pixel with a value of 140 with a threshold of 150 would generate a False (or 0, or 0.0).
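The comparison described above can be expressed as a one-line NumPy operation; this is a sketch of the idea, not the project's filter_threshold() implementation:

```python
import numpy as np

def basic_threshold(img, threshold):
    # True where the pixel value exceeds the threshold, False otherwise.
    return img > threshold

# Values 140 and 150 fall at or below the threshold of 150; 160 exceeds it.
vals = np.array([140, 150, 160], dtype=np.uint8)
mask = basic_threshold(vals, 150)
```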

In the following code, we apply a basic threshold with a threshold value of 100 to the grayscale complement of the original image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
complement = filter.filter_complement(grayscale)
thresh = filter.filter_threshold(complement, threshold=100)
util.display_img(thresh, "Threshold")

The result is a binary image where pixel values that were above 100 are shown in white and pixel values that were 100 or lower are shown in black.

Basic threshold filter
Basic Threshold Filter

In the console output, we see that basic thresholding is a very fast operation.

RGB                  | Time: 0:00:00.164464  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.102431  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.001397  Type: uint8   Shape: (1385, 1810)
Threshold            | Time: 0:00:00.001456  Type: bool    Shape: (1385, 1810)

Hysteresis threshold

Hysteresis thresholding uses two threshold levels. Pixels above the top-level threshold are kept, as in basic thresholding. Pixels above the bottom-level threshold are kept only if they are connected to a region that exceeds the top-level threshold. This process typically produces much better results than basic thresholding. Reasonable values for the top and bottom thresholds can be determined through experimentation.

The filter_hysteresis_threshold() function uses default bottom and top threshold values of 50 and 100. The default array output type from this function is uint8. Because the output of this function is a binary image, the values in the output array will be either 255 or 0. The output type of this function can be specified using the output_type parameter. Note that when performing masking, it is typically more useful to have a NumPy array of boolean values.
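To make the two-level connectivity rule concrete, here is a small pure-NumPy/BFS sketch of hysteresis thresholding. This is an illustrative stand-in for the article's filter_hysteresis_threshold(); in practice, scikit-image's apply_hysteresis_threshold does this work:

```python
import numpy as np
from collections import deque

def hysteresis_threshold(img, low=50, high=100):
    # Pixels above `high` seed the mask; pixels above `low` are kept
    # only if 4-connected (directly or transitively) to a seed.
    seeds = img > high
    candidates = img > low
    mask = seeds.copy()
    q = deque(zip(*np.nonzero(seeds)))
    h, w = img.shape
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and candidates[nr, nc] and not mask[nr, nc]:
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask

# The 60 at (0, 1) touches the 120 seed and survives; the isolated
# 60s at (1, 2) and (2, 0) exceed `low` but are not connected to a seed.
img = np.array([[120,  60,  10],
                [ 10,  10,  60],
                [ 60,  10,  10]], dtype=np.uint8)
mask = hysteresis_threshold(img, low=50, high=100)
```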

In the following code, we perform a hysteresis threshold on the complement of the grayscale image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
complement = filter.filter_complement(grayscale)
hyst = filter.filter_hysteresis_threshold(complement)
util.display_img(hyst, "Hysteresis Threshold")

In the displayed image, the result is a binary image. All pixel values are either white (255) or black (0). The red display text in the corner can be ignored because it is for informational purposes only and is not present when we save the images to the file system.

Notice that the shadow area along the top edge of the slide makes it through the hysteresis threshold filter even though conceptually it is background and should not be treated as tissue.

Hysteresis threshold filter
Hysteresis Threshold Filter

In the following output, we see the console output from our filter operations.

RGB                  | Time: 0:00:00.167947  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.109109  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.001453  Type: uint8   Shape: (1385, 1810)
Hysteresis Threshold | Time: 0:00:00.079292  Type: uint8   Shape: (1385, 1810)

Otsu threshold

Thresholding using Otsu’s method is another popular thresholding technique. This technique was used in the image processing described in A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology. This technique is described in more detail at https://en.wikipedia.org/wiki/Otsu%27s_method.
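Otsu's method chooses the threshold that best separates the histogram into two classes. A minimal NumPy sketch of the algorithm (scikit-image's threshold_otsu is the practical choice; this is only to show the mechanics):

```python
import numpy as np

def otsu_threshold(img):
    # Pick the threshold t that maximizes the between-class variance
    # of the two classes (values <= t and values > t).
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    probs = hist / hist.sum()
    cum_p = np.cumsum(probs)
    cum_mean = np.cumsum(probs * np.arange(256))
    global_mean = cum_mean[-1]
    best_t, best_var = 0, 0.0
    for t in range(255):
        w0, w1 = cum_p[t], 1.0 - cum_p[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t] / w0
        mu1 = (global_mean - cum_mean[t]) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Two well-separated intensity clusters: the threshold lands between them.
img = np.array([10] * 50 + [200] * 50, dtype=np.uint8).reshape(10, 10)
t = otsu_threshold(img)
```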

Let’s try Otsu’s method on the complement image as we did when demonstrating hysteresis thresholding.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
complement = filter.filter_complement(grayscale)
otsu = filter.filter_otsu_threshold(complement)
util.display_img(otsu, "Otsu Threshold")

In the resulting image, we see that Otsu’s method generates results roughly similar to hysteresis thresholding. However, Otsu’s method is less aggressive in terms of what it lets through for the tissue in the upper-left area of the slide. The background shadow area at the top of the slide is passed through the filter in a similar fashion as hysteresis thresholding. Most of the slides in the training set do not have such a pronounced shadow area, but it would be nice to have an image processing solution that treats the shadow area as background.

Otsu threshold filter
Otsu Threshold Filter

In terms of performance, thresholding using Otsu’s method is very fast, as we see in the console output.

RGB                  | Time: 0:00:00.166855  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.111960  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.001746  Type: uint8   Shape: (1385, 1810)
Otsu Threshold       | Time: 0:00:00.014615  Type: uint8   Shape: (1385, 1810)

Contrast

For an image, suppose we have a histogram with the number of pixels on the y-axis plotted against the range of possible pixel intensity values on the x-axis (0 to 255). Contrast is a measure of the difference in intensities. An image with low contrast is typically dull and details are not clearly seen visually. An image with high contrast is typically sharp and details can clearly be discerned. Increasing the contrast in an image can be used to bring out various details in the image.

Contrast stretching

One form of increasing the contrast in an image is contrast stretching. Suppose that all intensities in an image occur between 100 and 150 on a scale from 0 to 255. If we rescale the intensities so that 100 now corresponds to 0 and 150 corresponds to 255 and we linearly rescale the intensities between these points, we have increased the contrast in the image and differences in detail can more clearly be seen. This is contrast stretching.
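The linear rescaling described above can be sketched in a few lines of NumPy (an illustrative stand-in, not the project's filter_contrast_stretch() code):

```python
import numpy as np

def contrast_stretch(img, low, high):
    # Linearly map `low` -> 0 and `high` -> 255, clipping values outside
    # the [low, high] range.
    stretched = (img.astype(np.float64) - low) * 255.0 / (high - low)
    return np.clip(stretched, 0, 255).astype(np.uint8)

# Intensities 100, 125, 150 stretched over [100, 150] become 0, 127, 255.
vals = np.array([100, 125, 150], dtype=np.uint8)
out = contrast_stretch(vals, low=100, high=150)
```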

As an example, in the following code, we perform contrast stretching with a low pixel value of 100 and a high pixel value of 200 on the complement of the grayscale image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
complement = filter.filter_complement(grayscale)
contrast_stretch = filter.filter_contrast_stretch(complement, low=100, high=200)
util.display_img(contrast_stretch, "Contrast Stretch")

This can be used to visually inspect details in the previous intensity range of 100 to 200 because the image filter has spread out this range across the full spectrum.

Contrast stretching filter
Contrast Stretching Filter

The following output is from this set of filters.

RGB                  | Time: 0:00:00.171582  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.110818  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.002410  Type: uint8   Shape: (1385, 1810)
Contrast Stretch     | Time: 0:00:00.058357  Type: uint8   Shape: (1385, 1810)

Histogram equalization

Histogram equalization is another technique that can be used to increase contrast in an image. However, unlike contrast stretching, which has a linear distribution of the resulting intensities, the histogram equalization transformation is based on probabilities and is non-linear. For more information about histogram equalization, see https://en.wikipedia.org/wiki/Histogram_equalization.
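The non-linear mapping comes from the cumulative distribution of intensities. A compact sketch of CDF-based histogram equalization (not the project's filter_histogram_equalization() implementation, which wraps scikit-image):

```python
import numpy as np

def histogram_equalize(img):
    # Map each intensity to its cumulative probability, scaled to 0-255,
    # so frequent intensities get spread out across the range.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]
    return (cdf[img] * 255).astype(np.uint8)

# Four pixels: two at 50, one at 100, one at 200.
vals = np.array([[50, 50, 100, 200]], dtype=np.uint8)
out = histogram_equalize(vals)
```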

For example, the following code displays the grayscale image. We increase contrast in the grayscale image using histogram equalization and display the resulting image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
util.display_img(grayscale, "Grayscale")
hist_equ = filter.filter_histogram_equalization(grayscale)
util.display_img(hist_equ, "Histogram Equalization")

Comparing the grayscale image and the image after histogram equalization, we see that contrast in the image has been increased.

Grayscale filter Histogram equalization filter
Grayscale Filter Histogram Equalization Filter

The console output following histogram equalization is shown below.

RGB                  | Time: 0:00:00.175498  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.110181  Type: uint8   Shape: (1385, 1810)
Hist Equalization    | Time: 0:00:00.116568  Type: uint8   Shape: (1385, 1810)

Adaptive equalization

Rather than applying a single transformation to all pixels in an image, adaptive histogram equalization applies transformations to local regions in an image. As a result, adaptive equalization allows contrast to be enhanced to different extents in different regions based on the regions’ intensity histograms. For more information about adaptive equalization, see https://en.wikipedia.org/wiki/Adaptive_histogram_equalization.

The filter_adaptive_equalization() function uses the scikit-image contrast limited adaptive histogram equalization (CLAHE) implementation. In the following code, we apply adaptive equalization to the grayscale image and display both the grayscale image and the image after adaptive equalization for comparison.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
grayscale = filter.filter_rgb_to_grayscale(rgb)
util.display_img(grayscale, "Grayscale")
adaptive_equ = filter.filter_adaptive_equalization(grayscale)
util.display_img(adaptive_equ, "Adaptive Equalization")
Grayscale filter Adaptive equalization filter
Grayscale Filter Adaptive Equalization Filter

In the console output, we can see that adaptive equalization is more compute-intensive than contrast stretching and histogram equalization.

RGB                  | Time: 0:00:00.167076  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.106797  Type: uint8   Shape: (1385, 1810)
Adapt Equalization   | Time: 0:00:00.223172  Type: uint8   Shape: (1385, 1810)

Color

The WSI tissue samples in the training data set have been H&E stained. Eosin stains basic structures such as most cytoplasm proteins with a pink tone. Hematoxylin stains acidic structures such as DNA and RNA with a purple tone. This means that cells tend to be stained pink, and particular areas of the cells such as the nuclei tend to be stained purple. However, note that appearance can vary greatly based on the types of cells that are stained and the amounts of stain applied.

As an example of staining differences, the following image shows a slide that has pink and purple staining next to another slide where all of the tissue appears purple.

Pink and purple slide Purple slide
Pink and Purple Slide Purple Slide

Another factor regarding color is that many slides have been marked with red, green, and blue pens. Whereas in general we would like our filters to include pink and purple colors because these typically indicate stained tissue, we would like our filters to exclude red, green, and blue colors because these typically indicate pen marks on the slides that are not tissue.

The following image shows an example of a slide that has been marked with red pen and some green pen.

Slide marked with red and green pen
Slide Marked with Red and Green Pen

Developing color filters that can be used to filter tissue areas can be fairly challenging for a variety of reasons, including:

  1. Filters need to be general enough to work across all slides in the dataset.
  2. Filters should handle issues such as variations in shadows and lighting.
  3. The amount of H&E (purple and pink) staining can vary greatly from slide to slide.
  4. Pen mark colors (red, green, and blue) vary due to issues such as lighting and pen marks over tissue.
  5. There can be color overlap between stained tissue and pen marks, so we need to balance how aggressively stain colors are inclusively filtered and how pen colors are exclusively filtered.

RGB to HED

The scikit-image skimage.color package features an rgb2hed() function that performs color deconvolution on the original RGB image to create HED (Hematoxylin, Eosin, Diaminobenzidine) channels. The filter_rgb_to_hed() function encapsulates rgb2hed(). The filter_hed_to_hematoxylin() and filter_hed_to_eosin() functions read the hematoxylin and eosin channels and rescale the resulting 2-dimensional NumPy arrays (for example, 0 to 255 for uint8) to increase contrast.

The following code converts the RGB image to an HED image. It then obtains the hematoxylin and eosin channels and displays the resulting images.

img_path = slide.get_training_image_path(4)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
hed = filter.filter_rgb_to_hed(rgb)
hema = filter.filter_hed_to_hematoxylin(hed)
util.display_img(hema, "Hematoxylin Channel")
eosin = filter.filter_hed_to_eosin(hed)
util.display_img(eosin, "Eosin Channel")

Notice that the hematoxylin channel does fairly well at detecting the purple areas of the original slide, which could potentially be used to narrow in on tissue with cell nuclei and thus on regions that can be inspected for mitoses. Both the hematoxylin and eosin channel filters include the background in the resulting image, which is rather unfortunate in terms of differentiating tissue from non-tissue. Also, notice in the eosin channel that the red pen is considered to be part of the eosin stain spectrum.

Hematoxylin channel Eosin channel
Hematoxylin Channel Eosin Channel

Console output:

RGB                  | Time: 0:00:00.397570  Type: uint8   Shape: (2594, 2945, 3)
RGB to HED           | Time: 0:00:01.322220  Type: uint8   Shape: (2594, 2945, 3)
HED to Hematoxylin   | Time: 0:00:00.136750  Type: uint8   Shape: (2594, 2945)
HED to Eosin         | Time: 0:00:00.086537  Type: uint8   Shape: (2594, 2945)

Green channel filter

If we look at an RGB color wheel, we see that purple and pink are next to each other. On the other side of the color wheel, we have yellow and green. Because green is one of the 3 RGB color channels in our NumPy arrays, filtering out pixels that have a high green channel value is one way to potentially filter out parts of the slide that are not pink or purple. This includes the white background, because white also has a high green channel value along with high red and blue channel values.
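A boolean green channel mask can be built with a single comparison; this sketch (not the project's filter_green_channel() code) uses a threshold of 200, mirroring the default described below:

```python
import numpy as np

def filter_green_channel(rgb, green_thresh=200):
    # Keep (True) pixels whose green channel is below the threshold;
    # reject (False) pixels at or above it, including white background.
    return rgb[..., 1] < green_thresh

rgb = np.array([[[230, 100, 120],    # low green value: kept
                 [240, 240, 235]]],  # white-ish background: rejected
               dtype=np.uint8)
mask = filter_green_channel(rgb)
```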

We’ll use the default green threshold value of 200 for the filter_green_channel() function, meaning that any pixels with green channel values of 200 or greater will be rejected.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_green = filter.filter_green_channel(rgb)
util.display_img(not_green, "Green Channel Filter")

The green channel filter does a decent job of differentiating the tissue from the white background. However, notice that the shadow area at the top of the slide is not excluded by the filter.

Original slide Green channel filter
Original Slide Green Channel Filter

A filter such as the green channel filter most likely would be used in conjunction with other filters for masking purposes. As a result, the default output type for the green channel filter is bool, as we see in the console output. If another output type is wanted, this can be set with the function’s output_type parameter.

RGB                  | Time: 0:00:00.169249  Type: uint8   Shape: (1385, 1810, 3)
Filter Green Channel | Time: 0:00:00.005618  Type: bool    Shape: (1385, 1810)

Grays filter

Next, let’s utilize a filter that can filter out the annoying shadow area at the top of slide #2. Notice that the shadow area consists of a gradient of dark-to-light grays. A gray pixel has red, green, and blue channel values that are close together. The filter_grays() function filters out pixels that have red, green, and blue values that are within a certain tolerance of each other. The default tolerance for filter_grays() is 15. The grays filter also filters out white and black pixels because they have similar red, green, and blue values.
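The pairwise tolerance check can be sketched directly in NumPy (an illustrative stand-in for filter_grays(), using the stated default tolerance of 15):

```python
import numpy as np

def filter_grays(rgb, tolerance=15):
    # Cast to a signed type first to avoid uint8 wraparound on subtraction.
    rgb = rgb.astype(np.int16)
    rg = np.abs(rgb[..., 0] - rgb[..., 1]) < tolerance
    rb = np.abs(rgb[..., 0] - rgb[..., 2]) < tolerance
    gb = np.abs(rgb[..., 1] - rgb[..., 2]) < tolerance
    # True = keep: pixel is NOT gray (some channel pair differs enough).
    return ~(rg & rb & gb)

rgb = np.array([[[128, 130, 127],    # gray: filtered out
                 [180,  60, 150]]],  # purple-ish tissue: kept
               dtype=np.uint8)
mask = filter_grays(rgb)
```

Casting to a signed integer type before subtracting matters: with raw uint8 arithmetic, 60 - 180 would wrap around rather than give a negative difference.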

The following code runs the grays filter on the original RGB image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_grays = filter.filter_grays(rgb)
util.display_img(not_grays, "Grays Filter")

Notice that in addition to filtering out the white background, the grays filter has indeed filtered out the shadow area at the top of the slide.

Original slide Grays filter
Original Slide Grays Filter

Like the green channel filter, the default type of the returned array is bool because the grays filter will typically be used in combination with other filters. Because the grays filter is fast, it offers a potentially low-cost way to filter out shadows from the slides during preprocessing.

RGB                  | Time: 0:00:00.169642  Type: uint8   Shape: (1385, 1810, 3)
Filter Grays         | Time: 0:00:00.082075  Type: bool    Shape: (1385, 1810)

Red filter

Next, let’s turn our attention to filtering out shades of red, which can be used to filter out a significant amount of the red pen color. The red pen consists of a wide variety of closely related red shades. Certain shades are reddish, others are maroonish, and others are pinkish, for example. These color gradations are a result of a variety of factors, such as the amount of ink, lighting, shadowing, and tissue under the pen marks.

The filter_red() function filters out reddish colors through a red channel lower threshold value, a green channel upper threshold value, and a blue channel upper threshold value. The generated mask is based on a pixel being above the red channel threshold value and below the green and blue channel threshold values. One way to determine these values is to display the slide image in a web browser and use a tool such as the Chrome ColorPick Eyedropper to click on a red pen pixel to determine the approximate red, green, and blue values.
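The three-channel test can be sketched as follows (an illustrative stand-in for filter_red(), using the same 150/80/90 thresholds applied to slide #4 below):

```python
import numpy as np

def filter_red(rgb, red_lower_thresh, green_upper_thresh, blue_upper_thresh):
    # A pixel is "red" when red exceeds its lower threshold while green
    # and blue stay below their upper thresholds; the returned mask is
    # True for pixels to KEEP (i.e., NOT red).
    r = rgb[..., 0] > red_lower_thresh
    g = rgb[..., 1] < green_upper_thresh
    b = rgb[..., 2] < blue_upper_thresh
    return ~(r & g & b)

rgb = np.array([[[200,  40,  50],    # red pen: masked out
                 [200, 150, 160]]],  # pinkish stain: kept
               dtype=np.uint8)
mask = filter_red(rgb, red_lower_thresh=150, green_upper_thresh=80, blue_upper_thresh=90)
```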

In this example with slide #4, we use a red threshold value of 150, a green threshold value of 80, and a blue threshold value of 90. In addition, to help us visualize the filter results, we apply the red filter to the original RGB image as a mask and also apply the inverse of the red filter to the original image as a mask.

img_path = slide.get_training_image_path(4)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_red = filter.filter_red(rgb, red_lower_thresh=150, green_upper_thresh=80, blue_upper_thresh=90, display_np_info=True)
util.display_img(not_red, "Red Filter (150, 80, 90)")
util.display_img(util.mask_rgb(rgb, not_red), "Not Red")
util.display_img(util.mask_rgb(rgb, ~not_red), "Red")

We see that the red filter filters out much of the red pen.

Original slide Red filter
Original Slide Red Filter

Applying the red filter and the inverse of the red filter as masks to the original image, we see that our threshold values did quite well at filtering out a large amount of the red pen.

Not red Red
Not Red Red

The following shows the console output from the previous image filtering:

RGB                  | Time: 0:00:00.404069  Type: uint8   Shape: (2594, 2945, 3)
Filter Red           | Time: 0:00:00.034864  Type: bool    Shape: (2594, 2945)
Mask RGB             | Time: 0:00:00.053997  Type: uint8   Shape: (2594, 2945, 3)
Mask RGB             | Time: 0:00:00.022750  Type: uint8   Shape: (2594, 2945, 3)

Red pen filter

Next, let’s turn our attention to a more inclusive red pen filter that handles more shades of red. Because the filter_red() function returns a boolean array result, we can combine multiple sets of filter_red() threshold values (red_lower_thresh, green_upper_thresh, blue_upper_thresh) using boolean operators such as &. We can determine these values using a color picker tool such as the Chrome ColorPick Eyedropper, as mentioned previously. In addition to determining various shades of red pen on a single slide, shades of red pen from other slides should be identified and included. Note that we need to be careful with pinkish shades of red due to the similarity of these shades to eosin staining.

Using the color picker technique, the filter_red_pen() function uses the following sets of red threshold values.

result = filter_red(rgb, red_lower_thresh=150, green_upper_thresh=80, blue_upper_thresh=90) & \
         filter_red(rgb, red_lower_thresh=110, green_upper_thresh=20, blue_upper_thresh=30) & \
         filter_red(rgb, red_lower_thresh=185, green_upper_thresh=65, blue_upper_thresh=105) & \
         filter_red(rgb, red_lower_thresh=195, green_upper_thresh=85, blue_upper_thresh=125) & \
         filter_red(rgb, red_lower_thresh=220, green_upper_thresh=115, blue_upper_thresh=145) & \
         filter_red(rgb, red_lower_thresh=125, green_upper_thresh=40, blue_upper_thresh=70) & \
         filter_red(rgb, red_lower_thresh=200, green_upper_thresh=120, blue_upper_thresh=150) & \
         filter_red(rgb, red_lower_thresh=100, green_upper_thresh=50, blue_upper_thresh=65) & \
         filter_red(rgb, red_lower_thresh=85, green_upper_thresh=25, blue_upper_thresh=45)

Let’s apply the red pen filter to slide #4.

img_path = slide.get_training_image_path(4)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_red_pen = filter.filter_red_pen(rgb)
util.display_img(not_red_pen, "Red Pen Filter")
util.display_img(util.mask_rgb(rgb, not_red_pen), "Not Red Pen")
util.display_img(util.mask_rgb(rgb, ~not_red_pen), "Red Pen")
Original slide Red pen filter
Original Slide Red Pen Filter

Compared with using a single set of red threshold values, we can see that the red pen filter is significantly more inclusive in terms of the shades of red that are accepted. As a result, more red pen is filtered. However, notice that some of the pinkish-red from eosin-stained tissue is also included as a result of this more aggressive filtering.

Not red pen Red pen
Not Red Pen Red Pen

Even though the red pen filter ANDs nine sets of red filter results together, we see that the performance is excellent.

RGB                  | Time: 0:00:00.392082  Type: uint8   Shape: (2594, 2945, 3)
Filter Red Pen       | Time: 0:00:00.251170  Type: bool    Shape: (2594, 2945)
Mask RGB             | Time: 0:00:00.037256  Type: uint8   Shape: (2594, 2945, 3)
Mask RGB             | Time: 0:00:00.026589  Type: uint8   Shape: (2594, 2945, 3)

Blue filter

If we visually examine the 500 slides in the training data set, we see that several of the slides have been marked with blue pen. Rather than blue lines, many of the blue marks consist of blue dots surrounding particular areas of interest on the slides, although this is not always the case. Some of the slides also have blue pen lines. Once again, the blue pen marks consist of several gradations of blue.

We’ll start by creating a filter to filter out blue. The filter_blue() function operates in a similar way as the filter_red() function. It takes a red channel upper threshold value, a green channel upper threshold value, and a blue channel lower threshold value. The generated mask is based on a pixel being below the red channel threshold value, below the green channel threshold value, and above the blue channel threshold value.
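The scheme mirrors the red filter with the channel roles rearranged; a sketch (not the project's filter_blue() code), using the same 130/155/180 thresholds applied below:

```python
import numpy as np

def filter_blue(rgb, red_upper_thresh, green_upper_thresh, blue_lower_thresh):
    # A pixel is "blue" when red and green stay below their upper
    # thresholds while blue exceeds its lower threshold; the returned
    # mask is True for pixels to KEEP (i.e., NOT blue).
    r = rgb[..., 0] < red_upper_thresh
    g = rgb[..., 1] < green_upper_thresh
    b = rgb[..., 2] > blue_lower_thresh
    return ~(r & g & b)

rgb = np.array([[[ 60,  90, 220],    # blue pen: masked out
                 [180, 120, 190]]],  # purple stain: kept (red too high)
               dtype=np.uint8)
mask = filter_blue(rgb, red_upper_thresh=130, green_upper_thresh=155, blue_lower_thresh=180)
```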

Once again, we’ll apply the results of the blue filter and the inverse of the blue filter as masks to the original RGB image to help visualize the filter results.

img_path = slide.get_training_image_path(241)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_blue = filter.filter_blue(rgb, red_upper_thresh=130, green_upper_thresh=155, blue_lower_thresh=180, display_np_info=True)
util.display_img(not_blue, "Blue Filter (130, 155, 180)")
util.display_img(util.mask_rgb(rgb, not_blue), "Not Blue")
util.display_img(util.mask_rgb(rgb, ~not_blue), "Blue")

We see that a lot of the blue pen has been filtered out.

Original slide Blue filter
Original Slide Blue Filter
Not blue Blue
Not Blue Blue

Console output:

RGB                  | Time: 0:00:00.432772  Type: uint8   Shape: (2058, 3240, 3)
Filter Blue          | Time: 0:00:00.029066  Type: bool    Shape: (2058, 3240)
Mask RGB             | Time: 0:00:00.038966  Type: uint8   Shape: (2058, 3240, 3)
Mask RGB             | Time: 0:00:00.021153  Type: uint8   Shape: (2058, 3240, 3)

Blue pen filter

In filter_blue_pen(), we AND together various blue shade ranges using filter_blue() with sets of red, green, and blue threshold values to create a blue pen filter that filters out various shades of blue.

result = filter_blue(rgb, red_upper_thresh=60, green_upper_thresh=120, blue_lower_thresh=190) & \
         filter_blue(rgb, red_upper_thresh=120, green_upper_thresh=170, blue_lower_thresh=200) & \
         filter_blue(rgb, red_upper_thresh=175, green_upper_thresh=210, blue_lower_thresh=230) & \
         filter_blue(rgb, red_upper_thresh=145, green_upper_thresh=180, blue_lower_thresh=210) & \
         filter_blue(rgb, red_upper_thresh=37, green_upper_thresh=95, blue_lower_thresh=160) & \
         filter_blue(rgb, red_upper_thresh=30, green_upper_thresh=65, blue_lower_thresh=130) & \
         filter_blue(rgb, red_upper_thresh=130, green_upper_thresh=155, blue_lower_thresh=180) & \
         filter_blue(rgb, red_upper_thresh=40, green_upper_thresh=35, blue_lower_thresh=85) & \
         filter_blue(rgb, red_upper_thresh=30, green_upper_thresh=20, blue_lower_thresh=65) & \
         filter_blue(rgb, red_upper_thresh=90, green_upper_thresh=90, blue_lower_thresh=140) & \
         filter_blue(rgb, red_upper_thresh=60, green_upper_thresh=60, blue_lower_thresh=120) & \
         filter_blue(rgb, red_upper_thresh=110, green_upper_thresh=110, blue_lower_thresh=175)

We apply the filter and its inverse to the original slide to help us visualize the results.

img_path = slide.get_training_image_path(241)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_blue_pen = filter.filter_blue_pen(rgb)
util.display_img(not_blue_pen, "Blue Pen Filter")
util.display_img(util.mask_rgb(rgb, not_blue_pen), "Not Blue Pen")
util.display_img(util.mask_rgb(rgb, ~not_blue_pen), "Blue Pen")

For this slide, we see that filter_blue_pen() filters out more blue than the previous filter_blue() example.

Original slide Blue pen filter
Original Slide Blue Pen Filter
Not blue pen Blue pen
Not Blue Pen Blue Pen

We see from the console output that the blue pen filter is quite fast.

RGB                  | Time: 0:00:00.348514  Type: uint8   Shape: (2058, 3240, 3)
Filter Blue Pen      | Time: 0:00:00.288286  Type: bool    Shape: (2058, 3240)
Mask RGB             | Time: 0:00:00.033348  Type: uint8   Shape: (2058, 3240, 3)
Mask RGB             | Time: 0:00:00.019622  Type: uint8   Shape: (2058, 3240, 3)

As an aside, we can quantify the differences in filtering between the filter_blue() and filter_blue_pen() results.

not_blue = filter.filter_blue(rgb, red_upper_thresh=130, green_upper_thresh=155, blue_lower_thresh=180, display_np_info=True)
not_blue_pen = filter.filter_blue_pen(rgb)
print("filter_blue: " + filter.mask_percentage_text(filter.mask_percent(not_blue)))
print("filter_blue_pen: " + filter.mask_percentage_text(filter.mask_percent(not_blue_pen)))

The filter_blue() example filters out 0.45% of the slide pixels and the filter_blue_pen() example filters out 0.69% of the slide pixels.

filter_blue: 0.45%
filter_blue_pen: 0.69%
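
The mask_percent() helper used above presumably reports the share of pixels that a mask removes. A minimal version for a boolean mask might look like this (a hypothetical sketch; the project's helper may also handle other array types):

```python
import numpy as np

def mask_percent_sketch(mask):
    """Percentage of pixels masked out (False) in a boolean mask."""
    return 100.0 * np.count_nonzero(~mask) / mask.size
```

For example, a 2x2 mask with one False entry yields 25.0.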

Green filter

We utilize the filter_green() function to filter green color shades. Examining the green pen marks on the slides with a color picker tool, we find that the green and blue channel values of the pen pixels appear to track together. As a result, this function takes a red channel upper threshold value, a green channel lower threshold value, and a blue channel lower threshold value.
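
Because green and blue track together in the pen marks, the threshold test flips direction relative to the blue filter: red must fall below its threshold while green and blue must both rise above theirs. A simplified NumPy sketch of this check (illustration only; the project's filter_green() adds output-type handling and debug logging):

```python
import numpy as np

def filter_green_sketch(rgb, red_upper_thresh, green_lower_thresh, blue_lower_thresh):
    """Return a boolean mask that is False where a pixel looks green."""
    r = rgb[:, :, 0] < red_upper_thresh    # low red
    g = rgb[:, :, 1] > green_lower_thresh  # high green
    b = rgb[:, :, 2] > blue_lower_thresh   # high blue (tracks green)
    return ~(r & g & b)  # True = keep pixel, False = masked-out green
```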

img_path = slide.get_training_image_path(51)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_green = filter.filter_green(rgb, red_upper_thresh=150, green_lower_thresh=160, blue_lower_thresh=140, display_np_info=True)
util.display_img(not_green, "Green Filter (150, 160, 140)")
util.display_img(util.mask_rgb(rgb, not_green), "Not Green")
util.display_img(util.mask_rgb(rgb, ~not_green), "Green")

Using a red upper threshold of 150, a green lower threshold of 160, and a blue lower threshold of 140, we see that much of the green ink above the background is filtered out, but most of the green ink above the tissue is not filtered out.

[Figure: Original Slide (left), Green Filter (right)]
[Figure: Not Green (left), Green (right)]

Console output:

RGB                  | Time: 0:00:00.611914  Type: uint8   Shape: (2291, 3839, 3)
Filter Green         | Time: 0:00:00.077429  Type: bool    Shape: (2291, 3839)
Mask RGB             | Time: 0:00:00.049026  Type: uint8   Shape: (2291, 3839, 3)
Mask RGB             | Time: 0:00:00.027211  Type: uint8   Shape: (2291, 3839, 3)

Green pen filter

To handle the green pen shades, the filter_green_pen() function combines different shade results using sets of red, green, and blue threshold values passed to the filter_green() function.

result = filter_green(rgb, red_upper_thresh=150, green_lower_thresh=160, blue_lower_thresh=140) & \
         filter_green(rgb, red_upper_thresh=70, green_lower_thresh=110, blue_lower_thresh=110) & \
         filter_green(rgb, red_upper_thresh=45, green_lower_thresh=115, blue_lower_thresh=100) & \
         filter_green(rgb, red_upper_thresh=30, green_lower_thresh=75, blue_lower_thresh=60) & \
         filter_green(rgb, red_upper_thresh=195, green_lower_thresh=220, blue_lower_thresh=210) & \
         filter_green(rgb, red_upper_thresh=225, green_lower_thresh=230, blue_lower_thresh=225) & \
         filter_green(rgb, red_upper_thresh=170, green_lower_thresh=210, blue_lower_thresh=200) & \
         filter_green(rgb, red_upper_thresh=20, green_lower_thresh=30, blue_lower_thresh=20) & \
         filter_green(rgb, red_upper_thresh=50, green_lower_thresh=60, blue_lower_thresh=40) & \
         filter_green(rgb, red_upper_thresh=30, green_lower_thresh=50, blue_lower_thresh=35) & \
         filter_green(rgb, red_upper_thresh=65, green_lower_thresh=70, blue_lower_thresh=60) & \
         filter_green(rgb, red_upper_thresh=100, green_lower_thresh=110, blue_lower_thresh=105) & \
         filter_green(rgb, red_upper_thresh=165, green_lower_thresh=180, blue_lower_thresh=180) & \
         filter_green(rgb, red_upper_thresh=140, green_lower_thresh=140, blue_lower_thresh=150) & \
         filter_green(rgb, red_upper_thresh=185, green_lower_thresh=195, blue_lower_thresh=195)
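
Because every call in the chain shares the same signature, the combination can equivalently be written as a loop over threshold triples, which makes adding or removing a shade a one-line change. A sketch (using a simplified stand-in for filter_green(); the triples are the ones from the chain above):

```python
import numpy as np
from functools import reduce

# (red_upper_thresh, green_lower_thresh, blue_lower_thresh) triples
# taken from the chained filter_green() calls above.
GREEN_PEN_THRESHOLDS = [
    (150, 160, 140), (70, 110, 110), (45, 115, 100), (30, 75, 60),
    (195, 220, 210), (225, 230, 225), (170, 210, 200), (20, 30, 20),
    (50, 60, 40), (30, 50, 35), (65, 70, 60), (100, 110, 105),
    (165, 180, 180), (140, 140, 150), (185, 195, 195),
]

def filter_green(rgb, red_upper_thresh, green_lower_thresh, blue_lower_thresh):
    # Simplified stand-in for the project's filter_green().
    return ~((rgb[:, :, 0] < red_upper_thresh)
             & (rgb[:, :, 1] > green_lower_thresh)
             & (rgb[:, :, 2] > blue_lower_thresh))

def filter_green_pen_sketch(rgb):
    # AND together the per-shade masks, exactly as the chain above does.
    masks = (filter_green(rgb, r, g, b) for r, g, b in GREEN_PEN_THRESHOLDS)
    return reduce(np.logical_and, masks)
```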

If we apply the green pen filter, we see that it filters out most of the green ink above the tissue in slide 51.

img_path = slide.get_training_image_path(51)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "RGB")
not_green_pen = filter.filter_green_pen(rgb)
util.display_img(not_green_pen, "Green Pen Filter")
util.display_img(util.mask_rgb(rgb, not_green_pen), "Not Green Pen")
util.display_img(util.mask_rgb(rgb, ~not_green_pen), "Green Pen")

[Figure: Original Slide (left), Green Pen Filter (right)]
[Figure: Not Green Pen (left), Green Pen (right)]

Like the other pen filters, the green pen filter’s performance is quite good.

RGB                  | Time: 0:00:00.540223  Type: uint8   Shape: (2291, 3839, 3)
Filter Green Pen     | Time: 0:00:00.487728  Type: bool    Shape: (2291, 3839)
Mask RGB             | Time: 0:00:00.044024  Type: uint8   Shape: (2291, 3839, 3)
Mask RGB             | Time: 0:00:00.022867  Type: uint8   Shape: (2291, 3839, 3)

K-means segmentation

The scikit-image library contains functionality that allows for image segmentation using k-means clustering based on location and color. This allows regions of similarly colored pixels to be grouped together. These regions are colored based on the average color of the pixels in the individual regions. This could potentially be used to filter regions based on their colors, where we could filter on pink shades for eosin-stained tissue and purple shades for hematoxylin-stained tissue.
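
The core idea here, clustering pixels on combined location and color features and recoloring each cluster with its average color, can be illustrated with a small NumPy-only k-means (a toy stand-in for demonstration; the scikit-image implementation is far more sophisticated and efficient):

```python
import numpy as np

def kmeans_color_location(rgb, k=4, iters=10, seed=0):
    """Toy k-means over (row, col, r, g, b) features that recolors each
    cluster with its average color. Illustration only; not the
    scikit-image implementation."""
    h, w, _ = rgb.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.column_stack([rows.ravel(), cols.ravel(),
                             rgb.reshape(-1, 3).astype(float)])
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center in location-color space.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    # Recolor every pixel with its cluster's average color.
    avg_colors = centers[:, 2:].astype(np.uint8)
    return avg_colors[labels].reshape(h, w, 3)
```

In practice, how strongly location is weighted against color determines how compact the resulting segments are.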

The filter_kmeans_segmentation() function has a default value of 800 segments. We’ll increase this to 3000 using the n_segments parameter. In the following example, we’ll perform k-means segmentation on the original image. In addition, we’ll create a threshold using Otsu’s method and apply the resulting mask to the original image. We’ll then perform k-means segmentation on that image.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
kmeans_seg = filter.filter_kmeans_segmentation(rgb, n_segments=3000)
util.display_img(kmeans_seg, "K-Means Segmentation", bg=True)
otsu_mask = util.mask_rgb(rgb, filter.filter_otsu_threshold(filter.filter_complement(filter.filter_rgb_to_grayscale(rgb)), output_type="bool"))
util.display_img(otsu_mask, "Image after Otsu Mask", bg=True)
kmeans_seg_otsu = filter.filter_kmeans_segmentation(otsu_mask, n_segments=3000)
util.display_img(kmeans_seg_otsu, "K-Means Segmentation after Otsu Mask", bg=True)

[Figure: Original Slide (left), K-Means Segmentation (right)]
[Figure: Image after Otsu Mask (left), K-Means Segmentation after Otsu Mask (right)]

Note that there are a couple of practical difficulties in terms of implementing automated tissue detection using k-means segmentation. To begin with, due to the variation in tissue stain colors across the image data set, it can be difficult to filter on “pinkish” and “purplish” colors across all of the slides. In addition, the k-means segmentation technique is very computationally expensive. As we can see in the console output, the compute time increases with the number of segments. For 3000 segments, we have a filter time of approximately 20 seconds, whereas all operations that we have seen up to this point are sub-second. If we use the default value of 800 segments, the compute time for the k-means segmentation filter is approximately 7 seconds.

RGB                  | Time: 0:00:00.172848  Type: uint8   Shape: (1385, 1810, 3)
K-Means Segmentation | Time: 0:00:20.238886  Type: uint8   Shape: (1385, 1810, 3)
Gray                 | Time: 0:00:00.076287  Type: uint8   Shape: (1385, 1810)
Complement           | Time: 0:00:00.000374  Type: uint8   Shape: (1385, 1810)
Otsu Threshold       | Time: 0:00:00.013864  Type: bool    Shape: (1385, 1810)
Mask RGB             | Time: 0:00:00.008522  Type: uint8   Shape: (1385, 1810, 3)
K-Means Segmentation | Time: 0:00:20.130044  Type: uint8   Shape: (1385, 1810, 3)

The scikit-image library also makes it possible to combine similarly colored regions. One way to do this with the k-means segmentation results is to build a region adjacency graph (RAG) and combine regions based on a threshold value. The filter_rag_threshold() function performs k-means segmentation, builds the RAG, and allows us to pass in the RAG threshold value.

In the following code, we perform k-means segmentation, build a RAG, and apply different RAG thresholds to combine similar regions.

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
util.display_img(rgb, "Original", bg=True)
rag_thresh = filter.filter_rag_threshold(rgb)
util.display_img(rag_thresh, "RAG Threshold (9)", bg=True)
rag_thresh = filter.filter_rag_threshold(rgb, threshold=1)
util.display_img(rag_thresh, "RAG Threshold (1)", bg=True)
rag_thresh = filter.filter_rag_threshold(rgb, threshold=20)
util.display_img(rag_thresh, "RAG Threshold (20)", bg=True)

[Figure: Original Slide (left), RAG Threshold = 9 (right)]
[Figure: RAG Threshold = 1 (left), RAG Threshold = 20 (right)]

Even using the default of 800 segments for the k-means segmentation, we see that this technique is very computationally expensive.

RGB                  | Time: 0:00:00.462239  Type: uint8   Shape: (1385, 1810, 3)
RAG Threshold        | Time: 0:00:24.677776  Type: uint8   Shape: (1385, 1810, 3)
RAG Threshold        | Time: 0:00:26.683581  Type: uint8   Shape: (1385, 1810, 3)
RAG Threshold        | Time: 0:00:23.774296  Type: uint8   Shape: (1385, 1810, 3)

RGB to HSV

Comparing hematoxylin and eosin staining can be challenging in the RGB color space. One way to simplify this comparison is to convert to a different color space such as HSV (hue, saturation, value). The scikit-image skimage.color package features an rgb2hsv() function that converts an RGB image to an HSV image. The filter_rgb_to_hsv() function wraps this scikit-image function. In the HSV color model, hue is represented as an angle on a 360-degree color wheel; purple has a hue of about 270 degrees and pink a hue of about 330 degrees. We discuss hematoxylin and eosin stain comparison in our later discussion of tile scoring, where we favor hematoxylin-stained tissue over eosin-stained tissue.
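
The hue values quoted above can be checked with Python's standard colorsys module, which (like rgb2hsv()) returns hue in the [0, 1] range that scales to 360 degrees. The sample colors below are illustrative picks, not values taken from the data set:

```python
import colorsys

def hue_degrees(r, g, b):
    """Hue in degrees for 8-bit RGB component values."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0

# A purple such as (128, 0, 255) sits near 270 degrees,
# and a pink such as (255, 0, 128) sits near 330 degrees.
print(round(hue_degrees(128, 0, 255)))  # 270
print(round(hue_degrees(255, 0, 128)))  # 330
```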

As an example, in the wsi/tiles.py file, the display_image_with_rgb_and_hsv_histograms() function takes in an image as a NumPy array in RGB color space and displays the image along with its RGB and HSV histograms. Internally, this function utilizes the filter_rgb_to_hsv() function.

# To get around renderer issue on OSX going from Matplotlib image to NumPy image.
import matplotlib
matplotlib.use('Agg')

from deephistopath.wsi import slide
from deephistopath.wsi import tiles
from deephistopath.wsi import util

img_path = slide.get_training_image_path(2)
img = slide.open_image(img_path)
rgb = util.pil_to_np_rgb(img)
tiles.display_image_with_rgb_and_hsv_histograms(rgb)

Here we see slide #2 along with its RGB and HSV histograms. Notice that the HSV hue histogram columns have additionally been colored based on their corresponding hue values to aid in visual inspection.

[Figure: Slide 2 with its RGB and HSV Histograms]

Summary

This second article in the automatic identification of tissues from big whole-slide images series showed how we investigated image filters and determined a set of filters that can be utilized for effective tissue segmentation with our data set. We masked out non-tissue by setting non-tissue pixels to 0 for their red, green, and blue channels. In Part 3 of this series, we'll discuss morphology operators, combining filters, and applying filters to multiple images.