Course Notes Solutions Answers Image Processing in Python
This is a memo to share what I have learnt in Image Processing (in Python), capturing the learning objectives
as well as my personal notes. The course is taught by Rebeca Gonzalez from DataCamp, and it includes 4
chapters:
Images are everywhere! We live in a time where images contain lots of information, which is sometimes
difficult to obtain. This is why image pre-processing has become a highly valuable skill, applicable in many
use cases. In this course, you will learn to process, transform, and manipulate images at your will, even when
they come in thousands. You will also learn to restore damaged images, perform noise reduction, smart-resize
images, count the number of dots on a dice, apply facial detection, and much more, using scikit-image.
After completing this course, you will be able to apply your knowledge to different domains such as machine
learning and artificial intelligence, machine and robotic vision, space and medical image analysis, retailing,
and many more. Take the step and dive into the wonderful world that is computer vision!
Chapter 1. Introducing Image Processing and scikit-image
Jump into digital image structures and learn to process them! Extract data, transform and analyze images
using NumPy and Scikit-image. With just a few lines of code, you will convert RGB images to grayscale, get
data from them, obtain histograms containing very useful information, and separate objects from the
background!
What is an image?
A digital image is an array, or a matrix, of square pixels
(picture elements) arranged in columns and rows: in other
words, a 2-dimensional matrix. These pixels contain
information about color and intensity.
Here's an example of the matrix for a 2D grayscale image. The first image is a pixelated image, and the numbers shown on top of the second image correspond to the intensity of each pixel in the image. So, in the end, an image can be treated as a matrix of intensities.
# Load two sample images from scikit-image's data module
from skimage import data
coffee_image = data.coffee()
coins_image = data.coins()
Choose the right answer that best describes the main difference related to color and dimensional structure.
In the console, use the .shape attribute from NumPy to obtain the image shape (Height, Width, Dimensions) and find out. NumPy is already
imported as np.
The coffee image is RGB colored; that's why it has a 3 at the end when displaying its shape (H, W, D). The coins image, on the other hand, is grayscale and has a single color channel.
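A quick way to check this in the console (a minimal sketch, using the preloaded arrays):
print(coffee_image.shape)   # (height, width, 3): the 3 is the RGB color channels
print(coins_image.shape)    # (height, width): grayscale, no channel axis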
RGB to grayscale
In this exercise you will load an image from scikit-image
module data and make it grayscale, then compare both of
them in the output.
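A minimal sketch of this exercise; the specific sample image is my choice here, and any RGB image from the data module works:
from skimage import data
from skimage.color import rgb2gray
original = data.rocket()            # an RGB sample image
grayscale = rgb2gray(original)      # convert to a 2D grayscale array
show_image(original, 'Original')
show_image(grayscale, 'Grayscale')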
Here we can see the individual color intensities along the image. For example, we obtain the red color of an
image by keeping the height and width pixels and selecting only the values of the first color layer.
This Madrid picture is 426 pixels high and 640 pixels wide. It
has three layers for color representation: it's an RGB-3 image.
So it has a shape of (426, 640, 3), for a total of 817,920 values (426 × 640 × 3).
Matplotlib has a histogram method. It takes an input array and bins as parameters; the successive elements of the bin array act as the boundaries of each bin. We obtain the red color channel of the image by slicing it, then use the histogram function: ravel() returns a contiguous flattened array of the channel's color values, and we pass this flattened array together with the bins.
We set bins to 256 because we'll show the number of pixels for every pixel value, that is, from 0 to 255, meaning you need 256 values to show the histogram.
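A short sketch of the red-channel histogram described above, assuming the RGB picture is loaded as image (the name is an assumption):
import matplotlib.pyplot as plt
red = image[:, :, 0]              # slice out the red channel
plt.hist(red.ravel(), bins=256)   # flatten the channel and count every intensity value
plt.title('Red histogram')
plt.show()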
Flipping out
As a prank, someone has turned an image from a photo album of a trip to Seville upside-down and back-to-front! Now, we need to straighten the
image, by flipping it.
Using the NumPy methods learned in the course, flip the image horizontally and vertically. Then display the corrected image using
the show_image() function.
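A minimal sketch of the fix, assuming the prank image is preloaded as flipped_seville (the variable name is an assumption):
seville_vertical_flip = np.flipud(flipped_seville)   # undo the vertical flip
seville_fixed = np.fliplr(seville_vertical_flip)     # undo the horizontal flip
show_image(seville_fixed, 'Seville')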
Histograms
In this exercise, you will analyze the amount of red in the image. To do this, the histogram of the red channel will be computed for the image
shown below:
Extracting information from images is a fundamental part of image enhancement. This way you can balance the red and blue to make the image
look colder or warmer.
You will use hist() to display the 256 different intensities of the red color, and ravel() to make these color values a flat, one-dimensional array.
Remember that if we want to obtain the green color of an image we would do the following:
green = image[:, :, 1]
# Obtain the red channel
red_channel = image[:, :, 0]
With this histogram we see that the image is quite reddish, giving it a warm feel, because it has a large and wide distribution of bright red pixels, from 0 to around 150.
Getting started with thresholding
Thresholding is used to partition the background and foreground of grayscale
images, by essentially making them black and white. We compare each pixel
to a given threshold value.
• If pixel > thresh value: 255 (white)
• If pixel < thresh value: 0 (black)
Here we have an image to compare. In this case, it seems that local is without a doubt the best option.
If the image doesn't have high contrast or the background is uneven, local thresholding produces better
results.
Apply global thresholding
In this exercise, you'll transform a photograph to binary so you can separate the foreground from the background.
To do so, you need to import the required modules, load the image, obtain the optimal thresh value using threshold_otsu() and apply it to the
image.
You'll see the resulting binarized image when using the show_image() function, previously explained.
Remember we have to turn colored images to grayscale. For that we will use the rgb2gray() function from the previous video, which has already been imported for you.
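A minimal sketch of the steps, assuming the photograph is loaded as image (the variable name is an assumption):
# Import the Otsu threshold function and the grayscale converter
from skimage.filters import threshold_otsu
from skimage.color import rgb2gray
gray_image = rgb2gray(image)          # thresholding works on grayscale images
thresh = threshold_otsu(gray_image)   # obtain the optimal threshold value
binary = gray_image > thresh          # apply it to get the binary image
show_image(binary, 'Binarized image')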
You just converted the image to binary and we can separate the foreground from the background.
You will compare both types of thresholding methods (global and local), to find the optimal way to obtain the binary image we need.
You will apply this function to this image, matplotlib.pyplot has been loaded as plt. Remember that you can use try_all_threshold() to try
multiple global algorithms.
What type of thresholding would you use judging by the characteristics of the image? Is the background illumination and intensity even or
uneven?
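A minimal sketch of comparing methods, assuming the grayscale image is loaded as grayscale_image (the name is an assumption):
# Run all global thresholding algorithms at once and plot their results
from skimage.filters import try_all_threshold
fig, ax = try_all_threshold(grayscale_image, verbose=False)
plt.show()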
By using a global thresholding method, you obtained a precise binarized image. Had you used local thresholding instead, nothing would have been segmented.
Try it yourself and see it! In the next chapters, we'll get into image restoration, face detection, and much
more. Stay tuned!
Chapter 2. Filters, Contrast, Transformation and
Morphology
You will learn to detect object shapes using edge detection filters, improve medical images with contrast
enhancement and even enlarge pictures to five times their original size! You will also apply morphology to make
thresholding more accurate when segmenting images and go to the next level of processing images with
Python.
Neighborhoods
Certain image processing operations involve processing an image in
sections, called blocks or neighborhoods, rather than processing the
entire image at once. This is the case for filtering, histogram
equalization for contrast enhancement, and morphological functions,
all three of which use this approach.
Edge detection
This technique can be used to find the
boundaries of objects within images,
like in this image, where we spot the
chocolate kisses shapes in the image.
Edge detection can also be used to segment and extract information, like how many coins are in an image.
Most of the shape information of an image is enclosed in edges. Edge detection works by detecting
discontinuities in brightness. A common edge detection algorithm is Sobel.
The coins image used here is preloaded as image_coins. We apply the filter by passing the image whose edges we want to detect as a parameter. This function requires a 2-dimensional grayscale image as input.
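A minimal sketch of Sobel edge detection on the preloaded coins image:
# Import the Sobel filter and apply it to the grayscale image
from skimage.filters import sobel
edge_sobel = sobel(image_coins)
show_image(edge_sobel, 'Edges with Sobel')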
# Import the Gaussian filter and apply it (multichannel=True keeps the color channels separate)
from skimage.filters import gaussian
gaussian_image = gaussian(building_image, multichannel=True)
Contrast enhancement
Often medical images like this X-ray can have low contrast, making it hard to spot important details. When
we improve the contrast, the details become more visible.
The contrast of an image can be seen as the measure of its dynamic range, or the "spread" of its histogram.
The contrast is the difference between the maximum and minimum pixel intensity in the image. The
histogram of this image is shown on the right. The maximum value of pixel intensity is 255 while the
minimum is 0. 255 - 0 = 255.
An image of low contrast has a small difference between its dark and light pixel values. Its histogram is usually skewed either to the right (mostly light), to the left (mostly dark), or located around the middle (mostly gray).
We can enhance contrast through
• contrast stretching which is used to stretch the histogram so the full range of intensity values of the
image is filled
• histogram equalization, that spreads out the most frequent histogram intensity values using probability
distribution.
There are three types of histogram equalization: the standard, the adaptive, and the contrast limited adaptive. In scikit-image
we can apply
• standard histogram equalization - spreads out the most frequent intensity values
• contrast stretching histogram equalization
• contrast limited adaptive histogram equalization (CLAHE)
We get a result that, despite the increased contrast, doesn't look natural. In fact, it doesn't even look like the
image has been enhanced at all.
But if you look closer and compare the results, you will see that the adaptive method is not that intense, so it
looks more natural. This is because it is not taking the global histogram of the entire image, but operates on
small regions called tiles or neighborhoods.
Comparing them, the resulting image is enhanced and we can better detail small objects and figures (like the
footprints in the ground).
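A minimal sketch of CLAHE as described above, assuming an RGB image is loaded as original_image (the name is an assumption; clip_limit=0.03 is an illustrative value):
# Apply contrast limited adaptive histogram equalization
from skimage import exposure
adapthist_eq_image = exposure.equalize_adapthist(original_image, clip_limit=0.03)
show_image(adapthist_eq_image, 'Resulting image')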
You can obtain the maximum pixel intensity of the image by using the np.max() method from NumPy and the minimum with np.min() in the
console.
The image has already been loaded as clock_image, NumPy as np and the show_image() function.
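A one-line sketch of the contrast calculation in the console:
# Contrast is the difference between the maximum and minimum pixel intensities
print(np.max(clock_image) - np.min(clock_image))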
You calculated the range of the pixel intensities in the histogram, and with it, the contrast of the image!
Medical images
You are trying to improve the tools of a hospital by pre-processing the X-ray
images so that doctors have a higher chance of spotting relevant details.
You'll test our code on a chest X-ray image from the National Institutes of
Health Chest X-Ray Dataset
First, you'll check the histogram of the image and then apply standard
histogram equalization to improve the contrast. Remember we obtain the
histogram by using the hist() function from Matplotlib, which has been
already imported as plt.
# Import the required module
from skimage import exposure
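A minimal continuation, assuming the X-ray is preloaded as chest_xray_image (a hypothetical name):
# Show the original histogram, flattening the image first
plt.hist(chest_xray_image.ravel(), bins=256)
plt.show()
# Apply standard histogram equalization and compare
xray_image_eq = exposure.equalize_hist(chest_xray_image)
show_image(xray_image_eq, 'Resulting image')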
Now you can apply this code and knowledge to other similar images.
Aerial image
In this exercise, we will improve the quality of an aerial image of a city. The image has low contrast and therefore we can not distinguish all the
elements in it. For this we will use the normal or standard technique of Histogram Equalization.
Even though this is not our Sunday morning coffee cup, you can still apply the same
methods to any of our photos.
A function called show_image(), that displays an image using Matplotlib, has already
been defined. It has the arguments image and title, with title being 'Original' by
default.
Rotating
Rescaling
Anti-aliasing in digital images
In a digital image, aliasing is a pattern or a rippling effect. Aliasing
makes the image look like it has waves or ripples radiating from a
certain portion. This happens because the pixelation of the image is
poor and does not look smooth.
Remember that aliasing is an effect that causes different signals, in this case pixels, to become indistinguishable or distorted.
You'll make this cat image upright by rotating it 90 degrees and then rescaling it two times: once with the anti-aliasing filter applied before rescaling and a second time without it, so you can compare them.
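A minimal sketch, assuming the sideways picture is preloaded as image_cat and the course-era rescale API that accepts multichannel (both are assumptions; 1/4 is an illustrative factor):
from skimage.transform import rotate, rescale
rotated_cat_image = rotate(image_cat, -90)    # rotate 90 degrees clockwise
# Rescale with and without the anti-aliasing filter
rescaled_with_aa = rescale(rotated_cat_image, 1/4, anti_aliasing=True, multichannel=True)
rescaled_without_aa = rescale(rotated_cat_image, 1/4, anti_aliasing=False, multichannel=True)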
You rotated and rescaled the image. It seems the anti-aliasing filter prevents the poor pixelation effect from happening, making the result look better but also less sharp.
Enlarging images
Have you ever tried resizing an image to make it larger? This usually results in loss of quality, with the enlarged image looking blurry.
The good news is that the algorithm used by scikit-image works very well for enlarging images up to a certain point.
You'll do this by rescaling the image of a rocket, that will be loaded from the data module.
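A minimal sketch, using an illustrative enlargement factor of 3 and the course-era multichannel keyword:
from skimage import data
from skimage.transform import rescale
rocket_image = data.rocket()
enlarged_rocket_image = rescale(rocket_image, 3, anti_aliasing=True, multichannel=True)
show_image(enlarged_rocket_image, '3 times enlarged image')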
The number of pixels added or removed from the objects in an image depends on the size and shape of a
structuring element used to process the image.
The structuring element is a small binary image used to probe the input image. We try to "fit" in the image
object we want to get its shape.
So if we want to select an apple on a table, we want the structuring element to fit inside that apple and then expand, probe, and obtain its shape.
The dimensions specify the size of the structuring element, like a square of 5 by 5 pixels. The pattern of ones and zeros specifies its shape, which should be similar to the shape of the object we want to select. So we see here different types of shapes, from squares to diamonds. The pink cell is the center or origin of the structuring element; it identifies the pixel being processed.
scikit-image has multiple shapes for this structuring element, each one with its own method from the morphology module. If we want a square as the structuring element, we can obtain it with the square method, or a rectangle with a given width and height. This returns the desired shape, and if we print it we'll see how it is formed with 1s.
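A minimal sketch of obtaining structuring elements (the sizes are illustrative):
from skimage import morphology
square = morphology.square(4)            # 4x4 square of 1s
rectangle = morphology.rectangle(4, 2)   # 4 rows by 2 columns of 1s
print(square)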
To apply erosion we can use the binary erosion function, for which we can optionally set a structuring element to use in the operation. Here we import it and load a binary horse image. We set the structuring element to a rectangular shape, since it's somewhat similar to the shape we want to obtain, which is a horse, and obtain the eroded image by calling the function with the image and structuring element as parameters. If not set, the function will use a cross-shaped structuring element by default.
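A minimal sketch of those steps, assuming the binary horse image is loaded as horse_image and using an illustrative rectangle size:
from skimage import morphology
selem = morphology.rectangle(12, 6)                            # rectangular structuring element
eroded_image = morphology.binary_erosion(horse_image, selem)   # erode with that element
show_image(eroded_image, 'Eroded image')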
Showing the resulting image next to the original to compare them, we see that the eroded image is missing some pixels, but it still roughly shows the horse shape.
We see that dilation is indeed adding just a little bit in some parts, like the lower legs and paws. We see that
the default structuring element works well.
Handwritten letters
A very interesting use of computer vision in real-life solutions is performing Optical Character Recognition (OCR) to distinguish printed or
handwritten text characters inside digital images of physical documents.
Let's try to improve the definition of this handwritten letter so that it's easier to classify.
As we can see it's the letter R, already binary, with some noise in it. It's already loaded as upper_r_image.
Apply the morphological operation that will discard the pixels near the letter boundaries.
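A minimal sketch of that step (erosion with the default structuring element), so the results below can be shown:
# Import the morphology module and erode the binary letter
from skimage import morphology
eroded_image_shape = morphology.binary_erosion(upper_r_image)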
# See results
show_image(upper_r_image, 'Original')
show_image(eroded_image_shape, 'Eroded image')
As you can see, erosion is useful for removing minor white noise.
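The next snippet displays a dilated world map; a minimal sketch of the missing dilation step, assuming the binary image is preloaded as world_image, could be:
# Apply dilation with the default (cross-shaped) structuring element
from skimage import morphology
dilated_image = morphology.binary_dilation(world_image)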
# See results
show_image(world_image, 'Original')
show_image(dilated_image, 'Dilated image')
You removed the noise of the segmented image and now it's more uniform.
Chapter 3. Image restoration, Noise, Segmentation and
Contours
So far, you have done some very cool things with your image processing skills! In this chapter, you will apply
image restoration to remove objects, logos, text, or damaged areas in pictures! You will also learn how to
apply noise, use segmentation to speed up processing, and find elements in images by their contours.
Image restoration
Besides fixing damaged images, image restoration, or reconstruction, is also used for removing text, deleting logos from images, and even removing small objects, like tattoos you prefer not to show in a picture.
Reconstructing lost or deteriorated parts of images is known as inpainting. The reconstruction is supposed to be performed fully automatically by exploiting the information present in the non-damaged regions of the image.
In scikit-image, we can apply inpainting with the inpaint_biharmonic function from the restoration module. It needs the location of the damaged pixels to be filled, given as a mask image on top of the image to work with. A mask image is simply an image where some of the pixel intensity values are zero and others are non-zero.
In this example, we can see how the masked pixels get inpainted by the inpainting algorithm based on the
biharmonic equation assumption.
Mask
Imagine you have an old picture of your parents you want to fix. In this image, we intentionally created the missing pixels by setting them to black. If you want to remove an object, you can manually delineate it in the mask; and if you want to detect it automatically, you would need to use thresholding or segmentation, something we will learn later on. In the right image we see the damaged areas of the image as a mask, loaded as defect_image.
We'll work on an image from the data module, obtained by data.astronaut(). Some of the pixels have been replaced by 1s using a binary mask,
on purpose, to simulate a damaged image. Replacing pixels with 1s turns them totally black. The defective image is saved as an array
called defect_image.
The mask is a black and white image with patches that have the position of the image bits that have been corrupted. We can apply the
restoration function on these areas. This mask is preloaded as mask.
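A minimal sketch of the restoration, using the course-era multichannel keyword (an assumption for newer scikit-image versions):
# Import the inpainting module and reconstruct the masked pixels
from skimage.restoration import inpaint
restored_image = inpaint.inpaint_biharmonic(defect_image, mask, multichannel=True)
show_image(restored_image, 'Restored image')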
Remember that inpainting is the process of reconstructing lost or deteriorated parts of images and videos.
Removing logos
As we saw in the video, another use of image restoration is removing objects from a scene. In this exercise, we'll remove the Datacamp logo
from an image.
You will create and set the mask to be able to erase the logo by inpainting this area.
Remember that when you want to remove an object from an image you can either manually delineate that object or run some image analysis
algorithm to find it.
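A minimal sketch, assuming the picture is loaded as image_with_logo; the mask coordinates are hypothetical and must be adjusted to wherever the logo actually sits:
import numpy as np
from skimage.restoration import inpaint
# Initialize a 2D mask and mark the (hypothetical) logo region with 1s
mask = np.zeros(image_with_logo.shape[:-1])
mask[210:290, 360:425] = 1
image_logo_removed = inpaint.inpaint_biharmonic(image_with_logo, mask, multichannel=True)
show_image(image_logo_removed, 'Logo removed')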
Noise
Images are signals and real-world signals usually contain departures from the ideal signal, which is the
perfect image, as we observe with our eyes in real life. Such departures are referred to as noise. We can see
how this image has some color grains when zoomed in.
More specifically, noise is the result of errors in the image acquisition process that result in pixel values that
do not reflect the true intensities of the real scene. In this image we can see how there is a variation of
brightness and color that does not correspond to reality, which is produced by the camera.
By using the random_noise function, we obtain the original image with a lot of added noise, that is
distributed randomly. This type of noise is known as "salt and pepper".
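A minimal sketch, assuming an image is loaded as fruit_image (the name is an assumption):
# Import the function and add random (salt and pepper style) noise
from skimage.util import random_noise
noisy_image = random_noise(fruit_image)
show_image(noisy_image, 'Noised image')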
Most of the time we will want to remove or reduce the noise in images instead of adding it, by using several algorithms in scikit-image. The higher the resolution of the image, the longer it may take to eliminate the noise.
Reducing noise
We have a noisy image that we want to improve by removing the noise in it.
Preloaded as noisy_image.
Preloaded as landscape_image.
Since we prefer to preserve the edges in the image, we'll use the bilateral denoising filter.
# Import bilateral denoising function
from skimage.restoration import denoise_bilateral
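A minimal continuation, using the course-era multichannel keyword (an assumption; newer scikit-image versions use channel_axis instead):
# Apply bilateral denoising, which preserves edges
denoised_image = denoise_bilateral(noisy_image, multichannel=True)
show_image(denoised_image, 'Denoised image')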
For example, before a tumor is analyzed in a computed tomography scan, it has to be detected and somehow isolated from the rest of the image. Or before recognizing a face, it has to be picked out from its background. Previously we learned about thresholding, which is the simplest method of segmentation: separating foreground from background. Now we'll learn about separating more than that.
A single pixel, standing alone by itself, is not a natural representation. We can explore more logical meanings
in an image that's formed by bigger regions or grouped pixels. These are known as superpixels. A superpixel
is a group of connected pixels with similar colors or gray levels. These carry more meaning than their simple
pixel grid counterparts.
Superpixel segmentation is dividing an image into superpixels. It has been applied to many computer
vision tasks, like visual tracking and image classification. Some advantages for using them are
• can compute features on more meaningful regions
• can reduce an image from thousands of pixels down to some regions for subsequent algorithms, so you
have computational efficiency.
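A minimal sketch of SLIC superpixel segmentation, assuming an RGB image is loaded as face_image (the name and the number of segments are assumptions):
from skimage.segmentation import slic
from skimage.color import label2rgb
segments = slic(face_image, n_segments=400)                     # label each pixel with its superpixel
segmented_image = label2rgb(segments, face_image, kind='avg')   # color each region with its average
show_image(segmented_image, 'Segmented image, 400 superpixels')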
Use .shape from NumPy which is preloaded as np, in the console to check the width and height of the image.
The input to a contour-finding function should be a binary image, which we can produce by first applying
thresholding. In such binary image, the objects we wish to detect should be white, while the background
remains black.
Constant level value
The level value varies between 0 and 1; the closer it is to 1, the more sensitive the method is to detecting contours, so more complex contours will be detected. We have to find the value that best detects the contours we care about.
A contour's shape
After executing these steps we obtain a list of contours. Each contour is an ndarray of shape (n, 2), consisting of n (row, column) coordinates along the contour. In this way, a contour is like an outline formed by multiple points joined together: the bigger the contour, the more points are joined and the wider the perimeter formed. Here we can see the shapes of the contours found in the dominoes image.
Contouring shapes
In this exercise we'll find the contour of a horse.
For that we will make use of a binarized image provided by scikit-image in its data module. Binarized images are easier to process when finding
contours with this algorithm. Remember that contour finding only supports 2D image arrays.
Once the contour is detected, we will display it together with the original image. That way we can check if our analysis was correct!
show_image_contour(image, contours) is a preloaded function that displays the image with all contours found using Matplotlib. Remember you
can use the find_contours() function from the measure module, by passing the thresholded image and a constant value.
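A minimal sketch, using scikit-image's binary horse silhouette as the sample image:
from skimage import data, measure
horse_image = data.horse()                           # already a binary 2D image
contours = measure.find_contours(horse_image, 0.8)   # find contours at a constant value of 0.8
show_image_contour(horse_image, contours)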
We'll process an image of two purple dice loaded as image_dice and determine what number was rolled for each dice.
In this case, the image is not grayscale or binary yet. This means we need to perform some image pre-processing steps before looking for the
contours. First, we'll transform the image to a 2D array grayscale image and next apply thresholding. Finally, the contours are displayed together
with the original image.
color, measure and filters modules are already imported so you can use the functions to find contours and apply thresholding.
We also import the io module to load image_dice from local memory, using imread.
# Make the image grayscale
image_dice = color.rgb2gray(image_dice)
# Obtain the optimal thresh value
thresh = filters.threshold_otsu(image_dice)
# Apply thresholding
binary = image_dice > thresh
# Find contours at a constant value of 0.8
contours = measure.find_contours(binary, 0.8)
You made the image a 2D array by slicing, applied thresholding, and successfully found the contours. Now you can apply this to any image you work on in the future.
In the previous exercise, we prepared a purple dice image to find its contours. This time we'll determine what number was rolled, by counting the dots in the image.
Create a list with every contour's shape as shape_contours. You can see all the contour shapes by calling shape_contours in the console once you have created it. Check that most of the contours aren't bigger than 50 in size; if you count those, they give the exact number of dots in the image.
show_image_contour(image, contours) is a preloaded function that displays the image with all contours found using Matplotlib.
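A sketch of the counting step, using 50 points as the illustrative size cutoff mentioned above:
# Shape (number of points) of each contour found
shape_contours = [cnt.shape[0] for cnt in contours]
# Keep only the small contours: these are the dots
max_dots_shape = 50
dots_contours = [cnt for cnt in contours if cnt.shape[0] < max_dots_shape]
print('Dots rolled:', len(dots_contours))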
You calculated the dice's number in the image by classifying its contours.
Chapter 4. Advanced Operations, Detecting Faces and
Features
After completing this chapter, you will have a deeper knowledge of image processing as you will be able to
detect edges, corners, and even faces! You will learn how to detect not just front faces but also face profiles,
cats, or dogs. You will apply your skills to more complex real-world applications. Learn to master several
widely used image processing techniques with very few lines of code!
Representing an image by its edges has the advantage that the amount of data is reduced significantly while
retaining most of the image information, like the shapes.
A widely used technique is Canny edge detection, considered the standard edge detection method in image processing. It produces higher accuracy in detecting edges and lower execution time compared with the Sobel algorithm.
We see how the edges are highlighted with thick white lines and that some details are more pronounced than the rest of the image. We can also spot the boundaries and shapes of coins: for each closed circle or ellipse, there's a coin.
A Gaussian filter is first applied to remove noise in the image. The intensity of this Gaussian filter is set with the sigma attribute. The lower the value of sigma, the weaker the Gaussian filter effect applied to the image, so more edges will be spotted. On the other hand, if you set a higher value, more noise will be removed and the result is going to be a less edgy image. The default value of this parameter is 1. In this example we set it to 0.5; let's see the effect on the image.
The resulting image has a lot more edges than the previous one and this is because noise was removed before
continuing with the rest of the steps in the algorithm.
Edges
In this exercise you will identify the shapes in a grapefruit image by detecting the edges,
using the Canny algorithm.
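A minimal sketch, assuming the picture is loaded as grapefruit_image (the name is an assumption):
from skimage.feature import canny
from skimage.color import rgb2gray
grapefruit_gray = rgb2gray(grapefruit_image)   # canny needs a 2D grayscale image
canny_edges = canny(grapefruit_gray)
show_image(canny_edges, 'Edges with Canny')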
You can see the shapes and details of the grapefruits of the original image being highlighted.
Less edgy
Let's now try to spot just the outer shape of the grapefruits, the circles. You can do this by applying a more intense Gaussian filter to first make
the image smoother. This can be achieved by specifying a bigger sigma in the canny function.
In this exercise, you'll experiment with sigma values of the canny() function.
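A one-line sketch, reusing the grayscale image from the previous snippet with an illustrative sigma:
edges_with_high_sigma = canny(grapefruit_gray, sigma=2.2)
show_image(edges_with_high_sigma, 'Sigma of 2.2')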
The bigger the sigma value, the fewer edges are detected, because of the Gaussian filter applied beforehand.
So, by detecting corners as interest points, we can match objects from different perspectives, like in this image, where we detect the corners of the original image on the left and then match them in a downscaled image on the right.
Here is another example of corner matching, this time in a rotated image. We see how the relevant points are still being matched.
Less corners
In this exercise, you will test what happens when you set the minimum distance between
corner peaks to be a higher number. Remember you do this with the min_distance attribute
parameter of the corner_peaks() function.
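A minimal sketch, assuming a grayscale image is loaded as building_image_gray (the name is an assumption):
from skimage.feature import corner_harris, corner_peaks
measure_image = corner_harris(building_image_gray)      # corner response for every pixel
coords = corner_peaks(measure_image, min_distance=40)   # keep peaks at least 40 pixels apart
print(len(coords), 'corners detected')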
With a 40-pixel minimum distance between the corners, far fewer corners are detected than with 2 pixels.
Face detection
Several social network platforms and smartphones use face detection to
know if there is someone in a picture and, if so, apply filters, add focus to the
face area, or recommend friends to tag. You can even automatically blur
faces for privacy protection.
Face detection can be useful in other cases as well. Human faces are able to
convey many different emotions such as happiness, sadness and many others.
That's why face detection is the key first step before recognizing emotions.
To apply the detector on images, we need to use the detect_multi_scale method, from the same cascade class.
This method searches for the object, in this case a face. It creates a window that will be moving through the
image until it finds something similar to a human face.
Searching happens on multiple scales. The window will have a minimum size, to spot the small or far-away
faces. And a maximum size to also find the larger faces in the image.
This method takes the input image as its first parameter; a scale factor, by which the searching window is multiplied in each step; and a step ratio, where 1 represents an exhaustive search, which is usually slow. By setting this parameter to higher values the results will be worse but the computation will be much faster. Usually, values in the interval 1 to 1.5 give good results. Then the minimum and maximum window sizes are defined; these specify the interval of search windows that are applied to the input image to detect the faces.
The detector will return the coordinates of the box that contains the face.
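A minimal sketch of the detector set-up and the search, with illustrative window sizes; the image variable is an assumption:
from skimage import data
from skimage.feature import Cascade
trained_file = data.lbp_frontal_face_cascade_filename()   # trained classifier file
detector = Cascade(trained_file)
detected = detector.detect_multi_scale(img=image, scale_factor=1.2, step_ratio=1,
                                       min_size=(10, 10), max_size=(200, 200))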
A helper function then draws a rectangle around each detected face.
Is someone there?
In this exercise, you will check whether or not there is a person present in an image taken at night.
The Cascade of classifiers class from feature module has been already imported. The same is
true for the show_detected_face() function, that is used to display the face marked in the image
and crop so it can be shown separately.
# Load the trained file from data
trained_file = data.lbp_frontal_face_cascade_filename()
The detector found the face even when it's very small and pixelated. Note though that you would ideally want
a well-illuminated image for detecting faces.
Multiple faces
In this exercise, you will detect multiple faces in an image and show them individually. Think of this as
a way to create a dataset of your own friends' faces!
The Cascade of classifiers class from feature module has already been imported, as well as
the show_detected_face() function which is used to display the face marked in the image and crop it
so it can be shown separately.
# Load the trained file from data
trained_file = data.lbp_frontal_face_cascade_filename()
The detector gave you a list with all the detected faces.
The Cascade class, the slic() function from segmentation module, and the show_detected_face() function for visualization have already been
imported. The detector is already initialized and ready to use as detector.
Real-world applications
Some cases where we might need to combine several techniques are, for example
• converting images to grayscale before detecting edges or corners.
• detecting faces to later on blur them by applying a gaussian filter.
• reducing noise and restoring a damaged image.
• approximation of objects' sizes
Let's look at how we would solve a privacy protection case by detecting faces and then anonymizing them.
We'll first need to detect faces, using the cascade of classifiers detector and then apply a gaussian filter to the
cropped faces.
For each detected face d in the detected list, we'll use its coordinates to crop it out of the image; in other words, extract it.
So it results in an image that no longer contains people's faces in it and in this way, personal data is
anonymized.
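A sketch of the anonymization loop described above, assuming detected holds the detector's output (a list of dicts with 'r', 'c', 'width' and 'height' keys), the photo is loaded as image, and the course-era multichannel keyword:
from skimage import img_as_float
from skimage.filters import gaussian
# Work on a float copy so the blurred (float) patches can be written back safely
resulting_image = img_as_float(image).copy()
for d in detected:
    r, c, w, h = d['r'], d['c'], d['width'], d['height']
    face = resulting_image[r:r + h, c:c + w]                    # crop the detected face
    blurred_face = gaussian(face, multichannel=True, sigma=8)   # blur it
    resulting_image[r:r + h, c:c + w] = blurred_face            # paste it back
show_image(resulting_image, 'Blurred faces')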
The classifier was only trained to detect the front side of faces, not profile faces. So, if someone turns their head too much to one side, it won't be recognized. If you want to do that, you'll need to train the classifier with XML files of profile faces, which you can find available online, like some provided by the OpenCV image processing library.
Privacy protection
Let's look at a real-world application of what you have learned in the course.
In this exercise, you will detect human faces in the image and for the sake of privacy, you will anonymize data by blurring people's faces in the
image automatically.
The face detector is ready to use as detector and all packages needed have been imported.
You solved this important issue by applying what you have learned in the course.
Help Sally restore her favorite portrait which was damaged by noise, distortion, and missing information due to a breach in her laptop.
show_image(result)
You have learned a lot about image processing methods and algorithms: You performed rotation, removed
annoying noise, and fixed the missing pixels of the damaged image.
Course completed!
Recap the topics you have learned:
• Improved contrast
• Restored images
• Applied filters
• Rotated, flipped and resized
• Segmented: supervised and unsupervised
• Applied morphological operators
• Created and reduced noise
• Detected edges, corners and faces
• Combination of the above to solve problems
Next steps:
• Tinting grayscale images
• Matching
• Approximation
• etc
Happy learning!