Image Processing and Computer Vision
VI SEMESTER
LAB MANUAL
Prepared by
Dr. Gouranga Mandal
Associate Professor
Computer Science and Engineering (AI&ML)
School of Engineering
Instructions to the Candidates
Mission
The Department of Computer Science and Engineering (Artificial Intelligence & Machine
Learning) is committed to:
Impart quality education through a state-of-the-art curriculum, infrastructure facilities,
cutting-edge technologies, sustainable learning practices and lifelong learning.
Collaborate with industry and academia and inculcate interdisciplinary research to transform
professionals into technically competent engineers.
Produce engineers and techno-entrepreneurs for global needs.
Values
The values that drive DSU and support its vision:
The Pursuit of Excellence
A commitment to strive continuously to improve ourselves and our systems with the aim
of becoming the best in our field.
Fairness
A commitment to objectivity and impartiality, to earn the trust and respect of society.
Leadership
A commitment to lead responsively and creatively in educational and research processes.
Integrity and Transparency
A commitment to be ethical, sincere and transparent in all activities and to treat all
individuals with dignity and respect.
Program Educational Objectives (PEOs)
PEO1: Apply appropriate theory, practices, and tools of machine intelligence to the
specification, design, implementation, maintenance, and evaluation in the workplace or in
higher education.
PEO2: Adapt, contribute and innovate new technologies in their computing profession by
working in teams to design, implement, and maintain in the key domains of Artificial
Intelligence & Machine Learning.
PEO3: Function effectively in the work place as competent Artificial Intelligence &
Machine Learning Professionals, Entrepreneurs or Researchers or maintain employment
through lifelong learning such as professional conferences, certificate programs or other
professional educational activities, ethics, and societal awareness.
List of Experiments:
1. Simulation and display of an Image, Negative of an Image (Binary & Gray Scale)
2. Implementation of the Transformation of an Image.
3. Implementation of Histogram, and Histogram Equalization.
4. Implement the different filtering techniques for noise removal based on spatial and frequency
domains using OpenCV.
5. Implementation of various image segmentation techniques. (Edge-Based, Region-Based and
Threshold-Based)
6. Implementation of different Morphological Image Processing Techniques.
7. Implement the Harris Corner Detector algorithm without the inbuilt OpenCV function.
8. Write a program to compute the SIFT feature descriptors of a given image.
9. Write a program to detect the specific objects in an image using HOG.
10. Implementation of object detection using OpenCV
11. Implementation of Face Recognition using OpenCV
Instructions to Run the Program:
OpenCV is a huge open-source library for computer vision, machine learning, and image
processing, and it now plays a major role in real-time operation, which is very important in
today's systems. Using it, one can process images and videos to identify objects, faces, or
even the handwriting of a human. When it is integrated with various libraries, such as NumPy,
Python is capable of processing the OpenCV array structure for analysis. To identify image
patterns and their various features, we use vector space and perform mathematical operations on
these features. To install OpenCV, one must have Python and pip preinstalled on the
system. To check whether your system already contains Python, open the command line
(search for cmd, or press Win + R and type cmd in the Run dialog), then run the
following command:
python --version
If Python is already installed, it will generate a message with the Python version available.
If Python is not present, go through "How to install Python on Windows?" and follow the
instructions provided. pip is a package-management system used to install and manage
software packages/libraries written in Python. These packages are stored in a large online
repository termed the Python Package Index (PyPI). To check if pip is already installed on
your system, go to the command line and execute the following command:
pip -V
If pip is not present, go through "How to install PIP on Windows?" and follow the instructions
provided.
Downloading and Installing OpenCV:
OpenCV can be directly downloaded and installed with the use of pip (package manager). To
install OpenCV, just go to the command-line and type the following command:
pip install opencv-python
To check if OpenCV is correctly installed, just run the following commands to perform a
version check:
python
>>>import cv2
>>>print(cv2.__version__)
To use the OpenCV library in Python, we need the following libraries as prerequisites:
1. NumPy: the computer processes images in the form of a matrix, for which NumPy is used;
OpenCV uses it in the background.
2. opencv-python: the OpenCV module was previously imported as cv, but the updated version is cv2.
It is used to manipulate images and videos.
The code below reads an image and displays it on the screen using functions from the
OpenCV and matplotlib libraries.
Experiment No 1
Simulation and display of an Image, Negative of an Image (Binary & Gray Scale)
Display of an Image:
Theory:
The OpenCV module is an open-source computer vision and machine learning software library. It is a huge
open-source library for computer vision, machine learning, and image processing. OpenCV supports a wide
variety of programming languages like Python, C++, Java, etc. It can process images and videos to identify
objects, faces, or even the handwriting of a human. When it is integrated with various libraries, such
as NumPy, which is a highly optimized library for numerical operations, its capabilities expand
considerably: whatever operations one can do in NumPy can be combined with OpenCV.
First, let’s look at how to display images using OpenCV:
The function cv2.imread() takes the path of an image as an argument and reads that particular image;
the cv2.imshow() function then displays it.
Code:
# import the required module
import cv2
# read the image by giving its path
img = cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')
# display the image in a window
cv2.imshow('myimage', img)
# wait for a key press before closing the window
cv2.waitKey(0)
cv2.destroyAllWindows()
Sample Output:
Code:
import cv2
img = cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')
cv2.imshow('myimage', img)
# Subtract the pixel values from the maximum intensity (255 for an 8-bit image)
img_neg = 255 - img
# Show the negative image
cv2.imshow('negative', img_neg)
cv2.waitKey(0)
cv2.destroyAllWindows()
Sample Output:
Input image
Output image
Code:
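The listing for this part is not present in this copy; below is a minimal sketch of the grayscale and binary negatives, assuming an input file 1.jpg in the working directory:
import cv2

# Read the image directly as grayscale
gray = cv2.imread('1.jpg', cv2.IMREAD_GRAYSCALE)
# Grayscale negative: invert the intensities
gray_neg = 255 - gray

# Binary image via a fixed threshold, then its negative
ret, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
binary_neg = cv2.bitwise_not(binary)

cv2.imshow('grayscale negative', gray_neg)
cv2.imshow('binary negative', binary_neg)
cv2.waitKey(0)
cv2.destroyAllWindows()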
Sample Output:
Input image
Output image
Experiment No 2
Write a program for implementation of the Transformation of an Image.
Image transformation involves transforming image data in order to retrieve information from the
image or preprocess the image for further usage. In this tutorial we are going to implement the following
image transformations:
• Image Translation
• Reflection
• Rotation
• Scaling
• Cropping
• Shearing in x-axis
• Shearing in y-axis
Image Translation
In computer vision and image processing, image translation is the rectilinear shift of an image from one
location to another; in other words, translation is the shifting of an object's location.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0, 100], [0, 1, 50]])
dst = cv.warpAffine(img, M, (cols, rows))
cv.imshow('img', dst)
cv.waitKey(0)
cv.destroyAllWindows()
In the above code, we imported the NumPy and OpenCV modules, read the image using
the imread() function, and then performed the translation with the warpAffine() method, which is used as
follows:
The first argument is the image. The second argument is a matrix, in which x = 100 tells the
function to shift the image 100 units to the right and y = 50 tells it to shift the image 50 units
downwards. In the third argument, where we pass cols and rows, we tell the function not to crop
the image on either the x or y side.
dst = cv.warpAffine(img,M,(cols,rows))
Output:
Image Reflection
Image reflection is used to flip the image vertically or horizontally. For reflection along the x-axis, we set the
value of Sy to -1, Sx to 1, and vice-versa for the y-axis reflection.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0, 0], [0, -1, rows], [0, 0, 1]])
reflected_img = cv.warpPerspective(img, M, (int(cols),int(rows)))
cv.imshow('img', reflected_img)
cv.imwrite('reflection_out.jpg', reflected_img)
cv.waitKey(0)
cv.destroyAllWindows()
To flip the image vertically (reflection about the x-axis):
M = np.float32([[1, 0, 0], [0, -1, rows], [0, 0, 1]])
To flip the image horizontally (reflection about the y-axis):
M = np.float32([[-1, 0, cols], [0, 1, 0], [0, 0, 1]])
Output:
Image Rotation
Image rotation is a common image processing routine with applications in matching, alignment, and other
image-based algorithms, in image rotation the image is rotated by a definite angle. It is used extensively in
data augmentation, especially when it comes to image classification.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
# build the rotation matrix (30-degree angle, 0.6 scale) and apply it
img_rotation = cv.warpAffine(img, cv.getRotationMatrix2D((cols/2, rows/2), 30, 0.6), (cols, rows))
cv.imshow('img', img_rotation)
cv.imwrite('rotation_out.jpg', img_rotation)
cv.waitKey(0)
cv.destroyAllWindows()
We used the getRotationMatrix2D() function to build the matrix required by the warpAffine() function,
telling it to rotate by the required angle (here 30 degrees) while shrinking the image by 40%.
img_rotation = cv.warpAffine(img,
cv.getRotationMatrix2D((cols/2, rows/2), 30, 0.6),
(cols, rows))
Output:
Image Scaling
Image scaling is a process used to resize a digital image. We perform two things in the image scaling either we
enlarge the image or we shrink the image, OpenCV has a built-in function cv2.resize() for image scaling.
Shrinking an image:
img_shrinked = cv2.resize(image, (350, 300),
interpolation = cv2.INTER_AREA)
Note: Here 350 and 300 are the width and height of the shrunk image respectively, since cv2.resize() takes the output size as (width, height)
Enlarging Image:
img_enlarged = cv2.resize(img_shrinked, None,
fx=1.5, fy=1.5,
interpolation=cv2.INTER_CUBIC)
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
img_shrinked = cv.resize(img, (250, 200), interpolation=cv.INTER_AREA)
cv.imshow('shrunk', img_shrinked)
img_enlarged = cv.resize(img_shrinked, None, fx=1.5, fy=1.5, interpolation=cv.INTER_CUBIC)
cv.imshow('enlarged', img_enlarged)
cv.waitKey(0)
cv.destroyAllWindows()
Output:
Image Cropping
Cropping is the removal of unwanted outer areas from an image.
cropped_img = img[100:300, 100:300]
OpenCV loads the image as a NumPy array, so we can crop the image simply by indexing the array; in our case,
we take 200 pixels, from index 100 to 300, on both axes.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
cropped_img = img[100:300, 100:300]
# display and save the cropped image
cv.imshow('cropped', cropped_img)
cv.imwrite('cropped_out.jpg', cropped_img)
cv.waitKey(0)
cv.destroyAllWindows()
Output:
Image Shearing in X-Axis
When shearing is applied along the x-axis, the boundaries of the image that are parallel to the x-axis keep their
location, and the edges parallel to the y-axis change their place depending on the shearing factor.
M = np.float32([[1, 0.5, 0], [0, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M,
(int(cols*1.5),
int(rows*1.5)))
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0.5, 0], [0, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M, (int(cols*1.5), int(rows*1.5)))
cv.imshow('img', sheared_img)
cv.waitKey(0)
cv.destroyAllWindows()
Output:
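Image Shearing in Y-Axis
Shearing in the y-axis works the same way, with the shear factor moved to the second row of the transformation matrix, so the edges parallel to the x-axis change their place. A minimal sketch, mirroring the x-axis example above:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
# Shear factor 0.5 along the y-axis (second row of the matrix)
M = np.float32([[1, 0, 0], [0.5, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M, (int(cols*1.5), int(rows*1.5)))
cv.imshow('img', sheared_img)
cv.waitKey(0)
cv.destroyAllWindows()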
Experiment No 3
Implementation of Histogram, and Histogram Equalization.
Histogram Equalization:
Theory:
Histogram equalization is a method in image processing of contrast adjustment using the image’s
histogram. This method usually increases the global contrast of many images, especially when the
usable data of the image is represented by close contrast values. Through this adjustment, the
intensities can be better distributed on the histogram. This allows for areas of lower local contrast to
gain a higher contrast. Histogram equalization accomplishes this by effectively spreading out the
most frequent intensity values. The method is useful in images with backgrounds and foregrounds
that are both bright or both dark.
Consider an image whose pixel values are confined to some specific range of values only. For example, a
brighter image will have all its pixels confined to high values. But a good image will have pixels from
all regions of the image. So you need to stretch this histogram to either end, and that is what
histogram equalization does (in simple words). This normally improves the contrast of the image.
Code:
import cv2
from matplotlib import pyplot as plt

def run_histogram_equalization(image_path):
    bgr_img = cv2.imread(image_path)
    # convert from BGR color-space to YCrCb
    ycrcb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YCrCb)
    # equalize the histogram of the Y (luminance) channel
    ycrcb_img[:, :, 0] = cv2.equalizeHist(ycrcb_img[:, :, 0])
    # convert back to BGR color-space from YCrCb
    equalized_img = cv2.cvtColor(ycrcb_img, cv2.COLOR_YCrCb2BGR)
    cv2.imshow('equalized_img', equalized_img)
    # plot the histogram of the equalized image
    plt.hist(equalized_img.ravel(), 256, [0, 256])
    plt.show()
    cv2.waitKey(0)

run_histogram_equalization(r'C:\Users\Dr. G Mandal\Desktop\1.jpg')
Sample Output:
Experiment No 4
Implement the different filtering techniques for noise removal based on spatial and frequency
domains using OpenCV.
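The filtering listings did not survive in this copy; below is a minimal sketch of both approaches, assuming a noisy input file noisy.jpg and an illustrative cutoff of 30 pixels for the frequency-domain mask:
import cv2
import numpy as np

img = cv2.imread('noisy.jpg', cv2.IMREAD_GRAYSCALE)

# Spatial-domain filtering
gaussian = cv2.GaussianBlur(img, (5, 5), 0)  # smooths Gaussian noise
median = cv2.medianBlur(img, 5)              # removes salt-and-pepper noise

# Frequency-domain filtering: FFT, keep only low frequencies, inverse FFT
f = np.fft.fftshift(np.fft.fft2(img))
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2
mask = np.zeros((rows, cols), np.uint8)
mask[crow - 30:crow + 30, ccol - 30:ccol + 30] = 1
lowpass = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))

cv2.imshow('Gaussian', gaussian)
cv2.imshow('Median', median)
cv2.imshow('Frequency-domain low-pass', lowpass.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()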
Experiment No 5
Implementation of various image segmentation techniques. (Edge-Based, Region-Based and Threshold-Based)
Edge-Based Segmentation (Canny Edge Detection):
Theory:
The final stage of Canny edge detection is hysteresis thresholding, which decides which edges are really
edges using two threshold values, minVal and maxVal.
[Figure: hysteresis thresholding, showing edges A, B, and C relative to minVal and maxVal]
The edge A is above maxVal, so it is considered a "sure edge". Although edge C is below maxVal, it
is connected to edge A, so it is also considered a valid edge and we get the full curve. Edge B,
although it is above minVal and in the same region as edge C, is not connected to any "sure
edge", so it is discarded. It is therefore very important to select minVal and maxVal
appropriately to get the correct result.
This stage also removes small pixels noises on the assumption that edges are long lines.
So what we finally get is strong edges in the image.
Canny Edge Detection in OpenCV
OpenCV puts all of the above into a single function, cv.Canny(). We will see how to use it. The first argument
is our input image. The second and third arguments are our minVal and maxVal respectively. The fourth
argument is aperture_size, the size of the Sobel kernel used to find image gradients (3 by default).
The last argument is L2gradient, which specifies the equation for finding the gradient magnitude: if it is
True, the more accurate equation Edge_Gradient(G) = sqrt(Gx^2 + Gy^2) is used; otherwise it uses
Edge_Gradient(G) = |Gx| + |Gy|. By default, it is False.
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread(r'C:\Users\Dr. G Mandal\Desktop\1.jpg', cv.IMREAD_GRAYSCALE)
edges = cv.Canny(img,100,200)
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
plt.show()
Region-Based Image Segmentation:
Region-based image segmentation is a widely used technique in image processing that involves
partitioning an image into regions or objects of interest based on their similarity in colour, texture, or
other features. In this tutorial, we will explore how to implement region-based segmentation using
OpenCV, a popular computer vision library.
Step 1: Read the input image
The first step is to read the input image using the cv2.imread() function of OpenCV. Make sure
that the image is in the same directory as your Python file.
import cv2
Step 2: Preprocessing
Before applying the Watershed Algorithm, we need to perform some
preprocessing steps to improve the image segmentation result. The preprocessing steps include noise
reduction and image smoothing.
Step 3: Thresholding
The next step is to apply thresholding to the preprocessed image. Thresholding is the process of
converting an image into a binary image by selecting a threshold value. In this tutorial, we will
use the cv2.threshold() function of OpenCV to apply thresholding.
# Apply thresholding
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
Step 4: Morphological Operations
Morphological operations are used to remove noise and fill holes in the segmented regions. In this
tutorial, we will use the cv2.morphologyEx() function of OpenCV to perform morphological operations
for image segmentation.
Step 5: Distance Transform and Watershed
After applying the distance transform, we can perform the marker-based Watershed Algorithm to
segment the image into multiple regions. In this tutorial, we will use
the cv2.connectedComponents() function of OpenCV to obtain the markers and then apply the
Watershed Algorithm.
Step 6: Display the results
Finally, we can display the segmented image and the contours found in the previous step using the
following code:
# Cleanup
cv2.destroyAllWindows()
The cv2.imshow() function is used to display the output of the image segmentation process and the
contours separately. The cv2.drawContours() function is used to draw the contours on the original
image. The last command cv2.destroyAllWindows() is used to close all the windows.
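The per-step listings are largely missing from this copy; a minimal end-to-end sketch of Steps 1-6, assuming an input image coins.jpg:
import cv2
import numpy as np

# Step 1: read the input image and convert to grayscale
img = cv2.imread('coins.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 2: preprocessing (noise reduction / smoothing)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Step 3: Otsu thresholding
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Step 4: morphological opening to remove noise
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

# Step 5: sure background (dilation), sure foreground (distance transform),
# then markers via connected components and the Watershed Algorithm
sure_bg = cv2.dilate(opening, kernel, iterations=3)
dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)
ret, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1            # background becomes 1 instead of 0
markers[unknown == 255] = 0      # unknown region marked 0
markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255] # region boundaries in red

# Step 6: display the result
cv2.imshow('segmented', img)
cv2.waitKey(0)
cv2.destroyAllWindows()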
Threshold-Based Image Segmentation:
One of the simplest and most widely used image segmentation techniques is thresholding, which
involves converting an image into a binary image by setting all pixels with intensities above a certain
threshold to white and all other pixels to black. In OpenCV, we can apply thresholding using the
cv2.threshold() function as follows:
import cv2
# Read image
img = cv2.imread('image.jpg')
# Convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply thresholding
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Display thresholded image
cv2.imshow('thresholded', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this code, we first convert the image to grayscale using the cv2.cvtColor() function. Then, we apply
thresholding using the cv2.threshold() function, where gray is the input image, 127 is the threshold
value, 255 is the maximum pixel value, and cv2.THRESH_BINARY is the thresholding type. Finally, we
display the thresholded image using the cv2.imshow() function.
Output:
Experiment No 6
Implementation of different Morphological Image Processing Techniques.
Morphological operations in Python OpenCV are image processing techniques that process an image
based on its shape. This processing strategy is usually performed on binary images.
The morphological operations available in OpenCV are as follows:
• Erosion
• Dilation
• Opening
• Closing
• Morphological Gradient
• Top hat
• Black hat
Erosion
Just like water rushing along a river bank erodes the soil, an erosion in an image “erodes” the
foreground object and makes it smaller. Simply put, pixels near the boundary of an object in an
image will be discarded, “eroding” it away.
Erosion works by defining a structuring element and then sliding this structuring element from left-to-
right and top-to-bottom across the input image.
A foreground pixel in the input image will be kept only if all pixels inside the structuring element
are > 0. Otherwise, the pixels are set to 0 (i.e., background).
Erosion is useful for removing small blobs in an image or disconnecting two connected objects.
In most examples in this lesson we'll be applying morphological operations to the PyImageSearch
logo, which we can see below:
[Figure: the example PyImageSearch logo that we'll be applying morphological operations to in this lesson]
As I mentioned earlier in this lesson, we typically (but not always) apply morphological operations
to binary images. As we’ll see later in this lesson, there are exceptions to that, especially when
using the black hat and white hat operators, but for the time being, we are going to assume we are
working with a binary image, where the background pixels are black and the foreground pixels
are white.
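The erosion listing itself is not included here; a minimal sketch using cv2.erode, assuming the logo is saved as pyimagesearch_logo.png:
import cv2

# Load the logo and convert it to grayscale
image = cv2.imread('pyimagesearch_logo.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# Apply a series of erosions with increasing iteration counts
# (None means the default 3x3 structuring element)
for i in range(0, 3):
    eroded = cv2.erode(gray.copy(), None, iterations=i + 1)
    cv2.imshow("Eroded {} times".format(i + 1), eroded)
    cv2.waitKey(0)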
[Figure: applying erosion to our input image; as the number of iterations increases, more and more of the logo is eroded away]
On the very top we have our original image. And then underneath the image, we have the logo
being eroded a total of 1, 2, and 3 times, respectively. Notice as the number of erosion iterations
increases, more and more of the logo is eaten away.
Again, erosions are most useful for removing small blobs from an image or disconnecting two
connected components. With this in mind, take a look at the letter “p” in the PyImageSearch logo.
Notice how the circular region of the “p” has disconnected from the stem after 2 erosions — this is
an example of disconnecting two connected components of an image.
Dilation
The opposite of an erosion is a dilation. Just like an erosion will eat away at the foreground pixels,
a dilation will grow the foreground pixels.
Dilations increase the size of foreground objects and are especially useful for joining broken parts of
an image together.
Dilations, just as an erosion, also utilize structuring elements — a center pixel p of the structuring
element is set to white if ANY pixel in the structuring element is > 0.
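As with erosion, the dilation listing is missing here; a minimal sketch under the same assumption about the logo file:
import cv2

image = cv2.imread('pyimagesearch_logo.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# Apply a series of dilations with increasing iteration counts
for i in range(0, 3):
    dilated = cv2.dilate(gray.copy(), None, iterations=i + 1)
    cv2.imshow("Dilated {} times".format(i + 1), dilated)
    cv2.waitKey(0)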
Again, at the very top we have our original input image. And below the input image we have our
image dilated 1, 2, and 3 times, respectively.
Unlike an erosion where the foreground region is slowly eaten away at, a dilation actually grows our
foreground region.
Dilations are especially useful when joining broken parts of an object — for example, take a look at
the bottom image where we have applied a dilation with 3 iterations. By this point, the gaps
between all letters in the logo have been joined.
Opening
An opening is an erosion followed by a dilation.
Performing an opening operation allows us to remove small blobs from an image: first an
erosion is applied to remove the small blobs, then a dilation is applied to regrow the size of
the original object.
import cv2

# load the noisy logo (assumed filename, per the text below) and convert it to grayscale
image = cv2.imread('pyimagesearch_logo_noise.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.destroyAllWindows()
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (5, 5), (7, 7)]
# loop over the kernel sizes
for kernelSize in kernelSizes:
    # construct a rectangular kernel from the current size and then
    # apply an "opening" operation
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
    cv2.imshow("Opening: ({}, {})".format(kernelSize[0], kernelSize[1]), opening)
    cv2.waitKey(0)
Finally, we display the output of applying our opening to the noisy version of the logo (the file
pyimagesearch_logo_noise.png in our project directory structure):
Notice how by the time we are using a kernel of size 5×5, the small, random blobs are
nearly completely gone. And by the time it reaches a kernel of size 7×7, our opening
operation has not only removed all the random blobs, but also “opened” holes in the letter
“p” and the letter “a”.
Closing
The exact opposite to an opening would be a closing. A closing is a dilation followed
by an erosion.
As the name suggests, a closing is used to close holes inside of objects or for connecting
components together.
# go back to the original logo, without the random blobs (assumed filename)
image = cv2.imread('pyimagesearch_logo.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.destroyAllWindows()
cv2.imshow("Original", image)
# loop over the kernel sizes again
for kernelSize in kernelSizes:
    # construct a rectangular kernel from the current size, but this
    # time apply a "closing" operation
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    cv2.imshow("Closing: ({}, {})".format(kernelSize[0], kernelSize[1]), closing)
    cv2.waitKey(0)
We’ll go back to using our original image (without the random blobs). The
output for applying a closing operation with increasing structuring
element sizes can be seen below:
Applying a morphological closing operation to our input image.
Notice how the closing operation is starting to bridge the gap between
letters in the logo. Furthermore, letters such as “e”, “s”, and “a” are
practically filled in.
Morphological gradient
A morphological gradient is the difference between a dilation and erosion. It is useful
for determining the outline of a particular object of an image:
cv2.destroyAllWindows()
cv2.imshow("Original", image)
# loop over the kernel sizes a final time
for kernelSize in kernelSizes:
    # construct a rectangular kernel and apply a "morphological
    # gradient" operation to the image
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    cv2.imshow("Gradient: ({}, {})".format(kernelSize[0], kernelSize[1]), gradient)
    cv2.waitKey(0)
Top Hat and Black Hat
A top hat operation is used to reveal bright regions of an image on dark backgrounds.
Up until this point we have only applied morphological operations to binary images. But we
can also apply morphological operations to grayscale images as well. In fact, both the top
hat/white hat and the black hat operators are more suited for grayscale images rather than
binary ones.
To demonstrate applying morphological operations, let’s take a look at the following image
where our goal is to detect the license plate region of the car:
Our goal is to apply morphological operations to find the license plate region of the car.
To test out the top hat and black hat operators, create a new file with the following contents:
import argparse
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to input image")
args = vars(ap.parse_args())
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# construct a rectangular kernel (13x5) and apply a blackhat
# operation which enables us to find dark regions on a light
# background
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)
This code loads our image from disk and converts it to grayscale, thereby preparing it for our black hat and
white hat operations. It then defines a rectangular structuring element with a width of 13 pixels and a height
of 5 pixels. As mentioned earlier in this lesson, structuring elements can be of arbitrary size, and in this
case we are applying a rectangular element that is almost 3x wider than it is tall.
By having some basic a priori knowledge of the objects you want to detect in images, we can
construct structuring elements to better aid us in finding them.
To specify a top hat/white hat operator instead of a blackhat, we simply change the operator type
to cv2.MORPH_TOPHAT.
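For example, continuing from the black hat snippet above:
# same rectangular kernel; only the operation type changes
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)
cv2.imshow("Blackhat", blackhat)
cv2.imshow("Tophat", tophat)
cv2.waitKey(0)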
Below you can see the output of applying the top hat operators:
Notice how the top hat/white hat output clearly displays regions that are light against a dark
background; in this case, we can clearly see that the license plate region of the car has been revealed.
But also note that the license plate characters themselves have not been included. This is
because the license plate characters are dark against a light background.
To reveal our license plate characters you would first segment out the license plate itself via
a top hat operator and then apply a black hat operator (or thresholding) to extract the
individual license plate characters (perhaps using methods like contour detection).
Experiment No 7
Implement the Harris Corner Detector algorithm without the inbuilt OpenCV function.
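The Harris detector builds, at every pixel, a structure matrix M from smoothed products of the image gradients, and scores corners with the response R = det(M) - k*trace(M)^2. A minimal from-scratch sketch that computes R using Sobel gradients instead of calling cv2.cornerHarris(), assuming an input file chessboard.jpg:
import cv2
import numpy as np

# Read the image and convert it to a float32 grayscale array
img = cv2.imread('chessboard.jpg')
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# 1. Image gradients via Sobel operators
Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

# 2. Products of gradients, smoothed with a Gaussian window
Ixx = cv2.GaussianBlur(Ix * Ix, (5, 5), 1)
Iyy = cv2.GaussianBlur(Iy * Iy, (5, 5), 1)
Ixy = cv2.GaussianBlur(Ix * Iy, (5, 5), 1)

# 3. Harris response R = det(M) - k * trace(M)^2 at every pixel
k = 0.04
det_M = Ixx * Iyy - Ixy * Ixy
trace_M = Ixx + Iyy
R = det_M - k * trace_M * trace_M

# 4. Threshold the response and mark corners in red
img[R > 0.01 * R.max()] = [0, 0, 255]
cv2.imshow('Harris corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()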
Experiment No 8
Write a program to compute the SIFT feature descriptors of a given image.
SIFT (Scale Invariant Feature Transform) is used in the detection of interest points on an
input image. It allows the identification of localized features in images, which is essential in applications
such as object recognition and image matching.
Phase II: Keypoint Localization
Candidate keypoints are refined by fitting a quadratic to the scale-space function around each extremum;
the refined offset is z = -(d²D/dz²)⁻¹ (dD/dz), where D represents the Difference of Gaussians. To remove
unstable keypoints, the function value at z is calculated, and if it is below a threshold value the point
is excluded.
Fig 03 Refinement of Keypoints after Keypoint Localization
Phase III: Assigning Orientation to Keypoints
To achieve detection which is invariant with respect to the rotation of the image, orientation needs to be
calculated for the key-points. This is done by considering the neighborhood of the keypoint and calculating the
magnitude and direction of gradients of the neighborhood. Based on the values obtained, a histogram is
constructed with 36 bins to represent 360 degrees of orientation(10 degrees per bin). Thus, if the gradient
direction of a certain point is, say, 67.8 degrees, a value, proportional to the gradient magnitude of this point,
is added to the bin representing 60-70 degrees. Histogram peaks above 80% of the highest peak are
converted into new keypoints and are used to decide the orientation of the original keypoint.
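Only the final line of the program survives below; a minimal sketch of the preceding steps, assuming an input file image.jpg and OpenCV 4.4+ (where SIFT is available as cv2.SIFT_create()):
import cv2

# Read the image and convert it to grayscale
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect keypoints and compute SIFT descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Draw the keypoints with their size and orientation
img = cv2.drawKeypoints(img, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)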
cv2.imwrite('image-with-keypoints.jpg', img)
Output:
The image on the left is the original; the image on the right shows the various highlighted interest points on the image.
Experiment No 9
Write a program to detect the specific objects in an image using HOG.
Understanding HOG
The concept behind the HOG algorithm is to compute the distribution of gradient orientations in
localized portions of an image. HOG operates on a window, which is a region of fixed pixel size
on the image. A window is divided into small spatial regions, known as a block, and a block is
further divided into multiple cells. HOG calculates the gradient magnitude and orientation within
each cell, and creates a histogram of gradient orientations. Then the histograms within the
same block are concatenated.
Gradient measures how a pixel's color intensity compares to its neighbors'. The more drastically it
changes, the higher the magnitude. The orientation tells which direction has the steepest
gradient. Usually, this is applied on a single-channel image (i.e., grayscale), and each pixel can
have its own gradient. HOG gathers all gradients from a block and puts them into a histogram.
The clever part of the HOG histogram is that the bins are determined by the gradient angle, but the
value is interpolated between the closest bins. For example, if the bins are assigned values 0, 20, 40,
and so on, and a gradient of magnitude 10 occurs at angle 30, then a value of 5 is added to each of the
bins for 20 and 40. This way, HOG can effectively capture the texture and shape of
objects within the image.
HOG is particularly effective for detecting objects with distinguishable textures and patterns,
making it a popular choice for tasks such as pedestrian detection and other forms of object
recognition. With its ability to capture the distribution of gradient orientations, HOG provides a
robust representation invariant to variations in lighting conditions and shadows.
[Figure: a window divided into cells, with red and blue boxes each marking one block; there are many overlapping blocks in one window, but all blocks are the same size]
Each cell is of a fixed size; in the above, a cell is 64×64 pixels. Each block has an equal number of
cells; in the above, a block is 4×4 cells. Also, there is an equal number of cells in a window;
above, a window is 8×6 cells. However, we are not literally dividing an image into blocks or
windows when we compute HOG. Instead (see the sketch after the following list):
1. Consider a window as a sliding window on the image, in which the sliding window’s
stride size is the size of one cell, i.e., it slides across one cell at a time
2. We divide the window into cells of fixed size
3. We set up the second sliding window that matches the block size and scan the
window. It slides across one cell at a time
4. Within a block, HOG is computed from each cell
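A minimal sketch of computing HOG features with these parameters, assuming an input file people.jpg resized so its dimensions are multiples of the cell size (the reshape into the window/block/cell hierarchy referred to below is omitted here):
import cv2

# Geometry described in the text above
cell_size = (64, 64)   # pixels per cell
block_size = (4, 4)    # cells per block
win_size = (8, 6)      # cells per window
nbins = 9

img = cv2.imread('people.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (1024, 768))  # multiple of the cell size

hog = cv2.HOGDescriptor(
    (win_size[0] * cell_size[0], win_size[1] * cell_size[1]),      # window size, pixels
    (block_size[0] * cell_size[0], block_size[1] * cell_size[1]),  # block size, pixels
    (cell_size[0], cell_size[1]),                                  # block stride: one cell
    (cell_size[0], cell_size[1]),                                  # cell size, pixels
    nbins)

n_cells = (img.shape[1] // cell_size[0], img.shape[0] // cell_size[1])
# Window stride of one cell: the window slides across one cell at a time
hog_feats = hog.compute(img, winStride=cell_size)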
The returned HOG is a vector for the entire image. In the original code, it was reshaped to make the
hierarchy of windows, blocks, cells, and histogram bins clear. For
example, hog_feats[i][j] corresponds to the window (in NumPy slicing syntax):
img[n_cells[1]*i : n_cells[1]*i+(n_cells[1]*win_size[1]),
    n_cells[0]*j : n_cells[0]*j+(n_cells[0]*win_size[0])]
Or, equivalently, the window with the cell (i,j) at the top left corner.
A sliding window is a common technique in object detection because you cannot be sure a
particular object lies exactly in a grid cell. Making smaller cells but larger windows is a better
way to catch the object than just seeing a part of it. However, there’s a limitation: An object
larger than the window will be missed. Also, an object too small may be dwarfed by other
elements in the window.
Usually, you have some downstream tasks associated with HOG, such as running an SVM
classifier on the HOG features for object detection. In this case, you may want to reshape the
HOG output into vectors of the entire block rather than in the hierarchy of each cell like above.
This is a picture of people crossing a street. OpenCV has a “people detector” in HOG that was
trained on a 64×128 pixel window size. Using it to detect people in a photo is surprisingly
simple:
import cv2

# Load the image
img = cv2.imread('people.jpg')

# Initialize the HOG descriptor with the built-in people detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Detect people in the image
locations, confidence = hog.detectMultiScale(img)

# Draw rectangles around the detected people
for (x, y, w, h) in locations:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

# Display the image with detected people
cv2.imshow('People', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In the above, you created a HOG descriptor, and cv2.HOGDescriptor_getDefaultPeopleDetector()
initializes an SVM classifier trained to detect a particular object, which in this case is people.
You call the descriptor on an image and run the SVM in one pipeline
using hog.detectMultiScale(img), which returns the bounding boxes for each object detected.
While the window size is fixed, this detection function will resize the image in multiple scales to
find the best detection result. Even so, the bounding boxes returned are not tight. The code
above also annotates the people detected by marking the bounding box on the image. You may
further filter the result using the confidence score reported by the detector. Some filtering
algorithms, such as non-maximum suppression, may be appropriate but are not discussed here.
The following is the output:
Bounding box as produced by the people detector using HOG in OpenCV
You can see such detectors can find people only if the full body is visible. The output has false
positives (non-people detected) and false negatives (people not detected). Using it to count all
people in a crowd scene would be challenging. But it is a good start to see how easily you can
get something done using OpenCV.
Unfortunately, no ready-made detectors other than the people detector come bundled with OpenCV's HOG. But you
can train your own SVM or other models using HOG feature vectors. Facilitating a
machine learning model is the key point of extracting feature vectors from an image.
Experiment No 10
Implementation of object detection using OpenCV
In this part, we will write Python programs to perform object detection and
understand its implementation. We will use the following image in our Python
program to perform the object detection on it:
We will first open the image given above and create the environment of the picture to
show it in the output. Let's first look at an example program to understand the
implementation, and then we will look at the explanation part.
Example 1: Opening the image using OpenCV and matplotlib library in a Python
program:
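A minimal sketch of such a program, assuming an input file image.jpg:
import cv2
from matplotlib import pyplot as plt

# Open the image file and convert BGR to RGB for matplotlib
img = cv2.imread('image.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Sub-plot the image and display it
plt.subplot(1, 1, 1)
plt.imshow(img_rgb)
plt.show()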
Explanation:
First, we have imported the OpenCV (as cv2) and matplotlib (as plt) libraries into the
program to use their functions in the code. After that, we have opened the image file
using the imread() function of cv2.
Then, we defined the properties for the image we opened in the program using
the cv2 functions. Next, we sub-plotted the image using the subplot() function of plt,
giving parameters to it. Lastly, we used the imshow() and show() functions of the plt
module to show the image in the output.
As we can see in the output, the image is displayed as a result of the program, and its
borders have been sub-plotted.
Now, we will use the detectMultiScale() in the program to detect the object present in
the image. Following is the syntax for using detectMultiScale() function in the code:
found = xml_data.detectMultiScale(img_gray, minSize = (30, 30))
We will use a condition statement with this function in the program to check if any
object from the image is detected or not and highlight the detected part. Let's
understand the implementation of object detection in the image through an example
program.
Example 2: Object detection in the image using the detectMultiScale() in the following
Python program:
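As with Example 1, the listing is missing here; a minimal sketch matching the explanation below, assuming an input file image.jpg and using the frontal-face cascade shipped with OpenCV as an illustrative cascade file:
import cv2
from matplotlib import pyplot as plt

# Open the image and prepare grayscale and RGB versions
img = cv2.imread('image.jpg')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Import the cascade classifier XML file
xml_data = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Detect objects in the grayscale image
found = xml_data.detectMultiScale(img_gray, minSize=(30, 30))

# If any object is detected, highlight it with a rectangle
if len(found) != 0:
    for (x, y, w, h) in found:
        cv2.rectangle(img_rgb, (x, y), (x + w, y + h), (0, 255, 0), 5)

plt.subplot(1, 1, 1)
plt.imshow(img_rgb)
plt.show()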
Output:
Explanation:
After opening the image in the program, we have imported the cascade classifier XML
file into the program. Then, we used the detectMultiScale() function with the imported
cascade file to detect the object present in the image or not.
We used an if condition in the program to check whether an object was detected, and if so,
we highlighted the detected object part using a for loop with cv2 functions. After highlighting
the detected object part in the image, we displayed the processed image using the plt
show() and imshow() functions.
Experiment No 11
Implementation of Face Recognition using OpenCV
import cv2
import numpy as np

# Load the face and eye cascade classifiers shipped with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

# Load the image
image = cv2.imread(r'C:\Users\Dr. G Mandal\Desktop\1.jpg')

if image is None:
    print("Image not loaded correctly.")
else:
    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    # Draw a rectangle around each face and detect eyes within it
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_gray = gray[y:y + h, x:x + w]
        roi_color = image[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)
    cv2.imshow('Detected faces', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
Output: