
School of Engineering

Devarakaggalahalli, Harohalli, Kanakapura Road, Ramanagara Dt.,


Karnataka 562112

Department of Computer Science & Engineering


(Artificial Intelligence & Machine Learning)

VI SEMESTER

LAB MANUAL

Image Processing and Computer Vision


(22AM3604)

SESSION: Jan 2025 - Jun 2025

Prepared by
Dr. Gouranga Mandal
Associate Professor
Computer Science and Engineering (AI&ML)
School of Engineering
Instructions to the Candidates

1) Students should come with thorough preparation for the experiment to be conducted.
2) Students will not be permitted to attend the laboratory unless they bring
the practical record fully completed in all respects pertaining to the
experiment conducted in the previous class.
3) Practical record should be neatly maintained.
4) They should obtain the signature of the staff-in-charge in the observation
book after completing each experiment.
5) Theory regarding each experiment should be written in the practical
record before procedure in your own words.
6) Ask the lab technician for assistance if you have any problem.
7) Save your class work and assignments on the system.
8) Do not download or install software without the assistance of the
laboratory technician.
9) Do not alter the configuration of the system.
10) Turn off the systems after use.
Vision
To produce graduates in Computer Science and Engineering (Artificial Intelligence &
Machine Learning) through excellence in education and research, with an emphasis on a
sustainable ecosystem that contributes significantly to society.

Mission
The Department of Computer Science and Engineering (Artificial Intelligence & Machine
Learning) is committed to:
Impart quality education through a state-of-the-art curriculum, infrastructure facilities,
cutting-edge technologies, sustainable learning practices and lifelong learning.
Collaborate with industry and academia and inculcate interdisciplinary research to transform
students into technically competent professionals.
Produce engineers and techno-entrepreneurs for global needs.
Values
The values that drive DSU and support its vision:
The Pursuit of Excellence
A commitment to strive continuously to improve ourselves and our systems with the aim
of becoming the best in our field.
Fairness
A commitment to objectivity and impartiality, to earn the trust and respect of society.
Leadership
A commitment to lead responsively and creatively in educational and research processes.
Integrity and Transparency
A commitment to be ethical, sincere and transparent in all activities and to treat all
individuals with dignity and respect.
Program Educational Objectives (PEO's)

PEO1: Apply appropriate theory, practices, and tools of machine intelligence to the
specification, design, implementation, maintenance, and evaluation in the workplace or in
higher education.
PEO2: Adapt, contribute and innovate new technologies in their computing profession by
working in teams to design, implement, and maintain in the key domains of Artificial
Intelligence & Machine Learning.
PEO3: Function effectively in the work place as competent Artificial Intelligence &
Machine Learning Professionals, Entrepreneurs or Researchers or maintain employment
through lifelong learning such as professional conferences, certificate programs or other
professional educational activities, ethics, and societal awareness.

Programme Outcome (PO's)


PO1. Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of complex
engineering problems.
PO2. Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO3. Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.
PO4. Conduct investigations of complex problems: Use research-based knowledge
and research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.
PO7. Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO9. Individual and team work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.
PO10. Communication: Communicate effectively on complex engineering activities
with the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.
PO11. Project management and finance: Demonstrate knowledge and understanding
of the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary environments.
PO12. Life-long learning: Recognize the need for, and have the preparation and
ability to engage in independent and life-long learning in the broadest context of
technological change.
Program Specific Outcomes (PSO's)

PSO1: Cognitive Outcome: Solve complex engineering problems in computing by applying the
principles of Artificial Intelligence, Machine Learning, Network Engineering, Software
Engineering, Data Engineering and Intelligent Systems.
PSO2: Skill & Design Outcome: Apply technical skills and research skills through
professional societies, certification programs, projects, internships and laboratory exercises
to design & develop algorithms, programs, and projects using modern software tools to
provide the sustainable solutions to Computer Science, Artificial Intelligence and Machine
Learning problems related to the society and environment.
Course objectives:
This course will enable students:

1. To understand the algorithms available for the processing of linguistic information and the computational properties of natural languages.
2. To conceive basic knowledge on various morphological, syntactic, and semantic NLP tasks.
3. To understand and analyze the fundamental concepts of Computer Vision.
4. To learn to use deep learning tools and frameworks for solving real-life problems related to images and signals.


Program List
Implement the following in Python using OpenCV Library:

1. Simulation and display of an Image, Negative of an Image (Binary & Gray Scale)
2. Implementation of the Transformation of an Image.
3. Implementation of Histogram, and Histogram Equalization.
4. Implement the different filtering techniques for noise removal based on spatial and frequency
domains using OpenCV.
5. Implementation of various image segmentation techniques. (Edge-Based, Region-Based and
Threshold-Based)
6. Implementation of different Morphological Image Processing Techniques.
7. Implement the Harris Corner Detector algorithm without the inbuilt OpenCV function.
8. Write a program to compute the SIFT feature descriptors of a given image.
9. Write a program to detect the specific objects in an image using HOG.
10. Implementation of object detection using OpenCV.
11. Implementation of Face Recognition using OpenCV.
Instructions to Run the Program:

OpenCV is a huge open-source library for computer vision, machine learning, and image
processing, and it now plays a major role in real-time operation, which is very important in
today’s systems. By using it, one can process images and videos to identify objects, faces, or
even the handwriting of a human. When it is integrated with various libraries, such as NumPy,
Python is capable of processing the OpenCV array structure for analysis. To identify image
patterns and their various features we use vector space and perform mathematical operations on
these features. To install OpenCV, one must have Python and PIP preinstalled on the
system. To check if your system already contains Python, open the command line (search for
cmd in the Run dialog, Win + R) and run the following command:
python --version
If Python is already installed, it will generate a message with the Python version available.

If Python is not present, go through How to install Python on Windows? and follow the
instructions provided. PIP is a package management system used to install and manage
software packages/libraries written in Python. These files are stored in a large online
repository termed the Python Package Index (PyPI). To check if PIP is already installed on
your system, just go to the command line and execute the following command:
pip -V
If PIP is not present, go through How to install PIP on Windows? and follow the instructions
provided.
Downloading and Installing OpenCV:
OpenCV can be directly downloaded and installed with the use of pip (package manager). To
install OpenCV, just go to the command-line and type the following command:
pip install opencv-python
Beginning with the installation:
• Type the command in the Terminal and proceed:

• Collecting Information and downloading data:

• Installing Packages:

• Finished Installation:
To check if OpenCV is correctly installed, just run the following commands to perform a
version check:
python
>>>import cv2
>>>print(cv2.__version__)

To use the OpenCV library in Python, we need to install these libraries as prerequisites:
1. NumPy library: the computer processes images in the form of a matrix, for which NumPy is
used, and OpenCV uses it in the background.
2. OpenCV-Python: the OpenCV library itself; previously the module was cv, but the updated
version is cv2. It is used to manipulate images and videos.

To install these libraries, we need to run these pip commands in cmd:


pip install opencv-python
pip install numpy
pip install matplotlib
The steps to read and display an image in OpenCV are:
1. Read an image using the imread() function.
2. Create a GUI window and display the image using the imshow() function.
3. Use the function waitKey(0) to hold the image window on the screen for the specified number
of milliseconds; 0 means it will hold the GUI window on the screen until the user closes it.
4. Delete the image window from memory after displaying it, using the destroyAllWindows()
function.
Let’s start reading an image using cv2.
To read the images cv2.imread() method is used. This method loads an image from the
specified file. If the image cannot be read (because of missing file, improper permissions,
unsupported or invalid format) then this method returns an empty matrix.
Syntax: cv2.imread(path, flag)
Parameters:
path: A string representing the path of the image to be read.
flag: It specifies the way in which the image should be read. Its default value is
cv2.IMREAD_COLOR.
Return Value: This method returns an image that is loaded from the specified file.
Note:
1. The image should be in the working directory or a full path of image should be given.
2. By default, OpenCV stores colored images in BGR(Blue Green and Red) format.
All three types of flags are described below:
cv2.IMREAD_COLOR: It specifies to load a color image. Any transparency of image will be
neglected. It is the default flag. Alternatively, we can pass integer value 1 for this flag.
cv2.IMREAD_GRAYSCALE: It specifies to load an image in grayscale mode. Alternatively,
we can pass integer value 0 for this flag.
cv2.IMREAD_UNCHANGED: It specifies to load an image as such including alpha
channel. Alternatively, we can pass integer value -1 for this flag.
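For illustration, here is a minimal sketch showing the three flags in use (the sample path is the same one used in the example below):

import cv2

path = r'C:\Users\DSU CSCL9-10\Desktop\1.jpg'
color_img = cv2.imread(path, cv2.IMREAD_COLOR)          # same as passing 1 (default)
gray_img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)       # same as passing 0
unchanged_img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # same as passing -1, keeps any alpha channel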

The codes below are implementations to read and display images on the screen using
OpenCV and Matplotlib library functions.

Example #1 (Using OpenCV) :


Image used is:

# Python code to read image


import cv2
# To read image from disk, we use
# cv2.imread function, in below method,
img=cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')

# Creating GUI window to display an image on screen


# first Parameter is windows title (should be in string format)
# Second Parameter is image array
cv2.imshow('myimage', img)

# To hold the window on screen, we use cv2.waitKey method


# Once it detects the close input, it will release the control
# to the next line
# First parameter is for holding the screen for the specified milliseconds
# It should be a positive integer. If 0 is passed as the parameter, then it will
# hold the screen until the user closes it.
cv2.waitKey(0)

# It is for removing/deleting created GUI window from screen


# and memory
cv2.destroyAllWindows()
Output:

Reading an image using cv2 in BGR format


Experiment No 1
Simulation and display of an Image, Negative of an Image (Binary & Gray Scale)

Display of an Image:

Theory:
The OpenCV module is an open-source computer vision and machine learning software library. It is a huge
open-source library for computer vision, machine learning, and image processing. OpenCV supports a wide
variety of programming languages like Python, C++, Java, etc. It can process images and videos to identify
objects, faces, or even the handwriting of a human. When it is integrated with various libraries, such
as NumPy, which is a highly optimized library for numerical operations, the number of tools at your
disposal increases, i.e. whatever operations one can do in NumPy can be combined with OpenCV.
First, let’s look at how to display images using OpenCV:
Now there is one function called cv2.imread() which will take the path of an image as an argument. Using this
function you will read that particular image and simply display it using the cv2.imshow() function.

Code:
# import required module
import cv2
# read the Image by giving path
img=cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')
# display that image
cv2.imshow('myimage', img)
Sample Output:

Display image using OpenCV

Negative transformation of an image:


Theory:
As we know, an 8-bit image has a maximum intensity value of 255; therefore we need to subtract each
pixel value from 255 (the maximum intensity value) to produce the negative image:
s = T(r) = L – 1 – r
Here we will see different ways to get the negative transformation of an image using OpenCV and
Python.

Code:
import cv2
img=cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')
cv2.imshow('myimage', img)
# Subtract the img array values from max value(calculated from dtype)
img_neg = 255 - img
# Show the negative image
cv2.imshow('negative',img_neg)
cv2.waitKey(0)
Sample Output:
Input image
Output image

Negative of grayscale images


For grayscale images, the light parts of the image appear dark and the darker parts appear lighter in the
negative transformation.
Example
In this example, we will read the image in grayscale mode to generate its negative.
Code:

import cv2
# Load the image directly in grayscale mode


gray=cv2.imread(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg', 0)
cv2.imshow('Gray image:', gray)
# Invert the image using cv2.bitwise_not
gray_neg = 255 - gray        # or equivalently: gray_neg = cv2.bitwise_not(gray)
# Show the image
cv2.imshow('negative',gray_neg)
cv2.waitKey(0)

Sample Output:

Input image

Output image
Experiment No 2
Write a program for implementation of the Transformation of an Image.

Implementation of the Transformation of an Image

Image Transformation involves the transformation of image data in order to retrieve information from the
image or preprocess the image for further usage. In this tutorial we are going to implement the following
image transformation:
• Image Translation
• Reflection
• Rotation
• Scaling
• Cropping
• Shearing in x-axis
• Shearing in y-axis

Image Translation
In computer vision or image processing, image translation is the rectilinear shift of an image from one
location to another; in other words, translation is the shifting of an object’s location.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0, 100], [0, 1, 50]])
dst = cv.warpAffine(img, M, (cols, rows))
cv.imshow('img', dst)
cv.waitKey(0)
cv.destroyAllWindows()
In the above code, we have imported NumPy and OpenCV module then read the image by
using imread() function, and then translation takes place with the warpAffine() method which is defined as
follows:
In the first argument we passed the image. The second argument takes a matrix as a parameter; in the
matrix we give x = 100, which tells the function to shift the image 100 units to the right, and y = 50,
which tells the function to shift the image 50 units downwards. In the third argument, where we
mentioned the cols and rows, we told the function not to crop the image on either the x or y side.
dst = cv.warpAffine(img, M, (cols, rows))
Output:
Image Reflection
Image reflection is used to flip the image vertically or horizontally. For reflection along the x-axis, we set the
value of Sy to -1, Sx to 1, and vice-versa for the y-axis reflection.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0, 0], [0, -1, rows], [0, 0, 1]])
reflected_img = cv.warpPerspective(img, M, (int(cols),int(rows)))
cv.imshow('img', reflected_img)
cv.imwrite('reflection_out.jpg', reflected_img)
cv.waitKey(0)
cv.destroyAllWindows()
To flip the image vertically (reflection about the x-axis):
M = np.float32([[1, 0, 0], [0, -1, rows], [0, 0, 1]])
To flip the image horizontally (reflection about the y-axis):
M = np.float32([[-1, 0, cols], [0, 1, 0], [0, 0, 1]])
Output:

Image Rotation
Image rotation is a common image processing routine with applications in matching, alignment, and other
image-based algorithms, in image rotation the image is rotated by a definite angle. It is used extensively in
data augmentation, especially when it comes to image classification.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
img_rotation = cv.warpAffine(img, cv.getRotationMatrix2D((cols/2, rows/2), 30, 0.6), (cols, rows))
cv.imshow('img', img_rotation)
cv.imwrite('rotation_out.jpg', img_rotation)
cv.waitKey(0)
cv.destroyAllWindows()
We have used the getRotationMatrix2D() function to build the matrix required by the warpAffine()
function, giving the required rotation angle (here 30 degrees) and a scale factor of 0.6, i.e. the image is
shrunk by 40%.
img_rotation = cv.warpAffine(img,
cv.getRotationMatrix2D((cols/2, rows/2), 30, 0.6),
(cols, rows))
Output:

Image Scaling
Image scaling is a process used to resize a digital image. We perform two things in the image scaling either we
enlarge the image or we shrink the image, OpenCV has a built-in function cv2.resize() for image scaling.
Shrinking an image:
img_shrinked = cv2.resize(image, (350, 300),
interpolation = cv2.INTER_AREA)
Note: Here 350 and 300 are the width and height of the shrunk image, respectively (cv2.resize expects the size as (width, height)).
Enlarging Image:
img_enlarged = cv2.resize(img_shrinked, None,
fx=1.5, fy=1.5,
interpolation=cv2.INTER_CUBIC)
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
img_shrinked = cv.resize(img, (250, 200), interpolation=cv.INTER_AREA)
cv.imshow('img', img_shrinked)
img_enlarged = cv.resize(img_shrinked, None, fx=1.5, fy=1.5, interpolation=cv.INTER_CUBIC)
cv.imshow('img', img_enlarged)
cv.waitKey(0)
cv.destroyAllWindows()
Output:

Image Cropping
Cropping is the removal of unwanted outer areas from an image.
cropped_img = img[100:300, 100:300]
OpenCV loads the image as a NumPy array, we can crop the image simply by indexing the array, in our case,
we choose to get 200 pixels from 100 to 300 on both axes.
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
cropped_img = img[100:300, 100:300]
cv.imwrite('cropped_out.jpg', cropped_img)
cv.waitKey(0)
cv.destroyAllWindows()
Output:
Image Shearing in X-Axis
When shearing is done along the x-axis, the boundaries of the image that are parallel to the x-axis keep their
location, and the edges parallel to the y-axis change their place depending on the shearing factor.
M = np.float32([[1, 0.5, 0], [0, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M,
(int(cols*1.5),
int(rows*1.5)))
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0.5, 0], [0, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M, (int(cols*1.5), int(rows*1.5)))
cv.imshow('img', sheared_img)
cv.waitKey(0)
cv.destroyAllWindows()
Output:

Image Shearing in Y-Axis


When shearing is done in the y-axis direction, the boundaries of the image that are parallel to the y-axis keep
their location, and the edges parallel to the x-axis change their place depending on the shearing factor.
M = np.float32([[1, 0, 0], [0.5, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M,
(int(cols*1.5),
int(rows*1.5)))
Code:
import numpy as np
import cv2 as cv
img = cv.imread('girlImage.jpg', 0)
rows, cols = img.shape
M = np.float32([[1, 0, 0], [0.5, 1, 0], [0, 0, 1]])
sheared_img = cv.warpPerspective(img, M, (int(cols*1.5), int(rows*1.5)))
cv.imshow('sheared_y-axis_out.jpg', sheared_img)
cv.waitKey(0)
cv.destroyAllWindows()
Output:
Experiment No 3
Implement Histogram, and Histogram Equalization.

Histogram Equalization:

Theory:
Histogram equalization is a method in image processing of contrast adjustment using the image’s
histogram. This method usually increases the global contrast of many images, especially when the
usable data of the image is represented by close contrast values. Through this adjustment, the
intensities can be better distributed on the histogram. This allows for areas of lower local contrast to
gain a higher contrast. Histogram equalization accomplishes this by effectively spreading out the
most frequent intensity values. The method is useful in images with backgrounds and foregrounds
that are both bright or both dark.
Consider an image whose pixel values are confined to some specific range of values only. For example,
a brighter image will have all pixels confined to high values. But a good image will have pixels from
all regions of the image. So you need to stretch this histogram to either end, and that is what
Histogram Equalization does (in simple words). This normally improves the contrast of the image.

Code:
import cv2
from matplotlib import pyplot as plt

def run_histogram_equalization(image_path):
    rgb_img = cv2.imread(image_path)
    # convert from BGR color-space to YCrCb
    ycrcb_img = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2YCrCb)
    # equalize the histogram of the Y channel
    ycrcb_img[:, :, 0] = cv2.equalizeHist(ycrcb_img[:, :, 0])
    # convert back to BGR color-space from YCrCb
    equalized_img = cv2.cvtColor(ycrcb_img, cv2.COLOR_YCrCb2BGR)
    cv2.imshow('equalized_img', equalized_img)
    # plot the histogram of the equalized image
    plt.hist(equalized_img.ravel(), 256, [0, 256])
    plt.show()
    cv2.waitKey(0)
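A short usage example: the function defined above can be called with the path of any image, for instance the sample image used earlier in this manual:

run_histogram_equalization(r'C:\Users\DSU CSCL9-10\Desktop\1.jpg')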

Sample Output:
Experiment No 4
Implement the different filtering techniques for noise removal based on spatial and frequency
domains using OpenCV.

Frequency Domain Filtering

Image Frequency Filtering


Prerequisite: installation of Python and the required libraries (NumPy, Pandas and OpenCV). Refer
to the respective documentation to install and learn more about them.
Note: not all filtering techniques are covered here; refer to the documentation and try the rest as well.
Let us first import the OpenCV library and the image on which we will perform operations.

import numpy as np
import pandas as pd
import cv2

img_root = "Images/"
img_name = "testImage.jpg"
img_path = img_root + img_name

# Reading the Image
img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
Frequency Domain Filters are used for smoothing and sharpening of images by removal of high or
low-frequency components.
Frequency domain filters are different from spatial domain filters as it mainly focuses on the
frequency of the images. It is done for two basic operations i.e., Smoothing and Sharpening.
Let us perform domain filtering using the cv2.edgePreservingFilter() method:
domainFilter = cv2.edgePreservingFilter(img, flags=1, sigma_s=60, sigma_r=0.6)
cv2.imshow('Domain Filter',domainFilter)
cv2.waitKey(0)
cv2.destroyAllWindows()

Domain Filtered Image vs Original Image


Let us try to Smoothen this image using the Gaussian Blur Method from OpenCV Library.
Gaussian blur (also known as Gaussian smoothing) is the result of blurring an image by a
Gaussian function.
It is a widely used effect in graphics software, typically to reduce image noise and reduce detail. The
visual effect of this blurring technique is a smooth blur resembling that of viewing the image through
a translucent screen, distinctly different from the bokeh effect produced by an out-of-focus lens or
the shadow of an object under usual illumination.
gaussBlur = cv2.GaussianBlur(img,(5,5),cv2.BORDER_DEFAULT)
cv2.imshow("Gaussian Smoothing",np.hstack((img,gaussBlur)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Gaussian Blur vs Original Image


Let us try to perform Mean Filtering Techniques on this image.
The idea of mean filtering is simply to replace each pixel value in an image with the mean
(`average’) value of its neighbours, including itself. This has the effect of eliminating pixel values
that are unrepresentative of their surroundings. Mean filtering is usually thought of as a convolution
filter. Like other convolutions, it is based around a kernel, which represents the shape and size of the
neighbourhood to be sampled when calculating the mean.
kernel = np.ones((10, 10), np.float32) / 100   # 10x10 averaging kernel whose values sum to 1
meanFilter = cv2.filter2D(img,-1,kernel)
cv2.imshow("Mean Filtered Image",np.hstack((img, meanFilter)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Mean Filter vs Original Image


Let us try to perform Median Filtering Techniques on this image.
Median filtering is a nonlinear process useful in reducing impulsive, or salt-and-pepper noise. It is
also useful in preserving edges in an image while reducing random noise. Impulsive or salt-and-
pepper noise can occur due to a random bit error in a communication channel. In a median filter, a
window slides along the image, and the median intensity value of the pixels within the window
becomes the output intensity of the pixel being processed.
#Median Filter
medianFilter = cv2.medianBlur(img,5)
cv2.imshow("Median Filter",np.hstack((img, medianFilter)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Median Filter Image vs Original Image


Let us try to perform Bilateral Filtering Techniques on this image.
A Bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It
replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels.
This weight can be based on a Gaussian distribution.
Crucially, the weights depend not only on the Euclidean distance of pixels but also on the
radiometric differences (e.g., range differences, such as colour intensity, depth distance, etc.). This
preserves sharp edges.
# Bilateral filter
print("Bilateral Filter")
bilFil = cv2.bilateralFilter(img, 60, 60, 60)
cv2.imshow("Bilateral Filter",np.hstack((img, bilFil)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Bilateral Filtered Image vs Original Image


Let us try to perform Frequency Band Filtering Techniques on this image.
Frequency filters process an image in the frequency domain. The image is Fourier transformed,
multiplied with the filter function and then re-transformed into the spatial domain. Attenuating high
frequencies results in a smoother image in the spatial domain, attenuating low frequencies
enhances the edges.
All frequency filters can also be implemented in the spatial domain and, if there exists a simple
kernel for the desired filter effect, it is computationally less expensive to perform the filtering in the
spatial domain. Frequency filtering is more appropriate if no straightforward kernel can be found in
the spatial domain, and may also be more efficient.
For the High Pass Filter:
highPass = img - gaussBlur
# or we can use this statement to shift the high-pass result into the visible range
# highPass = highPass + 127*np.ones(img.shape, np.uint8)
cv2.imshow("High Pass", np.hstack((img, highPass)))
cv2.waitKey(0)
cv2.destroyAllWindows()
For the Low Pass Filter:
lowPass = cv2.filter2D(img, -1, kernel)
cv2.imshow("Low Pass", np.hstack((img, lowPass)))
cv2.waitKey(0)
cv2.destroyAllWindows()

High Pass vs Low Pass vs Original Image
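The high-pass and low-pass snippets above work with spatial kernels. As a complement, below is a minimal sketch of a true frequency-domain low-pass filter using NumPy's FFT; the grayscale read and the cut-off radius of 30 are illustrative assumptions:

import numpy as np
import cv2

img = cv2.imread("testImage.jpg", cv2.IMREAD_GRAYSCALE)
rows, cols = img.shape

# Forward FFT and shift the zero-frequency component to the centre
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)

# Build a circular low-pass mask of radius 30 around the centre
crow, ccol = rows // 2, cols // 2
mask = np.zeros((rows, cols), np.uint8)
cv2.circle(mask, (ccol, crow), 30, 1, -1)

# Keep only the low frequencies, shift back and take the inverse FFT
fshift = fshift * mask
img_back = np.abs(np.fft.ifft2(np.fft.ifftshift(fshift)))

cv2.imshow("FFT Low Pass", img_back.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()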


Experiment No 5
Implementation of various image segmentation techniques.
(Edge-Based, Region-based and Threshold-Based)

Theory

Edge-Based image segmentation:


Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in
1986. It is a multi-stage algorithm and we will go through each stage.
1. Noise Reduction
Since edge detection is susceptible to noise in the image, the first step is to remove the noise in the image
with a 5x5 Gaussian filter. We have already seen this in previous chapters.
2. Finding Intensity Gradient of the Image
The smoothened image is then filtered with a Sobel kernel in both the horizontal and vertical directions to get
the first derivative in the horizontal direction (Gx) and the vertical direction (Gy). From these two images, we
can find the edge gradient and direction for each pixel as follows:
Edge_Gradient(G) = sqrt(Gx^2 + Gy^2)
Angle(θ) = tan⁻¹(Gy / Gx)
The gradient direction is always perpendicular to edges. It is rounded to one of four angles representing the
vertical, horizontal and two diagonal directions.
3. Non-maximum Suppression
After getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted
pixels which may not constitute an edge. For this, every pixel is checked to see if it is a local
maximum in its neighborhood in the direction of the gradient. Check the image below:
image
Point A is on the edge ( in vertical direction). Gradient direction is normal to the edge. Point B and C
are in gradient directions. So point A is checked with point B and C to see if it forms a local
maximum. If so, it is considered for next stage, otherwise, it is suppressed ( put to zero).
In short, the result you get is a binary image with "thin edges".
4. Hysteresis Thresholding
This stage decides which edges are really edges and which are not. For this, we need two
threshold values, minVal and maxVal. Any edges with intensity gradient more than maxVal are sure
to be edges, and those below minVal are sure to be non-edges, so they are discarded. Those that lie between
these two thresholds are classified as edges or non-edges based on their connectivity. If they are
connected to "sure-edge" pixels, they are considered to be part of edges. Otherwise, they are also
discarded. See the image below:

image
Edge A is above maxVal, so it is considered a "sure edge". Although edge C is below maxVal, it
is connected to edge A, so it is also considered a valid edge and we get the full curve. Edge B,
although it is above minVal and in the same region as edge C, is not connected to any "sure
edge", so it is discarded. It is therefore very important to select minVal and maxVal
appropriately to get the correct result.
This stage also removes small pixel noise on the assumption that edges are long lines.
So what we finally get are the strong edges in the image.
Canny Edge Detection in OpenCV
OpenCV puts all the above steps in a single function, cv.Canny(). We will see how to use it. The first argument
is our input image. The second and third arguments are our minVal and maxVal respectively. The fourth
argument is aperture_size, the size of the Sobel kernel used for finding image gradients; by default it is
3. The last argument is L2gradient, which specifies the equation for finding the gradient magnitude. If it is
True, it uses the equation mentioned above, which is more accurate; otherwise it uses the
function Edge_Gradient(G) = |Gx| + |Gy|. By default, it is False.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread(r'C:\Users\Dr. G Mandal\Desktop\1.jpg')
edges = cv.Canny(img,100,200)
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
plt.show()

Region-Based Image Segmentation

Region-based Image segmentation is a widely used technique in image processing that involves
partitioning an image into regions or objects of interest based on their similarity in colour, texture, or
other features. In this tutorial, we will explore how to implement region-based segmentation using
OpenCV, a popular computer vision library.

Step 1: Read the input image The first step is to read the input image using
the cv2.imread() function of OpenCV. Make sure that the image is in the same directory as your
Python file.

import cv2

# Read input image


img = cv2.imread('input_image.jpg')

Step 2: Preprocessing Before applying the Watershed Algorithm, we need to perform some
preprocessing steps to improve the image segmentation result. The preprocessing steps include noise
reduction and image smoothing.

• Noise Reduction: Noise reduction is essential to improve segmentation accuracy. In this
tutorial, we will use the cv2.fastNlMeansDenoisingColored() function of OpenCV to reduce
noise from the input image.

# Reduce noise using fastNlMeansDenoisingColored() function
img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

• Image Smoothing: Image smoothing helps to remove small variations in intensity that can
cause over-segmentation of the image. In this tutorial, we will use the cv2.medianBlur() function of
OpenCV to smooth the input image.

# Smooth image using medianBlur() function
img = cv2.medianBlur(img, 5)

Step 3: Thresholding The next step is to apply thresholding to the preprocessed image.
Thresholding is the process of converting an image into a binary image by selecting a threshold
value. In this tutorial, we will use the cv2.threshold() function of OpenCV to apply thresholding.

# Convert image to grayscale


gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply thresholding
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV +
cv2.THRESH_OTSU)

Step 4: Morphological Operations Morphological operations are used to remove noise and fill
holes in the segmented regions. In this tutorial, we will use the cv2.morphologyEx() function of
OpenCV to perform morphological operations for image segmentation.

# Perform morphological operations


kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv2.dilate(opening, kernel, iterations=3)

Step 5: Distance Transform and Marker-Based Watershed Algorithm for image


segmentation The next step is to apply the distance transform to the thresholded image to obtain the
foreground regions. In this tutorial, we will use the cv2.distanceTransform() function of OpenCV to
calculate the distance transform.

# Apply distance transform


dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255,
0)

After applying the distance transform, we can perform the marker-based Watershed Algorithm to
segment the image into multiple regions. In this tutorial, we will use
the cv2.connectedComponents() function of OpenCV to obtain the markers and then apply the
Watershed Algorithm.
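The marker computation and the watershed call themselves are not listed in the snippets above; a minimal sketch is given below. It builds on the img, sure_bg and sure_fg variables from the previous steps (NumPy is assumed to be imported as np) and produces the segmented_image, contours and image variables used in the display step that follows.

# Obtain markers from the sure foreground and run the watershed
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

ret, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1            # make the background label 1 instead of 0
markers[unknown == 255] = 0      # mark the unknown region with 0

markers = cv2.watershed(img, markers)
segmented_image = img.copy()
segmented_image[markers == -1] = [0, 0, 255]   # watershed boundaries in red

# Contours of the sure foreground regions (used by the display step below)
contours, hierarchy = cv2.findContours(sure_fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
image = img.copy()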

Step 6: Display the results: Finally, we can display the segmented image and the contours found in
the previous step using the following code:

# Display the segmented image


cv2.imshow('Segmented Image', segmented_image)
cv2.waitKey(0)

# Draw the contours on the original image


cv2.drawContours(image, contours, -1, (0,255,0), 3)
cv2.imshow('Contours', image)
cv2.waitKey(0)

# Cleanup
cv2.destroyAllWindows()

The cv2.imshow() function is used to display the output of the image segmentation process and the
contours separately. The cv2.drawContours() function is used to draw the contours on the original
image. The last command cv2.destroyAllWindows() is used to close all the windows.
Threshold-Based Image Segmentation:
Thresholding: One of the simplest and most widely used image segmentation techniques is thresholding, which
involves converting an image into a binary image by setting all pixels with intensities above a certain
threshold to white and all other pixels to black. In OpenCV, we can apply thresholding using the
cv2.threshold() function as follows:

import cv2
# Read image
img = cv2.imread('image.jpg')
# Convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply thresholding
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Display thresholded image
cv2.imshow('thresholded', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this code, we first convert the image to grayscale using the cv2.cvtColor() function. Then, we apply
thresholding using the cv2.threshold() function, where gray is the input image, 127 is the threshold
value, 255 is the maximum pixel value, and cv2.THRESH_BINARY is the thresholding type. Finally, we
display the thresholded image using the cv2.imshow() function.

Output:
Experiment No 6
Implementation of different Morphological Image Processing Techniques.

Python OpenCV morphological operations are image processing techniques that
process images based on shape. This processing strategy is usually performed on binary images.
Morphological operations based on OpenCV are as follows:
• Erosion
• Dilation
• Opening
• Closing
• Morphological Gradient
• Top hat
• Black hat

Erosion
Just like water rushing along a river bank erodes the soil, an erosion in an image “erodes” the
foreground object and makes it smaller. Simply put, pixels near the boundary of an object in an
image will be discarded, “eroding” it away.

Erosion works by defining a structuring element and then sliding this structuring element from left-to-
right and top-to-bottom across the input image.

A foreground pixel in the input image will be kept only if all pixels inside the structuring element
are > 0. Otherwise, the pixels are set to 0 (i.e., background).

Erosion is useful for removing small blobs in an image or disconnecting two connected objects.

We can perform erosion by using the cv2.erode() function. The input image to which we'll be applying
erosions can be supplied through a single --image command-line argument, or its path can be given
directly as in the code below.
In most examples in this lesson we'll be applying morphological operations to the PyImageSearch
logo, which we can see below:

Figure 4: The
example PyImageSearch logo that we’ll be applying morphological operations to in this lesson.

As I mentioned earlier in this lesson, we typically (but not always) apply morphological operations
to binary images. As we’ll see later in this lesson, there are exceptions to that, especially when
using the black hat and white hat operators, but for the time being, we are going to assume we are
working with a binary image, where the background pixels are black and the foreground pixels
are white.

Let’s load our input image, convert it to grayscale, and apply a series of erosions:

image = cv2.imread(r'C:\Users\Dr. G Mandal\Desktop\j.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)
# apply a series of erosions
for i in range(0, 3):
    eroded = cv2.erode(gray.copy(), None, iterations=i + 1)
    cv2.imshow("Eroded {} times".format(i + 1), eroded)
    cv2.waitKey(0)

Applying erosion to our input image. As the number of iterations increases, more and more of the
logo is eroded away.
On the very top we have our original image. And then underneath the image, we have the logo
being eroded a total of 1, 2, and 3 times, respectively. Notice as the number of erosion iterations
increases, more and more of the logo is eaten away.

Again, erosions are most useful for removing small blobs from an image or disconnecting two
connected components. With this in mind, take a look at the letter “p” in the PyImageSearch logo.
Notice how the circular region of the “p” has disconnected from the stem after 2 erosions — this is
an example of disconnecting two connected components of an image.

Dilation
The opposite of an erosion is a dilation. Just like an erosion will eat away at the foreground pixels,
a dilation will grow the foreground pixels.

Dilations increase the size of foreground objects and are especially useful for joining broken parts of
an image together.

Dilations, just as an erosion, also utilize structuring elements — a center pixel p of the structuring
element is set to white if ANY pixel in the structuring element is > 0.

We apply dilations using the cv2.dilate() function:

# close all windows to cleanup the screen


cv2.destroyAllWindows()
cv2.imshow("Original", image)
# apply a series of dilations
for i in range(0, 3):
    dilated = cv2.dilate(gray.copy(), None, iterations=i + 1)
    cv2.imshow("Dilated {} times".format(i + 1), dilated)
    cv2.waitKey(0)

The output of our dilation can be seen below:


Applying a dilation to our input image. Notice how the foreground region has grown.

Again, at the very top we have our original input image. And below the input image we have our
image dilated 1, 2, and 3 times, respectively.

Unlike an erosion where the foreground region is slowly eaten away at, a dilation actually grows our
foreground region.

Dilations are especially useful when joining broken parts of an object — for example, take a look at
the bottom image where we have applied a dilation with 3 iterations. By this point, the gaps
between all letters in the logo have been joined.
Opening
An opening is an erosion followed by a dilation.

Performing an opening operation allows us to remove small blobs from an image: first an
erosion is applied to remove the small blobs, then a dilation is applied to regrow the size of
the original object.

Let’s look at some example code to apply an opening to an image:

cv2.destroyAllWindows()
cv2.imshow("Original", image)
kernelSizes = [(3, 3), (5, 5), (7, 7)]
# loop over the kernel sizes
for kernelSize in kernelSizes:
    # construct a rectangular kernel from the current size and then
    # apply an "opening" operation
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
    cv2.imshow("Opening: ({}, {})".format(
        kernelSize[0], kernelSize[1]), opening)
    cv2.waitKey(0)
Finally, display the output of applying our opening.

As I mentioned above, an opening operation allows us to remove small blobs in an image. I
went ahead and added some blobs to the PyImageSearch logo (pyimagesearch_logo_noise.png
in our project directory structure):

The PyImageSearch logo with random blobs/noise added to it.


When you apply our opening morphological operations to this noisy image you’ll receive the
following output:
The results of applying an opening operation to our image — notice how the small, random
blobs have been removed.

Notice how by the time we are using a kernel of size 5×5, the small, random blobs are
nearly completely gone. And by the time it reaches a kernel of size 7×7, our opening
operation has not only removed all the random blobs, but also “opened” holes in the letter
“p” and the letter “a”.
Closing
The exact opposite to an opening would be a closing. A closing is a dilation followed
by an erosion.

As the name suggests, a closing is used to close holes inside of objects or for connecting
components together.

The below code block contains the code to perform a closing:

cv2.destroyAllWindows()
cv2.imshow("Original", image)
# loop over the kernel sizes again
for kernelSize in kernelSizes:
    # construct a rectangular kernel from the current size, but this
    # time apply a "closing" operation
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    cv2.imshow("Closing: ({}, {})".format(
        kernelSize[0], kernelSize[1]), closing)
    cv2.waitKey(0)

Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but this
time with the cv2.MORPH_CLOSE flag.
We’ll go back to using our original image (without the random blobs). The
output for applying a closing operation with increasing structuring
element sizes can be seen below:
Applying a morphological closing operation to our input image.

Notice how the closing operation is starting to bridge the gap between
letters in the logo. Furthermore, letters such as “e”, “s”, and “a” are
practically filled in.
Morphological gradient
A morphological gradient is the difference between a dilation and erosion. It is useful
for determining the outline of a particular object of an image:

cv2.destroyAllWindows()
cv2.imshow("Original", image)
# loop over the kernels a final time
for kernelSize in kernelSizes:
    # construct a rectangular kernel and apply a "morphological
    # gradient" operation to the image
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    cv2.imshow("Gradient: ({}, {})".format(
        kernelSize[0], kernelSize[1]), gradient)
    cv2.waitKey(0)

A morphological gradient can be used to find the outline of an object in an image.


Notice how the outline of the PyImageSearch logo has been clearly revealed after applying
the morphological gradient operation.

Top hat/white hat and black hat


A top hat (also known as a white hat) morphological operation is the difference
between the original (grayscale/single channel) input image and the opening.

A top hat operation is used to reveal bright regions of an image on dark backgrounds.

Up until this point we have only applied morphological operations to binary images. But we
can also apply morphological operations to grayscale images as well. In fact, both the top
hat/white hat and the black hat operators are more suited for grayscale images rather than
binary ones.

To demonstrate applying morphological operations, let’s take a look at the following image
where our goal is to detect the license plate region of the car:

Our goal is to apply morphological operations to find the license plate region of the car.

So how are we going to go about doing this?


Well, taking a look at the example image above, we see that the license plate is bright since
it’s a white region against a dark background of the car itself. An excellent starting point to
finding the region of a license plate would be to use the top hat operator.

To test out the top hat operator, create a new file and start it with the following code:
import argparse
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to input image")
args = vars(ap.parse_args())

Let’s load our input image and apply the black hat operator:

image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# construct a rectangular kernel (13x5) and apply a blackhat
# operation which enables us to find dark regions on a light
# background
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

The code above loads the image from disk and converts it to grayscale, thereby preparing it for our black hat
and white hat operations. It then defines a rectangular structuring element with a width of 13 pixels and a
height of 5 pixels. As I mentioned earlier in this lesson, structuring elements can be of arbitrary size, and in
this case we are applying a rectangular element that is almost 3x wider than it is tall.

And why is this?

Because a license plate is roughly 3x wider than it is tall!

By having some basic a priori knowledge of the objects you want to detect in images, we can
construct structuring elements to better aid us in finding them.

The last line of the snippet applies the black hat operator.

In a similar fashion we can also apply a top hat/white hat operation:


tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)
# show the output images
cv2.imshow("Original", image)
cv2.imshow("Blackhat", blackhat)
cv2.imshow("Tophat", tophat)
cv2.waitKey(0)

To specify a top hat/white hat operator instead of a black hat, we simply change the type of operator
to cv2.MORPH_TOPHAT.

Below you can see the output of applying the top hat operators:

Applying a top hat operation reveals light regions on a dark background.

Notice how the right (i.e., the top hat/white hat) regions that are light against a dark
background are clearly displayed — in this case, we can clearly see that the license plate
region of the car has been revealed.

But also note that the license plate characters themselves have not been included. This is
because the license plate characters are dark against a light background.

To help remedy that, we can apply a black hat operator:


Applying a black hat operation reveals dark regions on a light background.

To reveal our license plate characters you would first segment out the license plate itself via
a top hat operator and then apply a black hat operator (or thresholding) to extract the
individual license plate characters (perhaps using methods like contour detection).
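As a rough illustration of that last idea, the black hat output from the code above could be thresholded and its contours extracted as sketched below (the minimum contour area of 100 pixels is an illustrative assumption):

# Threshold the black hat result so dark-on-light characters become white
ret, charThresh = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Find candidate character contours and draw the reasonably large ones
contours, hierarchy = cv2.findContours(charThresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 100:          # ignore tiny noise blobs
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imshow("Character candidates", image)
cv2.waitKey(0)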
Experiment No 7
Implement the Harris Corner Detector algorithm without the inbuilt Open CV() function.

Harris Corner Detection


Goal
• We will understand the concepts behind Harris Corner Detection.
• We will see the following functions: cv.cornerHarris(), cv.cornerSubPix()
Theory
We saw that corners are regions in the image with large variation in intensity in all the directions.
One early attempt to find these corners was made by Chris Harris & Mike Stephens in their paper A
Combined Corner and Edge Detector in 1988, so it is now called the Harris Corner Detector. They took
this simple idea to a mathematical form. It basically finds the difference in intensity for a
displacement of (u, v) in all directions. This is expressed as below:
E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²
Here w(x, y) is the window function, I(x + u, y + v) is the shifted intensity and I(x, y) is the original
intensity. The window function is either a rectangular window or a Gaussian window which gives weights to
the pixels underneath.
We have to maximize this function E(u,v) for corner detection. That means we have to maximize the
second term. Applying Taylor Expansion to the above equation and using some mathematical steps
(please refer to any standard text books you like for full derivation), we get the final equation as:
E(u, v) ≈ [u  v] M [u  v]ᵀ
where
M = Σ_{x,y} w(x, y) [ Ix·Ix   Ix·Iy ;  Ix·Iy   Iy·Iy ]   (a 2×2 matrix)
Here, Ix and Iy are image derivatives in x and y directions respectively. (These can be easily found
using cv.Sobel()).
Then comes the main part. After this, they created a score, basically an equation, which determines if
a window can contain a corner or not.
R = det(M) − k (trace(M))²
where
• det(M) = λ1·λ2
• trace(M) = λ1 + λ2
• λ1 and λ2 are the eigenvalues of M
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
• When |R| is small, which happens when λ1 and λ2 are small, the region is flat.
• When R < 0, which happens when λ1 >> λ2 or vice versa, the region is an edge.
• When R is large, which happens when λ1 and λ2 are large and λ1∼λ2, the region is a corner.
It can be represented in a nice picture as follows:
image
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a
suitable score gives you the corners in the image. We will do it with a simple image.
Harris Corner Detector in OpenCV
OpenCV has the function cv.cornerHarris() for this purpose. Its arguments are:
• img - Input image. It should be grayscale and float32 type.
• blockSize - It is the size of neighbourhood considered for corner detection
• ksize - Aperture parameter of the Sobel derivative used.
• k - Harris detector free parameter in the equation.
See the example below:
import numpy as np
import cv2 as cv
filename = 'chessboard.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
#result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
    cv.destroyAllWindows()
Below are the three results:
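Since the experiment statement asks for an implementation without the inbuilt cornerHarris() function, below is a minimal from-scratch sketch that follows the theory above, using only Sobel derivatives and a Gaussian window; the window size, k value and threshold are illustrative assumptions:

import numpy as np
import cv2 as cv

def harris_corners(gray, ksize=3, k=0.04, window=5, sigma=1.0):
    gray = np.float32(gray)
    # Image derivatives Ix and Iy
    Ix = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=ksize)
    Iy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=ksize)
    # Products of derivatives, smoothed by the window function w(x, y)
    Ixx = cv.GaussianBlur(Ix * Ix, (window, window), sigma)
    Iyy = cv.GaussianBlur(Iy * Iy, (window, window), sigma)
    Ixy = cv.GaussianBlur(Ix * Iy, (window, window), sigma)
    # R = det(M) - k * (trace(M))^2 for every pixel
    det = Ixx * Iyy - Ixy * Ixy
    trace = Ixx + Iyy
    return det - k * trace * trace

img = cv.imread('chessboard.png')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
R = harris_corners(gray)
img[R > 0.01 * R.max()] = [0, 0, 255]   # mark strong corners in red
cv.imshow('Harris (from scratch)', img)
cv.waitKey(0)
cv.destroyAllWindows()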
Experiment No 8
Write a program to compute the SIFT feature descriptors of a given image.

SIFT (Scale Invariant Feature Transform) Detector is used in the detection of interest points on an
input image. It allows the identification of localized features in images which is essential in applications such
as:

• Object Recognition in Images


• Path detection and obstacle avoidance algorithms
• Gesture recognition, Mosaic generation, etc.
Unlike the Harris Detector, which is dependent on properties of the image such as viewpoint, depth, and scale,
SIFT can perform feature detection independent of these properties of the image. This is achieved by the
transformation of the image data into scale-invariant coordinates. The SIFT Detector has been said to be a
close approximation of the system used in the primate visual system.
Steps for Extracting Interest Points

Fig 01: Sequence of steps followed in SIFT Detector


Phase I: Scale Space Peak Selection
The concept of Scale Space deals with the application of a continuous range of Gaussian filters to the target
image such that the chosen Gaussians have differing values of the sigma parameter. The plot thus obtained is
called the Scale Space. Scale Space Peak Selection depends on the Spatial Coincidence Assumption.
According to this, if an edge is detected at the same location in multiple scales (indicated by zero crossings in
the scale space) then we classify it as an actual edge.
Fig 02: Peaks are selected across Scales.
In 2D images, we can detect the Interest Points using the local maxima/minima in Scale Space of Laplacian of
Gaussian. A potential SIFT interest point is determined for a given sigma value by picking the potential
interest point and considering the pixels in the level above (with higher sigma), the same level, and the level
below (with lower sigma than current sigma level). If the point is maxima/minima of all these 26 neighboring
points, it is a potential SIFT interest point – and it acts as a starting point for interest point detection.
Phase II: Key Point Localization
Key point localization involves the refinement of keypoints selected in the previous stage. Low-contrast
keypoints, unstable keypoints, and keypoints lying on edges are eliminated. This is achieved by calculating
the Laplacian of the keypoints found in the previous stage. The extremum is refined by fitting a quadratic
function to the Difference of Gaussians, D, around each candidate and solving for the offset z of the true
extremum. To remove the unstable keypoints, the value of D at z is calculated, and if it is below a threshold
value then the point is excluded.
Fig 03: Refinement of Keypoints after Keypoint Localization
Phase III: Assigning Orientation to Keypoints
To achieve detection which is invariant with respect to the rotation of the image, orientation needs to be
calculated for the key-points. This is done by considering the neighborhood of the keypoint and calculating the
magnitude and direction of gradients of the neighborhood. Based on the values obtained, a histogram is
constructed with 36 bins to represent 360 degrees of orientation (10 degrees per bin). Thus, if the gradient direction of a certain point is, say, 67.8 degrees, a value proportional to the gradient magnitude of this point is added to the bin representing 60-70 degrees. Any histogram peak above 80% of the highest peak is converted into a new keypoint; these peaks decide the orientation of the original keypoint.
Fig 04: Assigning Orientation to Neighborhood and creating Orientation Histogram
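As a toy illustration of the 36-bin orientation histogram (a sketch with made-up gradient values, not OpenCV's internal code):

import numpy as np

# Assumed gradient magnitudes and directions (in degrees) of a keypoint's neighbourhood
magnitudes = np.array([12.0, 5.0, 8.0])
angles = np.array([67.8, 10.0, 355.0])

# 36 bins of 10 degrees each, weighted by gradient magnitude
hist, _ = np.histogram(angles, bins=36, range=(0, 360), weights=magnitudes)

# The 67.8-degree gradient contributes its magnitude to the 60-70 degree bin (index 6)
print(hist[6])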
Phase IV: Key Point Descriptor
Finally, for each keypoint, a descriptor is created using the keypoints neighborhood. These descriptors are
used for matching keypoints across images. A 16×16 neighborhood of the keypoint is used for defining the
descriptor of that key-point. This 16×16 neighborhood is divided into 16 sub-blocks, each a non-overlapping, contiguous 4×4 neighborhood. Subsequently, for each sub-block, an 8-bin orientation histogram is created, similar to the one discussed in Orientation Assignment. These 128 bin values (16 sub-blocks × 8 bins per block) are represented as a vector to generate the keypoint descriptor.
Example: SIFT detector in Python
Running the following script in the same directory with a file named “geeks.jpg” generates the “image-with-
keypoints.jpg” which contains the interest points, detected using the SIFT module in OpenCV, marked using
circular overlays.
SIFT’s patent expired in March 2020. In OpenCV versions >= 4.4, the detector is created with
cv2.SIFT_create().
pip install "opencv-contrib-python>=4.4"
Below is the implementation:
# Important NOTE: Use opencv >= 4.4
import cv2

# Loading the image
img = cv2.imread('geeks.jpg')

# Converting image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Applying SIFT detector
sift = cv2.SIFT_create()
kp = sift.detect(gray, None)

# Marking the keypoints on the image using circles
img = cv2.drawKeypoints(gray,
                        kp,
                        img,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imwrite('image-with-keypoints.jpg', img)
Output:
The image on left is the original, the image on right shows the various highlighted interest points on the image
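Since this experiment asks for the SIFT feature descriptors themselves, and the script above only detects and draws keypoints, the following minimal sketch (reusing the assumed file name 'geeks.jpg') also computes the 128-dimensional descriptor of every keypoint with detectAndCompute():

import cv2

img = cv2.imread('geeks.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
# detectAndCompute returns the keypoints and an array of one 128-value descriptor per keypoint
kp, des = sift.detectAndCompute(gray, None)

print('Number of keypoints:', len(kp))
print('Descriptor array shape:', des.shape)   # expected: (len(kp), 128)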
Experiment No 9
Write a program to detect the specific objects in an image using HOG.
Understanding HOG
The concept behind the HOG algorithm is to compute the distribution of gradient orientations in
localized portions of an image. HOG operates on a window, which is a region of fixed pixel size
on the image. A window is divided into small spatial regions, known as a block, and a block is
further divided into multiple cells. HOG calculates the gradient magnitude and orientation within
each cell, and creates a histogram of gradient orientations. Then the histograms within the
same block are concatenated.
The gradient measures how a pixel’s color intensity compares to its neighbors; the more drastically it
changes, the higher the magnitude. The orientation tells which direction has the steepest
gradient. Usually, this is applied on a single-channel image (i.e., grayscale), and each pixel can
have its own gradient. HOG gathers all gradients from a block and puts them into a histogram.
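As an illustration of per-pixel gradients, the sketch below (the file name 'image.jpg' is an assumption) computes the gradient magnitude and orientation of every pixel of a grayscale image using Sobel derivatives and cv2.cartToPolar():

import cv2

gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Horizontal and vertical derivatives (Sobel), then per-pixel magnitude and angle
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
magnitude, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)

print(magnitude.shape, angle.shape)  # both have the same shape as the input image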
The clever part of building a histogram in HOG is that the bin is determined by the gradient angle, but the value is interpolated between the two closest bins. For example, if the bins are centered at 0, 20, 40, and so on, and a pixel has gradient magnitude 10 at an angle of 30 degrees, a value of 5 is added to each of the bins at 20 and 40. This way, HOG can effectively capture the texture and shape of objects within the image.
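A toy sketch of this soft binning (only the arithmetic described above, not OpenCV's internal code; the bin layout and example gradient are assumptions):

import numpy as np

nbins = 9
bin_width = 180 / nbins            # unsigned gradients over 0-180 degrees, bins at 0, 20, 40, ...
hist = np.zeros(nbins)

magnitude, angle = 10.0, 30.0      # the example gradient from the text

# Split the magnitude between the two nearest bin centers
low = int(angle // bin_width) % nbins
high = (low + 1) % nbins
frac = (angle - low * bin_width) / bin_width
hist[low] += magnitude * (1 - frac)
hist[high] += magnitude * frac

print(hist)                        # 5.0 in the 20-degree bin and 5.0 in the 40-degree bin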
HOG is particularly effective for detecting objects with distinguishable textures and patterns,
making it a popular choice for tasks such as pedestrian detection and other forms of object
recognition. With its ability to capture the distribution of gradient orientations, HOG provides a
robust representation invariant to variations in lighting conditions and shadows.
Computing HOG in OpenCV
OpenCV provides a straightforward method to compute the HOG descriptor, making it easily
accessible for developers and researchers. Let’s take a look at a basic example of how to
compute HOG in OpenCV:
import cv2

# Load the image and convert to grayscale
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# define each block as 4x4 cells of 64x64 pixels each
cell_size = (64, 64)      # h x w in pixels
block_size = (4, 4)       # h x w in cells
win_size = (8, 6)         # h x w in cells

nbins = 9                 # number of orientation bins
img_size = img.shape[:2]  # h x w in pixels

# create a HOG object
hog = cv2.HOGDescriptor(
    _winSize=(win_size[1] * cell_size[1],
              win_size[0] * cell_size[0]),
    _blockSize=(block_size[1] * cell_size[1],
                block_size[0] * cell_size[0]),
    _blockStride=(cell_size[1], cell_size[0]),
    _cellSize=(cell_size[1], cell_size[0]),
    _nbins=nbins
)
n_cells = (img_size[0] // cell_size[0], img_size[1] // cell_size[1])

# find features as a 1xN vector, then reshape into spatial hierarchy
hog_feats = hog.compute(img)
hog_feats = hog_feats.reshape(
    n_cells[1] - win_size[1] + 1,
    n_cells[0] - win_size[0] + 1,
    win_size[1] - block_size[1] + 1,
    win_size[0] - block_size[0] + 1,
    block_size[1],
    block_size[0],
    nbins)
print(hog_feats.shape)
HOG computes features for one window at a time. There are multiple blocks in a window. In a
block, there are multiple “cells”. See the following illustration:
Assume this entire picture is one window. A window is divided into cells (green grids), and several cells are combined into one block (red and blue boxes). There are many overlapping blocks in one window, but all blocks are the same size.
Each cell is of a fixed size; in the above, you used 64×64 pixels per cell. Each block has an equal number of cells; in the above, you used 4×4 cells per block. There is also an equal number of cells in a window; you used 8×6 cells above. However, we are not dividing the image into blocks or windows when we compute HOG. Instead,
1. Consider a window as a sliding window on the image, in which the sliding window’s
stride size is the size of one cell, i.e., it slides across one cell at a time
2. We divide the window into cells of fixed size
3. We set up the second sliding window that matches the block size and scan the
window. It slides across one cell at a time
4. Within a block, HOG is computed from each cell
The returned HOG is a single vector for the entire image. In the code above, you reshaped it to make the hierarchy of windows, blocks, cells, and histogram bins clear. For
example, hog_feats[i][j] corresponds to the window (in numpy slicing syntax):
img[cell_size[1]*i : cell_size[1]*i + (cell_size[1]*win_size[1]),
    cell_size[0]*j : cell_size[0]*j + (cell_size[0]*win_size[0])]
Or, equivalently, the window with the cell (i,j) at the top left corner.
A sliding window is a common technique in object detection because you cannot be sure a
particular object lies exactly in a grid cell. Making smaller cells but larger windows is a better
way to catch the object than just seeing a part of it. However, there’s a limitation: An object
larger than the window will be missed. Also, an object too small may be dwarfed by other
elements in the window.
Usually, you have some downstream tasks associated with HOG, such as running an SVM
classifier on the HOG features for object detection. In this case, you may want to reshape the
HOG output into one flat feature vector per window rather than the hierarchy of cells shown above.
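As a minimal sketch of that reshaping (continuing from the hog_feats array above; the variable names follow the earlier example and everything else is an assumption):

import numpy as np

# hog_feats has shape (win_pos_a, win_pos_b, blocks, blocks, cells, cells, nbins)
# after the reshape above; flatten everything except the first two axes so each
# window position becomes a single feature vector.
n_win_a, n_win_b = hog_feats.shape[:2]
window_vectors = hog_feats.reshape(n_win_a * n_win_b, -1)

print(window_vectors.shape)   # (number of window positions, features per window)
# window_vectors can now be fed to a classifier such as an SVM.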
Using HOG for People Detection
The feature extraction technique in the code above is useful if you want to get the raw feature
vectors for other purposes. But for some common tasks, OpenCV comes with pre-trained
machine learning models at your disposal that require little effort to use.
A photo is used as an example to detect people using HOG.
This is a picture of people crossing a street. OpenCV has a “people detector” in HOG that was
trained on a 64×128 pixel window size. Using it to detect people in a photo is surprisingly
simple:
import cv2

# Load the image
img = cv2.imread('people.jpg')

# Create the HOG descriptor and set the built-in people detector (an SVM)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Detect people in the image
locations, confidence = hog.detectMultiScale(img)

# Draw rectangles around the detected people
for (x, y, w, h) in locations:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

# Display the image with detected people
cv2.imshow('People', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
In the above, you created a HOG descriptor with its default parameters. The coefficients returned by cv2.HOGDescriptor_getDefaultPeopleDetector() initialize an SVM classifier trained to detect a particular object, which in this case is people.
You call the descriptor on an image and run the SVM in one pipeline
using hog.detectMultiScale(img), which returns the bounding boxes for each object detected.
While the window size is fixed, this detection function resizes the image to multiple scales to find the best detection result. Even so, the bounding boxes returned are not tight. The code above also annotates the detected people by marking the bounding boxes on the image. You may further filter the result using the confidence score reported by the detector (a minimal filtering sketch is shown after the output below). Some filtering algorithms, such as non-maximum suppression, may be appropriate but are not discussed here.
The following is the output:
Bounding box as produced by the people detector using HOG in OpenCV
You can see such detectors can find people only if the full body is visible. The output has false
positives (non-people detected) and false negatives (people not detected). Using it to count all
people in a crowd scene would be challenging. But it is a good start to see how easily you can
get something done using OpenCV.
Unfortunately, OpenCV does not ship pre-trained HOG detectors for objects other than people. But you can train your own SVM or other models using HOG feature vectors. Feeding a machine learning model is the key reason for extracting feature vectors from an image.
Experiment No 10
Implementation of object detection using OpenCV
In this part, we will write the Python programs to do the object detection and
understand the implementation of it. We will use a sample image (saved as "opencv-od.png") in our Python
program to perform the object detection on it.
Opening the Image
We will first open the image given above and set up the plotting environment to show it in the output. Let's first look at an example program to understand the implementation, and then we will look at the explanation part.
Example 1: Opening the image using OpenCV and matplotlib library in a Python
program:
# Import OpenCV module
import cv2
# Import pyplot from matplotlib as pltd
from matplotlib import pyplot as pltd
# Opening the image from files
imaging = cv2.imread("opencv-od.png")
# Altering properties of image with cv2
img_gray = cv2.cvtColor(imaging, cv2.COLOR_BGR2GRAY)
imaging_rgb = cv2.cvtColor(imaging, cv2.COLOR_BGR2RGB)
# Plotting image with subplot() from pltd
pltd.subplot(1, 1, 1)
# Displaying image in the output
pltd.imshow(imaging_rgb)
pltd.show()
Output:
Explanation:
First, we imported the OpenCV (as cv2) and matplotlib pyplot (as pltd) libraries into the program to use their functions in the code. After that, we opened the image file using the imread() function of cv2.
Then, we defined the properties for the image we opened in the program using the cv2 functions. Next, we subplot the image using the subplot() function of pltd, passing parameters to it. Finally, we used the imshow() and show() functions of the pltd module to show the image in the output.
As we can see in the output, the image is displayed as the result of the program, placed inside a single subplot.
Recognition or object detection in the image
Now, we will use the detectMultiScale() in the program to detect the object present in
the image. Following is the syntax for using detectMultiScale() function in the code:

found = xml_data.detectMultiScale(img_gray, minSize=(30, 30))
We will use a condition statement with this function in the program to check if any
object from the image is detected or not and highlight the detected part. Let's
understand the implementation of object detection in the image through an example
program.
Example 2: Object detection in the image using the detectMultiScale() in the following
Python program:
# Import OpenCV module
import cv2
# Import pyplot from matplotlib as pltd
from matplotlib import pyplot as pltd
# Opening the image from files
imaging = cv2.imread("opencv-od.png")
# Altering properties of image with cv2
imaging_gray = cv2.cvtColor(imaging, cv2.COLOR_BGR2GRAY)
imaging_rgb = cv2.cvtColor(imaging, cv2.COLOR_BGR2RGB)
# Importing Haar cascade classifier xml data
xml_data = cv2.CascadeClassifier('XML-data.xml')
# Detecting objects in the image with the Haar cascade classifier
detecting = xml_data.detectMultiScale(imaging_gray, minSize=(30, 30))
# Amount of objects detected
amountDetecting = len(detecting)
# Using if condition to highlight the objects detected
if amountDetecting != 0:
    for (a, b, width, height) in detecting:
        # Highlighting each detected object with a rectangle
        cv2.rectangle(imaging_rgb, (a, b),
                      (a + width, b + height),
                      (0, 255, 0), 9)
# Plotting image with subplot() from pltd
pltd.subplot(1, 1, 1)
# Displaying image in the output
pltd.imshow(imaging_rgb)
pltd.show()
Output:
Explanation:
After opening the image in the program, we imported the cascade classifier XML file into the program. Then, we used the detectMultiScale() function with the imported cascade file to check whether any object is present in the image.
We used an if condition in the program to check whether an object was detected, and if so, we highlighted the detected parts using a for loop with cv2 functions. After highlighting the detected objects in the image, we displayed the processed image using the pltd show() and imshow() functions.
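The file 'XML-data.xml' above is a placeholder for whichever Haar cascade you want to use. As a minimal sketch, OpenCV ships several pre-trained cascades that can be loaded directly from cv2.data.haarcascades (the frontal-face cascade below is chosen purely as an example, and "opencv-od.png" is the sample image assumed above):

import cv2

# Load one of the Haar cascades bundled with OpenCV instead of a local XML file
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
xml_data = cv2.CascadeClassifier(cascade_path)

imaging = cv2.imread("opencv-od.png")
imaging_gray = cv2.cvtColor(imaging, cv2.COLOR_BGR2GRAY)
detecting = xml_data.detectMultiScale(imaging_gray, minSize=(30, 30))
print(len(detecting), "object(s) detected")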
Experiment No 11
Implementation of Face Recognition using OpenCV
Approach/Algorithms used for Face Recognition
1. This project uses the LBPH (Local Binary Patterns Histograms) algorithm to recognize faces. It labels the pixels of an image by thresholding the neighborhood of each pixel and considers the result as a binary number.
2. LBPH uses 4 parameters:
(i) Radius: the radius used to build the circular local binary pattern; it represents the radius around the central pixel.
(ii) Neighbors: the number of sample points to build the circular local binary pattern.
(iii) Grid X: the number of cells in the horizontal direction.
(iv) Grid Y: the number of cells in the vertical direction.
3. The model is trained with face images and the labels (tags) given to them; later, the machine is given test data and decides the correct label for it (a minimal LBPH sketch is shown below).
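The following is only a sketch of how LBPH training and prediction could look with the opencv-contrib-python module; the image file names, labels, and parameter values are assumptions for illustration, not part of the lab's dataset:

import cv2
import numpy as np

# Assumed: a few grayscale face crops and their integer labels (0 = person 1, 1 = person 2)
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
         for p in ['person1_a.jpg', 'person1_b.jpg', 'person2_a.jpg']]
labels = np.array([0, 0, 1])

# LBPH recognizer with the four parameters described above (radius, neighbors, grid_x, grid_y)
recognizer = cv2.face.LBPHFaceRecognizer_create(radius=1, neighbors=8, grid_x=8, grid_y=8)
recognizer.train(faces, labels)

# Predict the label of a new (assumed) test face; a lower confidence value means a closer match
test_face = cv2.imread('unknown_face.jpg', cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(test_face)
print('Predicted label:', label, 'confidence:', confidence)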
Face Detection using Python
Step 1: Setup Your Google Colab Environment
• Open Google Colab: Go to Google Colab.
• Create a new notebook: Click on File -> New Notebook.
• Install OpenCV: Since Colab doesn’t have OpenCV pre-installed, run the following
command to install it.
!pip install opencv-python opencv-contrib-python
Step 2: Upload Haarcascade File
The code relies on haarcascade_frontalface_default.xml to detect faces. You need to upload this
file. Download it from the OpenCV GitHub repository (the data/haarcascades folder).
In Colab, upload it by clicking on the “Files” icon on the left sidebar and selecting “Upload.”
Step 3: Code for Face Detection
Now you can run the following code, which loads an image, detects the faces in it, and draws rectangles around them. Copy and paste the code into a Colab cell.
import cv2
from google.colab.patches import cv2_imshow  # cv2.imshow() does not work inside Colab notebooks

# Haarcascade file path (make sure this file is uploaded)
haar_file = 'haarcascade_frontalface_default.xml'

# Load the image from your uploaded file
img_path = '/content/sample_image.jpg'  # Update with your image path
image = cv2.imread(img_path)

if image is None:
    print("Image not loaded correctly.")
else:
    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Load Haarcascade for face detection
    face_cascade = cv2.CascadeClassifier(haar_file)

    # Detect faces
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    # Draw rectangles around faces
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

    # Show the image with detected faces
    cv2_imshow(image)
Output: