Computer Vision - Unit 1 Notes
UNIT-1
INTRODUCTION TO IMAGE FORMATION AND PROCESSING
Computer Vision - Geometric primitives and transformations – Photometric
image formation-The digital camera-Point operators- Linear filtering - More
neighborhood operators - Fourier transforms - Pyramids and wavelets -
Geometric transformations - Global optimization.
1. Computer Vision:
Computer vision is a multidisciplinary field that enables machines to interpret and make
decisions based on visual data. It involves the development of algorithms and systems
that allow computers to gain high-level understanding from digital images or videos. The
goal of computer vision is to replicate and improve upon human vision capabilities,
enabling machines to recognize and understand visual information.
Typical computer vision tasks include:
- Object Detection: Locating and classifying multiple objects within an image or video stream.
- 3D Reconstruction: Creating three-dimensional models of objects or scenes from one or more images.
Computer vision applications are diverse and found in various fields, including
healthcare (medical image analysis), autonomous vehicles, surveillance,
augmented reality, robotics, industrial automation, and more. Advances in deep learning,
especially convolutional neural networks (CNNs), have significantly contributed to the
progress and success of computer vision tasks by enabling efficient feature learning from
large datasets.
2. Geometric primitives and transformations:
Geometric primitives and transformations are fundamental concepts in computer graphics and computer vision.
They form the basis for representing and manipulating visual elements in both 2D and 3D spaces. Let's explore
each of these concepts:
Geometric Primitives:
1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D space.
2. Lines and Line Segments: Defined by two points or a point and a direction vector.
3. Polygons: Closed shapes with straight sides. Triangles, quadrilaterals, and other polygons are common
geometric primitives.
4. Circles and Ellipses: Defined by a center point and radii (or axes in the case of ellipses).
5. Curves: Bézier curves, spline curves, and other parametric curves are used to represent smooth shapes.
Geometric Transformations:
Geometric transformations involve modifying the position, orientation, and scale of geometric primitives.
Common transformations include:
1. Translation: Moves an object by a specified distance along the x and/or y axes.
2. Rotation: Rotates an object by a specified angle about a fixed point.
3. Scaling: Changes the size of an object by multiplying its coordinates by scaling factors.
4. Shearing: Distorts the shape of an object by stretching or compressing along one or more axes.
Applications:
Computer Graphics: Geometric primitives and transformations are fundamental for rendering 2D and 3D
graphics in applications such as video games, simulations, and virtual reality.
Computer-Aided Design (CAD): Used for designing and modeling objects in engineering and architecture.
Computer Vision: Geometric transformations are applied to align and process images, correct distortions, and
perform other tasks in image analysis.
Robotics: Essential for robot navigation, motion planning, and spatial reasoning.
Understanding geometric primitives and transformations is crucial for creating realistic and visually appealing
computer-generated images, as well as for solving various problems in computer vision and robotics.
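As a small illustration (a sketch in Python/NumPy; the triangle and the rotation angle are arbitrary example values, not taken from these notes), the snippet below represents a polygon by its vertex coordinates and rotates it about the origin using a homogeneous transformation matrix:

```python
import numpy as np

# Triangle represented by its vertices (a simple geometric primitive).
triangle = np.array([[0.0, 0.0],
                     [4.0, 0.0],
                     [2.0, 3.0]])

theta = np.deg2rad(30)  # rotation angle (arbitrary choice for illustration)

# 3x3 homogeneous rotation matrix about the origin.
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Convert vertices to homogeneous coordinates, transform, and drop the extra coordinate.
ones = np.ones((triangle.shape[0], 1))
homogeneous = np.hstack([triangle, ones])
rotated = (R @ homogeneous.T).T[:, :2]

print(rotated)
```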
3. Photometric image formation:
Photometric image formation refers to the process by which light interacts with surfaces and is captured by a
camera, resulting in the creation of a digital image. This process involves various factors related to the properties
of light, the surfaces of objects, and the characteristics of the imaging system. Understanding photometric image
formation is crucial in computer vision, computer graphics, and image processing.
Illumination:
- Ambient Light: The overall illumination of a scene that comes from all directions.
- Directional Light: Light coming from a specific direction, which can create highlights and shadows.
Reflection:
- Diffuse Reflection: Light that is scattered in various directions by rough surfaces.
- Specular Reflection: Light that reflects off smooth surfaces in a concentrated direction, creating highlights.
Shading:
- Lambertian Shading: A model that assumes purely diffuse reflection, so the perceived brightness of a surface depends only on the angle between the surface normal and the light direction, not on the viewing direction.
- Phong Shading: A more sophisticated model that considers specular reflection, creating more realistic
highlights.
Surface Properties:
- Reflectance Properties: Material characteristics that determine how light is reflected (e.g., diffuse and specular
reflectance).
- Albedo: The inherent reflectivity of a surface, representing the fraction of incident light that is reflected.
Lighting Models:
- Phong Lighting Model: Combines diffuse and specular reflection components to model lighting.
- Blinn-Phong Model: Similar to the Phong model but computationally more efficient.
Shadows:
- Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
- Self Shadows: Shadows cast by parts of an object onto itself.
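To make the shading models above concrete, here is a minimal Python/NumPy sketch that evaluates a Lambertian diffuse term and a Phong-style specular term at a single surface point; the normal, light and view directions and the material constants are made-up example values:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Assumed example vectors and material constants (not from the notes).
n = normalize(np.array([0.0, 0.0, 1.0]))   # surface normal
l = normalize(np.array([1.0, 1.0, 1.0]))   # direction toward the light
v = normalize(np.array([0.0, 0.0, 1.0]))   # direction toward the viewer
albedo, k_s, shininess = 0.8, 0.5, 32      # diffuse albedo, specular weight, exponent
light_intensity = 1.0

# Lambertian (diffuse) term: proportional to the cosine of the angle between normal and light.
diffuse = albedo * light_intensity * max(0.0, float(n @ l))

# Phong specular term: reflect the light direction about the normal.
r = normalize(2.0 * (n @ l) * n - l)
specular = k_s * light_intensity * max(0.0, float(r @ v)) ** shininess

print("diffuse:", diffuse, "specular:", specular, "total:", diffuse + specular)
```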
Cameras:
- Camera Exposure: The amount of light allowed to reach the camera sensor or film.
- Camera Response Function: Describes how a camera responds to light of different intensities.
4. The digital camera:
A digital camera is an electronic device that captures and stores digital images. It differs from traditional film
cameras in that it uses electronic sensors to record images rather than photographic film. Digital cameras have
become widespread due to their convenience, ability to instantly review images, and ease of sharing and storing
photos digitally. Here are key components and concepts related to digital cameras:
Image Sensor:
- Digital cameras use image sensors (such as CCD or CMOS) to convert light into electrical signals.
- The sensor captures the image by measuring the intensity of light at each pixel location.
Lens:
- The lens focuses light onto the image sensor.
- Zoom lenses allow users to adjust the focal length, providing optical zoom.
Aperture:
- The aperture is an adjustable opening in the lens that controls the amount of light entering the camera.
Shutter:
- The shutter mechanism controls the duration of light exposure to the image sensor.
- Fast shutter speeds freeze motion, while slower speeds create motion blur.
Image Processor:
- Digital cameras include a built-in image processor to convert raw sensor data into a viewable image.
- Image processing algorithms may enhance color, sharpness, and reduce noise.
Memory Card:
- Digital images are stored on removable memory cards, such as SD or CF cards.
- Memory cards provide a convenient and portable way to store and transfer images.
White Balance:
- White balance settings adjust the color temperature of the captured image to match different lighting
conditions.
Connectivity:
- USB, HDMI, or wireless connectivity allows users to transfer images to computers, share online, or connect to
other devices.
Battery:
- Digital cameras are powered by rechargeable batteries, providing the necessary energy for capturing and
processing images.
5. Point operators:
Point operators, also known as point processing or pixel-wise operations, are basic image processing operations that
operate on individual pixels independently. These operations are applied to each pixel in an image without considering the
values of neighboring pixels. Point operators typically involve mathematical operations or functions that transform the
pixel values, resulting in changes to the image's appearance. Here are some common point operators:
Brightness Adjustment:
- Addition/Subtraction: Increase or decrease the intensity of all pixels by adding or subtracting a constant value.
- Multiplication/Division: Scale the intensity values by multiplying or dividing them by a constant factor.
Contrast Adjustment:
- Linear Contrast Stretching: Rescale the intensity values to cover the full dynamic range.
- Histogram Equalization: Adjust the distribution of pixel intensities to enhance contrast.
Gamma Correction:
- Adjust the gamma value to control the overall brightness and contrast of an image.
Thresholding:
- Convert a grayscale image to binary by setting a threshold value. Pixels with values above the threshold become white,
and those below become black.
Bit-plane Slicing:
- Decompose an image into its binary representation by considering individual bits.
Color Mapping:
- Apply color transformations to change the color balance or convert between color spaces (e.g., RGB to grayscale).
Inversion:
- Invert the intensity values of pixels, turning bright areas dark and vice versa.
Image Arithmetic:
- Perform arithmetic operations between pixels of two images, such as addition, subtraction, multiplication, or division.
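Most of the point operators listed above reduce to simple array arithmetic. The following Python/NumPy sketch (the input is a random array standing in for an 8-bit grayscale image) applies brightness adjustment, gamma correction, thresholding, and inversion:

```python
import numpy as np

# Stand-in for an 8-bit grayscale image.
img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

# Brightness adjustment: add a constant, clipping to the valid range.
brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)

# Gamma correction: normalize to [0, 1], raise to 1/gamma, rescale.
gamma = 2.2
corrected = (255 * (img / 255.0) ** (1.0 / gamma)).astype(np.uint8)

# Thresholding: pixels above the threshold become white, the rest black.
binary = np.where(img > 128, 255, 0).astype(np.uint8)

# Inversion: bright areas become dark and vice versa.
inverted = 255 - img

print(brighter, corrected, binary, inverted, sep="\n\n")
```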
Point operators are foundational in image processing and form the basis for more complex operations. They are
often used in combination to achieve desired enhancements or modifications to images. These operations are
computationally efficient, as they can be applied independently to each pixel, making them suitable for real-time
applications and basic image manipulation tasks.
It's important to note that while point operators are powerful for certain tasks, more advanced image processing
techniques, such as filtering and convolution, involve considering the values of neighboring pixels and are
applied to local image regions.
Linear filtering:
Linear filtering is a fundamental concept in image processing that involves applying a linear operator to an
image. The linear filter operates on each pixel in the image by combining its value with the values of its
neighboring pixels according to a predefined convolution kernel or matrix. The convolution operation is a
mathematical operation that computes the weighted sum of pixel values in the image, producing a new value for
the center pixel.
For a kernel h, the filtered image g is the convolution

g(i, j) = Σ_u Σ_v h(u, v) · f(i − u, j − v)

Where:
- f is the input image,
- h is the convolution kernel (the matrix of weights),
- g is the filtered output image, and
- (u, v) ranges over the offsets of the kernel around the center pixel.
Edge Detection:
- Sobel filter: Emphasizes edges by computing gradients in the x and y directions.
- Prewitt filter: Similar to Sobel but uses a different kernel for gradient computation.
Sharpening:
- Laplacian filter: Enhances high-frequency components to highlight edges.
- High-pass filter: Emphasizes details by subtracting a blurred version of the image.
Embossing:
- Applies an embossing effect by highlighting changes in intensity.
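As an illustrative sketch of linear filtering (using SciPy's ndimage module on a small synthetic image; any of the kernels above could be substituted), the code below convolves an image containing a vertical edge with a Sobel kernel:

```python
import numpy as np
from scipy import ndimage

# Synthetic image: dark left half, bright right half (a vertical edge).
img = np.zeros((8, 8), dtype=float)
img[:, 4:] = 1.0

# Sobel kernel for gradients in the x direction.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Convolve: each output pixel is a weighted sum of its 3x3 neighborhood.
grad_x = ndimage.convolve(img, sobel_x, mode="nearest")

print(grad_x)  # large values along the columns where the edge lies
```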
Linear filtering is a versatile technique and forms the basis for more advanced image processing operations. The
convolution operation can be efficiently implemented using convolutional neural networks (CNNs) in deep
learning, where filters are learned during the training process to perform tasks such as image recognition,
segmentation, and denoising. The choice of filter kernel and parameters determines the specific effect achieved
through linear filtering.
6. More neighborhood operators:
Neighborhood operators in image processing involve the consideration of pixel values in the vicinity of a target
pixel, usually within a defined neighborhood or window. Unlike point operators that operate on individual pixels,
neighborhood operators take into account the local structure of the image. Here are some common neighborhood
operators:
Median Filter:
- Computes the median value of pixel intensities within a local neighborhood.
- Effective for removing salt-and-pepper noise while preserving edges.
Gaussian Filter:
- Applies a weighted average to pixel values using a Gaussian distribution.
- Used for blurring and smoothing, with the advantage of preserving edges.
Bilateral Filter:
- Combines spatial and intensity information to smooth images while preserving edges.
- Uses two Gaussian distributions, one for spatial proximity and one for intensity similarity.
Anisotropic Diffusion:
- Reduces noise while preserving edges by iteratively diffusing intensity values along edges.
- Particularly useful for images with strong edges.
Morphological Operators:
- Dilation: Expands bright regions by considering the maximum pixel value in a neighborhood.
- Erosion: Contracts bright regions by considering the minimum pixel value in a neighborhood.
- Used for operations like noise reduction, object segmentation, and shape analysis.
Homomorphic Filtering:
- Adjusts image intensity by separating the image into illumination and reflectance components.
- Useful for enhancing images with non-uniform illumination.
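A brief sketch of a few of these neighborhood operators, assuming SciPy is available; the noisy input image is synthetic:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Synthetic image corrupted with salt-and-pepper noise.
img = np.full((16, 16), 100.0)
noisy = img.copy()
noisy[rng.random(img.shape) < 0.05] = 255.0   # salt
noisy[rng.random(img.shape) < 0.05] = 0.0     # pepper

# Median filter: replaces each pixel with the median of its 3x3 neighborhood.
median = ndimage.median_filter(noisy, size=3)

# Gaussian filter: weighted average with weights from a Gaussian distribution.
smoothed = ndimage.gaussian_filter(noisy, sigma=1.0)

# Grayscale dilation and erosion over a 3x3 neighborhood.
dilated = ndimage.grey_dilation(noisy, size=(3, 3))
eroded = ndimage.grey_erosion(noisy, size=(3, 3))
```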
These neighborhood operators play a crucial role in image enhancement, denoising, edge detection, and other
image processing tasks. The choice of operator depends on the specific characteristics of the image and the
desired outcome.
7. Fourier transforms:
Fourier transforms play a significant role in computer vision for analyzing and processing images. They are used
to decompose an image into its frequency components, providing valuable information for tasks such as image
filtering, feature extraction, and pattern recognition. Here are some ways Fourier transforms are employed in
computer vision:
Frequency Analysis:
- Fourier transforms help in understanding the frequency content of an image. High-frequency components
correspond to edges and fine details, while low-frequency components represent smooth regions.
Image Filtering:
Filtering in the frequency domain allows for efficient operations such as blurring or sharpening. Low-pass filters
remove high-frequency noise, while high-pass filters enhance edges and fine details.
Image Enhancement:
- Adjusting the amplitude of specific frequency components can enhance or suppress certain features in an
image. This is commonly used in image enhancement techniques.
Texture Analysis:
- Fourier analysis is useful in characterizing and classifying textures based on their frequency characteristics. It
helps distinguish between textures with different patterns.
Pattern Recognition:
- Fourier descriptors, which capture shape information, are used for representing and recognizing objects in
images. They provide a compact representation of shape by capturing the dominant frequency components.
Image Compression:
- Transform-based image compression, such as JPEG compression, utilizes Fourier transforms to transform
image data into the frequency domain. This allows for efficient quantization and coding of frequency
components.
Image Registration:
- Fourier transforms are used in image registration, aligning images or transforming them to a common
coordinate system. Cross-correlation in the frequency domain is often employed for this purpose.
Homomorphic Filtering:
- Homomorphic filtering, which involves transforming an image to a logarithmic domain using Fourier
transforms, is used in applications such as document analysis and enhancement.
Image Reconstruction:
- Fourier transforms are involved in techniques like computed tomography (CT) or magnetic resonance imaging
(MRI) for reconstructing images from their projections.
The efficient computation of Fourier transforms, particularly through the use of the Fast Fourier Transform (FFT)
algorithm, has made these techniques computationally feasible for real-time applications in computer vision. The ability to
analyze images in the frequency domain provides valuable insights and contributes to the development of advanced image
processing techniques.
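The sketch below (NumPy's FFT routines on a synthetic image; the cutoff radius is an arbitrary illustrative choice) shows the typical frequency-domain filtering pipeline: transform, mask frequency components, transform back:

```python
import numpy as np

# Synthetic image: a bright square on a dark background.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0

# Forward 2D FFT, with the zero-frequency component shifted to the center.
F = np.fft.fftshift(np.fft.fft2(img))

# Ideal low-pass filter: keep only frequencies within a given radius.
rows, cols = img.shape
y, x = np.ogrid[:rows, :cols]
radius = np.sqrt((y - rows / 2) ** 2 + (x - cols / 2) ** 2)
mask = radius <= 10  # cutoff radius (illustrative)

# Apply the mask and transform back to the spatial domain.
filtered = np.fft.ifft2(np.fft.ifftshift(F * mask)).real

print(filtered.shape)  # a blurred version of the original square
```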
Image Pyramids:
Image pyramids are a series of images representing the same scene but at different resolutions. There are two main types of
image pyramids:
Gaussian Pyramid:
- Created by repeatedly applying Gaussian smoothing and downsampling to an image.
- At each level, the image is smoothed to remove high-frequency information, and then it is subsampled to reduce its size.
- Useful for tasks like image blending, image matching, and coarse-to-fine image processing.
Laplacian Pyramid:
- Derived from the Gaussian pyramid.
- Each level of the Laplacian pyramid is obtained by subtracting the expanded (upsampled) version of the next coarser Gaussian level from the corresponding Gaussian level.
- Useful for image compression and coding, where the Laplacian pyramid represents the residual information not captured
by the Gaussian pyramid.
Image pyramids are especially useful for creating multi-scale representations of images, which can be beneficial for various
computer vision tasks.
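A minimal Gaussian-pyramid sketch in Python (SciPy for the smoothing; the number of levels and sigma are illustrative): each level is a smoothed, half-resolution copy of the previous one, as described above.

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels=3, sigma=1.0):
    """Build a Gaussian pyramid by repeated smoothing and downsampling."""
    pyramid = [image]
    for _ in range(levels):
        smoothed = ndimage.gaussian_filter(pyramid[-1], sigma=sigma)
        pyramid.append(smoothed[::2, ::2])  # subsample by a factor of 2
    return pyramid

img = np.random.rand(64, 64)
for level, p in enumerate(gaussian_pyramid(img)):
    print(level, p.shape)  # (64, 64), (32, 32), (16, 16), (8, 8)
```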
Wavelets:
Wavelets are mathematical functions that can be used to analyze signals and images. Wavelet transforms provide a multi-
resolution analysis by decomposing an image into approximation (low-frequency) and detail (high-frequency) components.
Key concepts include:
Wavelet Transform:
- The wavelet transform decomposes an image into different frequency components by convolving the image with wavelet
functions.
- The result is a set of coefficients that represent the image at various scales and orientations.
Multi-resolution Analysis:
- Wavelet transforms offer a multi-resolution analysis, allowing the representation of an image at different scales.
- The approximation coefficients capture the low-frequency information, while detail coefficients capture high-frequency
information.
Haar Wavelet:
- The Haar wavelet is a simple wavelet function used in basic wavelet transforms.
- It represents changes in intensity between adjacent pixels.
Wavelet Compression:
- Wavelet-based image compression techniques, such as JPEG2000, utilize wavelet transforms to efficiently represent
image data in both spatial and frequency domains.
Image Denoising:
- Wavelet-based thresholding techniques can be applied to denoise images by thresholding the wavelet coefficients.
Edge Detection:
- Wavelet transforms can be used for edge detection by analyzing the high-frequency components of the image.
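A single-level 2D Haar wavelet decomposition can be computed with the PyWavelets package; the sketch below assumes pywt is installed and uses a random array as a stand-in image:

```python
import numpy as np
import pywt  # PyWavelets

img = np.random.rand(64, 64)

# Single-level 2D discrete wavelet transform with the Haar wavelet.
# cA: approximation (low-frequency); cH, cV, cD: horizontal, vertical,
# and diagonal detail (high-frequency) coefficients.
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")

print(cA.shape, cH.shape)  # each subband is half the size in each dimension

# Reconstruct the image from its coefficients (lossless up to numerical error).
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), "haar")
```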
Both pyramids and wavelets offer advantages in multi-resolution analysis, but they differ in terms of their representation
and construction. Pyramids use a hierarchical structure of smoothed and subsampled images, while wavelets use a
transform-based approach that decomposes the image into frequency components. The choice between pyramids and
wavelets often depends on the specific requirements of the image processing task at hand.
8. Geometric transformations:
Geometric transformations are operations that modify the spatial configuration of objects in a digital image. These
transformations are applied to change the position, orientation, scale, or shape of objects while preserving certain geometric
properties. Geometric transformations are commonly used in computer graphics, computer vision, and image processing.
Here are some fundamental geometric transformations:
1. Translation:
- Description: Moves an object by a specified distance along the x and/or y axes.
- Transformation Matrix (2D, homogeneous coordinates):
  [ 1  0  tx ]
  [ 0  1  ty ]
  [ 0  0  1  ]
2. Rotation:
● Description: Rotates an object by a specified angle about a fixed point.
● Transformation Matrix (2D, homogeneous coordinates), for a rotation by angle θ about the origin:
  [ cos θ  -sin θ  0 ]
  [ sin θ   cos θ  0 ]
  [   0       0    1 ]
3. Scaling:
● Description: Changes the size of an object by multiplying its coordinates by
scaling factors.
● Transformation Matrix (2D, homogeneous coordinates), with scale factors sx and sy:
  [ sx  0   0 ]
  [ 0   sy  0 ]
  [ 0   0   1 ]
4. Shearing:
● Description: Distorts the shape of an object by varying its coordinates linearly.
● Transformation Matrix (2D, homogeneous coordinates), with shear factors shx and shy:
  [ 1    shx  0 ]
  [ shy  1    0 ]
  [ 0    0    1 ]
5. Affine Transformation:
● Description: Combines translation, rotation, scaling, and shearing.
● Transformation Matrix (2D, homogeneous coordinates): combines a 2×2 linear part (rotation, scaling, shearing) with a translation:
  [ a11  a12  tx ]
  [ a21  a22  ty ]
  [ 0    0    1  ]
6. Perspective Transformation:
● Description: Represents a perspective projection, useful for simulating three-
dimensional effects.
● Transformation Matrix (3D, homogeneous coordinates): for a pinhole camera with focal length f, the projection of 3D points onto the image plane can be written as the 3×4 matrix
  [ f  0  0  0 ]
  [ 0  f  0  0 ]
  [ 0  0  1  0 ]
7. Projective Transformation:
● Description: Generalization of perspective transformation with additional control points.
● Transformation Matrix (3D): More complex than the perspective transformation matrix.
● Applications: Computer graphics, augmented reality.
These transformations are crucial for various applications, including image manipulation, computer-aided design (CAD),
computer vision, and graphics rendering. Understanding and applying geometric transformations are fundamental skills in
computer science and engineering fields related to digital image processing.
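As a practical sketch (assuming OpenCV, i.e. the opencv-python package, is available; the image and the transformation parameters are arbitrary example values), the code below builds a 2×3 affine matrix combining rotation, scaling, and translation and applies it to an image:

```python
import numpy as np
import cv2  # opencv-python

# Stand-in grayscale image: a white square on a black background.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(img, (60, 60), (140, 140), 255, -1)

h, w = img.shape
center = (w / 2, h / 2)

# 2x3 affine matrix: rotate 30 degrees about the center and scale by 0.8 ...
M = cv2.getRotationMatrix2D(center, 30, 0.8)
# ... then add a translation of (20, 10) pixels.
M[:, 2] += (20, 10)

# Apply the affine transformation to every pixel of the image.
warped = cv2.warpAffine(img, M, (w, h))
```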
9. Global optimization:
Global optimization is a branch of optimization that focuses on finding the global minimum or maximum of a
function over its entire feasible domain. Unlike local optimization, which aims to find the optimal solution
within a specific region, global optimization seeks the best possible solution across the entire search space.
Global optimization problems are often challenging due to the presence of multiple local optima or complex,
non-convex search spaces.
Here are key concepts and approaches related to global optimization:
Concepts:
Objective Function:
- The function to be minimized or maximized.
Feasible Domain:
- The set of input values (parameters) for which the objective function is defined.
Global Minimum/Maximum:
- The lowest or highest value of the objective function over the entire feasible domain.
Local Minimum/Maximum:
● A minimum or maximum within a specific region of the feasible domain.
Approaches:
Grid Search:
- Dividing the feasible domain into a grid and evaluating the objective function at each grid point to find the optimal
solution.
Random Search:
- Randomly sampling points in the feasible domain and evaluating the objective function to explore different regions.
Evolutionary Algorithms:
- Genetic algorithms, particle swarm optimization, and other evolutionary techniques use populations of solutions and
genetic operators to iteratively evolve toward the optimal solution.
Simulated Annealing:
- Inspired by the annealing process in metallurgy, simulated annealing gradually decreases the temperature to allow the
algorithm to escape local optima.
Genetic Algorithms:
- Inspired by biological evolution, genetic algorithms use mutation, crossover, and selection to evolve a population of
potential solutions.
Bayesian Optimization:
- Utilizes probabilistic models to model the objective function and guide the search toward promising regions.
Quasi-Newton Methods:
- Iterative optimization methods that use an approximation of the Hessian matrix to find the optimal solution efficiently.
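As a small sketch of one of the approaches above, the code below applies SciPy's differential evolution (an evolutionary method) to the standard Rastrigin test function, which has many local minima:

```python
import numpy as np
from scipy.optimize import differential_evolution

def rastrigin(x):
    """Multimodal test function; global minimum of 0 at the origin."""
    x = np.asarray(x)
    return 10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

# Feasible domain: a box in two dimensions.
bounds = [(-5.12, 5.12), (-5.12, 5.12)]

result = differential_evolution(rastrigin, bounds, seed=0)
print(result.x, result.fun)  # should be close to (0, 0) with value near 0
```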
Global optimization is applied in various fields, including engineering design, machine learning,
finance, and parameter tuning in algorithmic optimization. The choice of a specific global
optimization method depends on the characteristics of the objective function, the dimensionality
of the search space, and the available computational resources.