Pattern Recognition Assignments

The document discusses performing sentiment analysis on movie reviews from the IMDB dataset. It describes sentiment analysis and different approaches like rule-based, machine learning and neural networks. The objectives are to learn and implement sentiment analysis on the IMDB dataset.

Uploaded by

Jahan Chaware

Assignment Number 2
Title : Face Recognition Using PCA and multiclass LDA
Problem Statement: Face Recognition Using PCA and multiclass LDA.
Objectives: To learn and understand Face recognition using PCA and LDA.
Outcomes : We will be able to implement Face Recognition Using PCA and multiclass LDA.
S/W & H/W requirements:
Jupyter Notebook, Python, 64-bit open-source LINUX.
Theory :
PCA:
Principal Component Analysis (PCA) is an unsupervised learning algorithm used for
dimensionality reduction in machine learning. It is a statistical procedure that converts
observations of correlated features into a set of linearly uncorrelated features by means of an
orthogonal transformation. These new transformed features are called the principal components.
PCA ranks directions in the feature space by the variance of the data along them, on the
assumption that high-variance directions carry most of the information, and reduces the
dimensionality by keeping only the leading directions. Some real-world applications of PCA are
image processing, movie recommendation systems, and optimizing the power allocation in various
communication channels. It is a feature extraction technique: it retains the components that
explain most of the variance and drops the least important ones.
Steps in PCA:

1. Getting the dataset
2. Standardizing the data
3. Calculating the covariance matrix of Z, the standardized data
4. Calculating the eigenvalues and eigenvectors of the covariance matrix
5. Sorting the eigenvectors by decreasing eigenvalue
6. Calculating the new features, or principal components
7. Removing the less important or unimportant features from the new dataset
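
The PCA steps listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used in the assignment: the function name `pca`, the synthetic data, and the choice of two components are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Step 2: standardize each feature to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # Step 3: covariance matrix of the standardized data Z.
    cov = np.cov(Z, rowvar=False)
    # Step 4: eigenvalues and eigenvectors (eigh, because cov is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: sort by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Steps 6-7: keep the leading eigenvectors and project the data onto them.
    components = eigvecs[:, :n_components]
    return Z @ components, eigvals

# Step 1: a small synthetic dataset (assumption: 100 samples, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)  # make two features correlated

scores, eigvals = pca(X, n_components=2)
print(scores.shape)             # shape of the reduced dataset
print(eigvals / eigvals.sum())  # fraction of variance explained per component
```

The ratio printed at the end is a common way to decide how many components to keep in step 7.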

LDA:

Linear Discriminant Analysis (LDA) is one of the most popular dimensionality reduction techniques
used for supervised classification problems in machine learning. It is also used as a
pre-processing step for modeling differences between groups in machine learning and pattern
classification applications.
Whenever two or more classes with multiple features need to be separated efficiently, LDA is a
common technique for solving such classification problems. For example, suppose we have two
classes with multiple features and need to separate them efficiently. If we classify them using a
single feature, the classes may overlap.

To overcome this overlap, LDA uses several features together: it projects the data onto a new
axis (or a small number of axes) chosen so that the separation between the class means is
maximized while the variance within each class is minimized.
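
For the multiclass case named in the title, the LDA criterion is usually written with two scatter matrices. The notation here is an assumption introduced for illustration and does not appear in the original text: C classes, N_c samples in class c, class mean \mu_c, overall mean \mu, and projection matrix W.

```latex
S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{X}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}, \qquad
S_B = \sum_{c=1}^{C} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}

W^{*} = \arg\max_{W} \frac{\left| W^{\top} S_B W \right|}{\left| W^{\top} S_W W \right|}
\quad\Longrightarrow\quad
S_B \, w_k = \lambda_k \, S_W \, w_k, \qquad k = 1, \dots, C - 1
```

Because S_B has rank at most C - 1, LDA yields at most C - 1 discriminant directions, however many features the data has.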

Difference Between PCA and LDA:

Below are some basic differences between LDA and PCA:

o PCA is an unsupervised algorithm that ignores classes and labels and only aims
to find the principal components that maximize the variance in the given dataset. LDA, by
contrast, is a supervised algorithm that aims to find the linear discriminants, that is,
the axes that maximize the separation between different classes of data.
o LDA is generally more suitable than PCA for multi-class classification tasks.
However, PCA is reported to perform well when the sample size is comparatively small.
o Both LDA and PCA are used as dimensionality reduction techniques. In face recognition
they are often combined, with PCA applied first and LDA applied to the PCA output.

Conclusion:
Hence we have successfully implemented face recognition using PCA and LDA.

Code: https://github.com/bellatrix007/Face-Recognition/tree/master
Assignment Number 3
Title: Fruit shape recognition using Eigen Faces and Fisher Faces
Problem Statement: Fruit shape recognition using Eigen Faces and Fisher Faces
Objectives: To learn and understand fruit shape recognition using Eigen Faces and Fisher Faces.
Outcomes: We will be able to implement Fruit shape recognition using Eigen Faces and Fisher
Faces
S/W & H/W requirements:
Jupyter Notebook, Python, 64-bit open-source LINUX.
Theory :
Eigen Faces:
Eigenfaces is a representation learning method in computer vision focusing on facial images. The
goal of the method is to represent an image of a person's face as a linear
combination of a set of basis images called eigenfaces. Suppose all H x W images
representing a human face lie on a manifold in R^(H x W). If we can find the optimal eigenfaces, we
can represent any facial image as a linear combination of them.
Steps Involved in Eigenfaces Face Recognition:
1. Collect a Training Dataset
2. Preprocess Images
3. Vectorize Images
4. Compute the Mean Face
5. Compute Eigenfaces
6. Select Top Eigenfaces
7. Project Faces onto Eigenspace
8. Represent Faces with Eigenface Coefficients
9. Face Recognition
10. Classification
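
The ten steps above can be sketched with NumPy as follows. This is a minimal sketch on synthetic 8 x 8 "images", not the code from the linked repository: the helper names `train_eigenfaces` and `recognize`, the noise level, and the nearest-neighbour rule used for the final classification step are all assumptions made for illustration.

```python
import numpy as np

def train_eigenfaces(images, n_components):
    """images: array of shape (n_images, H, W)."""
    n = images.shape[0]
    X = images.reshape(n, -1).astype(float)  # step 3: vectorize
    mean_face = X.mean(axis=0)               # step 4: mean face
    A = X - mean_face
    # Step 5: the right singular vectors of the centered data are the
    # eigenvectors of its covariance matrix, i.e. the eigenfaces.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:n_components]           # step 6: keep the top eigenfaces
    coeffs = A @ eigenfaces.T                # steps 7-8: eigenface coefficients
    return mean_face, eigenfaces, coeffs

def recognize(image, mean_face, eigenfaces, coeffs, labels):
    """Steps 9-10: project a probe image and return the nearest training label."""
    w = (image.reshape(-1).astype(float) - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(coeffs - w, axis=1)
    return labels[int(np.argmin(distances))]

# Steps 1-2: synthetic stand-in for a preprocessed training set
# (assumption: two persons, three noisy images each).
rng = np.random.default_rng(0)
base = rng.random((2, 8, 8))
labels = np.array([0, 0, 0, 1, 1, 1])
train = np.stack([base[l] + 0.05 * rng.normal(size=(8, 8)) for l in labels])

mean_face, eigenfaces, coeffs = train_eigenfaces(train, n_components=3)
probe = base[1] + 0.05 * rng.normal(size=(8, 8))
predicted = recognize(probe, mean_face, eigenfaces, coeffs, labels)
print(predicted)
```

Using the SVD avoids forming the full (H x W) by (H x W) covariance matrix, which matters once real image sizes are used.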
FisherFaces Face Recognizer:
This algorithm is an improved version of the Eigenfaces face recognizer. The Eigenfaces recognizer
looks at the training faces of all persons at once and finds principal components from all
of them combined. Because the principal components are computed from all faces together, they do
not focus on the features that discriminate one person from another but on the features that
represent all the people in the training data as a whole. This approach has drawbacks. For
example, images with sharp changes (such as changes in lighting, which is not a useful feature at
all) may dominate the rest of the images, and you may end up with features that come from an
external source such as lighting and are not useful for discrimination. In that case the principal
components represent lighting changes rather than the actual facial features. The Fisherfaces
algorithm, instead of extracting features that represent all the faces of all the persons,
extracts features that discriminate one person from the others. This way the features of one
person do not dominate those of the others.
Aspect | Eigenfaces | Fisherfaces
-------|------------|------------
Dimensionality reduction technique | Principal Component Analysis (PCA) | Linear Discriminant Analysis (LDA)
Focus | Global structure | Discriminative power (global and local structure)
Handling variations | Might not handle variations well (lighting, pose, expressions) | Effective at handling variations due to supervised learning and discriminative focus
Learning type | Unsupervised | Supervised
Discriminative power | Less discriminative due to global focus | Maximizes discriminative power between classes
Representation | Linear transformation | Linear transformation as well; non-linear (kernel) variants can be used for better discrimination
Computation | Simpler and computationally efficient | Might be more computationally intensive, especially for large datasets
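
To make the contrast in the table concrete, here is a rough Fisherfaces sketch: PCA first, then LDA on the PCA output. It is written under assumptions that are not in the original: the function name `fit_fisherfaces`, synthetic 64-pixel "images" for three classes, an added per-image brightness offset standing in for lighting changes, and nearest-centroid classification.

```python
import numpy as np

def fit_fisherfaces(X, y, n_pca):
    """X: (n_samples, n_pixels) vectorized images, y: integer class labels."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    A = X - mean
    # PCA step: reduce the dimension so the within-class scatter is invertible.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    W_pca = Vt[:n_pca].T
    P = A @ W_pca
    # LDA step: within-class (S_w) and between-class (S_b) scatter in PCA space.
    mu = P.mean(axis=0)
    S_w = np.zeros((n_pca, n_pca))
    S_b = np.zeros((n_pca, n_pca))
    for c in classes:
        Pc = P[y == c]
        mu_c = Pc.mean(axis=0)
        S_w += (Pc - mu_c).T @ (Pc - mu_c)
        d = (mu_c - mu)[:, None]
        S_b += len(Pc) * (d @ d.T)
    # Discriminant directions: leading eigenvectors of inv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    W_lda = eigvecs[:, order].real
    return mean, W_pca @ W_lda

# Synthetic data: 3 classes, 5 images each, with a random brightness offset.
rng = np.random.default_rng(0)
n_classes, per_class, n_pixels = 3, 5, 64
bases = rng.random((n_classes, n_pixels))
y = np.repeat(np.arange(n_classes), per_class)
lighting = 0.5 * rng.normal(size=(len(y), 1))
X = bases[y] + lighting + 0.05 * rng.normal(size=(len(y), n_pixels))

mean, W = fit_fisherfaces(X, y, n_pca=6)
features = (X - mean) @ W  # at most (number of classes - 1) features per image
centroids = np.stack([features[y == c].mean(axis=0) for c in range(n_classes)])
dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
predicted = dists.argmin(axis=1)
print((predicted == y).mean())  # training accuracy with nearest-centroid matching
```

The brightness offset has large variance inside every class, so it inflates the within-class scatter and the discriminant directions tend to avoid it. This is the behaviour the table describes under "Handling variations".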

Conclusion:
Hence we have successfully implemented fruit shape recognition using Eigen Faces and Fisher
Faces.
Code: https://github.com/informramiz/opencv-face-recognition-python/tree/master
Assignment Number 4
Title: Perform sentiment analysis on the IMDB movie reviews dataset
Problem Statement: Perform sentiment analysis on the IMDB movie reviews dataset.
Objectives: To learn and understand the process of sentiment analysis.
Outcomes: We will be able to implement Sentiment Analysis on IMDB dataset.
S/W & H/W requirements:
Jupyter Notebook, Python, 64-bit open-source LINUX.
Theory :
Sentiment analysis is the process of classifying whether a block of text is positive, negative, or
neutral. The goal of sentiment analysis, also called sentiment mining, is to analyze people’s
opinions in a way that can help businesses expand. It focuses not only on polarity (positive,
negative, and neutral) but also on emotions (happy, sad, angry, etc.). It uses various Natural
Language Processing approaches, such as rule-based, automatic, and hybrid.
Consider a scenario in which we want to find out whether a product satisfies customer
requirements, or whether there is a need for the product in the market. We can use sentiment
analysis to monitor that product’s reviews. Sentiment analysis is also useful when there is a large
set of unstructured data that we want to classify by tagging it automatically. Net
Promoter Score (NPS) surveys are used extensively to learn how a customer
perceives a product or service. Sentiment analysis has also gained popularity because it can
process large volumes of NPS responses and obtain consistent results quickly.
Approaches to Sentiment Analysis
There are four main approaches used:
1. Rule-based : This approach relies on a lexicon together with tokenization and parsing. It
counts the number of positive and negative words in the given text. If the number of
positive words is greater than the number of negative words, the sentiment is positive;
otherwise it is negative.
2. Machine Learning : This approach uses machine learning techniques. First, a model is
trained on a labeled dataset so that it can make predictions. Next, features such as words
are extracted from the text. Classification can then be done using techniques such as
Naive Bayes, support vector machines, hidden Markov models, and conditional random
fields.
3. Neural Network : In the last few years neural networks have evolved at a very fast rate.
This approach uses artificial neural networks, which are inspired by the structure of the
human brain, to classify text into positive, negative, or neutral sentiments. Architectures
such as recurrent neural networks (RNN), long short-term memory (LSTM), and gated
recurrent units (GRU) are used to process sequential data like text.
4. Hybrid Approach : This is a combination of two or more approaches, e.g. rule-based and
machine learning approaches. The advantage is that the accuracy is usually higher than
that of the individual approaches.
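
The rule-based approach from item 1 can be illustrated with a tiny word-counting classifier. This is only a sketch: the two word lists are a made-up toy lexicon (a real system would use a much larger one), the function name `rule_based_sentiment` is hypothetical, and returning "neutral" on a tie is an assumption added for the example.

```python
import re

# Toy lexicon, invented for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "wonderful", "enjoyable"}
NEGATIVE = {"bad", "boring", "terrible", "hate", "awful", "dull"}

def rule_based_sentiment(text):
    """Classify text by comparing counts of positive and negative words."""
    # Tokenization: lowercase the text and split it into word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # assumption: equal counts are reported as neutral

print(rule_based_sentiment("A great film with excellent acting"))
print(rule_based_sentiment("Boring plot and terrible dialogue"))
print(rule_based_sentiment("The film is two hours long"))
```

A counter like this ignores negation ("not good") and context, which is one reason the machine learning and neural network approaches are usually preferred for the IMDB reviews.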
Conclusion:
Hence we have performed Sentiment analysis on IMDB movie review dataset.
Code: https://github.com/Ankit152/IMDB-sentiment-analysis/blob/master/imdbSentimentAnalysis.ipynb
Assignment Number 6
Title: Perform image segmentation on the Berkeley Segmentation dataset.
Problem Statement: Perform image segmentation on the Berkeley Segmentation dataset.
Objectives: To learn and understand image segmentation.
Outcomes: We will be able to implement image segmentation on the Berkeley Segmentation
dataset.
S/W & H/W requirements:
Jupyter Notebook, Python, 64-bit open-source LINUX.
Theory :
Image Segmentation:
Image segmentation is a computer vision technique that partitions a digital image into discrete
groups of pixels—image segments—to inform object detection and related tasks. By parsing an
image’s complex visual data into specifically shaped segments, image segmentation enables
faster, more advanced image processing. Image segmentation techniques range from simple,
intuitive heuristic analysis to the cutting-edge implementation of deep learning. Conventional
image segmentation algorithms process low-level visual features of each pixel, like color or
brightness, to identify object boundaries and background regions. Machine learning, leveraging
annotated datasets, is used to train models to accurately classify the specific types of objects and
regions an image contains. Being a highly versatile and practical method of computer vision,
image segmentation has a wide variety of artificial intelligence use cases, from aiding diagnosis
in medical imaging to automating locomotion for robotics and self-driving cars to identifying
objects of interest in satellite images.
Image segmentation techniques include:
1. Similarity approach: Detects similarities between pixels to form a segment based on a
given threshold.
2. Discontinuity approach: Relies on the discontinuity of pixel intensity values of the image.
3. Edge-based segmentation: Identifies the edges of various objects in a given image.
4. Threshold-based segmentation: Divides pixels based on their intensity relative to a given
value or threshold.
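
Two of the techniques listed above, threshold-based segmentation and the discontinuity (edge) idea, can be sketched on a tiny synthetic grayscale image. The function names, the 6 x 6 image, and the threshold values are assumptions chosen for illustration; this is not the method used in the linked repository.

```python
import numpy as np

def threshold_segment(image, threshold):
    """Threshold-based segmentation: True marks pixels brighter than the threshold."""
    return image > threshold

def edge_map(image, min_jump):
    """Discontinuity approach: mark pixels where intensity jumps from a neighbour."""
    # Absolute difference to the pixel above (dy) and to the pixel on the left (dx).
    dy = np.abs(np.diff(image, axis=0, prepend=image[:1]))
    dx = np.abs(np.diff(image, axis=1, prepend=image[:, :1]))
    return np.maximum(dx, dy) > min_jump

# Synthetic image: dark background (0.1) with a bright 2 x 2 square (0.9).
image = np.full((6, 6), 0.1)
image[2:4, 2:4] = 0.9

mask = threshold_segment(image, threshold=0.5)
edges = edge_map(image, min_jump=0.5)
print(mask.astype(int))
print(edges.astype(int))
```

On natural images such as those in the dataset, a single global threshold is rarely enough, so the threshold is usually chosen per image or replaced by one of the other approaches.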
Dataset Description :
The dataset consists of 500 natural images, ground-truth human annotations and benchmarking
code. The data is explicitly separated into disjoint train, validation and test subsets. The dataset is
an extension of the BSDS300, where the original 300 images are used for training / validation
and 200 fresh images, together with human annotations, are added for testing. Each image was
segmented by five different subjects on average.
Conclusion:
Hence we have performed image segmentation on the Berkeley Segmentation dataset.
Code: https://github.com/alyswidan/Image_Segmentation
