Sentiment Analysis On Amazon Reviews Using Machine Learning
This is to certify that the work which is being presented in the project report titled
“Sentiment Analysis on Amazon Reviews using Machine Learning” in partial fulfilment of
the requirements for the award of the degree of B. Tech in Computer Science And
Engineering and submitted to the Department of Computer Science And Engineering,
Jaypee University of Information Technology, Waknaghat is an authentic record of work
carried out by “Ananya Joshi, 191218 and Vipasha Rana, 191226” during the period from
January 2022 to May 2022 under the supervision of Dr. Ekta Gandotra, Department of
Computer Science and Engineering, Jaypee University of Information Technology,
Waknaghat.
Ananya Joshi
(191218)
Vipasha Rana
(191226)
PLAGIARISM CERTIFICATE
ACKNOWLEDGEMENT
First, I express my heartfelt thanks and gratitude to almighty God, whose divine blessing
made it possible to complete the project work successfully.
I am truly grateful and wish to express my profound indebtedness to my supervisor, Dr. Ekta Gandotra,
Associate Professor, Department of CSE, Jaypee University of Information
Technology, Waknaghat. The deep knowledge and keen interest of my supervisor in the field of
“Sentiment Analysis on Amazon Reviews using Machine Learning” helped me to carry out this
project. The supervisor's endless patience, scholarly guidance, continual encouragement, constant and
energetic supervision, constructive criticism, valuable advice, and reading of many inferior drafts
and correcting them at all stages have made it possible to complete this project.
Finally, I must acknowledge with due respect the constant support and patience of my parents.
Ananya Joshi
(191218)
Vipasha Rana
(191226)
TABLE OF CONTENTS
List of Abbreviations
List of Figures
List of Tables
Abstract
Chapter-1 Introduction
Chapter-5 Conclusions
References
Appendices
LIST OF ABBREVIATIONS
2. ML Machine Learning
3. DL Deep Learning
6. GHz Gigahertz
LIST OF FIGURES
S. No. Figure No. Figure Caption
55. 7.6 Generating the Review page link for all products
LIST OF TABLES
1. I Literature Survey
ABSTRACT
With the advent of social media, people are now more comfortable than ever to express their
thoughts, opinions, and emotions online. The proliferation of these comments, whether
positive or negative, makes it crucial to analyse them accurately in order to grasp the true
intentions of the writer. To achieve this, sentiment analysis is used to decipher the perspective
of the text. In our study, we propose a novel approach that takes into account the sentimental
aspects of the item being reviewed. To validate our approach, we used Amazon consumer
reviews, specifically the Amazon Musical Instruments Reviews dataset collected from the
Kaggle repository by Eswar Chand. We first examined the user ratings in this dataset and then
carried out pre-processing operations, such as creating a sentiment column, tokenization,
cleaning punctuation from the review text, and eliminating stop-words, to
extract meaningful information such as the positivity or negativity of the feedback. Our main
goal was to analyse this data on an aspect level, which would be highly beneficial to marketers
in comprehending consumer preferences and adapting their strategies accordingly.
Furthermore, we also provide insights into possible future work for text classification.
Ultimately, our study presents a new approach to sentiment analysis that can enhance our
understanding of online feedback and facilitate more effective marketing practices.
Chapter 01: INTRODUCTION
1.1 Introduction
Online stores such as Amazon provide a website where consumers can express
their opinions about various products. It has been reported that
approximately 90% of consumers check different websites and channels to
assess the quality of a product before making a final purchase decision. With
most people expressing their ideas, views, and opinions over social media, this
feedback carries emotion that requires careful analysis to extract
meaningful insights. With sentiment analysis, companies can quickly
understand the sentiment of their customers and use this information to make
more informed decisions that benefit both their business and their customers.
While sarcasm can often be detected in person, it may not be as evident in online
comments or headlines. This poses a challenge for computers, as they attempt
to identify sarcastic language through linguistic cues or context. Our team is
tackling this issue by developing a sarcasm detector using advanced machine
learning techniques through neural networks. The project involves analyzing a
collection of newspaper articles labeled as either sarcastic or non-sarcastic,
including current sarcastic renditions of events from The Onion. By utilizing
these resources, we aim to improve the accuracy of detecting sarcasm in text-
based communication.
The digitalization of the world has brought about significant changes, and one
of the most noticeable changes is the increasing popularity of eCommerce. With
the rise of eCommerce, customers now have access to a vast range of products
that are within their reach. Additionally, eCommerce websites enable customers
to express their thoughts and feelings about products. In fact, customers are
increasingly relying on the experiences of other customers when making
purchasing decisions. The opinions and feedback of others have a significant
impact on our purchasing decision-making processes. We ask for opinions and
experiences of others to benefit from their knowledge, hence the growing
importance of product reviews.
However, with the vast number of product reviews available online, it is almost
impossible for customers to read them all. Therefore, sentiment analysis plays
a crucial role in analyzing these reviews. Sentiment analysis uses natural
language processing and machine learning algorithms to determine the
emotional tone of text data. By analyzing the sentiment of customer reviews,
we can gain valuable insights into customers' opinions and experiences. This
information can then be used by companies to improve their products and
services to meet the needs and preferences of their customers better.
1.3 Objectives
The aim of customer reviews and ratings is to convey the writer's attitude
towards a product, which can be either positive, negative, or neutral. Some
individuals award products with four or five stars to express their complete
satisfaction, while others give one or two stars to convey their dissatisfaction.
This poses no challenge in sentiment analysis. However, some people award
three stars, despite expressing their satisfaction with the product, which can be
confusing for businesses and other customers who seek to understand their
genuine opinion. Therefore, analyzing reviews and comprehending customer
satisfaction becomes challenging for both businesses and customers.
Consequently, the three-star rating may not truly represent a neutral sentiment
since those who assign a 3-star rating to a product may not necessarily have a
balanced opinion between positive and negative.
Based on this premise, this research proposes the use of sentiment analysis to
predict the polarity of Amazon mobile phone dataset reviews. The three-star
rating will be considered neutral, with the intention of increasing the difficulty
of the study and measuring the efficacy of state-of-the-art NLP models, such as
BERT, in solving complex classification problems. Moreover, four machine-
learning models with diverse feature extraction techniques, namely Logistic
Regression, Naïve Bayes, Random Forest, and Bi-LSTM, will be utilized in this
research. Subsequently, the best-performing model will be analyzed to
investigate its sentiment classification. Finally, the top-performing model will
be retrained on the dataset with the neutral class removed, converting the
problem into a binary-classification task. The purpose is to assess the extent to
which this change affects the model's performance.
1.5 Technical Requirements (Hardware)
Processor: Intel Core i5 or above, 64-bit, quad-core, 2.5 GHz minimum per core
RAM: 4 GB or more
Hard disk: 10 GB of available space or more.
Display: Dual XGA (1024 x 768) or higher resolution monitors
Operating system: Windows
Chapter 02: Literature Survey
Various studies have been conducted to explore the use of sentiment analysis in
e-commerce. One such study by AlQahtani et al. [1] used machine learning
techniques to perform sentiment analysis on the Amazon Reviews Dataset. The
authors used various text preprocessing methods such as stop-word elimination,
tokenization, stemming, lemmatization, and POS tagging to convert the review
text into numerical representations. After training the models with different
machine learning algorithms, they selected the best performing model based on
multiclass classification analysis and retrained it for binary classification. The
study demonstrated that effective text preprocessing is essential in sentiment
analysis and can significantly affect the model's performance.
Another study by Aashutosh Bhatt et al. [2] also highlighted the importance of
text preprocessing in sentiment analysis. The authors removed stop words from
each review as the first step in their preprocessing process and added additional
steps based on their specific task at hand. These steps included removing
repeated letter strings, stemming, lowercasing, punctuation removal, and
tokenization. The authors demonstrated that text preprocessing techniques can
vary based on the dataset and the specific task and that a customized
preprocessing pipeline can improve model performance. The authors performed
tokenization and a spelling check, using the procedures outlined in reference
[3], to further refine the gathered data. In [4], K.S. Kumar et al. performed
sentiment analysis and opinion mining on online customer
reviews.
One critical aspect of sentiment analysis is the selection of the appropriate
classification algorithm. The study by Sanjay Dey et al. [5] compared the
performance of SVM and Naive Bayes classifiers for sentiment analysis of
Amazon product reviews. The study found that SVM outperformed Naive
Bayes in terms of accuracy and F-measure for both positive and negative
sentiments. Similarly, Rabnawaz Jansher [6] presented a machine learning
approach for sentiment analysis of Amazon product reviews, using features such
as bag-of-words and n-grams, combined with classification algorithms like
Naive Bayes and Support Vector Machines. The study reported an accuracy of
up to 85% for classifying reviews as positive, negative, or neutral.
In [11], Naive Bayes (NB) classification was conducted at the phrase and review levels of
granularity, and the relative frequency of each word was computed using the TF-IDF algorithm.
The experiment's findings were assessed at the review level using Accuracy,
Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). In [12],
the dataset was subjected to stemming, lowercasing, punctuation removal, and
tokenization, while spelling checks and lemmatization were performed by
Aljuhani et al. [13].
In sentiment analysis and opinion mining, the detection of sarcasm holds
significant importance. Machine learning algorithms have been employed to
solve this problem. Earlier studies have implemented the Naive Bayes Classifier
and Support Vector Machines for data analysis on social media in Indonesia
[26]. The researchers applied support vector machines along with TF-IDF and
Bag-of-Words techniques to detect sarcasm. Davidov et al. [27] also worked on
detecting sarcasm through semi-supervised methods on two datasets consisting
of Amazon product reviews and tweets from Twitter. They primarily focused
on various features including punctuation, sentence syntax, and hashtags used
in the text. Their system demonstrated an accuracy rate of over 75% [27]. The
classifiers employed in the study [26] were described as effective and the
decision to use Naive Bayes and SVM for the analysis was commended [28].
The methodology employed in this study yielded an accuracy rate of
approximately 80%. However, other researchers have also attempted to tackle
this problem using deep learning techniques [29]. Collecting data for the
purpose of detecting sarcasm is a challenging task, especially when dealing with
social media data. This study builds upon some of the methods described in
previous works and yields even better accuracy results, while also implementing
additional steps. In the study conducted by the researcher [30], Support Vector
Machines were utilized to detect sarcasm in media news headlines.
Extracting features in this manner is valuable for analyzing text data across all
domains. In [31], researchers explore the impact of pre-processing on text
extraction and consider factors such as slang usage and spelling accuracy. They
utilize an SVM classifier in their experiments. Another researcher proposes a
solution to emotional analysis through the use of vector representations,
achieving a high accuracy rate of 86% [32]. Other studies examine the use of
four different data sets and explore features such as bag-of-words, dictionary-based, and
part-of-speech features [33]. These studies employ an integrated model
that utilizes SVM, Logistic Regression, and Naive Bayes for analysis. In [34],
the authors utilized three different techniques for extracting features at varying
levels and incorporated three classifiers into their analysis. In a
separate study outlined in [35], a group of researchers employed ten distinct
methods for feature extraction in their work on emotion analysis. They
concluded that the techniques employed in a given task have
the potential to enhance the performance of a classifier. Their analysis
showed that TF-IDF yielded the best results with a difference of up to 4%.
SVM, Logistic Regression, and Naive Bayes are standard supervised learning
algorithms commonly used in research studies, leading to their
adoption in this analysis. Additionally, the analysis employed a CNN model that
utilized features generated by both Count Vectorizer and word2vec. The process
of converting words from text data into useful numeric features for machine
learning classifiers is referred to as feature extraction [36].
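Table I. Literature Survey (only part of one row survives)
Authors: ... Sarhan Wasif and Subrina Sultana [5]
Title: ... Classifier for Sentiment Analysis of Amazon Product Reviews: A Comparative Study
Other cells: "... techniques."; "... measure, AUC"; "... experimented on this dataset."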
Chapter 03: System Development
It is common for web portals to receive a lot of user feedback, and it can be
laborious to read through every comment. Opinions shared in discussion boards
must therefore be categorized, which can serve as the basis of a feedback management system.
Individual comments and reviews are categorized, and these individual
classifications are used to derive the total rating. Consequently, the business will
have a thorough understanding of the client feedback and will be able to address
the specific areas concerned. As a result, the firm grows in size, reputation, brand value,
and revenues, and its customers become more loyal.
The dataset used in the study consists of reviews of various musical instruments
from Amazon, and it was obtained from a Kaggle repository by Eswar Chand.
The reviews were originally extracted from web portals such as Bhuvan.
The dataset contains several columns, including the reviewer ID, user ID,
reviewer name, reviewer text, helpfulness rating, summary (obtained from the
reviewer text), overall rating on a scale of 5, and review time. The reviewer ID
and user ID columns are unique identifiers for each reviewer and user,
respectively. The reviewer name column contains the name of the person who
wrote the review. The reviewer text column contains the actual text of the
review, while the helpfulness rating column indicates how helpful the review
was to other users.
The summary column contains a summary of the review text, which is usually
a brief sentence or phrase that captures the main point of the review. The overall
rating column provides a rating of the product on a scale of 1 to 5. Finally, the
review time column indicates when the review was written.
This dataset is useful for studying sentiment analysis because it provides a large
number of reviews with ratings and text data, which can be used to train machine
learning models to predict the sentiment of the reviews. The dataset can also be
used to study other aspects of customer reviews, such as helpfulness and the
impact of summary text on overall ratings.
Figure. 3.1. Dataset features shown through code
A. Data Collection
In this work, the data used for Amazon product reviews sentiment analysis was
obtained from the Amazon Musical Instruments Reviews dataset available on
Eswar Chand's Kaggle repository [11]. However, in order to increase the
amount of data, web scraping was also performed. The two datasets were
integrated into the final dataset that was used for the analysis.
Web scraping is a technique for extracting data from websites and is a useful tool for
collecting large amounts of data for analysis. The process of data extraction involves
several steps, which are illustrated in Figure 3.2. To perform web scraping, several
Python libraries are commonly used, including re, requests, and BeautifulSoup.
The re library provides regular expressions, with which users can quickly extract
relevant data from HTML and XML files.
The requests library is another important package for web scraping in Python.
It allows users to send HTTP requests, making it easy to access data from
websites. The requests library provides several functions that range from
providing arguments in URLs to delivering custom headers and SSL
verification. These features make it a flexible and powerful tool for web
scraping. Finally, the BeautifulSoup library is an excellent choice for parsing
HTML and XML files in Python. It provides an intuitive way to discover, locate,
and modify the parse tree, making it easy to extract the required data from web
pages. With BeautifulSoup, users can identify the HTML tags that contain the
data they need and extract it quickly and efficiently.
Overall, these three packages are essential tools for web scraping in Python.
They provide the necessary functionality to extract data from websites,
manipulate strings, and parse HTML and XML files.
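As a rough illustration of how the three libraries fit together, the sketch below downloads a page with requests, parses it with BeautifulSoup, and uses re to pull the numeric rating out of a string. The HTML snippet, the CSS class names, the User-Agent header, and the helper names fetch_html and parse_reviews are all hypothetical and are not taken from the project code.

```python
import re

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page; the custom User-Agent header is an illustrative choice."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_reviews(html):
    """Return (rating, text) pairs from review markup shaped like SAMPLE_HTML."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="review"):
        rating_text = block.find("span", class_="rating").get_text()
        # re extracts the leading number from a string such as "4.0 out of 5 stars".
        match = re.search(r"(\d+(?:\.\d+)?)", rating_text)
        rating = float(match.group(1)) if match else None
        body = block.find("span", class_="review-text").get_text(strip=True)
        reviews.append((rating, body))
    return reviews


# Hypothetical markup standing in for a downloaded review page.
SAMPLE_HTML = """
<div class="review">
  <span class="rating">4.0 out of 5 stars</span>
  <span class="review-text"> Great strings, stay in tune. </span>
</div>
<div class="review">
  <span class="rating">2.0 out of 5 stars</span>
  <span class="review-text">Cable was noisy.</span>
</div>
"""

parsed = parse_reviews(SAMPLE_HTML)
```

In a real run, the string returned by fetch_html for a review page URL would be passed to parse_reviews in place of SAMPLE_HTML, with the selectors adapted to the actual page structure.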
B. Data Pre-processing
The data must be rigorously pre-processed before the reviews are sent to
the model. The flowchart of data preprocessing is represented in Figure 3.4.
1. Handling NaN values: First, we handle null values, which were
present in the reviewer name and review text columns, as seen below in Figure 3.5. Review text
is what matters most to us, as it is what helps us determine the sentiment
of a customer about the purchased product.
Fig. 3.5. Handling NaN values
2. Concatenating Review text and Review Title: Review text and review title are
merged into one column so that the sentiments do not contradict each other,
as visualized in Figure 3.6.
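The first two pre-processing steps can be sketched with pandas as follows. This is only an illustration on toy rows: the column names reviewerName, reviewText, and summary, the placeholder values, and the name of the merged column are assumptions, not the project's actual code.

```python
import pandas as pd

# Toy rows; the column names are assumptions modelled on the dataset description.
df = pd.DataFrame({
    "reviewerName": ["Ann", None, "Raj"],
    "reviewText": ["Works well", None, "Strings snapped"],
    "summary": ["Good buy", "Five stars", "Poor quality"],
})

# Step 1: replace missing reviewer names and review texts with placeholders.
df["reviewerName"] = df["reviewerName"].fillna("Unknown")
df["reviewText"] = df["reviewText"].fillna("")

# Step 2: merge the review text and its title into a single column.
df["reviews"] = (df["reviewText"] + " " + df["summary"]).str.strip()
```

Filling a missing review text with an empty string keeps the row, so the title alone can still contribute to the merged column.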
Fig. 3.7-3.8. Creating ‘sentiment’ column
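A sentiment column of this kind can be derived from the star rating. The sketch below assumes the convention described in Section 1.3 (four or five stars positive, three stars neutral, one or two stars negative); the column name overall, the label strings, and the function name are assumptions for illustration.

```python
import pandas as pd


def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    return "Neutral"


df = pd.DataFrame({"overall": [5, 4, 3, 2, 1]})
df["sentiment"] = df["overall"].apply(rating_to_sentiment)
```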
4. Handling Time Column: Next, the time column is handled. It contains the date and the
year; the year is split off first, and the remaining date is then split further into
month and day, as can be seen below in Figure 3.9 and Figure 3.10.
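The two-stage split can be sketched as below. The column name reviewTime and the "MM DD, YYYY" string format are assumptions made for this example; the actual format in the dataset should be checked before reusing the code.

```python
import pandas as pd

# Hypothetical time strings in an assumed "MM DD, YYYY" format.
df = pd.DataFrame({"reviewTime": ["02 28, 2014", "11 5, 2012"]})

# First split off the year, which follows the comma.
date_year = df["reviewTime"].str.split(",", n=1, expand=True)
df["year"] = date_year[1].str.strip().astype(int)

# Then split the remaining "month day" part into two columns.
month_day = date_year[0].str.split(" ", n=1, expand=True)
df["month"] = month_day[0].astype(int)
df["day"] = month_day[1].astype(int)
```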
Fig. 3.9. Handling Date and Time column
6. Review text- Stop words: The common nltk stop-word list
includes terms like not, hasn't, and wouldn't, as shown in Figure 3.12 and Figure
3.13, which express a negative sentiment. Removing them would conflict with the
target variable (sentiment). Therefore, we chose stop words that
do not carry negative connotations or negated forms.
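The idea of keeping negations out of the stop-word list can be sketched in plain Python. The short word lists below are illustrative stand-ins, not the nltk list or the list used in the project.

```python
# Small illustrative stop-word list; the project derived its list from nltk.
BASE_STOP_WORDS = {"the", "is", "a", "it", "this", "and", "not", "hasn't", "wouldn't"}

# Negations carry sentiment, so they are excluded from the stop words.
NEGATIONS = {"not", "hasn't", "wouldn't"}
STOP_WORDS = BASE_STOP_WORDS - NEGATIONS


def remove_stop_words(text):
    """Drop stop words while keeping negations such as 'not'."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)


cleaned = remove_stop_words("This pedal is not worth it")
```

With a standard list, "not" would be dropped and the cleaned text would read as a positive statement; keeping it preserves the negative meaning.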
Fig. 3.12-3.13. Review text- Stop words
1. Year vs Sentiment count: This plot displays the number of reviews by
sentiment posted in each year from 2004 to 2014, as depicted below in Fig. 3.16.
The plot makes it evident that the number of favorable reviews increased from 2010
onward. The trend peaked around 2013, after which the review counts for all
sentiments began to decline.
Reviews that are neutral or negative are far less common than positive reviews.
Fig. 3.15-3.16. Day of Month vs Review Count
3. Creating few more features for text analysis: In order to analyze the reviews,
we calculated several metrics including polarity, review duration, and word
count. Polarity was calculated using Textblob, which assigns a sentiment score
between -1 and 1, with -1 indicating a negative sentiment and 1 indicating a
positive sentiment. Review duration was determined by measuring the time
interval between the review creation date and the review date. Lastly, word
count was calculated by counting the total number of words in each review,
including all letters and spaces. Word length was also calculated to indicate the
average number of words in the reviews. These metrics provide valuable
insights into the sentiment and structure of the reviews, which can be used to
gain a better understanding of customer feedback.
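Two of these text features can be sketched with pandas string methods. The column names are assumptions, and treating review length as a character count is also an assumption of this example; the polarity score would come from TextBlob and is not reproduced here.

```python
import pandas as pd

df = pd.DataFrame({"reviews": ["Great tuner", "Cable was noisy from day one"]})

# Number of whitespace-separated words in each review.
df["word_count"] = df["reviews"].str.split().str.len()

# Number of characters in each review, including letters and spaces (assumed feature).
df["review_len"] = df["reviews"].str.len()
```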
Fig. 3.17. Creation of extra Features
Fig. 3.19. Review Rating Distribution
7. Review Text Word Count Distribution: Upon analyzing the reviews, it was
found that the majority of the reviews contained between 0 and 200 words. This
was evidenced by a right-skewed distribution of word count, with fewer reviews
containing higher numbers of words. This finding suggests that customers
generally preferred to provide concise feedback, with only a small percentage
of customers taking the time to write lengthy reviews. Businesses could use this
information to improve their customer feedback mechanisms by encouraging
customers to provide more detailed feedback, perhaps by offering incentives or
rewards for more extensive reviews. Additionally, this data could be used to
improve the analysis of customer feedback by identifying common themes and
sentiments expressed within concise reviews.
8. N-gram Analysis: As part of our deep text analysis, we utilize N-grams to gain
further insight into the sentiment expressed in customer reviews.
a. Monogram Analysis: Here, we plot the single words that appear most frequently
in reviews for each sentiment. As can be seen, single words rarely reflect the
sentiment; one word on its own is not enough to convey a
sentiment. So let's try using two-word combinations that are used frequently.
b. Bigram Analysis: Here, we plot the two-word sequences that appear most frequently in
reviews for each sentiment. As seen below, the bigrams give a clearer idea of the
sentiments.
c. Trigram Analysis: Here, we'll map the three words that appear most frequently
in reviews based on sentiments.
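The counting behind the monogram, bigram, and trigram plots can be sketched with the standard library. The helper name top_ngrams and the toy reviews are assumptions for illustration; the project may have used a different tool to count n-grams.

```python
from collections import Counter


def top_ngrams(texts, n, k=3):
    """Return the k most frequent n-grams (as strings) across the texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of n tokens over the review.
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


reviews = ["works great", "works great for guitar", "not great"]
unigrams = top_ngrams(reviews, 1)
bigrams = top_ngrams(reviews, 2)
```

In this toy data the unigram "great" appears in all three reviews, including the negative one, whereas the bigrams "works great" and "not great" separate the two meanings.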
negative words and sentiments to gain a more comprehensive understanding of
the customer experience.
10. Wordcloud Neutral Reviews: After analyzing the text of neutral reviews, we
generated a word cloud to identify the most frequently occurring words. It
became apparent that the majority of neutral reviews focused on the product and
its potential areas of improvement. Customers often used words related to
functionality, usability, and performance when discussing the product in neutral
reviews. While these reviews did not express either positive or negative
sentiment, they still provided valuable feedback to businesses on how they
could improve their product or service to better meet customer needs and
expectations. By identifying common themes and sentiments expressed in
neutral reviews, businesses can make data-driven decisions to improve their
products and services, and ultimately provide a better customer experience.
Fig. 3.24. Word Cloud for Neutral Review
11. Word cloud Negative Reviews: Upon analyzing the reviews, it is apparent that
certain negative words frequently appear in the reviews. These negative words
include "noisy," "didn't," "noise," "wasn't," "snap," "problems," "tension," and
others. These words suggest that customers have experienced issues with the
product or service provided. These negative sentiments can be used to identify
areas that require improvement to enhance customer satisfaction. By analyzing
the frequency and context of these negative words, we can gain a better
understanding of the issues and take appropriate actions to address them.
Fig. 3.25. Code Screenshot for Word cloud Negative Reviews
D. Extracting features from Cleaned reviews
Fig. 3.27. Target variable-sentiment encoding
2. Stemming Reviews: Stemming is one way to obtain the root word from an inflected
term. Here, we extract the terms from the reviews and reduce them to their
root words, for instance, going to go and finally to fina. Note that the root
words do not necessarily need to have a semantic meaning. Another method is
lemmatization, which reduces words to root words that each
have a semantic meaning. Since lemmatization takes more time, we use stemming. As
a machine cannot understand words or their sentiment, we must translate them
into numbers. This is how a line now appears. We use TF-IDF to encode it.
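Stemming a review can be sketched with nltk as follows. The text does not name the stemmer that was used, so the choice of PorterStemmer here is an assumption, and other stemmers may produce different root forms.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_review(text):
    """Replace every whitespace-separated word with its Porter stem."""
    return " ".join(stemmer.stem(word) for word in text.split())


stemmed = stem_review("going playing strings")
```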
analyzing a text about search engines or document retrieval. Additionally, the
top 5000 words from the reviews are used for analysis. Limiting the vocabulary in this
way is a common technique in natural language processing to reduce noise: by
focusing only on the most frequent words, we can increase the signal-to-noise
ratio in our analysis, which can lead to more accurate and meaningful results.
4. We can validate that the TF-IDF matrix has 5000 columns because we selected the
top 5000 words from the reviews. This reduces the
dimensionality of the matrix while still focusing on the most
informative words in the corpus. The TF-IDF weight of each word represents
its significance to the document relative to the corpus as a whole, and by using the top
5000 words, we can extract meaningful insights from the data while reducing
computational complexity.
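A capped TF-IDF encoding can be sketched with scikit-learn's TfidfVectorizer, whose max_features parameter keeps only the most frequent terms. The toy reviews are made up, and any other vectorizer settings used in the project are not shown in the text, so defaults are assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stemmed reviews standing in for the cleaned review column.
reviews = ["great guitar string", "noisi cabl", "great sound great price"]

# Keep at most the 5000 most frequent terms across the corpus.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(reviews)

# One row per review, one column per retained term.
n_docs, n_features = X.shape
```

On the full review corpus the vocabulary exceeds the cap, which is why the resulting matrix has exactly 5000 columns; on this toy corpus it has one column per distinct word.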
5. Handling Imbalance target feature-SMOTE: We saw that the target feature contained
far more positive reviews than negative or neutral ones.
In such a situation, it is imperative to balance the classes. In this case, SMOTE
(Synthetic Minority Oversampling Technique) is used to address the unbalanced
dataset problem. It aims to balance the distribution of classes by increasing the
number of minority class samples with synthetic records rather than by simply
duplicating existing ones. SMOTE combines
existing minority instances to produce new minority instances: it uses linear
interpolation to create virtual training records for the minority class. Each
synthetic record is generated between a minority class example and one of its
k-nearest neighbors chosen at random. The data is rebuilt
after the oversampling process and can then be passed to a variety of
classification algorithms. The screenshot of the same is visible in Figure 3.29.
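The interpolation step at the heart of SMOTE can be sketched with NumPy alone. This is a simplified illustration of the idea, not the library implementation the project would have called; the function name smote_like_oversample and its parameters are hypothetical.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                   # random minority example
        nb = neighbours[base, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                       # position along the connecting segment
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic


# Four minority points at the corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_minority, n_new=6, k=2)
```

Every synthetic point lies on a line segment between two existing minority points, so the new records stay inside the region already occupied by the minority class.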
6. Train-Test Split: To train and evaluate the performance of our machine
learning model, we divided the dataset in a 75:25 ratio into train and
test sets. This was done using the train-test split function, which randomly splits
the dataset into two subsets: one for training the model and the other for testing
its performance. By reserving a portion of the dataset for testing, we can
evaluate how well the model generalizes to new, unseen data. This approach
helps to detect overfitting, where the model memorizes the training data and
performs poorly on new data. By using a train-test split, we can check that
our machine learning model accurately
predicts the sentiment of customer reviews.
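A 75:25 split can be sketched with scikit-learn's train_test_split. The toy data and the random_state value are assumptions for illustration; the text does not state which seed, if any, was used.

```python
from sklearn.model_selection import train_test_split

# Eight toy samples with alternating binary labels.
X = [[i] for i in range(8)]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# test_size=0.25 reserves a quarter of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```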
E. Model Building
In the field of machine learning, one of the most critical tasks is to classify data
into different categories or classes. The dataset
was split into a training set and a test set using the train test split function, with
a 75:25 ratio for the training and test sets, respectively.
Several machine learning methods have been developed to classify data into
multiple categories, and these have different strengths and weaknesses. In this
context, a variety of methods have been applied, including multinomial Naive
Bayes, Gaussian Naive Bayes, Bernoulli Naive Bayes, logistic regression,
decision trees, support vector machines, and k-nearest neighbors.
ii. Gaussian Naive Bayes: Gaussian Naive Bayes can provide fast and accurate
classification results. However, it is important to note that Gaussian Naive Bayes [25] may not
perform well when the data does not follow a Gaussian distribution, or when
the features are highly correlated. In such cases, other machine learning
algorithms may be more appropriate. Additionally, it is important to carefully
select and preprocess the features before applying any machine learning
algorithm, including Gaussian Naive Bayes.
iii. Bernoulli Naive Bayes: This approach can be useful in tasks such as text
classification, where the presence or absence of a certain word in a document
can be represented as a binary value. In Bernoulli Naive Bayes, each feature is
assumed to be independent and the probability of a class given a set of features
is calculated using Bayes' theorem. Like Gaussian Naive Bayes, the parameters
of the Bernoulli distribution (i.e., the probability of each feature taking on the
value 1 or 0) are estimated from the training data. However, it is important to
note that Bernoulli Naive Bayes may not perform well when the features are not
binary or when the features are not independent. In such cases, other machine
learning algorithms may be more appropriate.
vi. Random Forest: Random Forest is a machine learning algorithm that
combines multiple decision trees to produce more accurate predictions. It is
useful for both classification and regression problems, and reduces overfitting
while handling high-dimensional data. By aggregating the outputs of multiple
trees, Random Forest[25] improves the accuracy and robustness of the model,
and is widely used in various domains.
Fig. 3.33. Dataset features
Model Selection: We first apply cross validation to choose the model that
performs the best. The model selection procedure is run over
all the classification methods.
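Cross-validated model selection can be sketched with scikit-learn's cross_val_score. The synthetic data from make_classification, the three candidate models, the five folds, and the accuracy metric are assumptions made to keep the example small; the project compared more classifiers on the TF-IDF features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix and sentiment labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 folds for every candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)
```

Averaging over folds gives a more stable comparison than a single train-test split, since every sample is used for validation exactly once.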
The findings show that logistic regression outperformed the other
methods, and all of the accuracy levels are more than 80%.
Consequently, we use logistic regression with hyperparameter tuning.
This code as shown in Figure 3.36 takes a list of sentences and converts them
into sequences of numerical indices using the previously created tokenizer
object. It then pads the sequences to a maximum length using the
"pad_sequences" function from Keras. Padding ensures that all sequences have
the same length, which is necessary for feeding the data into a neural network.
The resulting padded sequences are printed to the console along with their
shape. This code is useful for preparing text data for use in machine learning
models such as neural networks for sentiment analysis or text classification.
The below code declares several variables that will be used to configure the
neural network for training on text data.
● "vocab_size" specifies the maximum number of words to keep based on word
frequency.
● "embedding_dim" is the dimension of the dense embedding space where the
words will be represented as vectors.
● "max_length" is the maximum length of the padded sequences.
● "trunc_type" specifies how to handle sequences that exceed the specified length,
in this case, by truncating them at the end of the sequence.
● "padding_type" specifies where to add padding to the sequences, in this case, at
the end of the sequence.
● "oov_tok" is the out-of-vocabulary token used to represent words that are not in
the tokenizer's vocabulary.
● "training_size" is the number of training samples to use when training the neural
network.
These variables are important for configuring the neural network architecture
and preprocessing the input text data to ensure that it is in the appropriate format
for training.
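For orientation, the variables described above might be declared along the following lines. Every value here is a placeholder chosen for illustration; the settings actually used in the project are not stated in the text and should not be inferred from this sketch.

```python
# Placeholder values for illustration only (assumptions, not the report's settings).
vocab_size = 10000      # keep at most this many of the most frequent words
embedding_dim = 16      # size of each word vector in the embedding space
max_length = 100        # every padded sequence has exactly this many indices
trunc_type = "post"     # cut overly long sequences at the end
padding_type = "post"   # add padding at the end of short sequences
oov_tok = "<OOV>"       # token standing in for out-of-vocabulary words
training_size = 20000   # number of samples used for training

print(vocab_size, embedding_dim, max_length, trunc_type, padding_type,
      oov_tok, training_size)
```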
Fig. 3.37. Variable Configurations
This code block, shown in Figure 3.38, preprocesses the text data for use in a
neural network model for natural language processing tasks such as sentiment
analysis or text classification. The text is tokenized: each word is mapped to
a numerical index by a dictionary-like structure called a tokenizer. The
tokenizer is initialized with a maximum vocabulary size and an
out-of-vocabulary token for words not in the vocabulary, and is then fitted on
the training sentences to build a word-index dictionary. The training and
testing sentences are converted into sequences of numerical indices using this
dictionary and padded to a fixed length with the Keras "pad_sequences"
function. The resulting data is in a suitable format for training and
evaluating a neural network model. Splitting the data into training and testing
sets and preprocessing the text are both needed so that the model can learn the
underlying patterns in the data and be evaluated on unseen data.
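The tokenization step can be illustrated without Keras by a simplified pure-Python sketch. The helpers fit_word_index and texts_to_sequences below are hypothetical stand-ins that imitate the behaviour described above (index 0 reserved for padding, index 1 for the out-of-vocabulary token, remaining indices assigned by word frequency); they are not the Keras Tokenizer itself, and the sentences are invented.

```python
from collections import Counter


def fit_word_index(sentences, vocab_size, oov_tok="<OOV>"):
    """Build a word -> index dictionary from the training sentences only."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Most frequent first; ties broken alphabetically to keep the result stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    word_index = {oov_tok: 1}  # index 0 is left free for padding
    for word, _ in ranked[:vocab_size - 2]:
        word_index[word] = len(word_index) + 1
    return word_index


def texts_to_sequences(sentences, word_index, oov_tok="<OOV>"):
    """Replace each word by its index, or by the OOV index if it is unknown."""
    oov = word_index[oov_tok]
    return [[word_index.get(w, oov) for w in s.lower().split()]
            for s in sentences]


train_sentences = ["good product good price", "bad product"]
test_sentences = ["good strings bad tuner"]

word_index = fit_word_index(train_sentences, vocab_size=5)
train_sequences = texts_to_sequences(train_sentences, word_index)
test_sequences = texts_to_sequences(test_sentences, word_index)

print(word_index)
print(train_sequences)
print(test_sequences)
```

Because the dictionary is built from the training sentences alone, any word that appears only in the test set is mapped to the out-of-vocabulary index, which mirrors how the model will meet unseen words in practice.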
F. Model Testing
As shown in Figure 3.39, 75% of the dataset was used for training and the
remaining 25% for testing. During training, the model is fed the encoded
reviews and their corresponding sentiments so that it can learn the patterns
and relationships within the data. The held-out 25% is used to evaluate the
model's performance and to measure its accuracy in predicting the sentiment of
new, previously unseen reviews.
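A 75/25 split of this kind can be sketched in plain Python as follows. The helper name, the fixed seed, and the toy data are assumptions made for illustration; a library routine such as scikit-learn's train_test_split would normally be used instead.

```python
import random


def split_75_25(samples, labels, seed=42):
    """Shuffle the data once, then keep 75% for training and 25% for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    cut = int(len(indices) * 0.75)
    train_idx, test_idx = indices[:cut], indices[cut:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Eight hypothetical reviews with their sentiment labels.
samples = [f"review {i}" for i in range(8)]
labels = [1, 0, 1, 1, 0, 0, 1, 0]

x_train, x_test, y_train, y_test = split_75_25(samples, labels)
print(len(x_train), "training samples,", len(x_test), "testing samples")
```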
The results of the training and testing process are analyzed in the next
chapter, which examines how accurately the model predicts sentiment from the
review text. The analysis compares the predicted sentiment with the actual
sentiment of each review and reports metrics such as precision, recall, and F1
score. It also identifies areas where the model could be improved to achieve
better performance.
Fig. 3.39. Model Testing
The code shown in Figure 3.41 trains and evaluates the previously defined
neural network model for text classification. The model is trained on the
preprocessed training data for 50 epochs, with the training and validation
accuracy tracked and displayed during training. After training, the model is
evaluated on the preprocessed testing data using the "evaluate" method, which
computes the loss and accuracy metrics. Finally, the accuracy is printed to the
console as a percentage. This allows us to assess the performance of the model
on unseen data and determine whether it has learned the underlying patterns in
the data.
Chapter 04: Experiments and Results Analysis
4.1 Performance Parameters and Analysis
The following parameters have been used to assess the prediction results:
Precision: The fraction of the points predicted as positive by the model that
are actually positive.
Accuracy: The fraction of all points that the model classifies correctly.
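These definitions can be checked on a small example. The sketch below uses the standard formulas for a binary problem and also computes recall and F1, which are referred to elsewhere in this report; the labels are invented for illustration.

```python
# Hypothetical true and predicted labels; 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)            # correct among predicted positives
recall = tp / (tp + fn)               # found among actual positives
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true)    # correct among all points

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```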
Fig. 4.1. Logistic Regression Classifier
2  K Nearest Neighbors       87 %  86 %  86 %  86 %
3  SVC                       87 %  87 %  87 %  88 %
4  Bernoulli Naive Bayes     80 %  81 %  80 %  80 %
5  Multinomial Naive Bayes   85 %  85 %  86 %  86 %
6  Decision Tree             82 %  81 %  81 %  81 %
ROC-AUC Curve: The AUC (Area Under the Curve) ROC (Receiver Operating
Characteristics) curve is used to assess and visualize the performance of a
classification model, and it is one of the most important evaluation criteria
for any classifier. It is also written AUROC (Area Under the Receiver Operating
Characteristics). The higher the AUC, the better the model is at classifying
class 0 as 0 and class 1 as 1. For example, in a medical setting, a higher AUC
means the model is better at distinguishing individuals who have a condition
from those who do not. For a multi-class problem, one curve is typically drawn
per class by treating that class as positive and the rest as negative. The ROC
curve plots TPR against FPR, with FPR on the x-axis and TPR on the y-axis.
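The construction of the curve can be sketched in plain Python. The example below sweeps the decision threshold over invented scores for a binary problem and integrates the resulting curve with the trapezoidal rule; the function names are hypothetical, and a library routine would normally be used in practice.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc_value = area_under_curve(points)
print(points)
print("AUC:", auc_value)
```

Each threshold yields one point on the curve, which is why the choice of threshold amounts to picking a trade-off between TPR and FPR.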
Hyperparameter tuning was carried out to maximize the model's accuracy while
avoiding overfitting, which is a common challenge in machine learning. The
results are depicted in Figure 4.2: after tuning, the accuracy of the logistic
regression classifier increased to 94%.
This accuracy indicates that the model can predict the sentiment of new reviews
reliably. The tuning process also identified the best hyperparameter settings
for the logistic regression model, which can be reused in future sentiment
analysis projects. Overall, hyperparameter tuning was a crucial step in
improving the effectiveness of the logistic regression model for sentiment
analysis.
Next, we plot a confusion matrix and the ROC curve, and check the F1 score,
which is derived from precision and recall, as shown in Figure 4.3.
The confusion matrix is plotted with the true label on one axis and the
predicted label on the other. The darker diagonal elements are the correctly
predicted records; the remaining off-diagonal elements are the misclassified
records.
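How such a matrix is filled in can be shown with a short sketch. The three class indices and the labels below are invented for illustration, and the mapping of indices to negative, neutral, and positive is an assumption rather than the encoding used in the project.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Assumed encoding for illustration: 0 = negative, 1 = neutral, 2 = positive.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
correct = sum(matrix[i][i] for i in range(3))  # sum of the diagonal

for row in matrix:
    print(row)
print("correctly classified:", correct, "of", len(y_true))
```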
All three classes (positive, negative, and neutral reviews) are taken into
consideration, because it is crucial to predict both favorable and unfavorable
reviews correctly. The model achieved a good F1 score. As can be seen in Figure
4.4, it scored highly on recall, precision, and accuracy in all categories.
Fig. 4.5. ROC Curve code screenshot Part 1
Fig. 4.6. ROC Curve code screenshot Part 2
The ROC curve displayed in Figure 4.8 gives insight into the classification
performance of the model. Classes 0 and 2 are classified accurately, as
indicated by their high area under the curve. To obtain a good trade-off
between the true positive rate (TPR) and the false positive rate (FPR), any
threshold value between 0.6 and 0.8 can be chosen. The exact choice depends on
the requirements of the classification task.
Furthermore, the macro-average does not perform as well as the micro-average.
The micro-average gives equal weight to each sample, whereas the macro-average
gives equal weight to each class by averaging the per-class scores. A higher
micro-average therefore suggests that the model performs better on the majority
classes than on the minority classes. Overall, the ROC curve provides a clear
picture of the model's classification performance and supports an informed
choice of threshold for a given task.
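The difference between the two averages can be seen on a small imbalanced example. The sketch below uses recall rather than ROC scores to keep the arithmetic short, and the labels are invented; it is meant only to illustrate why a micro-average above the macro-average points to weaker performance on the less frequent classes.

```python
def per_class_recall(y_true, y_pred, classes):
    """Recall of each class: correctly predicted samples / samples of that class."""
    recalls = {}
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / support
    return recalls


# Hypothetical imbalanced data: class 0 is the majority class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 2, 0]

recalls = per_class_recall(y_true, y_pred, classes=[0, 1, 2])

# Macro: every class counts equally. Micro: every sample counts equally.
macro_recall = sum(recalls.values()) / len(recalls)
micro_recall = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print("per-class recall:", recalls)
print(f"macro={macro_recall:.3f} micro={micro_recall:.3f}")
```

Here the majority class is recognized perfectly while the two smaller classes are not, so the micro-average (0.8) exceeds the macro-average (about 0.667).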
Chapter 05: Conclusion
5.1 Conclusion
Sentiment analysis is a popular technique for extracting insights from text data
in eCommerce platforms, including comments, reviews, feedback, and tweets.
The use of emoticons, ratings, and reviews helps to convey user opinions, and
this information can be used by customers to make informed purchasing
decisions.
5.2 Applications
product. Customers can easily identify the overall rating or impact of the
product based on the reviews, while sellers can analyze the response generated
by the reviews of their product. In the case of musical instruments, sentiment
analysis helps to understand how users feel about a particular product, and what
aspects they like or dislike.
In today's world, where everyone seeks out other people's opinions in order to
learn from their experiences, the importance of reviews has increased
significantly. However, customers hardly ever read all of the reviews available,
and sentiment analysis plays a crucial role in interpreting them. By analyzing
the sentiments expressed in reviews, customers can make informed decisions
about their purchases, based on the opinions of others.
dataset focuses on Amazon mobile evaluations, it may also be used to analyze
Amazon reviews in general. A limitation of the study is that, because of the
restricted resources of a personal laptop, it was implemented in Google Colab
to speed up execution.
References
[1] AlQahtani, Arwa SM. "Product sentiment analysis for amazon reviews."
International Journal of Computer Science & Information Technology (IJCSIT)
Vol 13 (2021).
[5] S. Dey, S. Wasif and S. Sultana, "A Comparative Study of SVM and Naive
Bayes Classifier for Sentiment Analysis on Amazon Product Reviews",
International Conference on Contemporary Computing and Applications, 2020.
[11] S. Yoichi, and V. Klyuev. "Classifying user reviews at sentence and review
levels utilizing Naïve Bayes." 2019 21st International Conference on Advanced
Communication Technology (ICACT). IEEE, 2019.
[13] A. S. Ashour, and N. S. Alghamdi. "A comparison of sentiment analysis
methods on Amazon reviews of Mobile Phones." International Journal of
Advanced Computer Science and Applications 10.6 (2019).
[16] https://www.kaggle.com/datasets/eswarchandt/amazon-music-reviews
[17] https://github.com/BenRoshan100/Sentiment-analysis-Amazon-reviews
[19] B. Faltings, M. Boia, C.-C. Musat, and P. Pu, “A:) Is worth a thousand
words: How people attach sentiment to emoticons and words in tweets,” in 2013
International Conference on Social Computing, pp. 345–350, 2013.
[24] T. Finin and J. Marineau, “Delta TFIDF: An improved feature space for
sentiment analysis,” in Proc. AAAI International Conference on Weblogs and
Social Media, 2009.
[26] E. Lunando and A. Purwarianti, “Indonesian Social Media Sentiment
Analysis With Sarcasm Detection,” In: Int. Conf. Adv. Comput. Sci. Inf. Syst.
ICACSIS, pp. 195-198. September (2013).
[30] P. Mandal and R. Mahto, “Deep CNN-LSTM with Word Embeddings for
News Headline Sarcasm Detection” (2019).
Appendix
Fig. 7.3. Response of the data request
Fig. 7.4. Setting definite path for each product request
Fig. 7.5. Generating product page link using html tags on amazon webpage
Fig. 7.6. Generating the Review page link for all products
Fig. 7.7. Appending the extracted information into a list