
PREDICTING THE REVIEWS OF THE RESTAURANT USING NATURAL LANGUAGE PROCESSING TECHNIQUE

ABSTRACT
In the era of the web, a huge amount of information flows over the network. Since web content covers subjective opinions as well as objective information, it is now common for people to gather information online about products and services they want to buy. However, since a considerable amount of this information exists as text fragments without any kind of numerical scale, it is hard to classify the evaluations they express without reading the full text. Here we focus on extracting scored ratings from text fragments on the web and suggest various experiments to improve the quality of a classifier.

EXISTING SYSTEM
Many researchers have previously experimented with classifying customer sentiment on different datasets. For example, Turney (2002) used a semantic orientation algorithm to classify reviews based on the numbers of positively oriented and negatively oriented phrases in each review. Pang et al. (2002) used machine learning tools such as Naïve Bayes, Maximum Entropy, and Support Vector Machine (SVM) classifiers to classify movie reviews using a number of simple textual features.

DISADVANTAGES

• This type of classification only works when the classifier operates on binary data, which is not the case with restaurant reviews.

• From a practical point of view, perhaps the most serious problem with SVMs is the high algorithmic complexity and the extensive memory requirements of the quadratic programming involved in large-scale tasks.

• If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the "Zero Frequency" problem; a small sketch illustrating it follows this list.
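To make the "Zero Frequency" issue concrete, here is a minimal Python sketch (not part of the original project; the word counts, vocabulary size, and the Laplace add-one smoothing shown as a remedy are illustrative assumptions) of how a word unseen in training forces a Naïve Bayes class probability to zero:

# Minimal sketch of the zero-frequency problem in a bag-of-words Naive Bayes.
# The counts below are invented purely for illustration.
train_counts = {"good": 3, "tasty": 2}   # word counts observed in one class
total = sum(train_counts.values())
vocab_size = 4                           # assumed vocabulary size

def word_prob(word, alpha=0.0):
    # P(word | class), with Laplace (add-one) smoothing when alpha=1.
    return (train_counts.get(word, 0) + alpha) / (total + alpha * vocab_size)

print(word_prob("bland"))            # 0.0 -> any review containing "bland" scores zero
print(word_prob("bland", alpha=1))   # ~0.11 -> smoothing keeps the prediction usable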
PROPOSED SYSTEM

Our proposed system applies natural language processing techniques to classify a set of restaurant reviews based on the number of stars that each review received. We develop a maximum entropy classifier to categorize each review from 1 star to 5 stars. We implement a set of features that we believe to be relevant to the sentiment expressed in reviews and analyze their effect on performance, providing insights into what works and why sentiment categorization can be so difficult. We analyze how a review's conformance to a particular language model can be affected by the sentiment of the review. We experiment with different linguistically motivated models of sentiment expression, again using the results to improve the performance of our classifier. We also examine the effects of part-of-speech tagging on our ability to predict sentiment.

We experimented with different methods of preprocessing the data. Because the reviews are unstructured user input, a review can look like anything from a paragraph of well-formatted text to a jumble of seemingly unrelated words to a run-on sentence with no apparent regard for grammar or punctuation. Our initial pass over the data simply tokenized the reviews on whitespace and treated each token as a unigram, but we were able to improve performance by removing punctuation in addition to the whitespace and converting all letters to lowercase. In this way, we treat occurrences of "good", "Good", and "good." as the same token, which gives better predictive power on any test-set review containing any of these three forms. Before converting to unigrams, stemming was also applied, so that the various inflected forms of a word (tenses, plurals, and so on) are reduced to and counted as a single word. After the document-term matrix is built, infrequent terms are removed by setting a threshold in order to improve accuracy, so the matrix includes only the relevant unigrams and bigrams that occur more than the threshold number of times.
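The following is a rough sketch of such a pipeline, not the project's actual code. It assumes scikit-learn and NLTK are available, uses CountVectorizer with min_df standing in for the frequency threshold over unigrams and bigrams, and uses multinomial logistic regression (equivalent to a maximum entropy classifier); the reviews and star labels are placeholders.

# Sketch of the described pipeline: lowercase, strip punctuation, stem,
# build a frequency-thresholded unigram/bigram matrix, and train a
# maximum entropy (multinomial logistic regression) classifier.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

stemmer = PorterStemmer()

def preprocess(review):
    # Lowercase, remove punctuation, split on whitespace, and stem each token.
    review = re.sub(r"[^a-z\s]", " ", review.lower())
    return " ".join(stemmer.stem(token) for token in review.split())

reviews = ["The food was good.", "Terrible service, bland food!", "Good, really good pasta."]
stars = [4, 1, 5]   # placeholder star ratings

# min_df plays the role of the frequency threshold; ngram_range keeps
# both unigrams and bigrams in the matrix.
vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform([preprocess(r) for r in reviews])

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, stars)

print(classifier.predict(vectorizer.transform([preprocess("really good food")])))

On a real dataset, min_df would be raised so that only unigrams and bigrams occurring more than the chosen threshold survive in the matrix.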

ADVANTAGES

• Good at pattern recognition problems
• Data-driven, with high performance on many problems
• End-to-end training: little or no domain knowledge is needed in system construction
• Learned representations: cross-modal processing is possible
• Gradient-based learning: the learning algorithm is simple
• Mainly supervised learning methods
ARCHITECTURE

ALGORITHMS

NATURAL LANGUAGE PROCESSING

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence focused on enabling computers to understand and process human languages, bringing them closer to a human-level understanding of language. Recent advances in Machine Learning (ML) have enabled computers to do quite a lot of useful things with natural language: deep learning has made it possible to write programs that perform language translation, semantic understanding, and text summarization. Since text is the most unstructured form of all available data, various types of noise are present in it, and the data is not readily analysable without pre-processing. The entire process of cleaning and standardizing text, making it noise-free and ready for analysis, is known as text pre-processing. To analyse pre-processed data, it needs to be converted into features. Depending on the usage, text features can be constructed using assorted techniques: syntactical parsing, entities / n-grams / word-based features, statistical features, and word embeddings. Text classification, in simple terms, is a technique to systematically assign a text object (a document or sentence) to one of a fixed set of categories. It is especially helpful when the amount of data is too large to handle manually, particularly for organizing, information filtering, and storage purposes.
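As a brief, hedged illustration of these pre-processing and feature-construction steps (again assuming scikit-learn; the example sentences are invented), the sketch below removes noise from raw text and builds n-gram and simple statistical features:

# Sketch of text pre-processing and feature construction: noise removal,
# n-gram features via TF-IDF, and one simple statistical feature (token count).
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["The soup was AMAZING!!!", "waited 40 mins... never again", "Decent food, rude staff."]

def clean(text):
    # Lowercase and keep only letters and spaces (basic noise removal).
    return re.sub(r"[^a-z\s]", " ", text.lower()).strip()

cleaned = [clean(d) for d in docs]

# N-gram / word-based features: TF-IDF over unigrams and bigrams,
# with common English stop words removed as part of standardization.
tfidf = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
ngram_features = tfidf.fit_transform(cleaned)

# A simple statistical feature: the number of tokens in each cleaned document.
token_counts = np.array([len(c.split()) for c in cleaned])

print(ngram_features.shape, token_counts)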

SYSTEM REQUIREMENTS

SOFTWARE REQUIREMENTS:
• OS : Windows
• Python : 2.7.x and above (setup tools and pip to be installed for 3.6.x and above)
• IDE : PyCharm IDE

HARDWARE REQUIREMENTS:
• RAM : 4 GB and higher
• Processor : Intel i3 and above
• Hard Disk : 500 GB minimum
