Opinion mining is the process of identifying users' opinions
about movies, hotels, and products from reviews. It classifies
a user's expressed opinion into positive or negative polarity.
Opinion summarization is the process of representing review
information in a short, summarized form. It also involves
selecting important aspects and representing the related
opinions expressed in user reviews.
[Figure: Opinion Summarization module. Positive reviews / Negative reviews, Sentence scoring (Relevance score), Summary generation.]
III. OVERVIEW AND ARCHITECTURE OF SYSTEM

Opinion mining is the process of identifying users' opinions about a
particular topic from reviews. It classifies a review text as having
positive or negative opinion polarity. Opinion summarization is the
process of finding the most important aspects of a topic, together with
the related opinion sentences from reviews, and representing them as a
summary. Our proposed architecture, which performs opinion mining and
summarization of hotel reviews, is shown in Figure 1.

The proposed system consists of three modules: review text retrieval,
classification, and summarization. Hotel reviews are retrieved from
review websites such as www.tripadvisor.com by web crawling techniques.
Review text is classified as a positive or negative review using
machine learning classifiers or a SentiWordNet based algorithm. The
classified review text is pre-processed and sentence scores are
calculated. Finally, the most informative and context-relevant
sentences are presented in the summary.

A. Opinion mining of hotel reviews

Opinion mining is a text classification problem in which a review text
document is classified as a positive or negative opinion review. Either
a machine learning approach [4] [11] or a resource based approach
[6] [7] can be used for opinion classification of review text.

1) Machine learning approach

The machine learning approach is supervised text classification, where
algorithms are trained on sample labeled review text to build a
classifier model. The trained classifier model is then used to
categorize new test reviews. We trained and tested machine learning
classifiers such as Naive Bayes, Support Vector Machine and Decision
Tree. The resulting classifier models are evaluated for their accuracy.

2) SentiWordNet based method

SentiWordNet [10] is one of the popular lexical resources for sentiment
analysis. It is a database that contains words with polarity scores for
positive or negative sentiment based on their part of speech. By
calculating the scores of the different words in a review, we obtain
the polarity of the review text. We also need to handle negation words
such as ‘not’, ‘don’t’, ‘never’, etc. that affect the polarity of a
sentence. If S is the polarity of a document, then for a word w the
sentiment score is calculated by a function sentiment(w), and a
function modifier(w) handles word position or negation in the
sentence [2]. The general equation to identify the sentiment polarity
of a review text document D is [2]

$$S(D) = \frac{\sum_{w \in D} \mathrm{sentiment}(w) \cdot \mathrm{modifier}(w)}{\sum_{w \in D} \mathrm{sentiment}(w)} \qquad (2)$$

B. Opinion summarization of hotel reviews

For opinion summarization of reviews we use the sentence extraction
method. Sentence extraction is an approach to text summarization in
which sentences are filtered and the most informative ones are selected
from the document to represent a summary. To summarize a large number
of reviews, we select those sentences which are informative and most
relevant to the context. Review sentences are scored and sorted based
on relevance score [6]. The relevance score is calculated by the
following formula, proposed by Lloret et al. [6].

$$RS = tf_w \qquad (3)$$

Number of reviews, training   1000 (500 positive, 500 negative)
Number of …                   1000 (500 positive, 500 negative)

Classifier             NB      SVM     Decision Tree   SentiWordNet method
Correctly classified   880     835     784             876
Accuracy               88%     83.5%   78.4%           87.6%
Precision              0.895   0.844   0.786           0.90
Recall                 0.890   0.835   0.784           0.876
F-measure              0.879   0.834   0.784           0.888

This paper presented opinion mining and summarization of hotel reviews
on the web. For opinion classification of hotel reviews we used machine
learning algorithms and a SentiWordNet based method. We obtained about
87% accuracy in classifying hotel review text as positive or negative
opinion polarity with the machine learning Naïve Bayes algorithm. For
opinion summarization, we used a term frequency and relevance scoring
method to present the most informative sentences in the summary. This
classified and summarized review information will assist users in
decision making about hotels.
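The sentence extraction step of Section III-B can be illustrated with a minimal Python sketch. It assumes that the relevance score of a sentence is the sum of the term frequencies, counted over all reviews, of its non-stop-word tokens. This is a simplification chosen for illustration; the exact formula of [6] may differ, for example in normalization. The stop-word list, the sentence splitter and the function names are hypothetical and not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "is", "was", "were", "and", "of", "to", "in", "it"}


def tokenize(text):
    """Lowercase the text and return its word tokens without stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def split_sentences(review):
    """Naive splitter on '.', '!' and '?' (not the paper's pre-processing)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def summarize(reviews, top_n=2):
    """Return the top_n (sentence, score) pairs ranked by relevance score."""
    sentences = [s for review in reviews for s in split_sentences(review)]
    # Term frequency of every token over the whole review collection.
    tf = Counter(t for s in sentences for t in tokenize(s))
    # Assumed relevance score: sum of term frequencies of the sentence's tokens.
    scored = [(sum(tf[t] for t in tokenize(s)), i, s) for i, s in enumerate(sentences)]
    # Highest score first; ties keep the original sentence order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(s, score) for score, _, s in scored[:top_n]]


reviews = [
    "The room was clean. Staff were friendly.",
    "Clean room and great location.",
    "Breakfast was cold.",
]
summary = summarize(reviews, top_n=2)
```

Because the score is an unnormalized sum, longer sentences tend to rank higher; dividing by the number of tokens would be one way to counter this. In the proposed system such scoring would be applied separately to the reviews classified as positive and as negative.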

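The SentiWordNet based method of Section III-A can likewise be sketched in a few lines of Python. The sketch makes several assumptions that are not in the paper: a small hand-made lexicon with invented scores stands in for SentiWordNet, sentiment(w) is taken as the positive score minus the negative score, modifier(w) is -1 when a negation word occurs within the two preceding tokens and +1 otherwise, and the weighted sum is normalized by the sum of absolute word scores so that the result lies in [-1, 1].

```python
import re

# Hypothetical miniature lexicon standing in for SentiWordNet:
# word -> (positive score, negative score). The values are invented.
LEXICON = {
    "good": (0.75, 0.0),
    "clean": (0.5, 0.0),
    "friendly": (0.625, 0.0),
    "bad": (0.0, 0.625),
    "dirty": (0.0, 0.75),
    "noisy": (0.0, 0.5),
}

# Negation words named in the text.
NEGATIONS = {"not", "don't", "never"}


def sentiment(word):
    """Signed word score: positive minus negative (an assumed definition)."""
    pos, neg = LEXICON.get(word, (0.0, 0.0))
    return pos - neg


def polarity(text, window=2):
    """Normalized polarity in [-1, 1]; 0.0 if no lexicon word occurs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    weighted, total = 0.0, 0.0
    for i, word in enumerate(tokens):
        score = sentiment(word)
        if score == 0.0:
            continue
        # modifier(w): flip the sign if a negation word precedes w closely.
        negated = any(t in NEGATIONS for t in tokens[max(0, i - window):i])
        modifier = -1.0 if negated else 1.0
        weighted += score * modifier
        total += abs(score)  # assumed normalization by absolute scores
    return weighted / total if total else 0.0


def classify(text):
    """Label a review; a score of exactly 0 is arbitrarily treated as positive."""
    return "positive" if polarity(text) >= 0 else "negative"
```

For instance, `classify("The room was not clean")` yields "negative" because the negation flips the positive score of "clean". A real implementation would look up scores by part of speech in SentiWordNet itself rather than in a fixed dictionary.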