


2013 IEEE 14th International Conference on Mobile Data Management

Opinion Mining on Social Media Data


Po-Wei Liang
Department of Computer Science and Information Engineering,
National Taiwan University of Science and Technology, Taipei, Taiwan, ROC.
Email: M10015085@mail.ntust.edu.tw

Bi-Ru Dai
Department of Computer Science and Information Engineering,
National Taiwan University of Science and Technology, Taipei, Taiwan, ROC.
Email: brdai@csie.ntust.edu.tw

Abstract—Microblogging (on Twitter or Facebook) has become a very popular communication tool among Internet users in recent years. Information is generated and managed through either computers or mobile devices by one person and is consumed by many others, with most of this user-generated content being textual information. Since there is a great deal of raw data from people posting real-time messages about their opinions on a variety of daily-life topics, it is a worthwhile research endeavor to collect and analyze these data, which may, for example, help users or managers make informed decisions. This problem is challenging, however, because a microblog post is usually very short and colloquial, and traditional opinion mining algorithms do not work well on this type of text. Therefore, in this paper, we propose a new system architecture that can automatically analyze the sentiments of these messages. We combine this system with manually annotated data from Twitter, one of the most popular microblogging platforms, for the task of sentiment analysis. In this system, machines can learn how to automatically extract the set of messages which contain opinions, filter out non-opinion messages, and determine their sentiment directions (i.e., positive or negative). Experimental results verify the effectiveness of our system on sentiment analysis in real microblogging applications.

Keywords—Microblogging; Sentiment analysis; Opinion Mining

I. INTRODUCTION

Microblogging websites have evolved to become a source of a diverse variety of information, with millions of messages appearing daily on popular websites. Users can post real-time messages about their lives, share opinions on a variety of topics, and discuss current issues on these microblogging websites. Product reviewing has been growing rapidly in recent years because more and more products are sold on the Web. The large number of reviews allows customers to make informed decisions on product purchases. However, it is difficult for product manufacturers or businesses to keep track of customer opinions and sentiments on their products and services. In order to enhance the customer shopping experience, a system is needed to help people analyze the sentiment content of product reviews.

The power of social media as a marketing tool has been recognized, and is being actively taken advantage of by people, governments, major corporations, and schools. Twitter is perhaps the most popular microblogging website, where users create status messages called tweets: short status updates and musings from Twitter's users that must be written in 140 characters or less. Tweets containing opinions are important because whenever people need to make a decision, they want to hear others' opinions, and the same is true for organizations. However, many real-life applications require very detailed analyses in order to gather information from, for example, a product review, whose data could help users or managers make important product-related decisions. This approach is also being actively employed by governments and companies to collect and analyze feedback on their policies or products.

Since most of the user-generated messages on microblogging websites are textual information, identifying their sentiments has become an important issue. Research in the field started with sentiment classification, which treated the problem as a text classification problem. Text classification using machine learning is a well-studied field [1], and there is ample research on the effects of using various machine learning techniques (Naive Bayes (NB), Maximum Entropy (ME), and Support Vector Machines (SVM)) [2]. After building and testing models using Naive Bayes, MaxEnt, and SVM, the authors of [2] reported that SVM showed the best performance. However, most of the traditional research has focused on classifying long texts, such as reviews [2]. Since microblogging messages are short and colloquial, traditional algorithms do not perform as well as they do on long texts. For this reason, there has been a great deal of research in recent years on sentiment classification targeting microblogging data [11,13]. Go et al. [11] focused on distant learning to acquire sentiment data, using tweets containing positive emoticons like :) and :-) to denote positive sentiment, and negative emoticons like :( and :-( for negative emotional content. However, there are many contradictions; for example, in "RT @MarieAFrnndz: The Twilight Saga is over :( I'm so sad, I'll miss it", the message's meaning is positive but the emoticon is negative, as the sketch below illustrates. In this paper, to overcome these challenges, we aim to design a system which automatically combines supervised learning, capable of extracting, learning, and classifying tweets, with opinion expressions. The basic idea is to use domain-specific training data to build a generic classification model from social media data to help improve performance. The experimental results demonstrate that the proposed system works well.
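To make this weakness concrete, the following is a minimal sketch of emoticon-based distant labeling (our illustration, not code from [11]); the function name and the abbreviated emoticon lists are assumptions.

# Minimal sketch of emoticon-based distant supervision (our illustration,
# not code from [11]); the emoticon lists are abbreviated.
POSITIVE_EMOTICONS = (":)", ":-)")
NEGATIVE_EMOTICONS = (":(", ":-(")

def label_by_emoticon(tweet):
    """Assign a sentiment label from emoticons alone; None if ambiguous."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # no emoticons, or contradictory ones

# The counter-example from the text: the emoticon is negative, but the
# message expresses a positive feeling about the movie, so distant
# supervision mislabels it.
tweet = "RT @MarieAFrnndz: The Twilight Saga is over :( I'm so sad, I'll miss it"
print(label_by_emoticon(tweet))  # -> "negative", although the sentiment is positive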

The remainder of this paper is organized as follows. We discuss several supervised learning algorithms and short text classification related to our work in Section 2. In Section 3, we introduce the proposed system architecture, and experimental evaluations are presented in Section 4. The conclusions are given in Section 5.

II. RELATED WORK

There has been a wide range of research on sentiment analysis, from rule-based, bag-of-words approaches to machine learning techniques. The two main research directions of opinion mining operate on either the document level [2,3,4] or the sentence level [5,6,7]. Both document-level and sentence-level classification methods are usually based on the identification of opinion words or phrases. For this, there are basically two types of approaches: (1) lexicon-based approaches, and (2) rule-based approaches. In the former, a lexicon table is built first, with each word in this table belonging to positive or negative evaluations. The counts of positive words and negative words are then calculated, or some formula which considers the distance between each opinion word and product feature is used to determine the semantic direction. In rule-based approaches, part-of-speech (POS) taggers are first used to tag each word, and then co-occurrence patterns of words and tags are found to determine the sentiments.

In this paper, however, we focus on microblogging data like Twitter, on which users post real-time reactions to and opinions about everything. It is important to note that there are differences between product reviews and messages on microblogs. The messages on microblogs are short, filled with colloquialisms, and often people do not care about the grammar of their messages. In light of these characteristics, the use of traditional methods of sentiment analysis will yield poor results. A number of recent approaches to sentiment analysis take this into account, such as sentiment classification that classifies opinion texts or sentences as positive, negative, or neutral [8-13].

III. SYSTEM ARCHITECTURE

In this paper, we determine the data's category first, because we assume that different domains are associated with different customary terms and expressions, which affect the accuracy of sentiment analysis. As such, we propose a new system architecture and combine various supervised learning methods to improve the final accuracy of sentiment classification. Because the labeling of data is very time-consuming, many researchers use data which contains emoticons to identify sentiment, and use these data as training data [12,14]. However, since emoticons are not always consistent with the sentiment, such training data will contain many mistakes. Therefore, in this paper, we use manually labeled data as the training data to build our model.

In Figure 1, we introduce our system architecture, called Opinion Miner. First, we crawl tweets from Twitter and perform some pre-processing steps; then, tweets which contain opinions are extracted. After that, tweets containing opinions are classified, because it is often the case that each of the different areas or categories of text data has its own special terminology and common representations, and thus we hope that through text classification the overall accuracy rate can be improved. Training data of different categories are then used to build classifiers.

Fig. 1. System Architecture of Opinion Miner

A. Preprocessing

First we introduce various properties of the messages that users post on Twitter. Some of their many unique properties include the following:

1) Usernames: Users often include Twitter usernames in their tweets in order to direct their messages. A de facto standard is to include the @ symbol before the username (e.g. @liang).

2) Hash Tags: Twitter allows users to tag their tweets with the help of a hash tag, which has the form #<tag-name>. Users can use this to convey what their tweet is primarily about by using keywords that best represent the content of the tweet.

3) RT: If a tweet is compelling and interesting enough, users might republish it, which is commonly known as re-tweeting; Twitter employs "RT" to represent re-tweeting (e.g. "RT @RodyRoderos: I love iphone 5 but i want samsung note 2 :(").

Second, we eliminate tweets that (these four rules are sketched in code below):

•	Are not in English.

•	Have too few words (the threshold is set at five).

•	Have too few words apart from greeting words.

•	Contain just a URL.
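A minimal sketch of these four elimination rules follows; the greeting lexicon and the ASCII-based language check are placeholder assumptions, since the paper does not specify the detectors it uses.

import re

GREETING_WORDS = {"hi", "hello", "hey", "good", "morning", "night"}  # hypothetical lexicon
MIN_WORDS = 5                         # threshold from the text
URL_RE = re.compile(r"https?://\S+")

def is_english(text):
    # Placeholder: a real system would use a proper language detector here.
    return all(ord(ch) < 128 for ch in text)

def keep_tweet(tweet):
    """Apply the four elimination rules; True means the tweet survives."""
    if not is_english(tweet):
        return False                               # rule 1: not in English
    body = URL_RE.sub("", tweet).strip()
    if not body:
        return False                               # rule 4: tweet is just a URL
    words = body.split()
    if len(words) < MIN_WORDS:
        return False                               # rule 2: too few words
    content = [w for w in words if w.lower() not in GREETING_WORDS]
    if len(content) < MIN_WORDS:
        return False                               # rule 3: too few non-greeting words
    return True

print(keep_tweet("hi hello http://t.co/x"))        # -> False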
Then three resources, introduced by Agarwal et al. [10], are deployed for pre-processing the Twitter data: 1) a stop word dictionary, 2) an emoticon dictionary, and 3) an acronym dictionary. The stop word dictionary identifies stop words, and the emoticon dictionary identifies the 170 emoticons listed on Wikipedia, with their emotional states, to extract the emotional content of tweets. An online acronym dictionary, with translations for 5,184 acronyms, is used to restore the acronyms. We then pre-process all the remaining tweets as follows:

•	All words are transformed into lower case.

•	All the emoticons, with their sentiment polarity, are extracted and saved with reference to the emoticon dictionary.

•	Targets (e.g. @liang) are replaced with "USER".

•	Each word is enriched with a Part-of-Speech (POS) tag (verb, noun, adjective, etc.) in the learning corpus. To do this, we use the TreeTagger tool [19], which automatically gives a Part-of-Speech tag to each word in a text.

•	WordNet (Fellbaum 1998) is employed to determine whether a word is an English word or not.

•	Each word is checked against the stop word dictionary to ascertain whether or not it is a stop word.

•	A sequence of repeated characters is replaced by one character; for example, "coooooooool" is converted to "col".

•	Following the previous preprocessing steps, all words are transformed into (word, POS tag, English_word, Stop_word) tuples, where English_word identifies whether the word is an English word (EN represents English word, NEN represents non-English word) and Stop_word identifies whether the word is a stop word (ST represents stop word, NST represents non-stop word). For example: (iPhone, NN, NEN, NST). These steps are sketched in code below.
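A minimal sketch of these normalization steps follows; the mini-dictionaries and the placeholder POS and English-word checks stand in for the full emoticon, acronym, and stop word resources, TreeTagger, and WordNet.

import re

# Hypothetical mini-dictionaries standing in for the full resources above.
EMOTICONS = {":)": "positive", ":-)": "positive", ":(": "negative", ":-(": "negative"}
ACRONYMS = {"lol": "laughing out loud", "omg": "oh my god"}
STOP_WORDS = {"the", "a", "an", "is", "to"}

USER_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1+")      # collapse character runs: coooooooool -> col

def preprocess(tweet):
    """Normalize a tweet and emit (word, POS, English_word, Stop_word) tuples."""
    text = tweet.lower()                                         # lower-casing
    polarities = [p for e, p in EMOTICONS.items() if e in text]  # save emoticon polarity
    text = USER_RE.sub("USER", text)                             # @liang -> USER
    text = REPEAT_RE.sub(r"\1", text)                            # repeated characters -> one
    tuples = []
    for word in text.split():
        word = ACRONYMS.get(word, word)                          # restore acronyms
        pos = "NN"                      # placeholder: TreeTagger would supply the real tag
        is_en = "EN" if word.isalpha() else "NEN"                # placeholder for WordNet lookup
        is_stop = "ST" if word in STOP_WORDS else "NST"
        tuples.append((word, pos, is_en, is_stop))
    return polarities, tuples

print(preprocess("@liang this iPhone is coooooooool :)"))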
B. Extracting Tweets Containing Opinions

In the real world, the tweets containing opinions are the more valuable ones, so in this part we want to filter out the tweets without opinions. To do this, we use a Naive Bayes (NB) classifier [18] on the training data that we labeled manually to classify tweets as opinion or non-opinion. Naive Bayes is a simple model which works well on text categorization [16]. We implemented a unigram Naive Bayes model [11,13] and employed the Naive Bayes simplifying independence assumption, so that class c is assigned to tweet d by:

P(c \mid d) \propto P(c) \prod_{k=1}^{n} P(t_k \mid c)    (1)

In this formula, t_k represents the k-th token in a tweet, P(c) is the prior probability of a document occurring in class c, and n is the size of the tweet. In text classification, our goal is to find the best class for the document. As the best class in Naive Bayes classification is the maximum a posteriori class c_{MAP}, we compute the arg max as follows:

c_{MAP} = \arg\max_{c \in C} P(c) \prod_{k=1}^{n} P(t_k \mid c)    (2)

It is therefore better to perform the computation by adding logarithms of probabilities instead of multiplying probabilities. Hence, the maximization that is actually done in most implementations of NB is:

c_{MAP} = \arg\max_{c \in C} \Big[ \log P(c) + \sum_{k=1}^{n} \log P(t_k \mid c) \Big]    (3)

To eliminate zeroes in the count of each term, we use add-one smoothing, which simply adds one to each count, where count(t, c) represents the count of term t in class c and V is the vocabulary:

P(t \mid c) = \frac{\mathrm{count}(t, c) + 1}{\sum_{t' \in V} \mathrm{count}(t', c) + |V|}    (4)

In this step, the system can classify the tweets into the opinion class and the non-opinion class. The system then passes the opinion part to the next step.
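The following is a minimal sketch of this unigram classifier, implementing equations (1)-(4); the class name and toy data are illustrative, not from the paper.

import math
from collections import Counter

class UnigramNB:
    """Minimal unigram Naive Bayes: log-space maximization (equation (3))
    with add-one smoothing (equation (4))."""

    def fit(self, documents, labels):
        self.classes = set(labels)
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(documents, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for counter in self.counts.values() for t in counter}

    def log_prob(self, token, c):
        # Equation (4): add-one smoothing over the vocabulary V.
        numerator = self.counts[c][token] + 1
        denominator = sum(self.counts[c].values()) + len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, tokens):
        # Equation (3): argmax over log P(c) + sum_k log P(t_k | c).
        return max(self.classes,
                   key=lambda c: math.log(self.prior[c])
                                 + sum(self.log_prob(t, c) for t in tokens))

# Toy usage with hypothetical manually labeled tweets:
nb = UnigramNB()
nb.fit([["love", "this", "phone"], ["check", "out", "this", "link"]],
       ["opinion", "non-opinion"])
print(nb.predict(["love", "this", "movie"]))  # -> "opinion"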
C. Short Text Classification

The intrinsic idea of this part is our observation that a word may have different meanings in different domains. For example, consider "@htc I wouldn't know, no beats headphones came with my beats device :( sad" and "@teresa_fnts YES! they were. So sad that was the last twilight movie :( I loved wreck it Ralph. I took my nephew to see it :) he loved it!". We can see that "sad" acts as a negative word in the former example, while in the latter it acts as a positive word. The unigram Naive Bayes classifier method is therefore used together with the pre-labeled training data to build the multi-class classifier, using distinct categories of training data. However, with unigram features there are usually many different features, and as such it is helpful to discard some useless ones. (Even though our training set contains just thousands of sentences, that is still a large number of features for our training set.) In order to solve this problem, we try two different feature selection algorithms. The first is Mutual Information (MI) [11]. The idea of mutual information is that, for each class C and each feature F, there is a score measuring how much F can contribute to making a correct decision on class C. The formula of the MI score is equation (5):

MI(F, C) = \log \frac{P(F, C)}{P(F)\,P(C)}    (5)

In practice, we also use add-one smoothing for each count of C and F to avoid a denominator of zero. The second algorithm is χ² feature selection [11]. The idea of χ² feature selection is similar to mutual information, in that for each feature and class there is also a score, which measures whether the feature and the class are independent of each other. For this purpose, the χ² test is employed, a statistic that determines to what degree two events are independent. It assumes the feature and class are independent and calculates the χ² value, with a larger score implying that they are not independent. The formula of the χ² score is equation (6):

\chi^2(F, C) = \frac{N (N_{11} N_{00} - N_{10} N_{01})^2}{(N_{11} + N_{01})(N_{11} + N_{10})(N_{10} + N_{00})(N_{01} + N_{00})}    (6)

where N is the total number of training sentences, N11 is the number of co-occurrences of the feature F and the class C, N10 is the number of sentences containing the feature F but not in class C, N01 is the number of sentences in class C that do not contain feature F, and N00 is the number of sentences neither in C nor containing feature F.

In this part, the system takes the tweets containing opinions from the previous step and classifies them according to their content. Finally, the result is sent to the next step to determine the orientation of each tweet. Both feature selection scores are sketched in code below.
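The sketch below implements both scores from the counts defined above; the exact placement of the add-one smoothing in the MI score is our assumption, since the text does not spell it out.

import math

def mi_score(n, n11, n10, n01):
    """Mutual information between feature F and class C, as in equation (5),
    computed from co-occurrence counts; the smoothing placement is assumed."""
    n_f = n11 + n10 + 1           # sentences containing F, smoothed
    n_c = n11 + n01 + 1           # sentences in class C, smoothed
    return math.log2((n * (n11 + 1)) / (n_f * n_c))

def chi2_score(n, n11, n10, n01, n00):
    """Chi-square statistic of equation (6); a larger score implies the
    feature and the class are less likely to be independent."""
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return numerator / denominator

def top_n_features(scores, n):
    """Keep only the N features with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy usage: a feature appearing in 40 of 50 positive sentences but only
# 5 of the other 50 gets a high chi-square score (clearly dependent).
print(chi2_score(n=100, n11=40, n10=5, n01=10, n00=45))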
D. Training Multiple Classifiers in Distinct Categories

We now reach the step of predicting the orientation of an opinion sentence, i.e., positive or negative. As mentioned above, some words can have different meanings in different areas or categories. In order to improve the final identification accuracy, we need to first classify the short texts according to their domains, so that the classifier can label the tweets as positive or negative with greater accuracy. Unlike other research, where the identification of emoticons is used to ascertain positive and negative training data [11, 13], in this paper we use training data that we labeled manually. In [11, 13], if a tweet contains positive emoticons like :) and :-), it is considered positive training data; conversely, tweets with negative emoticons like :( and :-( are regarded as negative training data. However, this method of determining the training data is likely to introduce many mistakes into it. For example, "RT @MarieAFrnndz: The Twilight Saga is over :( I'm so sad, I'll miss it" is not a negative tweet, but rather represents the user's feelings about how touching the movie is, which implies a positive category; that is why we use labeled data to build the model. As shown in Figure 1, we have different training data of positive and negative tweets for different categories. By using these training data and the Naive Bayes method, we can build binary classifiers for the different categories to complete the system, as sketched below. However, since labeling data is very time-consuming, the size of this training data is small. In the next section we show the experimental results of using the two different types of training data.
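The following minimal sketch shows how the pieces compose, reusing the UnigramNB class from the sketch in Section III-B; the function names and data layout are illustrative assumptions.

# Sketch of composing the category classifier with per-category binary
# sentiment classifiers (reuses UnigramNB from the earlier sketch).
CATEGORIES = ("camera", "mobile phone", "movie")

def build_sentiment_classifiers(labeled_data):
    """Train one positive/negative classifier per category from manually
    labeled (tokens, category, sentiment) triples."""
    classifiers = {}
    for cat in CATEGORIES:
        docs = [tokens for tokens, c, s in labeled_data if c == cat]
        sentiments = [s for tokens, c, s in labeled_data if c == cat]
        clf = UnigramNB()
        clf.fit(docs, sentiments)
        classifiers[cat] = clf
    return classifiers

def classify_opinion_tweet(tokens, category_clf, sentiment_clfs):
    """Route a tweet through the category classifier, then apply the
    matching domain-specific sentiment classifier."""
    category = category_clf.predict(tokens)
    return category, sentiment_clfs[category].predict(tokens)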
IV. EXPERIMENT

In this section, we evaluate the whole system and present results for predicting semantic orientations on Twitter. The main task of the system is to classify tweets as positive versus negative versus non-opinion. We show the results for each part, and then for the whole system.

The datasets used in our experiments are described in Part A. The experimental results of each step, with discussions, are presented in Part B, and the final result of Opinion Miner and the comparison with an existing method are shown in Part C.

A. Data Sets

There is no large publicly available data set of Twitter tweets with sentiment labels, so we used the Twitter API to collect data. However, since the Twitter API has a limit of 100 tweets per response for any request, we crawled tweets of three distinct categories (camera, mobile phone, and movie) as our training set from the time period between November 1, 2012 and January 31, 2013. The Twitter API has a parameter that specifies which language to retrieve tweets in; we set this parameter to English in order to test our system on English tweets. We believe that our system can also be extended to work in other languages. We then manually labeled the tweets crawled from Twitter. Because labeling the data is a time-consuming task, the amount of training data is currently not very large. The details are presented in Table 1; Table 2 presents the details of our testing data, which was also manually labeled.

TABLE I. TRAINING DATA

               Positive   Negative   Non-opinion
Camera           449        297         446
Mobile Phone     472        724         603
Movie            798        168         485
Total           1719       1189        1534

TABLE II. TESTING DATA

               Positive   Negative   Non-opinion
Camera            62         20          36
Mobile Phone     116         35          59
Movie            140         11          31
Total            318         66         126

B. Results

In this subsection we show the results of each part of the proposed system, Opinion Miner.

1) Extracting Tweets Containing Opinions: The results of this part are shown in Table 3. In this paper we treat the positive and negative tweets as opinions, and the others as non-opinions.

TABLE III. RESULT OF EXTRACTING TWEETS CONTAINING OPINIONS

               Opinion (384)   Non-opinion (126)
Opinion             346               70
Non-opinion          38               56

As shown in Table 3, the total accuracy is 76.8%. In order to eliminate the influence of unbalanced training data, we tried the Mutual Information feature selection algorithm introduced in (5). However, as shown in Figure 2, the accuracy is not significantly improved. Therefore, how to deal with unbalanced training data is still a challenge that we need to solve in our future research.
Fig. 2. Effect of Feature Size on Naive Bayes Classifier for Extracting Tweets Containing Opinions

2) Short Text Classification: In short text classification, we use the Naive Bayes classifier and feature selection to train the multi-class classifier. If we only use Naive Bayes, the accuracy is 91%. The performance of χ² feature selection is very similar to that of mutual information in our experiment, and both methods increase accuracy; we choose mutual information as our feature selection method. After calculating the mutual information (MI) scores, only the top N features with the highest scores are picked for the feature set to test.

Fig. 3. Effect of Feature Size on Naive Bayes Classifier for Short Text Classification

In Figure 3, the best accuracy is 96.6%, and Table 4 shows the details of short text classification.

TABLE IV. RESULT OF SHORT TEXT CLASSIFICATION

               Camera (118)   Mobile Phone (210)   Movie (182)
Camera              107               6                 0
Mobile Phone         11             204                 0
Movie                 0               0               182

3) Training Multiple Classifiers in Distinct Categories: In this part, we want to determine the semantic orientation of a tweet (positive/negative). We use unigram Naive Bayes to build the model, and the result is shown in Table 5. In this part, we do not consider the non-opinion tweets, because they were already misclassified when they were not filtered out by the previous process. We focus only on the positive and negative data, so the accuracy is 90.17%. However, the result is affected by the unbalanced testing data: we observed that the weight of the positive data is higher than that of the negative data, so the results tend toward the positive class.

TABLE V. RESULT OF TRAINING CLASSIFIER

               Positive (297)   Negative (49)   Non-opinion (70)
Positive            282               28              64
Negative             15               21               6

On the other hand, we also generated another training data set using emoticons to denote positive or negative tweets [11,13]. We want to compare the effect of the two distinct training data sets. The new training data set is shown in Table 6. We then use the new training data in Table 6 to train the classifier in this part, and the result is shown in Table 7.

TABLE VI. TRAINING DATA USING EMOTICONS

               Positive   Negative
Camera           1484       1201
Mobile Phone     1819       2242
Movie            2330       1938
Total            5663       5381

TABLE VII. RESULT OF TRAINING CLASSIFIER USING THE NEW TRAINING DATA

               Positive (297)   Negative (49)   Non-opinion (70)
Positive            106                9              27
Negative            191               40              43

Similarly, we still do not consider the non-opinion part, and we get an accuracy in Table 7 of 58.65%, which is not very good. Besides the unbalanced data problem, we observed that there are many sentences of the type "I want xxx :(" on Twitter; for example, "I want my grandma's htc.... :(", "Bored of my Olympus. I want a Lumix :(", and "I want to watch Wreck It Ralph so bad :(". We consider these examples all positive data, so using emoticons to collect training data is not always correct.
C. Comparison of Models

Finally, we combine the results of Table 3 and Table 5 and show the final result of the whole system, Opinion Miner, in Table 8.

TABLE VIII. RESULT OF OPINION MINER

               Positive (318)   Negative (66)   Non-opinion (126)
Positive            282               28               64
Negative             15               21                6
Non-opinion          21               17               56

We use a unigram model for comparison. The unigram feature extractor is the simplest way to retrieve features from a tweet, and researchers report good performance for sentiment analysis on Twitter data using a unigram model [11, 13]. The result is shown in Table 9, where we can see that the accuracy is 67.58%. We think the reason is again the unbalanced training data: as shown in Table 9, most of the testing data are classified as positive, which results in low accuracy.

TABLE IX. RESULT OF UNIGRAM MODEL

               Positive (318)   Negative (66)   Non-opinion (126)
Positive            250               19               59
Negative             20               40               13
Non-opinion          48                7               54

Finally, we show the results of the two models in Table 10. We can observe that, with the same training data, Opinion Miner works better than the unigram model.

TABLE X. ACCURACY OF UNIGRAM MODEL AND OPINION MINER

                  Accuracy
Unigram Model      67.58%
Opinion Miner      70.39%
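As a sanity check, the reported accuracy can be recomputed from the diagonal of the Table 8 confusion matrix; the short sketch below is ours, not part of the original evaluation code.

# Accuracy of Opinion Miner from Table 8: correct predictions are on
# the diagonal (rows = predicted, columns = actual).
table8 = {
    "Positive":    {"Positive": 282, "Negative": 28, "Non-opinion": 64},
    "Negative":    {"Positive": 15,  "Negative": 21, "Non-opinion": 6},
    "Non-opinion": {"Positive": 21,  "Negative": 17, "Non-opinion": 56},
}
correct = sum(table8[c][c] for c in table8)                 # 282 + 21 + 56 = 359
total = sum(sum(row.values()) for row in table8.values())   # 510
print(f"accuracy = {correct / total:.2%}")                  # -> 70.39%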
V. CONCLUSION AND FUTURE WORK

We designed a system called Opinion Miner which integrates machine learning techniques and domain-specific data, and experimental results demonstrated the effectiveness of the whole system.

Machine learning performed well in the classification of sentiments in tweets, and we believe that its accuracy can still be improved. In this paper, we demonstrated the use of domain-specific training data to build the model, and obtained very positive performance. In our future work, we plan to further improve and refine our techniques in order to enhance the accuracy of the system. With this in mind, the following is a list of possible research directions. First, emoticon data can be used to check the results of the classification: if a result conflicts with the emoticon meaning of the tweet, another method can be employed to determine the direction of the semantic content once more; otherwise, the result remains the same. Second, distinct machine learning techniques can be strategically deployed in different parts of the system, to analyze which method is most suitable. Finally, rule-based models or natural language processing methods can be incorporated into our system.

REFERENCES

[1] F. Sebastiani. Machine learning in automated text categorisation. Technical Report IEI-B4-31-1999, Istituto di Elaborazione dell'Informazione, 2001.
[2] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86, 2002.
[3] P. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of ACL'02, 2002.
[4] K. Dave, S. Lawrence, and D. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW'03, 2003.
[5] M. Gamon, A. Aue, S. Corston-Oliver, and E. K. Ringger. Pulse: Mining customer opinions from free text. In Proceedings of IDA 2005.
[6] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD'04, 2004.
[7] S. Kim and E. Hovy. Determining the sentiment of opinions. In Proceedings of COLING'04, 2004.
[8] A.-M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Proceedings of EMNLP-05, 2005.
[9] K. Zhang, Y. Cheng, Y. Xie, D. Honbo, A. Agrawal, D. Palsetia, K. Lee, W.-k. Liao, and A. Choudhary. SES: Sentiment elicitation system for social media data. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, pages 129-136, December 2011.
[10] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, pages 30-38. Association for Computational Linguistics, 2011.
[11] A. Go, L. Huang, and R. Bhayani. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 2009.
[12] X. Ding, B. Liu, and P. S. Yu. A holistic lexicon-based approach to opinion mining. In Proceedings of the Conference on Web Search and Web Data Mining (WSDM), 2008.
[13] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC, 2010.
[14] W. Jin, H. H. Ho, and R. K. Srihari. OpinionMiner: A novel machine learning system for web opinion mining and extraction. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 2009.
[15] J. Read. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of ACL-05, 43rd Meeting of the Association for Computational Linguistics, 2005.
[16] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of Twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media, 2011.
[17] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[18] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the Seventh International Conference on Information and Knowledge Management. ACM, 1998.
[19] H. Schmid. TreeTagger. TC project at the Institute for Computational Linguistics of the University of Stuttgart, 1994.
