Sentiment Analysis of Twitter Data
Presented by :-
Under the guidance of
Mrs. Madhura M
Asst. Professor
Department of Information Science & Engineering,
Dayananda Sagar College of Engineering, Bangalore
• Introduction
• Literature Survey
• Motivation
• Proposed System
• Code Snippets
• Applications
• Results & Conclusion
• References
twitter.com is a popular microblogging website.
Each tweet is limited to 140 characters.
Tweets are frequently used to express a user's
emotion on a particular subject.
There are firms which poll Twitter to analyse
sentiment on a particular topic.
The challenge is to gather all such relevant data, and to
detect and summarize the overall sentiment on a topic.
• The problem in sentiment analysis is classifying
the polarity of a given text at the document,
sentence, or feature/aspect level.
• That is, deciding whether the opinion expressed in a
document, a sentence or an entity feature/aspect is
positive, negative, or neutral.
To implement an algorithm for automatic
classification of text as positive, negative or neutral.
To use sentiment analysis to determine whether the attitude
of the public towards the subject of interest is positive,
negative or neutral.
To present the sentiment graphically in the form of charts.
• Efthymios Kouloumpis; Theresa Wilson, Johns Hopkins University, USA; and
Johanna Moore, School of Informatics, University of Edinburgh, Edinburgh,
UK, in the paper "Twitter Sentiment Analysis: The Good the Bad and the
OMG!" (July 2011), investigate the utility of linguistic features for
detecting the sentiment of Twitter messages. They evaluate the usefulness of
existing lexical resources as well as features that capture information about
the informal and creative language used in microblogging. They take a
supervised approach to the problem, but leverage existing hashtags in the
Twitter data for building training data.
• Hassan Saif, Yulan He and Harith Alani, Knowledge Media Institute, The
Open University, United Kingdom, in the paper "Semantic Sentiment Analysis
of Twitter" (Nov 2012), introduce a novel approach of adding
semantics as additional features into the training set for sentiment analysis.
For each entity extracted from tweets (e.g. iPhone), they add its semantic
concept (e.g. “Apple product”) as an additional feature, and measure the
correlation of the representative concept with negative/positive sentiment.
• Subhabrata Mukherjee, Akshat Malu, Balamurali A.R. and Pushpak
Bhattacharyya (Dept. of Computer Science and Engineering, IIT Bombay;
IITB-Monash Research Academy, IIT Bombay), in the paper "TwiSent: A
Multistage System for Analyzing Sentiment in Twitter" (Feb 2013),
present TwiSent, a sentiment analysis system for Twitter. Based on the
topic searched, TwiSent collects tweets pertaining to it and categorizes them
into the polarity classes positive, negative and objective. However,
analyzing micro-blog posts has many inherent challenges compared to
other text genres.
• Isaac G. Councill, Ryan McDonald and Leonid Velikovich, Google, Inc., New
York, in the paper "What’s Great and What’s Not: Learning to Classify the
Scope of Negation for Improved Sentiment Analysis" (July 2010), present
a negation detection system based on a conditional random field modelled
using features from an English dependency parser. The scope of negation
detection is limited to explicit rather than implied negations within single
sentences.
• An aspect of social media data such as Twitter messages
is that it includes rich structured information about the
individuals involved in the communication .
• It can lead to more accurate tools for extracting semantic
• It provides means for empirically studying properties of
social interactions.
• Freely available, annotated corpus, Pre-written Classifier
Codes in Python using NLTK that can be used in NLP in
order to promote research that will lead to a better
understanding of how sentiment is conveyed in tweets and
Graphical Representation of the sentiment
The sentiment is represented graphically using the Google Charts API.
We have used a baseline method and two built-in classifiers from NLTK: Naive Bayes
and maximum entropy.
1. Baseline
The baseline approach is to use a list of positive and negative keywords. For this we
use Twittratr's list of keywords, which is publicly available. This list consists of
444 positive words and 588 negative words. For each tweet, we count the number
of negative keywords and positive keywords that appear. This classifier returns the
polarity with the higher count. If there is a tie, then positive polarity (the majority
class) is returned.
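The counting rule described above can be sketched in a few lines of Python. This is only an illustration: the two small word sets are hypothetical stand-ins for the much larger keyword lists, the function name `baseline_polarity` is an assumption, and the tokenization is deliberately simple.

```python
import re

# Hypothetical stand-ins for the real positive/negative keyword lists.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}


def baseline_polarity(tweet, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return 'positive' or 'negative' by counting keyword hits in the tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    pos_hits = sum(1 for tok in tokens if tok in positive)
    neg_hits = sum(1 for tok in tokens if tok in negative)
    # Ties (including tweets with no keyword at all) fall back to the
    # majority class, which the text above states is positive.
    return "negative" if neg_hits > pos_hits else "positive"


print(baseline_polarity("I love this great phone"))
print(baseline_polarity("awful bad day, but good coffee"))
```

Because ties go to the positive class, this rule never returns a neutral label.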
2. Naive Bayes
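Naive Bayes is a simple model which works well on text categorization. We use a
multinomial Naive Bayes model. Class c* is assigned to tweet d, where
c^* = \arg\max_c P_{NB}(c \mid d)
P_{NB}(c \mid d) = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}
Here f_i is a feature, n_i(d) is the count of feature f_i in tweet d, and m is
the total number of features.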
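As a rough illustration of how a multinomial Naive Bayes classifier assigns a class, the sketch below trains on tokenized tweets by counting, then picks the class with the highest log P(c) plus the summed log P(f|c) over the tweet's tokens. It is written in plain Python rather than with NLTK's built-in classifier, and the add-one smoothing, the function names and the toy training data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labelled_tweets):
    """labelled_tweets: list of (list_of_tokens, label) pairs."""
    class_counts = Counter()            # number of tweets per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for tokens, label in labelled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify_nb(tokens, model):
    """Return the label c maximizing log P(c) + sum of log P(f|c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok not in vocab:
                continue  # unseen words carry no evidence for any class
            # Add-one smoothed likelihood; a repeated token counts each time.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["love", "this", "phone"], "positive"),
    (["great", "day"], "positive"),
    (["hate", "this", "phone"], "negative"),
    (["awful", "day"], "negative"),
]
model = train_multinomial_nb(training)
print(classify_nb(["love", "great"], model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.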
3. Maximum Entropy
● Maximum entropy classifiers are commonly used as alternatives to naive
Bayes classifiers because they do not assume statistical independence of the
random variables (commonly known as features) that serve as predictors.
● However, learning in such a model is slower than for a naive Bayes classifier,
and thus may not be appropriate given a very large number of classes to learn.
● Learning in a Naive Bayes classifier is a simple matter of counting up the
number of co-occurrences of features and classes, while in a maximum
entropy classifier the weights, which are typically estimated using maximum a
posteriori (MAP) estimation, must be learned using an iterative procedure.
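To make the contrast with counting concrete, here is a minimal sketch of a maximum entropy (multinomial logistic regression) classifier trained by batch gradient ascent. It is not NLTK's implementation; the function names, the learning rate, the number of epochs and the L2 penalty (which corresponds to MAP estimation under a Gaussian prior) are assumptions chosen for the toy data.

```python
import math


def predict_proba(feats, weights):
    """Softmax over per-class sums of the weights of the active features."""
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(data, epochs=200, lr=0.5, l2=0.01):
    """data: list of (set_of_features, label). Returns weights[label][feature]."""
    labels = sorted({label for _, label in data})
    features = sorted({f for feats, _ in data for f in feats})
    weights = {c: {f: 0.0 for f in features} for c in labels}
    for _ in range(epochs):  # the iterative part: repeat until weights settle
        grad = {c: {f: 0.0 for f in features} for c in labels}
        for feats, gold in data:
            probs = predict_proba(feats, weights)
            for c in labels:
                # Observed minus expected count of each active feature.
                err = (1.0 if c == gold else 0.0) - probs[c]
                for f in feats:
                    grad[c][f] += err
        for c in labels:
            for f in features:
                # Ascent step on the average log-likelihood with an L2 penalty.
                weights[c][f] += lr * (grad[c][f] / len(data) - l2 * weights[c][f])
    return weights


def classify_maxent(feats, weights):
    probs = predict_proba(feats, weights)
    return max(probs, key=probs.get)


data = [
    ({"love", "phone"}, "positive"),
    ({"great", "day"}, "positive"),
    ({"hate", "phone"}, "negative"),
    ({"awful", "day"}, "negative"),
]
weights = train_maxent(data)
print(classify_maxent({"love"}, weights))
```

Unlike the Naive Bayes counts, the weights here have no closed form, which is why training loops over the data many times.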
System Requirements, Libraries & Languages used :-
* Linux Operating System (Ubuntu Preferred)
* Python 3.0 or above
* NLTK Package
* WebPy Framework Package
* Modern Web Browser
* HTML, CSS, JavaScript
* Twitter API, Google API
Code Snippets:-
Preprocessing the tweets:
import re

#start process_tweet
def process_tweet(self, tweet):
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    #Convert @username to AT_USER
    tweet = re.sub(r'@[^\s]+', 'AT_USER', tweet)
    #Collapse runs of white space into a single space
    tweet = re.sub(r'[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    tweet = tweet.strip()
    #Remove leading/trailing " or ' characters
    tweet = tweet.rstrip('\'"')
    tweet = tweet.lstrip('\'"')
    return tweet
Classifying the tweets:-
#start processing each tweet
for i in self.tweets:
    tw = self.tweets[i]
    count = 0
    res = {}
    for t in tw:
        #Collect the keywords from each list that occur in the tweet
        neg_words = [word for word in negative_words if(self.string_found(word, t))]
        pos_words = [word for word in positive_words if(self.string_found(word, t))]
        if(len(pos_words) > len(neg_words)):
            label = 'positive'
            self.pos_count[i] += 1
        elif(len(pos_words) < len(neg_words)):
            label = 'negative'
            self.neg_count[i] += 1
        else:
            #Tie: positive if keywords were found on both sides, else neutral
            if(len(pos_words) > 0 and len(neg_words) > 0):
                label = 'positive'
                self.pos_count[i] += 1
            else:
                label = 'neutral'
                self.neut_count[i] += 1
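The classification snippet calls `self.string_found`, whose body is not shown in the slides. One plausible reading, assuming it tests whether a keyword occurs in the tweet as a whole word, is sketched below as a standalone function; the actual helper may differ.

```python
import re


def string_found(word, text):
    """Return True if `word` occurs in `text` as a whole word (assumed behaviour)."""
    # re.escape guards against keywords containing regex metacharacters;
    # \b anchors the match at word boundaries so 'good' does not match 'goodbye'.
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


print(string_found("good", "a good day"))
print(string_found("good", "goodbye for now"))
```

Matching on word boundaries rather than with a plain substring test keeps short keywords from firing inside longer, unrelated words.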
Finalizing the Result and Output:-
* We make use of Google Chart Tools to show the sentiment in graphical
form.
* Google Chart Tools provide a convenient way to visualize data on any host.
From simple line charts to complex hierarchical tree maps, the chart gallery
provides a large number of well-designed chart types.
* We make use of Pie Chart and Line Chart
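How the counts reach the chart is not shown in the slides. One possible approach, sketched here under that caveat, is to serialize the per-class counts on the Python side as a JavaScript array literal that the page can pass to Google Charts' `google.visualization.arrayToDataTable`. The function names and column headers below are hypothetical.

```python
import json


def pie_chart_rows(pos_count, neg_count, neut_count):
    """Build a header row plus one row per sentiment class."""
    return [
        ["Sentiment", "Tweets"],
        ["Positive", pos_count],
        ["Negative", neg_count],
        ["Neutral", neut_count],
    ]


def to_js_literal(rows):
    """Serialize the rows as JSON, which is also a valid JavaScript array literal."""
    return json.dumps(rows)


rows = pie_chart_rows(3, 1, 2)
print(to_js_literal(rows))
```

The resulting string can be embedded in the HTML template that draws the pie chart.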
• Applications to Review-Related Websites
-Movie Reviews, Product Reviews etc.
• Applications as a Sub-Component Technology
-Detecting antagonistic, heated language in mails, spam
detection, context sensitive information detection etc.
• Applications in Business and Government Intelligence
-Knowing Consumer attitudes and trends
• Applications across Different Domains
-Knowing public opinions for political leaders or their
notions about rules and regulations in place etc.
• Real-time sentiment analysis of social media user content has become
increasingly critical for organizations to master in order to predict
market trends, analyze consumer opinions, and remain competitive.
Classifier Accuracy:-
We conclude that the different NLTK classifiers make it easier to
classify the tweets, and that the more we improve the training data set,
the more accurate the results become.
Future Work:-
We plan to use a bigger dataset to improve the accuracy, and to
take emoticons and internationalization into account.
[1]. Aditya Joshi, Balamurali A.R., Pushpak Bhattacharyya, 2010,
A Fall-Back Strategy for Sentiment Analysis in a New Language:
A Case Study for Hindi, ICON 2010, Kharagpur, India
[2]. Alec Go, Lei Huang, Richa Bhayani, 2009, Twitter Sentiment
Classification using Distant Supervision, Technical report,
Stanford University
[3]. http://help.sentiment140.com/for-students
[4]. http://www.gbsheli.com/2009/03/twitgraph-en.html
[5]. http://en.wikipedia.org

