Thesis Chapterwise

Chapter-1
INTRODUCTION
1.1 Introduction to Data Mining
Data mining is the analysis step of the "knowledge discovery in databases" (KDD)
process: a method for discovering patterns in large data sets that draws on
techniques at the intersection of artificial intelligence, machine learning and
database systems. The overall goal of the data mining process is to extract
information from data sets and transform it into an understandable structure for
further use. Data mining is a powerful new technology with great potential to help
enterprises focus on the most important information in their data stores. Data
mining tools predict future trends and behaviours, allowing enterprises to make
data-driven decisions. They can answer business questions that were traditionally
too time-consuming to resolve. They search databases for hidden patterns, finding
predictive information that experts may miss because it lies outside their
expectations.
The term is a misnomer, because the goal is the extraction of patterns and knowledge
from large amounts of data, not the extraction of the data itself. Data mining
techniques have increasingly been studied, particularly in their application to
real-world databases.
Data mining is a natural step in the growing use of electronic data stores to
gather information and deliver answers to business analysts.
1.1.1 Process of Data Mining
Data mining consists of several stages and is itself an essential step in the
knowledge discovery process. The stages of the knowledge discovery process are as
follows:
• Data Integration: First, all the data is collected and combined from the
different sources.
• Data Selection: Not all of the data collected is actually required. In this step
we select the data that we consider useful for data mining.
• Data Cleaning: The data we have collected is not clean. It may contain errors,
missing values or inconsistent data, so various techniques have to be applied to
remove such anomalies.
• Data Transformation: Converting the data into the form required for mining
operations is called data transformation.
• Data Mining: This step consists of various techniques that can be used to
discover hidden patterns or similarities in the given dataset.
• Pattern Evaluation and Knowledge Presentation: This step includes removing
redundant patterns from the patterns we generated.
• Decisions/Use of Discovered Knowledge: This step helps the user to make
decisions based on the knowledge that has been gathered.
The diagram below shows the knowledge discovery process:
Figure 1.1: Process of data mining [1]
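As a rough illustration of the stages listed above, the following Python sketch
chains them over a few toy records. The function names, the sample records and the
"mining" step (a simple frequency count) are assumptions made for illustration
only; they are not the method used in this thesis.

```python
from collections import Counter

def integrate(*sources):
    """Data integration: combine records from all sources into one list."""
    return [record for source in sources for record in source]

def select(records, fields):
    """Data selection: keep only the fields considered useful for mining."""
    return [{f: r.get(f) for f in fields} for r in records]

def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

def transform(records):
    """Data transformation: normalise text so equal values compare equal."""
    return [{k: str(v).strip().lower() for k, v in r.items()} for r in records]

def mine(records, field):
    """Data mining (toy version): count how often each value of `field` occurs."""
    return Counter(r[field] for r in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns seen at least `min_count` times."""
    return {value: n for value, n in patterns.items() if n >= min_count}

# Hypothetical toy data standing in for two separate sources.
store_a = [{"customer": "Asha", "item": "Phone", "price": 300},
           {"customer": "Ravi", "item": "phone ", "price": 310}]
store_b = [{"customer": "Meena", "item": "Laptop", "price": 900},
           {"customer": "Ravi", "item": None, "price": 50}]

records = transform(clean(select(integrate(store_a, store_b), ["customer", "item"])))
frequent_items = evaluate(mine(records, "item"))
print(frequent_items)
```

In this sketch the record with a missing item is discarded during cleaning, and the
two spellings of "phone" only become comparable after transformation, which is why
the order of the stages matters.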

1.1.2 Applications of Data Mining
There is a huge amount of data available in the information industry. It is
necessary to analyze this data and extract useful information from it. Data mining
is particularly useful in the following domains:
• Market Analysis and Management
  o Data mining determines what kind of people buy what kind of products.
  o Data mining helps in identifying the best products for different customers.
  o Data mining helps in determining customer purchasing patterns.
• Financial Data Analysis
  o Credit card spending by customer groups can be identified using data mining.
  o Hidden correlations among different financial markets can be discovered using
    data mining.
• Data mining in health care and insurance
  o Data mining is useful in identifying medical procedures that are claimed
    together.
  o Data mining helps to predict which customers may buy new policies.
  o Data mining allows insurance companies to detect the behaviour patterns of
    risky customers.
• Data mining in medicine
  o Data mining analyzes a patient's disease history in order to predict upcoming
    visits to the hospital.
  o It helps in finding similarities between successful medical treatments across
    the different illnesses of patients.
  o Data mining examines a patient's past illness history in order to estimate the
    chance of new problems.
• Data mining in telecommunication industry
  o Data mining helps in identifying telecommunication patterns.
  o Data mining catches fraudulent activities and improves quality of service.
• Fraud detection
  o For fraudulent telephone calls made to customers, data mining identifies the
    source of the call and its duration.
  o It also analyzes patterns that deviate from expected norms.
• Scientific Applications
  o Data mining may help scientists in classifying and segmenting data.
  o It can identify new galaxies by searching for sub-clusters.
1.1.3 Data Mining Hierarchical Model

Various efficient ways exist to store huge volumes of data, but computational
techniques and models are required to extract the hidden patterns and knowledge.
These techniques and tools are used to transform the data into useful information
for tasks such as market analysis, fraud detection and finding customer
expectations. These techniques are collectively known as Data Mining, or sometimes
as Knowledge Discovery in Databases. A complete hierarchical model for data mining
is shown in Figure 1.2.
Figure 1.2: Data Mining Hierarchical Model
Text mining is a field that is used to identify useful information in textual
documents or records. The text can be in any form and in any language, such as
English, Punjabi, Hindi and many others. Web mining is the technique of gathering
useful information from websites or online reviews. It is difficult to gather or
analyze online information because a very large amount of data is available
online. Web mining is divided into three sub-parts. Web usage mining is the process
of finding out how websites are used, i.e. how frequently users visit a particular
site. Web structure mining is the technique of discovering the overall structure of
websites or blogs. Web content mining is the most commonly used area nowadays. It
is used to find useful information in the actual content written on websites, which
can take any form such as tweets, comments and reviews from different users. Web
content mining is further classified into opinion mining, or sentiment analysis.
Opinion mining is the next step after web content mining. The difference between
the two is that web content mining only gathers the data from websites, while
opinion mining finds the opinion of the public towards a specific subject or area.
1.2 Introduction to Sentiment Analysis
Sentiment analysis, or opinion mining, is the method of finding or extracting the
sentiments and emotions of people towards particular topics of interest. Whether
the topic is a product or a film, people's reviews really matter. These reviews
also influence other people's decision-making process [5]. If a buyer wants to
purchase a new item, he will first look at the opinions or comments of other
people. Depending on the polarity of the reviews, he decides whether or not to buy
the product. Social networking sites such as Facebook and Twitter are where people
post their status or opinions. People tweet on their Twitter accounts about any
topic that interests them. Sentiment analysis is used to forecast the stock market,
to predict the outcome of particular polls, to determine the acceptability of a
product, and in many other areas.
Sentiment analysis is the practice of classifying the attitude of a person as
expressed in tweets. Tweets can be labelled positive, negative or neutral [2].
For example, the tweet "I am very happy today because I topped in my interview" is
a positive text and the text "I hate this" is a negative text [7]. Consider another
example: "Robot is a great movie, I recommend everyone to watch this movie".
Clearly the user's review is entirely positive towards the movie Robot. Sometimes
it is hard to decide whether a tweet is positive or negative, and in that case we
call the tweet neutral. "Robot isn't bad but I don't understand why people rank it
as the number one movie" is the kind of tweet considered neutral. The tweets given
above are all about one specific topic, a movie named Robot.
Twitter is one of the most frequently used social networking sites and allows its
users to post status updates, originally limited to 140 characters. It stores a
huge amount of data on any particular topic. The World Wide Web has made it easier
for people to share their ideas online. Sentiment analysis mainly makes use of
natural language processing and text processing to complete the whole task, i.e. to
identify the opinion of the public. For example, suppose one wants to know whether
the Punjab elections are being conducted properly or not. The best way to answer
this is to look at a social networking site. It is easy to find out about the
conduct of the Punjab election by viewing users' tweets. The problem, however, is
that there are so many tweets that it is hard to tell how many people are positive
or negative towards the Punjab election. The most practical answer is to apply
sentiment analysis to the tweets and find out what people say about the Punjab
election.
1.2.1 Components of Sentiment Analysis
The main components of opinion mining or sentiment analysis are as follows:
Sentiment Holder: The person who gives the opinion about some subject. It may also
be an organization that gives information or a viewpoint about something. In online
reviews, the sentiment holder is the person who writes the reviews or comments.
Sentiment Object: The thing about which the opinion is given by the sentiment
holder.
Sentiment Orientation: The classification of the sentiment. It may be positive,
negative or neutral depending on the content of the opinion.
The components of sentiment analysis are shown in Fig 1.3.
[Figure: Components of Sentiment Analysis. Example: "Modi says that India is a
great country". Sentiment Holder: Modi; Sentiment Object: Country (views);
Sentiment Orientation: Positive]
Fig 1.3: Components of Sentiment analysis
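The three components can be pictured as one small record per opinion. The sketch
below is only an illustration: the class name `Opinion` and its field names are
assumptions introduced here, and the example values are taken from the figure.

```python
from dataclasses import dataclass

ORIENTATIONS = {"positive", "negative", "neutral"}

@dataclass(frozen=True)
class Opinion:
    holder: str       # sentiment holder: who expresses the opinion
    target: str       # sentiment object: what the opinion is about
    orientation: str  # sentiment orientation: positive, negative or neutral

    def __post_init__(self):
        # Reject orientations outside the three classes used in this chapter.
        if self.orientation not in ORIENTATIONS:
            raise ValueError(f"unknown orientation: {self.orientation!r}")

example = Opinion(holder="Modi", target="India", orientation="positive")
print(example)
```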
1.2.2 Levels of Sentiment Analysis
Sentiment analysis can be carried out at three main levels:
1) Document Level: The whole document is considered for sentiment analysis. The
sentiment of the entire document is identified as positive, negative or neutral.
2) Sentence Level: Each sentence is treated separately and classified as positive,
negative or neutral.
3) Feature Level: This is also known as aspect-level classification. Here the
sentiment is determined for specific features mentioned in the document.
1.2.3 Classification of Sentiment Analysis
Sentiment analysis is mainly classified into three categories, as given below:
1) Positive Sentiment: This refers to the presence of good or positive words in
the opinion. If positive words predominate, the opinion is referred to as a
positive sentiment. For example, if the reviews of a product contain more positive
comments, the product is likely to be bought by many customers.
2) Negative Sentiment: If negative words are present in the review, the review is
called a negative sentiment. For example, if the reviews or tweets about a product
are mostly negative, the product is considered not very useful and is bought by
fewer people.
3) Neutral Sentiment: If a tweet is considered neither negative nor positive, it
is treated as a neutral sentiment in the sentiment analysis process.
The opinion "Robot: the movie was amazing" contains the positive word "amazing", so
it is positive. "I watched this movie" is a neutral sentiment, and "This was the
worst movie ever" contains the negative word "worst", so it is a negative
sentiment, as shown in Fig 1.4.
[Figure: "MOVIE WAS AMAZING" (POSITIVE); "MOVIE WAS WORST" (NEGATIVE);
"I WATCHED MOVIE" (NEUTRAL)]
Fig 1.4: Positive, Neutral and Negative sentiments
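A minimal sketch of this three-way classification, assuming a tiny hand-made word
list, is given below. The word lists and the function name `classify` are
hypothetical; a practical system would rely on a much larger lexicon.

```python
import re

# Hypothetical, deliberately tiny word lists used only for illustration.
POSITIVE_WORDS = {"amazing", "great", "good", "awesome", "happy"}
NEGATIVE_WORDS = {"worst", "bad", "terrible", "hate", "awful"}

def classify(text):
    """Label a text by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"

for tweet in ["Movie was amazing", "Movie was worst", "I watched movie"]:
    print(tweet, "->", classify(tweet))
```

Word counting of this kind ignores negation, so a phrase such as "isn't bad" would
be labelled negative; this is one reason more careful techniques are needed.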
1.2.4 Techniques for Sentiment Classification
Sentiment classification can be done using two types of techniques, as below:
1. Sentiment classification using supervised learning: Supervised learning is
implemented by building a classifier. It requires two sets of documents for
classification: a training set and a test set. This approach is also known as the
machine learning approach. The classifier is trained on examples that are manually
labelled.
2. Sentiment classification using unsupervised learning: In unsupervised
classification, the text is classified by comparing it against given words or
lexicons. The sentiment value of these words or lexicons is defined in advance.
The document is scanned and compared against the positive and negative words.
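To make the supervised approach concrete, the sketch below trains a small
multinomial naive Bayes classifier on manually labelled examples and checks it on a
separate test set. Naive Bayes is chosen here only as a familiar example of a
supervised classifier; the training sentences, test sentences and function names
are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_texts):
    """Count words per class from manually labelled (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled_texts:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen during training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training and test sets.
training_set = [
    ("the movie was amazing", "positive"),
    ("great film loved it", "positive"),
    ("worst movie ever", "negative"),
    ("i hate this film", "negative"),
]
test_set = [("amazing film", "positive"), ("hate this movie", "negative")]

model = train(training_set)
correct = sum(predict(model, text) == label for text, label in test_set)
print(f"correct on test set: {correct}/{len(test_set)}")
```

Unlike the lexicon-based approach, nothing here says in advance which words are
positive or negative; the classifier infers that from the labelled examples.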
1.2.5 Application Areas

The following are the different areas where sentiment analysis can be used:
o E-commerce: Many websites provide summaries of their products and allow
customers to post their views. These views are useful for both other customers and
the product manufacturers.
o Voice of customer (VOC): This is a market research technique that identifies
customer needs and expectations, and thereby helps to determine the reliability of
products.
o Government: A government can see its strengths and weaknesses from public
reviews on social sites about various social issues.
o Marketing: Sentiment analysis helps product manufacturers to find out which
customers are loyal and which are not, and how to turn new customers into loyal
customers.
o Politics: Before the actual election result, the view of the public can be
analyzed through their comments or reviews on social media. A number of polling
applications are available on the market to analyze public opinion.
o Blog analysis: Sentiment analysis can be effectively used to mine arguments in
discussions and debate forums. It can be applied to analyze blog posts and assess
their subjectivity.
o Stock Market Prediction: Sentiment analysis can be effectively used to forecast
events in the stock market.
1.3 Twitter
Twitter is an online news and social networking service where users post and
interact with messages, known as "tweets". These messages were originally
restricted to 140 characters, but on November 7, 2017, the limit was doubled to 280
characters for all languages except Japanese, Korean and Chinese. Registered users
can post tweets, but those who are unregistered can only read them [20]. Users
access Twitter through its website interface, Short Message Service (SMS) or mobile
device application software ("app"). A large number of people advertise their
recruiting services, their consulting businesses and their retail stores using
Twitter, and it works.
The modern, web-savvy user is tired of television commercials. People today prefer
advertising that is faster, less intrusive, and can be turned on or off at will.
Twitter is exactly that. If you learn how the nuances of tweeting work, you can get
good advertising results using Twitter.
Twitter is a blend of instant messaging, blogging and texting, but with brief
content and a very broad audience. If you fancy yourself a bit of a writer with
something to say, then Twitter is definitely a channel worth exploring. If you
don't like to write but are curious about a celebrity, a particular hobby topic, or
even a long-lost cousin, then Twitter is one way to connect with that person or
topic.
Tweets
Tweets are publicly visible by default, but senders can restrict message delivery
to just their followers. Users can tweet via the Twitter website, compatible
external applications (such as for smartphones), or by Short Message Service (SMS)
available in certain countries. Users may subscribe to other users' tweets; this is
known as "following", and subscribers are known as "followers" or "tweeps", a
portmanteau of Twitter and peeps. Individual tweets can be forwarded by other users
to their own feed, a process known as a "retweet". Users can also "like" (formerly
"favorite") individual tweets. Twitter allows users to update their profile via
their mobile phone, either by text messaging or by apps released for certain
smartphones and tablets.
As a social network, Twitter revolves around the principle of followers. When you
follow another Twitter user, that user's tweets appear in reverse chronological
order on your main Twitter page. If you follow 20 people, you'll see a mix of
tweets scrolling down the page: breakfast-cereal updates, interesting new links,
music recommendations, even musings on the future of education.
Twitter is also attractive as a source of research data for several reasons:
• Twitter is a popular platform in terms of the media attention it receives, and
it therefore attracts more research because of its cultural status.
• Twitter makes it easier to find and follow conversations (i.e., through both its
search feature and tweets appearing in Google search results).
• Twitter has hashtag norms that make it easier to gather, sort and expand
searches when collecting data.
• Twitter data is easy to retrieve, as major incidents, news stories and events on
Twitter tend to be centred on a hashtag.
• The Twitter API is more open and accessible compared with other social media
platforms, which makes Twitter more favourable to developers creating tools to
access data. This in turn increases the availability of tools to researchers.
• Many researchers use Twitter themselves and, because of their favourable
personal experiences, feel more comfortable researching a familiar platform.
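Because conversations on Twitter tend to be centred on a hashtag, collected tweets
can be grouped by topic with very little code. The sketch below is a hedged
illustration that works on tweets already held in memory; the sample tweets and the
function name `group_by_hashtag` are assumptions, and no Twitter API call is shown.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

def group_by_hashtag(tweets):
    """Map each lower-cased hashtag to the tweets that contain it."""
    groups = defaultdict(list)
    for tweet in tweets:
        for tag in HASHTAG.findall(tweet):
            groups[tag.lower()].append(tweet)
    return dict(groups)

# Hypothetical sample tweets.
tweets = [
    "Counting starts today #PunjabElection",
    "Loved the movie #Robot",
    "Long queues at the booth #punjabelection",
]
groups = group_by_hashtag(tweets)
print(sorted(groups))
```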
Chapter-2
Literature Review
Introduction
Data mining techniques offer a standard and powerful tool set for building many
data-driven management systems. This literature review focuses on how data mining
methods are used in different application areas to discover significant patterns in
databases.
Related Work
In the course of this work, a number of research papers were studied; they are
summarized below:
Guoning Hu, Preeti Bhargava, Saul Fuhrmann, Sarah Ellinger and Nemanja
Spasojevic (2017) [1], "Analyzing users' sentiment towards popular consumer
industries and brands on Twitter": Social media serves as a unified platform for
users to express their thoughts on subjects ranging from their daily lives to their
opinions on consumer brands and products. These users wield a huge influence in
shaping the opinions of other consumers and influence brand perception, brand
loyalty and brand advocacy. In this paper, the authors analyze the opinion of 19M
Twitter users towards 62 popular industries, encompassing 12,898 enterprise and
consumer brands, as well as associated subject matter topics, by means of sentiment
analysis of 330M tweets over a period spanning a month. They find that users tend
to be most positive towards manufacturing and most negative towards service
industries. In addition, users tend to be more positive or negative when
interacting with brands than on Twitter in general. The authors also find that
sentiment towards brands within an industry varies greatly, and they demonstrate
this using two industries as use cases. They further find that there is no strong
correlation between the topic sentiments of different industries, demonstrating
that topic sentiments are highly dependent on the context of the industry in which
they are mentioned. The paper demonstrates the value of such an analysis for
assessing the impact of brands on social media. The authors hope that this initial
study will prove valuable to both researchers and companies in understanding
users' perception of industries, brands and associated topics, and that it will
encourage more research in this field.
Ankita Gupta, Jyotika Pruthi, Neha Sahu (2017) [2], "Sentiment Analysis of Tweets
using Machine Learning Approach": Sentiment analysis is an area of study within
natural language processing. It helps in finding the opinion or sentiment hidden
within text. This research focuses on finding sentiments in Twitter data, which is
more challenging because of its unstructured nature, limited size, and use of
slang, misspellings, abbreviations and so on. Most researchers have dealt with
various machine learning approaches to sentiment analysis and compared their
results, but using machine learning approaches in combination has been
underexplored in the literature. This research found that machine learning
approaches used in a hybrid manner give better results than the same approaches
used in isolation. Moreover, as tweets are very raw in nature, the study applies
several preprocessing steps to obtain useful data for input to the machine learning
classifiers. The study mainly focuses on two machine learning algorithms, K-Nearest
Neighbors (KNN) and Support Vector Machines (SVM), used in a hybrid manner. The
analytical observation is obtained in terms of classification accuracy and
F-measure for each sentiment class and their average. The evaluation shows that the
proposed hybrid approach is better in terms of both accuracy and F-measure than the
individual classifiers.
L. Jaba Sheela (2016) [3], "A Review of Sentiment Analysis in Twitter Data Using
Hadoop": Twitter is an online social networking site that contains a rich amount of
data, which can be structured, semi-structured or unstructured. In this work, a
method that performs classification of tweet sentiment in Twitter is discussed. To
improve its scalability and efficiency, it is proposed to implement the work on the
Hadoop ecosystem, a widely adopted distributed processing platform that uses the
MapReduce parallel processing paradigm. Finally, extensive experiments will be
conducted on real data sets, with the expectation of achieving comparable or
greater accuracy than the techniques proposed in the literature.
Komal Sutar, Snehal Kasab, Sneha Kindare, Pooja Dhule (2016) [4], "Sentiment
Analysis: Opinion Mining of Positive, Negative or Neutral Twitter Data Using
Hadoop": A Social Networking Service (SNS) is a platform for building social
relations among people who share common interests. Twitter has become very popular.
Millions of users post their comments on Twitter and express their views on current
issues. Every day a large amount of data becomes available, which can be useful for
industrial or business purposes. The Twitter data can therefore be analyzed and
used by various organizations to support decision making. This paper gives a method
for the analysis of Twitter data using AFINN and EMOTICON for natural language
processing. To store, categorize and process large volumes of opinions, the authors
use Hadoop, an open-source framework.
B. M. Bandgar, Dr. S. Sheeja (2016) [5], "Analysis of real time social tweets for
opinion mining": The authors developed an indigenous, Windows-based, user-friendly
application in Java to extract, process and classify real-time social network
tweets using unstructured models. The relevant real-time tweets are obtained and
used for sentiment analysis. The processed tweets are classified into three opinion
mining classes, positive, negative and neutral, using unstructured algorithms such
as the EEC, IPC and SWNC models. The SWNC model gave better results than the EEC
and IPC models. The results are compared using the confusion matrix, precision and
accuracy parameters, and are also visualized using pie charts.
Syed Akib Anwar Hridoy, M. Tahmid Ekram, Mohammad Samiul Islam, Faysal
Ahmed and Rashedur M. Rahman (2015) [6], "Localized twitter opinion mining
using sentiment analysis": Analysis of public information from social media can
yield interesting results and insights into the world of public opinions about any
product, service or personality. Social network data is one of the most effective
and accurate indicators of public sentiment. In this paper the authors discuss a
methodology that allows utilization and interpretation of Twitter data to determine
public opinions. Analysis was done on tweets about the iPhone 6. Feature-specific
popularities and male-female specific analysis are included. Mixed opinions were
found, but general consistency with outside reviews and comments was observed.
Emma Haddi (2015) [7], "Sentiment Analysis: Text Pre-Processing, Reader Views
And Cross Domains": Sentiment analysis has emerged as a field that has attracted a
significant amount of attention, since it has a wide variety of applications that
could benefit from its results, such as news analytics, marketing, question
answering, knowledge management and so on. This area, however, is still early in
its development, and urgent improvements are required on many issues, particularly
on the performance of sentiment classification. In this thesis, three key
challenging issues affecting sentiment classification are outlined and innovative
ways of addressing these issues are presented. First, text pre-processing has been
found crucial to sentiment classification performance. Therefore, a combination of
several existing preprocessing methods is proposed for the sentiment classification
process. Second, text properties of financial news are used to build models to
predict sentiment. Two different models are proposed: one that uses financial
events to predict financial news sentiment, and another that takes a novel
perspective by considering the sentiment reader's view, rather than the classic
approach that examines the sentiment holder's view.
Prerna Chikersal (2015) [8], "Modeling Public Sentiment in Twitter": People often
use social media as an outlet for their emotions and opinions. Analysing social
media text to extract sentiment can help reveal the thoughts and opinions people
have about the world they live in. This thesis contributes to the field of
Sentiment Analysis, which aims to understand how people convey sentiment in order
to ultimately deduce their emotions and opinions. While several sentiment
classification methods have been devised, the increasing magnitude and complexity
of social data calls for scrutiny and advancement of these methods. The scope of
this project is to improve traditional supervised learning methods for Twitter
polarity detection by using rule-based classifiers, linguistic patterns, and
common-sense knowledge based information.
Pragya Tripathi, Santosh Kr Vishwakarma, Ajay Lala (2015) [9], "Sentiment
Analysis of English Tweets Using RapidMiner": Social networking sites are nowadays
a great source of communication for internet users, and so they are an important
source for understanding the emotions of people. In this paper, the authors use
data mining techniques for classification to perform sentiment analysis on the
views people have shared on Twitter. They collect a dataset, i.e. tweets from
Twitter written in natural language, and apply text mining techniques
(tokenization, stemming and so on) to convert them into a useful form. This is then
used to build a sentiment classifier that can predict happy, sad and neutral
sentiments for a particular tweet. The RapidMiner tool is used, which helps in
building the classifier and in applying it to the test dataset. Two different
classifiers are used, and their results are compared to find which one gives better
results.
Ion Smeureanu, Cristian Bucur (2012) [10], "Applying Supervised Opinion Mining
Techniques on Online User Reviews": In recent years, the spectacular development of
web technologies has led to an enormous quantity of user-generated information in
online systems. This large amount of information on web platforms makes them viable
for use as data sources in applications based on opinion mining and sentiment
analysis. The paper proposes an algorithm for detecting sentiments in movie user
reviews, based on a naive Bayes classifier. The authors present an analysis of the
opinion mining domain, the techniques used in sentiment analysis and its
applicability. They implemented the proposed algorithm, tested its performance, and
suggested directions of development.
Chapter-3
Problem Formulations
3.1 Research gaps
Today is the age of technology. Most work is done using the internet, which has
become the new source of learning, shopping and education. People post their
comments, views or tweets on the web, so a huge amount of data is available on
websites.
In order to collect and analyze the data from websites, a technique known as
opinion mining is used. It is also called sentiment analysis. It is used to gather
user reviews from a site and analyze whether public opinion is positive or
negative. Many algorithms are available for sentiment analysis. It can be used to
find public opinion about new mobile phones, movie ratings, current issues and much
more. It is thus an emerging field that uncovers the attitude of the public towards
any topic. People write their comments frequently and in abbreviated form, so it is
not always possible to judge which comments are positive and which are negative or
neutral. Understanding people's views correctly is the need of today.
3.2 Problem Formulation
Sentiment analysis can be seen as an application of text classification. The main task of text classification is to label texts with a predefined set of categories. Text classification has been useful in various areas, such as article indexing, text filtering and word sense disambiguation. One of the basic problems in text classification is how to represent the content of a text so as to give a better classification. In research on information extraction systems, the most popular and effective way is to represent a text by the set of terms that appear in it. Huge numbers of tweets and reviews are posted by the public every day. Without automation, the only way to identify public opinion towards a particular post is to read and assess each tweet manually. Reading every tweet and then deciding whether it is positive or negative is not an easy task, and it is time-consuming. A method or algorithm is therefore required that collects the Twitter data, processes it and finally gives a result that shows public opinion towards that particular post.
This will help people to learn the public's view of a particular topic or product. An algorithm for sentiment analysis should therefore be implemented to predict public opinion with good accuracy. People often use informal words to express their feelings, and most use shortcuts, e.g. osm for awesome and lol for laughing out loud. This sometimes creates difficulty for readers who are not familiar with these words, because they cannot recognize the sentiment of the writer.
As noted in Section 3.1, the web is now the main source of entertainment, learning, shopping and education. People use it for every task and post their comments, views and tweets to share their opinion with the rest of the public, so a huge amount of data is available on websites. Opinion mining, also called sentiment mining or sentiment analysis, collects user reviews from these sites and determines whether public sentiment is positive or negative. Many algorithms are available for it. Sentiment mining also helps in predicting stock market activity, and it can be used to find public sentiment towards new mobile phones, movie ratings, current affairs and much more. Because people write their comments very frequently and in abbreviated form, it is difficult to judge which comments are positive, negative or neutral, and a method that interprets them correctly is needed.
3.3 Research Objectives
The objectives of this research are as follows:
1. To review and explore the sentiments of users in tweets.
2. To create a tweets database.
3. To preprocess (slang words) and mine the collected data.
4. To develop an efficient algorithm for sentiment analysis.
5. To predict the sentiment of tweets.
3.4 Methodology
Sentence-level classification is used to analyze the tweets. For the purposes of this research, sentiment is defined as "a personal positive or negative feeling."
Data collection: No existing data sets of Twitter sentiment messages were available, so a data set was collected for this work. For the training data, messages that contained the emoticons :) and :( were collected through the Twitter API. The test data was prepared manually: a set of 98 negative tweets and 78 positive tweets was manually marked. A web interface tool was built to help with the manual classification task. A dictionary is created for the positive and negative words. The tweets are collected and stored, data pre-processing methods are applied, and the algorithm is then applied to the tweets to analyze their sentiment.
Some of the available tools have been tried and used by researchers over a number of years, and the vast majority of them mainly handle data from Twitter. It would be useful to have academic and social listening tools that retrieve data from other social media platforms, such as Facebook, Instagram and Amazon, and also from dark social media platforms such as WhatsApp. However, this may not be possible because these applications are unlikely to give developers as much of their data as Twitter does. In addition, there may be ethical implications of accessing data from dark social media platforms.
In addition, there are various advanced data analysis and statistical applications which can be used to analyze social media data, for example:
 R
 SPSS
 Weka
 Programming
Researchers should begin to ask what kinds of research are made possible by tools that do not require end users to have technical knowledge. They should also seek to better understand the kinds of questions that more technical tools can address. Developers of tools should therefore liaise with social scientists at the development stage, to allow for the possibility of new features based on social science research questions.
The research follows these steps:
1. Tweets about a specific topic are collected.
2. Database tables are created that contain the positive and negative words.
3. Each tweet is scored with a numeric value, i.e. 1 for a positive tweet, -1 for a negative tweet and 0 for a neutral tweet.
4. Data filtering is performed to remove unnecessary data from the tweets, e.g. URLs, usernames, and duplicate or repeated characters.
5. Slang words (e.g. lol means laugh out loud) are changed into actual words.
6. Words with negation (never, not, nor etc.) are handled.
7. Each tweet is split into words, which are analyzed and compared with the database.
8. Sentiments are shown graphically.
The steps are described in detail below:
3.4.1) Create Dictionary: First, a dictionary of positive and negative words is made. Two different tables are created in the sentiment database, one for positive words and the other for negative words.
Table 3.1: Database table
Table Name        Field Name   Data Type
NegWords          Nwords       Varchar
PosWords          Pwords       Varchar
Tweets Database   Tweet        Varchar
Tweets Database   Sentiment    int
Table 3.2: Positive words table

Pwords
awesome
gorgeous
happy
beautiful
good
Table 3.3: Negative words table
Nwords
hate
destroy
bad
damage
hurt
3.4.2) Tweets Collection: The tweets are collected from Twitter. First, one has to create a Twitter account and log in to that account to collect the tweets. The www.sentiment140.com website is used to collect the tweets, and an SQL database is used to store them. A sentiment is assigned manually to each tweet, i.e. 0 to a neutral tweet, 1 to a positive tweet and -1 to a negative tweet.
Table 3.4: Revolution sentiment score database table
Sentiment Source   Sentiment Score   Tweet
sentiment140       -1                Scary that we are not yet out of the thoughtless decisions and poor execution #gst #revolution
sentiment140       1                 And someone says #revolution wasn't a good move by Modi! I will repeat it was the best step taken by Modi Government so far!
sentiment140       0                 Revolution Happened in India.
The sentiment score is assigned as follows:
If the tweet is positive, then assign Sentiment Score = 1
If the tweet is negative, then assign Sentiment Score = -1
If the tweet is neutral, then assign Sentiment Score = 0
3.4.3) Data Pre-Processing: Pre-processing is done on the retrieved tweets.
3.4.3.1) Filtering: Filtering creates a single data structure that the mining method can work on. It allows only specific parts of a document to be used rather than the whole document, which reduces the load of carrying the whole data. The filters used are as follows:
URLs: The tweets collected from Twitter contain links or URLs, which are not used in estimating the sentiment of the tweets. These links have no connection with the actual sentiment, so they are replaced by empty space.
Usernames: Users sometimes refer to other users in tweets by placing the @ symbol before their name. These names also do not affect the sentiment, so they are replaced by empty space.
Duplicate or repeated characters: Users sometimes use casual language in tweets. For example, users write 'baaaaaaad' in place of the word bad, or 'happppppppppy' instead of happy. Characters repeated more than twice in the document are replaced by only two occurrences. Hence happppppy is replaced by happy.
URLs and usernames are replaced by empty space to decrease the complexity of the algorithm and the time it takes to compare each word with the database.
Table 3.5: Data filtering
Tweets Having          Replaced By
https://t.co//Htxxx    Empty space
@avneet                Empty space
@rupinder              Empty space
hhhhaaaappppppy        happy
fooooooodddddd         food
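The filters described above can be sketched in Java as follows. This is a minimal illustration and not the thesis implementation: the class name TweetFilter, the method name filter and the exact regular expressions are assumptions. Following the rule stated in the text, a character repeated more than twice is cut down to two occurrences, so 'happppppy' becomes 'happy' while 'baaaaaaad' becomes 'baad'; mapping such a word to 'bad' would need a further dictionary step that is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names and patterns are illustrative assumptions.
final class TweetFilter {
    // A link starting with http:// or https:// up to the next whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    // A user mention: '@' followed by letters, digits or underscores.
    private static final Pattern USERNAME = Pattern.compile("@\\w+");
    // Any character followed by two or more copies of itself.
    private static final Pattern REPEATS = Pattern.compile("(.)\\1{2,}");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String filter(String tweet) {
        String s = URL.matcher(tweet).replaceAll(" ");      // URLs -> empty space
        s = USERNAME.matcher(s).replaceAll(" ");            // usernames -> empty space
        s = REPEATS.matcher(s).replaceAll("$1$1");          // keep only two occurrences
        return SPACES.matcher(s).replaceAll(" ").trim();    // collapse leftover spaces
    }

    public static void main(String[] args) {
        System.out.println(filter("@avneet this is baaaaaaad https://t.co/xyz"));
    }
}
```

URLs are removed before usernames so that an '@' inside a link is not treated as a mention.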
3.4.3.2) Twitter slang removal: Little space is offered for writing a tweet, as a tweet is limited to 140 characters. Hence, most users prefer to write short forms of the actual words. These user-created short forms are called slang words. Sometimes people also use abbreviations. For example, tmrw is used in place of tomorrow and thx in place of thanks. These slang words should be replaced by their original words. For this, a separate table that stores the slang words is created in the dictionary.
Table 3.6: Slang removal
Twitter Slang   Actual Word
Gud             good
Awsm            awesome
Fav             favorite
Thnx            thanks
Bff             best friends forever
Tc              take care
Sd              sweet dreams
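The slang replacement step can be sketched as a word-by-word lookup. In the thesis the slang words are stored in a database table; in this sketch an in-memory map stands in for that table, and the class name SlangReplacer and method name expand are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; the map stands in for the slang table of the dictionary.
final class SlangReplacer {
    private static final Map<String, String> SLANG = new HashMap<>();
    static {
        // Entries taken from Table 3.6.
        SLANG.put("gud", "good");
        SLANG.put("awsm", "awesome");
        SLANG.put("fav", "favorite");
        SLANG.put("thnx", "thanks");
        SLANG.put("tc", "take care");
    }

    // Replaces every slang word in the tweet with its actual word.
    static String expand(String tweet) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : tweet.trim().split("\\s+")) {
            // Look the word up in lowercase; unknown words are kept unchanged.
            out.add(SLANG.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("movie was awsm thnx"));
    }
}
```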
3.4.3.3) Stop words removal: Stop words are words that occur often in tweets or comments but do not add to the sentiment. Stop words are articles, prepositions etc. These should be removed from the document and replaced by empty space.
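A minimal sketch of stop word removal is given below. The stop word list, the class name StopWordFilter and the method name removeStopWords are assumptions; the thesis keeps its stop words in a database table. The list in this sketch deliberately leaves out negation words such as 'not', since removing them at this stage would make the later negation handling impossible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the word list is a small illustrative sample.
final class StopWordFilter {
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "is", "it", "of", "to", "in", "on", "by"));

    // Returns the words of the tweet without the stop words, in the original order.
    static List<String> removeStopWords(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords(
                Arrays.asList("story", "of", "serial", "is", "not", "good")));
    }
}
```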
3.4.3.4) Negation Handling: Some words change the meaning of a sentence; these are known as negation words. Words like never, not, does not, no and nor are negation words. If a tweet is positive, these words change its sentiment to negative, so they must be handled properly. There are two cases of negation:
1) Negation word used with a positive word, making it negative: If the overall sentiment of a sentence is positive but the positive word is preceded by a negation, the sentiment of the sentence changes to negative.
"Story of serial is good"
This sentence gives a positive sentiment, as the positive word good is present. Now consider the case:
"Story of serial is not good"
This sentence has the negation word 'not', which changes the sentiment of the sentence to negative.
2) Negation word used with a negative word, making it positive: If the overall sentiment of a sentence is negative but the negative word is preceded by a negation, the sentiment of the sentence changes to positive.
"Story of serial is bad"
This sentence gives a negative sentiment, as the negative word bad is present. Now consider the case:
"Story of serial is not bad"
This sentence has the negation word 'not', which changes the sentiment of the sentence to positive.
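The two negation cases can be sketched as follows. This is an illustration under stated assumptions: the word sets are small samples, the class name NegationScorer is hypothetical, and a sentiment word counts as negated only when the word immediately before it is a negation word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; word lists are small illustrative samples.
final class NegationScorer {
    private static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "happy", "awesome"));
    private static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "hate", "hurt"));
    private static final Set<String> NEGATIONS = new HashSet<>(Arrays.asList("not", "never", "no", "nor"));

    // Returns 1 for a positive, -1 for a negative and 0 for a neutral sentence.
    static int score(String sentence) {
        String[] words = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int positive = 0;
        int negative = 0;
        for (int i = 0; i < words.length; i++) {
            // A sentiment word is flipped when the preceding word is a negation.
            boolean negated = i > 0 && NEGATIONS.contains(words[i - 1]);
            if (POSITIVE.contains(words[i])) {
                if (negated) negative++; else positive++;
            } else if (NEGATIVE.contains(words[i])) {
                if (negated) positive++; else negative++;
            }
        }
        return Integer.signum(positive - negative);
    }

    public static void main(String[] args) {
        System.out.println(score("Story of serial is not good"));
        System.out.println(score("Story of serial is not bad"));
    }
}
```

Checking only the immediately preceding word keeps the sketch short; a phrase such as "not very good" would need a wider window.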
3.4.3.5) Stemming: Stemming is the process of converting words into their base form. Users often write inflected forms of a word, which should be replaced by the base word. For example, hate, hated, hates and hating all belong to the single word hate. This increases the efficiency of the software.
Table 3.7: Stemming
Original word Stemmed word
damaged damage
damages damage
damaging damage
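One simple way to obtain the mappings of Table 3.7 is dictionary-guided suffix stripping, sketched below. This is an assumption made for illustration and not necessarily the stemmer used in the thesis: the class name DictionaryStemmer, the suffix rules and the sample dictionary are hypothetical. A candidate base form is accepted only if it is found in the dictionary, so unknown words are returned unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the dictionary is a small illustrative sample.
final class DictionaryStemmer {
    private static final Set<String> DICTIONARY = new HashSet<>(
            Arrays.asList("damage", "hate", "hurt", "destroy"));

    // Returns the dictionary base form of the word, or the word itself if none is found.
    static String stem(String word) {
        if (DICTIONARY.contains(word)) {
            return word;
        }
        for (String candidate : candidates(word)) {
            if (DICTIONARY.contains(candidate)) {
                return candidate;
            }
        }
        return word;
    }

    // Builds possible base forms by stripping common English suffixes.
    private static List<String> candidates(String word) {
        List<String> out = new ArrayList<>();
        int n = word.length();
        if (word.endsWith("ing") && n > 3) {
            String base = word.substring(0, n - 3);
            out.add(base);          // hurting -> hurt
            out.add(base + "e");    // damaging -> damage
        }
        if (word.endsWith("ed") && n > 2) {
            out.add(word.substring(0, n - 2));  // destroyed -> destroy
            out.add(word.substring(0, n - 1));  // damaged -> damage
        }
        if (word.endsWith("s") && n > 1) {
            out.add(word.substring(0, n - 1));  // damages -> damage
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stem("damaging"));
    }
}
```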
3.4.3.6) Example for Pre-processing of tweets: The following table shows the complete pre-processing of a tweet and its output at each step.
Table 3.8: Example for tweets pre-processing
Actual tweet:
@avneetAnd someone says #revolution wasn't a good move by Modi! I will repeat it was the best step taken by Modi Government so far!Happppy. Lol! checkout https://www.raseerha.com
Change to lowercase:
@avneetand someone says #revolution wasn't a good move by modi! i will repeat it was the best step taken by modi government so far!happppy. lol! checkout
Remove special characters:
@avneetand someone says revolution wasn't a good move by modi i will repeat it was the best step taken by modi government so far happppy lol checkout
Remove usernames:
and someone says revolution wasn't a good move by modi i will repeat it was the best step taken by modi government so far happppy lol checkout
Remove URLs:
and someone says revolution wasn't a good move by modi i will repeat it was the best step taken by modi government so far happppy lol checkout
Remove extra space:
and someone says revolution wasn't a good move by modi i will repeat it was the best step taken by modi government so far happppy lol checkout
Remove more than 2 repeated characters:
and someone says revolution was not a good move by modi i will repeat it was the best step taken by modi government so far happy lol checkout
Remove slang word:
and someone says revolution was not a good move by modi i will repeat it was the best step taken by modi government so far happy laugh out loud checkout
Stop words removal:
and says revolution was not good move by modi will repeat was best step taken by modi government so far happy laugh out loud checkout
3.4.3.7) Calculating Sentiment Score: The sentiment score is calculated by comparing the words from the tweets with the dictionary words. If the tweet contains more positive words than negative words, the tweet is treated as positive.
For example,
1) iPhone has a difficulty of chargers breaking.
2) iPhones are the greatest phones all the time... i am happy to have an iPhone.
3) iPhone is the most problematical phone.
4) It must be really cool if someone works on iPhone.
These sentences are tweets about the iPhone. Sentences (1) and (3) are negative, whereas (2) and (4) are positive. Sentences (1) and (3) contain words like difficulty, breaking and problematical; these are negative words, so the sentiment score is negative. Similarly, sentences (2) and (4) contain positive words, so both are positive.
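The comparison with the dictionary can be sketched as a simple word count. The negative words below are the ones named in the text; the positive words (greatest, happy, cool) are assumed for this example, as are the class name LexiconScorer and the method name score. In the thesis the words are looked up in database tables rather than in-memory sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; in-memory sets stand in for the PosWords and NegWords tables.
final class LexiconScorer {
    private static final Set<String> POSITIVE = new HashSet<>(
            Arrays.asList("greatest", "happy", "cool"));
    private static final Set<String> NEGATIVE = new HashSet<>(
            Arrays.asList("difficulty", "breaking", "problematical"));

    // Returns 1 if positive words outnumber negative words, -1 for the reverse, 0 otherwise.
    static int score(String tweet) {
        int result = 0;
        // Split on anything that is not a lowercase letter, which also drops punctuation.
        for (String word : tweet.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(word)) {
                result++;
            } else if (NEGATIVE.contains(word)) {
                result--;
            }
        }
        return Integer.signum(result);
    }

    public static void main(String[] args) {
        System.out.println(score("iPhone has a difficulty of chargers breaking."));
        System.out.println(score("It must be really cool if someone works on iPhone."));
    }
}
```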
3.5 Algorithm for sentiment analysis
Problem: Given a list of tweets collected from Twitter, calculate the sentiment score for each tweet.
Input: A tweet from Twitter for analysis
Output: Sentiment for each tweet
Algorithm:
1. Select tweet from database.
2. Change tweet to lowercase
tweet = tweet.toLowerCase();
3. Replace URL in the tweet with empty space
tweet = tweet.replaceAll("https?://\\S+\\s?", "");
4. Replace special characters with empty space
tweet = tweet.replaceAll("[^a-zA-Z0-9@'\\s]", " ");
5. Replace extra space
tweet = tweet.trim().replaceAll("\\s+", " ");
6. Split the tweet into words
String[] words = tweet.split(" ");
7. Remove more than 2 repeated characters from each word
a) Add one space at end of word
b) Add single unrepeated character to output
c) Compare character with next character
d) Store 2 similar characters to output
e) Discard more than 2 similar characters
8. Repeat step 7 for every word in words
9. Create database connection
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
Connection con = DriverManager.getConnection("jdbc:odbc:mydsn");
10. Replace slang word with its actual word from database
11. Remove unused words, prepositions, articles from tweet
12. If (negation == 1)
If associated next word is positive then increment negative counter
Else if associated next word is negative then increment positive counter
13. Look up the current word (word) in the negative and positive word tables
PreparedStatement pstmt = con.prepareStatement("select * from NegWords where Nwords=?");
pstmt.setString(1, word);
ResultSet rs = pstmt.executeQuery();
if (rs.next())
Increment NegCounter
pstmt = con.prepareStatement("select * from PosWords where Pwords=?");
pstmt.setString(1, word);
rs = pstmt.executeQuery();
if (rs.next())
Increment PosCounter
14. Repeat steps 12 and 13 for every word in words
15. End Loop
16. Result = PosCounter - NegCounter;
17. If result > 0, then tweet is positive
else if result < 0, then tweet is negative
else, tweet is neutral
18. Calculate Error & Accuracy of Algorithm.
Error = Actual Value - Calculated Value, taken over all tweets (the number of tweets whose calculated sentiment differs from the actual sentiment)
Accuracy% = ((Total Tweets - Error) / Total Tweets) * 100
Here, the actual value is the sentiment assigned by a human and the calculated value is the sentiment predicted by the software.
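The accuracy formula can be sketched as below, assuming that Error counts the tweets whose predicted sentiment differs from the manually assigned one. The class and method names are hypothetical. As a check against the results chapter, 45 classified tweets with 16 errors give (45 - 16) / 45 * 100 = 64.44%, the value reported for the Sajjan Singh Rangroot dataset.

```java
// Hypothetical helper illustrating the Error and Accuracy% formulas.
final class AccuracyCalculator {

    // Counts the tweets whose predicted label differs from the manually assigned label.
    static int countErrors(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must have the same length");
        }
        int errors = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != predicted[i]) {
                errors++;
            }
        }
        return errors;
    }

    // Accuracy% = ((Total Tweets - Error) / Total Tweets) * 100
    static double accuracyPercent(int totalTweets, int errors) {
        if (totalTweets <= 0) {
            throw new IllegalArgumentException("totalTweets must be positive");
        }
        return (totalTweets - errors) * 100.0 / totalTweets;
    }

    public static void main(String[] args) {
        // 45 tweets with 16 wrongly classified ones.
        System.out.printf("%.2f%n", accuracyPercent(45, 16));
    }
}
```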
3.6 Techniques used
1. Sentiment analysis: Sentiment analysis can be done through two types of procedures, as below:
1.1 Sentiment classification using supervised learning: Supervised learning is implemented by building a classifier. It requires two sets of documents for classification: a training set and a test set. This approach is also known as the machine learning approach.
1.2 Sentiment classification using unsupervised learning: In unsupervised classification, the text is classified by comparing it with given words or lexicons. The sentiment value of these words or lexicons is defined in advance. The document is scanned and compared with the positive and negative words.
2. Classification: Classification assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.
3.7 Tools used
The following applications are used in this work to analyze social media data:
1. NetBeans
2. Java programming language
3. Microsoft SQL Server Management Studio (database)
3.8 Parameters
1. Accuracy
2. Time
3. Predictor
4. Automation
Chapter-4
IMPLEMENTATION
The implementation uses the Java language. Java is a high-level, object-oriented programming language. The NetBeans IDE is used to build the front end. SQL is used as the database to store the tweets and the dictionary words. Tweets are collected from various fields such as cricket, iPhone, Badminton, Qismat song, Ishqbaaz serial and Bahubali2 movie.
4.1 Netbeans IDE Interface
NetBeans IDE provides a user-friendly interface for developing Java code. It offers an easy way to create the front end and a proper error-handling mechanism. Fig 4.1 shows the NetBeans IDE interface. NetBeans gives Java users a simple drag-and-drop system for using any of its tools. To run the project, click the green run arrow button on the menu bar.
Fig 4.1 Netbeans IDE interface
Netbeans features:
 Best support for new Java Technologies
 Fast code editing
 Easy project management
 Effective project handling
 Write error free code
4.2 Main Window
Fig 4.2 shows the starting window of the thesis implementation. It consists of two buttons and one combo box. The combo box lists the topics available for sentiment analysis. The "Check Sentiment" button runs the algorithm on the selected dataset. The "Clear" button clears all the values of the labels and the variables used in the program. The window also contains various labels that show the results of the system. The calculated result field shows the values calculated for the chosen dataset. The accuracy field shows the accuracy of the algorithm. The actual result field shows the number of actual positive, negative or neutral tweets in the database. To run an analysis, choose from the list the dataset to which sentiment analysis is to be applied and click the "Check Sentiment" button; the sentiment classification algorithm is applied to the selected tweets dataset and the results are shown in the respective labels.
Fig 4.2: Main executable window
4.3 Dictionary Creation
Dictionary for negative and positive words is created separately using two different
tables in SQL.
4.3.1 Positive Words Dictionary
Fig 4.3 shows the list of positive words that are stored in table.
Fig 4.3 Positive words table
4.3.2 Negative words Dictionary
Fig 4.4 shows the list of negative words that are stored in the database.
Fig 4.4 Negative words table
4.4 Slang words table
Sometimes people use their own abbreviations to represent any word. These
abbreviations are called slang words. Fig 4.5 shows the list of slang words in
database.
Fig 4.5 Slang words table
4.5 Stop words table
Stop words are words that occur in the tweets but do not affect their sentiment, so they should be removed to save processing time. Fig 4.6 shows the list of stop words stored in the database.
Fig 4.6 Stop words table
4.6 Tweets dataset
To check the accuracy of the algorithm, 6 datasets are created by collecting tweets. The tweets are collected for the following topics:
1) Revaluation tweets
2) Sajjan Singh Rangroot movie tweets
3) Padman movie tweets
4) Kumkum Bhagya Hindi serial tweets
These tweets are collected using the online tweet collection tool sentiment140. Fig 4.7 shows a screenshot of the tool. To use the tool, one first has to sign in with a Twitter account; only then are the tweets collected.
4.6.1 Revaluation tweets table
74 tweets are collected for revaluation, as shown in Fig 4.9.
Fig 4.9 Revaluation tweets tables
4.6.2 Sajjan Singh Rangroot movie tweets table
45 tweets are collected from twitter for Sajjan Singh Rangroot dataset as shown in fig
4.10.
Fig 4.10 Sajjan Singh Rangroot tweets table
4.6.3 PadMan Hindi Movie tweets table
Tweets collected for the Hindi movie PadMan are shown in fig 4.11.
Fig 4.11 Padman movie tweets table
4.6.4 Kumkum Bhagya Hindi serial tweets table
Tweets collected for the Hindi serial Kumkum Bhagya are shown in fig 4.12.
Fig 4.12 Kum Kum Bhagya serial tweets
4.7 Summary
In this chapter, screenshots of the thesis implementation are explained. The various tables used in sentiment analysis are shown, and screenshots of the datasets are also explained.
Chapter-5
RESULT AND DISCUSSIONS
The main aim of the research is to develop an algorithm that easily calculates the sentiment of tweets collected from Twitter.
The algorithm is applied to tweets collected on a single day. The efficiency of the algorithm is measured in terms of accuracy rate, which is about 85%.
5.1. Results for Revaluation Dataset
A total of 220 tweets are collected and the algorithm is applied to them. The software calculated the sentiment with an efficiency of 42%. Fig 5.1 shows the analysis of the revaluation tweets. The overall sentiment of the tweets shows that public opinion towards the Revaluation is positive. 43 of the tweets are assigned the wrong sentiment.
Fig 5.1 Result of Revaluation tweets
Fig 5.2 shows the results of the Revaluation dataset in graphical form.
[Pie chart "Revaluation": Positive 31%, Negative 27%, Neutral 42%]
Fig 5.2 Pie chart for "Revaluation" tweets
5.2. Results for Sajjan Singh Rangroot movie Dataset
A total of 120 tweets are collected using the sentiment140 tool. The software calculates the sentiment with an efficiency of 64.44%. Fig 5.3 shows the overall sentiment for the Hindi movie Sajjan Singh Rangroot. The results show that public opinion towards the movie is positive. The system retrieves 28 positive tweets, 2 negative tweets and 15 neutral tweets. Only 16 tweets are classified wrongly. The fewer tweets are classified wrongly, the higher the accuracy of the system.
Fig 5.3 Result of Sajjan Singh Rangroot movie tweets
Fig 5.4 shows the Sajjan Singh Rangroot movie results in graphical form.
[Pie chart "Sajjan Singh Rangroot": Positive 64%, Negative 5%, Neutral 31%]
Fig 5.4 Pie chart for "Sajjan Singh Rangroot" movie tweets
5.3. Results for Padman movie Dataset
A total of 130 tweets are collected and the software calculates the sentiment with an efficiency of 57.14%. Fig 5.5 shows that the sentiment of people towards the Padman movie is positive. 9 tweets are classified with the wrong sentiment. The system retrieves 10 positive tweets, 2 negative tweets and 9 neutral tweets. The fewer tweets are classified wrongly, the higher the accuracy of the system.
Fig 5.5 Results of Padman movie tweets
Fig 5.6 shows the graphical representation of the PadMan tweets results. The blue part represents the positive tweets, the red part the negative tweets and the green part the neutral tweets. The blue part is the largest in the pie chart, which shows that public opinion towards the movie is positive.
[Pie chart "PadMan": Positive 47%, Negative 13%, Neutral 40%]
Fig 5.6 Pie chart for "PadMan" tweets
5.4. Results for Kumkum bhagya Hindi Serial Dataset
A total of 110 tweets are collected and the algorithm is applied to them. The software calculated the sentiment with an efficiency of 55%. Fig 5.7 shows that the overall sentiment of the tweets is positive. 9 tweets are classified with the wrong sentiment. The system retrieves 9 positive tweets, 2 negative tweets and 9 neutral tweets.
Fig 5.7 Result of Kumkum bhagya tweets
Fig 5.8 shows the graphical representation of the Kumkum Bhagya tweets.
[Pie chart "Kum Kum Bhagya": Positive 45%, Negative 10%, Neutral 45%]
Fig 5.8 Pie chart for "kumkum bhagya" tweets
5.5 Accuracy comparison of different datasets
[Bar chart: accuracy (%) of datasets 1 to 4; plotted values 64.44, 53.33, 50 and 42.67]
Fig 5.13 Bar chart showing Accuracy of different datasets
5.6 Detail of 6 dataset results
[Bar chart: number of positive, negative and neutral tweets for the Revaluation, Sajjan Singh Rangroot, PadMan and Kum Kum Bhagya datasets]
Fig 5.14 Graphical representation of Results
5.7 Summary
In this chapter, the output of the sentiment analysis algorithm is shown. The tweet analysis for the different datasets is represented graphically in the form of pie charts and bar charts. The accuracy of the different datasets is compared in a bar chart.
Chapter-6
CONCLUSION AND FUTURE SCOPE
6.1 Conclusion
Sentiment analysis is an emerging field that is used in many application areas, and its scope is increasing. A need therefore arises to develop an algorithm that can properly find the sentiment of public tweets or opinions.
This work presents a new algorithm developed in the Java language. The algorithm is applied to tweets and its efficiency is calculated based on the accuracy rate of the algorithm. The approximate efficiency of the algorithm is 86%.
6.2. Challenges
 Detection of spam tweets.

 Recognize the fake tweets.
 Recognize the co-reference between nouns and pronouns.
6.3. Future scope
The accuracy of the algorithm can be checked using comments from other websites. Two or more products or brands can also be compared. A richer lexicon dictionary can be created to improve the processing of the algorithm. Sentiment analysis can be applied to more datasets for better analysis. The work can be extended by collecting tweets and comments from different blogs and sites and applying different types of classifiers to the dataset, so that their accuracy can be compared to find which classifier gives better efficiency.
REFERENCES
[1] Guoning, Hu.,Bhargava, P., Fuhrmann, S., Ellinger, S., (2017), "Analyzing users’
sentiment towards popular consumer industries and brands on Twitter"
arXiv:1709.07434v1 [cs.CL] 21 Sep 2017.
[2] Gupta, A., Pruthi, J., Sahu, N., (2017), "Sentiment Analysis of Tweets using
Machine Learning Approach", Ankita Gupta et al, International Journal of Computer
Science and Mobile Computing, Vol.6 Issue.4, April- 2017, pg. 444-458.
[3] Sheela, L.J., (2016), "A Review of Sentiment Analysis in Twitter Data Using
Hadoop", International Journal of Database Theory and Application Vol.9, No.1,
pp.77-86
[4] Sutar, K., Kasab, S., Kindare, S., Dhule, P., (2016), "Sentiment Analysis: Opinion
Mining of Positive, Negative or Neutral Twitter Data Using Hadoop", IJCSN
International Journal of Computer Science and Network, Volume 5, Issue 1, February
2016 ISSN (Online): 2277-5420 www.IJCSN.org Impact Factor: 1.02.
[5] Bandgar, B. M., Sheeja, S., (2016), " Analysis of real time social tweets for
opinion mining", International Journal of Applied Engineering Research ISSN 0973-
4562 Volume 11, Number 2 pp 1404-1407 © Research India Publications.
[6] Hridoy, S.A.A., Ekram, M.T., Islam, M. S., Ahmed, F., and Rahman*, R.,
M.,(2015), "Localized twitter opinion mining using sentiment analysis", Anwar
Hridoy et al. Decis. Anal. (2015) 2:8 DOI 10.1186/s40165-015-0016-4
[7] HADDI, E., (2015), "Sentiment analysis: text preprocessing, reader views and
cross domains, Brunel university London college of engineering, design and physical
sciences department of computer science".
[8] Chikersal, P., (2015), "Modeling Public Sentiment in Twitter", o the School of
Computer Engineering, in partial fulfillment of the requirements of the degree of
Bachelor of Engineering (B.Eng.) in Computer Science at Nanyang Technological
University, Singapore.
[9] Tripathi, T., Vishwakarma, S.,Kr., Lala, A., (2015), "Sentiment Analysis of
English Tweets Using Rapid Miner", 2015 International Conference on
Computational Intelligence and Communication Networks, 978-1-5090-0076-0/15
$31.00 © 2015 IEEE DOI 10.1109/CICN.2015.137.
[10] Smeureanu, I., Bucur, C., (2012), "Applying Supervised Opinion Mining
Techniques on Online User Reviews", InformaticaEconomică vol. 16, no. 2/2012.
[11] Kaur, R., Gupta, G., Singh, G., (2017), “Sentiment Analysis and its Challenges”,
International Journal of Engineering Research in Computer Science & Engineering,
Vol. 4, No. 2, pp. 97-102.
[12] Ghai, A.S., Gupta, G., Bhathal, G.S., (2017), “Survey on the effects of Sports on
Vocational Academics”, International Journal for Multi Disciplinary Engineering &
Business Management, Vol. 5, No. 3, pp. 7-9.
[13] Sachdeva, A., Gupta, G., Bhathal, G.S., (2017), “Review of Data Mining in
Contrast to Modern Medical Equipments”, International Journal for Multi
Disciplinary Engineering & Business Management, Vol. 5, No. 3, pp. 18-20.
[14] Kaur, H., Gupta, G., Attwal, K.P.S., (2017), “Review of Electronic Library using
Data Mining”, International Journal for Multi Disciplinary Engineering & Business
Management, Vol. 5, No. 3, pp. 10-13.
[15] Kaur, H., Gupta, G., Attwal, K.P.S., (2017), “Review on Therapies and Medical
Treatment using Data Mining”, International Journal for Multi Disciplinary
Engineering & Business Management, Vol. 5, No. 3, pp. 14-17.
[16] Kaur, K., Bhathal, G.S., Gupta, G., (2017), “An Analytic and Comparative Study
of Map Reduce – A Systematic Review”, International Journal of Advanced Research
in Computer Science, Vol. 8, No. 5, pp. 2453-2459.
[17] Kaur, J., Bhathal, G.S., Gupta, G., (2017), “Cloud Computing: Types,
Topologies, Virtual Machines and VM Migration for Decrease in Power
Consumption”, International Journal of Advanced Research in Computer Science,
Vol. 8, No. 5, pp. 2357-2361.
[18] Kaur, D., Bhathal, G.S., Gupta, G., (2017), “Analysis of DDOS attacks in Cloud
Networks”, Asian Journal of Computer Science and Information Technology, Vol. 9,
No. 14, pp. 9-14.
[19] Kaur, A., Gupta, G., Singh, G., (2017), “Role of Virtualization in Cloud
Computing”, Global Journal of Engineering Science & Research, Vol. 4, No. 7, pp.
142-149.
[20] Gupta, G., Aggarwal, H and Rani, R. (2015), “Mining the Customers Data for
making Segments based on RFM Analysis in Building Successful and Profitable
CRM”, Ciência e TécnicaVitivinícola Journal, Vol. 30, No. 3, pp. 261-270.
[21] Gupta, G. & Aggarwal, H. (2012), “Improving Customer Relationship

Management Using Data Mining”, International Journal of Machine Learning and
Computing (IJMLC), Vol. 2, No. 6, pp. 874-877.
[22] Gupta, G. & Aggarwal, H. (2016), “Analyzing Customer Responses to Migrate

Strategies in Making Retailing and CRM Effective”, Int. J. Indian Culture and
Business Management (IJICBM). Vol. 12, No. 1, pp. 92-127.
[23] Gupta, G., Aggarwal, H. & Rani, R. (2016), “Segmentation of Retail Customers
based on Cluster Analysis in Building Successful CRM”, Int. J. of Business
Information Systems (IJBIS). Vol. 23, No. 2, pp. 212-228.
[24] Gupta, G. & Kahlon, J.S. (2016), “Predicting The Cause Of Absenteeism Among
Public Versus Private College Students Using Data Mining”, International Journal for
Multi Disciplinary Engineering and Business Management (IJMDEBM). Vol. 4, No.
4, pp. 1-5.
[25] Kaur, S. & Gupta, G. (2016), “Data Mining Approach To Crm In Banking
Sector”, International Journal for Multi Disciplinary Engineering and Business
Management (IJMDEBM). Vol. 4, No. 2, pp. 81-89.
[26] Kaur, N. & Gupta, G. (2016), “Effectiveness Of Crm In Retail Sector Using Data
Mining”, International Journal for Multi Disciplinary Engineering and Business
[27] Kaur, K. & Gupta, G. (2015), “Predicting The Use Of Internet Among Teachers
And Students Using Data Mining”, International Journal for Multi Disciplinary
Engineering and Business Management (IJMDEBM). Vol. 3, No. 3, pp. 97-106.
[28] Kaur, N.K. & Gupta, G. (2015), “Predicting The Various Risk Factors Of
Leprosy Using Data Mining Techniques”, International Journal for Multi Disciplinary
Engineering and Business Management (IJMDEBM). Vol. 3, No. 3, pp. 79-87.
[29] Kaur, H. & Gupta, G. (2014), “Optimizing Cart Algorithm by Enhancing
Attributes”, International Journal for Multi Disciplinary Engineering and
[30] Kaur, J. & Gupta, G. (2014), “Hybrid of K-Means and Hierarchal Algorithms to
Optimize Clustering”, International Journal for Multi Disciplinary Engineering and
Business Management (IJMDEBM). Vol. 2, No. 3, pp. 39-44.