Twitter Policing
CSUSB ScholarWorks
12-2023
TWITTER POLICING
Hemanth Kumar Medisetty
Recommended Citation
Medisetty, Hemanth Kumar, "TWITTER POLICING" (2023). Electronic Theses, Projects, and Dissertations.
1815.
https://scholarworks.lib.csusb.edu/etd/1815
This Project is brought to you for free and open access by the Office of Graduate Studies at CSUSB ScholarWorks.
It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator
of CSUSB ScholarWorks. For more information, please contact scholarworks@csusb.edu.
TWITTER POLICING
A Project
Presented to the
Faculty of
California State University,
San Bernardino

In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
Computer Science
by
Hemanth Kumar Medisetty
December 2023
actively interact with the public. Social media offers an opportunity to share information between police departments and the communities they serve. In this context, sentiment analysis of social media data has become a tool for identifying public sentiments and interactions, using data obtained from Twitter (X). Initially, the project gathers social media data from mentioned accounts on Twitter and classifies media posts into three distinct sentiment categories, positive, negative, and neutral, on a monthly basis. The project utilizes data visualization techniques such as pie charts, line charts, and clustered column charts to represent sentiment analysis data in visually appealing ways that highlight the distribution and trends of public opinion.
ACKNOWLEDGEMENTS
Dr. Yan Zhang, who has served as the chair of my committee. Her unwavering
invaluable. I would also like to extend my gratitude to Dr. Qingquan Sun and Dr.
Yunfei Hou members of my committee for their belief, in my abilities and their
to the faculty at the university who have provided me with guidance and
mentorship that has played a crucial role in shaping my academic journey and
has supported my aspirations and also equipped me with the necessary skills for
a successful future.
education is a treasure that can never be taken away from us. My commitment
and determination are evident in this pursuit as it reflects both resilience and an
DEDICATION

TABLE OF CONTENTS

ACKNOWLEDGEMENTS .................................................................................... iv
Purpose ..................................................................................................... 1
Python ............................................................................................ 6
Scweet ............................................................................................ 6
TextBlob .......................................................................................... 7
Pandas .......................................................................................... 8
REFERENCES ................................................................................................... 54
LIST OF TABLES
LIST OF FIGURES
Figure 10: Sentiment Analysis Comparison between TextBlob and NLTK ......... 27
Figure 17: Clustered Column for Monthly Sentiment Analysis using TextBlob ... 34
Figure 18: Clustered Column for Monthly Sentiment Analysis using NLTK ........ 35
CHAPTER ONE
INTRODUCTION
In the digital age, social media has become a major platform for
communication and engagement between the Police Department and the public.
This project analyzes a mentioned account from Twitter, offering valuable
insights into public sentiments, concerns, and trends. Advanced Natural
Language Processing (NLP) techniques, implemented with two tools, TextBlob
and the Natural Language Toolkit (NLTK), classify social media posts into
positive, negative, and neutral sentiments. Beyond sentiment analysis, the
project keeps track of the evolution of public opinion over time using data
visualization techniques like pie charts, line charts, and custom clustered
column charts. These visuals effectively convey sentiment trends. In an era
where transparency and community involvement are crucial, this project uses
social media and sentiment analysis to bridge the gap between police
departments and the public.
Purpose
The main purpose of the project is to analyze public sentiments towards the
police department using tweets containing relevant hashtags, keywords, and
mentioned accounts related to the police department. This project works on
tweets that are related to the police department's mentioned account. Each
tweet is analyzed using TextBlob and NLTK to determine its sentiment. The
results can be used to identify areas where the police department may need to
improve public relations and to track public opinion over time. This project
will provide valuable insights into how the police department is perceived by
the public and help the department improve its image and reputation by
addressing areas of public concern. Sentiment analysis combined with social
media monitoring makes it possible to determine how interested the target
audience is in emerging trends.
Project Milestones
This project shows comparison results of two sentiment analysis methods,
TextBlob and NLTK. The process starts with collecting real data from Twitter,
pre-processing the data, and ensuring the processed data is suitable for
analysis. After analysis, the data is used to generate charts that show how
sentiments change monthly.
1. Data Collection and Preprocessing:
a. At this stage, the project gathers a set of tweets from the Twitter platform.
4. Data Visualization:
a. The main goal of this stage is to create charts that improve our
b. Pie charts, line charts, and clustered column charts are used to visualize the results. These visuals make it easy for us to see what people are feeling.
CHAPTER TWO
To complete the project, several libraries, modules, and software tools have
been used. With the libraries Scweet and Selenium, the data has been extracted
and processed. With the libraries TextBlob and NLTK, sentiments have been
classified accordingly.
Visual Studio Code:
Visual Studio Code, also known as VS Code, is one of the most widely
used code editors because of its versatility and developer-friendly design. In
this project, version 1.83.1 is used. VS Code is cross-platform and can run on
macOS, Linux, and Windows. It supports a wide range of extensions for
developers. It even allows developers to run command-line tools and scripts
directly within VS Code, eliminating the need to switch between a terminal and
the editor.
Python:
Python is a high-level programming language with a readable syntax.
Python 3.10.9 is the version used in this project. Python's braces-free,
indentation-based structure makes code easier to read and reduces errors. It
can be used for a variety of tasks such as data analysis, and it is open to
everyone for usage and contribution towards its progress. The comprehensive
Python standard library includes functions and modules that cater to an array
of tasks.
Scweet:
Scweet is a Python library that scrapes tweets based on specific search
parameters, offering users access to tweet text, user details, and related
data. With Scweet, users have the flexibility to define their search terms,
hashtags, user mentions, and date ranges to gather tweets. In this project,
tweets are conveniently scraped using the mentioned accounts of the police
department. Moreover, Scweet supports Twitter data collection without the
official API, making it well suited for this project.
Selenium WebDriver:
Selenium WebDriver automates browsers and is compatible with browsers
such as Firefox, Edge, Safari, and Chrome. It supports languages like Java,
JavaScript, Python, Ruby, and C#, which makes it versatile for developers and
QA engineers. With Selenium WebDriver, web applications can be tested by
emulating user actions like clicking buttons, filling forms, and navigating
pages.
TextBlob:
TextBlob is a Python library for processing textual data. Through
sentiment analysis, TextBlob can determine whether text data holds a positive,
negative, or neutral tone. One of the benefits of using TextBlob is its ability
to identify the part of speech for each word. For sentiment analysis, it
provides two measures, subjectivity and polarity. Polarity measures the tone of
the text on a scale from -1 (most negative) to 1 (most positive), while
subjectivity ranges from 0 (objective) to 1 (subjective).
NLTK:
NLTK is a Python library widely used for language processing and text
analysis. Sentiment analysis in NLTK is performed here with the VADER
Sentiment Intensity Analyzer module. This module uses a lexicon of words and
their associated sentiment scores. By analyzing the words and their scores in
the lexicon, the Sentiment Intensity Analyzer produces scores reflecting a
text's positivity, negativity, or neutrality.
Pandas:
Pandas is a powerful, widely used open-source library for data
manipulation and analysis. Pandas provides two main data structures, Series
and DataFrames. A Series is a one-dimensional labeled array, while a DataFrame
is a two-dimensional table with labeled rows and columns. These structures
make it simple to represent and work with data. DataFrames were used in this
project. Pandas supports both import and export of data, to and from various
file formats like CSV, Excel, SQL databases, and JSON. It effortlessly
interfaces with data visualization frameworks like Matplotlib and Seaborn,
enabling users to produce helpful visualizations. The Matplotlib library has
been used in this project to visualize the datasets.
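A small sketch of the grouping pattern used later for charting, with hypothetical rows standing in for the real tweet dataset:

```python
import pandas as pd

# A tiny DataFrame mimicking the project's tweet dataset (hypothetical rows).
df = pd.DataFrame({
    "Month": ["January", "January", "February"],
    "TextBlob_Sentiment": ["Positive", "Negative", "Positive"],
})

# Group tweets by month and sentiment, as done before charting.
counts = df.groupby(["Month", "TextBlob_Sentiment"]).size().unstack(fill_value=0)
print(counts)
```

`unstack(fill_value=0)` turns the grouped counts into a month-by-sentiment table, filling missing combinations with zero.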
Dataset Information
The dataset is collected from Twitter in real time. In the code, scrape
is a function from the Python library Scweet which was customized according to
the project. By using this function, the code scrapes the tweets from the
mentioned Twitter account. Below are the columns in the dataset:
1. UserScreenName: The Twitter user’s screen name who posted the tweet.
7. Image link: The link to any images or media associated with the tweet.
8. Month: The full name of the month in which the tweet was posted
10. Polarity: A measure of the tweet's text indicating its sentiment score.
13. NLTK_Sentiment_Scores: The sentiment scores from the NLTK library, including compound, neg, neu, and pos scores.
14. NLTK_Analysis: The sentiment analysis results from NLTK, classifying the tweet as positive, negative, or neutral.
Data Preprocessing
Data preprocessing prepares the collected tweets for further analysis
and classification of sentiments. All the unwanted content in the text which
is not useful for the analysis gets cleaned. Along with the cleaning of text,
there is one more important part: separating the tweets according to the month
they were posted.
1. Cleaning Text Tweets: In this step, the code cleans unwanted details from
the tweet's textual content, making the text more suitable for sentiment
analysis. A dedicated method is used for cleaning the text tweets. It checks
whether the input text is a string or not. For sentiment analysis, this
cleaning method ensures that the text data is consistent and free of noise.
Figure 1: Method to Clean the Text.
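The exact patterns removed by the project's method appear in the appendix; the sketch below illustrates the general idea with assumed rules (URL, mention, and hashtag-symbol removal) that go beyond the snippet in Figure 1:

```python
import re

def clean_text(text):
    """Remove URLs, @mentions, the '#' symbol, and extra whitespace (a sketch)."""
    if not isinstance(text, str):
        return text
    text = re.sub(r"http\S+", "", text)       # drop URLs
    text = re.sub(r"@\w+", "", text)          # drop @mentions
    text = re.sub(r"#", "", text)             # keep hashtag words, drop the symbol
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

print(clean_text("Thanks @SeattlePD! Details: https://example.com #safety"))
```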
2. Separating Tweets by Month: Tweet timestamps are processed by the code in
this stage, which creates datetime objects from the timestamp strings. This
allows all of the tweets to be segregated by a month column.
a. Validation of timestamps: The code verifies that the collected strings are
valid timestamps before parsing.
b. Conversions to datetime objects: Timestamp strings that pass the validation
are converted into datetime objects. The code extracts the full name of the
month from the date and stores it in the Month column.
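The conversion can be sketched with pandas; the timestamp strings below are hypothetical:

```python
import pandas as pd

# Parse timestamp strings into datetime objects, then extract the month name.
timestamps = pd.Series(["2023-06-03 14:22:00", "2023-08-15 09:10:00"])
months = pd.to_datetime(timestamps).dt.strftime("%B")
print(months.tolist())
```

`strftime("%B")` yields the full English month name, matching the Month column described above.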
Figure 3: State Diagram.
Figure 4: Class Diagram.
Figure 5: Use Case Diagram.
Figure 6: Sequence Diagram.
CHAPTER THREE
METHODOLOGIES
The scrape method is called from Scweet in the main code, and a few methods
from utils.py are called within Scweet to collect the data. In accordance with
the parameters:
1. Browser Setup:
A. The code supports the Chrome browser and uses Selenium for web
scraping.
2. Data Extraction:
data.
3. Web Scraping:
A. The code opens a Twitter search page for a specific query (defined
4. Data Extraction:
tweet card.
C. Promoted tweets are also handled to exclude them from the data.
5. Data Processing:
of data.
B. Images associated with the tweets are optionally saved if specified
in the parameters.
6. Data Storage:
7. Scrolling and Pagination: The code keeps scrolling through the Twitter
page to load more tweets, using elements such as links and XPath to get the
required information from the posts.
TextBlob Method Analysis
This section offers an actual illustration of TextBlob's use in this project.
The following steps demonstrate how TextBlob is used for particular tasks in
this project.
1. getSubjectivity Function:
A. The function checks whether the input 'text' is a string.
B. If 'text' is a string, the TextBlob library is used to analyze the text
and extract the subjectivity score with the help of
'TextBlob(text).sentiment.subjectivity'.
2. getPolarity Function:
A. The function checks whether the input is a string before analysis.
B. The polarity describes the tone of the text, which can be either positive
or negative, and is extracted with 'TextBlob(text).sentiment.polarity'.
3. Applying the Functions:
A. Using the '.apply' method, the code applies these functions to the 'Text'
column.
B. The polarity scores are kept in the 'Polarity' column, and the
subjectivity scores in the 'Subjectivity' column.
4. Sentiment Analysis:
I. If the polarity score < 0, the tweet is categorized as 'Negative'.
A. To save the sentiment labels, the code adds a new column called
'TextBlob_Sentiment'.
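The classification rule described above can be sketched as follows, using hypothetical precomputed polarity values in place of TextBlob output:

```python
import pandas as pd

def get_analysis(polarity):
    # Negative below 0, Neutral at exactly 0, Positive above 0.
    if polarity < 0:
        return "Negative"
    elif polarity == 0:
        return "Neutral"
    else:
        return "Positive"

# Hypothetical polarity scores standing in for TextBlob results.
df = pd.DataFrame({"Polarity": [-0.4, 0.0, 0.6]})
df["TextBlob_Sentiment"] = df["Polarity"].apply(get_analysis)
print(df["TextBlob_Sentiment"].tolist())
```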
NLTK Method Analysis
This method is used to perform sentiment analysis on text data using the
NLTK library's Sentiment Intensity Analyzer module.
1. sentiment_scores Function:
A. The function takes text as input and returns the sentiment scores; the
'compound' score is then taken from the resulting dictionary.
B. The 'get_compound' function is applied to each row of the
'NLTK_Sentiment_Scores' column, and the result is stored in a new column
called 'Compound_Value'.
C. Each compound value is classified with the 'get_NLTK_Analysis(value)'
function, and the resulting labels are stored in the 'NLTK_Analysis' column of
the DataFrame.
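The compound-score thresholds (plus or minus 0.05, as noted in the conclusion) can be sketched as:

```python
def get_nltk_analysis(compound):
    # VADER convention: below -0.05 is negative, above 0.05 is positive,
    # and anything in between is neutral.
    if compound < -0.05:
        return "Negative"
    elif compound > 0.05:
        return "Positive"
    else:
        return "Neutral"

# Hypothetical compound scores standing in for NLTK results.
print([get_nltk_analysis(c) for c in (-0.6, 0.0, 0.7)])
```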
CHAPTER FOUR
RESULTS
The two tools differ in the threshold which separates the sentiment
categories. This minor difference has a big effect on outcomes, with NLTK
categorizing more comments as negative than TextBlob. When comparing the two,
NLTK shows a greater tendency than TextBlob to classify feelings as negative.
It is crucial to remember that this does not necessarily mean that NLTK is
always preferable to TextBlob. The choice between the two depends on the
specific requirements of the user, the desired results, and the particular use
case. In this project, sentiment analysis was conducted on the Twitter account
of SeattlePD. The results are presented below in tabular form, indicating the
counts of positive, negative, and neutral sentiments.
Sentiment Analysis SeattlePD
With the help of visualization, the data can be simplified, and it helps one
to understand current trends and patterns. The analysis is shown in three
forms: a Sentiment Analysis Comparison using a pie chart, which gives an
overview of the distribution; a track of monthly changes using line charts,
which show individual trends; and a Monthly Sentiment Analysis using clustered
column charts, which compare the categories side by side.
The below chart visually compares sentiment analysis results for the
Police Department's tweets using TextBlob and NLTK. It facilitates a quick
understanding of the sentiment distribution and allows for insights into
public sentiment regarding the Police Department on Twitter. The pie chart
shows the percentage and the number of each sentiment for both the TextBlob
and NLTK analyses.
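A minimal sketch of such a side-by-side pie chart, using hypothetical counts and the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sentiment counts for TextBlob and NLTK.
labels = ["Positive", "Negative", "Neutral"]
textblob_counts = [120, 45, 85]
nltk_counts = [95, 80, 75]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
explode = (0.1, 0, 0)  # pull out the Positive slice for emphasis
ax1.pie(textblob_counts, labels=labels, explode=explode, autopct="%1.1f%%")
ax1.set_title("TextBlob")
ax2.pie(nltk_counts, labels=labels, explode=explode, autopct="%1.1f%%")
ax2.set_title("NLTK")
plt.tight_layout()
fig.savefig("sentiment_pies.png")
```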
The line charts show monthly sentiment analysis results for three categories,
Positive, Negative, and Neutral, using TextBlob. The code customizes the color
palette for each sentiment category and groups the data by month and
sentiment, which are placed on the X-axis and Y-axis. The charts provide a
clear view of sentiment trends over time, enabling insights into how public
sentiment towards the Police Department on Twitter varies by month and
sentiment category. Additionally, the NLTK sentiment analysis uses the same
charting approach.
Figure 12: Neutral - Monthly Sentiment Analysis using TextBlob.
Figure 13: Negative - Monthly Sentiment Analysis using TextBlob.
Figure 14: Neutral - Monthly Sentiment Analysis using NLTK.
Figure 15: Negative - Monthly Sentiment Analysis using NLTK.
Figure 16: Positive - Monthly Sentiment Analysis using NLTK.
The clustered column charts group the three categories, Positive, Neutral,
and Negative. The graph represents sentiment analysis results per month,
allowing for easy comparison of sentiment distribution over time. The code
ensures that the graph is visually appealing and informative.
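A minimal sketch of the clustered column chart, using hypothetical monthly counts in place of the project's grouped data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly counts per sentiment category.
grouped = pd.DataFrame(
    {"Positive": [30, 42], "Neutral": [25, 20], "Negative": [12, 18]},
    index=["June", "July"],
)

# pandas draws one group of bars per month, one bar per sentiment.
ax = grouped.plot(kind="bar", figsize=(10, 6))
ax.set_xlabel("Month")
ax.set_ylabel("Count")
ax.set_title("Monthly Sentiment Analysis (sketch)")
plt.tight_layout()
plt.savefig("monthly_sentiment.png")
```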
Figure 17: Clustered Column for Monthly Sentiment Analysis using TextBlob.
Figure 18: Clustered Column for Monthly Sentiment Analysis using NLTK.
CHAPTER FIVE
CONCLUSION
This project aimed to bring police and communities closer by using the power
of social media and sentiment analysis. NLTK and TextBlob are the two tools
used, and they showed some differences in how they work. With a compound
threshold of 0.05, NLTK tends to find more negative comments, while TextBlob
shows a more even distribution of positive and neutral sentiments. Neither
tool is better overall; the choice depends on the user's specific needs. We
looked at how people feel about the Seattle Police on Twitter and used charts
to show how these feelings change each month, which helps us understand what
the community thinks.
Major Contribution
With a Twitter API Basic account, at $100 USD per month, users can
retrieve up to 10,000 tweets per month. The Twitter API Pro account, on the
other hand, allows for a maximum of 1,000,000 tweets and costs $5,000 USD per
month. In contrast, our code keeps retrieving data until the server connection
is lost. Even though it might take some time to compile a sizable number of
tweets, this approach avoids the associated cost. By executing the code and
allowing it to operate for an extended period, a significant volume of data
can be collected, and that data can be used for analysis.
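The per-tweet cost under each paid tier works out as follows:

```python
# Cost per tweet under the two Twitter API tiers described above.
basic_cost, basic_tweets = 100, 10_000    # Basic: $100/month, 10,000 tweets
pro_cost, pro_tweets = 5_000, 1_000_000   # Pro: $5,000/month, 1,000,000 tweets

basic_per_tweet = basic_cost / basic_tweets  # one cent per tweet
pro_per_tweet = pro_cost / pro_tweets        # half a cent per tweet
print(basic_per_tweet, pro_per_tweet)
```

Scraping avoids both costs entirely, at the price of slower collection.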
Future Work
In the future, if the budget permits, we can consider using Twitter's API
with a developer account for this project. This involves engaging in
discussions with the community directly through Twitter. We can also make the
project more user-friendly by creating front-end interfaces and dashboards.
This will help us better connect with our community.
APPENDIX A:
The code in the below snippet is used in the main program to collect the
dataset in real time. After data collection, the code cleans unwanted
information from tweets which is not useful for analysis. Subjectivity and
Polarity values are computed using TextBlob to classify sentiment. NLTK is
able to produce sentiment scores, among which the compound score is used for
classification.
SOURCE CODE:
import os
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from textblob import TextBlob
import matplotlib.pyplot as plt
####################### Clearing the Screen #######################
until="2023-08-01", interval= 5,
headless=False,
lang="en")
import re
def cleanText(text):
text = re.sub(r'\.+', '', text) # Escape the dot to match a literal dot
return text
Complete_Data['Text'] = Complete_Data['Text'].apply(cleanText)
####################### SUBJECTIVITY AND POLARITY ##############
def getSubjectivity(text):
    if isinstance(text, str):
        return TextBlob(text).sentiment.subjectivity
    else:
        return None  # non-string values (e.g. NaN) yield no score

def getPolarity(text):
    if isinstance(text, str):
        return TextBlob(text).sentiment.polarity
    else:
        return None
Complete_Data['Subjectivity'] = Complete_Data['Text'].apply(getSubjectivity)
Complete_Data['Polarity'] = Complete_Data['Text'].apply(getPolarity)
def getAnalysis(value):
    if value < 0:
        return 'Negative'
    elif value == 0:
        return 'Neutral'
    else:
        return 'Positive'
Complete_Data['TextBlob_Sentiment'] = Complete_Data['Polarity'].apply(getAnalysis)
sia = SentimentIntensityAnalyzer()
def sentiment_scores(text):
    if isinstance(text, str):
        return sia.polarity_scores(text)
    else:
        return None
Complete_Data['NLTK_Sentiment_Scores'] = Complete_Data['Text'].apply(sentiment_scores)
def get_compound(text):
    return text['compound']
Complete_Data['Compound_Value'] = Complete_Data['NLTK_Sentiment_Scores'].apply(get_compound)
def get_NLTK_Analysis(value):
    if value < -0.05:
        return 'Negative'
    elif value > 0.05:
        return 'Positive'
    else:
        return 'Neutral'

Complete_Data['NLTK_Analysis'] = Complete_Data['Compound_Value'].apply(get_NLTK_Analysis)
TextBlob_sentiment_counts = Complete_Data['TextBlob_Sentiment'].value_counts()
NLTK_sentiment_counts = Complete_Data['NLTK_Analysis'].value_counts()
TextBlob_sentiment_counts = TextBlob_sentiment_counts.reindex(index=['Positive', 'Negative', 'Neutral'])
NLTK_sentiment_counts = NLTK_sentiment_counts.reindex(index=['Positive', 'Negative', 'Neutral'])
# Create a figure with a single pie chart
ax1.labels = TextBlob_sentiment_counts.index
ax2.labels = NLTK_sentiment_counts.index
ax1.sizes = TextBlob_sentiment_counts.values
ax2.sizes = NLTK_sentiment_counts.values
# Explode the "Positive" section (you can adjust this for emphasis)
explode = (0.1, 0, 0)
ax1.sizes]
44
ax2.percentages = [f'{count} ({count / sum(ax2.sizes) * 100:.1f}%)' for count in ax2.sizes]
# Create the pie chart with labels, values, colors, and explosion
# Add a title
fontsize=16,fontweight='bold')
# Equal aspect ratio ensures the pie is drawn as a circle.
plt.tight_layout()
# Show the pie chart
plt.show()
# Now, let's create separate line charts for each sentiment category (Positive, Negative, Neutral).
grouped_data = Complete_Data.groupby(['Month', 'TextBlob_Sentiment']).size().unstack(fill_value=0)
available_months = Complete_Data['Month'].unique()
"%B"))
# Create separate line charts for each Text Blob sentiment category
plt.figure(figsize=(10, 6))
sentiment_data = grouped_data.loc[custom_order][sentiment]
x_ticks = range(len(custom_order))
x_labels = custom_order
linewidth=2, markersize=8)
ax.set_xticks(x_ticks)
plt.xlabel('Month')
plt.ylabel('Count')
ax.set_facecolor('#f0f0f0')
ax.set_axisbelow(True)
plt.title(f'TextBlob | {sentiment} | Sentiment Analysis per Month', fontsize=16, fontweight='bold', pad=20)
plt.show()
# plt.figure(figsize=(10, 6))
ax = grouped_data.loc[custom_order].plot(kind='bar')
plt.xlabel('Month')
plt.ylabel('Count')
fancybox=True)
for p in ax.patches:
# Add a shadow effect
ax.set_facecolor('#f0f0f0')
ax.set_axisbelow(True)
ax.set_facecolor('#f0f0f0')
pad=20)
plt.show()
####################### NLTK LINE CHARTS #######################
# Now, let's create separate line charts for each sentiment category (Positive, Negative, Neutral).
grouped_data = Complete_Data.groupby(['Month', 'NLTK_Analysis']).size().unstack(fill_value=0)
available_months = Complete_Data['Month'].unique()
"%B"))
# Create separate line charts for each NLTK sentiment category
plt.figure(figsize=(10, 6))
sentiment_data = grouped_data.loc[custom_order][sentiment]
x_ticks = range(len(custom_order))
x_labels = custom_order
linewidth=2, markersize=8)
ax.set_xticks(x_ticks)
plt.xlabel('Month')
plt.ylabel('Count')
ax.set_facecolor('#f0f0f0')
ax.set_axisbelow(True)
fontweight='bold', pad=20)
plt.show()
# plt.figure(figsize=(10, 6))
ax = grouped_data.loc[custom_order].plot(kind='bar')
# Set labels and title
plt.xlabel('Month')
plt.ylabel('Count')
fancybox=True)
for p in ax.patches:
ax.set_facecolor('#f0f0f0')
ax.set_axisbelow(True)
ax.set_facecolor('#f0f0f0')
pad=20)
plt.show()
REFERENCES
https://arxiv.org/abs/1803.09875
10.23956/ijermt.v6i12.32.
[3] “Datetime - Basic date and time types," Python Documentation, Available:
September, 2023].
[5] E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in sentiment
analysis,” Procedia Computer Science, vol. 17, pp. 26–32, Jan. 2013, doi:
10.1016/j.procs.2013.05.005.
[6] I.V. Shravan, "Sentiment Analysis in Python using NLTK," ResearchGate,
Available: https://www.researchgate.net/profile/Shravan-
Iv/publication/312176414_Sentiment_Analysis_in_Python_using_NLTK/links/587
4d26908ae8fce4927e011/Sentiment-Analysis-in-Python-using-NLTK.pdf.
[Accessed: June 2023].
[7] J. Yao, “Automated Sentiment Analysis of Text Data with NLTK,” Journal of
6596/1187/5/052020.
[12] “Pandas - Python Data Analysis Library," Pandas, Available:
[13] S. Elbagir and J. Yang, “Analysis using Natural Language Toolkit and
Scientists, IMECS 2019, The Royal Garden Hotel, Kowloon, Hong Kong, March
13-15, 2019.
[16] S. V. Pandey and A. V. Deorankar, "A Study of Sentiment Analysis Task and
doi: 10.1109/ICECCT.2019.8869160.
in Computing, Communication & Materials (ICACCM), Dehradun, India, 2020, pp.
2023].
applications: A survey,” Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–
[21] Y.A. Jeddi, "Scweet - Collect and Analyze Tweets on Twitter," GitHub,