Twitter Spam Detection Based On Deep Learning: Tingmin Wu, Shigang Liu, Jun Zhang and Yang Xiang
Figure 2: New Twitter classification workflow based on deep learning (pipeline: Feature Extraction (Vectors) with WordVector -> Learning Algorithm -> High-Dimension Vector Feature -> Classification)

Figure 3: The procedure of learning document vector, where N represents the number of the words in a document.
and connectivity between followees and followers. Yang, et al. [33] constructed a social graph according to the local clustering coefficient, betweenness centrality and bidirectional links ratio. Based on the social graphs, spamming accounts are detected by analysing mathematical features of the graph. The features used in this method were proved to be more robust than those of existing algorithms. However, considering the time cost of data collection, this method is too complex to be used in the real world.

2.3 Blacklist Techniques
Blacklist techniques are commonly deployed in web filtering services such as Twitter spam detection; they block malicious websites according to information analysis such as user feedback and website crawling. Ma, et al. [21] presented a lightweight blacklisting approach with lower cost than existing classifiers. Oliver, et al. [23] detected baleful URLs using a blacklisting technique integrated in a so-called Web Reputation Technology. However, this method has to rely on manual labelling, which is too time-consuming.
In a nutshell, current spam detection methods on Twitter are still not sufficient to detect spamming activities quickly and accurately in terms of Recall, Precision and F1-measure. Achieving lower time consumption and better performance is the motivation of our work.

3. DEEP LEARNING BASED CLASSIFIER
This section describes a new Twitter spam detection technique, including vector-based characteristics training by WordVector techniques and binary classifier building using multiple machine learning algorithms. Figure 2 shows the workflow of distinguishing Twitter spam through our new method.

3.1 Deep Learning Primer
Given the limited ability of conventional machine learning algorithms to process raw strings in Natural Language Processing (NLP), deep learning was developed to understand and analyse text using a deep neural network with multiple layers [15]. Through the network, each output of the previous layer becomes the input of the next level. In particular, deep learning neural language techniques have strong language-analysis ability, with distributed vectors trained under the WordVector method [15]. Text-based vector representations for words are applied widely in systems of linguistic analysis [8, 28].
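To make the layer-to-layer data flow concrete, the following is a minimal NumPy sketch of a feed-forward pass, not the implementation used in this paper; the layer sizes and the ReLU activation are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # Element-wise non-linearity applied after each affine layer.
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Feed-forward pass: the output of each layer becomes the
    input of the next one, as described in Section 3.1."""
    h = x
    for W, b in zip(weights, biases):
        h = relu(h @ W + b)
    return h

# Illustrative 3-layer network: 200-dim input -> 64 -> 32 -> 2.
rng = np.random.default_rng(0)
sizes = [200, 64, 32, 2]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(forward(rng.normal(size=200), weights, biases).shape)  # (2,)
```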
3.2 Detection Framework
Unlike conventional detection, we extract attributes from the content of tweets using Word2Vec, instead of manual feature collection and generation.
First of all, we apply Word2Vec to map each word in the whole dataset into a corresponding multidimensional vector. It employs a two-level neural network, where the Huffman technique is used as hierarchical softmax to allocate codes to frequent words [22]. This improves the efficiency of training the model, since high-frequency words can be processed fast [14]. Applying this technique, the word vector-based representation is trained through stochastic gradient descent, and the gradient is obtained by backpropagation. What's more, optimal vectors are obtained for each word by CBOW or Skip-gram [22].
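As a concrete illustration of this step, the sketch below trains word vectors with gensim's Word2Vec, enabling hierarchical softmax (hs=1) and selecting CBOW or Skip-gram through the sg flag; the toy corpus and every hyperparameter value are illustrative assumptions, not the settings used in our experiments.

```python
from gensim.models import Word2Vec

# Toy corpus of tokenised tweets (illustrative only).
tweets = [
    ["free", "iphone", "click", "this", "link"],
    ["great", "talk", "at", "the", "conference", "today"],
    ["win", "cash", "now", "click", "here"],
]

model = Word2Vec(
    sentences=tweets,
    vector_size=200,  # dimensionality M of the word vectors
    window=5,
    min_count=1,
    sg=0,             # 0 = CBOW, 1 = Skip-gram
    hs=1,             # hierarchical softmax over a Huffman-coded tree
    epochs=50,
)

print(model.wv["click"].shape)  # (200,)
```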
Furthermore, the Doc2Vec training model is used to assign one vector representing every tweet using Paragraph Vector modelling [14]. Based on Word2Vec, a tweet-length document vector is trained from the combination of the word vectors and a unique document vector per record. By repeating this procedure, the optimal document-based vector of each tweet can be learned (as shown in Figure 3).
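In the same spirit, the following is a minimal gensim Doc2Vec (Paragraph Vector) sketch that assigns one vector per tweet; the PV-DM mode, the toy corpus and the hyperparameters are illustrative assumptions.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tweets = [
    ["free", "iphone", "click", "this", "link"],
    ["great", "talk", "at", "the", "conference", "today"],
    ["win", "cash", "now", "click", "here"],
]

# Each tweet becomes one tagged document, so it receives its own vector.
docs = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(tweets)]

model = Doc2Vec(
    documents=docs,
    vector_size=200,  # dimensionality M of the document vector
    min_count=1,
    dm=1,             # PV-DM: combines word vectors with the document vector
    hs=1,
    epochs=100,
)

doc_vec = model.dv[0]                                 # vector of the first tweet
new_vec = model.infer_vector(["click", "to", "win"])  # vector for unseen text
print(doc_vec.shape, new_vec.shape)  # (200,) (200,)
```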
After the high-dimensional document vectors are learned, they are treated as the input features of several machine learning techniques, such as the Random Forest or a Neural Network, along with the spam/non-spam label. The document representation \vec{D} can be defined as

\vec{D} = \{d_1, d_2, \ldots, d_M\},

where M is the dimension of the document vector and d_i is the value at each dimension. By adding the binary label variable, a tweet can be indicated as

\vec{t} = (\vec{D}, label),

where \vec{t} represents the concatenated vector and label is the tweet flag of spam or non-spam. Thus, the training dataset T is expressed as

T = (\vec{t}_1, \vec{t}_2, \ldots, \vec{t}_N),

where N is the number of tweets in the training set.
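To show how the learned document vectors and labels form the training set T and feed a classifier, here is a self-contained scikit-learn sketch; the toy tweets, vector dimensions and classifier settings are illustrative assumptions, not the configuration used in the experiments.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy labelled tweets: 1 = spam, 0 = non-spam (illustrative only).
tweets = [["free", "iphone", "click", "here"],
          ["nice", "weather", "in", "melbourne", "today"]] * 50
labels = [1, 0] * 50

# Learn one document vector D per tweet (Section 3.2).
docs = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(tweets)]
d2v = Doc2Vec(documents=docs, vector_size=50, min_count=1, epochs=40)

# Each training record is t = (D, label); stacking them gives T.
X = np.array([d2v.dv[i] for i in range(len(tweets))])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

for clf in (RandomForestClassifier(n_estimators=100, random_state=0),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                          random_state=0)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, "accuracy:", clf.score(X_test, y_test))
```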
Table 1: The List of Methods for Comparison

Text-based using Deep Learning (Internal):
  Random Forest: classification applying the Random Forest algorithm [5] to process the word representation trained by the WordVector technique.
  Neural Network (MLP): detection method using the neural network MLP [10] with input extracted by WordVector.
  Decision Tree: employing a greedy splitting method to build a tree [9], along with WordVector pre-processing.

Traditional Text-based (Vertical Comparison):
  Palladian: the text classifier working with n-grams, i.e. series of tokens of a given length [30].
  Complementary Naive Bayes: Multinomial Naive Bayes model which can detect word distributions in documents [24].
  Complementary Naive Bayes (Frequencies): the complementary model using term frequencies [30].

Feature-based Supported by Machine Learning (Horizontal Comparison):
  Naive Bayes: a two-layer classification method, with one level representing the label of spam/non-spam, and another including a set of features [2].
  Random Forest: an anti-sensitive method, with an extra layer added [18].
  Decision Tree (C4.5): a traditional machine learning technique with multiple retrieving and ordering [19].
Figure 4: Performance values of our detection method based on deep learning on 4 sampled datasets. (A) Recall; (B) Precision; (C) F-measure; (D) Accuracy

Figure 5: Vertical comparison of performance values between our technique and traditional text-based detection approaches on 4 sampled datasets. (A) Recall; (B) Precision; (C) F-measure; (D) Accuracy

Figure 6: Horizontal comparison of performance values between our technique and feature-based methods on 4 sampled datasets. (A) Recall; (B) Precision; (C) F-measure; (D) Accuracy
Accuracy is defined as the proportion of tweets classified correctly among all tweets. It is expressed as

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

Recall (Sensitivity) is defined as the ratio of correctly classified spam to the total actual spam:

Recall = \frac{TP}{TP + FN}

Precision is defined as the ratio of true spam to all tweets classified as spam. It can be obtained by

Precision = \frac{TP}{TP + FP}

F-measure is the harmonic mean of Precision and Recall, and it can be calculated as follows:

F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} = \frac{2TP}{2TP + FP + FN}
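As a quick sanity check of these formulas, the plain-Python function below computes all four metrics from raw confusion-matrix counts; the example counts are made up for illustration.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute Accuracy, Recall, Precision and F-measure
    from confusion-matrix counts, as defined in Section 4."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Harmonic mean of Precision and Recall; equals 2TP / (2TP + FP + FN).
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}

# Illustrative counts only.
print(metrics(tp=930, fp=50, fn=70, tn=950))
```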
4.2 Comparison of Classifiers
In this subsection, we evaluate the performance of our work through three different classification algorithms, with input vectors trained by the WordVector technique in a deep learning style, on four sampled datasets. The comparison results suggest the optimal classifier to be used in our method (i.e. internal comparison). The list of classifiers is shown in Table 1.
As shown in Figure 4, all three algorithms perform well. Almost all performance values are higher than 80%, and most of them are above 90%. The Random Forest technique outperforms the other two methods at all times in terms of Precision and Accuracy, while MLP achieves the highest performance on Recall over all four datasets. For the F-measure, MLP achieves the highest performance on Datasets 2 and 4, and the second best on Datasets 1 and 3. Since the spam ratio of Datasets 2 and 4 is similar to the real world, it is reasonable to achieve the highest F-measure on them. Besides, there is no significant difference among the four performance metrics (all ~95% on average). In the following, we select MLP as our classification method, along with the WordVector technique, to compare against the other approaches listed in Table 1.

4.3 Comparison (vs. Syntax-based Methods)
In this section, we compare our method to 3 existing text-based techniques vertically. Figure 5 describes the differences among the text-based methods. It indicates that our proposed method using MLP performs better than all current work in terms of Recall, F-measure and Accuracy. For Precision, the performance of our method is about 25% higher than the second place on Dataset 2, but 5% less than the best on the other datasets. It even achieves double the F-measure of Naive Bayes (Frequencies) on Datasets 2 and 4. Overall, it outperforms all the others.

4.4 Comparison (vs. Feature-based Methods)
We further compare our method to other feature-based detection methods. The performance on all four metrics is better than the others across all four datasets. As shown in Figure 6, the F-measure is much higher than the others: on average 30% higher than Random Forest, and almost nine times that of Naive Bayes on Datasets 2 and 4. Although the Decision Tree method achieves almost the same performance as our method on Dataset 1, it only reaches half of ours when testing on Dataset 4.

5. DISCUSSION
According to our performance evaluation, there are two dataset-related factors that affect the classifier: 1) the proportion of spam and non-spam, and 2) the sample dataset discretisation.

5.1 Impact of Spam Ratio
We show the impact of the spam ratio in Table 4. It can be found that, with the change of spam ratio, the performance of our proposed method remains stable; the biggest difference is only 2.45%, on Recall. In contrast, the spam ratio affects other text-based and non-text-based methods significantly. For example, in Figure 5, the F-measure of Naive Bayes (Frequencies) on the 1:19 (spam:non-spam) dataset is only half of that on the 1:1 dataset. In addition, the F-measure of Naive Bayes averages 60% on the 1:1 dataset, but it drops to one fifth on the 1:19 dataset.

Table 4: Impact of the Spam Ratio by Datasets 1 and 2 using MLP
Unit: %     Recall  Precision  F-measure  Accuracy
Dataset 1   93.48   95.04      94.25      94.30
Dataset 2   91.03   95.84      93.37      99.35

5.2 Impact of Sample Dataset Discretisation
We further study the impact of sample dataset discretisation. The results are shown in Table 5. It is found that, with the change of the sampling strategy, the performance of our proposed method remains stable; the biggest difference is only 2% on Recall. Accordingly, from Figures 4, 5 and 6, the performance on the continuous dataset is slightly better than on the randomly sampled dataset for all detection methods.

Table 5: Impact of Sample Dataset Discretisation by Datasets 1 and 3 using MLP
Unit: %     Recall  Precision  F-measure  Accuracy
Dataset 1   93.48   95.04      94.25      94.30
Dataset 3   91.48   94.23      92.83      92.94

6. CONCLUSIONS AND FUTURE WORK
In this paper, we explored the issues in current Twitter spam detection techniques and proposed a new classification method based on deep learning algorithms to address them. To evaluate its performance, we first collected labeled data (376,206 spam and 73,836 non-spam tweets) from a 10-day ground-truth dataset with more than 600 million real-world tweets. Then we utilized the WordVector technique to pre-process the tweets and convert them into high-dimension vectors.
Future work may include several aspects: 1) The evaluation in this paper is mainly empirical. We will carry out theoretical studies on the outperformance of our methods in order to better understand the deep-learning based spam detection framework; this will in addition help us improve the performance. 2) We will compare more classifiers and other methods in the future in order to demonstrate the pros and cons of our proposed method. 3) We will collect more real data from social media, particularly datasets from other platforms such as Facebook and microblogs, and study the migration of our spam detection framework. This part of the work is very important to both industry and academia, because social spam is also critical on other social media platforms.
7. REFERENCES

[1] R. Aires, A. Manfrin, S. M. Aluísio, and D. Santos. Which classification algorithm works best with stylistic features of Portuguese in order to classify web texts according to users' needs? ICMC-USP, 2004.
[2] N. B. Amor, S. Benferhat, and Z. Elouedi. Naive bayes vs decision trees in intrusion detection systems. In Proceedings of the 2004 ACM Symposium on Applied Computing, pages 420-424. ACM, 2004.
[3] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Detecting spammers on twitter. In Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS), volume 6, page 12, 2010.
[4] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel. KNIME - the Konstanz Information Miner: version 2.0 and beyond. ACM SIGKDD Explorations Newsletter, 11(1):26-31, 2009.
[5] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[6] C. Chen, J. Zhang, Y. Xiang, and W. Zhou. Asymmetric self-learning for tackling twitter spam drift. In 2015 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 208-213. IEEE, 2015.
[7] C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian. A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Transactions on Computational Social Systems, 2(3):65-76, 2015.
[8] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493-2537, 2011.
[9] T. G. Dietterich. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pages 1-15. Springer, 2000.
[10] V. N. Ghate and S. V. Dudul. Optimal MLP neural network classifier for fault detection of three phase induction motor. Expert Systems with Applications, 37(4):3468-3481, 2010.
[11] C. Grier, K. Thomas, V. Paxson, and M. Zhang. @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM Conference on Computer and Communications Security, pages 27-37. ACM, 2010.
[12] A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pages 56-65. ACM, 2007.
[13] X. Jin, C. Lin, J. Luo, and J. Han. A data mining-based spam detection system for social media networks. Proceedings of the VLDB Endowment, 4(12):1458-1461, 2011.
[14] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In ICML, volume 14, pages 1188-1196, 2014.
[15] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
[16] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: social honeypots + machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 435-442. ACM, 2010.
[17] S. Lee and J. Kim. WarningBird: detecting suspicious URLs in twitter stream. In NDSS, volume 12, pages 1-13, 2012.
[18] A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3):18-22, 2002.
[19] S. Liu, J. Zhang, Y. Wang, and Y. Xiang. Fuzzy-based feature and instance recovery. In Asian Conference on Intelligent Information and Database Systems, pages 605-615. Springer, 2016.
[20] S. Liu, J. Zhang, and Y. Xiang. Statistical detection of online drifting twitter spam: invited paper. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pages 1-10. ACM, 2016.
[21] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker. Learning to detect malicious URLs. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):30, 2011.
[22] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[23] J. Oliver, P. Pajares, C. Ke, C. Chen, and Y. Xiang. An in-depth analysis of abuse on twitter. Trend Micro, 225, 2014.
[24] J. D. Rennie, L. Shih, J. Teevan, D. R. Karger, et al. Tackling the poor assumptions of naive bayes text classifiers. In ICML, volume 3, pages 616-623, Washington DC, 2003.
[25] K. Rybina. Sentiment analysis of contexts around query terms in documents. Master's thesis, 2012.
[26] J. Song, S. Lee, and J. Kim. Spam filtering in twitter using sender-receiver relationship. In International Workshop on Recent Advances in Intrusion Detection, pages 301-317. Springer, 2011.
[27] G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, pages 1-9. ACM, 2010.
[28] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104-3112, 2014.
[29] D. Tang, F. Wei, B. Qin, T. Liu, and M. Zhou. Coooolll: a deep learning system for twitter sentiment classification. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 208-212, 2014.
[30] D. Urbansky, K. Muthmann, P. Katz, and S. Reichert. TUD Palladian overview. TU Dresden, Department of Systems Engineering, Chair Computer Networks, IIR Group, 5, 2011.
[31] A. H. Wang. Don't follow me: spam detection in twitter. In Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on, pages 1-10. IEEE, 2010.
[32] D. Wang, S. B. Navathe, L. Liu, D. Irani, A. Tamersoy, and C. Pu. Click traffic analysis of short URL spam on twitter. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2013 9th International Conference on, pages 250-259. IEEE, 2013.
[33] C. Yang, R. Harkreader, and G. Gu. Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Transactions on Information Forensics and Security, 8(8):1280-1293, 2013.