Topic Categorization Based On User Behaviour in Random Social Networks Using Firefly Algorithm
2, April 2018 11
The remainder of this paper is organized as follows. Section II briefly discusses related work that applies the firefly algorithm in similar settings. Our approach is detailed in Section III. Experiments and results are provided in Section IV. Section V presents the conclusion and future work.

II. RELATED WORK

Kraetzl et al. [24] describe random graph models with arbitrary degree distributions. In [25], random graph models are given for simple unipartite networks (acquaintance networks) and bipartite networks (affiliation networks). Examples of such networks are friendships, families and business relationships. In particular, the properties of clustering, diameter and degree distribution are examined with respect to these models. Charu C. Aggarwal [23] discusses uncertain data management in relation to traditional database management. Traditional database management covers join processing, query processing, selectivity estimation, OLAP queries and indexing. The mining problems are frequent pattern mining, outlier detection, classification and clustering. The data may contain errors or may be only partially complete.

Mitchell [6] describes the rapid development of machine learning and data mining. Machine learning is capable of extracting valuable knowledge from large data stores; it is the discovery of models, patterns and other regularities in data. Of the two categories of machine learning approaches, the statistical or pattern-recognition methods include k-nearest neighbor or instance-based learning, Bayesian classifiers, neural network learning and support vector machines. Seigo Baba et al. [11], [14] show that retweets on social media can be classified without text data; the drawback of this method is the large amount of computation. Users who retweet the same tweet are assumed to be interested in the same topic, so tweets with similar interests can be classified based on retweets. The method relates tweets through retweets to build a retweet network that connects similar tweets, and then extracts clusters of similar tweets from the constructed network.

Florian Michahelles et al. [9] describe an additional marketing channel that comprises professional marketing and traditional marketing. A drawback is the small dataset, extracted from only one Facebook page. First, they analyze the user discussion in terms of topics, intentions for participation and emotions shared by the users on a Facebook page. Second, they analyze user activities and interactions in terms of their evolution over time and their dependency on the community size. Further, users are classified by interaction pattern, and the interactions are analyzed in relation to the community.

Iztok Fister et al. [2], [10], [13] describe the Firefly Algorithm (FA), an optimization algorithm based on the social flashing behavior of fireflies. Other available optimization algorithms include Artificial Bee Colony, Cuckoo Search, Ant Colony Optimization and Particle Swarm Optimization. In FA, the flashing light helps fireflies find mates, attract potential prey and protect themselves from predators. The swarm of fireflies moves to brighter and more attractive locations according to the flashing light intensity, which is associated with the objective function of the problem considered, in order to obtain efficient optimal solutions. Athraa Jasim Mohammed et al. [24] note that the content of online social networks, communicated through text, blogs, chats and news, is dynamically updated. Relevant information is grouped into clusters based on similar attributes. A text clustering method that utilizes the firefly algorithm is introduced. The proposed clustering algorithm, aFA merge, automatically groups text documents into the appropriate number of clusters based on the behavior of the firefly population and a cluster combining process [4]. The datasets are the 20 Newsgroups and Reuters news collections.

Thomas Meller et al. [1] proposed two nearest neighbor classification algorithms, alongside algorithms such as naïve Bayes and J48, for making recommendations to students based on their academic history. Goldina Ghosh et al. [22] discuss a density-based clustering technique that is able to classify four different topic classes: Interesting Motivating, Not Interesting Motivating, Interesting Deviating and Not Interesting Deviating. On a scale from 0 to 1, the Interesting Motivating range is 1-0.8, the Not Interesting Motivating range is 0.8-0.6, the Interesting Deviating range is 0.5, the Not Interesting Deviating range is 0.4-0.2, and values below 0.2 are noise.

Bo Wu et al. [16] propose analyzing users' behavior and social roles to specify the collective decision-making process. Their model has three layers: a relation layer, an individual profile layer and a content layer. These layers relate to real world-based roles and cyber world-based roles. The relation layer covers negotiation and voting; the individual profile layer covers negotiation, voting and opinion collecting; the content layer covers voting and opinion collecting. Together these make up the decision-making process [15]. S. A. M. Felicita reports achieving optimal resource utilization, maximized throughput and minimized response time. Mosab Faqeeh et al. [18], [21] address document classification by comparing classification algorithms such as Naïve Bayes, Support Vector Machines and Decision Trees. Facebook comments are classified into two topics.

Athraa Jasim Mohammed et al. [13] discuss the drawbacks of swarm-based algorithms such as particle swarm optimization and ant colony optimization, and propose another swarm algorithm, the firefly algorithm, for text clustering. Two variants are discussed: the weight-based firefly algorithm (WFA) and a second weight-based firefly algorithm, which applies a more restrictive condition than WFA when finding the users of a cluster. R. Senthamil Selvi and M. L. Valarmathi [20] discuss efficient feature selection using improved firefly heuristics. Application domains of big data include health care machines, bank transactions and social media. The big data problem is NP-hard and the search is accordingly intractable. A huge amount of Twitter data is available online.

III. SYSTEM DESIGN

The proposed architecture is shown in Fig. 2, which depicts the process of topic categorization based on user interest using the firefly algorithm. The topics under discussion may be interesting or not interesting. Participants discussing a topic can make an interesting topic more interesting, that is, motivate others to talk in support of the topic. Conversely, a topic that is not interesting can be made interesting by the participants' intervention, or the discussion may deviate away from it.

Fig. 2: Firefly Algorithm based Topic Categorizer

The components of topic categorization using the firefly algorithm are the population generator, fitness evaluator, ranker and optimizer.

Population Generator
The first step is firefly population generation. The population is specified by the input dataset. The inputs are the topics discussed by Facebook users, represented by comment and posthour, which are used to decide whether a particular topic is interesting or not interesting. The maximum generation of fireflies and the input range are also specified. This is known as the generation of fireflies.

Fitness Evaluator
The fitness evaluator computes the light intensity value using the formula below. When the comment input has the greater value, the fitness value is calculated; when the posthour input has the greater value, the movement of the fireflies is calculated. The light intensity is given in eq. (1),

$$I = I_0 e^{-\gamma r^{2}} \quad (1)$$

Then the movement of the firefly is found: a firefly at a less bright location moves toward the brighter location according to eq. (2),

$$x_{\mathrm{comment}} = x_{\mathrm{comment}} + I \, (x_{\mathrm{posthour}} - x_{\mathrm{comment}}) + \alpha \varepsilon_{\mathrm{comment}} \quad (2)$$

Ranker
In this step, the fireflies are ranked according to their fitness value. Attractiveness is calculated using the Euclidean distance r in eq. (3),

$$r(x_{\mathrm{comment}}, x_{\mathrm{posthour}}) = \sqrt{(x_{\mathrm{comment}} - x_{\mathrm{posthour}})^{2}} \quad (3)$$

The distance r is calculated and the light intensity value is updated. The light intensity and the movement of the fireflies are compared, and from these two values the current best is ranked.

Optimizer
The final process of topic categorization is the optimization of the fireflies. Many values are iterated while ranking the fireflies. Finally, the firefly with the highest rank produces the optimal result.

Firefly Algorithm for Topic Categorization
The firefly algorithm was invented by Xin-She Yang and is inspired by the light-emitting behavior of fireflies. The algorithm is mainly applied to feature selection and clustering. Specifically, this research clusters topics using the firefly algorithm through user comment and posthour.

Algorithm: Firefly function fireflies(comment, posthour)
Input: comment and posthour, where (posthour > comment)
Output: clusters using comment and posthour
Step 1: Objective function f(x), x = (x_1, ..., x_d)
Step 2: Generate initial population of fireflies x_comment (comment = 1, 2, ..., n) and x_posthour (posthour = 1, 2, ..., n)
Step 3: Light intensity I_comment at x_comment is determined by f(x_comment); I = I_0 e^(-γ r^2)
Step 4: Define light absorption coefficient, initial γ = 1
Step 5: Define the randomization parameter α = 0.2
Step 6: Define initial attractiveness I_0 = 1.0
Step 7: while (t < MaxGeneration)
Step 8:   for comment = 1 : n (all n fireflies)
Step 9:     for posthour = 1 : comment (all n fireflies)
Step 10:      if (I_posthour > I_comment), move firefly comment towards posthour in d-dimension;
                x_comment = x_comment + I * (x_posthour - x_comment) + α ε_comment
Step 11:      end if
Step 12:      Attractiveness varies with distance r via exp[−r]; distance r(x_comment, x_posthour) = √((x_comment - x_posthour)^2)
Step 13:      Evaluate new solutions and update light intensity
Step 14:    end for posthour
Step 15:  end for comment
Step 16:  Rank the fireflies and find the current best
Step 17: end while
Step 18: return Postprocess results and visualization
Step 19: end

Fig. 3: Pseudo Code of Firefly Algorithm for Topic Categorizer

The firefly algorithm process passes the input data through the population generator, fitness evaluator, ranker and optimizer. Initially, the light absorption coefficient, randomization parameter and attractiveness are defined. MaxGeneration is the maximum number of generations for the input dataset. The comment index ranges from 1 to n and the posthour index from 1 to comment, over all n fireflies. The condition I_posthour > I_comment is then checked: when it holds, the firefly moves from the less bright location to the brighter location using eq. (2); otherwise its current result is kept. Fig. 3 shows the firefly algorithm; its inputs are Facebook conversations through comments and its output is the topic clustering. The fireflies, indexed 1 to n, are ranked and the optimal result is finally produced.

IV. EXPERIMENTS AND RESULTS

This section discusses the three-dimensional view of the surface, the movement of the fireflies and the topic categorization. First, the input data are clustered into four groups; the fireflies then move toward the center points, which is traced as a path. Finally, similar topics are grouped together.

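The procedure of Fig. 3 and eqs. (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the seed handling, the uniform noise in [-0.5, 0.5] used for ε, and the inner loop over all n fireflies are choices made here for the example. The objective function is supplied by the caller, since the paper does not state its own.

```python
import math
import random

# Parameter values taken from Steps 4-6 of Fig. 3.
GAMMA = 1.0   # light absorption coefficient
ALPHA = 0.2   # randomization parameter
I0 = 1.0      # initial attractiveness at distance r = 0


def distance(a, b):
    """Euclidean distance between two positions, eq. (3)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def attractiveness(r, i0=I0, gamma=GAMMA):
    """Intensity seen at distance r, eq. (1): I = I0 * exp(-gamma * r^2)."""
    return i0 * math.exp(-gamma * r * r)


def move(xi, xj, rng, alpha=ALPHA):
    """Move firefly xi toward the brighter firefly xj, eq. (2).

    Assumption: epsilon is drawn uniformly from [-0.5, 0.5] per coordinate.
    """
    beta = attractiveness(distance(xi, xj))
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


def firefly_search(objective, population, max_generation=30, seed=0):
    """Run the nested loops of Fig. 3; return (position, light) pairs, brightest first."""
    rng = random.Random(seed)
    swarm = [list(p) for p in population]
    light = [objective(p) for p in swarm]
    for _ in range(max_generation):
        for i in range(len(swarm)):
            for j in range(len(swarm)):
                if light[j] > light[i]:          # Step 10: j is brighter than i
                    swarm[i] = move(swarm[i], swarm[j], rng)
                    light[i] = objective(swarm[i])  # Step 13: update intensity
    # Step 16: rank the fireflies by light intensity.
    order = sorted(range(len(swarm)), key=lambda k: light[k], reverse=True)
    return [(swarm[k], light[k]) for k in order]


if __name__ == "__main__":
    # Hypothetical single-peak objective and starting positions, for illustration only.
    peak = lambda p: math.exp(-((p[0] - 4.0) ** 2 + (p[1] - 4.0) ** 2))
    start = [[3.0, 4.0], [2.0, 5.0], [3.0, 0.0], [1.0, 3.0]]
    for position, value in firefly_search(peak, start):
        print([round(c, 2) for c in position], round(value, 3))
```

Because a firefly only moves when a brighter one exists, the brightest firefly of a generation stays in place; the random term α ε is what keeps the rest of the swarm exploring around it.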
Surface in 3-D View
In this graph, the four topic groups are classified on the surface. Fig. 4 shows the 3-D view of the graph: the x-axis shows the comment input, the y-axis shows the posthour input and the z-axis shows the surface. The range is from -5 to 5. Every point is clustered using the grid surface of the function. There are 100 grid lines specified in this graph.

[...] are moving within the particular range, arranging the fireflies so that they move to the centers of the clusters. The circles specify the clusters and the particles are the fireflies; the particles close in on the contour function.

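A surface of the kind described for Fig. 4 can be tabulated on a 100-line grid over the range -5 to 5 as sketched below. The paper does not state its objective function, so `four_peak` is an assumption: a standard four-peak test surface whose maxima lie at (4, 4), (-4, 4), (0, 0) and (0, -4), chosen here only because those points and heights (about 1 and about 2) resemble the cluster centers reported in Table 1. The helper names are likewise hypothetical.

```python
import math


def four_peak(x, y):
    """Assumed test surface (not given in the paper): two peaks of height ~1, two of height ~2."""
    return (math.exp(-(x - 4) ** 2 - (y - 4) ** 2)
            + math.exp(-(x + 4) ** 2 - (y - 4) ** 2)
            + 2 * (math.exp(-x ** 2 - y ** 2)
                   + math.exp(-x ** 2 - (y + 4) ** 2)))


def make_axis(lo=-5.0, hi=5.0, n=100):
    """Return n evenly spaced grid coordinates from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def surface(n=100):
    """Evaluate the assumed surface on an n-by-n grid; rows follow y, columns follow x."""
    axis = make_axis(n=n)
    return axis, [[four_peak(x, y) for x in axis] for y in axis]


if __name__ == "__main__":
    axis, z = surface()
    print(len(z), len(z[0]))                      # grid dimensions
    print(round(max(max(row) for row in z), 2))   # highest sampled point
```

The resulting `axis` and `z` lists can be handed to any plotting tool to draw the 3-D view or the contour lines toward which the fireflies move.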
[...] PSO and GA. Fig. 8 shows the test result analysis. The x-axis shows the thirty iterations over the dataset and the y-axis shows the fitness value of the three algorithms in the comparison graph.

Table 1: Result Analysis for Topic Categorization

INPUT (Multiple Users)      OUTPUT (Multiple Users)
Number of    Number of      Number of comments   Number of posthour   Surface
comments     posthour       (x-axis)             (y-axis)             (z-axis)
3            4               3.94                 3.98                0.99
10           5              -3.97                 4.05                0.99
3            0               4.00                 4.03                0.99
10           50              4.00                 4.02                0.99
58           3               3.99                 4.00                0.99
19           9               0.01                -3.95                1.99
1            3              -0.02                -4.01                1.99
3            9               0.01                -0.00                1.99
0            3               0.00                -3.98                1.99
0            10             -0.00                -0.00                1.99
3            6               0.00                -0.00                2.00
4            5              -0.00                -3.99                1.99

Table 1 lists the topic categorization inputs, comment and posthour, for multiple users, the corresponding outputs and, additionally, the surface value of the clusters. These values are used to plot the clusters of the topics. Output values range between 0 and 1 based on light intensity and attractiveness. If the value is zero, the firefly is grouped at the center of the cluster.

V. CONCLUSION AND FUTURE WORK

The firefly algorithm was able to classify the topics according to participant activity in blogs, comments and photos. It was able to classify the topics based on the number of users who communicate on a particular topic. The numeric values of shares and comments are used to obtain the results. Topic similarity is measured within the range of comment and posthour; the range of values is calculated using the firefly algorithm, and ranking is done based on the value of each firefly. We propose to select, from the classification data, the features that are necessary to identify whether a topic is interesting or not interesting. Predicting the best possible decision from the conversation of different participants in a random social network using the ranking method takes a long time, so future work will address this time limitation.

REFERENCES

[1] A. Lamba and D. Kumar, “Optimization of KNN with Firefly Algorithm”, BIJIT-BVICAM’s International Journal of Information Technology, Vol. 8, No. 2, 2016.
[2] A.J. Mohammed, Y. Yusof and H. Husni, “Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering”, International Visual Informatics Conference, Pp. 14–24, 2015.
[3] C.C. Aggarwal, “Network analysis in the big data age: Mining graphs and social streams”, IBM T J Watson Research Center, ECML/PKDD, 2014.
[4] M.D. Choudhury, A. Monroy-Hernandez and G. Mark, “Narco Emotions: Affect and Desensitization in Social Media during the Mexican Drug War”, CHI, 2014.
[5] M. De Choudhury, W.A. Mason, J.M. Hofman and D.J. Watts, “Inferring relevant social networks from interpersonal communication”, Proceedings of the 19th International Conference on World Wide Web, Pp. 301–310, 2010.
[6] E. Elmurngi and A. Gherbi, “An Empirical Study on Detecting Fake Reviews Using Machine Learning Techniques”, International Conference on Innovative Computing Technology, 2017.
[7] Gómez, A. Kaltenbrunner and V. Lopez, “Statistical analysis of the social network and discussion threads in Slashdot”, Proceedings of the 17th International Conference on World Wide Web, Pp. 645–654, 2008.
[8] http://en.wikipedia.org/wiki/Cluster_analysis, Wikipedia.
[9] I.P. Cvijikj and F. Michahelles, “Understanding the user generated content and interactions on a Facebook brand page”, Int. J. Soc. Humanist. Comput., Vol. 2, No. 1–2, Pp. 118–140, 2013.
[10] I. Fister, I. Fister Jr, X.S. Yang and J. Brest, “A comprehensive review of firefly algorithms”, Swarm and Evolutionary Computation, Vol. 13, Pp. 34–46, 2013.
[11] J.W. Treem and P.M. Leonardi, “Social media use in organizations exploring the affordances of visibility, editability, persistence, and association”, Commun. Yearb., Vol. 36, Pp. 143–189, 2012.
[12] A. Kaltenbrunner, V. Gomez and V. Lopez, “Description and prediction of Slashdot activity”, Proceedings of the Latin American Web Conference, IEEE Computer Society, Pp. 57–66, 2008.
[13] K. Bhatt, A. Singh and D. Singh, “An Improved Optimized Web page Classification using Firefly Algorithm with NB Classifier”, International Journal of Computer Applications, Vol. 146, No. 4, 2016.
[14] E. Kiciman, M. De Choudhury and B. Thiesson, “Analyzing social media relationships in context with discussion graphs”, Eleventh Workshop on Mining and Learning with Graphs, ACM, 2013.
[15] I. King, M.R. Lyu and H. Yang, “Online Learning for Big Data Analytics”, Tutorial presentation at IEEE Big Data, Santa Clara, CA, 2013.
[16] M. Lieberman, “Visualizing big data: Social network analysis”, Proceedings of the CASRO Digital Research Conference, Online + Mobile + Big Data, San Antonio, Texas, 2014.
[17] X. Ling, Q. Mei, C.X. Zhai and B. Schatz, “Mining multi-faceted overviews of arbitrary topics in a text collection”, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pp. 497–505, 2008.
[18] R. Duriqi, V. Raca and B. Cico, “Comparative Analysis of Classification Algorithms on Three Different Datasets using WEKA”, Mediterranean Conference on Embedded Computing, 2016.
[19] S. Arora and S. Singh, “The Firefly Optimization Algorithm: Convergence Analysis and Parameter Selection”, Int. J. of Com. Appli, Vol. 69, No. 3, 2013.
[20] V. Subha and D. Murugan, “Opposition-Based Firefly Algorithm Optimized Feature Subset Selection Approach for Fetal Risk Anticipation”, Machine Learning and Applications: An International Journal, Vol. 3, No. 2, 2016.
[21] T. Meller, E. Wang and F. Lin, “New Classification Algorithms for Developing Online Program Recommendation Systems”, Int. C. on Mobile, 2009.
[22] V. Virgilio, “Exploring Big Data in Social Networks”, Keynote address, INWEB-National Science and Technology Institute for Web, Federal University of Minas Gerais-UFMG, 2013.
[23] Y. Cheng, R. Chi and S. Zhu, “An Uncertain Data Model Construction Method Based on Non-parametric Estimation”, IEEE International Conference on Electronic Information and Communication Technology, 2016.
[24] Y. Zhou, X. Guan, Z. Zhang and B. Zhang, “Predicting the tendency of topic discussion on the online social networks using a dynamic probability model”, WebScience, Proceedings of the Hypertext Workshop on Collaboration and Collective Intelligence, 2008.
[25] M.E.J. Newman, D.J. Watts and S.H. Strogatz, “Random graph models of social networks”, Proceedings of the National Academy of Sciences of the United States of America, Vol. 99, 2002.