2018 Moscow Workshop on Electronic and Networking Technologies (MWENT)

Modeling of User Behavior for Social Media Analysis
Anton Ivaschenko, Anastasiya Khorina, Vladislav Isayko, Daniil Krupin,
Viktor Bolotsky, and Pavel Sitnikov

A. Ivaschenko is a professor at Samara National Research University, Samara, Russia (e-mail: anton.ivashenko@gmail.com).
A. Khorina and V. Isayko are students of Samara National Research University, Samara, Russia.
D. Krupin and V. Bolotsky are lead engineers at SEC "Open Code", Samara, Russia.
P. Sitnikov is a chair at ITMO University, Saint-Petersburg, Russia (e-mail: sitnikov@o-code.ru).

Abstract: Social media organization and analysis is one of the most interesting research areas implemented today by modern networks and telecommunication technologies. Implementation of modern social media in practice requires a combination of technologies taken from technical and social sciences. Many modern research works study it and try to understand the basic principles of its development and evolution. This paper proposes an original model for formalizing social media user behavior that can be used for integration of several social media, data import and processing using modern technologies of Big Data analysis. The proposed model allows capturing the process of an Internet user's activity considering a combination of human and time factors. A software solution was developed for social network analysis that identifies positive and negative trends in the evolution of users' focus. The proposed approach is illustrated by the results of open data analysis taken from several popular social networks.

Index Terms: Big Data, Multi-agent systems, Social Networks.

I. INTRODUCTION
Social networks and web communities have attracted the interest of scientific researchers and IT engineers for more than 10 years. Social networks can now be studied in depth using modern technologies for Big Data analysis. Despite the successful application of mathematical statistics to cluster and generalize user behavior, the problem of Big Data analysis of social networking remains open. This is due to the need to personalize user activity models and understand individual features of human behavior. This paper proposes a possible solution to this problem.

II. STATE OF THE ART
New opportunities for interaction in virtual environments allow Internet users to exchange ideas immediately. At the same time, everybody needs to obtain and process many incoming events. This process can be described by the modern principles of distributed simulation and decision-making support powered by multi-agent technology [1]. The virtual world of social media should be treated as a complex network of continuously running and co-evolving intelligent agents. Such solutions are based on the holonic paradigm and a bio-inspired approach [2], which requires the development of new methods and tools that support fundamental mechanisms of self-organization and evolution similar to those of living organisms (colonies of ants, swarms of bees, etc.) [3].
As for the human beings represented by actors or agents, a model of the social network user should consider a combination of human and time factors. The interaction of customers and service providers through intermediary services generates, and can be characterized by, a large number of events that form Big Data and require modern technologies for their analysis [4].
Modeling Internet users' behavior can be based on the modern principles of knowledge representation in the form of ontologies [5]. These concepts allow formalizing self-organization and semantics, which is advantageous for the abstract description of social concepts and their interaction in technical applications.
In the context of this paper, the works on Internet development strategies [7] and on virtual communities and social networks [8, 9] should be mentioned. Despite the successful application of mathematical statistics to cluster and generalize user behavior, the problem of Big Data analysis of social networks remains open. This is due to the need to personalize user activity models and understand individual features of human behavior.
Our experience in developing an integrated information space and analyzing its users' behavior [10-12] can be used to build a software solution that derives basic trends in social media and provides intelligent functionality for social media Big Data analysis. The proposed abstract model and solution vision are given below.

III. ABSTRACT MODEL
Let us represent a community of Internet users by $u_i$, where $i = 1..N_u$ and $N_u$ is the number of users. The users' information exchange activity can be represented by posts, comments or messages $p_j$, where $j = 1..N_w$ and $N_w$ is the absolute number of informational objects. Post generation is an event

$$g_{i,j} = \left(u_i, p_j, t^0_{i,j}\right) \quad (1)$$

Issue or processing of an information object can be represented by an event $e_{i,j,k}$ that is characterized by a combination of user, focus, and time:

$$e_{i,j,k} = e_{i,j,k}\left(p_k, u_i, f_{i,k}, t_{i,j,k}\right) \in \{0,1\} \quad (2)$$

where the focus $f_{i,k}$ represents the current user interest and can be described by a tag cloud, which is a set of pairs:

$$f_{i,k} = \left\{(\tau_n, w_{n,k})_{i,k}\right\} \quad (3)$$

where $\tau_n$ is a tag (keyword) with weight $w_{n,k}$.
The sequence of interdependent user focuses represents the evolution of the user's interest.
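As an illustration, the focus (3) and the Boolean event (2) could be represented in code as follows. This is a minimal sketch and not the authors' implementation: the names `TagCloud`, `Event` and `focus_shift`, and the use of an L1 distance between successive tag clouds as a measure of focus change, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict

# A tag cloud maps a tag (keyword) to its weight, as in (3).
TagCloud = Dict[str, float]


@dataclass(frozen=True)
class Event:
    """Boolean event (2): user u_i processes object p_k with focus f_ik at time t."""
    user_id: int
    post_id: int
    focus: TagCloud
    time: float
    occurred: bool  # the {0, 1} value of the event


def focus_shift(previous: TagCloud, current: TagCloud) -> float:
    """Hypothetical measure of focus change: L1 distance between tag weights."""
    tags = set(previous) | set(current)
    return sum(abs(current.get(t, 0.0) - previous.get(t, 0.0)) for t in tags)


# A sequence of focuses of one user; successive shifts trace the interest evolution.
focuses = [
    {"music": 0.5, "travel": 0.5},
    {"music": 0.25, "travel": 0.75},
    {"travel": 1.0},
]
shifts = [focus_shift(a, b) for a, b in zip(focuses, focuses[1:])]
```

A persistently small shift would suggest a stable interest, while large shifts would suggest that the user is still searching for one; any threshold separating the two would have to be chosen empirically.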
Each user has their own ontology, referred to below as the context, that forms the basis for their perception. It changes over time under the influence of learning and forgetting information (presented by posts, comments or messages) and can be represented by a chain of

$$c_{i,m} = \left\{(\tau_l, w_{l,m})_{i,m}\right\} \quad (4)$$
This change is correlated with the user focus. To be perceived positively, the focus cannot be considerably new; at the same time, it must not be equal to the context if it is to excite interest.
Considering this correlation, let us synchronize the context and focus changes:

$$e'_{i,j,m} = e'_{i,j,m}\left(p_k, u_i, c_{i,m}, t_{i,j,m}\right) \in \{0,1\} \quad (5)$$

Statements (2) and (5) are Boolean variables, which means that the appearance or perception of a post, comment or message does not guarantee changes in focus and context.
Events (2) and (5) can be used for analysis. One of the possible implementations is presented below. Studying the trends of the user's focus and context allows identifying tendencies, variations and iterations that form the patterns of user creativity.
If new informational proposals remain suspended and have no effect on the user focus, the user does not see any interest in them. The possible reasons are related to context: additional education is needed to provoke such interest. On the other hand, many changes of the user context indicate a search for a stable interest, which should be proposed to the user at a certain time.
Context and focus can also be influenced by negative intervention. To manage the user focus, a series of repeated influences can be generated that partially cover the actual context and the targeted interest. Such patterns can also be identified by applying cross-correlation analysis to the proposed model, which helps to identify and resist negative informational influence.

IV. SOLUTION ARCHITECTURE
The proposed approach is based on simulation of focus and context. It was implemented in a multi-agent architecture, which is presented in Fig. 1. Within the proposed architecture we provide a profile descriptor, a post generator and a navigator. These are methods generated and used to simulate the real activity of users in social networks.

Fig. 1. Solution multi-agent architecture

The post generator is used to create posts according to predefined logic. The navigator is used to process incoming data, which is described by a network and can be presented as a sorted graph where the nodes are informational objects (for example web sites, documents, posts, comments) and the links are references between these objects. Each object can refer to several other objects and documents, and the navigator decides which link to follow according to its predefined logic.
In addition to the navigator and the post generator, the multi-agent architecture provides informational frames that correspond to the focus and context concepts defined above. Focus is used to represent the current interest of Internet users. Context is used to formalize the informational space in which the agent performs its negotiating activity.
Based on the provided model, an algorithm for social media Big Data analysis was developed. The model is used to formalize the social media user and to integrate the analytical software with various social media open for data import and analysis. The algorithm is presented in Fig. 2.
This algorithm consists of 2 stages:
1. Calculation of the sample frequency vector for all users and development of the root-mean-square deviation vector for a variety of users
• Select topics and convert them to the form (topic_hour, 1/users);
• Process the received pairs and calculate the sum of 1/users (topic_hour, ∑1/users);
• Calculate the root-mean-square deviation for each pair;
• Divide the obtained values by the period.
2. Calculation of the deviation metric for a particular user
• Select topics and convert them to the form (topic_hour_user, 1);
• Process the data pairs and count the sum of topics with the same key (topic_hour_user, sum);
• Count the deviation of a particular user (topic_hour_user, ∆);
• Sum the deviations of a particular user (topic_hour_user, ∑∆);
• Divide the sum of the deviations by the number of topics (n) for a particular user.
After these steps, generate the resulting CSV file in the form of a table with user data and the standard deviation of each user.
One of the main features of Internet users' online activity that should be considered within the explored scope is the mutual influence of the contexts and focuses of communicating peers. This factor makes it possible to introduce a control loop: in addition to semantic analysis of web content, the platform starts managing the users' interest based on focus identification and context feedback.
This information is captured in social networks and has all the necessary details to obtain actual estimations. Still, in this case integration with social networks is required, and the data being processed contain many subjective assessments and perceptions. Online libraries and professional communities are more neutral. For example, Wikipedia engages various groups of authors to update the articles, targeting maximum objectivity. Analysis of this data can help to adequately identify significant trends in consumers' focus, which can be used in practice, e.g. for marketing and product placement.
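Returning to the two-stage deviation algorithm of Fig. 2, its steps can be sketched in a few lines of Python. The paper does not give exact formulas, so this is only one possible reading: the input format (one `(user, topic, hour)` tuple per post), the function names `deviation_metric` and `to_csv`, the treatment of the period as a plain scaling constant, and the use of the absolute difference from the mean as the deviation ∆ are all assumptions.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def deviation_metric(records, period=1.0):
    """records: iterable of (user, topic, hour) tuples, one per post."""
    records = list(records)
    users = sorted({u for u, _, _ in records})
    n_users = len(users)

    # Stage 1: (topic_hour, sum of 1/users) is the mean activity per user.
    mean = defaultdict(float)
    for _, topic, hour in records:
        mean[(topic, hour)] += 1.0 / n_users

    # Stage 2 keys: (topic_hour_user, sum) is the activity of each user.
    per_user = Counter((topic, hour, u) for u, topic, hour in records)

    # Root-mean-square deviation of user activity around the mean for each
    # pair, divided by the period (assumed to be a scaling constant).
    rms = {}
    for key, m in mean.items():
        sq = sum((per_user.get((*key, u), 0) - m) ** 2 for u in users)
        rms[key] = math.sqrt(sq / n_users) / period

    # Deviation of each user: sum of |activity - mean| over all pairs,
    # divided by the number of pairs n (assumed meaning of "number of topics").
    result = {}
    for u in users:
        total = sum(abs(per_user.get((*key, u), 0) - m) for key, m in mean.items())
        result[u] = total / len(mean)
    return rms, result


def to_csv(result):
    """Render the per-user deviations as a CSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user", "deviation"])
    for user, dev in sorted(result.items()):
        writer.writerow([user, f"{dev:.3f}"])
    return buf.getvalue()


# Two regular users post once; a bot-like account posts ten times in the same hour.
records = [("alice", "news", 10), ("bob", "news", 10)] + [("bot", "news", 10)] * 10
rms, result = deviation_metric(records)
table = to_csv(result)
```

In this toy data set the bot-like account receives the largest deviation, which is the kind of signal the second stage is meant to expose. On real data the same aggregation by key would more naturally be expressed as map and reduce steps in a Big Data framework.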
An activation method is used to simulate multi-agent activities in real time. A special dispatcher agent calls all of the agents using this activation method; after being activated, each agent generates a time series period according to some distribution rule. The agent generates a time series of calls to the navigation and post generation methods. At this stage we decide the proportion of navigation and generation, which means that we can simulate post writers or post readers and introduce specific models of online activity: for example, the agent can be more active at night, or we can set time frames of high or low activity.
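The activation scheme can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the authors' implementation: the names `Agent`, `dispatch` and `night_owl_rate`, the exponential gaps between calls, and the concrete rates are all hypothetical, chosen only to show a dispatcher that activates agents, a per-agent share of post generation versus navigation, and a night-time activity profile.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


def night_owl_rate(hour: float) -> float:
    """Hypothetical activity profile: calls per hour, higher at night."""
    h = hour % 24
    return 4.0 if h >= 22 or h < 6 else 1.0


@dataclass
class Agent:
    name: str
    write_ratio: float              # share of post generation vs. navigation
    rate: Callable[[float], float]  # activity profile, calls per hour

    def activate(self, start: float, end: float,
                 rng: random.Random) -> List[Tuple[float, str, str]]:
        """Generate a time series of (time, agent, action) calls in [start, end)."""
        events, t = [], start
        while True:
            # Assumed distribution rule: exponential gap at the current rate
            # (a crude approximation of a time-varying Poisson process).
            t += rng.expovariate(self.rate(t))
            if t >= end:
                return events
            action = "generate_post" if rng.random() < self.write_ratio else "navigate"
            events.append((t, self.name, action))


def dispatch(agents, start, end, seed=0):
    """Dispatcher: activate every agent and merge their calls by time."""
    rng = random.Random(seed)
    timeline = []
    for agent in agents:
        timeline.extend(agent.activate(start, end, rng))
    return sorted(timeline)


# A post writer and a post reader that share the same night-time profile.
agents = [Agent("writer", 0.8, night_owl_rate), Agent("reader", 0.1, night_owl_rate)]
timeline = dispatch(agents, start=0.0, end=48.0)
```

Changing `write_ratio` moves an agent between the post writer and post reader roles, and replacing `night_owl_rate` with another function of the hour introduces other time frames of high or low activity.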
Focus and context are updated as a result of real agent behavior under the influence of informational objects. We can generate focus and context according to our goals: if we want an agent to behave in a specific way, we can introduce this control directly and formulate the focus and context so that the agent does what we want. However, such direct control is not suitable for real systems, because no such direct influence exists in the real world; there is only informational influence. The proposed approach allows simulating this influence and introducing it into the multi-agent system. In this case we need to analyze the changes of the agent's focus and context over a period of time and, based on this analysis, introduce changes in focus by generating informational objects inside the network. In real systems this can be done using context-based advertisements. We can generate just the objects with a certain informational context, which can be described by tag clouds.
Fig. 2. Social media big data analysis

The introduced architecture can be used to simulate online users in social networks and model realistic Internet
In another area of simulation, a practical application is the generation of cognitive patterns of collective behavior based on self-organization. In this case the agent should be simple, and the logic of focus and context should be close to very simple but generic behavior. This logic may not correspond to known real users of social networks, but it can represent some

generalized behavior of the community of agents. Such behavior can be used to study and develop some visual cases. In another case it is implemented as a sort of frame, through which the algorithms of syntactical analysis or other large data analysis can look at the real world of social networks and filter the data for intelligent study.

To implement the proposed approach, a software solution for social media focus identification was developed, based on knowledge discovery and Big Data analysis.
The solution can integrate with various data sources, pick out concepts, generate tag clouds for contexts and focuses, and process their changes over time. The solution implementation architecture is presented in Fig. 3. The data imported from social networks is stored in a database and can be processed either in real time or in batch mode.

Fig. 3. Integration model

The crawler asynchronously addresses a web service with requests for data from social networks. After receiving a request, the web service starts processing it. Next, the web service accesses the integrator, which starts downloading the requested data in the form of RDF/XML files, storing the intermediate data received for a single crawler request so that the already downloaded data can be transferred as a single block.
Then, in the background (i.e. in a mode where there is no need to control the data unloading process), the integrator automatically continues the embedded process, uploads the data to the database, and uses Apache Jena to generate RDF/XML files that will be transferred to the address of the first crawler.
The described model, the software solution and its implementation were tested using a typical data set derived from a number of social networks. In addition to a real, regular result set of social media users' negotiation, a peak batch of posts generated by an online bot was introduced. Independently of the social media (with no a priori knowledge of the data structure), the Big Data analysis algorithms were able to identify the online bot influence.
The results are presented in Fig. 4. Gray lines represent the annual trends of users' activity. The peak identified on Aug 15 corresponds to the bot activity and can be easily identified by the agent by comparing the behavior with that of previous periods. The described research results show that the proposed model can be used for online behavior analysis and identification of negative informational influence.

Fig. 4. Bot activity identification

As shown above, the proposed model allows capturing the process of an Internet user's activity considering a combination of human and time factors.

[1] M. Wooldridge, "An introduction to multi-agent systems", John Wiley and Sons, Chichester, 2002, 340 p.
[2] P. Leitao, "Holonic rationale and self-organization on design of complex evolvable systems", HoloMAS 2009, LNAI 5696, Springer-Verlag Berlin Heidelberg, 2009, pp. 1-12.
[3] V. I. Gorodetskii, "Self-organization and multiagent systems: I. Models of multiagent self-organization", Journal of Computer and Systems Sciences International, vol. 51, issue 2, 2012, pp. 256-281.
[4] N. Bessis and C. Dobre, "Big Data and Internet of Things: A roadmap for smart environments", Studies in Computational Intelligence, Springer, 2014, 450 p.
[5] D. Mouromtsev, D. Pavlov, Y. Emelyanov, A. Morozov, D. Razdyakonov and M. Galkin, "The simple, web-based tool for visualization and sharing of semantic data and ontologies", CEUR Workshop Proceedings, 2015, pp. 77
[6] One Internet. Global Commission on Internet Governance, 2016. Report.
[7] H. Balakrishnan, N. Deo, "Discovering communities in complex networks", Proceedings of the 44th Annual Southeast Regional Conference, March 10-12, Melbourne, Florida, USA, 2006, pp. 280-
[8] W. Wei, K. Joseph, H. Liu, K. M. Carley, "Exploring characteristics of suspended users and network stability on Twitter", Social Network Analysis and Mining, 2016, pp. 6-51.
[9] C. Kadushin, "Understanding social networks: theories, concepts, and findings", Oxford University Press, 2012, 264 p.
[10] A. Ivaschenko, "Multi-agent solution for business processes management of 5PL transportation provider", Lecture Notes in Business Information Processing, vol. 170, 2014, pp. 110-120.
[11] A. Ivaschenko, A. Minaev, M. Spodobaev, "Self-mediator software for sensor networks", Proceedings of the 2015 International Siberian Conference on Control and Communications (SIBCON), 2015, 4 p.
[12] A. Ivaschenko, A. Lednev, A. Diyazitdinova, P. Sitnikov, "Agent-based outsourcing solution for agency service management", Lecture Notes in Networks and Systems, vol. 16, Springer, Cham, 2016, pp. 204-215.

