A Toolkit to Support Dynamic Social Network Visualization

Cao, Yiwei; Klamma, Ralf; Spaniol, Marc; Leng, Yan

Yiwei Cao, Ralf Klamma, Marc Spaniol and Yan Leng
Informatik 5 (Information Systems), RWTH Aachen University, Ahornstr. 55, 52056 Aachen, Germany
{cao, klamma, spaniol, leng}@i5.informatik.rwth-aachen.de

Abstract. In this paper we introduce the design, implementation and evaluation of the Dynamic Visualization Toolkit (DyVT), which supports complex dynamic social network visualization. Dynamic aspects of social networks, such as spatiotemporal and personalized information, can be visualized in a common toolkit. To that end, the XML-based target language DyVTML extends existing schemata to enable the expression, storage and interchange of rich animated social network data. With the language and the available tool support, even less experienced users can visualize temporal data in animations and spatial data in maps, and personalize the result with icons and colors. The prototype is evaluated by visualizing large mailing list data sets.

Keywords: Information systems, Information visualization, Social network analysis, XML.

1 Introduction

Social networks present a determinable structure that shows how people know each other, either directly or indirectly. They are a special kind of network in which the nodes are entities that have values in a social context and the links are the social relationships among them. Although change is one of the key characteristics of social relationships, most social network visualization is static because of the high complexity of social network data. For instance, the temporal attributes that indicate when social relations are established or discarded are often neglected. However, understanding networks from a dynamic perspective is essential, because it facilitates reasoning about real objects such as complex dynamic systems that evolve over time in the real world. Beyond creating attractive graphics, social network visualization can generate learning situations [14].
It also provides investigators with new insights into network structures and helps them communicate with others [5].

So far, the research community has developed a number of tools for building, analyzing and visualizing social networks. However, these existing tools have several problems. First, they have a “structural bias” which implicitly denies much of the dynamic nature of social relations [9]. The static data lacks other network data attributes such as spatial data [10], although researchers are increasingly interested in how networks develop and change over time and space. Second, the visualization results are not intuitive enough, and users are not allowed to choose the appearance of the social network visualization. Finally, there is as yet no specified language that covers a wide range of dynamic aspects of social network visualization with interoperability in mind.

Our research intends to overcome these limitations. An XML-based target language is specified for use in a metadata repository, where temporal and spatial metadata are integrated into a uniform XML file. This target language also supports visualization. Based on this XML language, we have developed the Dynamic Visualization Toolkit (DyVT) to give end users better insight into their social networks. It uses animations to visualize temporal data, showing how relationships emerge over time. Graphs with map backgrounds are used to represent spatial data. Users' personalized information is represented dynamically as well. In sum, DyVT aims to visualize temporal, geospatial and personalized information by extracting, integrating, and processing data from diverse data sources. The results of the system can also be transformed into various multimedia formats. In the prototype of DyVT, we use data from the mailing lists of the EU Network of Excellence project PROLEARN (www.prolearn-academy.org), which focuses on technology enhanced learning and professional training within Europe.

The rest of this paper is organized as follows. Section 2 gives an overview of the state of the art in dynamic social network visualization. Section 3 is devoted to the architectural design of DyVT. Section 4 describes the corresponding implementation and presents the prototype. Finally, Section 5 summarizes the research work and gives an outlook on future work.

2 Related Work

Social network concepts are the background knowledge for social network visualization. We also consider the diverse media on the Internet which build up a digital social environment [12]. Visualization methods are discussed with regard to temporal, geospatial and personalized data. Furthermore, several XML-based interchange formats are introduced, owing to the requirement of defining a target language to express rich social network data.

2.1 Basic Social Network Concepts

Social networks are based on an assumption of the importance of relationships among interacting units. The social network perspective encompasses theories, models, and applications that are expressed in terms of relational concepts or processes. The important concepts of social networks are as follows [4]. Actors and their actions are viewed as interdependent rather than independent, autonomous units. Relational ties or linkages between actors are channels for the transfer or flow of either material or nonmaterial resources. Network models that focus on individuals view the structural environment of the network as providing opportunities for, or constraints on, individual action. Network models conceptualize social, economic, and political structures as lasting patterns of relations among actors.

In practice, the Internet increasingly serves as a mediator of social activities. It facilitates the social processes of communities across many categories of environments and mechanisms. Digital social environments can be defined as social environments that provide diverse forms of support to social processes.
They include a variety of systems, ranging from very explicit and centralized community systems that directly support people's interactions to decentralized community systems that support a peer-to-peer mode of interaction and are directly controlled by their users [12]. Communication links are the key objects to be visualized in social network visualization.

Centralized media support a one-to-many mode and include forums, virtual community systems, newsletters, centralized mailing lists, and Wikis. Forums are public online spaces for discussion, in which a communication link exists between every two members participating in the same online forum. Virtual community systems serve a community of people sharing common interests, ideas, and feelings over the Internet or other collaborative networks; a communication link exists between two members if they share resources. In newsletters, a communication link exists between the administrator and every member who is subscribed to the newsletter. A centralized mailing list is a list of people who subscribe to a periodic mailing distribution on a particular topic. When an email message is sent to the mailing list, it is automatically forwarded to all addresses in the list, so senders have communication links to every member of the list. Wikis allow users to freely create and edit Web page content in Web browsers. A communication link can be defined between two members who edit the same Web page.

Decentralized media support a peer-to-peer mode and include Weblogs, emails, and decentralized mailing lists. For emails, a communication link is defined between two persons if they send emails to each other. In contrast to centralized mailing lists, a communication link in a decentralized mailing list is defined in the same way as for email. Weblogs are basically journals published on the Web. They have two kinds of communication links. One exists when a member comments on the Weblog entry of another member. The other exists when two members comment on the same Weblog entry.

In conclusion, all these communication links on the Internet are valuable social network data for research on the formation and development of communities and on social processes within as well as across communities.

2.2 Visualization of Social Networks

Social network analysts use two kinds of tools from mathematics to represent information about patterns of ties among social actors: node-link graphs and matrices. The more common visualization is the node-link graph, which consists of nodes (actors) connected by edges (ties). The main goal is to optimize the graph layout for comprehensibility and aesthetics. For example, certain mechanisms are used to place nodes in adjacent positions according to topological and structural criteria [1]. Considering the various types of social network data, temporal, geospatial, and personalized visualization are discussed in turn below.

Temporal visualization. Much recent research shows that most static network images do a poor job of conveying how networks develop and change. At least two dimensions are needed to represent the structure of a social network efficiently, which leaves no dimension of a flat image for time. Based on the literature survey, temporal visualization approaches based on node-link graphs are realized by animated layouts. Animation is considered the most suitable way to represent changes of the underlying network structure. The effectiveness of an animation is complicated to determine, because it is influenced by many factors. Two important factors are stated in [3]. Readability depends on aesthetic criteria that make the visualization comprehensible. Mental map preservation means that nodes which exist throughout the series of networks remain in the same positions. An intuitive dynamic visualization should balance both criteria.
Compared with a static layout, an animated layout pays much more attention to dynamic social network relations, which usually have two dimensions [10]. One is the relational pace, which concerns the rate of change in relations. Pace information can be described in terms of levels, changes or stabilities. The other is sequence, which focuses on the order of relations. A better understanding of network changes depends on identifying this order. With both dimensions in mind, multiple images of the network are produced according to the relational pace, and this series of images is then placed in sequence. In addition, sliders or other controls are often used to navigate the animation loop directly.

Geospatial visualization. With the development of network and information technology, the collection, sharing and analysis of spatial data is becoming more and more important. Spatial objects have spatial relationships such as overlap and containment [11], which adds complexity to social network data. Background maps are a common way to visualize and understand the geospatial data of social networks. Moreover, Web map services are an effective approach to geospatial data visualization.

Personalized visualization. Another interesting category of visualization represents users' preferences for data visualization. We call it personalized visualization here. It can cover the icons, sizes, and colors of nodes, the colors, weights, and line types of links, and even the layout of the social network according to various layout algorithms.

2.3 Languages for Dynamic Social Network Data

Languages for social network data are used to describe the structures and contents of sets of observations [6]. One of the most general characteristics of social network data is that they have values in a social context. For languages that describe dynamic social network data, the focus is on how to represent attributes that change over time.
The principal types of dynamic social network data are relational data and attribute data. Relational data are connections defined by the different rules of digital social environments, and they serve the investigation of the social network structure. Attribute data describe the attributes, opinions and behaviors of nodes. Both relational and attribute data can be gathered from various data sources, such as questionnaires, direct observation, written records, and experiments.

In order to represent relational data and attribute data, three important XML-based markup languages have been surveyed.

GraphML stands for Graph Markup Language and is a graph exchange format that aims to represent both relational data and attribute data. GraphML is also the only published format that supports manipulation of dynamic graph data. In addition, GraphML is supported by a large number of graph analysis and visualization software packages, such as JUNG and yFiles. Its major features are listed below [2]:
• Simplicity. It is easy to parse and interpret for both humans and machines.
• Generality. There is no limitation with respect to graph models; hypergraphs and hierarchical graphs, for example, are supported.
• Extensibility. It is possible to extend the format in a well-defined way to represent additional data required by arbitrary applications or more sophisticated use, e.g. sending a layout algorithm together with the graph.
• Robustness. Systems that are not capable of handling the full range of graph models or added information can easily recognize this, and can extract the subset of the model that they are able to handle.

The graph model used in GraphML is G = (V, E, D), where V is the set of nodes, E is the set of edges and D represents data labels. A valid GraphML data file has two parts. The header defines basic features of the GraphML file, such as the XML standard, the XML Schema and the root element.
The graph topology is the central part of a GraphML file and includes the definitions of both nodes and edges. The GraphML root element can contain any number of graphs. Edges and nodes may be ordered arbitrarily; for instance, it is unnecessary to list all nodes before all edges. The complexity of GraphML is low, since the space required to save a graph with n nodes and m edges in GraphML is only O(n + m).

KML is a file format used to display geographic data in map browsers such as Google Earth, Google Maps, and Google Maps for mobile. A KML file is processed in the same way as HTML (and XML) files are processed by Web browsers. Like HTML, KML has a tag-based structure with names and attributes used for specific display purposes. Thus, Google Earth and Google Maps are the common browsers for KML files [8].

DyNetML is an XML-based social network language developed at Carnegie Mellon University to address the needs of data interchange [13]. DyNetML represents dynamic network data as sets of time slices. Each time slice is a descriptive snapshot of the organization at a given time. A dynamic network element is defined as a sequence of MetaMatrix elements, each representing a snapshot of the organization for one time period. Each MetaMatrix element consists of the following objects. TimePeriod allows clear identification of each time slice. Properties and measures represent data about all of the time slices. Node contains one or more node sets. Networks contains all networks in one time slice. Anthrop facilitates linking network data to anthropological data.

In summary, DyNetML can be used to describe temporal data, while the KML format can be used to represent geospatial metadata. Since GraphML is designed to be extended easily, these three concepts can be integrated logically by extending GraphML.
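
The extension route described above can be pictured with a minimal sketch. It is not the DyVTML schema itself: the key names `lat`, `lon`, `start` and `end` and the functions `build_graphml` and `read_graphml` are assumptions chosen for illustration. The sketch relies only on GraphML's standard `key`/`data` extension mechanism and on Python's standard library, and attaches hypothetical spatial attributes to nodes and temporal attributes to edges.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def build_graphml(nodes, edges):
    """Serialize a graph with extra node and edge data to a GraphML string.

    nodes: dict mapping node id -> (lat, lon)          (hypothetical spatial data)
    edges: list of (source, target, start, end) tuples (hypothetical temporal data)
    """
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(_q("graphml"))
    # Declare the additional attributes through GraphML <key> elements.
    for name, domain, typ in [("lat", "node", "double"), ("lon", "node", "double"),
                              ("start", "edge", "string"), ("end", "edge", "string")]:
        ET.SubElement(root, _q("key"), {"id": name, "for": domain,
                                        "attr.name": name, "attr.type": typ})
    graph = ET.SubElement(root, _q("graph"), {"id": "G", "edgedefault": "directed"})
    for node_id, (lat, lon) in nodes.items():
        node = ET.SubElement(graph, _q("node"), {"id": node_id})
        ET.SubElement(node, _q("data"), {"key": "lat"}).text = str(lat)
        ET.SubElement(node, _q("data"), {"key": "lon"}).text = str(lon)
    for source, target, start, end in edges:
        edge = ET.SubElement(graph, _q("edge"), {"source": source, "target": target})
        ET.SubElement(edge, _q("data"), {"key": "start"}).text = start
        ET.SubElement(edge, _q("data"), {"key": "end"}).text = end
    return ET.tostring(root, encoding="unicode")


def read_graphml(xml_text):
    """Parse the string produced by build_graphml back into (nodes, edges)."""
    ns = {"g": GRAPHML_NS}
    root = ET.fromstring(xml_text)
    nodes = {}
    for node in root.findall("g:graph/g:node", ns):
        data = {d.get("key"): d.text for d in node.findall("g:data", ns)}
        nodes[node.get("id")] = (float(data["lat"]), float(data["lon"]))
    edges = []
    for edge in root.findall("g:graph/g:edge", ns):
        data = {d.get("key"): d.text for d in edge.findall("g:data", ns)}
        edges.append((edge.get("source"), edge.get("target"),
                      data["start"], data["end"]))
    return nodes, edges
```

Because the extra information travels in `data` elements keyed by declared `key` elements, a GraphML reader that does not know these keys can still recover the plain node and edge structure, which matches the robustness property listed above.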

3 System Design of DyVT

With reference to the state-of-the-art technologies, suitable methods are chosen to develop DyVT. In this section the design issues of DyVT are discussed. The section begins with the requirements analysis, followed by the system concepts and the data model.

3.1 Requirements Analysis

Both functional and non-functional requirements are analyzed. The main motivations are the limitations of existing social network visualization systems and user-specified requirements.

The limitations of the existing social network tools are critical. First, these tools lack rich social network data representation. The interoperability-oriented target languages are rarely expressive enough to fully represent rich social network data. Second, current forms of dynamic social network visualization are quite restricted. Line graphs render social network changes as lines in a chart. However, they cannot represent the global change of a network over time, because such summary statistics provide information on only a single dimension of the network structure. The other approach is to examine separate images over time, but this approach lacks readability, because it is impossible to link the position of a node in one frame to its position in the next. Third, user customization is often missing. Customization refers to user-preferred data about the social network visualization. Fourth, the dynamic nature of social relations is ignored. The dynamic nature refers to contextualized information such as changing spatial, temporal and personalized information from users. A survey of the existing dynamic social network visualization tools shows that normally only one of these dynamic aspects is visualized.

Based on the state of the art and the potential users, the functional requirements can be divided into four categories.
1. Integration and interoperability of metadata. A target language records all kinds of data, including raw data, temporal data and spatial data. Thus, it allows rich social network data to be expressed and exchanged.
2. Visualization of temporal data. The system should provide an appropriate animated visualization so that end users can see how data changes over time.
3. Visualization of spatial data. If the social network data is location relevant, the user should be able to obtain the graph on a map where nodes and edges are placed according to their real geographical locations.
4. Visualization of personalized data. The system should give users flexible choices to define the graphs in their own ways by selecting colors, sizes, icons, etc.

3.2 System Concepts and Data Modeling

The main concept of DyVT is depicted in Figure 1. Social network data together with temporal, spatial, and appearance (personalized) data are the input to DyVT. In DyVT, temporal, spatial and personalized visualization are supported via an XML-based target language called DyVTML, which is defined to integrate metadata from different sources. It enables the system to visualize temporal, spatial and personalized data in a unified interface. XML is chosen because of its simplicity, extensibility, interoperability, and openness.

Fig. 1. System concepts of a Dynamic Visualization Tool (DyVT)

Users can make good use of temporal data through an animation that shows how relationships emerge over time. At the same time, users gain spatial insight into their data, because the nodes are laid out on a map according to their geographic information. Users also have flexible options to specify the graph styles; they can choose their favorite colors and icons to represent the graph. In addition, the visualization results change dynamically with the users' choices.
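
The animation concept described above can be pictured as slicing timestamped communication events into consecutive time windows, one edge set per animation frame. The following is a minimal sketch under stated assumptions, not DyVT's actual procedure: the function name `slice_into_windows`, the `(sender, receiver, timestamp)` event tuples and the fixed window length are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slice_into_windows(events, window):
    """Group (sender, receiver, timestamp) events into fixed-length windows.

    Returns a list of (window_start, edge_set) pairs in chronological order.
    Windows without any event are kept with an empty edge set so that the
    frames stay evenly spaced in time.
    """
    events = sorted(events, key=lambda e: e[2])
    if not events:
        return []
    origin = events[0][2]  # the first window starts at the earliest event
    frames = defaultdict(set)
    for sender, receiver, timestamp in events:
        index = (timestamp - origin) // window  # integer window index
        frames[index].add((sender, receiver))
    last = max(frames)
    return [(origin + i * window, frames.get(i, set())) for i in range(last + 1)]


if __name__ == "__main__":
    mails = [
        ("a", "b", datetime(2007, 1, 1)),
        ("b", "c", datetime(2007, 1, 3)),
        ("a", "c", datetime(2007, 1, 20)),
    ]
    for start, edge_set in slice_into_windows(mails, timedelta(days=7)):
        print(start.date(), sorted(edge_set))
```

Each returned edge set would then be drawn as one frame. Keeping the position of a node fixed across frames, as the mental map criterion in Section 2.2 suggests, is a separate layout concern that this sketch does not address.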
Furthermore, with the help of DyVTML, the visualization results are compatible with various multimedia formats such as SVG, GIF and JPEG, and can therefore be exported into these formats.

After presenting the system concepts, we introduce DyVT at the data level. The data used in DyVT mainly comes from three sources: mailing list databases, spatial data, and personalized data, which refers to user-chosen appearance data. The mailing list data is stored in a relational database, the spatial data resides in spatial databases, and the user-chosen appearance data is defined by users through a graphical user interface. DyVTML is used to make the data extracted from the different sources interoperable and to transform it into data elements with the corresponding predefined tags. However, the major drawback of an XML-based format stems from the size and complexity of XML files. The growth in size is dictated by the need for rigorous markup, as every data element requires a number of delimiter tags to describe its function to the parser. To mitigate this problem, we extract the redundant appearance-related data elements from DyVTML and define a separate data format called Appearance Data Markup Language (ADML).

DyVTML is also used to store the social network data for the visualization component. The major characteristic of this data, which includes relational social network data, temporal data and spatial data, is its independence from the visualization tier. Relational social network data is extracted from the mailing list data and is used to build the structure of the social network. Temporal data also comes from the mailing lists and provides temporal attributes for both nodes and edges. Spatial data is extracted from existing geospatial databases and is used to determine the location of each node. In summary, in the context of mailing lists, the data stored in DyVTML describes email communication events with attributes about when and where they occur.

Fig. 2. Data modeling of DyVT in the entity relationship diagram

ADML is used to store user-chosen appearance data. It comprises node appearance data such as colors, sizes, shapes, and label colors, as well as edge appearance data such as colors, widths, weights, and label colors. Using the two data formats separately has two advantages. On the one hand, redundancy is reduced. As the number of nodes increases, defining the appearance of each node and edge individually becomes very complex and would lead to redundant descriptions of nodes and edges in DyVTML. On the other hand, processing time is saved. To obtain a better visualization result, DyVT enables users to change the graphical appearance settings of both nodes and edges. When the user changes the settings, an update process is called several times on the same social network data. Usually, the appearance data is much smaller than the underlying social network data. With this separation, processing time is reduced, because only the updated appearance data is parsed rather than the whole social network data. The relationship between ADML and DyVTML as well as the data entities are illustrated in Figure 2. This data model is the guideline for the system implementation.

4 Implementation of DyVT

The implementation of DyVT is based on three tiers: the database tier, the metadata enrichment tier and the visualization tier.

The database tier contains the raw data sources. In DyVT, we have three data sources: the mailing list database of the EU project PROLEARN, the GeoLiteCity database, which provides the mapping from IP addresses to cities, and user-chosen network appearance data.

Since DyVT reads data from different data sources, the raw data needs to be refined in the metadata enrichment tier. Raw data consists of two types: relational data and attribute data. The former can be extracted from the database by constraints such as content or time.
The latter can be obtained from the graphical user interfaces. To integrate these two types of data efficiently, the metadata refining module defines an XML-based data format for metadata storage. In the metadata modeling module, a built-in parser then extracts the key fields from the XML-based DyVTML data file created in the previous module for further use.

In the visualization tier, the resulting metadata model contains all the information needed to draw a visual representation of the data. Two handlers are used, the temporal view handler and the map view handler, which create visual features according to their algorithms. The contents of the visual abstraction are put into three types of views, animation view, static view and map view, according to the users' choices. The visualization results can be exported into several formats such as SVG, GIF, and JPEG. Moreover, as in the basic visualization model, we provide user interaction in this tier as well. DyVT provides two levels of user interaction. One concerns the details of each time window in animation views: users can extract the static graph generated in any time window. The other mainly serves the needs of experts who study the details of layout algorithms: in the user interfaces, users can see and edit the configuration parameters of each algorithm. Two layout algorithms are currently implemented. The cycle layout arranges nodes within a predefined radius. The Kamada-Kawai layout (KK layout) is based on the algorithm by Kamada and Kawai [7], which optimizes a graph layout by minimizing an energy sum. The energy is defined in terms of the topological (graph-theoretic) distances between each pair of nodes: the layout seeks node positions whose geometric distances match these distances as closely as possible.

Fig. 3. An instance of DyVTML

In short, after all the needed kinds of metadata have been captured in the database tier and refined and remodeled in the metadata enrichment tier, the visualization component is executed in the visualization tier, where the actual rendering of the metadata onto views is done as well. These views provide varying perspectives on the data. For example, maps show the visualization with geographical attachment, and animations show the dynamic temporal data. DyVTML (cf. Figure 3) plays an important role in interchanging the data among the three tiers to visualize the dynamic social network.

The DyVT screenshot in Figure 4 illustrates the relationships among senders and receivers of the PROLEARN mailing list within a certain time period. The location of each sender or receiver is also visualized via the Google Maps API. A user-friendly interface is available by means of a set of wizards.

The evaluation is performed by task-oriented usability testing within the PROLEARN user communities. We defined tasks and asked users to complete them and to record the time. The tasks were selected to obtain feedback, from the users' point of view, on the functional and non-functional requirements of the system. The evaluation results show that DyVT helps users build and specify social network data visualizations effectively via the dynamic views.

5 Conclusions and Future Work

As a prototype for dynamic social network visualization, DyVT makes it possible to realize temporal visualization with animation in multiple views, spatial visualization, and personalized visualization on one uniform platform. DyVT is thus innovative in unifying the three types of visualization in a single toolkit. The XML-based language DyVTML enhances data interoperability among various multimedia data formats. Nevertheless, according to users' feedback, some improvements can still be made in the future.
Currently, only email exchange between two persons is implemented, which is a single relationship among actors. However, social relations among actors are usually more complicated, and actors are connected in multiple ways simultaneously. We call this kind of network a multi-relational network; it represents a heterogeneous set of nodes and edges. In such a network there exist not only multiple entities of varying types, but also different ways or semantics by which these entities are connected.

Fig. 4. The screenshot of DyVT for geospatial visualization in continuous time slices and with user-chosen icons

In addition, only node-link graphs are used to visualize both temporal and geospatial network data. The usability testing results show that the readability of a node-link graph deteriorates significantly as the size of the graph increases. Hence, a node-link diagram is only suitable for small graphs. To visualize large graphs, other representations should be employed, such as a matrix-based representation. Compared with node-link graphs, a matrix has no overlapping links and is less affected by an increase in the number of nodes.

With regard to layout algorithms, there exist several well-developed algorithms for traditional static layout. This offers many possibilities either to adapt these algorithms to dynamic visualization or to develop new layout algorithms for dynamic social networks.

Moreover, the evaluation of the results in DyVT depends on the users themselves and thus lacks an objective view of the results. Statistical analysis tools should be integrated to give objective evaluation results.

At present DyVT is used to visualize mailing list data. However, mailing lists are just one pattern in digital social environments. Components for choosing and executing a series of other patterns should be added to the database tier.

Beyond social networks, visualization is an intuitive way to gain better insight into the underlying data.
With the aforementioned improvement possibilities, DyVT is a useful visualization toolkit that can be integrated into other applications easily and quickly in the future.

References

[1] Bertini, E.: Social network visualization: A brief survey. The Blog of Enrico Bertini, October (2005) http://www.dis.uniroma1.it/~bertini/blog/bertini-socialnetvis-2.pdf
[2] Brandes, U., Eiglsperger, M., Herman, I., Himsolt, M. and Marshall, M.S.: GraphML progress report: Structural layer proposal. In: Mutzel, P., Junger, M. and Leipert, S. (eds.): Proceedings 9th International Symposium on Graph Drawing (GD '01), Springer Lecture Notes in Computer Science 2265 (2002) 501-512
[3] Erten, C., Kobourov, S.G., Le, V. and Navabi, A.: Simultaneous graph drawing: layout algorithms and visualization schemes. Journal of Graph Algorithms and Applications vol. 9 no. 1 (2005) 165-182
[4] Wasserman, S. and Faust, K.: Social network analysis: methods and applications. Cambridge, ENG and New York: Cambridge University Press (1994)
[5] Freeman, L.C.: Visualizing social network. Journal of Social Structure, vol. 1 no. 1 (2000) http://www.cmu.edu/joss/content/articles/volume1/Freeman.html
[6] Hanneman, R.A. and Riddle, M.: Introduction to social network methods. Riverside, CA: University of California, Riverside (2005) http://faculty.ucr.edu/~hanneman/
[7] Kamada, T. and Kawai, S.: An algorithm for drawing general undirected graphs. Information Processing Letters 31 (1989) 7-15
[8] KML home page. http://earth.google.com/kml/index.html [14 March 2007]
[9] Milgram, S.: The small world problem. Psychology Today 1, May (1967) 61-67
[10] Moody, J., McFarland, D. and Bender-deMoll, S.: Dynamic network visualization. American Journal of Sociology vol. 110, no. 1, January (2005) 1206-1241
[11] Morris, A.J., Abdelmoty, A.I., El-Geresy, B.A. and Jones, C.B.: A Data-Flow Approach to Visual Querying in Large Spatial Databases. In: Chang, S.K., Chen, Z. and Lee, S.Y. (eds.): Proceedings of the 5th International Conference on Recent Advances in Visual Information Systems (VISUAL 2002), Springer Lecture Notes in Computer Science 2314 (2002) 175-186
[12] Nabeth, T.: Understanding the identity concept in the context of digital social environments. Project report of INSEAD CALT - FIDIS, January (2005) http://www.calt.insead.edu/Project/Fidis/documents/2005-fidis-Understanding_the_Identity_Concept_in_the_Context_of_Digital_Social_Environments.pdf
[13] Tsvetovat, M., Reminga, J. and Carley, K.M.: DyNetML: Interchange Format for Rich Social Network Data. Technical report CMU-ISRI-04-105, Institute for Software Research International, School of Computer Science, Carnegie Mellon University, January (2004)
[14] Viegas, F.B. and Donath, J.: Social network visualization: can we go beyond the graph. Workshop on Social Networks for Design and Analysis: Using Network Information in CSCW (2004)
