Keywords

1 Introduction

Data visualisation provides intuitive ways for information analysis, allowing users to infer correlations and causalities that are not always possible with traditional data mining techniques. The wide availability of vast amounts of graph-structured data, RDF in the case of the Data Web, demands for user-friendly methods and tools for data exploration and knowledge uptake. We consider some core challenges related on the management and visualization of very large RDF graphs; e.g., the Wikidata RDF graph has more than 300M nodes and edges.

First, their size exceeds the capabilities of memory-based layout techniques and libraries, enforcing disk-based implementations. Then, graph rendering is a time consuming process; even drawing a small part of the graph (containing a few hundreds of nodes) requires considerable time when we assume real-time systems. The same holds for graph interaction and navigation. Most operations, such as zoom in/out and move, are not easily implemented to large dense graphs, as their implementations require redrawing and re-layout large parts of them.

Related works in the field handle very large graphs through hierarchical visualization approaches. Although hierarchical approaches provide fancy visualizations with low memory requirements, their applicability is heavily based on the particular characteristics of the input dataset. In most cases, the hierarchy is constructed by exploiting clustering and partitioning methods [1, 4, 5, 14, 19]. In other works, the hierarchy is defined with hub-based [15] and destiny-based [22] techniques. [3] supports ad-hoc hierarchies which are manually defined by the users. Some of these systems offer a disk-based implementation [1, 14, 19] whereas others keep the whole graph in main memory [35, 15, 22].

In the context of the Web of Data [2, 79, 16, 17, 20], there is a large number of tools that visualize RDF graphs (adopting a node-link approach); the most notable ones are ZoomRDF [21], Fenfire [12], LODWheel [18], RelFinder [13] and LODeX [6]. All these tools require the whole graph to be loaded on the UI. Several tools that follow the same non-scalable approach have also been developed in the field of ontology visualization [10, 11].

In contrast to all existing works, we introduce a generic platform, called graphVizdb, for scalable graph visualization that do not necessarily depend on the characteristics of the dataset. The efficiency of the proposed platform is based on a novel technique for indexing and storing the graph. The core idea is that in a preprocessing phase, the graph is drawn, using any of the existing graph layout algorithms. After drawing the graph, the coordinates assigned to its nodes (with respect to a Euclidean plane) are indexed with a spatial data structure, i.e., an R-tree, and stored in a database. In runtime, while the user is navigating over the graph, based on the coordinates, specific parts of the graph are retrieved and send to the user.

Fig. 1.
figure 1

Preprocessing overview

2 Platform Overview

The graphVizdb platform is built on top of two main concepts: (1) it is based on a “spatial-oriented” approach for graph visualization, similar to approaches followed in browsing maps; and (2) it adopts a disk-based implementation for supporting interaction with the graph, i.e., a database backend is used to index and store graph and visual information.

Partition-Based Graph Layout. Here we outline the partition-based approach adopted by the graphVizdb in order to handle very large graph. Recall that, for graph layout, the graph is drawn once in a preprocessing phase, using any of the existing graph layout algorithms. However several graph layout algorithms require large amount of memory in order to draw very large graphs. In order to overcome this problem, our partition-based approach (outlined in Fig. 1) is described next.

(1) Initially, the graph (RDF) data is divided into a set of smaller sub-graphs (i.e., partitions) using a graph partitioning algorithm. At the same time, the graph partitioning algorithm tries to minimize the number of edges connecting nodes in different partitions. (2) Then, using a graph layout algorithm, each of the sub-graph resulted from the graph partitioning, is visualized into a Euclidean plane, excluding (i.e., not visualizing) the edges connecting nodes through different partitions (i.e., crossing edges). (3) The visualized partitions are organized and combined into a “global” plane using a greedy algorithm whose goal is twofold. First, it ensures that the distinct sub-graphs do not overlap on the plane, and at the same time it tries to minimize the total length of the crossing edges. (4) Based on the “global” plane, the coordinates for each node and edge are indexed and stored in the database.

Spatial Operations for Graph Exploration. In graphVizdb, most of the user’s requests are translated into simple spatial operations evaluated over the database. In this context, window queries (i.e., spatial range queries that retrieve the information contained with in a specific spatial region) are the core operation for most user’s requests. The user navigates on the graph by moving the viewing window. When the window is moved, its new coordinates with respect to the whole canvas are tracked on the client side, and a window query is sent to the server. The query is evaluated on the server using the R-tree indexes. This way, for each user request, graphVizdb efficiently renders only visible parts of the graph, minimizing in this way both backend-frontend communication cost as well as rendering and layout time. Additionally, more sophisticated operations, e.g., abstraction/enrichment zoom operations are also implemented using spatial operations.

Implementation. We have implemented a graphVizdb prototypeFootnote 1 which provides interactive visualization over large graphs. The prototype offers three main operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword search. We use MySQL for data storing and indexing, the Jena framework for RDF data handling, MetisFootnote 2 for graph partitioning, and GraphvizFootnote 3 for drawing the graph partitions. In the front-end, we use mxgraphFootnote 4, a client-side JavaScript visualization library. A video presenting the basic functionality of our prototype is available at: vimeo.com/117547871.