Querying Graphs with Data

Published: 20 March 2016


Graph databases have received much attention as of late due to numerous applications in which data is naturally viewed as a graph; these include social networks, RDF and the Semantic Web, biological databases, and many others. There are many proposals for query languages for graph databases that mainly fall into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. These approaches, however, lack the ability to combine data and topology, which would allow queries asking how data changes along paths and patterns enveloping it.
In this article, we present a comprehensive study of languages that enable such combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how labels of edges change along a path, but now we extend them with ways of specifying how both labels and data change. From the complexity point of view, the right type of formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions to specify paths with data and study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, we have a family of languages, and our study shows that it includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.
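To make the idea of combining data and topology concrete, here is a minimal Python sketch. It is not the authors' formalism or syntax. The toy graph, the node names, and the helpers `reachable_via` and `same_data_endpoints` are all hypothetical, introduced only for illustration. The query asks for pairs of nodes joined by a non-empty path of `a`-labeled edges whose endpoints carry the same data value, which is the kind of question a purely topological path query cannot express.

```python
from collections import deque

# Hypothetical toy data graph: every node carries one data value,
# every edge carries a label from a finite alphabet.
node_data = {"n1": 5, "n2": 7, "n3": 5, "n4": 7}
edges = [
    ("n1", "a", "n2"),
    ("n2", "a", "n3"),
    ("n3", "b", "n4"),
]


def reachable_via(label, source, edges):
    """Return the nodes reachable from `source` by a non-empty path of `label` edges."""
    adjacency = {}
    for src, lab, dst in edges:
        if lab == label:
            adjacency.setdefault(src, []).append(dst)
    seen = set()
    queue = deque(adjacency.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(adjacency.get(node, []))
    return seen


def same_data_endpoints(label, node_data, edges):
    """Pairs (u, v) joined by a non-empty `label` path with equal data values."""
    return sorted(
        (u, v)
        for u in node_data
        for v in reachable_via(label, u, edges)
        if node_data[u] == node_data[v]
    )


print(same_data_endpoints("a", node_data, edges))  # [('n1', 'n3')]
```

A topology-only version of this query would also return the pairs (n1, n2) and (n2, n3). The data comparison on the endpoints is what narrows the answer to (n1, n3).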

Supplementary Material

a14-libkin-apndx.pdf (libkin.zip)
Supplemental movie, appendix, image, and software files for Querying Graphs with Data


Jaroslav Pokorny

This paper gives a very comprehensive account of approaches to querying so-called graphs with data. In principle, this category of graphs corresponds to what the graph database community calls (labeled) property graphs. Two categories of queries over such databases are considered: one views graphs as sets of relational data, and the other concentrates on querying the topology of the graph. The authors focus on querying that combines both of these approaches.
Section 2, containing preliminaries, defines the important notion of a path and the various types of complexity of query evaluation. In the subsequent sections, the authors deal with two categories of languages: path languages and graph languages. Notions such as regular path queries (RPQs), conjunctive RPQs (CRPQs), and nested regular expressions (NREs) are also introduced.
Section 3 discusses existing languages for paths, particularly register automata and related classes of expressions focused on data path queries. First, regular data path queries (RDPQs) are studied, followed by regular query expressions with memory (RQM), which use regular expressions with memory (REM). Another class of regular expressions for data paths permits testing for equality or inequality of data; this leads to queries based on regular expressions with equality, which the authors call regular queries with data tests (RQDs).
The core section 4 shows how the XPath language can be adapted to work over graphs. XPath has many desirable properties, such as the ability to define patterns that cannot be captured by paths, close connections to first-order logic, and efficient evaluation algorithms. The authors show how to extend the language to operate over graph databases while still retaining these properties. The resulting language is called GXPath, and it is discussed with different data tests permitted in its formulas.
Based on two categories of path formulas, the regular graph XPath (GXPathreg) and core graph XPath (GXPathcore) languages are introduced. Some extensions of both languages with counters are also discussed. The next two subsections of section 4 focus on query evaluation and expressive power. The authors investigate the complexity of querying graph databases with GXPath and its many dialects, and compare the expressive power of these dialects to that of first-order logic. They provide a detailed analysis of expressiveness for the navigational features of these dialects and prove that GXPathcore is a proper subset of GXPathreg. Another result says that GXPathreg is equivalent to a three-variable fragment of transitive-closure first-order logic. The expressiveness of languages using data tests is also studied. The rest of section 4 compares GXPath to the path languages introduced in section 3, as well as to traditional navigational languages such as RPQs, CRPQs, and NREs. A hierarchy of the GXPath fragments, ordered by relative expressive power, illustrates their mutual relationships.
In section 5, the authors define conjunctive queries using languages from previous sections as their basic building blocks. Section 6 includes a summary of complexity results for the classes of queries studied in the paper. Finally, the paper offers a number of possible directions for future research. The consideration of incomplete information is very realistic, since it occurs very often in practice, particularly in querying graphs with data. The notions defined in the paper are specified in the usual denotational way, which offers the needed clarity and precision and increases the readability of the paper. Without a doubt, the paper offers interesting, valuable, and useful material for those interested in graph query languages and the associated theory.
Online Computing Reviews Service



Published In

Journal of the ACM  Volume 63, Issue 2
May 2016
249 pages
Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 March 2016
Accepted: 01 December 2015
Revised: 01 December 2015
Received: 01 September 2014


Author Tags

  Graph databases
  XPath
  data values
  navigational queries


