Compact and efficient representation of general graph databases

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In this paper, we propose a compact data structure to store labeled attributed graphs based on the \(k^2\)-tree, a very compact data structure designed to represent a simple directed graph. Our proposal can be seen as an extension of the \(k^2\)-tree to support property graphs. In addition to the static approach, we also propose a dynamic version of the storage representation, which allows flexible schemas and the insertion or deletion of data. We provide an implementation of a basic set of operations, which can be combined to form complex queries over these attributed graphs. We compare the performance of our proposal against existing graph database systems and show that our compact attributed graph representation also obtains competitive time results.
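
To make the underlying structure concrete, the following is a minimal sketch of a plain static \(k^2\)-tree for an unlabeled directed graph, not the attributed extension proposed in the paper. The adjacency matrix is split recursively into \(k^2\) submatrices, and each submatrix contributes one bit: 1 if it contains at least one edge, 0 otherwise. The function names `build_k2tree` and `has_edge` are hypothetical, the matrix side `n` is assumed to be a power of `k`, and rank is computed naively with `sum` rather than with a succinct rank structure.

```python
from collections import deque

def build_k2tree(edges, n, k=2):
    """Return the level-order bitmaps (T, L) of an n x n adjacency matrix.

    Assumes n is a power of k and n >= k. T holds the internal levels,
    L holds the last level (individual matrix cells).
    """
    T, L = [], []
    # Each queue entry: top-left corner, side length, edges inside that submatrix.
    queue = deque([(0, 0, n, list(edges))])
    while queue:
        r0, c0, size, cells = queue.popleft()
        sub = size // k
        buckets = [[] for _ in range(k * k)]
        for r, c in cells:
            buckets[((r - r0) // sub) * k + (c - c0) // sub].append((r, c))
        for idx, bucket in enumerate(buckets):
            bit = 1 if bucket else 0
            if sub == 1:
                L.append(bit)            # leaf level: one bit per cell
            else:
                T.append(bit)            # internal level: one bit per submatrix
                if bit:                  # only non-empty submatrices are expanded
                    queue.append((r0 + (idx // k) * sub,
                                  c0 + (idx % k) * sub, sub, bucket))
    return T, L

def has_edge(T, L, n, r, c, k=2):
    """Check whether cell (r, c) is set by descending from the root."""
    bits = T + L
    size, base = n, 0
    while True:
        sub = size // k
        pos = base + (r // sub) * k + (c // sub)
        if bits[pos] == 0:
            return False                 # empty submatrix: no edge below
        if sub == 1:
            return True                  # reached a set cell in L
        r, c, size = r % sub, c % sub, sub
        # Children of the node at pos start at rank1(T, pos) * k^2.
        base = sum(T[: pos + 1]) * k * k

T, L = build_k2tree([(0, 1), (2, 3), (3, 0)], 4)
print(T)                                 # [1, 0, 1, 1]
print(has_edge(T, L, 4, 3, 0))           # True
print(has_edge(T, L, 4, 1, 1))           # False
```

Empty regions of the matrix cost a single 0 bit at the level where they are detected, which is why the structure is compact for sparse graphs.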

Notes

  1. http://sparsity-technologies.com/.

  2. http://neo4j.org/.

  3. http://hypergraphdb.org/.

  4. https://www.sap.com/products/hana.html.

  5. http://www.orientechnologies.com/orientdb/.

  6. http://thinkaurelius.github.io/titan/.

  7. http://giraph.apache.org/.

  8. \(rank(B, i)\) computes the number of ones in bitmap B up to position i.

  9. \(select(B, i)\) returns the position in B of the i-th 1.

  10. A wavelet tree is a data structure that maintains a sequence S of n symbols and supports the following operations: \(access(S, i)\), which returns the symbol at position i in S; \(rank_c(S, i)\), which counts the occurrences of symbol c up to position i in S; and \(select_c(S, j)\), which returns the position in S of the j-th occurrence of symbol c. Wavelet trees can be implemented efficiently in compressed space and perform well in practice. Wavelet trees and their applications are described extensively in [31].

  11. https://github.com/xxsds/DYNAMIC.

  12. http://movielens.umn.edu/.
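
The operations defined in notes 8 to 10 can be illustrated with a small, uncompressed sketch. It assumes 0-based positions, that "up to position i" includes position i, and that occurrences are counted from 1. The names `bitmap_rank`, `bitmap_select` and `WaveletTree` are hypothetical, the bitmap operations run in linear time, and the tree is pointer-based, so this shows only the semantics and not the compressed implementations surveyed in [31].

```python
def bitmap_rank(B, i):
    """Number of 1s in B[0..i] (inclusive)."""
    return sum(B[: i + 1])

def bitmap_select(B, j, bit=1):
    """Position of the j-th occurrence (j >= 1) of `bit` in B, or -1."""
    seen = 0
    for pos, b in enumerate(B):
        if b == bit:
            seen += 1
            if seen == j:
                return pos
    return -1

class WaveletTree:
    """Pointer-based wavelet tree over a non-empty sequence of sortable symbols."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.n = len(seq)
        self.bits, self.left, self.right = [], None, None
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            lo, hi = self.alphabet[:mid], self.alphabet[mid:]
            hi_set = set(hi)
            # Bit 1 routes a symbol to the right child, 0 to the left child.
            self.bits = [1 if s in hi_set else 0 for s in seq]
            self.left = WaveletTree([s for s in seq if s not in hi_set], lo)
            self.right = WaveletTree([s for s in seq if s in hi_set], hi)

    def access(self, i):
        if self.left is None:
            return self.alphabet[0]
        ones = bitmap_rank(self.bits, i)
        if self.bits[i]:
            return self.right.access(ones - 1)
        return self.left.access(i - ones)

    def rank(self, c, i):
        """Occurrences of c in the sequence up to position i (inclusive)."""
        if i < 0:
            return 0
        if self.left is None:
            return i + 1 if self.alphabet and c == self.alphabet[0] else 0
        ones = bitmap_rank(self.bits, i)
        if c in self.right.alphabet:
            return self.right.rank(c, ones - 1)
        return self.left.rank(c, i - ones)

    def select(self, c, j):
        """Position of the j-th occurrence of c, or -1 if there is none."""
        if self.left is None:
            ok = self.alphabet and c == self.alphabet[0] and 1 <= j <= self.n
            return j - 1 if ok else -1
        go_right = c in self.right.alphabet
        p = (self.right if go_right else self.left).select(c, j)
        if p < 0:
            return -1
        # Map the child position back to this level's bitmap.
        return bitmap_select(self.bits, p + 1, 1 if go_right else 0)

wt = WaveletTree(list("abracadabra"))
print(wt.access(4))        # c
print(wt.rank("a", 10))    # 5
print(wt.select("r", 2))   # 9
```

Each level stores one bit per symbol, so all three operations reduce to rank and select on bitmaps along a single root-to-leaf path.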

References

  1. Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, New York

  2. Álvarez-García S, de Bernardo G, Brisaboa N, Navarro G (2017) A succinct data structure for self-indexing ternary relations. J Discrete Algorithms 43:38–53

  3. Angles R, Gutiérrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1

  4. Böhm H-J, Schneider G (2000) Virtual screening for bioactive molecules. Wiley, Weinheim

  5. Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the 13th international world wide web conference (WWW), pp 595–601

  6. Bornea M, Dolby J, Kementsietsidis A, Srinivas K, Dantressangle P, Udrea O, Bhattacharjee B (2013) Building an efficient RDF store over a relational database. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data (SIGMOD). ACM, pp 121–132

  7. Brisaboa N, Cerdeira-Pena A, de Bernardo G, Navarro G (2017) Compressed representation of dynamic binary relations with applications. Inf Syst 69:106–123

  8. Brisaboa N, Ladra S, Navarro G (2014) Compact representation of web graphs with extended functionality. Inf Syst 39(1):152–174

  9. Caro D, Rodríguez MA, Brisaboa NR, Fariña A (2016) Compressed \(k^d\)-tree for temporal graphs. Knowl Inf Syst 49:553–595

  10. Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of 15th conference on knowledge discovery and data mining (KDD), pp 219–228

  11. Ciglan M, Averbuch A, Hluchy L (2012) Benchmarking traversal operations over graph databases. In: Proceedings of the 28th international conference on data engineering workshops (ICDEW), pp 186–189

  12. Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16

  13. Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(3):265–298

  14. de Bernardo G, Álvarez-García S, Brisaboa N, Navarro G, Pedreira O (2013) Compact querieable representations of raster data. In: Proceedings 20th international symposium on string processing and information retrieval (SPIRE). LNCS 8214, pp 96–108

  15. Erling O, Averbuch A, Larriba-Pey J, Chafi H, Gubichev A, Prat A, Pham M-D, Boncz P (2015) The LDBC social network benchmark: interactive workload. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data (SIGMOD). ACM, pp 619–630

  16. Fischer J, Peters D (2016) GLOUDS: representing tree-like graphs. J Discrete Algorithms 36:39–49

  17. Grouplens (2014) Movielens dataset. http://grouplens.org/datasets/movielens/

  18. Gyssens M, Paredaens J, Van den Bussche J, Van Gucht D (1994) A graph-oriented object database model. IEEE Trans Knowl Data Eng 6(4):572–586

  19. Han J, Haihong E, Le G, Du J (2011) Survey on NoSQL database. In: Proceedings of the 6th international conference on pervasive computing and applications (ICPCA), pp 363–366

  20. Hernández C, Navarro G (2014) Compressed representations for web and social graphs. Knowl Inf Syst 40(2):279–313

  21. Iordanov B (2010) HyperGraphDB: a generalized graph database. In: Web-age information management. Springer, pp 25–36

  22. Jacobson G (1989) Space-efficient static trees and graphs. In: Proceedings of the 30th IEEE symposium on foundations of computer science (FOCS), pp 549–554

  23. Ladra S, Paramá J, Silva-Coira F (2017) Scalable and queryable compressed storage structure for raster data. Inf Syst 72:179–204

  24. Larriba-Pey JL, Martínez-Bazán N, Domínguez-Sal D (2014) Introduction to graph databases. In: Reasoning web. Reasoning on the web in the big data era. Lecture Notes in Computer Science, vol 8714. Springer International Publishing, pp 171–194

  25. Levene M, Poulovassilis A (1990) The hypernode model and its associated query language. In: Proceedings of the 5th Jerusalem conference on information technology, IEEE, pp 520–530

  26. Mäkinen V, Navarro G (2008) Dynamic entropy-compressed sequences and full-text indexes. ACM Trans Algorithms 4(3):32–38

  27. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM international conference on management of data (SIGMOD), pp 135–146

  28. Maneth S, Peternek F (2016) Compressing graphs by grammars. In: Proceedings of the 32nd IEEE international conference on data engineering (ICDE). IEEE, pp 109–120

  29. Martínez-Bazán N, Águila-Lorente MA, Muntés-Mulero V, Domínguez-Sal D, Gómez-Villamor S, Larriba-Pey JL (2012) Efficient graph management based on bitmap indices. In: Proceedings of the 16th international database engineering and applications symposium (IDEAS). ACM, pp 110–119

  30. Martínez-Bazán N, Muntés-Mulero V, Gómez-Villamor S, Nin J, Sánchez-Martínez MA, Larriba-Pey JL (2007) DEX: high-performance exploration on large graphs for information retrieval. In: Proceedings of the 16th ACM conference on information and knowledge management (CIKM). ACM, pp 573–582

  31. Navarro G (2014) Wavelet trees for all. J Discrete Algorithms 25:2–20

  32. Navarro G (2016) Compact data structures—a practical approach. Cambridge University Press, Cambridge. ISBN 978-1-107-15238-0

  33. Padrol-Sureda A, Perarnau-Llobet G, Pfeifle J, Muntés-Mulero V (2010) Overlapping community search for social networks. In: Proceedings of the IEEE 26th international conference on data engineering (ICDE). IEEE Press, pp 992–995

  34. Paradies M, Kinder C, Bross J, Fischer T, Kasperovics R, Gildhoff H (2017) GraphScript: implementing complex graph algorithms in SAP HANA. In: Proceedings of the 16th international symposium on database programming languages (DBPL). ACM, pp 13:1–13:4

  35. Prezza N (2017) A framework of dynamic data structures for string processing. In: International symposium on experimental algorithms. Leibniz international proceedings in informatics (LIPIcs)

  36. Raghavan S, Garcia-Molina H (2003) Representing web graphs. In: Proceedings of the IEEE 19th international conference on data engineering (ICDE). IEEE Press, pp 405–416

  37. Robinson I, Webber J, Eifrem E (2013) Graph databases. O’Reilly

  38. Samet H (2006) Foundations of multidimensional and metric data structures. Morgan Kaufmann Publishers Inc, Burlington

  39. SAP (2016) SAP HANA Graph Reference. Document version 1.0

  40. Sun W, Fokoue A, Srinivas K, Kementsietsidis A, Hu G, Xie G (2015) SQLGraph: an efficient relational-based property graph store. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data (SIGMOD). ACM, pp 1887–1901

  41. Tinkerpop (2014) Gremlin query language. https://github.com/tinkerpop/gremlin/wiki

Acknowledgements

This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [Grant Agreement No 690941]; from the Ministerio de Economía y Competitividad (PGE and ERDF) [Grant Numbers TIN2015-69951-R; TIN2016-77158-C4-3-R]; and from Xunta de Galicia (co-funded with ERDF) [Grant Numbers ED431C 2017/58; ED431G/01]. We also thank Nieves R. Brisaboa for her contributions during the initial discussions of this work.

Author information

Corresponding author

Correspondence to Susana Ladra.

Additional information

A preliminary partial version of this paper appeared in Proceedings of the Eighth Workshop on Mining and Learning with Graphs (MLG2010), pp. 18–25, 2010.

About this article

Cite this article

Álvarez-García, S., Freire, B., Ladra, S. et al. Compact and efficient representation of general graph databases. Knowl Inf Syst 60, 1479–1510 (2019). https://doi.org/10.1007/s10115-018-1275-x

  • DOI: https://doi.org/10.1007/s10115-018-1275-x

Keywords