MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming Graphs

Published: 26 March 2024 Publication History
    Abstract

    A persistent Regular Path Query (RPQ) on a streaming graph continuously finds every pair of vertices connected by a path within a sliding window such that the sequence of edge labels along the path matches a given regular expression. The existing RPQ evaluation algorithm incrementally maintains a set of spanning-tree-like data structures, both to form query results quickly and to avoid reprocessing edges shared by multiple sliding windows. This approach allows the graph edges within a sliding window to be processed in parallel, but it requires a blocking expiration phase between sliding windows to remove old edges. This blocking phase can significantly degrade query performance, especially when edges arrive quickly and consecutive sliding windows overlap heavily.
    This paper presents a new RPQ evaluation strategy, the Multi-Window Parallel (MWP) method, which leverages a new data structure called the Timestamped Rooted Digraph (TRD). The novel idea is to incrementally maintain TRDs that, like the aforementioned spanning trees, support quick formulation of query results, but that simultaneously hold the information needed for multiple sliding windows. MWP thereby eliminates the forced blocking expiration phase. Only when memory runs low is a quick "dirty garbage collection" (DGC) process run to remove some unneeded edges and nodes from the TRDs, without incurring large costs. Extensive experiments on real graph datasets show that MWP significantly outperforms the existing algorithm in throughput, tail latency, and scalability, and that DGC is an effective way to release memory with minimal impact.
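
    To make the query semantics at the start of the abstract concrete, the following is a minimal sketch of evaluating an RPQ over the edges of a single window. It is a naive from-scratch baseline, not the MWP method, the TRD structure, or the existing incremental algorithm. It assumes arbitrary (not necessarily simple) non-empty paths, a regular expression already compiled into a deterministic automaton, and edges given as (source, label, target, timestamp) tuples. The names `evaluate_rpq`, `EDGES`, and `DFA` are hypothetical and chosen only for illustration.

```python
from collections import deque


def evaluate_rpq(edges, dfa, start_state, accepting, window_start, window_end):
    """Return every (u, v) joined by a non-empty path whose label sequence
    is accepted by the automaton, using only edges whose timestamp ts
    satisfies window_start < ts <= window_end.

    Naive baseline (assumption: recomputed from scratch per window)."""
    # Build an adjacency list restricted to the edges inside the window.
    adj = {}
    for src, label, dst, ts in edges:
        if window_start < ts <= window_end:
            adj.setdefault(src, []).append((label, dst))

    results = set()
    for root in adj:
        # Breadth-first search over (vertex, automaton state) pairs.
        seen = {(root, start_state)}
        queue = deque(seen)
        while queue:
            vertex, state = queue.popleft()
            for label, nxt in adj.get(vertex, []):
                nxt_state = dfa.get((state, label))
                if nxt_state is None:
                    continue  # the automaton has no transition on this label
                if nxt_state in accepting:
                    results.add((root, nxt))
                if (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return results


# Hypothetical example: the expression a b* as a two-state automaton.
DFA = {(0, "a"): 1, (1, "b"): 1}
EDGES = [("x", "a", "y", 1), ("y", "b", "z", 2), ("z", "b", "w", 5)]

print(evaluate_rpq(EDGES, DFA, 0, {1}, 0, 3))  # window covering timestamps 1..3
print(evaluate_rpq(EDGES, DFA, 0, {1}, 0, 5))  # window covering timestamps 1..5
```

    Recomputing every window from scratch like this repeats work on edges shared by overlapping windows, which is the cost that incremental approaches such as the ones described in the abstract aim to avoid.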

      Published In

      Proceedings of the ACM on Management of Data  Volume 2, Issue 1
      SIGMOD
      February 2024
      1874 pages
      EISSN:2836-6573
      DOI:10.1145/3654807
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. parallel processing
      2. persistent query evaluation
      3. regular path queries
      4. streaming graphs

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC (National Natural Science Foundation of China)
