research-article

CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution

Published: 05 March 2024
  Abstract

    Many systems are designed to run graph algorithms efficiently in memory, but they achieve either cache efficiency or work efficiency, not both. We tackle this fundamental trade-off in existing systems by designing CoroGraph, a system that attains both cache efficiency and work efficiency for in-memory graph processing. CoroGraph adopts a novel hybrid execution model. It generates update messages at vertex granularity so that promising vertices can be prioritized for work efficiency, and it commits updates at partition granularity so that data access can be shared for cache efficiency. To overlap the random memory accesses of graph algorithms with computation, CoroGraph makes extensive use of coroutines (i.e., lightweight C++ functions that can yield and resume with low overhead) to prefetch the required data. A suite of designs is incorporated to reap the full benefits of coroutines, including a prefetch pipeline, a cache-friendly graph format, and stop-free synchronization. We compare CoroGraph with five state-of-the-art graph algorithm systems via extensive experiments. The results show that CoroGraph yields shorter algorithm execution time than all baselines in 18 out of 20 cases, and its speedup over the best-performing baseline can exceed 2x. Detailed profiling suggests that CoroGraph achieves both cache efficiency and work efficiency, with low memory stall time and a small number of processed edges.
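    The hybrid execution model can be illustrated with a minimal, single-threaded single-source shortest path (SSSP) sketch in C++17. This is an assumption-laden illustration, not CoroGraph's design: the CSR `Graph` struct, the `hybrid_sssp` function, and its `partition_size` and `batch` parameters are hypothetical. Update messages are generated at vertex granularity by expanding the `batch` vertices with the smallest tentative distances. They are buffered into per-partition bins and then committed one partition at a time, so that each commit pass only touches one contiguous range of the `dist` array.

```cpp
// Hypothetical sketch of a hybrid (vertex-prioritized, partition-committed)
// SSSP loop. Not CoroGraph's implementation; all names are illustrative.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// CSR graph: out-edges of u are targets/weights[offsets[u] .. offsets[u + 1]).
struct Graph {
    std::vector<uint32_t> offsets;  // size = number of vertices + 1
    std::vector<uint32_t> targets;
    std::vector<uint32_t> weights;
    uint32_t num_vertices() const { return static_cast<uint32_t>(offsets.size() - 1); }
};

constexpr uint32_t kInf = std::numeric_limits<uint32_t>::max();

struct Update {
    uint32_t dst;
    uint32_t dist;
};

std::vector<uint32_t> hybrid_sssp(const Graph& g, uint32_t source,
                                  uint32_t partition_size, std::size_t batch) {
    const uint32_t n = g.num_vertices();
    const uint32_t num_parts = (n + partition_size - 1) / partition_size;
    std::vector<uint32_t> dist(n, kInf);
    dist[source] = 0;

    // Min-heap of (tentative distance, vertex): most promising vertices first.
    using Item = std::pair<uint32_t, uint32_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> frontier;
    frontier.push({0, source});

    // One message bin per vertex partition.
    std::vector<std::vector<Update>> bins(num_parts);

    while (!frontier.empty()) {
        // Generate (vertex granularity): expand up to `batch` closest vertices.
        for (std::size_t k = 0; k < batch && !frontier.empty(); ++k) {
            auto [d, u] = frontier.top();
            frontier.pop();
            if (d != dist[u]) continue;  // stale heap entry, skip it
            for (uint32_t e = g.offsets[u]; e < g.offsets[u + 1]; ++e) {
                const uint32_t v = g.targets[e];
                bins[v / partition_size].push_back({v, d + g.weights[e]});
            }
        }
        // Commit (partition granularity): apply buffered messages one partition
        // at a time, so each pass only touches that partition's dist[] entries.
        for (auto& bin : bins) {
            for (const Update& m : bin) {
                if (m.dist < dist[m.dst]) {
                    dist[m.dst] = m.dist;
                    frontier.push({m.dist, m.dst});
                }
            }
            bin.clear();
        }
    }
    return dist;
}
```

    For example, on a 4-vertex graph with edges 0→1 (weight 4), 0→2 (1), 2→1 (2), 1→3 (1), and 2→3 (5), `hybrid_sssp(g, 0, 2, 1)` returns the distances {0, 3, 1, 4}. A larger `batch` amortizes each commit pass over more messages but weakens prioritization, which can increase the number of processed edges.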


    Cited By

    • (2024) How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching. Proceedings of the 20th International Workshop on Data Management on New Hardware, 1-10. DOI: 10.1145/3662010.3663451. Online publication date: 10 June 2024.
    • (2024) How Does Software Prefetching Work on GPU Query Processing? Proceedings of the 20th International Workshop on Data Management on New Hardware, 1-9. DOI: 10.1145/3662010.3663445. Online publication date: 10 June 2024.

            Published In

            Proceedings of the VLDB Endowment, Volume 17, Issue 4
            December 2023
            309 pages
            ISSN: 2150-8097

            Publisher

            VLDB Endowment

            Article Metrics

            • Downloads (last 12 months): 86
            • Downloads (last 6 weeks): 19
            Reflects downloads up to 27 Jul 2024
