DOI: 10.1145/3466752.3480096
research-article
Open access

Improving Streaming Graph Processing Performance using Input Knowledge

Published: 17 October 2021

    Abstract

    Streaming graphs are ubiquitous in today’s big data era. Prior work has improved the performance of streaming graph workloads without taking input characteristics into account. In this work, we demonstrate that input knowledge-driven software and hardware co-design is critical to optimize the performance of streaming graph processing. To improve graph update efficiency, we first characterize the performance trade-offs of input-oblivious batch reordering. Guided by our findings, we propose input-aware batch reordering to adaptively reorder input batches based on their degree distributions. To complement adaptive batch reordering, we propose updating graphs dynamically, based on their input characteristics, either in software (via update search coalescing) or in hardware (via acceleration support). To improve graph computation efficiency, we present input-aware work aggregation which adaptively modulates the computation granularity based on inter-batch locality characteristics. Evaluated across 260 workloads, our input-aware techniques provide on average 4.55× and 2.6× improvement in graph update performance for different input types (on top of eliminating the performance degradation from input-oblivious batch reordering). The graph compute performance is improved by 1.26× (up to 2.7×).
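
    The idea of reordering a batch only when its degree distribution calls for it, and then coalescing the per-edge searches, can be illustrated with a small sketch. This is not the authors' implementation: the skew metric (largest number of edges sharing one source vertex), the `degree_threshold` knob, the adjacency-list graph, and all function names below are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def max_batch_degree(batch):
    """Largest number of edges in the batch that share one source vertex."""
    counts = Counter(src for src, _ in batch)
    return max(counts.values(), default=0)


def maybe_reorder(batch, degree_threshold=4):
    """Stable-sort the batch by source vertex only if it looks skewed.

    degree_threshold is a hypothetical tuning knob, not a value from the paper.
    Returns (batch, reordered_flag).
    """
    if max_batch_degree(batch) >= degree_threshold:
        return sorted(batch, key=lambda edge: edge[0]), True
    return list(batch), False


def apply_batch(graph, batch, reordered):
    """Insert edges into an adjacency-list graph (vertex -> list of neighbors)."""
    if not reordered:
        # Input-oblivious path: one duplicate search per edge.
        for src, dst in batch:
            neighbors = graph[src]
            if dst not in neighbors:
                neighbors.append(dst)
        return
    # Reordered path: edges with the same source are adjacent, so the
    # existing neighbors are scanned once per source instead of once per edge.
    i = 0
    while i < len(batch):
        src = batch[i][0]
        j = i
        while j < len(batch) and batch[j][0] == src:
            j += 1
        neighbors = graph[src]
        seen = set(neighbors)  # single pass over the adjacency list
        for _, dst in batch[i:j]:
            if dst not in seen:
                neighbors.append(dst)
                seen.add(dst)
        i = j


skewed = [(1, 2), (3, 4), (1, 5), (1, 6), (1, 2), (2, 9)]
ordered, was_reordered = maybe_reorder(skewed)
graph = defaultdict(list)
apply_batch(graph, ordered, was_reordered)
```

    In this sketch a batch dominated by one source vertex takes the sorted, coalesced path, while a batch of mostly distinct sources skips the sort entirely. Both paths produce the same graph; only the amount of search work differs.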

    Published In

    MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2021
    1322 pages
    ISBN:9781450385572
    DOI:10.1145/3466752
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Graph analytics
    2. Streaming graphs

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MICRO '21
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 668
    • Downloads (Last 6 weeks): 48
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) Heterogeneous Hyperthreading. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 68-78. DOI: 10.1109/IPDPSW63119.2024.00018. Online publication date: 27-May-2024
    • (2024) Enabling Large Dynamic Neural Network Training with Learning-based Memory Management. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 788-802. DOI: 10.1109/HPCA57654.2024.00066. Online publication date: 2-Mar-2024
    • (2023) RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network. ACM Transactions on Architecture and Code Optimization 20:4, 1-26. DOI: 10.1145/3617685. Online publication date: 14-Dec-2023
    • (2023) Layph: Making Change Propagation Constraint in Incremental Graph Processing by Layering Graph. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2766-2779. DOI: 10.1109/ICDE55515.2023.00212. Online publication date: Apr-2023
    • (2023) ACGraph: Accelerating Streaming Graph Processing via Dependence Hierarchy. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC56929.2023.10247904. Online publication date: 9-Jul-2023
    • (2022) ReaDy: A ReRAM-Based Processing-in-Memory Accelerator for Dynamic Graph Convolutional Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11, 3567-3578. DOI: 10.1109/TCAD.2022.3199152. Online publication date: 1-Nov-2022
