research-article
Open access

LightRW: FPGA Accelerated Graph Dynamic Random Walks

Published: 30 May 2023

Abstract

Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Although many existing studies optimize the performance of GDRWs on multi-core CPUs, massive random memory accesses and costly synchronizations cause severe resource underutilization, and GDRW processing is usually the key performance bottleneck in many graph applications. This paper studies an alternative architecture, the FPGA, to address these issues: its hardware customization lets us explore fine-grained pipeline execution and specialized memory access optimizations. Specifically, we propose LightRW, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series of optimizations that enable fine-grained pipeline execution on the chip and exploit the massive parallelism of the FPGA while significantly reducing memory accesses. Because the sampling methods commonly used in GDRWs do not efficiently support fine-grained pipeline execution, we develop a parallelized reservoir sampling method that samples multiple vertices per cycle. To address the random memory access issues, we propose a degree-aware configurable caching method that buffers hot vertices on-chip and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that our optimization techniques significantly improve the performance of GDRWs on FPGA. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.
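To make the sampling idea concrete, the following is a minimal software sketch of single-item weighted reservoir sampling evaluated in fixed-size chunks, which is one way to picture "multiple vertices per cycle". It is an illustration under stated assumptions, not the LightRW hardware design: the function name `sample_neighbor`, the `lanes` parameter, and the use of Python's `random` module are all hypothetical choices made for this example, and weights are assumed to be non-negative. The key observation is that each item's accept/reject decision depends only on its own random draw and the cumulative weight up to that item, so all decisions in a chunk can be made independently once a prefix sum is available; the highest-indexed accepting lane wins.

```python
import random
from itertools import accumulate


def sample_neighbor(neighbors, weights, lanes=4, rng=None):
    """Pick one neighbor with probability proportional to its weight.

    Hypothetical helper for illustration. `lanes` models how many
    candidates are examined together; weights must be non-negative.
    Returns None if there is no neighbor with positive weight.
    """
    rng = rng or random.Random()
    chosen = None
    running_total = 0.0  # cumulative weight of all items seen so far

    for start in range(0, len(neighbors), lanes):
        chunk_w = weights[start:start + lanes]
        # Prefix sum: cumulative weight up to and including each lane's item.
        prefix = [running_total + s for s in accumulate(chunk_w)]
        # Each lane decides independently whether its item would replace
        # the current reservoir entry (accept with probability w / prefix).
        accepts = [w > 0 and rng.random() * p < w
                   for w, p in zip(chunk_w, prefix)]
        # The last accepting lane overrides earlier ones, matching what a
        # sequential scan over the same items would produce.
        for lane in reversed(range(len(accepts))):
            if accepts[lane]:
                chosen = neighbors[start + lane]
                break
        running_total = prefix[-1]

    return chosen
```

Because the per-item decisions are the same ones a one-at-a-time scan would make, the chunk width changes only how many items are examined together, not the resulting distribution. For example, `sample_neighbor([10, 20, 30], [1.0, 0.0, 3.0])` returns 30 about three times as often as 10 and never returns 20.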

Supplemental Material

MP4 File
This video presents LightRW, our recent FPGA-based accelerator for graph dynamic random walks (GDRWs), which have become a powerful paradigm for graph analytics and learning applications. While many existing studies optimize GDRW performance on multi-core CPUs, massive random memory accesses and costly synchronizations cause resource underutilization. LightRW incorporates a series of optimizations that enable fine-grained pipeline execution on the chip and exploit the FPGA's massive parallelism while significantly reducing memory accesses. We have developed a parallelized reservoir sampling method that samples multiple vertices per cycle with high throughput. Additionally, we propose a degree-aware configurable cache to buffer hot vertices on-chip for data reuse and a dynamic burst access engine to efficiently retrieve neighbors. Experimental results demonstrate that LightRW significantly improves GDRW performance.
PDF File
Read me
ZIP File
Datasets
ZIP File
Source Code

Cited By

  • (2024) Improving Graph Compression for Efficient Resource-Constrained Graph Analytics. Proceedings of the VLDB Endowment 17:9 (2212-2226). DOI: 10.14778/3665844.3665852. Online publication date: 6-Aug-2024
  • (2024) FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework. Proceedings of the VLDB Endowment 17:8 (1788-1801). DOI: 10.14778/3659437.3659438. Online publication date: 31-May-2024
  • (2024) An FPGA-Based Accelerator for Graph Embedding using Sequential Training Algorithm. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (148-154). DOI: 10.1109/IPDPSW63119.2024.00040. Online publication date: 27-May-2024

Published In

Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA accelerator
  2. parallel weighted reservoir sampling
  3. random walk on graphs

Qualifiers

  • Research-article

Funding Sources

  • AI Singapore Programme
  • Google South & Southeast Asia Research Award 2022
  • AMD Heterogeneous Accelerated Compute Clusters (HACC) program
  • Ministry of Education AcRF Tier 2 grant

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 376
  • Downloads (Last 6 weeks): 42
Reflects downloads up to 26 Sep 2024

Citations

Cited By

  • (2024) Observations and Opportunities in Solving Large-Scale Graph Data Processing Challenges at ByteDance by Using Heterogeneous Hardware. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (5677-5678). DOI: 10.1109/ICDE60146.2024.00470. Online publication date: 13-May-2024
  • (2024) F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (3739-3752). DOI: 10.1109/ICDE60146.2024.00287. Online publication date: 13-May-2024
