research-article

On Efficient Large Sparse Matrix Chain Multiplication

Authors:

Yuchi Ma

Proceedings of the ACM on Management of Data, Volume 2, Issue 3

Article No.: 156, Pages 1 - 27

https://doi.org/10.1145/3654959

Published: 30 May 2024

Abstract

Sparse matrices are often used to model interactions among objects and are prevalent in many areas, including e-commerce, social networks, and biology. As a fundamental matrix operation, sparse matrix chain multiplication (SMCM) aims to efficiently multiply a chain of sparse matrices, and it has found real-world applications in areas such as network analysis, data mining, and machine learning. The efficiency of SMCM hinges largely on the order in which the matrices are multiplied, which in turn relies on accurately estimating the sparsity of the intermediate matrices. Existing matrix sparsity estimators often struggle with large sparse matrices because their accuracy is limited in both theory and practice. To enable efficient SMCM, in this paper we introduce a novel row-wise sparsity estimator (RS-estimator), a straightforward yet effective estimator that leverages matrix structural properties to achieve efficient, accurate, and theoretically guaranteed sparsity estimation. Based on the RS-estimator, we propose a novel ordering algorithm that determines a good multiplication order for SMCM. We further develop an efficient parallel SMCM algorithm that utilizes multiple CPU threads. We have conducted experiments by multiplying various chains of large sparse matrices extracted from five real-world large graph datasets, and the results demonstrate the effectiveness and efficiency of our proposed methods. In particular, our SMCM algorithm is up to three orders of magnitude faster than the state-of-the-art algorithms.
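
To illustrate why the multiplication order depends on the estimated sparsity of intermediate matrices, the following is a minimal Python sketch. It is not the RS-estimator or the ordering algorithm of this paper. As an assumption, it uses the simple textbook model in which nonzeros are placed uniformly and independently, and it combines that model with the classical dynamic program for matrix chains. The function names (est_density, est_cost, best_order) are hypothetical and chosen only for this example.

```python
from functools import lru_cache


def est_density(s_a, s_b, n):
    """Estimated density of A @ B, assuming uniform, independent nonzeros.

    A has density s_a, B has density s_b, and n is their shared dimension.
    """
    return 1.0 - (1.0 - s_a * s_b) ** n


def est_cost(m, n, l, s_a, s_b):
    """Expected scalar multiplications for A (m x n) times B (n x l)."""
    return m * n * l * s_a * s_b


def best_order(dims, densities):
    """Pick a parenthesization that minimizes the estimated cost.

    dims has k + 1 entries; matrix i has shape dims[i] x dims[i + 1].
    densities has k entries, one per input matrix.
    Returns (estimated_cost, parenthesization_string).
    """
    k = len(densities)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Returns (cost, estimated density, expression) for matrices i..j.
        if i == j:
            return 0.0, densities[i], f"M{i}"
        best = None
        for s in range(i, j):
            cost_l, dens_l, expr_l = solve(i, s)
            cost_r, dens_r, expr_r = solve(s + 1, j)
            n = dims[s + 1]
            cost = cost_l + cost_r + est_cost(dims[i], n, dims[j + 1],
                                              dens_l, dens_r)
            if best is None or cost < best[0]:
                # Heuristic: keep the density implied by the cheapest split.
                best = (cost, est_density(dens_l, dens_r, n),
                        f"({expr_l} {expr_r})")
        return best

    cost, _, expr = solve(0, k - 1)
    return cost, expr
```

For example, best_order([1000, 1000, 1000, 1], [0.01, 0.01, 1.0]) selects "(M0 (M1 M2))": multiplying the two sparse 1000 x 1000 matrices first would create a much denser intermediate, so under this model the left-to-right order costs roughly ten times more. A more accurate estimator would slot in by replacing est_density.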

Index Terms

On Efficient Large Sparse Matrix Chain Multiplication
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms
2. Theory of computation
  1. Design and analysis of algorithms

Information

Published In

Proceedings of the ACM on Management of Data Volume 2, Issue 3

SIGMOD

June 2024

1953 pages

EISSN: 2836-6573

DOI: 10.1145/3670010

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSFC
Basic and Applied Basic Research Fund in Guangdong Province
Guangdong Talent Program
