DOI: 10.1145/3448016.3457317 · SIGMOD Conference Proceedings · Research Article · Public Access

Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra

Published: 18 June 2021

Abstract

Machine learning (ML) computations are often expressed using vectors, matrices, or higher-dimensional tensors. Such data structures can have many different implementations, especially in a distributed environment: a matrix could be stored as row or column vectors, tiles of different sizes, or relationally, as a set of (rowIndex, colIndex, value) triples. Many other storage formats are possible. The choice of format can have a profound impact on the performance of a ML computation. In this paper, we propose a framework for automatic optimization of the physical implementation of a complex ML or linear algebra (LA) computation in a distributed environment, develop algorithms for solving this problem, and show, through a prototype on top of a distributed relational database system, that our ideas can radically speed up common ML and LA computations.
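The choice of physical representation described above can be illustrated with a small sketch (this is a toy illustration, not the paper's system; all names and the tile size are invented for the example): the same matrix stored relationally as (rowIndex, colIndex, value) triples versus as fixed-size dense tiles, with a matrix-vector product computed over each representation.

```python
def to_triples(A):
    """Relational (COO-style) representation: (rowIndex, colIndex, value) triples."""
    return [(i, j, v) for i, row in enumerate(A) for j, v in enumerate(row) if v != 0.0]

def to_tiles(A, tile=2):
    """Tiled representation: map (tileRow, tileCol) -> small dense block."""
    n, m = len(A), len(A[0])
    tiles = {}
    for ti in range(0, n, tile):
        for tj in range(0, m, tile):
            tiles[(ti // tile, tj // tile)] = [row[tj:tj + tile] for row in A[ti:ti + tile]]
    return tiles

def matvec_triples(triples, x, n):
    """y = A @ x over the triple representation."""
    y = [0.0] * n
    for i, j, v in triples:
        y[i] += v * x[j]
    return y

def matvec_tiles(tiles, x, n, tile=2):
    """y = A @ x over the tiled representation."""
    y = [0.0] * n
    for (ti, tj), block in tiles.items():
        for di, row in enumerate(block):
            for dj, v in enumerate(row):
                y[ti * tile + di] += v * x[tj * tile + dj]
    return y

A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [4.0, 0.0, 5.0, 0.0],
     [0.0, 0.0, 0.0, 6.0]]
x = [1.0, 2.0, 3.0, 4.0]

# Both physical implementations compute the same logical result:
print(matvec_triples(to_triples(A), x, 4))  # [7.0, 6.0, 19.0, 24.0]
print(matvec_tiles(to_tiles(A), x, 4))      # [7.0, 6.0, 19.0, 24.0]
```

The two implementations are logically interchangeable but have very different cost profiles: the triple form skips zeros (and shuffles well in a distributed relational setting), while the tiled form enables dense block kernels. Choosing among such implementations automatically, per operation, is the optimization problem the paper addresses.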

Supplementary Material

MP4 File (3448016.3457317.mp4)


Cited By

  • (2024) The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data 2:1, 1-31. DOI: 10.1145/3639307. Online publication date: 26-Mar-2024
  • (2024) Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems. The VLDB Journal 33:5, 1231-1255. DOI: 10.1007/s00778-024-00845-0. Online publication date: 12-Apr-2024
  • (2023) AWARE: Workload-aware, Redundancy-exploiting Linear Algebra. Proceedings of the ACM on Management of Data 1:1, 1-28. DOI: 10.1145/3588682. Online publication date: 30-May-2023
  • (2023) Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing. IEEE Transactions on Big Data 9:2, 637-652. DOI: 10.1109/TBDATA.2022.3199197. Online publication date: 1-Apr-2023
  • (2023) Givens rotations for QR decomposition, SVD and PCA over database joins. The VLDB Journal 33:4, 1013-1037. DOI: 10.1007/s00778-023-00818-9. Online publication date: 23-Nov-2023
  • (2022) In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle. Proceedings of the 2022 International Conference on Management of Data, 1286-1300. DOI: 10.1145/3514221.3526150. Online publication date: 10-Jun-2022

        Published In

        SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
        June 2021
        2969 pages
        ISBN:9781450383431
        DOI:10.1145/3448016
        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Conference: SIGMOD/PODS '21

        Acceptance Rates

        Overall Acceptance Rate 785 of 4,003 submissions, 20%

        Article Metrics

        • Downloads (Last 12 months)224
        • Downloads (Last 6 weeks)32
        Reflects downloads up to 03 Feb 2025
