Research article · Public Access

Loop and data transformations for sparse matrix code

Published: 03 June 2015

Abstract

This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations to generate code that is on average within 5% of, and often exceeds, the performance of the manually-tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI's and perform comparably to CUSP's.
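The inspector/executor pattern the abstract relies on can be illustrated with a small sketch. This is our own illustrative example, not the paper's generated code: a run-time inspector examines a CSR matrix and packs it into a padded ELL-style layout, and an executor then performs sparse matrix-vector multiplication over the transformed data with a regular inner loop. All function names here are hypothetical.

```python
def inspector_csr_to_ell(rowptr, col, val, n_rows):
    """Inspector: inspect the CSR nonzero structure at run time and
    build a padded ELL-style representation (regular row width)."""
    width = max(rowptr[i + 1] - rowptr[i] for i in range(n_rows))
    ell_col = [[0] * width for _ in range(n_rows)]    # padded column indices
    ell_val = [[0.0] * width for _ in range(n_rows)]  # padded values (zeros are harmless)
    for i in range(n_rows):
        for k, j in enumerate(range(rowptr[i], rowptr[i + 1])):
            ell_col[i][k] = col[j]
            ell_val[i][k] = val[j]
    return width, ell_col, ell_val

def executor_ell_spmv(n_rows, width, ell_col, ell_val, x):
    """Executor: SpMV over the transformed representation; the inner
    trip count is now uniform, which is what enables vectorization."""
    y = [0.0] * n_rows
    for k in range(width):
        for i in range(n_rows):
            y[i] += ell_val[i][k] * x[ell_col[i][k]]
    return y

# 3x3 example matrix: [[10, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
rowptr, col, val = [0, 2, 3, 5], [0, 2, 1, 0, 2], [10.0, 2.0, 3.0, 4.0, 5.0]
w, ec, ev = inspector_csr_to_ell(rowptr, col, val, 3)
print(executor_ell_spmv(3, w, ec, ev, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 9.0]
```

The split matters because the inspector's cost is paid once per matrix structure, while the executor runs every iteration of a solver; the paper's contribution is generating both automatically from the original code.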

References

[1]
H. M. Aktulga, A. Buluç, S. Williams, and C. Yang. Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations. In Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS ’14, 2014.
[2]
R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers, 2002.
[3]
C. Ancourt and F. Irigoin. Scanning polyhedra with DO loops. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Apr. 1991.
[4]
G. Arnold, J. Hölzl, A. S. Köksal, R. Bodík, and M. Sagiv. Specifying and verifying sparse matrix codes. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP), ICFP ’10, pages 249–260, New York, NY, USA, 2010. ACM.
[5]
S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.1, Argonne National Laboratory, 2010.
[6]
A. Basumallik and R. Eigenmann. Optimizing irregular shared-memory applications for distributed-memory systems. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, 2006.
[7]
M. Belgin, G. Back, and C. J. Ribbens. Pattern-based Sparse Matrix Representation for Memory-Efficient SMVM Kernels. In ICS ’09, pages 100–109, 2009.
[8]
N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of SC ’09, Nov. 2009.
[9]
A. Bik and H. Wijshoff. On automatic data structure selection and code generation for sparse computations. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, volume 768 of Lecture Notes in Computer Science, pages 57–75. Springer Berlin Heidelberg, 1994.
[10]
A. Bik and H. A. Wijshoff. Advanced compiler optimizations for sparse computations. In Supercomputing ’93 Proceedings, pages 430–439, Nov. 1993.
[11]
A. J. C. Bik. Compiler Support for Sparse Matrix Computations. PhD thesis, Leiden University, May 1996.
[12]
A. J. C. Bik and H. A. G. Wijshoff. Automatic data structure selection and transformation for sparse matrix computations. IEEE Trans. Parallel Distrib. Syst., 7(2):109–126, 1996.
[13]
A. Buluç and J. R. Gilbert. Highly parallel sparse matrix-matrix multiplication. CoRR, abs/1006.2183, 2010.
[14]
A. Buluç and J. R. Gilbert. The Combinatorial BLAS: design, implementation, and applications, 2010.
[15]
C. Chen. Polyhedra scanning revisited. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI ’12, pages 499–508, June 2012.
[16]
T. Davis. The University of Florida Sparse Matrix Collection. NA Digest, 97, 1997.
[17]
C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 229–241, New York, NY, USA, May 1999. ACM.
[18]
P. Feautrier. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, pages 79–103, 1996.
[19]
M. Hall, J. Chame, J. Shin, C. Chen, G. Rudy, and M. M. Khan. Loop transformation recipes for code generation and auto-tuning. In LCPC, October 2009.
[20]
H. Han and C.-W. Tseng. Exploiting locality for irregular scientific codes. IEEE Transactions on Parallel and Distributed Systems, 17(7): 606–618, 2006.
[21]
W. A. Kelly. Optimization within a Unified Transformation Framework. PhD thesis, University of Maryland, Dec. 1996.
[22]
V. Kotlyar and K. Pingali. Sparse code generation for imperfectly nested loops with dependences. In Proceedings of the 11th International Conference on Supercomputing, ICS ’97, 1997.
[23]
V. Kotlyar, K. Pingali, and P. Stodghill. A relational approach to the compilation of sparse matrix programs. In C. Lengauer, M. Griebl, and S. Gorlatch, editors, Euro-Par’97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1997.
[24]
Y. Lin and D. Padua. Compiler analysis of irregular memory accesses. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, May 2000.
[25]
X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey. Efficient sparse matrix-vector multiplication on x86-based many-core processors. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pages 273–282, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2130-3.
[26]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716–727, Apr. 2012. ISSN 2150-8097.
[27]
D. Lowell, J. Godwin, J. Holewinski, D. Karthik, C. Choudary, A. Mametjanov, B. Norris, G. Sabin, P. Sadayappan, and J. Sarich. Stencil-aware GPU optimization of iterative solvers. SIAM J. Scientific Computing, 2013.
[28]
N. Mateev, K. Pingali, P. Stodghill, and V. Kotlyar. Next-generation generic programming and its application to sparse matrix computations. In Proceedings of the 14th International Conference on Supercomputing, pages 88–99, Santa Fe, New Mexico, USA, May 2000.
[29]
J. Mellor-Crummey and J. Garvin. Optimizing sparse matrix-vector product computations using unroll and jam. International Journal of High Performance Computing Applications, 18(2):225–236, 2004.
[30]
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications using data and computation reorderings. International Journal of Parallel Programming, 29(3):217–247, 2001.
[31]
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nico, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the 2nd International Conference on Supercomputing, pages 140–152, 1988.
[32]
N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 192–202, October 1999.
[33]
C. Oancea and L. Rauchwerger. A hybrid approach to proving memory reference monotonicity. In S. Rajopadhye and M. Mills Strout, editors, Languages and Compilers for Parallel Computing, volume 7146 of Lecture Notes in Computer Science, pages 61–75. Springer Berlin Heidelberg, 2013.
[34]
W. Pugh and T. Shpeisman. SIPR: A new framework for generating efficient code for sparse matrix computations. In Proceedings of the Eleventh International Workshop on Languages and Compilers for Parallel Computing, Chapel Hill, North Carolina, August 1998.
[35]
F. Quilleré and S. Rajopadhye. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming, 28 (5):469–498, Oct. 2000.
[36]
L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, PLDI ’95, 1995.
[37]
M. Ravishankar, J. Eisenlohr, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. Code generation for parallel execution of a class of irregular loops on distributed memory systems. In Proceedings of SC’12, November 2012.
[38]
G. Rudy, M. M. Khan, M. Hall, C. Chen, and J. Chame. A programming language interface to describe transformations and code generation. In Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing, LCPC’10, pages 136–150, Berlin, Heidelberg, 2011. Springer-Verlag.
[39]
J. Saltz, C. Chang, G. Edjlali, Y.-S. Hwang, B. Moon, R. Ponnusamy, S. Sharma, A. Sussman, M. Uysal, G. Agrawal, R. Das, and P. Havlak. Programming irregular applications: Runtime support, compilation and tools. Advances in Computers, 45:105–153, 1997.
[40]
M. Shantharam, A. Chatterjee, and P. Raghavan. Exploiting dense substructures for fast sparse matrix vector multiplication. Int. J. High Perform. Comput. Appl., 25(3):328–341, Aug. 2011.
[41]
M. M. Strout, A. LaMielle, L. Carter, J. Ferrante, B. Kreaseck, and C. Olschanowsky. An approach for code generation in the sparse polyhedral framework. Technical Report CS-13-109, Colorado State University, December 2013.
[42]
H. van der Spek and H. Wijshoff. Sublimation: Expanding data structures to enable data instance specific optimizations. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC), Lecture Notes in Computer Science, pages 106–120. Springer Berlin / Heidelberg, 2010.
[43]
N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In Proceedings of the 15th International Conference on Compiler Construction, Mar. 2006.
[44]
A. Venkat, M. Shantharam, M. Hall, and M. M. Strout. Non-affine extensions to polyhedral code generation. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, 2014.
[45]
R. Vuduc and H. Moon. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure. In Proceedings of the High Performance Computing and Communications, volume 3726 of LNCS, pages 807–816. Springer, 2005. ISBN 978-3-540-29031-5.
[46]
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series, 16(1):521–530, 2005.
[47]
R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, 2003.
[48]
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing, 35(3):178 – 194, 2009.
[49]
S. Williams, N. Bell, J. Choi, M. Garland, L. Oliker, and R. Vuduc. Sparse matrix vector multiplication on multicore and accelerator systems. In J. Kurzak, D. A. Bader, and J. Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press, 2010.
[50]
B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, 2013.



      Published In

      ACM SIGPLAN Notices, Volume 50, Issue 6 (PLDI '15), June 2015. 630 pages.
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2813885
      Editor: Andy Gill

      PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. 630 pages.
      ISBN: 9781450334686
      DOI: 10.1145/2737924
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 03 June 2015
      Published in SIGPLAN Volume 50, Issue 6


      Author Tags

      1. inspector/executor
      2. loop transformations
      3. non-affine
      4. polyhedral model
      5. sparse matrices

      Qualifiers

      • Research-article


      Article Metrics

      • Downloads (last 12 months): 278
      • Downloads (last 6 weeks): 41

      Reflects downloads up to 27 Jan 2025.


      Cited By

      • (2023) The Sparse Abstract Machine. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 710–726. DOI: 10.1145/3582016.3582051. Published 25-Mar-2023.
      • (2023) Code Synthesis for Sparse Tensor Format Conversion and Optimization. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pages 28–40. DOI: 10.1145/3579990.3580021. Published 17-Feb-2023.
      • (2023) Trading Throughput for Freshness: Freshness-aware Traffic Engineering and In-Network Freshness Control. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 8(1-2):1–26. DOI: 10.1145/3576919. Published 7-Mar-2023.
      • (2023) Pulsed Power Load Coordination in Mission- and Time-critical Cyber-physical Systems. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 8(1-2):1–27. DOI: 10.1145/3573197. Published 7-Mar-2023.
      • (2023) Bibliography. Engineering a Compiler, pages 793–813. DOI: 10.1016/B978-0-12-815412-0.00023-1. Published 2023.
      • (2023) Overview of Compilation. Engineering a Compiler, pages 1–26. DOI: 10.1016/B978-0-12-815412-0.00007-3. Published 2023.
      • (2023) Sgap: towards efficient sparse tensor algebra compilation for GPU. CCF Transactions on High Performance Computing, 5(2):210–227. DOI: 10.1007/s42514-023-00140-4. Published 8-May-2023.
      • (2023) Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. Algorithms and Architectures for Parallel Processing, pages 22–39. DOI: 10.1007/978-981-97-0801-7_2. Published 20-Oct-2023.
      • (2022) AutoML Loss Landscapes. ACM Transactions on Evolutionary Learning and Optimization, 2(3):1–30. DOI: 10.1145/3558774. Published 22-Nov-2022.
      • (2022) Vectorizing SpMV by Exploiting Dynamic Regular Patterns. Proceedings of the 51st International Conference on Parallel Processing, pages 1–12. DOI: 10.1145/3545008.3545042. Published 29-Aug-2022.
