Research article · Public Access

Loop and data transformations for sparse matrix code

Published: 03 June 2015

Abstract

This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations to generate code that is on average within 5% of, and often exceeds, the performance of the manually-tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI's and perform comparably to CUSP's.
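The inspector/executor pattern the abstract relies on can be illustrated with a small sketch. This is our own illustrative example, not the paper's generated code: a run-time inspector examines a CSR matrix and packs it into a padded ELL-style layout, and an executor then performs sparse matrix-vector multiplication over the transformed data with a regular inner loop. All function names here are hypothetical.

```python
def inspector_csr_to_ell(rowptr, col, val, n_rows):
    """Inspector: inspect the CSR nonzero structure at run time and
    build a padded ELL-style representation (regular row width)."""
    width = max(rowptr[i + 1] - rowptr[i] for i in range(n_rows))
    ell_col = [[0] * width for _ in range(n_rows)]    # padded column indices
    ell_val = [[0.0] * width for _ in range(n_rows)]  # padded values (zeros are harmless)
    for i in range(n_rows):
        for k, j in enumerate(range(rowptr[i], rowptr[i + 1])):
            ell_col[i][k] = col[j]
            ell_val[i][k] = val[j]
    return width, ell_col, ell_val

def executor_ell_spmv(n_rows, width, ell_col, ell_val, x):
    """Executor: SpMV over the transformed representation; the inner
    trip count is now uniform, which is what enables vectorization."""
    y = [0.0] * n_rows
    for k in range(width):
        for i in range(n_rows):
            y[i] += ell_val[i][k] * x[ell_col[i][k]]
    return y

# 3x3 example matrix: [[10, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
rowptr, col, val = [0, 2, 3, 5], [0, 2, 1, 0, 2], [10.0, 2.0, 3.0, 4.0, 5.0]
w, ec, ev = inspector_csr_to_ell(rowptr, col, val, 3)
print(executor_ell_spmv(3, w, ec, ev, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 9.0]
```

The split matters because the inspector's cost is paid once per matrix structure, while the executor runs every iteration of a solver; the paper's contribution is generating both automatically from the original code.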

References

[1]
H. M. Aktulga, A. Buluç, S. Williams, and C. Yang. Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations. In Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS ’14, 2014.
[2]
R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers, 2002.
[3]
C. Ancourt and F. Irigoin. Scanning polyhedra with DO loops. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Apr. 1991.
[4]
G. Arnold, J. Hölzl, A. S. Köksal, R. Bodík, and M. Sagiv. Specifying and verifying sparse matrix codes. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP), ICFP ’10, pages 249–260, New York, NY, USA, 2010. ACM.
[5]
S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.1, Argonne National Laboratory, 2010.
[6]
A. Basumallik and R. Eigenmann. Optimizing irregular shared-memory applications for distributed-memory systems. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, 2006.
[7]
M. Belgin, G. Back, and C. J. Ribbens. Pattern-based Sparse Matrix Representation for Memory-Efficient SMVM Kernels. In ICS ’09, pages 100–109, 2009.
[8]
N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of SC ’09, Nov. 2009.
[9]
A. Bik and H. Wijshoff. On automatic data structure selection and code generation for sparse computations. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, volume 768 of Lecture Notes in Computer Science, pages 57–75. Springer Berlin Heidelberg, 1994.
[10]
A. Bik and H. A. Wijshoff. Advanced compiler optimizations for sparse computations. In Supercomputing ’93 Proceedings, pages 430–439, Nov. 1993.
[11]
A. J. C. Bik. Compiler Support for Sparse Matrix Computations. PhD thesis, Leiden University, May 1996.
[12]
A. J. C. Bik and H. A. G. Wijshoff. Automatic data structure selection and transformation for sparse matrix computations. IEEE Trans. Parallel Distrib. Syst., 7(2):109–126, 1996.
[13]
A. Buluç and J. R. Gilbert. Highly parallel sparse matrix-matrix multiplication. CoRR, abs/1006.2183, 2010.
[14]
A. Buluç and J. R. Gilbert. The Combinatorial BLAS: design, implementation, and applications, 2010.
[15]
C. Chen. Polyhedra scanning revisited. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI ’12, pages 499–508, June 2012.
[16]
T. Davis. The University of Florida Sparse Matrix Collection. NA Digest, 97, 1997.
[17]
C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 229–241, New York, NY, USA, May 1999. ACM.
[18]
P. Feautrier. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, pages 79–103, 1996.
[19]
M. Hall, J. Chame, J. Shin, C. Chen, G. Rudy, and M. M. Khan. Loop transformation recipes for code generation and auto-tuning. In LCPC, October 2009.
[20]
H. Han and C.-W. Tseng. Exploiting locality for irregular scientific codes. IEEE Transactions on Parallel and Distributed Systems, 17(7): 606–618, 2006.
[21]
W. A. Kelly. Optimization within a Unified Transformation Framework. PhD thesis, University of Maryland, Dec. 1996.
[22]
V. Kotlyar and K. Pingali. Sparse code generation for imperfectly nested loops with dependences. In Proceedings of the 11th International Conference on Supercomputing, ICS ’97, 1997.
[23]
V. Kotlyar, K. Pingali, and P. Stodghill. A relational approach to the compilation of sparse matrix programs. In C. Lengauer, M. Griebl, and S. Gorlatch, editors, Euro-Par’97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1997.
[24]
Y. Lin and D. Padua. Compiler analysis of irregular memory accesses. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, May 2000.
[25]
X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey. Efficient sparse matrix-vector multiplication on x86-based many-core processors. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pages 273–282, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2130-3.
[26]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716–727, Apr. 2012. ISSN 2150-8097.
[27]
D. Lowell, J. Godwin, J. Holewinski, D. Karthik, C. Choudary, A. Mametjanov, B. Norris, G. Sabin, P. Sadayappan, and J. Sarich. Stencil-aware GPU optimization of iterative solvers. SIAM J. Scientific Computing, 2013.
[28]
N. Mateev, K. Pingali, P. Stodghill, and V. Kotlyar. Next-generation generic programming and its application to sparse matrix computations. In Proceedings of the 14th International Conference on Supercomputing, pages 88–99, Santa Fe, New Mexico, USA, May 2000.
[29]
J. Mellor-Crummey and J. Garvin. Optimizing sparse matrix-vector product computations using unroll and jam. International Journal of High Performance Computing Applications, 18(2):225–236, 2004.
[30]
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications using data and computation reorderings. International Journal of Parallel Programming, 29(3):217–247, 2001.
[31]
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nico, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the 2nd International Conference on Supercomputing, pages 140–152, 1988.
[32]
N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 192–202, October 1999.
[33]
C. Oancea and L. Rauchwerger. A hybrid approach to proving memory reference monotonicity. In S. Rajopadhye and M. Mills Strout, editors, Languages and Compilers for Parallel Computing, volume 7146 of Lecture Notes in Computer Science, pages 61–75. Springer Berlin Heidelberg, 2013.
[34]
W. Pugh and T. Shpeisman. SIPR: A new framework for generating efficient code for sparse matrix computations. In Proceedings of the Eleventh International Workshop on Languages and Compilers for Parallel Computing, Chapel Hill, North Carolina, August 1998.
[35]
F. Quilleré and S. Rajopadhye. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming, 28 (5):469–498, Oct. 2000.
[36]
L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, PLDI ’95, 1995.
[37]
M. Ravishankar, J. Eisenlohr, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. Code generation for parallel execution of a class of irregular loops on distributed memory systems. In Proceedings of SC’12, November 2012.
[38]
G. Rudy, M. M. Khan, M. Hall, C. Chen, and J. Chame. A programming language interface to describe transformations and code generation. In Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing, LCPC’10, pages 136–150, Berlin, Heidelberg, 2011. Springer-Verlag.
[39]
J. Saltz, C. Chang, G. Edjlali, Y.-S. Hwang, B. Moon, R. Ponnusamy, S. Sharma, A. Sussman, M. Uysal, G. Agrawal, R. Das, and P. Havlak. Programming irregular applications: Runtime support, compilation and tools. Advances in Computers, 45:105–153, 1997.
[40]
M. Shantharam, A. Chatterjee, and P. Raghavan. Exploiting dense substructures for fast sparse matrix vector multiplication. Int. J. High Perform. Comput. Appl., 25(3):328–341, Aug. 2011.
[41]
M. M. Strout, A. LaMielle, L. Carter, J. Ferrante, B. Kreaseck, and C. Olschanowsky. An approach for code generation in the sparse polyhedral framework. Technical Report CS-13-109, Colorado State University, December 2013.
[42]
H. van der Spek and H. Wijshoff. Sublimation: Expanding data structures to enable data instance specific optimizations. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC), Lecture Notes in Computer Science, pages 106–120. Springer Berlin / Heidelberg, 2010.
[43]
N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In Proceedings of the 15th International Conference on Compiler Construction, Mar. 2006.
[44]
A. Venkat, M. Shantharam, M. Hall, and M. M. Strout. Non-affine extensions to polyhedral code generation. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, 2014.
[45]
R. Vuduc and H. Moon. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure. In Proceedings of the High Performance Computing and Communications, volume 3726 of LNCS, pages 807–816. Springer, 2005. ISBN 978-3-540-29031-5.
[46]
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series, 16(1):521–530, 2005.
[47]
R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, 2003.
[48]
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing, 35(3):178 – 194, 2009.
[49]
S. Williams, N. Bell, J. Choi, M. Garland, L. Oliker, and R. Vuduc. Sparse matrix vector multiplication on multicore and accelerator systems. In J. Kurzak, D. A. Bader, and J. Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press, 2010.
[50]
B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, 2013.



      Published In

      ACM SIGPLAN Notices, Volume 50, Issue 6 (PLDI '15), June 2015. 630 pages.
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2813885
      Editor: Andy Gill

      PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015. 630 pages.
      ISBN: 9781450334686
      DOI: 10.1145/2737924
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 03 June 2015
      Published in SIGPLAN Volume 50, Issue 6


      Author Tags

      1. inspector/executor
      2. loop transformations
      3. non-affine
      4. polyhedral model
      5. sparse matrices

      Qualifiers

      • Research-article


      Article Metrics

      • Downloads (last 12 months): 278
      • Downloads (last 6 weeks): 41

      Reflects downloads up to 27 Jan 2025.


      Cited By

      • (2023) The Sparse Abstract Machine. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 710–726. DOI: 10.1145/3582016.3582051. Published 25-Mar-2023.
      • (2023) Code Synthesis for Sparse Tensor Format Conversion and Optimization. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pages 28–40. DOI: 10.1145/3579990.3580021. Published 17-Feb-2023.
      • (2023) Trading Throughput for Freshness: Freshness-aware Traffic Engineering and In-Network Freshness Control. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 8(1-2):1–26. DOI: 10.1145/3576919. Published 7-Mar-2023.
      • (2023) Pulsed Power Load Coordination in Mission- and Time-critical Cyber-physical Systems. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 8(1-2):1–27. DOI: 10.1145/3573197. Published 7-Mar-2023.
      • (2023) Bibliography. Engineering a Compiler, pages 793–813. DOI: 10.1016/B978-0-12-815412-0.00023-1. Published 2023.
      • (2023) Overview of Compilation. Engineering a Compiler, pages 1–26. DOI: 10.1016/B978-0-12-815412-0.00007-3. Published 2023.
      • (2023) Sgap: towards efficient sparse tensor algebra compilation for GPU. CCF Transactions on High Performance Computing, 5(2):210–227. DOI: 10.1007/s42514-023-00140-4. Published 8-May-2023.
      • (2023) Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. Algorithms and Architectures for Parallel Processing, pages 22–39. DOI: 10.1007/978-981-97-0801-7_2. Published 20-Oct-2023.
      • (2022) AutoML Loss Landscapes. ACM Transactions on Evolutionary Learning and Optimization, 2(3):1–30. DOI: 10.1145/3558774. Published 22-Nov-2022.
      • (2022) Vectorizing SpMV by Exploiting Dynamic Regular Patterns. Proceedings of the 51st International Conference on Parallel Processing, pages 1–12. DOI: 10.1145/3545008.3545042. Published 29-Aug-2022.
