research-article
Open access

Fast linear programming through transprecision computing on small and sparse data

Published: 13 November 2020

Abstract

A plethora of program analysis and optimization techniques rely on linear programming at their heart. However, such techniques are often considered too slow for production use. While today’s best solvers are optimized for complex problems with thousands of dimensions, linear programming, as used in compilers, is typically applied to small and seemingly trivial problems, but to many instances in a single compilation run. As a result, compilers do not benefit from decades of research on optimizing large-scale linear programming. We design a simplex solver targeted at compilers. A novel theory of transprecision computation applied from individual elements to full data-structures provides the computational foundation. By carefully combining it with optimized representations for small and sparse matrices and specialized small-coefficient algorithms, we (1) reduce memory traffic, (2) exploit wide vectors, and (3) use low-precision arithmetic units effectively. We evaluate our work by embedding our solver into a state-of-the-art integer set library and implement one essential operation, coalescing, on top of our transprecision solver. Our evaluation shows more than an order-of-magnitude speedup on the core simplex pivot operation and a mean speedup of 3.2x (vs. GMP) and 4.6x (vs. IMath) for the optimized coalescing operation. Our results demonstrate that our optimizations exploit the wide SIMD instructions of modern microarchitectures effectively. We expect our work to provide foundations for a future integer set library that uses transprecision arithmetic to accelerate compiler analyses.
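To make the transprecision idea in the abstract concrete, here is a minimal sketch under stated assumptions. It is not the paper's solver. It performs one fraction-free row elimination (the kind of update a simplex pivot applies to each row) at the narrowest integer width that does not overflow, and moves to a wider width only when needed. The names `eliminate_row` and `transprecision_eliminate`, the 16/32/64-bit ladder, and the use of Python's unbounded integers as a stand-in for GMP or IMath are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: the function names and the precision ladder are
# assumptions for illustration, not the paper's implementation.

class PrecisionOverflow(Exception):
    """Raised when a value does not fit the current fixed-width integer type."""


def fits(value, bits):
    """True if value is representable as a signed two's-complement integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def eliminate_row(row, pivot_row, col, bits):
    """Fraction-free elimination of column `col` from `row` at a given width.

    Computes pivot_row[col] * row[j] - row[col] * pivot_row[j] for every j and
    raises PrecisionOverflow if a product or a result leaves the range.
    """
    p, a = pivot_row[col], row[col]
    out = []
    for r, q in zip(row, pivot_row):
        left, right = p * r, a * q
        value = left - right
        if not (fits(left, bits) and fits(right, bits) and fits(value, bits)):
            raise PrecisionOverflow(bits)
        out.append(value)
    return out


def transprecision_eliminate(row, pivot_row, col, ladder=(16, 32, 64)):
    """Try the narrowest width first; fall back to wider ones on overflow.

    Returns (new_row, bits_used). bits_used is None when only Python's
    arbitrary-precision integers (standing in for GMP/IMath) succeed.
    """
    for bits in ladder:
        try:
            return eliminate_row(row, pivot_row, col, bits), bits
        except PrecisionOverflow:
            continue
    p, a = pivot_row[col], row[col]
    return [p * r - a * q for r, q in zip(row, pivot_row)], None


# Small coefficients stay at 16 bits; larger ones climb the ladder.
print(transprecision_eliminate([2, 3, 1], [1, 4, 2], 0))
print(transprecision_eliminate([300, 1000], [200, 500], 0))
```

In this sketch the precision is chosen per row rather than per element, which loosely mirrors the abstract's point about applying transprecision to whole data structures; a real implementation would use hardware integer types and vector instructions instead of range checks on Python integers.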

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p287-p-video.mp4)

Cited By

  • (2021) Dynamic Compilation for Transprecision Applications on Heterogeneous Platform. Journal of Low Power Electronics and Applications 11:3 (28). DOI: 10.3390/jlpea11030028. Online publication date: 29-Jun-2021
  • (2021) FPL: fast Presburger arithmetic through transprecision. Proceedings of the ACM on Programming Languages 5:OOPSLA (1-26). DOI: 10.1145/3485539. Online publication date: 15-Oct-2021

Published In

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN: 2475-1421
DOI: 10.1145/3436718
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Linear Programming
  2. Presburger Arithmetic
  3. Simplex
  4. Transprecision

Qualifiers

  • Research-article

Funding Sources

  • Schweizerischer Nationalfonds zur Foerderung der Wissenschaftlichen Forschung
