DOI: 10.1145/1810085.1810125
research-article

Static reuse distances for locality-based optimizations in MATLAB

Published: 02 June 2010

Abstract

Modeling the memory locality of applications to guide compiler optimizations in a systematic manner is an important unsolved problem, made even more significant by the advent of multi-core and many-core architectures. We describe an approach based on a novel source-level metric, called static reuse distance, to model the memory behavior of applications written in MATLAB. We use MATLAB as a representative language that lets end users express their algorithms precisely, but at a relatively high level. MATLAB's high-level characteristics allow the static analysis to focus on large objects, such as arrays, without losing accuracy due to the processor-specific layout of scalar values in memory. We present an efficient algorithm to compute static reuse distances using an extended version of dependence graphs. Our approach differs from earlier similar attempts in three important respects: it targets high-level programming systems characterized by heavy use of libraries; it works on full programs, instead of being confined to loops; and it integrates practical mechanisms to handle separately compiled procedures as well as pre-compiled library procedures that are available only in binary form.
We study MATLAB code, taken from real programs, to demonstrate the effectiveness of our model. Finally, we present some applications of our approach to program transformations that are known to be important in MATLAB and are expected to be relevant to other similar high-level languages as well.
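
The abstract does not define static reuse distance. For orientation only, the sketch below computes the classical trace-based reuse distance (also called LRU stack distance): the number of distinct other items accessed between two consecutive accesses to the same item. It is not the paper's static, dependence-graph-based algorithm. The function name `reuse_distances` and the use of whole array names as items are assumptions made for illustration.

```python
def reuse_distances(trace):
    """Classical (dynamic) reuse distance for each access in a trace.

    For every access, return the number of distinct other items touched
    since the previous access to the same item, or None for a first access.
    This is an illustrative, quadratic-time sketch, not the paper's method.
    """
    last_seen = {}   # item -> index of its most recent access
    distances = []
    for i, item in enumerate(trace):
        if item in last_seen:
            # Accesses strictly between the previous use and this one.
            between = trace[last_seen[item] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[item] = i
    return distances


# Hypothetical trace of array-level accesses (arrays treated as single items).
print(reuse_distances(["A", "B", "C", "B", "A"]))
```

A small distance suggests the item is likely still in a fast level of the memory hierarchy when it is reused. A static variant, as the abstract describes, would estimate such distances from the source program rather than from an execution trace.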




    Published In

    ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
    June 2010
    365 pages
    ISBN:9781450300186
    DOI:10.1145/1810085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MATLAB
    2. compilers
    3. locality
    4. memory hierarchy

    Qualifiers

    • Research-article

    Conference

    ICS'10
    ICS'10: International Conference on Supercomputing
    June 2 - 4, 2010
    Tsukuba, Ibaraki, Japan

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%


    Cited By

    • (2024) Static Reuse Profile Estimation for Array Applications. Proceedings of the International Symposium on Memory Systems, pp. 235-244. DOI: 10.1145/3695794.3695817. Online publication date: 30-Sep-2024
    • (2023) LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation. Proceedings of the International Symposium on Memory Systems, pp. 1-6. DOI: 10.1145/3631882.3631885. Online publication date: 2-Oct-2023
    • (2020) A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis. ACM Transactions on Design Automation of Electronic Systems, 25(6):1-26. DOI: 10.1145/3398189. Online publication date: 2-Sep-2020
    • (2018) Locality analysis through static parallel sampling. ACM SIGPLAN Notices, 53(4):557-570. DOI: 10.1145/3296979.3192402. Online publication date: 11-Jun-2018
    • (2018) Locality analysis through static parallel sampling. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 557-570. DOI: 10.1145/3192366.3192402. Online publication date: 11-Jun-2018
    • (2017) Thread Data Sharing in Cache. ACM SIGPLAN Notices, 52(8):103-115. DOI: 10.1145/3155284.3018759. Online publication date: 26-Jan-2017
    • (2017) Thread Data Sharing in Cache. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 103-115. DOI: 10.1145/3018743.3018759. Online publication date: 26-Jan-2017
    • (2016) Compiler-Directed Data Locality Optimization in MATLAB. Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, pp. 6-9. DOI: 10.1145/2906363.2906378. Online publication date: 23-May-2016
    • (2014) Performance Metrics and Models for Shared Cache. Journal of Computer Science and Technology, 29(4):692-712. DOI: 10.1007/s11390-014-1460-7. Online publication date: 4-Jul-2014
    • (2013) HOTL. ACM SIGPLAN Notices, 48(4):343-356. DOI: 10.1145/2499368.2451153. Online publication date: 16-Mar-2013
