research-article
Open access

Intercepting Functions for Memoization: A Case Study Using Transcendental Functions

Published: 24 June 2015

Abstract

Memoization is the technique of saving the results of executions so that future executions can be omitted when the input set repeats. Memoization has been proposed in previous literature at the instruction, basic-block, and function levels, using hardware as well as pure software-level approaches, including changes to the programming language.
In this article, we focus on software memoization for procedural languages such as C and Fortran at the granularity of a function. We propose a simple linker-based technique for enabling software memoization of any dynamically linked pure function by function interception and illustrate our framework using a set of computationally expensive pure functions: the transcendental functions. Transcendental functions (trigonometric functions, exponential functions, etc.) are those that cannot be expressed in terms of a finite sequence of algebraic operations and hence are computationally expensive.
Our technique does not require source code and thus can be applied even to commercial applications, as well as to applications with legacy code. As far as users are concerned, enabling memoization is as simple as setting an environment variable. Our framework does not make any specific assumptions about the underlying architecture or compiler toolchains and can work with a variety of current architectures. We present experimental results for an x86-64 platform using both gcc and icc compiler toolchains, and an ARM Cortex-A9 platform using gcc. Our experiments include a mix of real-world programs and standard benchmark suites: SPEC and Splash2x. On standard benchmark applications that extensively call the transcendental functions, we report memoization benefits of up to 50% on Intel Ivy Bridge and up to 10% on ARM Cortex-A9.
Memoization was able to regain a performance loss of 76% in bwaves due to a known performance bug in the GNU implementation of the pow function. The same benchmark on ARM Cortex-A9 benefited by more than 200%.
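
The abstract does not describe the table design, so the following is only a minimal sketch of function-level software memoization in C, under stated assumptions. It places a direct-mapped table, indexed by a hash of the argument's bit pattern, in front of a pure function. The names `memo_exp`, `slow_exp`, `memo_table`, and `real_calls`, the table size, and the hash are all hypothetical and chosen for illustration; `slow_exp` is a stand-in for an expensive library routine so that the example is self-contained.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 10                     /* assumed size: 1024 entries */
#define TABLE_SIZE (1u << TABLE_BITS)

/* One slot of a direct-mapped table: argument bits and cached result. */
struct memo_entry {
    uint64_t key;
    double   value;
    int      valid;
};

static struct memo_entry memo_table[TABLE_SIZE];
unsigned long real_calls;                 /* number of real evaluations (misses) */

/* Hypothetical stand-in for an expensive pure function:
 * exp(x) approximated by a truncated Taylor series. */
static double slow_exp(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 200; n++) {
        term *= x / n;
        sum += term;
    }
    real_calls++;
    return sum;
}

/* Use the raw 64-bit pattern as the key so that lookups compare
 * arguments exactly (0.0 and -0.0 are treated as distinct inputs). */
static uint64_t bits_of(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Fold the upper bits into the lower ones, then mask to a table index. */
static unsigned slot_of(uint64_t key)
{
    key ^= key >> 32;
    key ^= key >> 16;
    return (unsigned)(key & (TABLE_SIZE - 1));
}

/* Memoized wrapper: return the cached result on a hit; on a miss,
 * evaluate the function and overwrite whatever occupied the slot. */
double memo_exp(double x)
{
    uint64_t key = bits_of(x);
    struct memo_entry *e = &memo_table[slot_of(key)];

    if (e->valid && e->key == key)
        return e->value;

    e->value = slow_exp(x);
    e->key   = key;
    e->valid = 1;
    return e->value;
}
```

In an interception setting, a wrapper of this kind would carry the name of the library function it replaces and forward misses to the original implementation. On Linux, one common way to arrange this without recompiling is to preload a shared library through the `LD_PRELOAD` environment variable and locate the original symbol with `dlsym(RTLD_NEXT, ...)`; the abstract does not say whether the authors' framework uses exactly this mechanism.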

Supplementary Material

TACO1202-18 (taco1202-18.pdf)
Slide deck associated with this paper

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 12, Issue 2
    July 2015
    410 pages
    ISSN: 1544-3566
    EISSN: 1544-3973
    DOI: 10.1145/2775085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2015
    Accepted: 01 March 2015
    Revised: 01 January 2015
    Received: 01 August 2014
    Published in TACO Volume 12, Issue 2

    Author Tags

    1. Link time optimization
    2. function interception

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • European Research Council Advanced
