
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies

Published: 01 April 2007

Abstract

With the diverging improvements in CPU speeds and memory access latencies, detecting and removing memory access bottlenecks becomes increasingly important. In this work, we present METRIC, a software framework for isolating and understanding such bottlenecks using partial access traces. METRIC extracts access traces from executing programs without special compiler or linker support. We make four primary contributions. First, we present a framework for extracting partial access traces based on dynamic binary rewriting of the executing application. Second, we introduce a novel algorithm for compressing these traces. The algorithm generates constant-space representations for regular accesses occurring in nested loop structures. Third, we use these traces for offline incremental memory hierarchy simulation. We extract symbolic information from the application executable and use it to generate detailed statistics correlated with the source code, including per-reference metrics, cache evictor information, and stream metrics. Finally, we demonstrate how this information can be used to isolate and understand memory access inefficiencies. This illustrates a potential advantage of METRIC over compile-time analysis for sample codes, particularly when interprocedural analysis is required.
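
To make the idea of a constant-space representation for regular accesses concrete, the following is a minimal sketch and not the compression algorithm from the article. It assumes a trace is simply a list of integer addresses and folds each constant-stride run into a single (start, stride, count) record. The names `StrideRun`, `compress`, and `expand` are hypothetical and chosen for illustration. The sketch handles only a single loop level; it does not capture the nested loop structures that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class StrideRun:
    """Hypothetical record for `count` addresses: start, start + stride, ..."""
    start: int
    stride: int
    count: int


def compress(addresses: Iterable[int]) -> List[StrideRun]:
    """Greedily fold consecutive addresses with a constant stride into runs."""
    runs: List[StrideRun] = []
    for addr in addresses:
        if runs:
            last = runs[-1]
            if last.count == 1:
                # The second address of a run fixes the run's stride.
                last.stride = addr - last.start
                last.count = 2
                continue
            if addr == last.start + last.stride * last.count:
                # The address continues the current stride pattern.
                last.count += 1
                continue
        # Start a new run; the stride is unknown until the next address.
        runs.append(StrideRun(start=addr, stride=0, count=1))
    return runs


def expand(runs: Iterable[StrideRun]) -> List[int]:
    """Reconstruct the original address sequence from the runs."""
    return [r.start + i * r.stride for r in runs for i in range(r.count)]


# A loop reading 1000 consecutive 8-byte elements collapses to one record,
# regardless of the trip count.
trace = [0x1000 + 8 * i for i in range(1000)]
runs = compress(trace)
```

Under these assumptions, the size of the compressed form depends on the number of distinct stride patterns rather than on the number of accesses, and `expand(runs)` reproduces the input trace exactly.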

Published In

ACM Transactions on Programming Languages and Systems, Volume 29, Issue 2
April 2007
327 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/1216374

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Dynamic binary rewriting
2. cache analysis
3. data trace compression
4. data trace generation
5. program instrumentation
