research-article

Open access

PROMPT: A Fast and Extensible Memory Profiling Framework

Authors:

Sotiris Apostolakis,

Simone Campanoni,

David I. August

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA1

Article No.: 110, Pages 449 - 473

https://doi.org/10.1145/3649827

Published: 29 April 2024

Abstract

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging and tuning and enabling advanced compiler optimizations such as speculation-based automatic parallelization. Because each use case demands its own program trace summary, various types of memory profilers have been developed. Yet designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. As a result, the need for fast and robust profilers often goes unmet. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported to PROMPT with all features preserved. By focusing on the core profiling logic, the ports reduced code size by more than 65% and improved profiling overhead by 5.3× and 7.1×, respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while reducing profiling overhead by more than 90% and improving robustness compared to the original profilers.
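To make "specify profiling events and define the core profiling logic" concrete, the following is a minimal sketch of that division of labor. It is not PROMPT's API: the class `DependenceProfiler`, the `events` tuple, the `on_load`/`on_store` callbacks, and the `replay` driver are all hypothetical names chosen for illustration, and a hand-written trace stands in for real instrumentation. The module subscribes to two events and keeps only the logic that turns them into a flow-dependence summary.

```python
from collections import defaultdict


class DependenceProfiler:
    """Hypothetical profiling module: it declares events and core logic only."""

    # Events this module subscribes to (an assumed convention, not PROMPT's).
    events = ("load", "store")

    def __init__(self):
        self.last_writer = {}         # address -> id of the last store to it
        self.deps = defaultdict(int)  # (store_id, load_id) -> occurrence count

    def on_store(self, inst_id, addr):
        # Remember which instruction most recently wrote this address.
        self.last_writer[addr] = inst_id

    def on_load(self, inst_id, addr):
        # A load that reads a previously written address is a flow dependence.
        src = self.last_writer.get(addr)
        if src is not None:
            self.deps[(src, inst_id)] += 1


def replay(trace, module):
    """Stand-in for a framework: dispatch only the events the module asked for."""
    for kind, inst_id, addr in trace:
        if kind in module.events:
            getattr(module, "on_" + kind)(inst_id, addr)
    return module


# Hand-written trace of (event kind, instruction id, address) tuples.
trace = [
    ("store", 1, 0x10),
    ("load", 2, 0x10),
    ("alloc", 3, 0x20),  # ignored: the module did not subscribe to "alloc"
    ("load", 4, 0x20),   # no earlier store to 0x20, so no dependence
    ("store", 5, 0x10),
    ("load", 2, 0x10),
]
profiler = replay(trace, DependenceProfiler())
print(dict(profiler.deps))
```

In this sketch, everything outside `DependenceProfiler` plays the role the abstract assigns to the framework: event generation and dispatch live in one place, so a different profiler would only swap in a different module.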

Index Terms

PROMPT: A Fast and Extensible Memory Profiling Framework
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program analysis

Published In

Proceedings of the ACM on Programming Languages Volume 8, Issue OOPSLA1

April 2024

1492 pages

EISSN:2475-1421

DOI:10.1145/3554316

Editor:
Michael Hicks
Amazon, USA

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF (National Science Foundation)
DOE U.S. Department of Energy
