Research article
DOI: 10.1145/2304576.2304584

Fast loop-level data dependence profiling

Published: 25 June 2012

Abstract

Execution-driven data dependence profiling has gained significant interest as a tool to compensate for the weaknesses of static data dependence analysis. Although such dependence profiling is valid only for specific inputs, its results can be used in many ways for program parallelization. Unfortunately, traditional hash-based dependence profiling can consume enormous amounts of memory and machine time, which severely limits its practical use. In this paper, we propose new compiler-based techniques to perform fast loop-level data dependence profiling. First, using type consistency and alias information, our compiler embeds memory tags into the data structures of the original program so that memory addresses can be compared efficiently for dependence testing. This approach avoids the bytewise hashing overhead of conventional profiling methods. Second, we prove that a partial dependence graph obtained from profiling is sufficient for loop-level reordering transformations and parallelization. Such a partial dependence graph can be obtained very quickly, without exhaustively enumerating all dependence edges. Third, our compiler partitions the profiling task into independent slices. These slices can be profiled in parallel, producing subgraphs that the compiler eventually combines automatically into the complete data dependence graph. Experiments show that these techniques significantly reduce memory use and shorten profiling time (by an order of magnitude for several SPEC2006 benchmarks). Benchmarks too big to profile at all loop levels with previous methods can now be profiled fully within several hours.
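
To make the baseline concrete, the following is a minimal Python sketch of the conventional hash-based approach that the abstract contrasts against. It is not the authors' implementation and does not reproduce their memory-tag technique. The names `Access`, `profile_loop`, `example_trace`, and the `keep` filter are hypothetical, and splitting the work by address parity is only an assumed stand-in for the compiler-generated slices mentioned above.

```python
from collections import namedtuple

# One memory access in a recorded loop trace (field names are illustrative).
Access = namedtuple("Access", "iteration stmt kind addr")  # kind: "R" or "W"


def profile_loop(trace, keep=lambda addr: True):
    """Return a set of (src_stmt, dst_stmt, dep_type, loop_carried) edges.

    `trace` yields Access records in execution order. `keep` selects the
    addresses this run is responsible for (an assumed way to form a slice).
    """
    last_write = {}  # addr -> (iteration, stmt) of the most recent write
    last_reads = {}  # addr -> reads seen since that write
    edges = set()
    for acc in trace:
        if not keep(acc.addr):
            continue
        w = last_write.get(acc.addr)
        if acc.kind == "R":
            if w is not None:  # read after write: flow dependence
                edges.add((w[1], acc.stmt, "RAW", w[0] != acc.iteration))
            last_reads.setdefault(acc.addr, []).append((acc.iteration, acc.stmt))
        else:
            # write after read: anti dependence on every pending read
            for it, stmt in last_reads.pop(acc.addr, []):
                edges.add((stmt, acc.stmt, "WAR", it != acc.iteration))
            if w is not None:  # write after write: output dependence
                edges.add((w[1], acc.stmt, "WAW", w[0] != acc.iteration))
            last_write[acc.addr] = (acc.iteration, acc.stmt)
    return edges


def example_trace(n=4):
    # Models `for i in range(n): a[i + 1] = a[i] + 1` with one statement S1.
    for i in range(n):
        yield Access(i, "S1", "R", i)
        yield Access(i, "S1", "W", i + 1)


full = profile_loop(example_trace())
# Two slices over disjoint address sets; each could run in a separate process.
even = profile_loop(example_trace(), keep=lambda a: a % 2 == 0)
odd = profile_loop(example_trace(), keep=lambda a: a % 2 == 1)
print(full)
print(even | odd == full)
```

Because every dependence in this sketch involves a single address, profiling disjoint address sets separately and taking the union of the edge sets gives the same graph as one full run. The per-address dictionaries also show where the memory cost of the hash-based baseline comes from: their size grows with the number of distinct addresses touched.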

Published In

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing
June 2012
400 pages
ISBN:9781450313162
DOI:10.1145/2304576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data dependence
  2. instrumentation
  3. profiling
  4. software parallelization

Conference

ICS'12: International Conference on Supercomputing
June 25 - 29, 2012
San Servolo Island, Venice, Italy

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2024) PROMPT: A Fast and Extensible Memory Profiling Framework. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (449-473). DOI: 10.1145/3649827. Online publication date: 29-Apr-2024
  • (2022) Accelerating Data Dependence Profiling Through Abstract Interpretation of Loop Instructions. IEEE Access 10 (31626-31640). DOI: 10.1109/ACCESS.2022.3160729. Online publication date: 2022
  • (2021) Loop parallelization using dynamic commutativity analysis. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (150-161). DOI: 10.1109/CGO51591.2021.9370319. Online publication date: 27-Feb-2021
  • (2018) Generalized profile-guided iterator recognition. Proceedings of the 27th International Conference on Compiler Construction (185-195). DOI: 10.1145/3178372.3179511. Online publication date: 24-Feb-2018
  • (2018) Distributed Parallelizability Analysis of Legacy Code. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (103-110). DOI: 10.1109/BDCloud.2018.00028. Online publication date: Dec-2018
  • (2017) Context-Aware Memory Profiling for Speculative Parallelism. 2017 IEEE 24th International Conference on High Performance Computing (HiPC) (328-337). DOI: 10.1109/HiPC.2017.00045. Online publication date: Dec-2017
  • (2017) A hybrid sample generation approach in speculative multithreading. The Journal of Supercomputing. DOI: 10.1007/s11227-017-2118-3. Online publication date: 7-Aug-2017
  • (2016) DSspirit. The Journal of Supercomputing 72:2 (770-788). DOI: 10.1007/s11227-015-1612-8. Online publication date: 1-Feb-2016
  • (2014) Variability of data dependences and control flow. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (180-189). DOI: 10.1109/ISPASS.2014.6844482. Online publication date: Mar-2014
  • (2014) Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code. Compiler Construction (154-173). DOI: 10.1007/978-3-642-54807-9_9. Online publication date: 2014
