Memory Referencing Behavior in Compiler-Parallelized Applications

International Journal of Parallel Programming

Abstract

Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) affect memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from the SPEC, NAS, PERFECT, and RICEPS benchmark suites, we measure statistics such as speedups, memory costs, causes of cache misses, cache line utilization, and data traffic. This exploration allows us to draw several conclusions. First, we find that larger-granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second, we show that when long (512-byte) cache lines are used, many of these applications suffer from false sharing and low cache line utilization. Third, we identify some of the common artifacts in compiler-parallelized codes that can lead to false sharing or other types of poor memory system performance, and we suggest methods for improving them. Overall, this study offers both an important snapshot of the behavior of applications compiled by state-of-the-art compilers and an increased understanding of the interplay between cache line size, program granularity, and memory performance in moderate-scale multiprocessors.
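The interplay between cache line size, loop partitioning, and false sharing described above can be illustrated with a small model. The sketch below is not the measurement methodology of the paper; it is a hypothetical toy that assumes a one-dimensional array of 8-byte elements starting at a line-aligned address, that every processor writes all of its own elements, and that no element is owned by two processors. Under those assumptions, any cache line touched by more than one processor is a candidate for false sharing. All function names and parameters are illustrative.

```python
def lines_touched(indices, elem_size, line_size):
    """Return the set of cache line numbers covered by the given element indices."""
    return {(i * elem_size) // line_size for i in indices}


def multiply_owned_lines(partition, elem_size, line_size):
    """Count cache lines touched by more than one processor.

    Because the partitions are disjoint (assumption), every such line is
    shared only through co-location of data, i.e. potential false sharing.
    """
    owners = {}
    for proc, indices in enumerate(partition):
        for line in lines_touched(indices, elem_size, line_size):
            owners.setdefault(line, set()).add(proc)
    return sum(1 for procs in owners.values() if len(procs) > 1)


def line_utilization(indices, elem_size, line_size):
    """Fraction of the bytes in the touched lines that the processor uses."""
    idx = set(indices)
    used = len(idx) * elem_size
    fetched = len(lines_touched(idx, elem_size, line_size)) * line_size
    return used / fetched


def block_partition(n, p):
    """Coarse-grain split: processor k owns one contiguous chunk (n divisible by p)."""
    chunk = n // p
    return [range(k * chunk, (k + 1) * chunk) for k in range(p)]


def cyclic_partition(n, p):
    """Fine-grain split: processor k owns elements k, k+p, k+2p, ..."""
    return [range(k, n, p) for k in range(p)]


if __name__ == "__main__":
    N, P, ELEM, LINE = 1024, 8, 8, 512  # hypothetical sizes; LINE matches the 512-byte case
    for name, part in (("block", block_partition(N, P)),
                       ("cyclic", cyclic_partition(N, P))):
        shared = multiply_owned_lines(part, ELEM, LINE)
        util = line_utilization(part[0], ELEM, LINE)
        print(f"{name}: lines touched by >1 processor = {shared}, "
              f"utilization for processor 0 = {util:.3f}")
```

With these sizes, the contiguous block partition gives each processor whole lines to itself, while the cyclic partition places elements from every processor in every 512-byte line and leaves each processor using one eighth of the bytes in the lines it touches. This is only one way in which finer granularity and long lines can interact; real codes also depend on array alignment, access order, and read versus write behavior.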




Correspondence to Evan Torrie.


Cite this article

Torrie, E., Martonosi, M., Hall, M.W. et al. Memory Referencing Behavior in Compiler-Parallelized Applications. Int J Parallel Prog 24, 349–376 (1996). https://doi.org/10.1007/BF03356754


  • DOI: https://doi.org/10.1007/BF03356754
