Memory Referencing Behavior in Compiler-Parallelized Applications

Torrie, Evan; Martonosi, Margaret; Hall, Mary W.; Tseng, Chau-Wen

doi:10.1007/BF03356754


Published: 26 May 2016

Volume 24, pages 349–376 (1996)

International Journal of Parallel Programming



Abstract

Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from the SPEC, NAS, PERFECT, and RICEPS benchmark suites, we measure statistics such as speedups, memory costs, causes of cache misses, cache line utilization, and data traffic. This exploration allows us to draw several conclusions. First, we find that larger-granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second, we show that when long (512-byte) cache lines are used, many of these applications suffer from false sharing and low cache line utilization. Third, we identify some of the common artifacts in compiler-parallelized codes that can lead to false sharing or other types of poor memory system performance, and we suggest methods for improving them. Overall, this study offers both an important snapshot of the behavior of applications compiled by state-of-the-art compilers and an increased understanding of the interplay between cache line size, program granularity, and memory performance in moderate-scale multiprocessors.
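The abstract's second conclusion, that long cache lines expose false sharing and lower cache line utilization, can be made concrete with a small model. The Python sketch below is illustrative only and is not the measurement methodology used in the paper: it assumes a hypothetical parallel loop that writes each element of a one-dimensional array exactly once, with iterations statically assigned to processors either in contiguous blocks (coarse grain) or cyclically (fine grain). The function `analyze_partition` and its parameters are hypothetical names introduced here. A line is counted as falsely shared when more than one processor writes to it, and utilization is the average fraction of each fetched line's bytes that the fetching processor actually touches.

```python
from collections import defaultdict


def analyze_partition(n_elems, elem_bytes, n_procs, line_bytes, schedule="block"):
    """Toy model (an assumption, not the paper's method) of a parallel loop
    that writes a[i] once for each i in range(n_elems), with the array
    assumed to start on a cache line boundary.

    Returns (false_shared_lines, total_lines, utilization), where
    utilization is the mean fraction of each fetched line's bytes that the
    fetching processor actually touches.
    """
    chunk = -(-n_elems // n_procs)  # ceiling division: contiguous block size

    def owner(i):
        # "block": contiguous chunks per processor (coarse grain);
        # "cyclic": element i goes to processor i % n_procs (fine grain).
        return i // chunk if schedule == "block" else i % n_procs

    writers = defaultdict(set)      # line index -> processors writing to it
    used_bytes = defaultdict(int)   # (processor, line) -> bytes it touches
    for i in range(n_elems):
        proc = owner(i)
        start, end = i * elem_bytes, (i + 1) * elem_bytes
        pos = start
        while pos < end:  # split the element at cache line boundaries
            line = pos // line_bytes
            take = min(end, (line + 1) * line_bytes) - pos
            writers[line].add(proc)
            used_bytes[(proc, line)] += take
            pos += take

    # Each element has a single writer, so any line with several writers is
    # falsely shared: processors contend for the line, never for the same data.
    false_shared = sum(1 for procs in writers.values() if len(procs) > 1)
    utilization = sum(used_bytes.values()) / (len(used_bytes) * line_bytes)
    return false_shared, len(writers), utilization


if __name__ == "__main__":
    # Hypothetical example: 1000 eight-byte elements over 8 processors.
    for schedule in ("block", "cyclic"):
        for line_bytes in (64, 512):
            fs, total, util = analyze_partition(1000, 8, 8, line_bytes, schedule)
            print(f"{schedule:6} {line_bytes:3}B lines: "
                  f"{fs}/{total} falsely shared, utilization {util:.2f}")
```

Under these assumptions and parameters, cyclic scheduling puts every processor on every line, so all lines are falsely shared at either line size. With block scheduling, the seven processor boundaries fall mid-line and falsely share 7 of 125 lines with 64-byte lines but 7 of only 16 lines with 512-byte lines, so a much larger fraction of the array sits in contended lines. Real codes add effects this toy omits, such as reads, replacement misses, and arrays that do not start on a line boundary.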



Author information

Authors and Affiliations

Computer Systems Laboratory, Stanford University, Stanford, USA
Evan Torrie
Department of Electrical Engineering, Princeton University, Princeton, USA
Margaret Martonosi
Department of Computer Science, California Institute of Technology, Pasadena, USA
Mary W. Hall
Department of Computer Science, University of Maryland, College Park, USA
Chau-Wen Tseng


Corresponding author

Correspondence to Evan Torrie.


About this article

Cite this article

Torrie, E., Martonosi, M., Hall, M.W. et al. Memory Referencing Behavior in Compiler-Parallelized Applications. Int J Parallel Prog 24, 349–376 (1996). https://doi.org/10.1007/BF03356754


Issue Date: August 1996
DOI: https://doi.org/10.1007/BF03356754

