
The effect of sharing on the cache and bus performance of parallel programs

Published: 01 April 1989
    Abstract

    Bus bandwidth ultimately limits the performance, and therefore the scale, of bus-based, shared memory multiprocessors. Previous studies have extrapolated from uniprocessor measurements and simulations to estimate the performance of these machines. In this study, we use traces of parallel programs to evaluate the cache and bus performance of shared memory multiprocessors, in which coherency is maintained by a write-invalidate protocol. In particular, we analyze the effect of sharing overhead on cache miss ratio and bus utilization.
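The simulation setup described above (trace-driven, coherency maintained by write-invalidate, miss ratio and bus traffic as metrics) can be sketched in a few dozen lines. This is a minimal illustration and not the authors' simulator. It assumes per-processor direct-mapped write-back caches, a hypothetical trace format of `(processor_id, op, byte_address)` tuples, and a deliberately simple protocol in which every write to a clean block broadcasts an invalidation. A miss counts as a sharing miss when the cache set last held the same block and lost it to another processor's write. `simulate` and its parameters are illustrative names.

```python
# Minimal trace-driven sketch (assumptions: direct-mapped write-back caches,
# trace entries are (processor_id, op, byte_address) with op "R" or "W").


def _invalidate_others(caches, invalidated, writer, index, tag, block):
    """Remove every other processor's copy of `block` and remember the loss."""
    for q, cache in enumerate(caches):
        if q == writer:
            continue
        line = cache.get(index)
        if line is not None and line[0] == tag:
            del cache[index]
            invalidated[q][index] = block


def simulate(trace, num_procs, cache_bytes=1024, block_bytes=16):
    num_sets = cache_bytes // block_bytes
    caches = [{} for _ in range(num_procs)]       # set index -> [tag, dirty]
    invalidated = [{} for _ in range(num_procs)]  # set index -> block lost to invalidation
    stats = {"refs": 0, "misses": 0, "sharing_misses": 0, "bus_transactions": 0}

    for proc, op, addr in trace:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        cache = caches[proc]
        line = cache.get(index)
        stats["refs"] += 1

        if line is not None and line[0] == tag:  # hit
            if op == "W" and not line[1]:
                # Write to a clean (possibly shared) copy: broadcast an invalidation.
                stats["bus_transactions"] += 1
                _invalidate_others(caches, invalidated, proc, index, tag, block)
                line[1] = True
            continue

        # Miss. It is a sharing (invalidation) miss if this set last held the
        # same block and lost it to another processor's write.
        stats["misses"] += 1
        if invalidated[proc].pop(index, None) == block:
            stats["sharing_misses"] += 1
        if line is not None and line[1]:
            stats["bus_transactions"] += 1  # write back the dirty victim
        stats["bus_transactions"] += 1      # fetch the block
        if op == "W":
            _invalidate_others(caches, invalidated, proc, index, tag, block)
        else:
            # Assume a dirty owner supplies the data and memory is updated in
            # the same transaction, so remote copies become clean.
            for q, other in enumerate(caches):
                remote = other.get(index)
                if q != proc and remote is not None and remote[0] == tag:
                    remote[1] = False
        cache[index] = [tag, op == "W"]
    return stats


if __name__ == "__main__":
    trace = [(0, "R", 0), (1, "R", 4), (0, "W", 8), (1, "R", 12), (0, "R", 0)]
    print(simulate(trace, num_procs=2, cache_bytes=64, block_bytes=16))
    # {'refs': 5, 'misses': 3, 'sharing_misses': 1, 'bus_transactions': 4}
```

Rerunning the same trace with a different `block_bytes` or `cache_bytes` is one way to see how the sharing component responds to cache configuration. The bus-transaction count is only a rough proxy for bus utilization, because it ignores transaction lengths and timing.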
    Our studies show that parallel programs incur substantially higher miss ratios and bus utilization than comparable uniprocessor programs. The sharing component of these metrics increases proportionally with both cache size and block size, and for some cache configurations it determines both their magnitude and their trend. The amount of overhead depends on the memory reference pattern to the shared data: programs that exhibit good per-processor locality perform better than those with fine-grain sharing. This suggests that parallel software writers and better compiler technology can improve program performance through better memory organization of shared data.
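A toy model can reproduce the block-size trend qualitatively. The sketch below assumes infinite per-processor caches, so every miss is either a cold miss or a sharing miss caused by an invalidation. It also assumes a 4-byte word and a hypothetical SPMD loop in which each process reads and then writes the words it owns. The `blocked` layout stands in for per-processor locality and the `interleaved` layout for fine-grain sharing. Neither layout is one of the paper's workloads, and `spmd_trace` and `count_misses` are illustrative names.

```python
WORD_BYTES = 4  # assumed word size


def spmd_trace(num_procs, words, iterations, layout):
    """Hypothetical SPMD loop: each process reads then writes each word it owns.

    "blocked": process p owns one contiguous chunk (per-processor locality).
    "interleaved": process p owns every num_procs-th word (fine-grain sharing).
    Processes take turns one word at a time to mimic concurrent execution.
    """
    per_proc = words // num_procs
    trace = []
    for _ in range(iterations):
        for i in range(per_proc):
            for p in range(num_procs):
                if layout == "blocked":
                    w = p * per_proc + i
                else:
                    w = i * num_procs + p
                trace.append((p, "R", w * WORD_BYTES))
                trace.append((p, "W", w * WORD_BYTES))
    return trace


def count_misses(trace, num_procs, block_bytes):
    """Infinite caches plus write-invalidate: each miss is cold or a sharing miss."""
    valid = [set() for _ in range(num_procs)]  # blocks each cache currently holds
    lost = [set() for _ in range(num_procs)]   # blocks removed by invalidation
    misses = sharing = 0
    for proc, op, addr in trace:
        block = addr // block_bytes
        if block not in valid[proc]:
            misses += 1
            if block in lost[proc]:
                sharing += 1
                lost[proc].discard(block)
            valid[proc].add(block)
        if op == "W":
            for q in range(num_procs):
                if q != proc and block in valid[q]:
                    valid[q].discard(block)
                    lost[q].add(block)
    return misses, sharing


if __name__ == "__main__":
    for layout in ("blocked", "interleaved"):
        trace = spmd_trace(num_procs=4, words=64, iterations=2, layout=layout)
        for block_bytes in (4, 16, 64):
            misses, sharing = count_misses(trace, 4, block_bytes)
            print(f"{layout:11s} block={block_bytes:2d}B misses={misses:3d} sharing={sharing:3d}")
```

With these parameters, the blocked layout falls from 64 misses to 4 as blocks grow from 4 to 64 bytes, and it never incurs a sharing miss. The interleaved layout rises from 64 misses to 128, and 112 of those are sharing misses at 64-byte blocks. Larger blocks pack more processes' words into each block, so the same writes invalidate more copies.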

    Reviews

    Andrew Robert Huber

    How does the sharing that results from writing an application program as a set of parallel processes affect cache performance? The authors investigate this question for shared-memory multiprocessors with a single bus. They use trace-driven simulation to examine the performance of four applications written explicitly for parallel execution. The parallel programming model is single-program-multiple-data: N processes each execute identical instructions on their own part of the shared data. This corresponds to many real-world applications written for a small number of processors, with each process dedicated to its own processor. The applications are actual CAD programs written for N = 5, 11, 12, and 12 processors. The hardware simulated is RISC-like. The unsurprising answer is an unequivocal “it depends”: the outcome hinges on how the application shares data. Applications whose processes exhibit locality (multiple consecutive writes to shared data within a cache block) behave much like nonparallel programs. Applications with fine-grain sharing (where multiple processes contend for shared data within cache blocks) do not. In either case, cache miss ratios and bus utilization are higher than in nonparallel programs because of the extra misses caused by the invalidations needed to maintain cache consistency. For programs with locality, this shows up as a smaller improvement in the miss ratio as cache block size or total cache size increases. For programs with fine-grain sharing, the extra misses can be large enough to increase the miss ratio at large block or cache sizes. The results for bus utilization are similar. The paper is competently organized and presented. The usual caveats apply: the model and applications, while representative, are limited, and the traces include only application references.

    It would have been interesting to see how the metrics varied with the number of processes. The results will interest designers of caches for shared-memory multiprocessors, as well as programmers who care enough about performance to reorganize applications around cache parameters.
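The reviewer's notion of locality (several consecutive writes by one process to shared data within a cache block) can be measured directly from a trace. The sketch below counts write runs. For this illustration, a write run is assumed to be a sequence of writes to one block by a single process that ends when any other process reads or writes that block. `write_runs` is a hypothetical helper, and the two tiny traces are made up. Long runs indicate per-process locality. Runs of length one indicate fine-grain sharing, where each write is likely to invalidate another cache's copy.

```python
def write_runs(trace, block_bytes):
    """Lengths of write runs: consecutive writes to a block by one process,
    ended when any other process reads or writes that block.
    Trace entries are assumed to be (process_id, op, byte_address)."""
    open_runs = {}  # block -> [writer, run length]
    finished = []
    for proc, op, addr in trace:
        block = addr // block_bytes
        run = open_runs.get(block)
        if run is not None and run[0] != proc:
            finished.append(run[1])  # another process touched the block
            del open_runs[block]
            run = None
        if op == "W":
            if run is None:
                open_runs[block] = [proc, 1]
            else:
                run[1] += 1
    finished.extend(length for _, length in open_runs.values())
    return finished


if __name__ == "__main__":
    # Per-process locality: each process writes only its own block.
    local = [(0, "W", a) for a in (0, 4, 8, 12)] + [(1, "W", a) for a in (16, 20, 24, 28)]
    # Fine-grain sharing: two processes alternate writes within one block.
    fine = [(0, "W", 0), (1, "W", 4), (0, "W", 8), (1, "W", 12)]
    print(write_runs(local, 16))  # [4, 4]
    print(write_runs(fine, 16))   # [1, 1, 1, 1]
```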

    Published In

    ACM SIGARCH Computer Architecture News  Volume 17, Issue 2
    Special issue: Proceedings of ASPLOS-III: the third international conference on architectural support for programming languages and operating systems
    April 1989
    291 pages
    ISSN:0163-5964
    DOI:10.1145/68182
      ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
      April 1989
      303 pages
      ISBN:0897913000
      DOI:10.1145/70082
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

