Limits on the performance benefits of multithreading and prefetching

Published: 15 May 1996
    Abstract

    This paper presents new analytical models of the performance benefits of multithreading and prefetching, and experimental measurements of parallel applications on the MIT Alewife multiprocessor. For the first time, both techniques are evaluated on a real machine as opposed to simulations. The models determine the region in the parameter space where the techniques are most effective, while the measurements determine the region where the applications lie. We find that these regions do not always overlap significantly.

    The multithreading model shows that only 2-4 contexts are necessary to maximize this technique's potential benefit in current multiprocessors. Multithreading improves execution time by less than 10% for most of the applications that we examined. The model also shows that multithreading can significantly improve the performance of the same applications in multiprocessors with longer latencies. Reducing context-switch overhead is not crucial.

    The software prefetching model shows that allowing 4 outstanding prefetches is sufficient to achieve most of this technique's potential benefit on current multiprocessors. Prefetching improves performance over a wide range of parameters, and improves execution time by as much as 20-50% even on current multiprocessors. The two models show that prefetching has a significant advantage over multithreading for machines with low memory latencies and/or applications with high cache miss rates because a prefetch instruction consumes less time than a context-switch.
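
    The abstract's qualitative claims can be illustrated with a toy first-order model. The sketch below is not the model developed in the paper; it is a simplified deterministic approximation, and every name and number in it (run length, miss latency, context-switch cost, prefetch cost) is a hypothetical assumption chosen only for illustration. It assumes a fixed run length between misses, a fixed miss latency, and a bound on overlapped prefetches in the style of Little's law.

```python
"""Toy first-order latency-tolerance models (illustrative assumptions only)."""


def multithreading_utilization(contexts, run_length, latency, switch_cost):
    """Fraction of cycles doing useful work with `contexts` hardware contexts.

    Assumes each context runs `run_length` cycles between misses, each miss
    takes `latency` cycles, and each context switch costs `switch_cost` cycles.
    """
    # Linear region: too few contexts to cover the latency, so the processor idles.
    linear = contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden; only the switch overhead is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


def prefetching_utilization(outstanding, run_length, latency, prefetch_cost):
    """Fraction of cycles doing useful work with `outstanding` prefetches in flight.

    Assumes one prefetch instruction (`prefetch_cost` cycles) per miss and that
    at most `outstanding` misses can overlap, so a miss completes no faster
    than once every latency / outstanding cycles.
    """
    busy = run_length + prefetch_cost
    per_miss = max(busy, latency / outstanding)
    return run_length / per_miss


if __name__ == "__main__":
    # Hypothetical parameters, in cycles; not measurements from any machine.
    R, L, C, P = 40, 120, 10, 2
    for n in range(1, 7):
        mt = multithreading_utilization(n, R, L, C)
        pf = prefetching_utilization(n, R, L, P)
        print(f"n={n}  multithreading={mt:.3f}  prefetching={pf:.3f}")
```

    With these assumed parameters both curves stop improving after three or four contexts or outstanding prefetches. The two ceilings are R/(R+C) for multithreading and R/(R+P) for prefetching, so the gap between them widens as the run length shrinks (higher miss rate) whenever a prefetch is cheaper than a context switch.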

    Published In

    ACM SIGMETRICS Performance Evaluation Review, Volume 24, Issue 1
    May 1996
    273 pages
    ISSN: 0163-5999
    DOI: 10.1145/233008

    SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
    May 1996
    279 pages
    ISBN: 0897917936
    DOI: 10.1145/233013

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 May 1996
    Published in SIGMETRICS Volume 24, Issue 1
