The effectiveness of multiple hardware contexts

Published: 01 November 1994

Abstract

    Multithreaded processors are used to tolerate long memory latencies. By executing threads loaded in multiple hardware contexts, an otherwise idle processor can keep busy, thus increasing its utilization. However, the larger working set of multiple threads can increase cache conflict misses. In this paper we evaluate the two phenomena together, examining their combined effect on execution time.
    The usefulness of multiple hardware contexts depends on program data locality, cache organization, and degree of multiprocessing. Multiple hardware contexts are most effective on programs that have been optimized for data locality. For these programs, execution time dropped as contexts were added, across widely varying architectures. With unoptimized applications, multiple contexts had limited value: the best performance was seen with only two contexts, and only on uniprocessors and small multiprocessors. Unlike the optimized programs, the unoptimized applications changed behavior more noticeably with variations in cache associativity and cache hierarchy.
    As a mechanism for exploiting program parallelism, an additional processor is clearly better than another context. However, there were many configurations for which adding a few hardware contexts brought as much performance as, or more than, a larger multiprocessor with fewer than the optimal number of contexts.
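
The trade-off described in the abstract can be illustrated with a first-order utilization model. The sketch below is not the simulation methodology of the paper. It is a minimal analytical toy, assuming a fixed miss latency, a fixed context-switch cost, and a run length between misses that shrinks as more contexts share the cache. All function names, parameters, and numbers are hypothetical.

```python
def run_length(n, base_run, interference):
    """Cycles a context runs between misses (assumed model).

    With one context the run length is base_run; each extra context is
    assumed to shorten it by the hypothetical `interference` factor,
    standing in for added cache conflict misses.
    """
    return base_run / (1.0 + interference * (n - 1))


def utilization(n, base_run, latency, switch_cost, interference):
    """Fraction of cycles doing useful work with n hardware contexts.

    Linear region: n runs fit into one run + switch + miss-latency period.
    Saturated region: the latency is fully hidden and only the switch
    cost is lost. The smaller of the two bounds applies.
    """
    r = run_length(n, base_run, interference)
    linear = n * r / (r + switch_cost + latency)
    saturated = r / (r + switch_cost)
    return min(linear, saturated)


def best_context_count(max_contexts, **params):
    """Smallest context count in 1..max_contexts with the highest utilization."""
    return max(range(1, max_contexts + 1),
               key=lambda n: utilization(n, **params))


if __name__ == "__main__":
    # Illustrative numbers only: 50-cycle runs, 100-cycle misses, 5-cycle switch.
    for k in (0.0, 1.0):
        row = [round(utilization(n, 50, 100, 5, k), 3) for n in range(1, 9)]
        print(f"interference={k}: {row}")
```

In this toy model, `interference=0.0` lets utilization saturate once enough contexts cover the miss latency. A larger interference value shortens each run as contexts are added, which lowers the saturated bound and reduces the benefit of extra contexts.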

    Published In

    ACM SIGOPS Operating Systems Review  Volume 28, Issue 5
    Dec. 1994
    323 pages
    ISSN:0163-5980
    DOI:10.1145/381792

    ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
    November 1994
    341 pages
    ISBN:0897916603
    DOI:10.1145/195473

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States
