Interleaving: a multithreading technique targeting multiprocessors and workstations

Authors:

Mark Horowitz

ACM SIGOPS Operating Systems Review, Volume 28, Issue 5

Pages 308 - 318

https://doi.org/10.1145/381792.195576

Published: 01 November 1994

Abstract

There is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, given that the majority of the microprocessors are sold in the workstation market, not in the multiprocessor market, it is only natural that architectural features that benefit only multiprocessors are less likely to be adopted in commodity microprocessors. In this paper, we explore multiple-context processors, an architectural technique proposed to hide the large memory latency in multiprocessors. We show that while current multiple-context designs work reasonably well for multiprocessors, they are ineffective in hiding the much shorter uniprocessor latencies using the limited parallelism found in workstation environments. We propose an alternative design that combines the best features of two existing approaches, and present simulation results that show it yields better performance for both multiprogrammed workloads on a workstation and parallel applications on a multiprocessor. By addressing the needs of the workstation environment, our proposal makes multiple contexts more attractive for commodity microprocessors.
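The latency-hiding argument in the abstract can be illustrated with a simple textbook-style utilization model. This is a minimal sketch under stated assumptions and is not the simulation methodology of the paper: the function name `utilization`, its parameters, and all numeric values are hypothetical. The model assumes each context runs for a fixed number of cycles, then stalls for a fixed memory latency, and that switching contexts costs a fixed number of cycles.

```python
def utilization(contexts, run_length, latency, switch_cost):
    """Fraction of cycles spent on useful work in a simplified model.

    Assumptions (illustrative, not taken from the paper):
      - every context runs `run_length` cycles, then stalls `latency` cycles
      - each context switch wastes `switch_cost` cycles
      - a single-context processor never switches; it simply waits
    """
    if contexts < 1 or run_length <= 0 or latency < 0 or switch_cost < 0:
        raise ValueError("invalid model parameters")
    if contexts == 1:
        return run_length / (run_length + latency)
    # Linear region: too few contexts to cover the whole latency.
    linear = contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden, only switch cost remains.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical numbers: a long multiprocessor-like latency and a short
    # uniprocessor-like latency, each with a high and a low switch cost.
    for latency in (100, 5):
        for switch_cost in (7, 1):
            row = [utilization(n, 20, latency, switch_cost) for n in (1, 2, 4, 8)]
            print(latency, switch_cost, [round(u, 3) for u in row])
```

In this model, extra contexts help a great deal when the latency is long, but when the latency is short the switch cost can outweigh the latency being hidden, which is one way to read the abstract's point about workstation environments.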

Published In

ACM SIGOPS Operating Systems Review Volume 28, Issue 5

Dec. 1994

323 pages

ISSN:0163-5980

DOI:10.1145/381792

Chairman: Henry M. Levy (Univ. of Washington, Seattle)

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
November 1994
341 pages
ISBN:0897916603
DOI:10.1145/195473
Chairmen: Forest Baskett (Silicon Graphics), Douglas Clark (Princeton Univ.)

Copyright © 1994 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Article Metrics

Total citations: 92
Total downloads: 944
Downloads (last 12 months): 96
Downloads (last 6 weeks): 8

Reflects downloads up to 11 Aug 2024
