DOI: 10.1145/139669.139728

Processor coupling: integrating compile time and runtime scheduling for parallelism

Published: 01 April 1992

Abstract

The technology to implement a single-chip node composed of four high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-thread parallelism by combining compile-time and runtime scheduling. The compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle-by-cycle basis, and several threads can be active concurrently. We provide simulation results demonstrating that, on four simple numerical benchmarks, processor coupling achieves better performance than purely statically scheduled or multiprocessor machine organizations. We examine how performance is affected by restricted communication between ALUs and by long memory latencies. We also present an implementation and feasibility study of a processor-coupled node.
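
The interplay the abstract describes, static per-thread schedules combined with cycle-by-cycle ALU assignment at runtime, can be illustrated with a toy model. The sketch below is not the paper's simulator. It assumes that each thread is a list of compiler-produced instruction words, that each word is the set of ALU indices it needs, that a thread advances once every operation in its current word has issued, and that ALU conflicts are resolved by a fixed priority (lower thread index wins). The names `run_coupled`, `threads`, and `num_alus` are hypothetical.

```python
def run_coupled(threads, num_alus):
    """Toy model of runtime ALU arbitration among statically scheduled threads.

    threads:  list of schedules; each schedule is a list of instruction words,
              and each word is a set of ALU indices chosen by the "compiler".
    num_alus: number of ALUs in the node.
    Returns (cycles, utilization), where utilization is issued operations
    divided by available ALU slots.
    """
    for schedule in threads:
        for word in schedule:
            if any(alu < 0 or alu >= num_alus for alu in word):
                raise ValueError("instruction word names a non-existent ALU")

    pc = [0] * len(threads)  # index of each thread's current word
    pending = [set(t[0]) if t else set() for t in threads]
    cycles = 0
    issued = 0

    def active(tid):
        return pc[tid] < len(threads[tid])

    while any(active(tid) for tid in range(len(threads))):
        cycles += 1
        granted = [set() for _ in threads]
        # Each ALU is granted to at most one thread per cycle.
        for alu in range(num_alus):
            for tid in range(len(threads)):  # assumed policy: lower tid wins
                if active(tid) and alu in pending[tid]:
                    granted[tid].add(alu)
                    issued += 1
                    break
        # A thread moves to its next word once the current word is fully issued.
        for tid in range(len(threads)):
            if not active(tid):
                continue
            pending[tid] -= granted[tid]
            if not pending[tid]:
                pc[tid] += 1
                if active(tid):
                    pending[tid] = set(threads[tid][pc[tid]])

    utilization = issued / (cycles * num_alus) if cycles else 0.0
    return cycles, utilization


if __name__ == "__main__":
    # Two threads whose static schedules use complementary ALUs.
    t0 = [{0, 1}, {2}, {0, 1}]
    t1 = [{2, 3}, {0, 1}, {2, 3}]
    print("coupled:", run_coupled([t0, t1], 4))
    print("one at a time:", run_coupled([t0], 4)[0] + run_coupled([t1], 4)[0])
```

In this example the two threads finish in 3 cycles when interleaved, against 6 cycles when run one after the other, because the runtime arbiter fills ALU slots that a single static schedule leaves idle. When both threads want the same ALU in the same cycle, the lower-priority thread simply waits, which is where the benefit shrinks.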

Published In

ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
May 1992
439 pages
ISBN:0897915097
DOI:10.1145/139669
  • ACM SIGARCH Computer Architecture News, Volume 20, Issue 2
    Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)
    May 1992
    429 pages
    ISSN:0163-5964
    DOI:10.1145/146628

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA92: International Conference on Computer Architecture
May 19 - 21, 1992
Queensland, Australia

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
