A workload-adaptive and reconfigurable bus architecture for multicore processors

Authors:

Alexandros Papakonstantinou,

Deming Chen

International Journal of Reconfigurable Computing, Volume 2010

Article No.: 2, Pages 1 - 22

https://doi.org/10.1155/2010/205852

Published: 01 January 2010

Abstract

Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads, or even different execution phases of the same workload, may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. We then describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared-memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters, with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations that help boost the performance of multiprogramming and multithreaded workloads. Experimental results show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Further experiments show that similar gains can also be achieved for multithreaded applications. Finally, we present the sensitivity of the proposed interconnect architecture's performance to shared memory bandwidth availability.
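
The clustering idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' design: it assumes the clusters are arranged linearly with a single on/off switch between each pair of neighbours, and the names `SegmentedBus`, `configure`, and `snoop_domain` are hypothetical. It shows how opening a switch could confine a snoop broadcast to a subset of clusters, while closing every switch falls back to one conventional shared bus.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentedBus:
    """Toy model (assumption): clusters on a linear bus, one switch per neighbour pair."""
    num_clusters: int
    # closed[i] is True when the switch between cluster i and cluster i+1 joins them.
    closed: List[bool] = field(default_factory=list)

    def __post_init__(self):
        if not self.closed:
            # Default: every switch closed, i.e. a single shared bus.
            self.closed = [True] * (self.num_clusters - 1)

    def configure(self, closed):
        """Install a new switch setting (hypothetical reconfiguration step)."""
        if len(closed) != self.num_clusters - 1:
            raise ValueError("need one switch state per adjacent cluster pair")
        self.closed = list(closed)

    def snoop_domain(self, cluster):
        """Return the clusters that would observe a broadcast issued by `cluster`."""
        lo = cluster
        while lo > 0 and self.closed[lo - 1]:
            lo -= 1
        hi = cluster
        while hi < self.num_clusters - 1 and self.closed[hi]:
            hi += 1
        return list(range(lo, hi + 1))


bus = SegmentedBus(num_clusters=4)
print(bus.snoop_domain(1))          # every cluster shares one bus
bus.configure([True, False, True])  # open the middle switch
print(bus.snoop_domain(1))          # broadcast stays within clusters 0 and 1
```

In this model, two independent programs placed in separate bus segments would not see each other's broadcasts, which is one plausible reading of why an interconnect-aware scheduler would matter; the actual policies are those described in the paper itself.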

Cited By

Zhao H, Jang O, Ding W, Zhang Y, Kandemir M, Irwin M, Groeneveld P, Sciuto D, Hassoun S (2012). A hybrid NoC design for cache coherence optimization for chip multiprocessors. Proceedings of the 49th Annual Design Automation Conference, pp. 834-842. doi:10.1145/2228360.2228511. Online publication date: 3-Jun-2012.
https://dl.acm.org/doi/10.1145/2228360.2228511

Index Terms

A workload-adaptive and reconfigurable bus architecture for multicore processors
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Hardware
  1. Integrated circuits

Published In

International Journal of Reconfigurable Computing Volume 2010, Issue

January 2010

70 pages

ISSN:1687-7195

EISSN:1687-7209

Publisher

Hindawi Limited

London, United Kingdom

Publication History

Accepted: 24 May 2010

Revised: 08 March 2010

Published: 01 January 2010

Received: 03 October 2009
