A workload-adaptive and reconfigurable bus architecture for multicore processors

Published: 01 January 2010

Abstract

Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads, or even different execution phases of the same workload, may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters, with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations, which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results showing that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Further experiments show that similar gains can also be achieved for multithreaded applications. Finally, we present the sensitivity of the proposed interconnect architecture's performance to shared memory bandwidth availability.
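
The clustered organization described in the abstract can be pictured with a small toy model. The sketch below is illustrative only and is not the authors' implementation: the class name `ClusteredBus`, the linear arrangement of clusters, the equal cluster sizes, and the one-switch-per-boundary layout are all assumptions made for the example. It shows how opening or closing a switch between neighboring clusters changes which cores would observe a snoop broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredBus:
    """Toy model (hypothetical): cores in equal-size clusters along a linear bus."""
    num_cores: int
    cluster_size: int
    # closed[i] is True when the assumed switch between cluster i and
    # cluster i + 1 joins the two bus segments.
    closed: list = field(default_factory=list)

    def __post_init__(self):
        if self.num_cores % self.cluster_size != 0:
            raise ValueError("num_cores must be a multiple of cluster_size")
        self.num_clusters = self.num_cores // self.cluster_size
        if not self.closed:
            # Default: all switches closed, i.e. one conventional shared bus.
            self.closed = [True] * (self.num_clusters - 1)

    def isolate(self, boundary):
        """Open the switch at a boundary, splitting the bus there."""
        self.closed[boundary] = False

    def connect(self, boundary):
        """Close the switch at a boundary, merging the two segments."""
        self.closed[boundary] = True

    def snoop_domain(self, core):
        """Return the cores that would observe a broadcast issued by `core`."""
        cluster = core // self.cluster_size
        lo = cluster
        while lo > 0 and self.closed[lo - 1]:
            lo -= 1
        hi = cluster
        while hi < self.num_clusters - 1 and self.closed[hi]:
            hi += 1
        return list(range(lo * self.cluster_size, (hi + 1) * self.cluster_size))


bus = ClusteredBus(num_cores=8, cluster_size=2)
print(bus.snoop_domain(0))  # all eight cores share one bus
bus.isolate(1)              # split between cluster 1 and cluster 2
print(bus.snoop_domain(0))  # only the cores in clusters 0 and 1
```

In this simplified picture, splitting the bus is only safe when the separated clusters do not share data, for example when independent programs of a multiprogramming workload are scheduled on different clusters. That condition is an assumption of the sketch rather than a statement of the paper's policy.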

Cited By

  • (2012) A hybrid NoC design for cache coherence optimization for chip multiprocessors. Proceedings of the 49th Annual Design Automation Conference, pp. 834-842. doi:10.1145/2228360.2228511. Online publication date: 3-Jun-2012

      Published In

      International Journal of Reconfigurable Computing  Volume 2010, Issue
      January 2010
      70 pages
      ISSN:1687-7195
      EISSN:1687-7209

      Publisher

      Hindawi Limited

      London, United Kingdom

      Publication History

      Accepted: 24 May 2010
      Revised: 08 March 2010
      Published: 01 January 2010
      Received: 03 October 2009

      Qualifiers

      • Article
