Core fusion: accommodating software diversity in chip multiprocessors

Authors:

Jose F. Martinez

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Pages 186 - 197

https://doi.org/10.1145/1250662.1250686

Published: 09 June 2007

Abstract

This paper presents core fusion, a reconfigurable chip multiprocessor (CMP) architecture in which groups of fundamentally independent cores can dynamically morph into a larger CPU or be used as distinct processing elements, as applications need at run time. Core fusion gracefully accommodates software diversity and incremental parallelization in CMPs. It provides a single execution model across all configurations, requires no additional programming effort or specialized compiler support, maintains ISA compatibility, and leverages mature micro-architecture technology.

Index Terms

Core fusion: accommodating software diversity in chip multiprocessors
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
      2. Self-organizing autonomic computing
    2. Parallel architectures

Published In

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

June 2007

542 pages

ISBN:9781595937063

DOI:10.1145/1250662

General Chair: Dean Tullsen, University of California, San Diego
Program Chair: Brad Calder, Microsoft & University of California, San Diego

ACM SIGARCH Computer Architecture News Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SPAA07

Sponsor:

SIGARCH
IEEE-CS

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9 - 13, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
