DOI: 10.1145/2016604.2016606
Research article

SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion

Published: 03 May 2011

Abstract

In this paper we propose SoftHV, a high-performance HW/SW co-designed in-order processor that performs horizontal and vertical fusion of instructions.
SoftHV includes a co-designed virtual machine (Cd-VM) that reorders, removes, and fuses instructions from frequently executed regions of code. On the hardware side, SoftHV implements HW features for efficient execution of the Cd-VM and of the fused instructions. In particular, (1) Interlock Collapsing ALUs (ICALUs) are included to execute pairs of dependent simple arithmetic operations in a single cycle, and (2) Vector Load Units (VLDUs) are added to execute loads in parallel.
The key novelty of SoftHV lies in its efficient use of HW through a Cd-VM, which provides high performance while drastically cutting down processor complexity. The co-designed processor provides efficient mechanisms to exploit ILP and reduce the latency of certain code sequences.
Results presented in this paper show that SoftHV produces average performance improvements of 85% in SPECFP and 52% in SPECINT, and up to 2.35x, over a conventional four-way in-order processor. For a two-way in-order processor configuration, SoftHV obtains performance improvements of 72% and 47% for SPECFP and SPECINT, respectively. Overall, we show that such a co-designed processor based on an in-order core provides a compelling alternative to out-of-order processors for the low-end domain, where high performance at low complexity is a key feature.
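
The two kinds of fusion named in the abstract can be illustrated with a toy pass over a straight-line region of code. This is only a sketch under stated assumptions, not the Cd-VM algorithm from the paper: the `Instr` and `Fused` types, the `SIMPLE_ALU` set, and the rule of fusing only adjacent instructions are all hypothetical. Vertical fusion pairs a simple ALU operation with an adjacent dependent one (a candidate for an ICALU), and horizontal fusion pairs two adjacent independent loads (a candidate for a VLDU).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumed set of "simple" arithmetic/logic ops; the abstract does not list them.
SIMPLE_ALU = {"add", "sub", "and", "or", "xor"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dst = op(srcs)."""
    op: str
    dst: str
    srcs: Tuple[str, ...]


@dataclass(frozen=True)
class Fused:
    """A fused pair: kind is 'icalu' (vertical) or 'vld' (horizontal)."""
    kind: str
    parts: Tuple[Instr, Instr]


def fuse(region: List[Instr]) -> List[Union[Instr, Fused]]:
    """Greedily fuse adjacent pairs in a straight-line region."""
    out: List[Union[Instr, Fused]] = []
    i = 0
    while i < len(region):
        a = region[i]
        b = region[i + 1] if i + 1 < len(region) else None
        if b is not None:
            dependent = a.dst in b.srcs
            # Vertical fusion: b consumes a's result and both are simple ALU ops.
            if a.op in SIMPLE_ALU and b.op in SIMPLE_ALU and dependent:
                out.append(Fused("icalu", (a, b)))
                i += 2
                continue
            # Horizontal fusion: two loads with no dependence between them.
            if a.op == "load" and b.op == "load" and not dependent and a.dst != b.dst:
                out.append(Fused("vld", (a, b)))
                i += 2
                continue
        out.append(a)
        i += 1
    return out


region = [
    Instr("load", "r1", ("r10",)),
    Instr("load", "r2", ("r11",)),
    Instr("add", "r3", ("r1", "r2")),
    Instr("sub", "r4", ("r3", "r5")),
    Instr("mul", "r6", ("r4", "r4")),
]
for item in fuse(region):
    print(item)
```

In this example the two loads become one `vld` pair, the dependent `add`/`sub` become one `icalu` pair, and the `mul` is left alone. A real translator would also have to consider reordering to create fusion opportunities, operand-count limits of the fused unit, and memory aliasing, none of which this sketch models.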




      Published In

      CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers
      May 2011
      268 pages
      ISBN: 9781450306980
      DOI: 10.1145/2016604
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. co-designed virtual machine
      2. micro-op fusion

      Conference

      CF'11: Computing Frontiers Conference
      May 3 - 5, 2011
      Ischia, Italy

      Acceptance Rates

      Overall Acceptance Rate 273 of 785 submissions, 35%

      Cited By

      • (2021) Towards Transparent Dynamic Binary Translation from RISC-V to a CGRA. Architecture of Computing Systems, pp. 118-132. DOI: 10.1007/978-3-030-81682-7_8. Online publication date: 15-Jul-2021.
      • (2016) Software transparent dynamic binary translation for coarse-grain reconfigurable architectures. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 138-150. DOI: 10.1109/HPCA.2016.7446060. Online publication date: Mar-2016.
      • (2015) Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation. 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 203-214. DOI: 10.1109/ISPASS.2015.7095806. Online publication date: Mar-2015.
      • (2014) A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories. ACM Transactions on Programming Languages and Systems 37:1, pp. 1-30. DOI: 10.1145/2658993. Online publication date: 17-Nov-2014.
      • (2013) Optimization and Mathematical Modeling in Computer Architecture. Synthesis Lectures on Computer Architecture 8:4, pp. 1-144. DOI: 10.2200/S00531ED1V01Y201308CAC026. Online publication date: 30-Sep-2013.
      • (2013) A general constraint-centric scheduling framework for spatial architectures. ACM SIGPLAN Notices 48:6, pp. 495-506. DOI: 10.1145/2499370.2462163. Online publication date: 16-Jun-2013.
      • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGPLAN Notices 48:4, pp. 241-252. DOI: 10.1145/2499368.2451143. Online publication date: 16-Mar-2013.
      • (2013) A general constraint-centric scheduling framework for spatial architectures. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 495-506. DOI: 10.1145/2491956.2462163. Online publication date: 16-Jun-2013.
      • (2013) Discerning the dominant out-of-order performance advantage. ACM SIGARCH Computer Architecture News 41:1, pp. 241-252. DOI: 10.1145/2490301.2451143. Online publication date: 16-Mar-2013.
      • (2013) Discerning the dominant out-of-order performance advantage. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, pp. 241-252. DOI: 10.1145/2451116.2451143. Online publication date: 16-Mar-2013.
