Application-Specific Cross-Layer Optimization Based on Predictive Variable-Latency VLSI Design

Authors:

Andrew B. Kahng,

Lu Wang

ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 12, Issue 3

Article No.: 21, Pages 1 - 19

https://doi.org/10.1145/2746341

Published: 21 September 2015

Abstract

Traditional synchronous VLSI design requires that all computations in a logic stage complete in one clock cycle. This leads to increasingly pessimistic designs as technology scaling introduces more significant parametric variations and, with them, greater performance variability. Alternatively, by allowing computations in a logic stage to complete in a variable number of clock cycles, variable-latency design relaxes timing constraints so that average performance, area, and power consumption can be optimized. In this article, we present improved variable-latency design techniques including: (1) a generic minimum-intrusion variable-latency VLSI design paradigm, (2) a signal probability-based approximate prediction logic construction method for minimum misprediction rate at minimum cost, and (3) an application-specific cross-layer analysis methodology. Our experiments show that the proposed variable-latency design methodology on average reduces the computation latency by 26.80% (14.65%) at a cost of a 0.08% (3.4%) area and 0.4% (2.2%) energy consumption increase for the integer (floating-point) unit of LEON2, an open-source SPARC V8 processor, synthesized with a clock-cycle time between 1.97 ns (3.49 ns) and 5.96 ns (13.74 ns) based on the 45 nm Nangate open cell library, while an automotive application-specific design further achieves an average latency reduction of 41.8%.
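The trade-off behind variable-latency design can be illustrated with a small back-of-the-envelope model. The sketch below is not the authors' methodology. It assumes a simplified two-speed stage in which every operation takes either one or two cycles of a shortened clock, and in which the prediction logic flags every slow input (possibly with some false alarms). The function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
def average_latency(short_cycle_ns: float, two_cycle_fraction: float) -> float:
    """Expected latency per operation for a simplified variable-latency stage.

    Assumption (not from the article): each operation completes in one short
    cycle unless the predictor flags it, in which case it takes two cycles.
    `two_cycle_fraction` is the share of operations flagged by the predictor,
    including any false alarms.
    """
    if not 0.0 <= two_cycle_fraction <= 1.0:
        raise ValueError("two_cycle_fraction must lie in [0, 1]")
    if short_cycle_ns <= 0.0:
        raise ValueError("short_cycle_ns must be positive")
    # One cycle always, plus one extra cycle for the flagged fraction.
    return short_cycle_ns * (1.0 + two_cycle_fraction)


def latency_reduction(worst_case_cycle_ns: float,
                      short_cycle_ns: float,
                      two_cycle_fraction: float) -> float:
    """Relative latency reduction versus a fixed worst-case clock period."""
    if worst_case_cycle_ns <= 0.0:
        raise ValueError("worst_case_cycle_ns must be positive")
    avg = average_latency(short_cycle_ns, two_cycle_fraction)
    return 1.0 - avg / worst_case_cycle_ns


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns worst-case clock, 6 ns shortened clock,
    # 25% of operations flagged as needing a second cycle.
    print(average_latency(6.0, 0.25))
    print(latency_reduction(10.0, 6.0, 0.25))
```

Under this model the shortened clock only pays off when the flagged fraction stays small: if every operation were flagged, the stage would take two short cycles each time, which is slower than the worst-case clock whenever the short cycle exceeds half the worst-case period. This is why a low misprediction rate matters for the prediction logic.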

Published In

ACM Journal on Emerging Technologies in Computing Systems Volume 12, Issue 3

Special Issue on Cross-Layer System Design and Regular Papers

September 2015

207 pages

ISSN:1550-4832

EISSN:1550-4840

DOI:10.1145/2828988

Editor:
Krishnendu Chakrabarty
Duke University, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 21 September 2015

Accepted: 01 March 2015

Revised: 01 October 2014

Received: 01 June 2014

Published in JETC Volume 12, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
