Application-Specific Cross-Layer Optimization Based on Predictive Variable-Latency VLSI Design

Published: 21 September 2015

Abstract

Traditional synchronous VLSI design requires that all computations in a logic stage complete in one clock cycle. This leads to increasingly pessimistic designs, because technology scaling introduces ever more significant parametric variations and, with them, growing performance variability. Alternatively, by allowing computations in a logic stage to complete in a variable number of clock cycles, variable-latency design relaxes the timing constraints so that average performance, area, and power consumption can be optimized. In this article, we present improved variable-latency design techniques including: (1) a generic minimum-intrusion variable-latency VLSI design paradigm, (2) a signal probability-based approximate prediction logic construction method for minimum misprediction rate at minimum cost, and (3) an application-specific cross-layer analysis methodology. Our experiments show that the proposed variable-latency design methodology on average reduces the computation latency by 26.80% (14.65%) at a cost of a 0.08% (3.4%) increase in area and a 0.4% (2.2%) increase in energy consumption for the integer (floating-point) unit of the open-source SPARC V8 processor LEON2, synthesized with a clock-cycle time between 1.97 ns (3.49 ns) and 5.96 ns (13.74 ns) based on the 45 nm Nangate open cell library, while an automotive application-specific design further achieves an average latency reduction of 41.8%.
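The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the article's methodology; it assumes a stage that is clocked at a shortened period and that takes a fixed number of extra cycles whenever the prediction logic flags a slow computation. The function name `average_latency`, its parameters, and all numbers in the usage lines are hypothetical and are not taken from the article.

```python
def average_latency(t_short, t_worst, p_slow, slow_cycles=2):
    """Toy model of variable-latency timing (illustrative assumption only).

    t_short     -- shortened clock period used by the variable-latency stage
    t_worst     -- clock period a conventional worst-case design would need
    p_slow      -- fraction of computations flagged as needing extra cycles
    slow_cycles -- number of cycles a flagged computation takes (assumed fixed)

    Returns (average latency, fractional reduction versus the worst-case design).
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability in [0, 1]")
    if t_short <= 0 or t_worst <= 0:
        raise ValueError("clock periods must be positive")
    # Fast computations finish in one short cycle, flagged ones in slow_cycles.
    expected_cycles = (1.0 - p_slow) + p_slow * slow_cycles
    avg = t_short * expected_cycles
    # A negative reduction means the variable-latency stage is slower on average.
    reduction = 1.0 - avg / t_worst
    return avg, reduction


# Hypothetical numbers chosen only to exercise the model.
avg, reduction = average_latency(t_short=2.0, t_worst=3.0, p_slow=0.25)
print(f"average latency: {avg:.2f} ns, reduction: {reduction:.1%}")
```

In this simplified view, the benefit depends on how rarely the slow path is taken: if every computation is flagged as slow, the average latency exceeds the worst-case period and the reduction turns negative, which is why a low misprediction rate matters.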

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 3
Special Issue on Cross-Layer System Design and Regular Papers
September 2015
207 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/2828988
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 September 2015
Accepted: 01 March 2015
Revised: 01 October 2014
Received: 01 June 2014
Published in JETC Volume 12, Issue 3

Author Tags

  1. CAD
  2. VLSI
  3. VLSI statistical timing analysis and optimization
  4. better-than-worst-case VLSI design
  5. optimization

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Science Foundation
