Application-Specific Cross-Layer Optimization Based on Predictive Variable-Latency VLSI Design

Published: 21 September 2015

    Abstract

    Traditional synchronous VLSI design requires that all computations in a logic stage complete in one clock cycle. This leads to increasingly pessimistic designs as technology scaling introduces more significant parametric variations, which in turn increase performance variability. Alternatively, by allowing computations in a logic stage to complete in a variable number of clock cycles, variable-latency design relaxes timing constraints so that average performance, area, and power consumption can be optimized. In this article, we present improved variable-latency design techniques including: (1) a generic minimum-intrusion variable-latency VLSI design paradigm, (2) a signal probability-based approximate prediction logic construction method for minimum misprediction rate at minimum cost, and (3) an application-specific cross-layer analysis methodology. Our experiments show that the proposed variable-latency design methodology on average reduces the computation latency by 26.80% (14.65%) at the cost of a 0.08% (3.4%) area and 0.4% (2.2%) energy consumption increase for the integer (floating-point) unit of LEON2, an open-source SPARC V8 processor, synthesized with a clock-cycle time between 1.97 ns (3.49 ns) and 5.96 ns (13.74 ns) based on the 45 nm Nangate open cell library, while an automotive application-specific design further achieves an average latency reduction of 41.8%.
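
The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a simple model that is not taken from the article: a stage runs at a shortened clock period, and a fraction of operations (those on long paths, plus any that the approximate prediction logic flags conservatively) take extra cycles. The function names, the `slow_fraction` parameter, and the numbers in the example are hypothetical and are not results from the article.

```python
def average_latency(t_clk, slow_fraction, extra_cycles=1):
    """Expected latency per operation of a variable-latency stage.

    t_clk         -- shortened clock period (any time unit)
    slow_fraction -- assumed fraction of operations needing extra cycles
    extra_cycles  -- assumed number of additional cycles for those operations
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must lie in [0, 1]")
    # Every operation takes one cycle; slow ones take extra_cycles more.
    return t_clk * (1.0 + slow_fraction * extra_cycles)


def latency_reduction(t_worst, t_clk, slow_fraction, extra_cycles=1):
    """Relative latency reduction versus a fixed-latency design that is
    clocked at the worst-case period t_worst."""
    return 1.0 - average_latency(t_clk, slow_fraction, extra_cycles) / t_worst


if __name__ == "__main__":
    # Illustrative values only: worst-case period 4.0, shortened period 2.0,
    # and a quarter of all operations taking one extra cycle.
    print(average_latency(2.0, 0.25))
    print(latency_reduction(4.0, 2.0, 0.25))
```

Under this model the gain grows as the clock is shortened but shrinks as more operations need extra cycles, which is why a low misprediction rate for the prediction logic matters.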

    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 3
    Special Issue on Cross-Layer System Design and Regular Papers
    September 2015
    207 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/2828988
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 September 2015
    Accepted: 01 March 2015
    Revised: 01 October 2014
    Received: 01 June 2014
    Published in JETC Volume 12, Issue 3

    Author Tags

    1. CAD
    2. VLSI
    3. VLSI statistical timing analysis and optimization
    4. better-than-worst-case VLSI design
    5. optimization

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Science Foundation
