ICCAD2000, Pages 2-7
Physical Planning with Retiming
Jason Cong and Sung Kyu Lim
UCLA Department of Computer Science, Los Angeles, CA 90095
Abstract
In this paper, we propose a unified approach to partitioning, floorplanning, and retiming
for effective and efficient performance optimization. The integration enables the
partitioner to exploit the more realistic geometric delay model provided by the underlying
floorplan. Simultaneous consideration of partitioning and retiming under the
geometric delay model enables us to hide global interconnect latency effectively by
repositioning flip-flops (FFs) along long wires. Under the proposed geometric-embedding-based,
performance-driven partitioning problem, our GEO algorithm performs multi-level top-down
partitioning while determining the location of the partitions. We adopt the concept
of sequential arrival time [14] and develop sequential required time in our retiming-based
timing analysis engine. GEO performs cluster-move-based iterative improvement on top
of a multi-level cluster hierarchy [4], where the gain function obtained from the timing
analysis is based on the minimization of cutsize, wirelength, and sequential slack. In our
comparison to (i) the state-of-the-art partitioner hMetis [9] followed by retiming [11] and
simulated-annealing-based slicing floorplanning [15], and (ii) the state-of-the-art
simultaneous partitioning with retiming HPM [7] followed by floorplanning [15], GEO
obtains 35% and 23% better delay results, respectively, while maintaining comparable cutsize,
wirelength, and runtime results.
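The sequential arrival time idea adopted from [14] can be illustrated with a small label-propagation sketch. The formulation below is an assumption made for illustration and is not taken from GEO's timing engine: for a target clock period `phi`, each node label is the longest delay from the primary inputs, every flip-flop on an edge subtracts `phi`, and a per-edge wire delay stands in for the geometric delay model. The function names, the edge tuple layout, and the toy circuit are all hypothetical.

```python
import math

def sequential_arrival_times(nodes, edges, gate_delay, primary_inputs, phi):
    """Propagate arrival labels for a target clock period phi.

    edges: (driver, sink, wire_delay, flip_flop_count) tuples (assumed layout).
    Returns a dict of labels, or None if the labels keep growing.
    """
    label = {v: -math.inf for v in nodes}
    for v in primary_inputs:
        label[v] = 0.0
    # Bellman-Ford style relaxation: labels settle within len(nodes) passes
    # unless some cycle has positive (delay - phi * flip_flops) weight.
    for _ in range(len(nodes)):
        changed = False
        for u, v, wire_delay, ffs in edges:
            candidate = label[u] + wire_delay - phi * ffs + gate_delay[v]
            if candidate > label[v]:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    return None  # labels diverge: phi is too small for some loop

def period_is_feasible(nodes, edges, gate_delay, primary_inputs,
                       primary_outputs, phi):
    """Assumed feasibility test: labels converge and every output meets phi."""
    label = sequential_arrival_times(nodes, edges, gate_delay,
                                     primary_inputs, phi)
    return label is not None and all(label[po] <= phi for po in primary_outputs)

# Hypothetical toy circuit: the loop a -> b -> a has 7 units of delay and 1 FF.
NODES = ["pi", "a", "b", "po"]
GATE_DELAY = {"pi": 0, "a": 2, "b": 2, "po": 0}
EDGES = [("pi", "a", 0, 0), ("a", "b", 3, 0), ("b", "a", 0, 1), ("b", "po", 0, 0)]
```

In this toy circuit the labels converge for `phi = 7` and diverge for `phi = 6`, because the single flip-flop on the loop can absorb at most one clock period of loop delay.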
References
[1] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, pages 87-90, 1958.
[2] J. Cong. An interconnect-centric design flow for nanometer technologies. In Proc. of Int'l Symp. on
VLSI Technology, Systems, and Applications, pages 54-57, 1999.
[3] J. Cong, H. Li, and C. Wu. Simultaneous circuit partitioning / clustering with retiming for performance
optimization. In Proc. ACM Design Automation Conf., pages 460-465, 1999.
[4] J. Cong and S. K. Lim. Edge separability based circuit clustering with application to circuit partitioning.
In Proc. Asia and South Pacific Design Automation Conf., pages 429-434, 2000.
[5] J. Cong and S. K. Lim. Performance driven multiway partitioning. In Proc. Asia and South Pacific
Design Automation Conf., pages 441-446, 2000.
[6] J. Cong and S. K. Lim. Physical planning with retiming. Technical Report 200019, UCLA Computer
Science Dept., 2000.
[7] J. Cong, S. K. Lim, and C. Wu. Performance driven multi-level and multiway partitioning with
retiming. In Proc. ACM Design Automation Conf., pages 274-279, 2000.
[8] J. Cong and D. Z. Pan. Interconnect delay estimation models for synthesis and design planning. In Proc.
Asia and South Pacific Design Automation Conf., pages 97-100, 1999.
[9] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Application
in VLSI domain. In Proc. ACM Design Automation Conf., pages 526-529, 1997.
[10] E. L. Lawler, K. N. Levitt, and J. Turner. Module clustering to minimize delay in digital networks.
IEEE Trans. on Computer-Aided Design, pages 47-57, 1969.
[11] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, pages 5-35, 1991.
[12] L. Liu, M. Kuo, C. K. Cheng, and T. C. Hu. Performance driven partitioning using a replication graph
approach. In Proc. ACM Design Automation Conf., pages 206-210, 1995.
[13] R. Murgai, R. K. Brayton, and A. Sangiovanni Vincentelli. On clustering for minimum delay/area. In
Proc. IEEE Int. Conf. on Computer-Aided Design, pages 6-9, 1991.
[14] P. Pan, A. K. Karandikar, and C. L. Liu. Optimal clock period clustering for sequential circuits with
retiming. IEEE Trans. on Computer-Aided Design, pages 489-498, 1998.
[15] D. F. Wong and C. L. Liu. Floorplan design of VLSI circuits. Algorithmica, pages 263-291, 1989.
[16] H. Yang and D. F. Wong. Circuit clustering for delay minimization under area and pin constraints.
IEEE Trans. on Computer-Aided Design, pages 976-986, 1997.
ICCAD2000, Pages 8-12
Corner Block List: An Effective and Efficient Topological Representation
of Non-Slicing Floorplan
Xianlong Hong1, Gang Huang1, Yici Cai1, Jiangchun Gu1, Sheqin Dong1,
Chung-Kuan Cheng2, Jun Gu3
1 Department of Computer Science and Technology, Tsinghua University, Beijing, 100084 P. R. China
2 Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA
92093-0114, USA, Email: kuan@cs.ucsd.edu
3 Department of Computer Science, Science & Technology University of Hong Kong
Abstract
In this paper, a corner block list, a new efficient topological representation for non-slicing
floorplans, is proposed with applications to VLSI floorplanning and building block
placement. Given a corner block list, it takes only linear time to construct the floorplan.
Unlike the O-tree structure, which determines the exact floorplan based on given block
sizes, the corner block list defines the floorplan independently of the block sizes. Thus, the
structure is better suited for floorplan optimization with various size configurations of
each block. Based on this new structure and the simulated annealing technique, an
efficient floorplanning algorithm is given. Soft blocks and the aspect ratio of the chip are
taken into account in the simulated annealing process. The experimental results
demonstrate that the algorithm is quite promising.
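The linear-time construction can be pictured with the following sketch, which rebuilds a packing by re-inserting blocks at the top-right corner one at a time. The encoding is an assumption made for illustration: each entry is `(name, side, t)`, where `side` says whether the block is pushed in from the `"top"` or from the `"right"`, and the block covers `t + 1` blocks of that boundary. The abstract does not spell out the list format, so the tuple layout and the name `cbl_to_placement` are hypothetical.

```python
def cbl_to_placement(sizes, cbl):
    """Rebuild block positions from a corner-block-list-like encoding.

    sizes: name -> (width, height).
    cbl: list of (name, side, t) tuples (assumed format); the first
    entry's side and t are ignored.
    Returns name -> (x, y) of each block's lower-left corner.
    """
    first = cbl[0][0]
    pos = {first: (0, 0)}
    top = [first]    # blocks visible on the top boundary, left to right
    right = [first]  # blocks visible on the right boundary, bottom to top
    for name, side, t in cbl[1:]:
        if side not in ("top", "right"):
            raise ValueError("side must be 'top' or 'right'")
        boundary = top if side == "top" else right
        k = t + 1
        if k < 1 or k > len(boundary):
            raise ValueError("block covers more boundary blocks than exist")
        covered = boundary[-k:]
        del boundary[-k:]  # covered blocks leave this boundary for good
        if side == "top":
            # Sit on the covered blocks, aligned with the leftmost one.
            x = min(pos[b][0] for b in covered)
            y = max(pos[b][1] + sizes[b][1] for b in covered)
        else:
            # Sit to the right of the covered blocks, aligned with the lowest.
            x = max(pos[b][0] + sizes[b][0] for b in covered)
            y = min(pos[b][1] for b in covered)
        pos[name] = (x, y)
        top.append(name)    # the new block is the new top-right corner,
        right.append(name)  # so it is visible on both boundaries
    return pos
```

Each block is appended to each boundary list once and removed at most once, so the total work is linear in the number of blocks. Because positions are computed only when sizes are supplied, the same list can be re-evaluated for different block dimensions.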
References
[1] D.F.Wong & C.L.Liu, “ A new algorithm for floorplan design”. In: Proc. of 23 rd ACM/IEEE Design
Automation Conference, 101-107, 1986.
[2] H. Onodera, Y.Taniguychi & K.Tamaru, “ Branch and bound placement for building block layout” in:
Proc. 28th ACM/IEEE Design Automation Conference, 423-439, 1991
[3] H. Murata, K. Fujiyoshi, S.Nakatake, & Y. Kajitani, “ Rectangle-packingbased module placement”, in:
Proc. of International Conference on Computer Aided Design, 472 -479. 1995.
[4] T.Tahahashi, “ An algorithm for finding a maximum-weight decreasing sequence in a permutation,
motivated by rectangle packing problem”, Technical report of IEICE, vol. VLD 96, no.201, 31-35, 1996.
[5] S.Nakatake, H. Murata, K. Fujiyoshi, Y. Kajitani, “Module placement on BSG-structure and IC layout
application” in: Proc. of International Conference on Computer Aided Design, 484-490, 1996.
[6] P.N.Guo, C.K.Cheng, “ An O-tree representation of non-slicing floorplan and its applications”, in:
ACM/IEEE Design Automation Conference, 1999.
[7] S.Kirkpatrick et al. “Optimization by simulated annealing," Science. vol. 220, 671-680, 1983
ICCAD2000, Pages 13-16
Modeling Non-Slicing Floorplans with Binary Trees
Florin Balasa
University of Illinois at Chicago, Dept. of EECS, Chicago, IL 40407
Several novel topological representations of non-slicing floorplans [2] have recently been
proposed, providing new ideas and techniques for solving block placement
problems and other related layout applications. Among these topological representations,
ordered trees exhibit lower redundancy and, therefore, a provably smaller search
space, which makes them the best topological candidate for solving general block
placement problems. Since the early eighties, binary trees have been widely used
to represent slicing floorplans [7]. This paper shows that binary trees can efficiently
model non-slicing floorplans as well, as there is a one-to-one mapping between the sets
of binary and ordered trees representing the floorplan. Moreover, this paper shows that
binary trees exhibiting a certain property can be used to represent block placement
configurations with symmetry constraints, which is very useful when dealing with
device-level placement problems for analog layout. As the number of these trees is
proven to be smaller than the number of symmetric-feasible sequence-pairs [1], using
binary trees is better than using either sequence-pairs or O-trees when solving analog
placement problems. A comparative evaluation, substantiating these theoretical results,
has been carried out by providing alternative optimization engines to a placement tool
operating in an industrial environment.
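A one-to-one correspondence between ordered trees and binary trees can be sketched with the classical left-child/right-sibling encoding. This is the textbook construction and is offered only as an illustration; the mapping actually used in the paper may differ in its details. The class and function names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OrderedNode:
    label: str
    children: List["OrderedNode"] = field(default_factory=list)

@dataclass
class BinaryNode:
    label: str
    left: Optional["BinaryNode"] = None   # first child in the ordered tree
    right: Optional["BinaryNode"] = None  # next sibling in the ordered tree

def forest_to_binary(siblings: List[OrderedNode]) -> Optional[BinaryNode]:
    """Encode an ordered list of sibling subtrees as a single binary tree."""
    root = None
    for node in reversed(siblings):
        root = BinaryNode(node.label, forest_to_binary(node.children), root)
    return root

def binary_to_forest(node: Optional[BinaryNode]) -> List[OrderedNode]:
    """Invert forest_to_binary by walking the right (sibling) chain."""
    siblings = []
    while node is not None:
        siblings.append(OrderedNode(node.label, binary_to_forest(node.left)))
        node = node.right
    return siblings

# Hypothetical example: root with children a and b; a has children c and d.
tree = OrderedNode("root", [
    OrderedNode("a", [OrderedNode("c"), OrderedNode("d")]),
    OrderedNode("b"),
])
encoded = forest_to_binary([tree])
```

Since the two functions are mutual inverses, every ordered tree corresponds to exactly one binary tree whose root has no right child, and vice versa.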
References
[1] F. Balasa, K. Lampaert, "Symmetry within the sequence-pair representation in the context of placement
for analog design," IEEE Trans. on CAD of IC's and Systems, Vol. 17, No. 7, pp. 721-731, July
2000.
[2] J. Cohn, D. Garrod, R. Rutenbar, L. Carley, Analog Device-Level Automation, Kluwer Acad., 1994.
[3] P.-N. Guo, C.-K. Cheng, T. Yoshimura, "An O-tree representation of non-slicing floorplan and its
applications," Proc. 36th ACM/IEEE Des. Aut. Conf., pp. 268-273, June 1999.
[4] D.E. Knuth, The Art of Computer Programming (3rd edition), Addison Wesley Longman, 1997.
[5] H. Murata, K. Fujiyoshi, S. Nakatake, Y. Kajitani, "VLSI module placement based on rectangle-packing
by the sequence-pair," IEEE Trans. on Comp.-Aided Design of IC's and Systems, Vol. 15,
No. 12, pp. 1518-1524, Dec. 1996.
[6] H. Onodera, Y. Taniguchi, K. Tamaru, "Branch-and-bound placement for building block layout," Proc.
28th ACM/IEEE Design Automation Conf., pp. 433-439, 1991.
[7] R. Otten, "Complexity and diversity in IC layout design," Proc. IEEE Intn'l Symp. Circuits and
Computers, 1980.
[8] Y.-X. Pang, F. Balasa, K. Lampaert, C.-K. Cheng, "Block placement with symmetry constraints based
on the O-tree non-slicing representation," Proc. 37th ACM/IEEE Design Automation Conf., June
2000.
[9] D.F. Wong, C.L. Liu, "A new algorithm for floorplan design," Proc. 23rd ACM/IEEE Des. Aut.
Conf., pp. 101-107, 1986.
[10] C.-S. Ying, S.-L. Wong, "An analytical approach to floorplanning for hierarchical building blocks
layout," IEEE Trans. on CAD of IC's and Syst., Vol. 8, pp. 403-412, April 1989.
ICCAD2000, Pages 17-21
On Mismatches Between Incremental Optimizers and Instance Perturbations in
Physical Design Tools
Andrew B. Kahng and Stefanus Mantik
UCLA Computer Science Dept., Los Angeles, CA 90095-1596
Abstract
The incremental, "construct by correction" design methodology has become widespread
in constraint-dominated deep-submicron (DSM) design. We study the problem of engineering
change orders (ECO) for physical design
domains in the general context of incremental optimization. We observe that an
incremental design methodology is typically built from a full optimizer that generates a
solution for an initial instance, and an incremental optimizer that generates a sequence of
solutions corresponding to a sequence of perturbed instances. Our hypothesis is that in
practice, there can be a mismatch between the strength of the incremental optimizer and
the magnitude of the perturbation between successive instances. When such a mismatch
occurs, the solution quality will degrade - perhaps to the point where the incremental
optimizer should be replaced by the full optimizer. We document this phenomenon for
three distinct domains - partitioning, placement and routing - using leading industry and
academic tools. Our experiments show that current CAD tools may not be correctly
designed for ECO-dominated design processes. Thus, compatibility between optimizer
and instance perturbation merits attention both as a research question and as a matter of
industry design practice.
References
[1] C. J. Alpert, “Partitioning Benchmarks for VLSI CAD Community”, Web page,
http://vlsicad.cs.ucla.edu/˜cheese/benchmarks.html
[2] C. J. Alpert, “The ISPD-98 Circuit Benchmark Suite”, Proc. ACM/IEEE Intl. Symposium on Physical
Design, April 98, pp. 80-85. See errata at http://vlsicad.cs.ucla.edu/˜cheese/errata.html
[3] K. D. Boese, Models for Iterative Global Optimization, Ph.D. Thesis, UCLA Computer Science Dept.,
1996.
[4] A. E. Caldwell, A. B. Kahng and I. L. Markov, “Improved Algorithms for Hypergraph Bipartitioning”,
Proc. Asia and South Pacific Design Automation Conf., Jan. 2000, pp. 661-666, available at
http://vlsicad.cs.ucla.edu/GSRC/bookshelf
[5] J. Cong and M. Sarrafzadeh, “Incremental Physical Design”, Proc. ISPD, 2000, pp. 84-92.
[6] S. Fenstermaker, D. George, A. B. Kahng, S. Mantik and B. Thielges, “METRICS: A System
Architecture for Design Process Optimization”, Proc. ACM/IEEE Design Automation Conf., June 2000, pp.
705-710.
[7] A. S. Fukunaga, J. H. Huang and A. B. Kahng, “On Clustered Kick Moves For Iterated-Descent Netlist
Partitioning”, Proc. IEEE Intl. Symp. on Circuits and Systems, May 1996, pp. IV/496-499.
[8] B. Hajek, “Cooling Schedules for Optimal Annealing”, Mathematics of Operations Research 13(2)
(1988), pp. 311-329.
[9] D. J. Hathaway, R. R. Habra, E. C. Schanzenbach and S. J. Rothman, “Circuit Placement, Chip
Optimization, andWire Routing for IBM IC Technology”, J. VLSI Signal Processing Systems for Signal,
Image, and Video Technology 16(2-3) (1997), pp. 191-198.
[10] L. N. Kannan, P. R. Suaris and H.-G. Fang, “A Methodology and Algorithms for Post-Placement
Delay Optimization”, Proc. ACM/IEEE Design Automation Conf., 1994, pp. 327-332.
[11] O. C. Martin, S. W. Otto and E. W. Felten, “Large-Step Markov Chains for the Traveling Salesman
Problem”, Complex Systems, 5(3), 1991, pp. 299-326.
[12] I. H. Osman and J. P. Kelly, eds.,Meta-Heuristics: Theory and Applications, Kluwer, 1996.
[13] R. Otten, “GlobalWires Harmful?”, Proc. Intl. Symposium on Physical Design, 1998, pp. 104-109.
[14] R. Otten and R. K. Brayton, “Planning for Performance: the Constant Delay Paradigm”, Proc.
ACM/IEEE Design Automation Conf., 1998.
[15] J. C. Shah and S. S. Sapatnekar, “Wiresizing With Buffer Placement and Sizing for Power-Delay
Tradeoffs”, Proc. Intl. Conf. on VLSI Design, Bangalore, 1996, pp. 346-351.
[16] T. Shibuya, I. Nitta and K. Kawamura, “SMINCUT: VLSI Placement Tool Using Min-Cut”, Fujitsu
Scientific and Technical Journal 31(2) (1995), pp. 197-207.
[17] Semiconductor Industry Association, “The National Technology Roadmap for Semiconductors:
Technology Needs”, December 1997.
[18] R. H. Storer, S. D. Wu and R. Vaccari, “New Search Spaces for Sequencing Problems With
Application to Job Shop Scheduling”, Management Science 38 (1992), pp. 1495-1509.
[19] W. Sun and C. Sechen, “Efficient and Effective Placements for Very Large Circuits”, Proc.
IEEE/ACM Intl. Conf. on Computer-Aided Design, 1993, pp. 170-177.
[20] M.Wang, P. Banerjee andM. Sarrafzadeh, “Potential-NRG: Placement With Incomplete Data”, Proc.
ACM/IEEE Design Automation Conf., 1998.
ICCAD2000, Pages 23-26
Event Driven Simulation Without Loops or Conditionals
Peter M. Maurer
Dept. of Comp. Sci. & Eng. Univ. of South Florida, Tampa, Florida 33620.
Introduction
The past several years have seen much research in event driven logic simulation[1].
Various logic and delay models have been explored[2]. Most simulation research has
focused on improving simulation performance. New approaches to both compiled and
event driven simulation have been explored[3,4,5].
The internal operations of event-driven simulators can be divided into two categories:
scheduling and gate simulation. Much effort has been focused on reducing the cost of
scheduling[3,4,5]. There has also been effort to reduce the cost of gate simulation[6,7]. It
has also been shown that explicit computation of gate outputs is unnecessary, as long as
event-propagation is computed correctly[7].
Even though research has reduced the complexity of both scheduling and gate simulation,
it is still necessary to test for event propagation and cancellation, and it is necessary to
perform some computations during gate simulation.
This paper will show that none of these computations are necessary. Most computations
are devoted to testing internal states and computing new internal states. In our technique,
subroutine addresses are used to maintain states. This permits the elimination of all
state-testing and state-computation code. Our technique is significantly faster than
conventional event-driven simulation[1]. Unlike earlier methods[7], our approach can
easily be extended to any logic model or any delay model.
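The idea of keeping state in subroutine addresses can be pictured with a small sketch in which a gate stores a reference to its current handlers instead of a state variable. This is only an illustration under stated assumptions, not the paper's implementation: it models a 2-input AND gate whose state is the number of inputs at 0, and all names (`AndGate`, `State`, the handler functions) are hypothetical. The handlers themselves contain no tests; the loop in `simulate` is merely a driver for the example.

```python
from collections import namedtuple

# A state is nothing more than a pair of handler references.
State = namedtuple("State", ["rise", "fall"])

class AndGate:
    def __init__(self, name, state):
        self.name = name
        self.state = state        # reference to the current handlers
        self.output_events = []   # (gate name, new output value) pairs

# Each handler belongs to exactly one state, so it never tests anything:
# it installs the successor state and, where needed, emits an output event.
def rise_from_two_low(gate):
    gate.state = ONE_LOW

def rise_from_one_low(gate):
    gate.state = ALL_HIGH
    gate.output_events.append((gate.name, 1))

def fall_from_all_high(gate):
    gate.state = ONE_LOW
    gate.output_events.append((gate.name, 0))

def fall_from_one_low(gate):
    gate.state = TWO_LOW

def unreachable(gate):
    raise RuntimeError("event cannot occur in this state")

TWO_LOW = State(rise=rise_from_two_low, fall=unreachable)
ONE_LOW = State(rise=rise_from_one_low, fall=fall_from_one_low)
ALL_HIGH = State(rise=unreachable, fall=fall_from_all_high)

def simulate(gate, events):
    """Driver for the example: dispatch each input event via the state."""
    for kind in events:           # kind is "rise" or "fall"
        getattr(gate.state, kind)(gate)
    return gate.output_events
```

Starting from `TWO_LOW`, two rising input events followed by two falling ones produce one output rise and one output fall, and the gate returns to `TWO_LOW` without any handler ever inspecting a state value.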
References.
1. E. G. Ulrich, "Event Manipulation for Discrete Simulations Requiring Large Numbers of Events,"
JACM, V.21, N.9, Sep. 1978, pp. 777-85.
2. Szygenda, S., D. Rouse, E. Thompson, “A Model and Implementation of a Universal Time-Delay
Simulator for Large Digital Nets,” Spring Joint Computer Conference, 1970, pp. 491-496.
3. D. M. Lewis, “A Hierarchical Compiled Code Event-Driven Logic Simulator,” IEEE Transactions on
Computer Aided Design, Vol 10, No. 6, pp.726-737, June 1991.
4. D. M. Lewis, "Hierarchical Compiled Event-Driven Logic Simulation," Proceedings of ICCAD-89,
pp. 498-501.
5. Z. Wang and P. M. Maurer, “LECSIM: A Levelized Event Driven Compiled Logic Simulator,”
Proceedings of the 27th Design Automation Conference, 1990, pp. 491-496.
6. M. Heydemann, D. Dure, “The Logic Automation Approach to Accurate Gate and Functional Level
Simulation,” Proceedings of ICCAD-88, pp. 250-253.
7. P. M. Maurer, “The Inversion Algorithm for Digital Simulation,” Proceedings of ICCAD-94, pp. 259-61.
8. P. M. Maurer, “The Shadow Algorithm: A Scheduling Technique for Both Compiled and Interpreted
Simulation,” IEEE Transactions on Computer Aided Design, V11, No 12, Sept. 1993, pp. 1411-1413.
9. F. Brglez, Pownall, Hum, “Accelerated ATPG and Fault Grading via Testability Analysis,” ISCAS-85,
pp. 695-698.
ICCAD2000, Pages 27-32
Observability Analysis of Embedded Software for Coverage-Directed Validation
José C. Costa
IST/INESC
Srinivas Devadas
MIT
José C. Monteiro
IST/INESC
The most common approach to checking correctness of a hardware or software design is
to verify that a description of the design has the proper behavior as elicited by a series of
input stimuli. In the case of software, the program is simply run with the appropriate
inputs, and in the case of hardware, its description written in a hardware description
language (HDL) is simulated with the appropriate input vectors. In coverage-directed
validation, coverage metrics are defined that quantitatively measure the degree of
verification coverage of the design.
Motivated by recent work on observability-based coverage metrics for models described
in a hardware description language, we develop a method that computes an
observability-based code coverage metric for embedded software written in a high-level programming
language. Given a set of input vectors, our metric indicates the instructions that had no
effect on the output. An assignment that was not relevant to generating the output value
cannot be considered covered. Results show that our method offers a
significantly more accurate assessment of design verification coverage than statement
coverage. Existing coverage methods for hardware can be used with our method to build
a verification methodology for mixed hardware/software or embedded systems.
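The difference between statement coverage and an observability-based metric can be illustrated with a simple tagging sketch. This is an assumption-laden simplification and not the method of the paper: every assignment attaches its label to the value it produces, labels flow through arithmetic, and an assignment counts as observed only if its label reaches the output. The names `Tagged`, `assign`, and `run` are hypothetical, and the sketch ignores masking effects such as multiplication by zero.

```python
class Tagged:
    """A value together with the labels of assignments that influenced it."""
    def __init__(self, value, tags=frozenset()):
        self.value = value
        self.tags = frozenset(tags)

    def _combine(self, other, op):
        other = other if isinstance(other, Tagged) else Tagged(other)
        return Tagged(op(self.value, other.value), self.tags | other.tags)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

def assign(label, expr, executed):
    """Record that statement `label` ran and tag the value it produced."""
    executed.add(label)
    expr = expr if isinstance(expr, Tagged) else Tagged(expr)
    return Tagged(expr.value, expr.tags | {label})

def run(x):
    """Toy program under test: statement s2 runs but never affects `out`."""
    executed = set()
    a = assign("s1", x + 1, executed)
    b = assign("s2", x * 3, executed)   # dead with respect to the output
    c = assign("s3", a + 2, executed)
    out = assign("s4", c * 2, executed)
    return out, executed

out, executed = run(5)
statement_coverage = len(executed) / 4   # every statement was executed
observed_coverage = len(out.tags) / 4    # only statements reaching `out`
```

Here all four statements execute, so statement coverage is complete, while only three of them influence the output; an error in `s2` could not be detected by this input vector.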
References
[1] B. Beizer. Software Testing Techniques. Van Nostrand Rheinhold, New York, second edition, 1990.
[2] A. Benveniste and P. Le Guernic. Hybrid Dynamical Systems Theory and the SIGNAL Language.
IEEE Transactions on Automatic Control, 35(5):525–546, May 1990.
[3] G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics,
Implementation. Science of Computer Programming, 19(2):87–152, 1992.
[4] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. McGraw Hill e MIT Press, 1990.
[5] S. Edwards, L. Lavagno, E. Lee, , and A. Sangiovanni-Vincentelli. Design of Embedded Systems:
Formal Models, Validation, and Synthesis. Proceedings of the IEEE, 85(3):336–390, 1997.
[6] F. Fallah, S. Devadas, and K. Keutzer. OCCOM: Efficient Computationa of Observability-Based Code
Coverage Metrics for Functional Simulation. In Proceedings of the 35th Design Automation Conference,
pages 152–157, June 1998.
[7] T. Goradia. Dynamic Impact Analysis: A Cost Effective Technique to Enforce Error Propagation. In
Proceedings of Int’l Symposium on Software Testing and Applications, March 1993.
[8] R. K. Gupta, C. N. Coelho Jr, and G. De Micheli. Synthesis and Simulation of Digital Systems
Containing Interacting Hardware and Software Components. In Proceedings of the Design Automation
Conference, June 1992.
[9] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The Synchronous Data Flow Programming
Language LUSTRE. Proceedings of the IEEE, 79(9):1305–1319, 1991.
[10] A. Kalavade and Edwards A. Lee. Hardware/Software Co-design Using Ptolemy - a Case Study. In
Proceedings of the International Workshop on Hardware-Software Codesign, September 1992.
[11] Edward A. Lee. Embedded Software - An Agenda for Research. ERL Technical Report UCB/ERL
M99/63, University of California, Berkeley, CA, USA 94720, December 1999.
[12] S. Lee and J. M. Rabaey. A Hardware-Software Co-simulation Environment. In Proceedings of the
International Workshop on Hardware-Software Codesign, October 1993.
[13] F. Maraninchi. The Argos Language: Graphical Representation of Automata and Description of
Reactive Systems. In Proceedings of the IEEE Workshop on Visual Languages, Kobe, Japan, October
1991.
[14] William H. Press, Saul A. Taukolsky, William T. Vetterling, and Brian P. Flannery. Numerical
Recipes in C. Cambridge University Press, second edition edition, 1992.
[15] J. Rowson. Hardware/Software Co-simulation. In Proceedings of the Design Automation Conference,
pages 439–440, 1994.
[16] K. ten Hagen and H. Meyr. Timed and Untimed Hardware/Software Cosimulation: Application and
Efficient Implementation. In Proceedings of the International Workshop on Hardware-Software Codesign,
October 1993.
[17] J. M. Voas. PIE: A Dynamic Failure-Based Technique. IEEE Transactions on Software Engineering,
18(8):717–727, August 1992.
ICCAD2000, Pages 33-38
A Methodology for Verifying Memory Access Protocols in Behavioral Synthesis
Gernot Koch* , Taewhan Kim** , Reiner Genevriere*
* Synopsys Inc. , Mountain View, CA 94043 USA
**Dept. of Electrical Engineering & Computer Science
and Advanced Information Technology Research Center
KAIST, Taejon, 305-701 KOREA
Abstract
Memory is one of the most important components to be optimized in the several phases
of the synthesis process. In behavioral synthesis, a memory is viewed as an abstract
construct that hides the implementation details of the memory. Consequently, for a
vendor's memory, behavioral synthesis should create a clean model of the memory
wrapper that abstracts the properties of the memory required to interface to the
rest of the circuit. However, this wrapping process invariably raises the problem of
verifying the memory access protocols so that the wrapper can be used safely in a
behavioral synthesis environment. In this paper, we propose a systematic methodology for
verifying the correctness of the memory wrapper. Specifically, we analyze the complexity of the
problem and derive an effective solution that is not only practically efficient but also
highly reliable. For designers who use memories as design components in behavioral
synthesis, automating our solution shortens the verification time significantly compared
with simulating memory accesses in the context of the full design, which is a quite complex and
time-consuming process, especially for designs with many memory access operations.
References
[1] D. Gajski, N. Dutt, A. Wu, S. Lin, High-Level Synthesis: Introduction to Chip and System Design,
Kluwer Academic Publisher, 1992.
[2] D. W. Knapp, Behavioral Synthesis, Prentice Hall, 1996.
[3] T. Ly, D. Knapp, R. Miller, D. MacMillen, “Scheduling using Behavioral Templates,” DAC, 1995.
[4] Behavioral Compiler User Guide, Synopsys Inc., 1998.
[5] D. Knapp, T. Ly, D. MacMillen, R. Miller, “Behavioral Synthesis Methodology for HDL-Based
Specification and Validation,” DAC, 1995.
ICCAD2000, Pages 40-43
SYMBOLIC DEBUGGING SCHEME FOR OPTIMIZED HARDWARE AND
SOFTWARE
Farinaz Koushanfar* , Darko Kirovski** , and Miodrag Potkonjak*
* Electrical Engineering and Computer Science Departments, University of California, Los Angeles, CA
** Microsoft Research, One Microsoft Way, Redmond, WA
ABSTRACT
Symbolic debuggers are system development tools that can accelerate the validation
of behavioral specifications by allowing a user to interact with executing code at
the source level. In response to a user query, the debugger retrieves the value of a source
variable in a manner consistent with respect to the source statement where execution has
halted. However, when a behavioral specification has been optimized using
transformations, values of variables may be inaccessible in the run-time state.
We have developed a set of techniques that, given a behavioral specification CDFG,
enforce computation of a selected subset Vcut of user variables such that (i) all other
variables v ∈ CDFG can be computed from Vcut and (ii) this enforcement has minimal
impact on the optimization potential of the computation. The implementation of the new
debugging approach poses several optimization tasks. We have formulated the
optimization tasks and developed heuristics to solve them. The effectiveness of the
approach has been demonstrated on a set of benchmark designs.
ICCAD2000, Pages 44-50
Automated Data Dependency Size Estimation with a Partially Fixed
Execution Ordering
Per Gunnar Kjeldsberg1, Francky Catthoor2, and Einar J. Aas1
1 Norwegian University of Science and Technology, Trondheim, Norway, {pgk|aas}@fysel.ntnu.no
2 IMEC, Leuven, Belgium, also at Katholieke Universiteit Leuven, catthoor@imec.be
Abstract
For data dominated applications, the system level design trajectory should first focus on
finding a good data transfer and storage solution. Since no realization details are available
at this level, estimates are needed to guide the designer. This paper presents an algorithm
for automated estimation of strict upper and lower bounds on the individual data
dependency sizes in high level application code given a partially fixed execution
ordering. Previous work has either not taken execution ordering into account at all,
resulting in large overestimates, or required a fully specified ordering which is usually
not available at this high level. The usefulness of the methodology is illustrated on
representative application demonstrators.
References
[1] F. Balasa, F. Catthoor, and H. De Man, "Background memory area estimation for multidimensional
signal processing systems", IEEE Trans. on VLSI Systems, Vol. 3, No. 2, June 1995, pp. 157-72
[2] U. Banerjee, "Dependency Analysis for Supercomputing", Kluwer Academic Publishers,
Boston/Dordrecth/London, 1988
[3] E. Brockmeyer, L. Nachtergaele, F. Catthoor, J. Bormans, and H. De Man, "Low Power Memory
Storage and Transfer Organization for the MPEG-4 Full Pel Motion Estimation on a Multimedia
Processor", IEEE Trans. on Multimedia, Vol. 1, No. 2, June 1999, pp. 202-16
[4] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, "Custom
Memory Management Methodology Exploration of Memory Organization for Embedded Multimedia
Systems Design", Kluwer Academic Publishers, 1998
[5] P. Feautrier, "Dataflow analysis of array and scalar references", International Journal of Parallel
Programming, Vol. 20, No. 1, Feb. 1991, pp. 23-52
[6] D.D. Gajski, F. Vahid, S. Narayan, and J. Gong, "Specification and Design of Embedded Systems",
Prentice Hall, 1994
[7] P. Grun, F. Balasa, and N. Dutt, "Memory Size Estimation for Multimedia Applications", Proc. Sixth
Int. Workshop on HW/SW Codesign (CODES/CACHE), March 1998, pp. 145-9
[8] P.G. Kjeldsberg, F. Catthoor, E.J. Aas, ”Storage requirement estimation for data intensive applications
with partially fixed execution ordering”, Proc. Int. Workshop on HW/SW Co-Design, CODES 2000, San
Diego, May 2000, pp. 56-60
[9] F.J. Kurdahi and A.C. Parker, "REAL: A Program for Register ALlocation", Proc. 24th DAC, 1987, pp.
210-5
[10] M. Moonen, P. Van Dooren, J. Vandewalle, "An SVD updating algorithm for subspace tracking",
SIAM Journal on Matrix Analysis and Applications, Vol. 13, No. 4, 1992, pp. 1015-1038
[11] W. Pugh and D. Wonnacott, "An exact method for analysis of value-based array data dependences",
Proc. 6th International Workshop on Languages and Compilers for Parallel Computing, Portland, USA,
Aug. 1993, pp. 546-66
[12] C-J. Tseng, and D.P. Siewiorek, "Automated Synthesis of Data Paths in Digital Systems", IEEE Trans.
on Computer Aided Design of Integrated Circuits and Systems, Vol. 5, No. 3, July 86, pp. 379-95
[13] I.M. Verbauwhede, C.J. Scheers, J.M. Rabaey, "Memory Estimation for High Level Synthesis", Proc.
31st DAC, 1994, pp. 143-8
[14] Y. Zhao and S. Malik, "Exact Memory Size Estimation for Array Computation without Loop
Unrolling", Proc 36th DAC, 1999, pp.811-6
ICCAD2000, Pages 51-54
FIR Filter Synthesis Algorithms for Minimizing the Delay and the
Number of Adders
Hyeong-Ju Kang, Hansoo Kim, and In-Cheol Park
VLSI Systems Lab., Dept. of Electrical Eng. and Comp. Science,
Korea Advanced Institute of Science and Technology, Korea
Abstract
As the complexity of digital filters is dominated by the number of multiplications, much
prior work has focused on minimizing the complexity of multiplier blocks that compute the
constant coefficient multiplications required in filters. Although the complexity of
multiplier blocks is significantly reduced by using efficient techniques such as
decomposing multiplications into simple operations and sharing common subexpressions,
previous works have not considered the delay of multiplier blocks, which is a critical
factor in the design of complex filters. In this paper, we present new algorithms to
minimize the complexity of multiplier blocks under given delay constraints. By
analyzing multiplier blocks in view of delay, three delay reduction methods are proposed
and incorporated into previous algorithms. Since the proposed algorithms can generate
multiplier blocks that meet the specified delay, a trade-off between delay and hardware
complexity is enabled by changing the delay constraints. Experimental results show that
the proposed algorithms can reduce the delay of multiplier blocks at the cost of a small
increase in complexity.
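Decomposing a constant multiplication into shifts and additions can be illustrated with a canonic signed-digit (CSD) recoding of a single coefficient. This sketch is not one of the proposed algorithms: it handles one constant without subexpression sharing, and it assumes a delay model that counts adder stages, with shifts treated as free wiring. The function names are hypothetical.

```python
def csd_digits(n):
    """Canonic signed-digit form of n >= 0, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def shift_add_multiply(x, constant):
    """Multiply x by a constant using only shifts, additions, subtractions."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(constant)))

def adder_count(constant):
    """Adders/subtractors needed to combine the shifted terms."""
    nonzero = sum(1 for d in csd_digits(constant) if d != 0)
    return max(nonzero - 1, 0)

def adder_depth(constant):
    """Adder stages on the critical path if terms are summed in a balanced tree."""
    return adder_count(constant).bit_length()
```

For example, 7 is recoded as 8 - 1, which needs one subtractor instead of the two adders implied by its binary form, and 45 is recoded as 64 - 16 - 4 + 1, which needs three adders arranged in two stages when summed as a balanced tree. Summing the same four terms as a chain would use the same three adders but three stages, which is the kind of delay versus complexity trade-off the abstract refers to.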
References
[1] A. G. Dempster and M. D. Macleod, “Use of minimum adder multiplier blocks in FIR digital filters,”
IEEE Transactions on Circuits and Systems-II:Analog and Digital Signal Processing, vol. 42, no. 9, pp.
569–77, 1995.
[2] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE Proceedings-G, vol. 138, no. 3,
pp. 401–12, 1991.
[3] M. Potkonjak, M. B. Srivastava, and A. Chandrakasan, “Efficient substitution of multiple constant
multiplications by shifts and additions using iterative pairwise matching,” in DAC94, Proceedings of 31st
ACM/IEEE Design Automation Conference, 1994, pp. 189–94.
[4] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE
Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 43, no. 10, pp. 677–88,
1996.
[5] R. Paško, P. Schaumont, V. Derudder, S. Vernalde, and D. Ďuračková, “A new algorithm for
elimination of common subexpressions,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 18, no. 1, pp. 58–68, 1999.
[6] J. T. Kim, Design and implementation of computationally efficient FIR filters, and scalable VLSI
architectures for discrete wavelet transform, Ph.D. thesis, Korea Advanced Institute of Science and
Technology, 1998.
[7] A. G. Dempster and M. D. Macleod, “Constant integer multiplication using minimum adders,” IEE
Proc.-Circuits Devices Systems, vol. 141, no. 5, pp. 407–13, 1994.
ICCAD2000, Pages 56-61
Effects of Global Interconnect Optimizations on Performance
Estimation of Deep Submicron Design*
Yu Cao1, Chenming Hu1, Xuejue Huang1, Andrew B. Kahng2,
Sudhakar Muddu3, Dirk Stroobandt4, Dennis Sylvester5
1 EECS Department, UC Berkeley, USA; 2 CS Department, UCLA, USA; 3 Silicon Graphics, Inc., USA;
4 ELIS Department, Ghent University, Belgium; 5 Synopsys, Inc., USA
ABSTRACT
In this paper, we quantify the impact of global interconnect optimization techniques that
address such design objectives as delay, peak noise, delay uncertainty due to noise,
power, and cost. In doing so, we develop a new system-performance simulation model as
a set of studies within the MARCO GSRC Technology Extrapolation (GTX) system. We
model a typical point-to-point global interconnect and focus on accurate assessment of
both circuit and design technology with respect to such issues as inductance, signal line
shielding, dynamic delay, buffer placement uncertainty and repeater staggering. We
demonstrate, for example, that optimal wire sizing models need to consider inductive
effects - and that use of more accurate {-1,3} worst-case capacitive coupling noise switch
factors substantially increases peak noise estimates compared to traditional {0,2} bounds.
We also find that optimal repeater sizes are significantly smaller than conventional
models would suggest, especially when considering energy-delay issues.
Keywords: System performance models, interconnect delay, crosstalk noise, inductance,
VLSI, technology extrapolation
REFERENCES
[1] H.B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990.
[2] G.A. Sai-Halasz, “Performance Trends in High-Performance Processors,” Proc. IEEE, Jan. 1995, pp.
20-36.
[3] J.C. Eble, V.K. De, D.S. Wills and J.D. Meindl, “A Generic System Simulator (GENESYS) for ASIC
Technology and Architecture Beyond 2001,” Proc. ASIC, 1996, pp. 193-196.
[4] Rensselaer Interconnect Performance Estimator (RIPE), http://latte.cie.rpi.edu/ripe.html
[5] D. Sylvester and K. Keutzer, “System-Level Performance Modeling with BACPAC - Berkeley
Advanced Chip Performance Calculator,” Proc. SLIP, 1999, pp. 109-114,
http://www.eecs.berkeley.edu/~dennis/bacpac/
[6] “International Technology Roadmap for Semiconductors,” December 1999, http://www.itrs.net/
[7] P. D. Fisher and R. Nesbitt, “The Test of Time: Clock-Cycle Estimation and Test Challenges for Future
Microprocessors,” IEEE Circuits and Devices Magazine 14(2) 1998, pp. 37-44.
[8] A. E. Dunlop and P. D. Fisher, personal communication, 1999.
[9] J. Cong and D.Z. Pan, “Interconnect Estimation and Planning for Deep Submicron Designs,” Proc.
DAC, 1999, pp. 507-510.
[10] A.B. Kahng, S. Muddu and E. Sarto, “On Switch Factor Based Analysis of Coupled RC
Interconnects,” Proc. DAC, 2000, pp. 79-84.
[11] A.B. Kahng, S. Muddu and E. Sarto, “Tuning Strategies for Global Interconnects in High-Performance
Deep Submicron IC’s,” VLSI Design 10(1), 1999, pp. 21-34.
[12] http://vlsicad.cs.ucla.edu/GSRC/GTX/
[13] A. E. Caldwell, Y. Cao, A.B. Kahng, F. Koushanfar, H. Lu, I. Markov, M. Oliver, D. Stroobandt and
D. Sylvester, “GTX: The MARCO GSRC Technology Extrapolation System,” Proc. DAC, 2000, pp. 693-698.
[14] L. He, N. Chang, S. Lin, and O.S. Nakagawa, “An Efficient Inductance Modeling for On-Chip
Interconnects,” Proc. CICC, 1999, pp. 457-460.
[15] X. Qi, G. Wang, Z. Yu, R.W. Dutton, T. Young and N. Chang, “On-Chip Inductance Modeling and
RLC Extraction of VLSI Interconnects for Circuit Simulation,” Proc. CICC, 2000.
[16] A. E. Ruehli, “Inductance calculations in a complex integrated circuit environment,” IBM J. Res. Dev.,
September 1972, pp. 470-480.
[17] A.B. Kahng and S. Muddu, “An analytical delay model for RLC interconnects,” IEEE Trans. CAD
16(12) (1997), pp. 1507-1514.
[18] Y.I. Ismail, E. G. Friedman, J.L. Neves, “Equivalent Elmore delay for RLC trees”, IEEE Trans. CAD
19(1) (2000), pp. 83-97.
[19] Y. Massoud, S. Majors, T. Bustami and J. White, “Layout Techniques for Minimizing On-Chip
Interconnect Self-Inductance,” Proc. DAC, 1998, pp. 566-571.
[20] S. P. Khatri, A. Mehrotra, R. K. Brayton, A. Sangiovanni-Vincentelli, and R.H.J.M. Otten, “A Novel
VLSI Layout Fabric for Deep Submicron Applications,” Proc. DAC, 1999, pp. 491-496.
[21] K. L. Shepard et al., “Design Methodology for the S/390 Parallel Enterprise Server G4
Microprocessors,” IBM J. Res. Dev., July-Sept. 1997, pp. 515-554.
[22] T. Sakurai, “Closed-Form Expressions for Interconnect Delay, Crosstalk, and Coupling in VLSI’s,”
IEEE Trans. Electron Devices, Jan. 1993, pp. 118-124.
[23] D. Sylvester and K. Keutzer, “Getting to the Bottom of Deep Submicron,” Proc. ICCAD, 1998, pp.
203-211.
[24] J. Cong, T. Kong and D. Z. Pan, “Buffer Block Planning for Interconnect-Driven Floorplanning,”
Proc. ICCAD, 1999, pp. 358-363.
ICCAD2000, Pages 62-67
Impact of Systematic Spatial Intra-Chip Gate Length Variability on Performance
of High-Speed Digital Circuits
Michael Orshansky, Linda Milor*, Pinhong Chen, Kurt Keutzer, and Chenming Hu
University of California, Berkeley, CA, 94720
*eSilicon Corporation, Berkeley, CA, 94710
Abstract
Using data collected from an actual state-of-the-art fabrication facility, we conducted a
comprehensive characterization of an advanced 0.18 µm CMOS process. The measured
data revealed significant systematic, rather than random, spatial intra-chip variability of
MOS gate length, leading to large circuit path delay variation. The critical path value of a
combinational logic block varies by as much as 17%, and the global skew is increased by
8%. Thus, a significant timing error (~25%) and performance loss takes place if
variability is not properly addressed. We derive a model that estimates the
performance degradation for given circuit and process parameters. Analysis shows
that the spatial, rather than proximity-dependent, systematic Lgate variability is the main
cause of large circuit speed degradation. The degradation is worse for circuits with a
larger number of critical paths and shorter average logic depth. We propose a location-dependent
timing analysis methodology that mitigates the detrimental effects of
Lgate variability, and develop a tool linking the layout-dependent spatial information
to circuit analysis. We discuss the details of the practical implementation of the
methodology, and provide guidelines for managing the design complexity.
References
[1] A. Kahng, Y. Pati, “Subwavelength optical lithography: challenges and impact on physical design,”
Proceedings of ISPD, p.112, 1999
[2] S. Nassif, “Within-chip variability analysis,” IEDM Technical Digest, p.283, 1998.
[3] V. Mehrotra, S. Nassif, D. Boning, J. Chung, “Modeling the effects of manufacturing variation on high-speed microprocessor interconnect performance,” IEDM Technical Digest, p.767, 1998.
[4] C. Yu et al, “Use of short-loop electrical measurements for yield improvement,” IEEE Trans. on
Semiconductor Manufacturing, vol. 8, no. 2, May 1995.
[5] B. Stine, D. S. Boning, J. E. Chung, “Analysis and decomposition of spatial variation in integrated
circuit processes and devices,” IEEE Trans. On Semiconductor Manufacturing, No.1, pp. 24-41, Feb. 1997.
[6] HSPICE User Manual, Avant!, 1999.
[7] Benchmark Combinational Circuits, ISCAS, 1985.
[8] PathMill User Guide, Synopsys, 1999.
[9] S. Nassif, “Statistical worst-case analysis for integrated circuits,” Statistical Approaches to VLSI,
Elsevier Science, 1994.
[10] P. Yang et al, “An integrated and efficient approach for MOS VLSI statistical circuit design,” IEEE
Trans. on CAD, No 1, Jan. 1986.
[11] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
[12] D. Sylvester, K. Keutzer, “Getting to the bottom of deep submicron,” Proc. ICCAD 1998.
ICCAD2000, Pages 68-74
Miller Factor for Gate-Level Coupling Delay Calculation
Pinhong Chen
Dept. of EECS
U. C. Berkeley
Berkeley, CA 94720, USA
pinhong@eecs.berkeley.edu
Desmond A. Kirkpatrick
Intel Corp.
Microprocessor Products Group
Hillsboro, OR 97124, USA
desmond.a.kirkpatrick@intel.com
Kurt Keutzer
Dept. of EECS
U. C. Berkeley
Berkeley, CA 94720, USA
keutzer@eecs.berkeley.edu
Abstract
In coupling delay computation, a Miller factor of more than 2X may be necessary to
model the delay of deep submicron circuitry in the presence of active coupling
capacitance. We propose an efficient method
to estimate this factor such that the delay response of a decoupling circuit model can
emulate the original coupling circuit. Under the assumptions of zero initial voltage, equal
charge transfer, and 0.5VDD as the switching threshold voltage, an upper bound of 3X for
maximum delay and a lower bound of -1X for minimum delay can be proven. Efficient
Newton-Raphson iteration is also proposed as a technique for computing the Miller factor
or effective capacitance. This result is highly applicable to crosstalk coupling delay
calculation in deep submicron gate-level static timing analysis. Detailed analysis and
approximation are presented. SPICE simulations are demonstrated to show high
correlation with these approximations.
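To make the role of the Miller factor concrete, here is a minimal sketch of a decoupled model in which the coupling capacitance is replaced by a grounded capacitance scaled by the Miller factor. It assumes a single-pole lumped RC stage with 50% delay ln(2)·R·C, which is a textbook simplification and not the circuit model or the Newton-Raphson procedure of the paper. The function name and the component values are hypothetical.

```python
import math


def decoupled_delay(r_drv: float, c_gnd: float, c_cpl: float, miller_factor: float) -> float:
    """50% delay of a lumped RC stage whose coupling cap is replaced by
    miller_factor * c_cpl to ground (single-pole assumption)."""
    c_eff = c_gnd + miller_factor * c_cpl
    return math.log(2.0) * r_drv * c_eff


# Hypothetical values: 1 kOhm driver, 20 fF to ground, 10 fF coupling.
R_DRV, C_GND, C_CPL = 1e3, 20e-15, 10e-15

# Sweep the Miller factor across the -1X .. 3X range quoted in the abstract.
delays = {k: decoupled_delay(R_DRV, C_GND, C_CPL, k) for k in (-1, 0, 1, 2, 3)}

if __name__ == "__main__":
    for k, d in delays.items():
        print(f"Miller factor {k:+d}X -> delay {d * 1e12:.2f} ps")
```

In this simplified picture the delay grows linearly with the Miller factor, so using 2X where 3X applies underestimates the maximum delay, and using 0X where -1X applies overestimates the minimum delay.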
References
[1] G. Yee, R. Chandra, V. Ganesan, and C. Sechen. “Wire Delay in the Presence of Crosstalk”. In
IEEE/ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems,
pages 170–175, 1997.
[2] A. B. Kahng, S. Muddu, and E. Sarto. “On Switch Factor Based Analysis of Coupled RC
Interconnects”. In Design Automation Conference, pages 79–84, 2000.
[3] B. Franzini, C. Forzan, D. Pandini, P. Scandolara, and A. D. Fabbro. “Crosstalk Aware Static Timing
Analysis: a Two Step Approach”. In IEEE of 1st International Symposium on Quality Electronic Design,
pages 499–503, Mar. 2000.
[4] P. F. Tehrani, S. W. Chyou, and U. Ekambaram. “Deep Sub-Micron Static Timing Analysis in Presence
of Crosstalk”. In IEEE of 1st International Symposium on Quality Electronic Design, pages 505–512, Mar.
2000.
[5] F. Dartu and L. T. Pileggi. “Calculating Worst-Case Gate Delays Due to Dominant Capacitance
Coupling”. In Proc. of 34th ACM/IEEE Design Automation Conference, Jun. 1997.
[6] F. Dartu, N. Menezes, and L. T. Pileggi. “Performance computation for precharacterized CMOS gates
with RC loads”. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 15:544–553,
May 1996.
[7] A. B. Kahng, S. Muddu, and D. Vidhani. “Noise and Delay Estimation for Coupled RC Interconnects”.
In IEEE ASIC/SoC, Mar. 1999.
[8] M. Becer and I. N. Hajj. “An Analytical Model for Delay and Crosstalk Estimation with Application to
Decoupling”. In IEEE of 1st International Symposium on Quality Electronic Design, pages 51–57, Mar.
2000.
[9] W. Chen, S. K. Gupta, and M. A. Breuer. “Analytic Models for Crosstalk Delay and Pulse Analysis
Under Non-Ideal Inputs”. In International Test Conference, pages 809–818, Nov. 1997.
[10] T. Xiao and M. Marek-Sadowska. “Efficient Delay Calculation in Presence of Crosstalk”. In IEEE of
1st International Symposium on Quality Electronic Design, pages 491–497, Mar. 2000.
[11] K. L. Shepard, V. Narayanan, P.C. Elmendor, and Gutuan Zheng. “Global Harmony: Coupled Noise
Analysis for Full-Chip RC Interconnect Network”. In Proc. of International Conference on Computer
Aided Design, pages 139–146, 1997.
[12] A. Devgan. “Efficient Coupled Noise Estimation for On-Chip Interconnects”. In Proc. of International
Conference on Computer Aided Design, pages 147–151, 1997.
[13] K. Aringaran, F. Klass, C. M. Kim, C. Amir, J. Mitra, E. You, J. Mohd, and S. K. Dong. “Coupling
Noise Analysis for VLSI and ULSI Circuits”. In IEEE of 1st International Symposium on Quality
Electronic Design, pages 485–489, Mar. 2000.
ICCAD2000, Pages 76-82
Challenges and Opportunities in Broadband and Wireless Communication Designs
Jan M. Rabaey1, Miodrag Potkonjak2, Farinaz Koushanfar3, Suet-Fei Li1, Tim Tuan1
1 EECS Department, University of California, Berkeley, CA 94720
{2 CS, 3 EE} Departments, University of California, Los Angeles, CA 90095
ABSTRACT
Communication designs form the fastest growing segment of the semiconductor market.
Both network processors and wireless chipsets have been attracting a great deal of
research attention, financial resources and design efforts. However, further progress is
limited by lack of adequate system methodologies and tools. Our goal in this tutorial is to
provide impetus for development of communication design techniques and tools.
The first part addresses network processors (NP), which we study from three viewpoints:
application, architecture, and system software and compilation tools. In addition to
a summary of the main issues and representative case studies, we identify the main system design
issues. The second part of the tutorial focuses on wireless design. The main emphasis is
on a platform-based design methodology that leverages functional profiling, architecture
exploration, and orthogonalization of concerns to facilitate low-power wireless
communication systems. The highlight of the paper, an in-depth study of the state-of-the-art
wireless design PicoRadio, is used as an explanatory design example.
References
[1] G. Qu and M. Potkonjak, Techniques for Energy Minimization of Communication Pipelines, ICCAD.
pp.597-600, Nov 1998.
[2] R.Y Wang; A. Krishnamurthy; R.P. Martin; T.E. Anderson and others, Modeling Communication
Pipeline Latency. SIGMETRICS’98. pp.22-32, 1998.
[3] T. Wolf and M. Franklin, CommBench – A Telecommunication Benchmark for Network Processors.
Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software
(ISPASS-2000), Austin, Texas, pp. 154-162, April 2000.
[4] J.L. Henning, SPEC CPU 2000: Measuring CPU Performance in the New Millennium. Computer,
vol. 33, no. 7, pp. 28-35, July 2000.
[5] D. Husak and R. Gohn, Communication Processor Programming Models: Keys to The Promise.
Proceedings of Gigabit Ethernet Conference(GEC), pp. 298-309, March 2000.
[6] M. Hathaway, Building Next Generation Network Processors. Proceedings of Gigabit Ethernet
Conference(GEC), pp. 310-319, March 2000.
[7] F. Koushanfar, V. Prabhu, M. Potkonjak and J.M. Rabaey, Processors for Mobile Applications. To
appear in Proceedings of International Conference on Computer Designs (ICCD), Sept. 2000.
[8] I. Verbauwhede and C. Nicol, Low Power DSP’s for Wireless Communications. Proceedings of the
International Symposium on Low Power Electronics and Design (ISLPED), pp. 303-310, July 2000.
[9] H. Soor, Performance Analysis on Gigabit Ethernet Switches, Proceedings of Gigabit Ethernet
Conference(GEC), pp. 77-81, March 2000.
[10] The Bluetooth Special Interest Group, http://www.bluetooth.com/.
[11] IEEE 802.11 Working Group for WLAN, http://www.manta.ieee.org/groups/802/11/.
[12] The HomeRF Working Group http://www.homerf.org/.
[13] A. Ferrari and A. Sangiovanni-Vincentelli, System Design: Traditional Concepts and New Paradigms,
Proceedings of the 1999 Int. Conf. On Comp. Des., Austin, Oct. 1999.
[14] B. Kienhuis et al, An Approach for Quantitative Analysis of Application-specific Dataflow
Architectures, Proceedings of International Conf. of Application-specific Systems, Architectures and
Processors, pp. 338-349, Zurich, Switzerland 1997.
[15] J. Rowson, A. Sangiovanni-Vincentelli, System Level Design, EE Times, 1996.
[16] J. Rowson and A. Sangiovanni-Vincentelli, Interface-based Design, Proceedings of the 34th Design
Automation Conference (DAC-97). pp. 178-183, Las Vegas, June 1997.
[17] E. Lee and A. Sangiovanni-Vincentelli, A Unified Framework for Comparing Models of Computation,
IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems, Vol. 17, N. 12:1217-1229,
December 1998.
[18] J. Rabaey et al. PicoRadio Supports Ad Hoc Ultra-Low Power Wireless Networking. IEEE Computer,
Vol. 33, No. 7, pp. 42-48, July 2000.
[19] R. Amirtharajah, A. P. Chandrakasan, Self-Powered Signal Processing Using Vibration-based Power
Generation. IEEE Journal of Solid State Circuits, vol. 33, no. 5, pp. 687-95, May 1998.
[20] OPNET Radio Modeler, OPNET Technologies, Inc., http://www.mil3.com
[21] C. Perkins and E. Royer, Ad-hoc On-Demand Distance Vector Routing, Proceeding of the 2nd IEEE
Workshop. Mobile Comp. Sys. and Apps. pp. 90-100, Feb. 1999.
[22] C. Perkins and P. Bhagwat, Highly Dynamic Destination Sequenced Distance-Vector Routing (DSDV)
for Mobile Computers, Computer Communications Review, pp. 234-44, Oct. 1994.
[23] Tensilica, Inc. http://www.tensilica.com/.
[24] M. Smith, Application Specific Integrated Circuits, Addison-Wesley, 1997.
[25] BWRC homepage, http://bwrc.eecs.berkeley.edu/
ICCAD2000, Pages 84-91
Challenges in Physical Chip Design
Ralph H.J.M. Otten
Eindhoven University of Technology, Eindhoven, The Netherlands, otten@ics.ele.tue.nl
Paul Stravers
Philips Research, Eindhoven, The Netherlands, paulus.stravers@philips.com
Introduction
The chip industry obeys a number of laws of various kinds: mathematical laws where
accurate models can be formulated; physical laws, especially from solid-state physics, obtained
by observation and induction; chemical laws pertinent to the manufacturing processes; and
economic and legal laws that concern such industries. The most famous and most
cited law of the chip industry is the one that Gordon Moore formulated in 1964 after
observing trends in the then very young field of integration of electronic circuits.
Mathematically formulated, Moore's law reads as follows:
    dN/dt ∝ N,    (1)
where N is the maximum number of devices on a single chip. The proportionality constant is called the
moore exponent, which according to Moore, with years as the unit of time, equaled 0.7.
An even older law, also formulated after observing properties of early logic circuitry in
computers, is known as Rent's rule:
    dT/dG ∝ T/G,    (2)
where T is the number of external connections of a part containing G gates. The proportionality
constant is called the rent exponent.
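The two differential laws above integrate to simple closed forms: (1) gives exponential growth, N(t) = N0·exp(a·t), and (2) gives a power law, T = k·G^p, where a is the moore exponent and p the rent exponent. The sketch below evaluates both. The function names, the initial count N0, and the constant k (the integration constant of Rent's rule) are hypothetical names introduced here for illustration, and the numeric inputs other than the exponent 0.7 are made-up examples.

```python
import math


def devices(n0: float, moore_exponent: float, years: float) -> float:
    """Solution of dN/dt = a*N: N(t) = N0 * exp(a * t), with t in years."""
    return n0 * math.exp(moore_exponent * years)


def doubling_time(moore_exponent: float) -> float:
    """Years needed for N to double: ln(2) / a."""
    return math.log(2.0) / moore_exponent


def terminals(k: float, gates: float, rent_exponent: float) -> float:
    """Solution of dT/dG = p*T/G: T = k * G**p, with k the integration constant."""
    return k * gates ** rent_exponent


if __name__ == "__main__":
    # An exponent of 0.7 per year corresponds to roughly one doubling per year.
    print(f"doubling time at a=0.7: {doubling_time(0.7):.2f} years")
    print(f"growth over 10 years:   {devices(1.0, 0.7, 10.0):.0f}x")
    # Hypothetical block: k = 2.5, 10,000 gates, rent exponent 0.5.
    print(f"terminals: {terminals(2.5, 10_000, 0.5):.0f}")
```

A smaller exponent stretches the doubling time in inverse proportion, so halving the exponent doubles the time needed for N to double.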
Both laws seem to hold surprisingly accurately. Moore's law soon became the ultimate
guideline for setting targets in the chip industry. In a sense it has thus become a self-fulfilling
prophecy, although it is still remarkable that the industry was able to satisfy
such ambitious goals. Rent's rule went through stages of neglect and popularity. A
convincing case for the usefulness of such a law came with IBM's need for wire space
estimations for gate arrays, as documented in Donath's landmark paper [5]. Both the
moore and rent exponents had to be tied to a more specific class of circuits. The recent report
[17] of ICE established a moore exponent of 0.2 for microprocessors and 0.4 for memory
(figure 1). Bakoglu [1] showed rent exponents between 0.12 and 0.63, distinguishing
dynamic and static memory, microprocessors, gate arrays and high-speed processors…
References
[1] H.B. Bakoglu, “Circuits, interconnections, and packaging for vlsi”, Addison-Wesley Pub Co, 1990
[2] S. Bruma, “Into deep submicron: a simulation perspective”, PhD-thesis, Delft University of
Technology, Delft, The Netherlands
[3] D. Burger and J.R. Goodman and A. Kägi, “Memory Bandwidth Limitations of Future
Microprocessors”, 23rd annual International Symposium on Computer
Architecture (ISCA-96), 1996, pp.78–88.
[4] D.M. Caughey and R.E. Thomas, “Carrier mobilities in silicon empirically related to doping and field”.
Proceedings IEEE, 55, 2192 (1967).
[5] W.E. Donath, “Placement and average interconnection lengths of computer logic” IEEE Transactions
on Circuits and Systems, CAS-26, 4, April 1979
[6] W.E. Donath, “Wire length distribution for placements of computer logic”, IBM Journal of Research
and Development,25, 3, May 1981, pp. 152-155.
[7] H. Garcia-Molina and L.R. Rogers, “Performance through memory”, Proceedings of the ACM
conference on Measurement and modeling of computer systems, May 1987, pp. 122-131
[8] E.A. de Kock, G. Essink, W.J.M. Smits, P. van der Wolf, J.-Y. Brunel, W.M. Kruijtzer, P. Lieverse,
K.A. Vissers, “YAPI: application modelling for signal processing
systems” Proceedings of the 37th Design Automation Conference, Los Angeles, Ca, USA, June 2000, pp.
402-405.
[9] R.H.J.M. Otten “Complexity and diversity in ic layout design” International Conference on Circuits
and Computers, New York, NY., USA, pp 764-767, October 1980.
[10] R.H.J.M. Otten, “Global wires harmful?”, Proc. 1998 International Symposium on Physical Design,
Monterey, CA, USA, April 1998, pp. 104-109.
[11] R.H.J.M. Otten and R.K. Brayton, “Planning for performance”, Proc. 1998 Design Automation
Conference. San Francisco, CA, USA, June 1998, p. 122-127.
[12] M.B. Kleiner, S.A. Kühn, P. Ramm and W. Weber, “Performance improvement of the memory
hierarchy of risc-systems by application of 3-d technology”, IEEE Transactions on Components,
Packaging and Manufacturing Technology, Part B, vol 19, 4, November 1996, pp 709-718
[13] S.A. Kühn, M.B. Kleiner, P. Ramm and W. Weber, “Performance modeling of the interconnect
structure of a three-dimensional integrated risc processor/cache system”, IEEE Transactions on
Components, Packaging and Manufacturing Technology, Part B, vol 19, 4, November 1996, pp 719-718
[14] M.B. Kleiner, S.A. Kühn, P. Ramm and W. Weber, “Thermal analysis of vertically integrated
circuits”, Proceedings IEDM, 1995, pp 487-490
[15] S.A. Kühn, M.B. Kleiner, P. Ramm and W. Weber, “Interconnect capacitances, crosstalk and signal
delay in vertically integrated circuits”, Proceedings IEDM, 1995, pp 249-252
[16] S. Strickland, E. Ergin, D.R. Kaeli and P. Zavracky, “VLSI in the third dimension” Integration, the
VLSI journal, 25, 1, September 1998, pp 1-16
[17] “Status2000: integrated circuit industry report”, Integrated Circuit Engineering Corporation, Scottsdale,
AZ, USA, 2000
ICCAD2000, Pages 93-98
General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs
Hongbing Fan
Dept. of Computer Science, University of Victoria, Victoria BC Canada V8W 3P6
Jiping Liu
Dept. of Math. & Comp. Sci., University of Lethbridge, Lethbridge AB Canada T1K 3M4
Yu-Liang Wu
Dept. of Comp. Sci. & Eng., Chinese University of Hong Kong, Shatin N.T. Hong Kong
Abstract
An FPGA switch box is said to be hyper-universal if it is routable for all possible
surrounding multi-pin net topologies satisfying the routing resource constraints. It is
desirable to design hyper-universal switch boxes with the minimum number of switches.
A previous work, Universal Switch Module, considered such a design problem
concerning 2-pin net routings around a single FPGA switch box. However, as most nets
are multi-pin nets in practice, it is imperative to study the problem that involves multi-pin
nets. In this paper, we provide a new view of global routings and formulate the most
general k-sided switch box design problem into an optimum k-partite graph design
problem. Applying a powerful decomposition theorem of global routings, we prove that,
for a fixed k, the number of switches in an optimum k-sided switch box with W terminals
on each side is O(W), by constructing some hyper-universal switch boxes with O(W)
switches. Furthermore, we obtain optimum, hyper-universal 2-sided and 3-sided switch
boxes, and propose hyper-universal 4-sided boxes with less than 6.7W switches, which is
very close to the lower bound 6W obtained for pure 2-pin net models in [5].
References
[1] M. J. Alexander and G. Robins. "New performance FPGA routing algorithms". Proceedings DAC,
pages 562-567, 1995.
[2] J. A. Bondy and U.S.R. Murty. Graph Theory with Applications. Macmillan Press, London, 1976.
[3] S. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic. Field-Programmable Gate Arrays. Kluwer Academic Publisher, Boston MA, 1992.
[4] S. Brown, J. Rose, and Z. G. Vranesic. "A detailed router for field-programmable gate arrays". IEEE
Trans. on Computer-Aided Design, 11:620-628, May 1992.
[5] Y. W. Chang, D. F. Wong, and C. K. Wong. "Universal switch models for FPGA". ACM Trans. on
Design Automation of Electronic Systems, 1(1):80-101, January 1996.
[6] A. Corp. The Maximalist Handbook. 1990.
[7] H. Fan, P. Haxell, and J. Liu. "The global routing-a combinatorial design problem". (submitted).
[8] Y. S. Lee and C. H. Wu. "A performance and routability driven router for FPGAs considering path
delays". Proceedings DAC, pages 557-561, 1995.
[9] J. F. Pan, Y. L. Wu, G. Yan, and C. K. Wong. "On the optimal four-way switch box routing structures of
FPGA greedy routing architectures". Integration, the VLSI Journal, 25:137-159, 1998.
[10] J. Rose and S. Brown. "Flexibility of interconnection structures for field-programmable gate arrays".
IEEE J Solid-State Circuits, 26(3):277-282, 1991.
[11] Y. L. Wu and D. Chang. "On NP-completeness of 2-D FPGA routing architectures and a novel
solution". Proceedings of International Conference on Computer-Aided-Design, pages 362-366, 1994.
[12] Y. L. Wu and M. Marek-Sadowska. "Routing for array type FPGAs". IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 16(5):506-518, May 1997.
[13] Y. L. Wu, S. Tsukiyama, and M. Marek-Sadowska. "On computational complexity of a detailed
routing problem in two-dimensional FPGAs". Proc. 4th Great Lakes Symp. VLSI, March 1994.
[14] Y. L. Wu, S. Tsukiyama, and M. Marek-Sadowska. "Graph based analysis of 2-D FPGA routing".
IEEE Trans. on Computer-Aided Design, 15(1):33-44, 1996.
ICCAD2000, Pages 99-103
A Timing-constrained Algorithm for Simultaneous
Global Routing of Multiple Nets
Jiang Hu, Sachin S. Sapatnekar
Department of ECE, University of Minnesota, Minneapolis, MN 55455, USA
ABSTRACT
In this paper, we propose a new approach for VLSI interconnect global routing that can
optimize both congestion and delay, which are often competing objectives. Our approach
provides a general framework that may use any single-net routing algorithm and any
delay model in global routing. It is based on the observation that there are several routing
topology flexibilities under timing constraints. These flexibilities are exploited for
congestion reduction through a network flow based hierarchical bisection and assignment
process. Experimental results on benchmark circuits are quite promising.
REFERENCES
[1] C. Chiang, C. K. Wong and M. Sarrafzadeh, “A weighted Steiner tree-based global router with
simultaneous length and density minimization,” IEEE Trans. on CAD, Vol. 13, No. 12, pp. 1461-1469,
Dec., 1994.
[2] B. S. Ting and B. N. Tien, “Routing techniques for gate array,” IEEE Trans. on CAD, Vol. CAD-2, No.
4, pp. 301-312, Oct., 1983.
[3] R. C. Carden IV, J. Li and C.-K. Cheng, “A global router with a theoretical bound on the optimal
solution,” IEEE Trans. on CAD, Vol. 15, No. 2, pp. 208-216, Feb., 1996.
[4] M. Burstein and R. Pelavin, “Hierarchical wire routing,” IEEE Trans. on CAD, Vol. CAD-2, No. 4, pp.
223-234, Oct., 1983.
[5] M. Marek-Sadowska, “Route planner for custom chip design,” Proc. ICCAD, pp. 246-249, 1986.
[6] D. Wang and E. S. Kuh, “Performance-driven interconnect global routing,” Proc. of the IEEE Great
Lakes Symp. on VLSI, pp. 132-136, 1996.
[7] J. Huang, X.-L. Hong, C.-K. Cheng and E. S. Kuh, “An efficient timing-driven global routing
algorithm,” Proc. DAC, pp. 596-600, 1993.
[8] J. Cong and P. H. Madden, “Performance driven global routing for standard cell design,” Proc. ISPD,
pp.73-80, 1997.
[9] K. D. Boese, A. B. Kahng, B. A. McCoy and G. Robins, “Near-optimal critical sink routing tree
constructions,” IEEE Trans. on CAD, Vol. 14, No. 12, pp. 1417-36, Dec. 1995.
[10] K. Zhu, Y.-W. Chang and D. F. Wong, “Timing-driven routing for symmetrical-array-based FPGAs,”
Proc. ICCD, pp.628-633, 1998.
[11] J. Hu and S. S. Sapatnekar, “FAR-DS: full-plane AWE routing with driver sizing,” Proc. DAC, pp. 84-89, 1999.
[12] H. Hou, J. Hu and S. S. Sapatnekar, “Non-Hanan routing”, IEEE Trans. on CAD, Vol. 18, No. 4, pp.
436-444, Apr., 1999.
[13] J. Lillis, C. K. Cheng, T. T. Lin and C. Y. Ho, “New performance driven routing techniques with
explicit area/delay tradeoff and simultaneous wire sizing,” Proc. DAC, pp. 395-400, Jun. 1996.
[14] J. Cong and C. K. Koh, “Interconnect layout optimization under higher-order RLC model,” Proc.
ICCAD, pp. 713-720, 1997.
[15] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network flows: theory, algorithms, and applications.
Prentice Hall, Upper Saddle River, NJ, 1993.
[16] K. D. Wayne and L. Fleischer, “Fast and simple approximation schemes for generalized flow,” Proc.
Symposium on Discrete Algorithms, pp. 981-982, 1999.
ICCAD2000, Pages 104-109
Provably Good Global Buffering Using an Available Buffer Block Plan
Feodor F. Dragan, Andrew B. Kahng, Ion Măndoiu†, Sudhakar Muddu‡, and Alexander
Zelikovsky¶
UCLA Department of Computer Science, Los Angeles, CA 90095-1596
† College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280
‡ Silicon Graphics, Inc., Mountain View, CA 94039
¶ Department of Computer Science, Georgia State University, Atlanta, GA 30303
Abstract
To implement high-performance global interconnect without impacting the performance
of existing blocks, the use of buffer blocks is increasingly popular in structured-custom
and block-based ASIC/SOC methodologies. Recent works by Cong et al. [6] and Tang
and Wong [25] give algorithms to solve the buffer block planning problem. In this paper
we address the problem of how to perform buffering of global nets given an existing
buffer block plan. Assuming as in [6, 25] that global nets have already been decomposed
into two-pin connections, we give a provably good algorithm based on a recent approach
of Garg and Könemann [8] and Fleischer [7]. Our method routes connections using
available buffer blocks, such that required upper and lower bounds on buffer intervals - as well as wirelength upper bounds per connection - are satisfied. Unlike [6, 25], our
model allows more than one buffer to be inserted into any given connection. In addition,
our algorithm observes buffer parity constraints, i.e., it will choose to use an inverter or a
buffer (= colocated pair of inverters) according to source and destination signal parity.
The algorithm outperforms previous approaches [6] and has been validated on top-level
layouts extracted from a recent high-end microprocessor design.
References
[1] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin, Network Flows: Theory, Algorithms, and Applications,
Prentice Hall, Englewood Cliffs, NJ, 1993.
[2] A. E. Caldwell, Y. Cao, A. B. Kahng, F. Koushanfar, H. Lu, I. L. Markov, M. R. Oliver, D. Stroobandt
and D. Sylvester, “GTX: The MARCO GSRC Technology Extrapolation System”, Proc. DAC, 2000, pp.
693–698.
[3] R. C. Carden and C.-K. Cheng, “A Global Router Using an Efficient Approximate Multicommodity
Multiterminal Flow Algorithm”, Proc. DAC, 1991, pp. 316–321.
[4] J. Castro and N. Nabona, “An Implementation of Linear and Non-Linear Multicommodity Network
Flows”, Eur. J. Oper. Res.92 (1996), pp. 37–53.
[5] J. Cong, L. He, C.-K. Koh and P. H. Madden, “Performance Optimization of VLSI Interconnect
Layout”, Integration 21 (1996), pp. 1–94.
[6] J. Cong, T. Kong and D. Z. Pan, “Buffer Block Planning for Interconnect-Driven Floorplanning”, Proc.
ICCAD, 1999, pp. 358–363.
[7] L. K. Fleischer, “Approximating Fractional Multicommodity Flow Independent of the Number of
Commodities”, Proc. 40th Annual Symposium on Foundations of Computer Science, 1999, pp. 24–31.
[8] N. Garg and J. Könemann, “Faster and Simpler Algorithms for Multicommodity Flow and Other
Fractional Packing Problems”, Proc. 39th Annual Symposium on Foundations of Computer Science, 1998,
pp. 300–309.
[9] R. Goering, “Cadence looks to overhaul chip design flow”, EE Times, May 15, 1999.
[10] A.V. Goldberg, J.D. Oldham, S. Plotkin, and C. Stein, “An Implementation of a Combinatorial
Approximation Algorithm for Minimum-Cost Multicommodity Flow”, Proc. Integer Programming and
Combinatorial Optimization, LNCS 1412, Springer, Berlin, 1998, pp. 338–352.
[11] J. Huang, X.-L. Hong, C.-K. Cheng and E. S. Kuh, “An Efficient Timing-Driven Global Routing
Algorithm”, Proc. DAC, 1993, pp. 596–600.
[12] A. B. Kahng, S. Muddu and E. Sarto, “Tuning Strategies for Global Interconnects in
High-Performance Deep-Submicron ICs”, VLSI Design 10(1) (1999), pp. 21–34.
[13] A. B. Kahng, S. Muddu and E. Sarto, “On Switch Factor Based Analysis of Coupled RC
Interconnects”, Proc. DAC, 2000, pp. 79–84.
[14] A. B. Kahng, S. Muddu, E. Sarto and R. Sharma, “Interconnect Tuning Strategies for
High-Performance ICs”, Proc. DATE, 1998.
[15] A. Kamath and O. Palmon, “Improved Interior Point Algorithms for Exact and Approximate Solution
of Multicommodity Flow Problems”, Proc. 6th Annual ACM–SIAM Symp. on Discrete Algorithms, 1995,
pp. 502–511.
[16] M. Kang, W. Dai, T. Dillinger and D. LaPotin, “Delay Bounded Buffered Tree Construction for
Timing Driven Floorplanning”, Proc. ICCAD, 1997, pp. 707–712.
[17] T. Leong, P. Shor, and C. Stein, “Implementation of a Combinatorial Multicommodity Flow
Algorithm”, In D.S. Johnson and C.C. McGeoch, eds., Network Flows and Matching, vol. 12 of Series in
Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1993, pp. 387–
405.
[18] J. Lillis, C. K. Cheng and T. T. Y. Lin, “Optimal Wire Sizing and Buffer Insertion for Low Power and
a Generalized Delay Model”, Proc. ICCAD, 1995, pp. 138–143.
[19] R. Motwani, J. Naor, and P. Raghavan, “Randomized Approximation Algorithms in Combinatorial
Optimization”, In Approximation Algorithms for NP-hard Problems (Boston, MA, 1997), D. Hochbaum,
ed., PWS Publishing, pp. 144–191.
[20] T. Okamoto and J. Cong, “Buffered Steiner Tree Construction With Wire Sizing for Interconnect
Layout Optimization”, Proc. ICCAD, 1996, pp. 44–49.
[21] P. Raghavan and C.D. Thompson, “Randomized Rounding”, Combinatorica 7 (1987), pp. 365–374.
[22] J. O’Rourke, Computational Geometry in C, Cambridge University Press, 1993.
[23] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 1999.
[24] E. Shragowitz and S. Keel, “A Global Router Based on a Multicommodity Flow Model”, Integration
5(1) (1987), pp. 3–16.
[25] X. Tang and D. F. Wong, “Planning Buffer Locations by Network Flows”, Proc. ISPD, 2000, pp. 180–
185.
ICCAD2000, Pages 110-113
Predictable Routing
Ryan Kastner, Elaheh Bozorgzadeh and Majid Sarrafzadeh
Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208
Abstract
Predictable routing is the concept of using prespecified patterns to route a net. By doing
this, we allow a more accurate prediction mechanism for metrics such as congestion and
wirelength earlier in the design flow. Additionally, we can better plan the routes, insert
buffers and perform wire sizing earlier. With comparable routing quality, we show that
we can predictably route up to 80% of a selected subset of nets. Also, we introduce
methods for finding a group of nets which can be predictably routed.
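A prespecified pattern can be as simple as a single-bend route between two pins. The sketch below is an illustration only: the abstract does not name the patterns used, so the L-shaped pattern, the grid coordinates, and the function names are assumptions.

```python
def l_route(src, dst, horizontal_first=True):
    """Return the corner points of a one-bend (L-shaped) route from src to dst.

    Points are (x, y) grid coordinates. The bend is placed at (x1, y0) when
    the horizontal segment is routed first, otherwise at (x0, y1).
    """
    (x0, y0), (x1, y1) = src, dst
    bend = (x1, y0) if horizontal_first else (x0, y1)
    return [src, bend, dst]


def wirelength(route):
    """Sum of Manhattan lengths of consecutive route segments."""
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(route, route[1:]))
```

Because both L-shapes of a two-pin net have the same length and only two candidate paths exist, the wirelength is known before routing and congestion estimates only need to consider those two paths.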
References
[1] A. Kahng, S. Mantik and D. Stroobandt. “Requirements for Models of Achievable Routing”. In Proc.
International Symposium on Physical Design, April 2000.
[2] D. Sylvester and K. Keutzer. “A Global Wiring Paradigm for Deep Submicron Design”. In IEEE
Transactions on Computer Aided Design, February 2000.
[3] K. Kozminski. “Benchmarks for Layout Synthesis - Evolution and Current Status”. In Proc. ACM/IEEE
Design Automation Conference, June 1991.
[4] L.P.P.P. van Ginneken. “Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay”.
In Proc. International Symposium on Circuits and Systems, 1990.
[5] M. Sarrafzadeh and C.K. Wong. An Introduction to VLSI Physical Design. McGraw-Hill, New York,
NY, 1996.
[6] M. Wang, X. Yang and M. Sarrafzadeh. “DRAGON: Fast Standard-Cell Placement for Large Circuits”.
In Proc. IEEE International Conference on Computer Aided Design, November 2000.
[7] N. Sherwani. “Algorithms For VLSI Physical Design Automation”. Kluwer Academic Publishers,
Boston, MA, 1993.
[8] R. Kastner, E. Bozorgzadeh and M. Sarrafzadeh. “Coupling Aware Routing”. In IEEE International
ASIC/SOC Conference, September 2000.
[9] R. Kastner, E. Bozorgzadeh and M. Sarrafzadeh. Predictable Routing. Technical Report, Northwestern
University Department of Electrical and Computer Engineering, 2000.
[10] R. Nair. “A Simple Yet Effective Technique for Global Wiring”. In IEEE Transactions on Computer
Aided Design, March 1987.
[11] T. Lengauer. “Combinatorial Algorithms for Integrated Circuit Layout”. John Wiley and Sons, New
York, 1990.
[12] W. Dougherty and D. Thomas. “Unifying Behavioral Synthesis and Physical Design”. In Proc.
ACM/IEEE Design Automation Conference, June 2000.
[13] W.J. Sun and C. Sechen. “Efficient and Effective Placement for Very Large Circuits”. In Proc.
International Conference on Computer Aided Design, November 1993.
ICCAD2000, Pages 115-119
Counterexample-Guided Choice of Projections in Approximate Symbolic Model
Checking
Shankar G. Govindaraju, David L. Dill
Computer Systems Laboratory, Stanford University, Stanford, CA 94305
Abstract
BDD-based symbolic techniques of approximate reachability analysis based on
decomposing the circuit into a collection of overlapping sub-machines (also referred to as
overlapping projections) have been recently proposed. Computing a superset of the
reachable states in this fashion is susceptible to false negatives. Searching for real
counterexamples in such an approximate space is liable to failure. In this paper, the
"hybridization effect" induced by the choice of projections is identified as the cause of
the failure. A heuristic based on Hamming distance is proposed to improve the choice of
projections; it reduces the hybridization effect and helps produce either a genuine
counterexample or a proof of the property. The ideas are evaluated on a large real design
example from the PCI Interface unit in the MAGIC chip of the Stanford FLASH
Multiprocessor.
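The heuristic relies on Hamming distance as its metric. The sketch below shows only the metric, on states written as bit strings. How the distance is used to choose projections is not described in the abstract and is not modeled here; the function name and the string encoding are assumptions.

```python
def hamming_distance(state_a, state_b):
    """Number of bit positions in which two equal-length state vectors differ."""
    if len(state_a) != len(state_b):
        raise ValueError("state vectors must have the same length")
    return sum(1 for a, b in zip(state_a, state_b) if a != b)
```

For example, the states "1011" and "1001" differ in one state bit, so their distance is 1.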
References
[1] Balarin, F. and Sangiovanni-Vincentelli, A. L., “An iterative approach to language containment,” CAV,
pp. 29-40, 1993.
[2] Bryant, R. E., “Graph-based algorithms for Boolean function manipulation,” IEEE Transactions on
Computers, Vol. C-35, No. 8, pp. 677-691, August 1986.
[3] Burch, J. R. et al., “Symbolic model checking: 10^20 states and beyond,” LICS, pp. 428-439, 1990.
[4] Cho, H. et al., “Automatic state space decomposition for approximate FSM traversal based on circuit
analysis,” IEEE-TCAD, Vol. 15, No. 12, pp. 1451-1464, December 1996.
[5] Clarke, E. et al., “Counterexample-guided abstraction refinement,” CAV, pp. 154-169, July 2000.
[6] Govindaraju, G. S., Dill, D. L., Hu, A. J, and Horowitz, M. A., “Approximate reachability with BDDs
using overlapping projections,” DAC, pp. 451-456, 1998.
[7] Govindaraju, G. S. and Dill, D. L., “Verification by approximate forward and backward reachability,”
ICCAD, pp. 366-370, 1998.
[8] Kurshan, R. P., “Timing verification by successive approximation,” US Patent US05483470.
[9] Kuskin, J., et al., “The Stanford FLASH multiprocessor,” ISCA, pp. 301-313, April 1994.
[10] Shimizu, K., Dill, D., and Hu, A. J., “Monitor based formal specification of PCI,” (To appear).
[11] Yang, C. H., and Dill, D., “Validation with guided search of the state space,” DAC, pp. 599-604, 1998.
ICCAD2000, Pages 120-126
Smart Simulation
Using Collaborative Formal and Simulation Engines
Pei-Hsin Ho, Thomas Shiple, Kevin Harer, James Kukula,
Robert Damiano, Valeria Bertacco, Jerry Taylor, Jiang Long
Advanced Technology Group, Synopsys Inc.
Abstract
We present Ketchum, a tool that was developed to improve the productivity of
simulation-based functional verification by providing two capabilities: (1) automatic test
generation and (2) unreachability analysis. Given a set of "interesting" signals in the
design under test (DUT), automatic test generation creates input stimuli that drive the
DUT through as many different combinations (called coverage states) of these signals as
possible to thoroughly exercise the DUT. Unreachability analysis identifies as many
unreachable coverage states as possible.
Ketchum differs from previously published results in several ways. First, Ketchum
provides 10x higher capacity than previously published results. The higher capacity is
achieved by carefully orchestrating simulation and multiple formal methods including
symbolic simulation, SAT-based BMC, symbolic fixpoint computation and automatic
abstraction. Second, Ketchum performs not only automatic test generation but also
unreachability analysis, which enables the test generation effort to be focused on
coverage states that are not unreachable. Third, the backbone of Ketchum is an off-the-shelf
commercial simulator. It enables Ketchum to reach deep states of the design quickly
and supports simulation monitors through the standard API of the simulator during test
generation.
We applied Ketchum to several industrial designs, including the picoJava microprocessor
from SUN and the DW8051 microcontroller from Synopsys and obtained very promising
results. The experiments show that Ketchum can (1) handle design blocks containing
more than 4500 latches and 170K gates, (2) reach up to 6x more coverage states than
random simulation and (3) identify a majority of the unreachable coverage states.
References
[1] J. Bergmann and M. Horowitz. Improving coverage analysis and test generation for large designs. In
Proceedings of ICCAD, 1999.
[1] V. Bertacco, M. Damiani and S. Quer. Cycle-based symbolic simulation of gate-level
synchronous circuits. In Proceedings of DAC, pp. 391-396, 1999.
[2] A. Biere, A. Cimatti, E. Clarke, M. Fujita and Y. Zhu. Symbolic model checking using SAT procedures.
In Proceedings of DAC, 1999.
[3] R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on
Computers, C-35(8), pp 677-691, 1986.
[4] R.E. Bryant. Symbolic simulation--techniques and applications. In DAC, pp. 517-521, 1990.
[5] H. Cho, G. Hachtel, E. Macii, M. Poncino, and F. Somenzi. Automatic state space decomposition for
approximate FSM traversal based on circuit analysis. IEEE TCAD, 15(12), pp. 1451-1464, 1996.
[6] D.L. Dill. Embedded tutorial: formal verification meets simulation. In Proceedings of ICCAD, 1999.
[7] M.K. Ganai, A. Aziz and A. Kuehlmann. Enhancing simulation with BDDs and ATPG. In Proceedings
of DAC, pp. 385-390, 1999.
[8] S.G. Govindaraju, D.L. Dill, A.J. Hu, and M.A. Horowitz. Approximate reachability with BDDs using
overlapping projections. In Proceedings of DAC, pp. 451-455, 1998.
[9] P.-H. Ho, A. Isles and T. Kam. Formal verification of pipeline control using controlled token nets and
abstract interpretation. In ICCAD, 1998.
[10] R.C. Ho and M. Horowitz. Validation coverage analysis for complex digital designs. In ICCAD, 1996.
[11] M. Kantrowitz and L. Noack. I’m Done Simulating: Now what? verification coverage analysis and
correctness checking of the DEC chip 21164 Alpha microprocessor. In DAC, pp. 325-330, 1996.
[12] K.L. McMillan. Symbolic model checking. Kluwer Academic Publishers, 1994.
[13] I.-H. Moon, J. Kukula, T. Shiple and F. Somenzi. Least fixpoint approximation for reachability
analysis. In Proceedings of ICCAD, pp. 41-44, 1999.
[14] J.P. Marques-Silva and K.A. Sakallah. GRASP: a search algorithm for propositional satisfiability. In
IEEE Transactions on Computers, pp. 506-521, May 1999.
[15] D. Moundanos, J.A. Abraham and Y.V. Hoskote. Abstraction techniques for validation coverage
analysis and test generation. In IEEE Transactions on Computers, January 1998.
[16] Sun Microsystems. PicoJava technology.
http://www.sun.com/microelectronics/communitysource/picojava.
[17] C.H. Yang and D.L. Dill. Validation with guided search of the state space. In DAC, pp. 599-604,
1998.
[18] F. Somenzi. CUDD: CU Decision Diagram Package. ftp://vlsi.colorado.edu/pub/.
ICCAD2000, Pages 127-133
Simulation Coverage Enhancement Using Test Stimulus Transformation
C. Norris Ip
Cadence Berkeley Labs, California
Abstract
This paper introduces the concept of abstract state exploration histories to a simulation
environment, and presents a test stimulus transformation (TST) technique to improve
simulation coverage. State exploration histories are adapted from reachability analysis in
Formal Verification. In TST, an aggressively abstracted state exploration history is
maintained during simulation. While this history is being collected, test stimuli from an
existing test bench are transformed on-the-fly to explore new scenarios that are not in the
history. The results show that a 3-fold increase in transition coverage for a cache
coherence controller and 10 times faster coverage convergence for an MPEG2 decoder
can be achieved.
References
1. A. Bouajjani, S. Bensalem, S. Graf, C. Loiseaux, and J. Sifakis. Property preserving abstractions for the
verification of concurrent systems. Formal Methods in System Design, 6(1):11--44, 1993.
2. J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic model checking: 10^20
states and beyond. IEEE Symp. on Logic in Computer Science, 1990.
3. Mike Benjamin, Daniel Geist, Alan Hartman, Yaron Wolfsthal, Gerard Mas, and Ralph Smeets. A study
in coverage-driven test generation. Design Automation Conference, 1999.
4. Malay K. Ganai, Adnan Aziz, and Andreas Kuehlmann. Enhancing simulation with BDDs and ATPG.
Design Automation Conference, 1999.
5. Daniel Geist, Monica Farkas, Avner Landver, Yossi Lichtenstein, Shmuel Ur, and Yaron Wolfsthal.
Coverage-directed test generation using symbolic techniques. Int. Conf. on Formal Methods in
Computer-Aided Design, 1996.
6. G. J. Holzmann. An analysis of bitstate hashing. 15th Int. Conf. on Protocol Specification, Testing, and
Verification. 1995.
7. Richard C. Ho, C. Han Yang, Mark A. Horowitz, and David L. Dill. Architecture validation for
processors. 22nd Symp. on Computer Architecture, 1995.
8. Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The
directory-based cache coherence protocol for the DASH multiprocessor. Int. Symp. on Computer
Architecture, 1990.
9. D. Moundanos, J.A. Abraham, and Y. V. Hoskote. Abstraction techniques for validation coverage
analysis and test generation. IEEE Transactions on Computers, 47(1):2--14, 1998.
10. C. Han Yang and David L. Dill. Validation with guided search of the state space. 35th Design
Automation Conference, 1998.
11. Jun Yuan, Jian Shen, Jacob Abraham, and Adnan Aziz. On combining formal and informal verification.
Int'l Conf. on Computer-Aided Verification, 1997.
12. Pitro Zafiropulo, Colin H. West, Harry Rudin, D.D. Cowan, and Daniel Brand. Towards analyzing and
synthesizing protocols. IEEE Transactions on Communications, 28(4), April 1980.
13. Verilog PLI 1.0, IEEE 1364 Verilog Standard.
ICCAD2000, Pages 135-140
Dynamic Response Time Optimization for SDF Graphs
Dirk Ziegenbein, Jan Uerpmann, Rolf Ernst
TU Braunschweig
Abstract
Synchronous Data Flow (SDF) is a well-known model of computation that is widely used
in the control engineering and digital signal processing domains. Existing scheduling
methods are mainly static approaches that assume full knowledge of the environment,
e.g. data arrival times. In a growing number of practical cases like internet multimedia
applications there exists only partial knowledge of the environment, e.g. average data
rates. Here, only dynamic scheduling can yield optimal results. In this paper, we propose
a new dynamic scheduling method that minimizes the maximal response time of the
system. It is a generalization of a deadline revision method to allow treatment of
data-dependent tasks using EDF scheduling. The applicability and benefit of the new approach
are shown using a real-world example.
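As background for the EDF scheduling mentioned above, the following sketch simulates plain non-preemptive earliest-deadline-first dispatching on one processor. It is not the deadline revision method of the paper; the task tuple layout (name, release, execution time, deadline) and the function name are assumptions made for illustration.

```python
import heapq


def edf_schedule(tasks):
    """Non-preemptive EDF on a single processor.

    tasks: list of (name, release, exec_time, deadline) tuples.
    Returns a list of (name, start, finish) in execution order.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # order by release time
    ready = []  # min-heap keyed by deadline
    time, i, schedule = 0, 0, []
    while i < len(pending) or ready:
        # Move every task released by the current time into the ready heap.
        while i < len(pending) and pending[i][1] <= time:
            name, _release, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if not ready:
            time = pending[i][1]  # processor idles until the next release
            continue
        _deadline, name, exec_time = heapq.heappop(ready)
        schedule.append((name, time, time + exec_time))
        time += exec_time
    return schedule
```

The response time of a task is its finish time minus its release time, so the maximal response time of a run can be read directly from the returned schedule.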
References
[1] G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete. Cyclostatic dataflow. IEEE Transactions on
Signal Processing, 44(2), February 1996.
[2] J. Blazewicz. Modeling and Performance Evaluation of Computer Systems, chapter Scheduling
Dependent Tasks with Different Arrival Times to Meet Deadlines. North-Holland, Amsterdam, 1976.
[3] J. Buck and E.A. Lee. Scheduling dynamic dataflow graphs with bounded memory using the token flow
model. In Proceedings of ICASSP’93, 1993.
[4] S. Ha and E.A. Lee. Compile-time scheduling and assignment of data-flow program graphs with
data-dependent iteration. IEEE Transactions on Computers, 40(11):1225–1238, November 1991.
[5] E. A. Lee and D.G. Messerschmitt. Static scheduling of synchronous data flow programs for digital
signal processing. IEEE Transactions on Computers, 36(1), January 1987.
[6] E.A. Lee and D.G. Messerschmitt. Synchronous data flow. In Proceedings of the IEEE, volume 75,
pages 1235–1245, September 1987.
[7] C. Liu and J. Layland. Scheduling algorithm for multiprogramming in a hard-real-time environment.
Journal of the ACM, pages 46–61, 1973.
[8] P. Pop, P. Eles, and Z. Peng. Performance estimation for embedded systems with data and control
dependencies. In Proceedings Eighth International Workshop on Hardware/Software Co-Design
(Codes/CASHE ’00), San Diego, May 2000.
[9] J. Stankovic, M. Spuri, M. Di Natale, and G. Buttazzo. Implications of classical scheduling results for
real-time systems. IEEE Computer, 28(6), June 1995.
[10] H. Weisser, P. Schulenberg, R. Bergholz, and U. Lages. Autonomous driving on vehicle test tracks:
Overview, motivation and concept. In International Conference on Intelligent Vehicles, volume 2, pages
439–443, 1998.
[11] T. Yen and W.H. Wolf. Performance estimation for real-time distributed embedded systems. IEEE
Transactions on Parallel and Distributed Systems, 9(11):1125–1136, November 1998.
ICCAD2000, Pages 142-149
Full-Chip, Three-Dimensional, Shapes-Based RLC Extraction
K. L. Shepard, D. Sitaram¹, and Yu Zheng
Columbia Integrated Systems Laboratory (CISL), Department of Electrical Engineering
Columbia University, New York, NY 10027
¹Cadence Design Systems, New Providence, NJ
Abstract
In this paper, we report the development of the first commercial full-chip, three-dimensional,
shapes-based, RLCK extraction tool, developed as part of a university-industry collaboration.
The technique of return-limited inductances is used to provide a
sparse, frequency-independent inductance and resistance network with self-inductances
that represent sensible "nominal" values in the absence of mutual coupling. Mutual
inductances are extracted for accurate noise analysis. The tool, Assura RLCX, exploits
high-capacity scan-band techniques and disk caching for inductance extraction as an
extension to Cadence's existing Assura RCX extractor.
References
[1] K. L. Shepard, V. Narayanan, P. C. Elmendorf, and Gutuan Zheng. Global Harmony: Coupled Noise
Analysis for Full-Chip RC Interconnect Networks. In Proceedings of the IEEE International Conference on
Computer-Aided Design, 1997.
[2] A. Deutsch, G. V. Kopcsay, P. J. Restle, H. H. Smith, G. Katopis, W. D. Becker, P. W. Coteus, C. W.
Surovic, B. J. Rubin, Jr. R. P. Dunne, T. Gallo, K. A. Jenkins, L. M. Terman, R. H. Dennard, G. A.
Sai-Halasz, B. L. Krauter, and D. R. Knebel. When are transmission-line effects important for on-chip
interconnections. IEEE Transactions on Microwave Theory and Techniques, 45(10):1836 –1846, October
1997.
[3] Y. I. Ismail, E. G. Friedman, and J. L. Neves. Figures of merit to characterize the importance of on-chip
inductance. In 35th ACM/IEEE Design Automation Conference, pages 560–565, June 1998.
[4] S. Lin, N. Chang, and S. Nakagawa. Quick on-chip self- and mutual-inductance screen. In Proceedings
of the International Symposium on Quality Electronic Design, pages 513 – 520, March 2000.
[5] B. Krauter, S. Mehrotra, and V. Chandramouli. Including inductive effects in interconnect timing
analysis. In Proceedings of the CICC, pages 445 – 452, 1999.
[6] K. L. Shepard, S. Carey, E. Cho, B. Curran, R. Hatch, D. Hoffman, S. McCabe, G. Northrop, and R.
Seigler. Design Methodology for the G4 S/390 Microprocessors. IBM Journal of Research and
Development, 21(4/5):515 – 548, 1997.
[7] R. M. Averill et al. Chip integration methodology for the IBM S/390 G5 and G6 custom
microprocessors. IBM Journal of Research and Development, 45(5/6):681 – 706, 1999.
[8] K. L. Shepard and Z. Tian. Return-limited inductances: a practical approach to on-chip inductance
extraction. In Proceedings of the CICC, pages 453 – 456, 1999.
[9] K. L. Shepard and Zhong Tian. Return-limited inductances: a practical approach to on-chip inductance
extraction. IEEE Trans. CAD, April 2000.
[10] IEEE 1481 delay and power calculation system standard. Technical report, 1999.
[11] E. B. Rosa. The self and mutual inductance of linear conductors. Bulletin of the National Bureau of
Standards, 4:301 – 344, 1908.
[12] F. Grover. Inductance Calculations: Working Formulas and Tables. Dover, New York, 1962.
[13] A. E. Ruehli. Inductance calculations in a complex integrated circuit environment. IBM Journal of
Research and Development, 16(5):470 – 481, 1972.
[14] A. E. Ruehli. Equivalent circuit models for three-dimensional multiconductor systems. IEEE
Transactions on Microwave Theory and Techniques,MTT-22:216 – 221, March 1974.
[15] A. E. Ruehli, N. Kulasza, and J. Pivnichny. Inductance of nonstraight conductors close to a ground
return plane. IEEE Transactions on Microwave Theory and Techniques, pages 706 – 708, August 1975.
[16] Kaushik Gala, Vladimir Zolotov, Rajendran Panda, Brian Young, Junfeng Wang, and David Blaauw.
On-chip inductance modeling and analysis. In Proceedings of the Design Automation Conference, pages 63
– 68, 2000.
[17] K.-W. Chiang, S. Nahar, and C.-Y. Lo. Time-efficient VLSI artwork analysis algorithms in GOALIE2.
25th ACM/IEEE Design Automation Conference, pages 471 – 475, 1988.
[18] K.-W. Chiang. Resistance extraction and resistance calculation in GOALIE2. 26th ACM/IEEE Design
Automation Conference, pages 682 – 685, 1989.
[19] Chandramouli Kashyap and Byron Krauter. A realizable driving point model for on-chip interconnect
with inductance. In Proceedings of the ACM/IEEE Design Automation Conference, pages 190 – 195, June
2000.
ICCAD2000, Pages 150-155
How to Efficiently Capture On-Chip Inductance Effects:
Introducing a New Circuit Element K
Anirudh Devgan
IBM Microelectronics
Austin, TX 78758
Hao Ji
UC Santa Cruz CE Dept.
Santa Cruz, CA 95064
Wayne Dai
UC Santa Cruz CE Dept.
Santa Cruz, CA 95064
Abstract
On-chip inductance extraction and analysis is becoming increasingly critical. Inductance
extraction can be difficult, cumbersome and impractical on large designs as inductance
depends on the current return path - which is typically unknown prior to extracting and
simulating the circuit model. In this paper, we propose a new circuit element, K, to model
inductance effects while being easier to extract and analyze. K is defined as
the inverse of the partial inductance matrix L, and has the locality and sparsity normally
associated with a capacitance matrix. We propose to capture inductance effects by directly
extracting and simulating K, instead of partial inductance, leading to a much more efficient
procedure which is amenable to full-chip extraction. The proposed approach has been
verified through several simulation results.
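The definition of K as the inverse of L can be made concrete with a two-conductor example. The matrix entries below are arbitrary illustrative numbers, not extracted values, and the helper name is hypothetical. A 2x2 case shows only the definition; it is too small to exhibit the locality and sparsity properties claimed for K.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Illustrative partial inductance matrix: self terms on the diagonal,
# mutual terms off the diagonal (arbitrary units).
L = [[2.0, 1.0], [1.0, 2.0]]
K = invert_2x2(L)  # K is defined as the inverse of L
```

Multiplying L by K recovers the identity matrix, which is a quick sanity check on any extracted K.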
References
[1] H. Ji, A. Devgan, and W. Dai, “Ksim: A stable and efficient RKC simulator for capturing on-chip
inductance effect,” UCSC Technical Report, UCSC-CRL-00-10, Apr. 2000.
[2] E. Rosa, “The self and mutual inductance of linear conductors,” in Bulletin of the National Bureau of
Standards, pp. 301–344, 1908.
[3] A. Ruehli, “Inductance calculations in a complex integrated circuit environment,” IBM Journal of
Research and Development, pp. 470–481, Sept. 1972.
[4] A. Ruehli, “Equivalent circuit models for three-dimensional multiconductor systems,” IEEE Trans. on
MTT, pp. 216–220, Mar. 1974.
[5] M. Kamon, M. J. Tsuk, and J. K. White, “FASTHENRY: A multipole-accelerated 3-D inductance
extraction program,” IEEE Trans. on MTT, pp. 1750–1758, Sept. 1994.
[6] Z. He, M. Celik, and L. Pileggi, “SPIE: Sparse partial inductance extraction,” in Proc. 34-th Design
Automation Conference, pp. 137–140, June 1997.
[7] B. Krauter and L. Pileggi, “Generating sparse partial inductance matrixes with guaranteed stability,” in
Proc. IEEE International Conference on Computer Aided Design, pp. 45–52, Nov. 1995.
[8] K. L. Shepard and Z. Tian, “Return-limited inductances: A practical approach to on-chip inductance
extraction,” in Proc. IEEE Custom Integrated Circuits Conference, pp. 453–456, 1999.
[9] S. Lin, N. Chang, and S. Nakagawa, “Quick on-chip self- and mutual-inductance screen,” in Proc. IEEE
1st International Symposium on Quality Electronic Design, pp. 513–520, Mar. 2000.
[10] K. Nabors and J. K. White, “FastCap: a multipole accelerated 3-D capacitance extraction program,”
IEEE Trans. on CAD, pp. 1447–1459, Nov. 1991.
ICCAD2000, Pages 156-163
Generalized FDTD-ADI: An Unconditionally Stable Full-Wave Maxwell’s
Equations Solver for VLSI Interconnect Modeling
Charlie C.-P. Chen, Tae-Woo Lee, Narayanan Murugesan, and Susan C. Hagness
Department of Electrical and Computer Engineering, University of Wisconsin-Madison,
Madison, WI 53706
Abstract
The finite-difference time-domain (FDTD) method of solving the full-wave Maxwell's
equations has been recently extended to provide accurate and numerically stable
operation for time steps exceeding the Courant limit. The elimination of an upper bound
on the size of the time step was achieved using an alternating-direction implicit (ADI)
time-stepping scheme. This greatly increases the computational efficiency of the FDTD
method for classes of problems where the cell size of the three-dimensional space lattice
is constrained to be much smaller than the shortest wavelength in the source spectrum.
One such class of problems is the analysis of high-speed VLSI interconnects where full-wave
methods are often needed for the accurate analysis of parasitic electromagnetic
wave phenomena. In this paper, we present an enhanced FDTD-ADI formulation which
permits the modeling of realistic lossy materials such as semiconductor substrates and
metal conductors as well as artificial lossy materials needed for perfectly matched layer
(PML) absorbing boundary conditions. Simulations using our generalized FDTD-ADI
formulation are presented to demonstrate the accuracy and extent to which the
computational burden is reduced by the ADI scheme.
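For reference, the Courant limit that bounds the time step of the conventional explicit FDTD scheme on a three-dimensional grid is dt <= 1 / (c * sqrt(1/dx^2 + 1/dy^2 + 1/dz^2)). The sketch below evaluates this bound; the function name is hypothetical and the default propagation speed is the vacuum speed of light, which is an assumption since a dielectric medium would have a lower speed.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_limit(dx, dy, dz, c=C0):
    """Largest stable time step of explicit 3-D FDTD for cell size (dx, dy, dz)."""
    return 1.0 / (c * math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2))
```

Since the bound shrinks with the smallest cell dimension, fine interconnect geometry forces very small explicit time steps, which is the cost the ADI scheme is meant to remove.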
References
[1] K. Nabors and J. White, “Fastcap: a multipole accelerated 3-d capacitance extraction program,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 10, no. 11, 1991.
[2] W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance extraction,” in
Design Automation Conference, 1998.
[3] M. Kamon, M.J. Tsuk, and J.K. White, “Fasthenry: a multipole-accelerated 3-d inductance extraction
program,” IEEE Trans. on Microwave Theory and Techniques, vol. 42, pp. 1750–1758, Sept. 1994.
[4] J. Cullum, A. Ruehli, and T. Zhang, “A method for reduced-order modeling and simulation of large
interconnect circuits and its application to peec models with retardation,” IEEE Trans. on Circuits and
Systems II: Analog and Digital Signal Processing, vol. 47, no. 4, pp. 261–273, 2000.
[5] R. Mittra, W. D. Becker, and P. H. Harms, “A general purpose Maxwell solver for the extraction of
equivalent circuits of electronic package components for circuit simulation,” IEEE Trans. on Circuits and
Systems I: Fundamental Theory and Applications, vol. 39, no. 11, pp. 964–973, 1992.
[6] M. Piket-May, A. Taflove, and J. Baron, “FD-TD modeling of digital signal propagation in 3-D circuits
with passive and active loads,” IEEE Trans. on Microwave Theory and Techniques, vol. 42, pp. 1514–
1523, Aug. 1994.
[7] P. Vichot, J. A. Mix, Z. Schoenborn, J. Dunn, and M. Piket-May, “Numerical modeling of a clock
distribution network for a superconducting multichip module,” IEEE Trans. on Components, Hybrids, and
Manufacturing Technology, Part B: Advanced Packaging, vol. 21, pp. 98–104, Feb. 1998.
[8] Y.-S. Tsuei, A. C. Cangellaris, and J. L. Prince, “Rigorous electromagnetic modeling of
chip-to-package (first-level) interconnections,” IEEE Trans. on Components, Hybrids, and Manufacturing
Technology, vol. 16, no. 8, pp. 876–883, 1993.
[9] H. H. M. Ghouz and E.-B. El-Sharawy, “An accurate equivalent circuit model of flip chip and via
interconnects,” IEEE Trans. on Microwave Theory and Techniques, vol. 44, no. 12, pp. 2543–2554, Dec.
1996.
[10] K. L. Shlager and J. B. Schneider, “A survey of the finite-difference time-domain literature,” in
Advances in Computational Electrodynamics: The Finite-Difference Time-Domain Method, A. Taflove,
Ed., chapter 1, pp. 1–62. Artech House, Boston, MA, 1998.
[11] T. Namiki, “A new FDTD algorithm based on alternating-direction implicit method,” IEEE Trans. on
Microwave Theory and Techniques, vol. 47, pp. 2003–2007, Oct. 1999.
[12] F. Zheng, Z. Chen, and J. Zhang, “A finite-difference time-domain method without the Courant
stability conditions,” IEEE Microwave Guided Wave Letters, vol. 9, pp. 441–443, Nov. 1999.
[13] J.-P. Berenger, “A perfectly matched layer for the absorption of electromagnetic waves,” J.
Computational Physics, vol. 114, no. 1, pp. 185–200, 1994.
[14] A. Taflove and S. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain
Method, 2 ed., Artech House, Boston, MA, 2000.
[15] G. Liu and S. D. Gedney, “Perfectly matched layer media for an unconditionally stable
three-dimensional ADI-FDTD method,” IEEE Microwave Guided Wave Letters, vol. 10, no. 7, pp. 261–263,
2000.
ICCAD2000, Pages 165-170
Mongrel: Hybrid Techniques for Standard Cell Placement
Sung-Woo Hur and John Lillis
EECS Department, University of Illinois at Chicago
Abstract
We give an overview of Mongrel, a standard-cell placer. The prototype tool adopts a
middle-down methodology in which a grid is imposed over the layout area and cells are
assigned to bins forming a global placement. The optimization technique applied in this
phase is based on the Relaxation-Based Local Search (RBLS) framework in which a
combinatorial search mechanism is driven by an analytical engine. This enables a more
global view of the problem and results in complex modifications of the placement in a
single search "move". Details of this approach including a novel placement legalization
procedure are presented. When a global placement has converged, a detailed placement is
formed and further optimized by the proposed optimal interleaving technique.
Experimental results are presented and are quite promising, demonstrating that there is
significant room for improvement in state of the art placement.
References
[1] M. Wang and M. Sarrafzadeh, “Behavior of Congestion Minimization During Placement,” in Proc. of
Intl. Symposium on Physical Design, pp. 145–150, 1999.
[2] M. A. Breuer, “Min-cut Placement,” Design Automation and Fault-Tolerant Computing, pp. 343–382,
October 1977.
[3] H. Eisenmann and F. M. Johannes, “Generic Global Placement and Floorplanning,” in Proc.
ACM/IEEE Design Automation Conference, pp. 269–274, 1998.
[4] W.-J. Sun and C. Sechen, “Efficient and Effective Placement for Very Large Circuits,” IEEE
Transactions on Computer-Aided Design, pp. 349–359, 1995.
[5] M. Sarrafzadeh and M. Wang, “NRG: Global and Detailed Placement,” in Proc. of IEEE Intl.
Conference on Computer-Aided Design, pp. 532–537, IEEE Computer Society Press, 1997.
[6] S.-W. Hur and J. Lillis, “Relaxation and Clustering in a Local Search Framework: Application to Linear
Placement,” in Proc. of ACM/IEEE Design Automation Conference, pp. 360–366, 1999.
[7] C. M. Fiduccia and R. M. Mattheyses, “A Linear Time Heuristic for Improving Network Partitions,” in
Proc. of ACM/IEEE Design Automation Conference, pp. 175–181, 1982.
[8] X. Yang, M. Wang, K. Eguro, and M. Sarrafzadeh, “A Snap-on Placement Tool,” in Proc. of Intl.
Symposium on Physical Design, pp. 153–158, 2000.
ICCAD2000, Pages 171-176
Multilevel Optimization for Large-Scale Circuit Placement
Tony Chan1, Jason Cong2, Tianming Kong2, Joseph R. Shinnerl2
1 Mathematics Department, UCLA; Los Angeles, CA 90095-1555
2 Computer Science Department, UCLA; Los Angeles, CA 90095-1596
Abstract
We have designed and implemented a new class of fast and highly scalable placement
algorithms that directly handle complex constraints and achieve total wirelengths
comparable to the state of the art. Our approach exploits recent advances in (i) multilevel
methods for hierarchical computation, (ii) interior-point methods for nonconvex nonlinear
programming, and (iii) the Fast Multipole Method for the order N evaluation of sums
over the N(N-1)/2 pairwise interactions of N components. Significant adaptation of these
methods for the placement problem is required, and we have therefore developed a set of
customized discrete algorithms for clustering, declustering, slot assignment, and local
refinement with which the continuous algorithms are naturally combined. Preliminary
test runs on benchmark circuits with up to 184,000 cells produce total wirelengths within
approximately 5-10% of those of GORDIAN-L [1] in less than one tenth the run time. Such
an ultra-fast placement engine is badly needed for timing convergence of the synthesis
and layout phases of integrated circuit design.
References
[1] G. Sigl, K. Doll, and F. M. Johannes, "Analytical placement: A linear or a quadratic objective
function?," in Proc. 28th ACM/IEEE Design Automation Conference, pp. 427-432, 1991.
[2] C. Sechen, VLSI Placement and Global Routing Using Simulated Annealing. Kluwer Academic
Publishers, 1988.
[3] W.-J. Sun and C. Sechen, "Efficient and effective placement for very large circuits," IEEE Trans. on
Computer-Aided Design, pp. 349-359, March 1995.
[4] J. Kleinhans, G. Sigl, F. Johannes, and K. Antreich, "Gordian: VLSI placement by quadratic
programming and slicing optimization," IEEE Trans. on Computer-Aided Design, vol. 10, pp. 356-365, 1991.
[5] Y. Sankar and J. Rose, "Trading quality for compile time: Ultra-fast placement for fpgas," in FPGA
'99, ACM Symp. on FPGAs, pp. 157-166, 1999.
[6] M. Sarrafzadeh and M. Wang, "Nrg: global and detailed placement," in IEEE/ACM International
Conference on Computer-Aided Design, pp. 532-537, IEEE, 1997.
[7] H. Eisenmann and F. M. Johannes, "Generic global placement and floorplanning," in Proc. 35th
ACM/IEEE Design Automation Conference, pp. 269-274, 1998.
[8] R. Tsay, E. S. Kuh, and C. Hsu, "Proud: A sea-of-gates placement algorithm," IEEE Design and
Test of Computers, vol. 12, pp. 44-56, 1988.
[9] W. L. Briggs, A Multigrid Tutorial. Philadelphia: SIAM, 1987.
[10] A. Brandt, "Multi-level adaptive solutions to boundary value problems," Mathematics of
Computation, vol. 31(138), pp. 333-390, 1977.
[11] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, "Multilevel hypergraph partitioning:
Application in vlsi domain," in Proc. 34th ACM/IEEE Design Automation Conference, pp. 526-529,
1997.
[12] "Interior point methods online: http://www-unix.mcs.anl.gov/otc/interiorpoint/archive.html."
[13] P. Chin and A. Vannelli, "Interior point methods for placement," IEEE Int. Symp. on Circuits and
Systems, pp. 169-172, 1994.
[14] T. Gao, P. M. Vaidya, and C. L. Liu, "A performance driven macro-cell placement algorithm," in
Proc. 29th ACM/IEEE Design Automation Conference, pp. 147-152, 1992.
[15] N. Quinn and M. Breuer, "A force-directed component placement procedure for printed circuit
boards," IEEE Trans. on Circuits and Systems CAS, vol. CAS-26, pp. 377-388, 1979.
[16] W. Murray and M. H. Wright, "Line search procedures for the logarithmic barrier function," SIAM
J. on Optimization, vol. 4, number 2, pp. 229-246, 1994.
[17] S. G. Nash, "Newton-type minimization via the Lanczos method," SIAM J. on Numerical
Analysis, vol. 21, pp. 770-788, 1984.
[18] Y. Saad, Iterative methods for sparse linear systems. Pacific Grove, California: PWS
publishing, 1996.
[19] M. Gu and S. Eisenstat, "A divide-and-conquer algorithm for the symmetric tridiagonal
eigenproblem," SIAM J. on Matrix Analysis and Applications, vol. 16, no. 1, pp. 172-191, 1995.
[20] K. Nabors and J. White, "Fastcap: A multipole accelerated 3-d capacitance extraction program,"
IEEE Trans. on Computer-Aided Design, vol. 10, no. 11, pp. 1447-1459, 1991.
[21] S. Goto, "An efficient algorithm for the two-dimensional placement problem in electrical circuit
layout," IEEE Trans. on Circuits and Systems, vol. 28, pp. 12-18, January 1981.
[22] J. Cong and S. K. Lim, "Edge separability based circuit clustering with application to circuit
partitioning," in Asia South Pacific Design Automation Conference, Yokohama Japan, pp. 429-434,
2000.
[23] H. Nagamochi and T. Ibaraki, "Computing edge connectivity in multigraphs and capacitated
graphs," SIAM Journal on Discrete Math., pp. 54-66, 1992.
[24] K. Kozminski, "Benchmarks for layout synthesis," in Proc. 28th ACM/IEEE Design Automation
Conference, pp. 265-270, 1991.
[25] C. J. Alpert, "The ISPD98 circuit benchmark suite," in Proc. Intl Symposium on Physical Design,
pp. 80-85, 1998.
[26] "http://www.twolf.com."
ICCAD2000, Pages 177-180
A Force-Directed Macro-Cell Placer
Fan Mo, Abdallah Tabbara and Robert K. Brayton
Dept. of EECS, University of California, Berkeley, CA 94720
Abstract
In this paper we present a novel force-directed placement algorithm, which is used to
solve macro-cell placement problems. A new wire model replaces the traditional clique
model and makes possible early awareness of routing congestion. Issues such as cell
orientation, overlap elimination, and pad positioning are also considered. Experiments
show satisfactory performance and fast run time.
References
[1] H.Eisenmann and F.M.Johannes, “Generic Global Placement and Floorplanning”, In Proceedings of the
35th Design Automation Conference, pages 269-274, 1998
[2] S.Goto, “An Efficient Algorithm for the Two-Dimensional Placement Problem in Electrical Circuit
Layout”, IEEE Trans. CAS, vol.28, no.1, Jan1981, pages 12-18
[3] N.R.Quinn,JR. and M.A.Breuer, “A Forced Directed Component Placement Procedure for Printed
Circuit Boards”, IEEE Trans. CAS, vol.26, no.6, Jun1979, pages 377-388
[4] J.M.Kleinhans, G.Sigl, F.M.Johannes and K.J.Antreich, “GORDIAN: VLSI Placement by Quadratic
Programming and Slicing Optimization”, IEEE Trans. CAD, vol.10, no.3, Mar1991, pages 356-365
[5] H.Etawil, S.Areibi and A.Vannelli, “Attractor-Repeller Approach for Global Placement”, In
Proceedings of International Conference on Computer-aided Design, pages 20-24, 1999
[6] “Placement and Routing of Electronic Modules”, edited by M.Pecht, New York: M.Dekker, 1993
[7] “Algorithms for VLSI physical design automation”, edited by N.Sherwani, 3rd ed. Boston: Kluwer
Academic Publishers, 1999
[8] Routing in the Third Dimension from VLSI chips to MCMs, edited by N.Sherwani, S.Bhingarde,
A.Panyam, IEEE Press,1995
[9] P.Chong, Y.Jiang, S.Khatri, F.Mo, S.Sinha and R.Brayton, “Don’t Care Wires in Logic/Physical
Design”, In the Proceedings of International Workshop on Logic Synthesis, pages 1-9, 2000
[10] MCNC92 benchmark, http://www.cbl.ncsu.edu/CBL_Docs/lys92.html
[11] S.Khatri, “Cross-talk Noise Immune VLSI Design using Regular Layout Fabrics”, PhD thesis,
University of California at Berkeley, Spring 2000
ICCAD2000, Pages 182-187
Verification of Delta-Sigma Converters Using Adaptive Regression Modeling
Jeongjin Roh, Suresh Seshadri and Jacob A. Abraham
Computer Engineering Research Center, The University of Texas at Austin, Austin, TX 78712
Abstract
A new verification technique for Delta-Sigma analog-to-digital converters (ADC) is
proposed. The ADC is partitioned into functional blocks, and adaptive regression models
for each partition are constructed using transistor-level simulation data. Non-idealities in
circuit behavior are captured by the adaptive regression technique from the collected
data. The algorithms have been implemented in a simulation program ARSIM (Adaptive
Regression Simulator), which performs data sampling, model building, and simulation.
Experimental results using ARSIM are shown on a second-order Delta-Sigma modulator,
and they demonstrate the effectiveness of our technique as a fast and accurate approach
for verifying Delta-Sigma converters.
References
[1] S. R. Norsworthy, R. Schreier and G. C. Temes, “Delta-sigma data converters: theory, design and
simulation,” IEEE press, 1997
[2] B. E. Boser and B. A. Wooley, “The design of sigma-delta modulation analog-to-digital converters,”
IEEE J. of Solid-State Circuits, pp.1298-1308, December 1988
[3] A. Opal, “Simulation of oversampled sigma delta convertors,” Proc. IEEE ISCAS, pp.727-730, 1996
[4] A. Opal, “Sampled data simulation of linear and nonlinear circuits,” IEEE Trans. Computer-Aided
Design, Vol.15, No.3, pp.295-307, March 1996
[5] G. T. Brauns, R. J. Bishop, M. B. Steer, J. J. Paulos and S. H. Ardalan, “Table-based modeling of delta-sigma modulators using ZSIM,” IEEE Trans. Computer-Aided Design, pp.142-150, February 1990
[6] R. J. Bishop, J. J. Paulos, M. B. Steer and S. H. Ardalan, “Table-based simulation of delta-sigma
modulators,” IEEE Trans. Circuits and Systems, Vol.37, No.3, pp.447-451, March 1990
[7] S. Rabii and B. A. Wooley, “A 1.8-V digital-audio sigma-delta modulator in 0.8-µm CMOS,” J. of
Solid-State Circuits, Vol.32, No.6, pp.783-796, June 1997.
[8] J. H. Friedman, “Multivariate adaptive regression splines,” Annals of Statistics, Vol.19, pp.1-141, March
1991
[9] V. Cherkassky and F. Mulier, Learning from data, John Wiley & Sons, 1998
[10] P. N. Variyam and A. Chatterjee, “Enhancing test effectiveness for analog circuits using synthesized
measurements,” Proc. IEEE VTS, pp.132-137, 1998
[11] G. Devarayanadurg, P. Goteti and M. Soma, “Hierarchy based statistical fault simulation of mixed-signal ICs,” Proc. IEEE ITC, pp.521-527, 1996
ICCAD2000, Pages 188-192
DAISY: A Simulation–Based High–Level Synthesis Tool for ∆Σ Modulators
K. Francken, P. Vancorenland and G. Gielen
Katholieke Universiteit Leuven, Dept. of Electrical Engineering, ESAT-MICAS
Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium
Abstract
An integrated tool called DAISY (Delta-Sigma Analysis and Synthesis) is presented for
the high-level synthesis of ∆Σ modulators. The approach determines both the optimum
modulator topology and the required building block specifications, such that the system
specifications - mainly accuracy and signal bandwidth - are satisfied at the lowest
possible power consumption. A genetic-based differential evolution algorithm is used in
combination with a fast dedicated behavioral simulator that includes the major
nonidealities of the building blocks to realistically analyze and optimize the modulator
performance. Experimental results illustrate the effectiveness of the approach. Also, an
overview of optimized topologies as a function of the modulator specifications for a wide
range of values shows the capabilities and performance range covered by the tool.
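The differential evolution strategy cited above [4] can be illustrated with a minimal sketch. This is not DAISY's implementation: the function name `differential_evolution`, the DE/rand/1/bin variant, the parameter values, and the toy `sphere` cost (standing in for a power figure returned by a behavioral simulator) are all assumptions made for illustration.

```python
import random

def differential_evolution(cost, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize cost(x) over box bounds with a DE/rand/1/bin sketch."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < cr:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    v = min(max(v, lo), hi)  # clamp to the box
                else:
                    v = pop[i][k]
                trial.append(v)
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]

def sphere(x):
    """Hypothetical stand-in for a simulator-based cost such as power."""
    return sum(v * v for v in x)

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
```

In a synthesis setting the cost function would wrap a simulation run and penalize violated specifications; the greedy selection step means the best cost in the population never gets worse from one generation to the next.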
REFERENCES
[1] Steven Norsworthy, Richard Schreier, and Gabor Temes, Eds., Delta-Sigma Data Converters: theory,
design, and simulation, IEEE, 1996.
[2] V. Dias, V. Liberali, and F. Maloberti, “Tosca: A user-friendly behavioural simulator for oversampling
a/d converters,” Proc. of IEEE International Symposium on Circuits and Systems, pp. 2677–2680, 1991.
[3] F. Medeiro, Top-Down Design of High-Performance Sigma-Delta Modulators, Kluwer Academic
Publishers, 1999.
[4] R. Storn and K. Price, “Differential evolution - a simple and efficient adaptive scheme for global
optimization over continuous spaces,” Technical Report TR-95-012, ICSI, March 1995.
[5] K. Francken and G. Gielen, “Optimum system-level design of delta-sigma modulators,” in ProRISC
Workshop on Circuits, Systems and Signal Processing, 1999.
[6] K. Martin and A. Sedra, “Finite amplifier gain and bandwidth effects in switched-capacitor filters,”
IEEE Journal of Solid-State Circuits, vol. 15, no. 3, pp. 358–361, 1981.
[7] K. Martin and A. Sedra, “Effects of the op amp finite gain and bandwidth on the performance of
switched-capacitor filters,” IEEE Transactions on Circuits and Systems, vol. 28, no. 8, pp. 822–829, 1981.
[8] G. Fischer and G. Moschytz, “On the frequency limitations of switched capacitor filters,” IEEE Journal
of Solid-State Circuits, vol. 19, no. 4, pp. 510–518, 1984.
[9] W. Sansen, H. Qiuting, and K. Halonen, “Transient analysis of charge transfer in sc filters - gain error
and distortion,” IEEE Journal of Solid-State Circuits, vol. 22, no. 2, pp. 268–276, 1987.
[10] A. Marques, High Speed CMOS Data Converters, Ph.D. thesis, K.U.Leuven, 1999.
[11] R. Storn, “On the usage of differential evolution for function optimization,” in NAFIPS, 1996, pp.
519–523.
[12] Y. Geerts, A. Marques, M. Steyaert, and W. Sansen, “A 3.3 v 15-bit delta-sigma adc with a signal
bandwidth of 1.1 mhz for adsl-applications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 7, pp. 927–
936, 1999.
ICCAD2000, Pages 193-
ACTIF: A high-level power estimation tool for Analog Continuous-Time Filters
Erik Lauwers, Georges Gielen
Katholieke Universiteit Leuven
Abstract
A tool is presented that gives a high-level estimation of the power consumed by an
analog continuous-time OTA-C filter when given only high-level input parameters such
as dynamic range and signal swing. When used in combination with estimators for other
building blocks (ADC's, DAC's, mixers,...) a truly high-level analog system exploration
becomes feasible, as needed for architectural exploration of telecom systems. In the
literature, only fundamental relations exist for analog filters, and these predict the power with
an error of orders of magnitude, which makes them hard to use in real system design.
ACTIF combines existing filter synthesis methods with new behavioral models for
transconductance stages in a novel way to obtain an optimized high-level yet accurate
power estimation. To verify the presented approach, two recently published design
examples are compared with the results from ACTIF.
References
[1] J.Crols, M.Steyaert, S.Donnay, G.Gielen, “A high-level design and optimization tool for analog RF
receiver front-ends”, Proc. ICCAD, pp. 550-553, 1995
[2] Y. Tsividis, “Integrated continuous-time filter design – an overview”, JSSC, vol. SC-29, no. 4, pp. 166-176, 1994
[3] J.Crols, M.Steyaert, “A high-level design methodology for the power optimization of highly integrated
receiver architectures”, Proc. ESSCIRC, 1995, pp. 442-445
[4] R.Harjani, J.Shao, “Feasibility and performance region modeling of analog and digital circuits”, Analog
Integrated Circuits and Signal Processing 10, pp. 23-43, 1996
[5] E.Lauwers, G.Gielen, “A power estimation model for high-speed CMOS A/D convertors”, Proc.
DATE, 1999, pp. 401-405
[6] R.Schaumann, M.S.Ghausi and K.R.Laker, “Design of analog filters: passive, active RC and switched
capacitor”, Englewood Cliffs, Prentice-Hall, 1990
[7] G.Groenewold, “Optimal dynamic range integrated continuous-time filters”, Ph.D. thesis, T.U.Delft,
Delft University Press, 1992
[8] F.Krummenacher, N.Joehl, “A 4-Mhz CMOS continuous-time filter with on-chip automatic tuning”,
ibid.,vol SC-23, pp.750-758, june 1988
[9] R.Torrance, T. Viswanathan, J.Hanson, “CMOS voltage to current transducers”, IEEE Trans. On
circuits and systems, Vol CAS-32, pp. 1097-1104, june 1998
[10] M.Steyaert, J.Silva-Martinez, W.Sansen, “High performance OTA-R-C continuous-time filters with
full CMOS low distortion floating resistors”, Proc. ESSCIRC, 1991, pp. 5-8
[11] R.Castello, I.Bietti, F.Svelto, “High-frequency analog filters in deep-submicron CMOS technology”,
Proc. ISSCC, 1999, MP4.5
[12] C.Yoo et al., “A +/-1.5V, 4MHz CMOS C.T. filter with single-integrator based tuning”, JSSC,
Vol.33, jan. 1998, pp. 18-27
ICCAD2000, Pages 198-201
Potential Slack: An Effective Metric of Combinational Circuit Performance
Chunhong Chen, Xiaojian Yang and Majid Sarrafzadeh
Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208-3118
Abstract
This paper proposes the concept of potential slack and shows that it is an effective metric of
combinational circuit performance. We provide several methods for estimating potential
slack and prove that one in particular (a maximal-independent-set based algorithm) works
best. Experiments in gate sizing show that potential slack provides 100% correct
prediction for circuit area optimization. We also explore the role of potential slack in
timing-driven placement.
References
[1] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of Performance Constraints for
Layout,” IEEE Transactions on Computer-Aided Design, CAD-8(8): 860-874, August 1989.
[2] T. Gao, P. M. Vaidya, and C. L. Liu, “A New Performance Driven Placement Algorithm,” in
Proceedings of ICCAD, pp.44-47, 1991.
[3] H. Youssef and E. Shragowitz, “Timing Constraints for Correct Performance,” in Proceedings of
ICCAD, pp.24-27, 1990.
[4] E. M. Sentovich et al., “SIS: A System for Sequential Circuit Synthesis,” Technical Report UCB/ERL
M92/41, Univ. of California, Berkeley, May 1992.
[5] C. Chen and M. Sarrafzadeh, “An Effective Algorithm for Gate-Level Power-Delay Tradeoff Using
Two Voltages,” in Proceedings of ICCD, pp.222-227, 1999.
[6] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac, “A Gate Resizing Technique for High
Reduction in Power Consumption,” in Proceedings of International Symposium on Low Power Electronics
and Design, pp.281-286, 1997.
[7] H. R. Lin and T. Hwang, “Power Reduction by Gate Sizing with Path-Oriented Slack Calculation,” in
Proceedings of IEEE ASPDAC’ 95/CHDL’95/VLSI’95, pp.7-12, June 1995.
[8] W. Swartz and C. Sechen, “Timing Driven Placement for Large Standard Cell Circuits,” in Proceedings
of DAC, 1995.
[9] J.Cong, L.He, K. Khoo, C. Koh and Z. Pan, “Interconnection Design for Deep Submicron ICs,” in
Proceedings of ICCAD, pp.478-485, 1997.
[10] R. H. Moehring, “Graphs and Orders: the Role of Graphs in the Theory of Ordered Sets and Its
Applications,” published by D. Reidel Publishing Company, edited by I. Rival, New York and London,
pp.41-101, May 1984.
[11] Itools 1.4.0 (formerly TimberWolf): http://www.internetcad.com.
[12] C. Chen and M. Sarrafzadeh, “Provably Good Algorithm for Low Power Consumption with Dual
Supply Voltages,” in Proceedings of ICCAD, pp.76-79, 1999.
[13] C. Chen and M. Sarrafzadeh, “Power Reduction by Simultaneous Voltage Scaling and Gate Sizing,” in
Proceedings of ASPDAC, pp.333-338, 2000.
ICCAD2000, Pages 202-207
Delay Budgeting for A Timing-Closure-Driven Design Method
Chien-Chu Kuo and Allen C.-H. Wu
Department of Computer Science, Tsing Hua University, Hsinchu, Taiwan, 30043, ROC
Abstract
In this paper, we present an RTL delay-budgeting approach for a timing-closure-driven
design method. We formulate the delay-budgeting problem as a Lagrange-multipliers-based slack distribution problem. We present two algorithms, namely the
balanced slack distribution algorithm and the AT-based (Area-Time) slack distribution
algorithm, to solve the problem. We also present a timing-closure-driven design flow by
integrating commercial synthesis/layout tools with the proposed algorithms. We have
demonstrated the viability of the proposed RTL delay-budgeting method. The results
show that without an accurate AT-characteristic projection of modules the balanced slack
distribution algorithm will be a good choice for delay budgeting at RTL.
References
[1] B. T. Preas and M. J. Lorenzetti, Physical Design Automation of VLSI Systems, Benjamin Cummings,
Menlo Park, CA., 1988.
[2] N. Sherwani, Algorithms for VLSI Physical Design Automation, 2nd ed., Kluwer Academic Publishers,
1995.
[3] C.J. Alpert and A. B. Kahng, “Recent Directions in Netlist Partitioning: A Survey," INTEGRATION:
the VLSI Journal, N19, pp. 1-81, 1995.
[4] E. S. Kuh, “Physical Design: Reminiscing and Looking Ahead," Proc. of Int. Symp. on Physical Design,
pp. 206, 1997.
[5] R. Camposano, “The Quarter Micron Challenge: Integrating Physical and Logic Design," Proc. of Int.
Symp. on Physical Design, pp. 211, 1997.
[6] K. Keutzer, A. R. Newton, and N. Shenoy, “The future of Logic Synthesis and Physical Design in
Deep-Submicron Process Geometries," Proc. of Int. Symp. on Physical Design, pp. 218-223, 1997.
[7] P. S. Hauge, R. Nair, and E. J. Yoffa, “Circuit Placement for Predictable Performance," Proc. of Int.
Conf. Computer-Aided Design, pp. 88-91, 1987.
[8] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of Performance Constraints for
Layout," Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 8, pp. 860-874,
August, 1989.
[9] H. Youssef and E. Shragowitz, “Timing Constraints for Correct Performance," Proc. of Int. Conf.
Computer-Aided Design, pp. 24-27, 1990.
[10] M. Sarrafzadeh, D. Knol, and G. Tellez, “Unification of Budgeting and Placement," Proc. of 34th
Design Automation Conf., pp. 758-761, 1997.
[11] M. Sarrafzadeh, D. Knol, and G. Tellez, “A Delay Budgeting Algorithm Ensuring
Maximum Flexibility in Placement," Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 16, no. 11, pp. 1332-1341, Nov. 1997.
[12] C. C. Kuo, “Delay Budgeting for A Timing-Closure-Driven Design Method", Master Thesis, Dept. of
Computer Science, Tsing Hua Univ., Hsinchu, Taiwan, ROC, June, 1998.
[13] “HDP Implementation Document", Technical Report, Dept. of Computer Science, Tsing Hua Univ.,
Hsinchu, Taiwan, ROC., June, 1997.
[14] “HDL Compiler for Verilog Reference Manual Version 3.4b", Synopsys, 1996.
[15] C. Sechen, “TimberWolf6.0: Mixed macro/Standard cell floor planning, placement and routing
package," user's manual, Yale University, Sep. 1991.
[16] “Silicon Ensemble Reference Manual Version 5.0", Cadence, 1996.
ICCAD2000, Pages 208-213
Stochastic Wire-Length and Delay Distributions of 3-Dimensional Circuits
Rongtian Zhang, Kaushik Roy, Cheng-Kok Koh, and David B. Janes
ECE, Purdue University, West Lafayette, IN 47907-1285
ABSTRACT
3-D technology promises higher integration density and lower interconnection
complexity and delay. At present, however, not much work on circuit applications has
been done due to lack of insight into 3-D circuit architecture and performance. In this
paper, we investigate the interconnect distributions of 3-D circuits. We divide the 3-D
interconnects into horizontal wires and vertical wires and derive their wire-length
distributions, respectively. Based on the stochastic wire-length distributions, we calculate
3-D circuit interconnect delay distribution. We show that 3-D structures effectively
reduce the number of long delay nets, significantly reduce the number of repeaters
needed, and dramatically improve the performance. With 3-D structures, a circuit can
work at a much higher clock rate (double, even triple) than with 2-D. However, we also
show that the impacts of vertical wires on chip area and interconnect delay may limit the
number of device layers that we can integrate.
REFERENCES
[1] Semiconductor Industry Association, The International Technology Roadmap for Semiconductors, 1999
Edition.
[2] T. Kunio, K. Oyama, Y. Hayashi, and M. Morimoto, “Three Dimensional ICs, Having Four Stacked
Active Device Layers”, IEDM’89 Conf. Proc., pp.837-840, 1989.
[3] T. Nishimura, Y. Inoue, K. Sugahara, S. Kusunoki, T. Kumamoto, M. Nakaya, Y. Horiba, and Y.
Akasaka, “Three Dimensional IC for High Performance Image Signal Processor”, IEDM’87 Conf. Proc.,
pp.111-114, 1987.
[4] J. F. Gibbons, and K. F. Lee, “One-Gate-Wide CMOS Inverter on Laser-Recrystallized Polysilicon”,
IEEE Electron Device Letters, Vol. EDL-2, No. 6, pp.117-118, 1980.
[5] S. Kawamura, N. Sasaki, T. Iwai, M. Nakano, and M. Takagi, “Three-Dimensional CMOS IC’s
Fabricated by Using Beam Recrystallization”, IEEE Electron Device Letters, Vol. EDL-4, No. 10, pp.366-368, 1983.
[6] R. Zingg, J. A. Friedrich, G. W. Neudeck, and B. Hofflinger, “Three-Dimensional Stacked MOS
Transistors by Localized Silicon Epitaxial Overgrowth”, IEEE Transactions on Electron Devices, Vol. 37,
No. 6, 1452, 1990.
[7] G. W. Neudeck, S. Pae, J. P. Denton, and T. Sue, “Multiple Layers of Silicon-on-Insulator for
Nanostructure Devices”, Journal of Vacuum Science & Technology B, Vol. 17, No. 3, pp.994-998, 1999.
[8] K. Saraswat, S. J. Souri, V. Subramanian, A. R. Joshi, and A.W.Wang, “Novel 3D Structures”,
Proceedings of 1999 International SOI Conference, pp.54-55, 1999.
[9] P. Ramm, D. Bollman, R. Braun, R. Buchner, et al, “Three Dimensional Metalization for Vertically
Integrated Circuits”, Microelectronic Engineering, 37/38, pp.39-47, 1997.
[10] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Reading, MA: Addison-Wesley,
1990.
[11] W. Donath, “Placement and Average Interconnection Lengths of Computer Logic”, IEEE Trans.
Circuits and Systems, Vol. CAS-26, No. 4, pp. 272-277, Apr. 1979.
[12] W. Donath, “Wire Length Distribution for Placement of Computer Logic”, IBM Journal of Research
and Development, Vol. 25, No. 3, pp.152-155, 1981.
[13] J. A. Davis, V. K. De, and J. Meindl, “A Stochastic Wire-Length Distribution for Gigascale
Integration (GSI) – Part I: Derivation and Validation”, IEEE Trans. Electron Devices, Vol. 45, No. 3,
pp.580-589, Mar. 1998.
[14] A. Gamal, “Two-Dimensional Models for Interconnections lengths in Master Slice Integrated
Circuits”, IEEE Trans. Circuit Syst., Vol. CAS-26, pp.272-277, 1979.
[15] C. J. Alpert and A. Devgan, “Wire Segmenting for Improved Buffer Insertion”, 1997 Design
Automation Conference, pp.588-593, 1997.
ICCAD2000, Pages 215-221
Hierarchical Interconnect Circuit Models
Michael Beattie, Satrajit Gupta and Lawrence Pileggi
Carnegie Mellon University, Department of Electrical and Computer Engineering
5000 Forbes Ave., Pittsburgh, PA 15213
ABSTRACT
The increasing size of integrated systems combined with deep submicron physical
modeling details creates an explosion in RLC interconnect modeling complexity of
unmanageable proportions. Interconnect extraction tools employ hierarchy to manage
complexity, but this hierarchy is discarded via eliminating far away coupling terms when
the equivalent RLC circuits are formed. The increasing dominance of capacitance
coupling along with the emergence of on-chip inductance, however, makes the composite
effect of faraway couplings increasingly evident. Even if newly enforced design rules and
practices will ultimately obviate the need for modeling these couplings for design
verification, some approximation of the "exact" solution is required to validate these
rules. This paper proposes an efficient hierarchical equivalent circuit representation of
interconnect parasitics that utilizes the efficient hierarchical long-distance modeling
already existing within extractors. Results from a prototype simulator based on these
hierarchical models demonstrate the simulation inaccuracy incurred when the far-away
coupling terms are ignored. Such a form of interconnect modeling may provide the key to
hierarchical modeling of electromagnetic interactions between large components on
future gigascale systems.
REFERENCES
[1] L. Greengard, The Rapid Evaluation of Potential Fields in Particle Systems, The MIT Press,
Cambridge, MA (1987).
[2] K. Nabors, J. White, FastCap: A Multipole Accelerated 3–D Capacitance Extraction Program, IEEE
Trans. CAD, Vol. 10, No. 11, pp. 1447–1459 (November 1991).
[3] M. Kamon, M. Tsuk, J. White, FastHenry: A Multipole Accelerated 3–D Inductance Extraction
Program, IEEE Trans. Microwave Theory and Techniques, 42, No. 9, pp. 1750–1758 (Sept. 1994).
[4] C. Brebbia (Ed.), Boundary element techniques in computer aided engineering, NATO ASI on BEM in
CAD (1983).
[5] W. Shi, J. Liu, N. Kakani, T. Yu, A Fast Hierarchical Algorithm for 3–D Capacitance Extraction, Proc.
35th Design Automation Conference (DAC) (June 1998).
[6] M. Beattie, L. Pileggi, IC Analyses Including Extracted Inductance Models, 36th Design Automation
Conference (DAC) (June 1999).
[7] L. Nagel, R. Rohrer, Computer Analysis of Nonlinear circuits, Excluding Radiation (CANCER), IEEE
Journal of Solid State Circuits, SC–6, pp. 162–182 (Aug. 1971).
ICCAD2000, Pages 222-228
Hurwitz Stable Reduced Order Modeling for RLC Interconnect Trees
Xiaodong Yang1, Chung-Kuan Cheng2, Walter H. Ku1, Robert J. Carragher3
1 Dept. of E. C. E., U.C. San Diego, 2 Dept. of C. S. E., U.C. San Diego, 3 Fujitsu Laboratories of America
Abstract
We present a new realizable reduced order modeling technique for RLC interconnect
trees. Both lumped and distributed wire models can be used with this technique. Provable
stability is achieved by using Hurwitz polynomials. The moment computation process is
avoided, but moments can still be matched implicitly. In experiments, the proposed
Hurwitz three-pole model accurately and efficiently captures inductive effects at both
near-end and far-end nodes.
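A polynomial is Hurwitz when all of its roots lie strictly in the left half-plane, which is what guarantees a stable model. For a three-pole (cubic) denominator this can be checked directly with the classical Routh-Hurwitz conditions. The sketch below illustrates only that textbook test; the function name `is_hurwitz_cubic` is hypothetical, and the sketch says nothing about how the paper constructs its polynomials.

```python
def is_hurwitz_cubic(a3, a2, a1, a0):
    """Routh-Hurwitz test for a3*s^3 + a2*s^2 + a1*s + a0.

    Returns True when every root has a strictly negative real part.
    """
    # Normalize the sign so the leading coefficient is positive.
    if a3 < 0:
        a3, a2, a1, a0 = -a3, -a2, -a1, -a0
    # Cubic case: all coefficients positive and a2*a1 > a3*a0.
    return (a3 > 0 and a2 > 0 and a1 > 0 and a0 > 0
            and a2 * a1 > a3 * a0)

# (s + 1)(s + 2)(s + 3) = s^3 + 6 s^2 + 11 s + 6 has roots -1, -2, -3.
stable = is_hurwitz_cubic(1, 6, 11, 6)
# s^3 + s^2 + s + 1 = (s + 1)(s^2 + 1) has roots on the imaginary axis.
marginal = is_hurwitz_cubic(1, 1, 1, 1)
```

The test needs only the four coefficients, so no root finding is required.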
References
[1] L. T. Pillage, R. A. Rohrer, “Asymptotic Waveform Evaluation for Timing Analysis”, IEEE Trans. on
CAD, Apr.1990,vol 9, p.p. 352-66.
[2] C. L. Ratzlaff, L. T. Pillage, “RICE: Rapid Interconnect Circuit Evaluation Using AWE”, IEEE Trans.
on CAD, Jun. 1994,vol 13, p.p. 763-6
[3] D. F. Anastasakis, N. Gopal, S.Y. Kim, L. T. Pillage, “On the stability of moment-matching
approximations in asymptotic waveform evaluation”, Proc. ACM/IEEE DAC, June, 1993. pp.367-72
[4] M. Sriram and S. M. Kang, “Fast Approximation of the Transient Response of Lossy Transmission Line
Trees”, Proc. 30th ACM/IEEE DAC., June 1993, pp. 691-6
[5] D. S. Gao, D. Zhou, “Propagation delay in RLC interconnection Networks”, Proc. IEEE ISCAS, May
1993. p.2125-8
[6] S. Muddu and A. B. Kahng, “An Analytical Delay Model for RLC Interconnects”, IEEE Trans. on
Computer-Aided Design of Integrated Circuits and System, Vol.16, No, 12, Dec. 1997
[7] M.Sriram, S.M.Kang, “Performance Driven MCM Routing Using a Second Order RLC Tree Delay
Model”, 1993 Proceedings. Fifth Annual IEEE International Conference on Wafer Scale Integration,
pp.262-7
[8] Y. I. Ismail, E. G. Friedman and Jose L. Neves, “Equivalent Elmore Delay for RLC Trees”, ACM/IEEE
Proc. DAC, June 1999, pp. 715-721
[9] F. J. Liu, C. K. Cheng, “Extend Moment Computation to 2-Port Circuit Representations”. ACM/IEEE,
Proc. 35th DAC, June. 1998.
[10] A. Odabasioglu, M. Celik, L. T. Pileggi, “PRIMA: passive reduced-order interconnect macromodeling
algorithm”, IEEE Trans. on CAD. vol.17,Aug. 1998. p.645-54
[11] C. J. Alpert, A. Devgan and S. T. Quay, “Buffer Insertion with Accurate Gate and Interconnect Delay
Computation”, ACM/IEEE, Proc. DAC, June 1999, pp. 479-84
[12] S. Barnett, Polynomials and Linear Control Systems. Marcel Dekker, Inc. 1983
[13] P. R. O’Brien and T. L. Savarino, “Modeling the Driving-Point Characteristic of Resistive
Interconnect for Accurate Delay Estimation”, Proc. IEEE ICCAD, 1989, pp. 512-5.
[14] B. N. Sheehan, “TICER: Realizable reduction of extracted RC circuits”. 1999 IEEE/ACM
International Conference on Computer-Aided Design. pp.200-3.
[15] A. Devgan, P. R. O’Brien, “Realizable reduction for RC interconnect circuits”. 1999 IEEE/ACM
International Conference on Computer-Aided Design, pp.204-7
[16] Xiaodong Yang, “Technical report on RLC tree Delay calculation”, UCSD
[17] J. Qian; S. Pullela; L. Pillage. “Modeling the Effective capacitance for the RC interconnect of CMOS
gates”. IEEE Trans. on CAD, vol.13, (no.12), Dec. 1994. pp.1526-35.
[18] Personal communication with Frank Liu, Synopsys
ICCAD2000, Pages 229-234
An “Effective” Capacitance Based Delay Metric for RC Interconnect
Chandramouli V. Kashyap, Charles J. Alpert, and Anirudh Devgan
IBM Corp., Austin, TX 78758
Abstract
Efficient, yet accurate delay estimation for RC interconnect is required for the
optimization loop of timing-driven physical design tools. For many applications, the
Elmore delay metric [4] has been widely used due to its efficiency and ease of use.
However, it is well known that the Elmore metric can have significant error since it
ignores the resistive shielding of downstream capacitance. We present a new interconnect
metric called ECM that accounts for this resistive shielding by computing an effective
capacitance to model the downstream capacitance. ECM can also be computed with the
same complexity as the Elmore delay and does not require the computation of moments.
Experiments show that ECM is significantly more accurate than Elmore delay and is
competitive with other metrics that use multiple moments.
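For orientation, the baseline that the abstract compares against can be sketched in a few lines. The code below computes the classical Elmore delay on an RC tree: each node's delay is its parent's delay plus the edge resistance times the total downstream capacitance. It is not the ECM metric, whose formula is not given here; the function name `elmore_delays` and the dictionary-based tree encoding are assumptions made for illustration.

```python
from collections import defaultdict


def elmore_delays(parent, res, cap, root=0):
    """Classical Elmore delay for every node of an RC tree (illustrative sketch).

    parent[v] -- parent of node v (None for the root)
    res[v]    -- resistance of the edge parent[v] -> v
    cap[v]    -- capacitance attached to node v
    """
    children = defaultdict(list)
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    # Root-first ordering of the nodes (breadth-first).
    order = [root]
    for u in order:
        order.extend(children[u])

    # Downstream capacitance: accumulate from the leaves toward the root.
    down = dict(cap)
    for v in reversed(order):
        if parent[v] is not None:
            down[parent[v]] += down[v]

    # Elmore delay: parent's delay plus edge resistance times the full
    # downstream capacitance (no resistive shielding is modeled).
    delay = {root: 0.0}
    for v in order[1:]:
        delay[v] = delay[parent[v]] + res[v] * down[v]
    return delay


# Hypothetical three-node chain: 0 -> 1 -> 2.
parent = {0: None, 1: 0, 2: 1}
res = {1: 1.0, 2: 2.0}
cap = {0: 0.0, 1: 1.0, 2: 3.0}
print(elmore_delays(parent, res, cap))  # {0: 0.0, 1: 4.0, 2: 10.0}
```

Because every edge is charged with the entire downstream capacitance, the sketch makes visible the simplification that the abstract criticizes: capacitance far behind a large resistance counts as much as capacitance close to the driver.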
REFERENCES
[1] C. J. Alpert, A. Devgan, and C. Kashyap, “A Two Moment RC Delay Metric for Performance
Optimization”, Intl. Symp. Physical Design, pp. 69-74, 2000.
[2] “The AS/X User’s Guide”, copyright IBM Corporation, 1996.
[3] J. Cong, L. He, C.-K. Koh, and P. H. Madden, “Performance Optimization of VLSI Interconnect
Layout”, Integration: the VLSI Journal, 21, 1996, pp. 1-94.
[4] W. C. Elmore, “The Transient Response of Damped Linear Network with Particular Regard to
Wideband Amplifiers”, J. Applied Physics, 19, 1948, pp. 55-63.
[5] R. Gupta, B. Tutuianu, and L. T. Pileggi, “The Elmore Delay as a Bound for RC Trees with Generalized
Input Signals”, IEEE Trans. on Computer Aided Design, 16(1), pp. 95-104, 1997.
[6] A. B. Kahng and S. Muddu, “Two-Pole Analysis of Interconnection Trees”, Proceedings IEEE Multi-Chip Module Conference, Santa Cruz, February 1995, pp. 105-110.
[7] A. B. Kahng and S. Muddu, “A General Methodology for Response and Delay Computations in VLSI
Interconnects”, UCLA CS Dept. TR-940015, 1994.
[8] A. B. Kahng and S. Muddu, “An Analytical Delay Model for RLC Interconnects”, IEEE Trans. on
Computer-Aided Design, 16(12) (1997), pp. 1507-1514.
[9] R. Kay and L. Pileggi, “PRIMO: Probability Interpretation of Moments for Delay Calculation”,
IEEE/ACM Design Automation Conference, 1998, pp. 463-468.
[10] T. Lin, E. Acar, and L. Pileggi, “h-gamma: An RC Delay Metric Based on a Gamma Distribution
Approximation to the Homogeneous Response”, IEEE/ACM ICCAD, pp. 19-25.
[11] P. R. O’Brien and T. L. Savarino, “Modeling the Driving-Point Characteristic of Resistive
Interconnect for Accurate Delay Estimation”, IEEE/ACM ICCAD, 1989, pp. 512-515.
[12] L. T. Pileggi, “Timing Metrics for Physical Design of Deep Submicron Technologies”, Intl. Symp.
Physical Des., 1998, 28-33.
[13] L. T. Pillage and R. A. Rohrer, “Asymptotic Waveform Evaluation for Timing Analysis”, IEEE
TCAD, 9(4), 352-366, 1990.
[14] J. Qian, S. Pullela, and L. Pillage, “Modeling the “Effective Capacitance” for the RC Interconnect of
CMOS Gates”, IEEE Trans. CAD, 13(12), 1994, pp. 1526-1535.
[15] J. Rubinstein, P. Penfield, Jr., and M. A. Horowitz, “Signal Delay in RC Tree Networks”, IEEE Trans.
on Computer-Aided Design, CAD-2, (July 1983), pp. 202-211.
[16] B. Tutuianu, F. Dartu, and L. Pileggi, “Explicit RC-Circuit Delay Approximation Based on the First
Three Moments of the Impulse Response”, IEEE/ACM DAC, 1996, pp. 611-616.
ICCAD2000, Pages 236-243
INCREMENTAL CAD
Olivier Coudert
Monterey Design Systems, Sunnyvale, CA 94089-1443
Jason Cong
Computer Science Dept., University of California, Los Angeles, CA 90095
Sharad Malik
Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544
Majid Sarrafzadeh
Dept. of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208
ABSTRACT
Comprehensive study of incremental algorithms and solutions in the context of CAD tool
development is an open area of research with a great deal of potential. Incremental
algorithms for synthesis and layout are needed when a design undergoes local or
incremental change. Often these local changes are made to react to local change in the
design, correct local errors or to make local improvements in one or more of the design
quality metrics. In this paper we outline fundamental problems in incremental logic
synthesis and physical design. Preliminary solutions to a subset of these problems will be
outlined.
REFERENCES
[1] M.H. Arnold and W.S. Scott. An interactive maze router with hints. In Proc. 25th Design Automation
Conference, pages 672-676, Jun 1988.
[2] D. Brand, A. Drumm, S. Kundu, and P. Narain. “Incremental Synthesis”. In Proceedings of the
International Conference on Computer-Aided Design, pages 14-18. IEEE, November 1994.
[3] D. Brand, Anthony Drumm, Sandip Kundu, and Prakash Narain. Incremental synthesis. In IEEE
International Conference on Computer-Aided Design, 1994.
[4] C.-S. Choy, T.-S. Cheung, and K.-K. Wong. “Incremental Layout Placement Modification Algorithms”.
IEEE Transactions on Computer Aided Design, 15(4):437-445, April 1996.
[5] J. Cong, J. Fang, and K.Y. Khoo. An implicit connection graph maze routing algorithm for ECO
routing. In Proc. ACM/IEEE International Conference on Computer Aided Design, pages 163-167, Nov
1999.
[6] J. Cong, J. Fang, and K.Y. Khoo. Via design rule consideration in multi-layer maze routing algorithms.
In Proc. International Symposium on Physical Design, pages 214-220, Apr 1999.
[7] J. Cong and L. He. Theory and algorithm of local-refinement-based optimization with application to
device and interconnect sizing. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, pages 406-420, 1999.
[8] J. Cong and H. Huang. Technology mapping for field programmable gate arrays with incremental
changes. In IEEE-ACM Design Automation Conference, pages 290-293, 2000.
[9] J. Cong, C.-K. Koh, and P. Madden. Performance optimization of VLSI interconnect layout.
Integration, the VLSI Journal, 21(1-2):1-94, 1996.
[10] J. Cong and Majid Sarrafzadeh. “Incremental Physical Design”. In International Symposium on
Physical Design, pages 84-92, 2000.
[11] Olivier Coudert. Gate sizing for constrained delay/power/area optimization. pages 465-472, December
1997.
[12] J. Crenshaw, M. Sarrafzadeh, P. Banerjee, and P. Prabhakaran. “An Incremental Floorplanner”. In
Great Lakes Symposium on VLSI, March 1999.
[13] H. Edelsbrunner. A new approach to rectangle intersections. Intl. Journal of Computer Mathematics,
13(3-4):209-229, 1983.
[14] J.M. Emmert and D. Bhatia. “Incremental Routing in FPGAs”. In IEEE International ASIC Conference
and Exhibit, 1998.
[15] J. P. Fishburn and A. E. Dunlop. TILOS: A posynomial programming approach to transistor sizing. In
IEEE International Conference on Computer-Aided Design, pages 326-328, 1985.
[16] J. Grodstein, E. Lehman, H. Harkness, B. Grundmann, and Y. Watanabe. A delay model for logic
synthesis of continuously-sized networks. In IEEE International Conference on Computer-Aided Design,
1995.
[17] Gary D. Hachtel and Fabio Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic
Publishers, 1996.
[18] R. Haddad, L. P. P. P. van Ginneken, and N. Shenoy. Discrete drive selection for continuous sizing. In
IEEE International Conference on Computer Design, 1997.
[19] F.O. Hadlock. A shortest path algorithm for grid graphs. Networks, 7(4):323-334, 1977.
[20] A Hetzel. A sequential detailed router for huge grid-graphs. In Proc. Design Automation and Test in
Europe, pages 332-338, Feb 1998.
[21] J. Hu and S. S. Sapatnekar. Simultaneous buffer insertion and non-Hanan optimization for VLSI
interconnect under a higher order AWE model. In International Symposium on Physical Design, 1999.
[22] J. Cong, J. Fang, and K.Y. Khoo. DUNE: A multi-layer gridless routing system with wire planning. In
Proc. International Symposium on Physical Design, 2000.
[23] Y. Jiang, S. S. Sapatnekar, C. Bamji, and J. Kim. Interleaving buffer insertion and transistor sizing into
a single optimization. December 1998.
[24] A.B. Kahng and S. Mantik. “Mismatches of Incremental Optimizers and Instance Perturbations in
Current Place-and-Route Tools”. In Proceedings of International Conference on Computer-Aided Design,
2000.
[25] I. Kato, S. Ohhira, and Y. Hisatomi. A method of pattern data management of PWB layout system. In
Proc. 35th Annual Convention IPS Japan, pages 2429-2430, 1987.
[26] K. Kawamura, T. Shindo, T. Shibuya, H. Miwatari, and Y. Ohki. Touch and cross router. In Proc. of
IEEE Conference on Computer-Aided Design, pages 56-59, Nov 1990.
[27] E.S. Kuh and T. Ohtsuki. Recent advances in VLSI layout. Proc. of the IEEE, 78(2):237-263, Feb
1990.
[28] C.Y. Lee. An algorithm for path connections and its applications. IRE Trans Electronic Computers,
EC-10:346-365, 1961.
[29] Y.-L. Lin, Y.-C. Hsu, and F.-S. Tsai. Silk: a simulated evolution router. IEEE Transactions on
Computer-Aided Design, 8(10), Oct 1989.
[30] L.-C. Liu, H.-P. Tseng, and C. Sechen. Chip-level area routing. In Proc. of International Symposium on
Physical Design, pages 197-204, Apr 1998.
[31] M. Berkelaar and J. Jess. Gate sizing in MOS digital circuits with linear programming. pages 217-221,
1990.
[32] M. Wang, X. Yang, K. Eguro, and M. Sarrafzadeh. “Multi-center Congestion Estimation and
Minimization During Placement”. In International Symposium on Physical Design, 2000.
[33] M. Wang, X. Yang, and M. Sarrafzadeh. “DRAGON2000: A Fast Standard-Cell Placement Tool”. In
Proceedings of International Conference on Computer-Aided Design, 2000.
[34] A. Margarino, A. Romano, A. De Gloria, F. Curatelli, and P. Antognetti. A tile-expansion router.
IEEE Trans. Computer-Aided Design, CAD-6(4):507-517, Jul 1987.
[35] L. McMurchie and C. Ebeling. Pathfinder: a negotiation-based performance-driven router for FPGAs.
In Proc. of ACM Symposium on Field-Programmable Gate Array, pages 111-117, Feb 1995.
[36] O. Coudert, R. Haddad, and S. Manne. New algorithms for gate sizing: A comparative study. In IEEE-ACM Design Automation Conference, pages 197-202, 1996.
[37] T. Ohtsuki. Gridless routers - new wire routing algorithms based on computational geometry. In Proc.
International Conference of Circuits and Systems, 1985.
[38] T. Okamoto and J. Cong. Interconnect layout optimization by simultaneous steiner tree construction
and buffer insertion. In International Symposium on Physical Design, 1996.
[39] J.K. Ousterhout. Corner stitching: a data-structuring technique for VLSI layout tools. IEEE Trans.
Computer-Aided Design, CAD-3(1):87-100, Jan 1984.
[40] J.K. Ousterhout, G.T. Hamachi, R.N. Mayo, W.S. Scott, and G.S. Taylor. Magic: A VLSI layout
system. In Proc. 21st Design Automation Conference, pages 152-159, Jun 1984.
[41] M. Pedram and N. Bhat. Layout driven logic restructuring and decomposition. In IEEE International
Conference on Computer-Aided Design, 1991.
[42] M. Pedram and N. Bhat. Layout driven technology mapping. In IEEE-ACM Design Automation
Conference, 1991.
[43] S. Raman, C.L. Liu, and L.G. Jones. “A Timing Constrained Incremental Routing Algorithm for
Symmetrical FPGAs”. In European Design and Test Conference, 1996.
[44] S. S. Sapatnekar, V. B. Rao, P. M. Vaidya, and S. M. Kang. An exact solution to the transistor sizing
problem for CMOS circuits using convex optimization. pages 1621-1634, December 1993.
[45] A. Salek, J. Lou, and M. Pedram. A simultaneous routing tree construction and fanout optimization
algorithm. In IEEE International Conference on Computer-Aided Design, 1998.
[46] M. Sato, J. Sakanaka, and T. Ohtsuki. A fast line-search method based on a tile plane. In IEEE
International Symposium on Circuits and Systems, pages 588-591, May 1987.
[47] J. Soukup. Fast maze router. In Proc. 15th Design Automation Conference, pages 100-102, 1978.
[48] R. F. Sproull and I. E. Sutherland. Logical effort: Designing for speed on the back of an envelope. In
Proceedings of the IEEE Advanced Research in VLSI Conference, 1991.
[49] L. Stok, D.S. Kung, D. Brand, A.D. Drumm, A.J. Sullivan, L.N. Reddy, N. Hieter, D.J. Geiger, H.
Chao, and P.J. Osler. “BooleDozer: Logic Synthesis for ASICs”. IBM Journal of Research and
Development, 40(4):407-430, July 1996.
[50] G. Swamy, S. Rajamani, C. Lennard, and R.K. Brayton. “Minimal Logic Resynthesis for Engineering
Change”. In International Symposium on Physical Design, pages 1596-1599. IEEE, 1997.
[51] H. J. Touati. Performance Oriented Technology Mapping. PhD thesis, University of California,
Berkeley, 1990.
[52] Lukas P. P. P. van Ginneken. Buffer placement in distributed RC-tree networks for minimal Elmore
delay. In IEEE International Symposium on Circuits and Systems, 1990.
[53] M. Wang and M. Sarrafzadeh. “Behavior of Congestion Minimization During Placement”. In
International Symposium on Physical Design, pages 145-150. ACM, April 1999.
[54] P. Widmayer. On graphs preserving rectilinear shortest paths in the presence of obstacles. Annals of
Operations Research, 33(1-4):557-75, Dec 1991.
[55] Y.F. Wu, P. Widmayer, M.D.F Schlag, and C.K. Wong. Rectilinear shortest paths and minimum
spanning trees in the presence of rectilinear obstacles. IEEE Trans. Computers, C-36(3):321-331, Mar
1987.
[56] S.Q. Zheng, Joon Shik Lim, and S.S. Iyengar. Finding obstacle-avoiding shortest paths using implicit
connection graphs. IEEE Trans. Computer-Aided Design, 15(1):103-110, Jan 1996.
[57] Hai Zhou, Martin Wong, I-Min Liu, and Adnan Aziz. Simultaneous routing and buffer insertion with
restrictions on buffer locations. In IEEE-ACM Design Automation Conference, 1999.
ICCAD2000, Pages 245-252
Decomposing Refinement Proofs Using Assume-Guarantee Reasoning
Thomas A. Henzinger
University of California, Berkeley
Shaz Qadeer
Compaq Systems Research Center
Sriram K. Rajamani
Microsoft Research
Abstract
Model-checking algorithms can be used to verify, formally and automatically, whether a low-level description of a design conforms with a high-level description. However, for
designs with very large state spaces, prior to the application of an algorithm, the
refinement-checking task needs to be decomposed into subtasks of manageable
complexity. It is natural to decompose the task following the component structure of the
design. However, an individual component often does not satisfy its requirements unless
the component is put into the right context, which constrains the inputs to the component.
Thus, in order to verify each component individually, we need to make assumptions
about its inputs, which are provided by the other components of the design. This
reasoning is circular: component A is verified under the assumption that context B
behaves correctly, and symmetrically, B is verified assuming the correctness of A. The
assume-guarantee paradigm provides a systematic theory and methodology for ensuring
the soundness of the circular style of postulating and discharging assumptions in
component-based reasoning.
We give a tutorial introduction to the assume-guarantee paradigm for decomposing
refinement-checking tasks. To illustrate the method, we step in detail through the formal
verification of a processor pipeline against an instruction set architecture. In this example,
the verification of a three-stage pipeline is broken up into three subtasks, one for each
stage of the pipeline.
References
[AL93] M. Abadi and L. Lamport. Composing specifications. ACM Transactions on Programming
Languages and Systems, 15(1):73-132, 1993.
[AL95] M. Abadi and L. Lamport. Conjoining specifications. ACM Transactions on Programming
Languages and Systems, 17(3):507-534, 1995.
[Eir98] A.T. Eiriksson. The formal design of 1M-gate ASICs. In G. Gopalakrishnan and P. Windley,
editors, FMCAD 98: Formal Methods in Computer-aided Design, Lecture Notes in Computer Science
1522, pages 49-63. Springer-Verlag, 1998.
[HLQR99] T.A. Henzinger, X. Liu, S. Qadeer, and S.K. Rajamani. Formal specification and verification of
a dataflow processor array. In Proceedings of the International Conference on Computer-aided Design, pages
494-499. IEEE Computer Society Press, 1999.
[HQR98] T.A. Henzinger, S. Qadeer, and S.K. Rajamani. You assume, we guarantee: methodology and
case studies. In A. Hu and M. Vardi, editors, CAV 98: Computer-aided Verification, Lecture Notes in
Computer Science, pages 440-451. Springer-Verlag, 1998.
[HQR99] T.A. Henzinger, S. Qadeer, and S.K. Rajamani. Assume-guarantee refinement between different
time scales. In N. Halbwachs and D. Peled, editors, CAV 99: Computer-aided Verification, Lecture Notes
in Computer Science 1633, pages 208-221. Springer-Verlag, 1999.
[HQRT98] T.A. Henzinger, S. Qadeer, S.K. Rajamani, and S. Tasiran. An assume-guarantee rule for
checking simulation. In G. Gopalakrishnan and P. Windley, editors, FMCAD 98: Formal Methods in
Computer-aided Design, Lecture Notes in Computer Science 1522, pages 421-432. Springer-Verlag, 1998.
[MC81] J. Misra and K.M. Chandy. Proofs of networks of processes. IEEE Transactions on Software
Engineering, SE-7(4):417-426, 1981.
[McM97] K.L. McMillan. A compositional rule for hardware design refinement. In O. Grumberg, editor,
CAV 97: Computer-aided Verification, Lecture Notes in Computer Science 1254, pages 24-35. Springer-Verlag, 1997.
[McM98] K.L. McMillan. Verification of an implementation of Tomasulo's algorithm by compositional
model checking. In A. Hu and M. Vardi, editors, CAV 98: Computer-aided Verification, Lecture Notes in
Computer Science 1427, pages 110-121. Springer-Verlag, 1998.
[McM99] K.L. McMillan. Circular compositional reasoning about liveness. In L. Pierre and T. Kropf,
editors, CHARME 99: Correct Hardware Design and Verification, Lecture Notes in Computer Science
1703, pages 342-345. Springer-Verlag, 1999.
ICCAD2000, Pages 254-259
Effective Partition-Driven Placement with Simultaneous Level
Processing and Global Net Views
Ke Zhong and Shantanu Dutt
Dept. of EECS, University of Illinois-Chicago
Abstract
In this paper we take a fresh look at the partition-driven placement (PDP) paradigm for
standard-cell placement for wire-length minimization. The goal is to develop several new
algorithms for incorporation into a PDP framework that can rectify the well-known
drawbacks of traditional PDP (increasingly localized view of nets with increasing levels
of the partitioning tree, min-cut objective, inaccuracy and cost of terminal propagation
(TP), irreversibility of move decisions), while preserving its considerable advantages
(time efficiency, flexibility in accurately incorporating many optimization metrics, and
flexibility in satisfying most constraints). We have developed several novel techniques
within a PDP-based framework that yield the best wire-length results so far on all but two
circuits of the MCNC benchmark suite. Our major innovations are: (1) simultaneous level
partitioning (SLP) in which we partition the entire circuit globally in every level of the
partitioning tree, across the current cutline(s); (2) cell gain computation based on a global
or distributed view of entire nets (thus obviating TP) and on the bounding-box (BB)
minimization of nets (as opposed to mincut in prior PDP); (3) move irreversibility tackled
in a post-processing phase via vertical and horizontal swaps. Empirical results indicate
that our PDP algorithm SPADE (for Simultaneous level PArtitioning with Distributed
[i.e., global] nEt views) provides almost 20% better wirelength results than an internal
version of "regular" PDP with min-cut based gains, 10.8% better than the previous best
PDP method QUAD, 10.6% better than TimberWolf (TW) 7.0, 15.8% better than the
state-of-the-art force-directed technique from U. Munich (termed FD-98 here), and
15.3% better than the multilevel placement technique Snap-On. Besides TW7.0, we are
also the only ones to report results on the approximately 100K-cell circuit golem3 (12.2%
better than TW7.0). Our run times are quite reasonable.
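The bounding-box objective named in innovation (2) can be made concrete with a small sketch. The code below measures a net by the half-perimeter of the bounding box of its cells and scores a candidate cell move by the reduction in total bounding-box length over the nets that touch that cell. It is only an illustration of the objective, not SPADE's gain computation; the names `hpwl` and `move_gain` and the coordinate dictionary are assumptions.

```python
def hpwl(net, pos):
    """Half-perimeter of the bounding box enclosing all cells of one net."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def move_gain(cell, new_xy, nets, pos):
    """Reduction in total bounding-box length if `cell` moves to `new_xy`.

    Only nets containing the moved cell can change, so only those are
    re-evaluated. A positive result means the move shortens the wiring.
    """
    touched = [n for n in nets if cell in n]
    before = sum(hpwl(n, pos) for n in touched)
    trial = dict(pos)
    trial[cell] = new_xy
    after = sum(hpwl(n, trial) for n in touched)
    return before - after


# Hypothetical placement with two two-pin nets sharing cell "a".
pos = {"a": (0, 0), "b": (4, 0), "c": (4, 3)}
nets = [("a", "b"), ("a", "c")]
print(move_gain("a", (4, 0), nets, pos))  # 8
```

Unlike a min-cut gain, this score depends on where every cell of the net sits, which is why a global view of each net is needed to evaluate it.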
References
[1] M. Breuer, “A Class of Min-cut Placement Algorithms,” Proc. 14th DAC, pp. 284-290, 1977.
[2] A. Dunlop and B. Kernighan, “A Procedure for Placement of Standard Cell VLSI Circuits,” IEEE
Trans. CAD, Vol. CAD-4, No. 1, pp. 92-98, Jan. 1985.
[3] P. Suaris and G. Kedem, “An Algorithm for Quadrisection and Its Application to Standard Cell
Placement,” IEEE Trans. CAS, Vol. 35, No. 3, pp. 294-303, Mar. 1988.
[4] D. J-H. Huang and A. B. Kahng, “Partitioning-based Standard-cell Global Placement with An Exact
Objective,” Proc. ISPD, pp. 18-25, 1997.
[5] A. E. Caldwell, A. B. Kahng, S. Mantik, I. L. Markov and A. Zelikovsky, “On Wirelength Estimations
for Row-Based Placement,” IEEE Trans. on CAD, pp. 1265-1278, Vol. 18, No. 9, Sept. 1999.
[6] C. M. Fiduccia and R. M. Mattheyses, “A Linear-time Heuristic for Improving Network Partitions,”
Proc. 19th DAC, pp. 175-181, 1982.
[7] S. Dutt and W. Deng, “A Probability-based Approach to VLSI Circuit Partitioning,” Proc. 33rd DAC,
pp. 100-105, 1996.
[8] S. Dutt and W. Deng, “VLSI Circuit Partitioning by Cluster-removal Using Iterative Improvement
Techniques,” Proc. ICCAD, pp. 194-200, 1996.
[9] C. J. Alpert and A. B. Kahng, “Recent Directions in Netlist Partitioning: A Survey,” Integration, The
VLSI Journal, 19(1-2), pp. 1-81, 1995.
[10] G. Karypis, R. Aggarwal, V. Kumar and S. Shekhar, “Multilevel Hypergraph Partitioning: Application
to VLSI Domain,” Proc. 34th DAC, pp. 526-529, 1997.
[11] C. J. Alpert, D. J. Huang and A. B. Kahng, “Multilevel Circuit Partitioning,” Proc. 34th DAC, pp. 530-533, 1997.
[12] C. Sechen, VLSI Placement and Global Routing Using Simulated Annealing, Kluwer, B.V., Deventer,
The Netherlands, 1988.
[13] W-J. Sun and C. Sechen, “Efficient and Effective Placement for Very Large Circuits,” IEEE Trans.
CAD, Mar. 1995.
[14] J.M. Kleinhans, G. Sigl, F.M. Johannes and K.J. Antreich, “GORDIAN: VLSI Placement by Quadratic
Programming and Slicing Optimization,” IEEE Trans. CAD, vol. 10, no. 3, pp. 356-365, 1991.
[15] G. Sigl, K. Doll and F.M. Johannes, “Analytical placement: A Linear or Quadratic Objective
Function?”, Proc. 28th DAC, pp. 427-432, 1991.
[16] H. Eisenmann and F. M. Johannes, “Generic Global Placement and Floorplanning,” Proc. 35th DAC,
pp. 269-274, 1998.
[17] G. Persky, “PRO: an Automatic String Placement Program for Polycell Layout”, Proc. 13th DAC, pp.
417-423, 1976.
[18] X. Yang, M. Wang, K. Eguro and M. Sarrafzadeh, “A Snap-On Placement Tool”, Proc. ISPD-2000,
pp. 153-158.
ICCAD2000, Pages 260-263
DRAGON2000: STANDARD-CELL PLACEMENT TOOL FOR LARGE
INDUSTRY CIRCUITS
Maogang Wang Xiaojian Yang Majid Sarrafzadeh
Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208
ABSTRACT
In this paper, we develop a new standard cell placement tool, Dragon2000, to solve
large-scale placement problems effectively. A top-down hierarchical approach is used in
Dragon2000. State-of-the-art partitioning tools are tightly integrated with wirelength
minimization techniques to achieve superior performance. We argue that net-cut
minimization is a good and important shortcut to solving the large-scale placement
problem. Experimental results show that minimizing net-cut is more important than
greedily obtaining a wirelength-optimal placement at intermediate hierarchical levels. We
run Dragon2000 on the recently released large benchmark suite ISPD98 as well as MCNC
circuits. For circuits with more than 100k cells, Dragon2000 produces slightly better
placement results than iTools1.4.0 (1.4%) while taking much less time (2X speedup). This
is also the first published placement result on the publicly available large industrial
circuits.
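As a reminder of what the net-cut objective counts, a minimal sketch follows: a net is cut when its cells are assigned to more than one partition, and the net-cut is the number of such nets. This is a generic definition rather than Dragon2000's code; the function name `net_cut` and the `side` mapping are assumptions made for illustration.

```python
def net_cut(nets, side):
    """Number of nets whose cells fall in more than one partition.

    nets -- iterable of nets, each a collection of cell names
    side -- mapping from cell name to partition index
    """
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


# Hypothetical bipartition of three cells.
nets = [("a", "b"), ("b", "c"), ("a", "b", "c")]
side = {"a": 0, "b": 0, "c": 1}
print(net_cut(nets, side))  # 2
```

The count ignores how far apart the partitions are, which is what makes it cheap to evaluate at intermediate levels of a top-down hierarchy.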
REFERENCES
[1] C. J. Alpert. “The ISPD98 Circuit Benchmark Suite”. In International Symposium on Physical Design,
pages 18–25. ACM, April 1998.
[2] A. E. Caldwell, A. B. Kahng, and I. L. Markov. “Can Recursive Bisection Alone Produce Routable
Placements?”. In Design Automation Conference. IEEE/ACM, 2000.
[3] A. E. Dunlop and B. W. Kernighan. “A Procedure for Placement of Standard Cell VLSI Circuits”. IEEE
Transactions on Computer Aided Design, 4(1):92–98, January 1985.
[4] H. Eisenmann and F. M. Johannes. “Generic Global Placement and Floorplanning”. In Design
Automation Conference, pages 269–274. IEEE/ACM, 1998.
[5] G. Karypis and V. Kumar. “Multilevel k-way Hypergraph Partitioning”. In Design Automation
Conference, pages 343–348, 1999.
[6] J. M. Kleinhans, G. Sigl, F. M. Johannes, and K. J. Antreich. “GORDIAN: VLSI Placement by
Quadratic Programming and Slicing Optimization”. IEEE Transactions on Computer Aided Design,
10(3):356–365, 1991.
[7] M. Sarrafzadeh and M. Wang. “NRG: Global and Detailed Placement”. In International Conference on
Computer-Aided Design. IEEE, November 1997.
[8] W. J. Sun and C. Sechen. “A Loosely Coupled Parallel Algorithm for Standard Cell Placement ”. In
International Conference on Computer-Aided Design, pages 137–144. IEEE, 1994.
[9] M. Wang, X. Yang, and M. Sarrafzadeh. “Congestion Minimization During Placement”. IEEE
Transactions on Computer Aided Design, 2000. to appear.
[10] X. Yang, M. Wang, K. Eguro, and M. Sarrafzadeh. “A Snap-On Placement Tool”. In International
Symposium on Physical Design, pages 153–158. ACM, April 2000.
ICCAD2000, Pages 264-270
Data Path Placement with Regularity
Terry Tao Ye, Giovanni De Micheli
Department of Electrical Engineering, Stanford University, CA 94305
Abstract
As more data processing functions are integrated into systems-on-chip, the data path is
becoming a critical part of the whole VLSI design. However, traditional physical design
methodology cannot satisfy the data path performance requirement because it has no
knowledge of the data path bit-sliced structure. In this paper, an Abstract Physical Model
(APM) is proposed to extract bit-slice regularity information from the Data Flow Graph
(DFG), and it is used for interconnect and congestion planning. A two-step heuristic
algorithm is introduced to optimize the linear placement of the APM to satisfy both the
wire length and routing track budgets.
References
[1] Askar, S.; Ciesielski, M.; Analytical approach to custom datapath design Computer-Aided Design,
1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on , 1999 , Page(s): 98 -101
[2] Serdar, T.; Sechen, C. ; AKORD: transistor level and mixed transistor/gate level placement tool for
digital data paths Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM
International Conference on , 1999 , Page(s): 91-97
[3] Kim, J.; Kang, S.M. ; A timing-driven data path layout synthesis with integer programming
Computer-Aided Design, 1995. ICCAD-95. Digest of Technical Papers., 1995 IEEE/ACM International
Conference on , 1995 , Page(s): 716 -719
[4] Synopsys Module Compiler User Manual
[5] Nijssen, R. X. T. ; van Eijk, C. A. J. ; Regular layout generation of logically optimized datapaths
Proceedings of the 1997 international symposium on Physical design, 1997, Page(s): 42 -47
[6] Arikati, S.R.; Varadarajan, R. ; A signature based approach to regularity extraction Computer-Aided
Design, 1997. Digest of Technical Papers., 1997 IEEE/ACM International Conference on , 1997 , Page(s):
542 -545
[7] Hassoun, S.; McCreary, C. ; Regularity extraction via clan-based structural circuit decomposition
Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on ,
1999 , Page(s) : 414 -418
[8] Luk, W.K.; Dean, A.A. ; Multistack optimization for datapath chip layout Computer-Aided Design
of Integrated Circuits and Systems, IEEE Transactions on Volume: 10 1 , Jan. 1991 , Page(s): 116 -129
[9] Cai, H.; Note, S.; Six, P.; de Man, H. ; A data path layout assembler for high performance DSP
circuits Design Automation Conference, 1990. Proceedings., 27th ACM/IEEE , 1990, Page(s): 306 -311
[10] Buddi, N.; Chrzanowska-Jeske, M.; Saxe, C.L. ; Layout synthesis for data-path designs Design
Automation Conference, 1995, with EURO-VHDL, Proceedings EURO-DAC '95., European , 1995 ,
Page(s): 86 -90
[11] Yim, J.-S.; Kyung, C.-M. ; Data path layout optimisation using genetic algorithm and simulated
annealing Computers and Digital Techniques, IEE Proceedings- Volume: 145 2, March 1998 , Page(s):
135 -141
[12] Nakao, H.; Kitada, O.; Hayashikoshi, M.; Okazaki, K.; Tsujihashi, Y. ; A high density data path layout
generation method under path delay constraints. Custom Integrated Circuits Conference, 1993.,
Proceedings of the IEEE 1993 , 1993 , Page(s): 9.5.1 -9.5.5
[13] Leveugle, R.; Safinia, C.; Magarshack, P.; Sponga, L. ; Data path implementation: bit-slice structure
versus standard cells Euro ASIC '92, Proceedings. , 1992 , Page(s): 83 -88
[14] Ienne, P.; Griessing, A. ; Practical experiences with standard-cell based data path design tools. Do
we really need regular layouts? Design Automation Conference, 1998. Proceedings, 1998 , Page(s): 396 -401
[15] Chowdhury, S. ; Analytical approaches to the combinatorial optimization in linear placement
problems Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Volume: 8 6 ,
June 1989 , Page(s): 630 -639
[16] Kang, S. ; Linear Ordering and Application to Placement Proc. 20th Design Automation Conf., 1983,
pp. 457-464
[17] Sung-Woo Hur; Lillis, J. ; Relaxation and clustering in a local search framework: application to
linear placement Design Automation Conference, 1999. Proceedings. 36th, 1999 , Page(s): 360 -366
[18] Alpert, C.J.; Chan, T.; Huang, D.J.-H.; Markov, I.; Yan, K. Quadratic Placement Revisited Design
Automation Conference, 1997. Proceedings of the 34th , Page(s): 752 -757
ICCAD2000, Pages 272-276
Efficient Finite-Difference Method for Quasi-Periodic
Steady-State and Small Signal Analyses
Baolin Yang, Dan Feng
Cadence, San Jose, CA
ABSTRACT
This paper discusses a finite-difference mixed frequency-time (MFT) method for the
quasi-periodic steady-state analysis and introduces the quasi-periodic small signal
analysis. A new approach for solving the huge nonlinear system the MFT finite
difference method generates from practical circuits is given, which makes efficient
frequency-sweeping quasi-periodic small-signal analysis possible. The new efficient
solving technique works well with the Krylov-subspace recycling or reuse [4], which
cannot be achieved with existing techniques. In addition, this paper gives a way to calculate
the quasi-periodic Fourier integration weights, necessary in the adjoint MFT small-signal
analyses, and a way to calculate quasi-periodic large-signal Fourier spectrum that is more
efficient than existing methods. Numerical examples also show that the finite-difference
MFT method can be significantly more accurate than the shooting-Newton MFT method and
that the new preconditioning technique is more efficient.
REFERENCES
[1] D. Feng, J. Phillips, K. Nabors, K. Kundert and J. White, Efficient computation of quasi-periodic circuit
operating conditions using a mixed frequency/time approach, Proc. 36th Design
Automation Conference, New Orleans, LA, June, 1999.
[2] B. Yang and J. Phillips, A multi-interval Chebyshev collocation method for efficient high-accuracy RF
circuit simulation, to appear in Proc. 37th Design Automation Conference, 2000.
[3] J. Roychowdhury and D. Long and P. Feldmann, Cyclostationary noise analysis of large RF circuits
with multitone excitations, IEEE J. Sol. St. Circuits, vol. 33, pp. 324-336, 1998.
[4] D. Feng, unpublished.
[5] R. Melville and P. Feldmann and J. Roychowdhury, Efficient multi-tone distortion analysis of analog
integrated circuits, Proc. Custom Integrated Circuits Conference, May, 1995.
[6] R. Telichevesky and J. White and K. Kundert, Receiver characterization using periodic small-signal
analysis, Proceedings of the Custom Integrated Circuits Conference, May, 1996.
[7] J. Roychowdhury, Efficient methods for simulating highly nonlinear multirate circuits, Proc. 34th
Design Automation Conference, Anaheim, CA, June, 1997.
ICCAD2000, Pages 277-282
Noise Analysis of Phase-Locked Loops
Amit Mehrotra
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Abstract
This work addresses the problem of noise analysis of phase locked loops (PLLs). The
problem is formulated as a stochastic differential equation and is solved in the presence of
circuit white noise sources yielding the spectrum of the PLL output. Specifically, the
effect of loop filter characteristics, phase-frequency detector and phase noise of the open
loop voltage controlled oscillator (VCO) on the PLL output spectrum is quantified. These
results are derived using a full nonlinear analysis of the VCO in the feedback loop and
cannot be predicted using traditional linear analyses or the phase noise analysis of open
loop oscillators. The computed spectrum matches well with measured results;
specifically, the shape of the output spectrum matches very well with measured PLL
output spectra reported in the literature for different kinds of loop filters and phase
detectors. The PLL output spectrum computation only requires the phase noise of the
VCO, loop filter and phase detector noise, phase detector gain and loop filter transfer
function and does not require the transient simulation of the entire PLL which can be
very expensive. The noise analysis technique is illustrated with some examples.
References
[1] A. Mehrotra, Simulation and Modelling Techniques for Noise in Radio Frequency Integrated Circuits.
PhD thesis, University of California, Berkeley, 1999.
[2] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: a unifying theory and
numerical methods for characterisation,” in Proceedings 1998 Design Automation Conference, pp. 26–31,
1998.
[3] V. F. Kroupa and L. Šojdr, “Phase-lock loops of higher orders,” in Second International Conference on
Frequency Control and Synthesis, pp. 65–68, 1989.
[4] B. Kim, T. C. Weigandt, and P. R. Gray, “PLL/DLL system noise analysis for low jitter clock
synthesizer design,” vol. 4, pp. 31–34, 1994.
[5] U. L. Rohde, Microwave and wireless synthesizers : theory and design. Wiley, 1997.
[6] J. C. Nallatamby, M. Prigent, J. C. Sarkissian, R. Quere, and J. Obregon, “A new approach to nonlinear
analysis of noise behavior of synchronized oscillators and analog-frequency dividers,” IEEE Transactions
on Microwave Theory and Techniques, vol. 46, pp. 1168–1171, Aug. 1998.
[7] L. Lin, L. Tee, and P. R. Gray, “A 1.4GHz differential low-noise CMOS frequency synthesizer using a
wideband PLL architecture,” in Digest of Technical Papers, IEEE International Solid-State Circuits
Conference, pp. 204–205, 2000.
[8] K. Lim, S. Choi, and B. Kim, “Optimal loop bandwidth design for low noise PLL applications,” in
Proceedings of the ASP-DAC ’97. Asia and South Pacific Design Automation Conference 1997, pp. 425–
428, 1997.
[9] K. Lim, C.-H. Park, and B. Kim, “Low noise clock synthesizer design using optimal bandwidth,” in
Proceedings of the 1998 IEEE International Symposium on Circuits and Systems, vol. 1, pp. 163–166,
1998.
[10] G. Kolumbán, “Frequency domain analysis of sampling phase-locked loops,” in Proceedings 1988
IEEE International Symposium on Circuits and Systems, vol. 1, pp. 611–614, 1988.
[11] D. Asta and D. N. Green, “Analysis of a hybrid analog/switched-capacitor phase-locked loop,” IEEE
Transactions on Circuits and Systems, vol. 37, pp. 183–197, Feb. 1990.
[12] A. Demir, Analysis and simulation of noise in nonlinear electronic circuits and systems. PhD thesis,
UC Berkeley, 1997.
[13] W. E. Thain, Jr. and J. A. Connelly, “Simulating phase noise in phase-locked loops with a circuit
simulator,” vol. 3, pp. 1760–1763, 1995.
[14] L. Wu, H. Jin, and W. C. Black Jr., “Nonlinear behavioral modeling and simulation of phase-locked
and delay-locked systems,” in Proceedings of the IEEE 2000 Custom Integrated Circuits Conference, pp.
447–450, 2000.
[15] P. Heydari and M. Pedram, “Analysis of jitter due to power-supply noise in phase-locked loops,” in
Proceedings of the IEEE 2000 Custom Integrated Circuits Conference, pp. 443–446, 2000.
[16] B. K. Øksendal, Stochastic differential equations: an introduction with applications. Springer–Verlag,
1998.
[17] C. W. Gardiner, Handbook of stochastic methods for physics, chemistry, and the natural sciences, vol.
13 of Springer series in synergetics. Berlin, Heidelberg, New York, Tokyo: Springer–Verlag, second ed.,
1983.
[18] P. Dupuis and H. J. Kushner, “Stochastic systems with small noise, analysis and simulation; a phase
locked loop example,” SIAM Journal on Applied Mathematics, vol. 47, pp. 643–661, June 1987.
[19] A. Dembo and O. Zeitouni, Large deviations techniques and applications. Boston : Jones and Bartlett,
1993.
[20] M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems. Springer–Verlag,
1984.
[21] B. Razavi, Design of Analog CMOS Integrated Circuits. McGraw Hill, 2000.
[22] J. F. Parker and D. Ray, “A 1.6-GHz CMOS PLL with on-chip loop filter,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[23] A. Ali and J. L. Tham, “A 900MHz frequency synthesizer with integrated LC voltage-controlled
oscillator,” in Digest of Technical Papers, IEEE International Solid-State Circuits Conference, pp. 390–
391, 1996.
[24] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” in Digest
of Technical Papers, IEEE International Solid-State Circuits Conference, pp. 372–373, 1998.
[25] A. Demir, E. Liu, and A. Sangiovanni-Vincentelli, “Time-domain non Monte-Carlo noise simulation
for nonlinear dynamic circuits with arbitrary excitations,” IEEE Transactions for Computer-Aided Design,
vol. 15, pp. 493–505, May 1996.
ICCAD2000, Pages 283-288
Computing Phase Noise Eigenfunctions Directly from Steady-State
Jacobian Matrices
Alper Demir, David Long, Jaijeet Roychowdhury
Bell Laboratories, Murray Hill, New Jersey, USA
Abstract
The main effort in oscillator phase noise calculation lies in computing a vector function
called the Perturbation Projection Vector (PPV). Current techniques for PPV calculation
use time domain numerics to generate the system's monodromy matrix, followed by full
or partial eigenanalysis. We present a superior method that finds the PPV using only a
single linear solution of the oscillator's time- or frequency-domain steady-state Jacobian
matrix. The new method is better suited for existing tools with fast harmonic balance or
shooting capabilities, and also more accurate than explicit eigenanalysis. A key
advantage is that it dispenses with the need to select the correct one-eigenfunction from
amongst a potentially large set of choices, an issue that explicit eigencalculation based
methods have to face.
References
[1] A. Demir. Floquet Theory and Nonlinear Perturbation Analysis for Oscillators with Differential-Algebraic Equations. In Proc. ICCAD, 1998.
[2] A. Demir, A. Mehrotra, and J. Roychowdhury. Phase noise and timing jitter in oscillators. In Proc.
IEEE CICC, May 1998.
[3] A. Demir, A. Mehrotra, and J. Roychowdhury. Phase Noise in Oscillators: A Unifying Theory and
Numerical Methods for Characterization. In Proc. IEEE DAC, pages 26–31, June 1998.
[4] M. Farkas. Periodic Motions. Springer-Verlag, 1994.
[5] R.C. Melville, P. Feldmann, and J. Roychowdhury. Efficient multi-tone distortion analysis of analog
integrated circuits. In Proc. IEEE CICC, pages 241–244, May 1995.
[6] O. Narayan and J. Roychowdhury. Multi-time simulation of voltage-controlled oscillators. In Proc.
IEEE DAC, New Orleans, LA, June 1999.
[7] A. Nayfeh and B. Balachandran. Applied Nonlinear Dynamics. Wiley, 1995.
[8] W.P. Robins. Phase Noise in Signal Sources. Peter Peregrinus, 1991.
[9] J. Roychowdhury, D. Long, and P. Feldmann. Cyclostationary noise analysis of large RF circuits with
multitone excitations. IEEE J. Solid-State Ckts., 33(2):324–336, Mar 1998.
ICCAD2000, Pages 290-295
Modeling and Analysis of Communication Circuit Performance
using Markov Chains and Efficient Graph Representations
Alper Demir, Peter Feldmann
Bell Laboratories, Murray Hill, New Jersey, USA
Abstract
In high-speed data networks, the bit-error-rate specification on the system can be very
stringent, i.e., 10^-14. At such error rates, it is not feasible to evaluate the performance of a
design using straightforward, simulation-based approaches. Nevertheless, performance
prediction before actual hardware is built is essential for the design process.
This work introduces a stochastic model and an analysis-based, non-Monte-Carlo method
for performance evaluation of digital data communication circuits. The analyzed circuit is
modeled by a number of interacting finite state machines with inputs described as
functions on a Markov chain state-space. The composition of these elements results in a
typically very large Markov chain. System performance measures, such as probability of
bit errors and rate of synchronization loss, can be evaluated by solving linear problems
involving the large Markov chain's transition probability matrix. This paper first
describes a dedicated multi-grid method used to solve these very large linear problems.
The principal bottleneck in such an approach is the size of the Markov chain state-space,
which grows exponentially with system complexity. The second part of this paper
introduces a novel, graph based, data structure capable of efficiently storing and
manipulating transition probability matrices for several million state Markov chains. The
methods are illustrated on a real industrial clock-recovery circuit design.
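As a rough illustration of the kind of linear problem described above, the following sketch finds the stationary distribution of a tiny, made-up three-state Markov chain by power iteration and reads off the probability of a state labeled as a bit error. The transition matrix, the state labels, and the function name are assumptions chosen for illustration only; the paper's multi-grid solver and graph-based matrix representation are not reproduced here.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Power iteration for pi = pi * P, where P is a row-stochastic matrix."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical chain (not from the paper):
# state 0 = locked, state 1 = marginal, state 2 = bit error
P = [
    [0.90, 0.10, 0.00],
    [0.50, 0.40, 0.10],
    [1.00, 0.00, 0.00],
]

pi = stationary_distribution(P)
error_probability = pi[2]  # long-run fraction of time spent in the error state
print(error_probability)
```

Dense power iteration like this is only workable for toy chains; for the multi-million-state chains mentioned in the abstract, the cost of storing and multiplying the matrix is exactly what motivates a specialized solver and data structure.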
References
[1] J. Sonntag and R. Leonowich. A monolithic CMOS 10 MHz DPLL for burst-mode data retiming. In
IEEE International Solid-State Circuits Conference, 1990.
[2] P. Larsson. A 2-1600 MHz 1.2-2.5V CMOS clock-recovery PLL with feedback phase-selection and
averaging phase-interpolation for jitter reduction. In IEEE International Solid-State Circuits Conference,
1999.
[3] J.G. Kemeny and J.L. Snell. Finite Markov Chains. Springer-Verlag, 1976.
[4] W.J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press,
1994.
[5] G. Horton and S. Leutenegger. A multi-level solution algorithm for steady-state Markov chains. ACM
Performance Evaluation Review, 22:191–200, 1996.
[6] A. Demir and P. Feldmann. Stochastic modeling and performance evaluation for digital clock and data
recovery circuits. In DATE 2000, 2000.
[7] B. Plateau. On the stochastic structure of parallelism and synchronization models for distributed
algorithms. Performance Evaluation Review, August 1985.
[8] P. Buchholz. An adaptive aggregation/disaggregation algorithm for hierarchical Markovian models.
European Journal of Operational Research, 116(3):85–104, 1999.
[9] M. Bozga, O. Maler. On the Representation of Probabilities over Structured Domains. In Proc. CAV’99.
Springer, 1999.
ICCAD2000, Pages 296-302
Pipeline Optimization for Asynchronous Circuits:
Complexity Analysis and an Efficient Optimal Algorithm
Sangyun Kim, Peter A. Beerel
Department of Electrical Engineering – Systems, University of Southern California
Los Angeles, CA 90089-2562
Abstract
This paper addresses the problem of identifying the minimal pipelining needed in an
asynchronous circuit (e.g., number/size of pipeline stages/latches required) to satisfy a
given performance constraint, thereby implicitly minimizing area and power for a given
performance. In contrast to the somewhat analogous problem of retiming in the
synchronous domain, we first show that the basic pipeline optimization problem for
asynchronous circuits is NP-complete. This paper then presents an efficient branch and
bound algorithm that can find the optimal pipeline configuration for moderately-sized
problems. Our experimental results on a few scalable system models demonstrate that our
novel branch and bound solver can find the optimal pipeline configuration for models
that have up to 2^35 possible pipeline configurations.
References
[1] V. Akella and G. Gopalakrishnan. SHILPA: A high-level synthesis system for self-timed circuits. In
Proc. International Conf. Computer-Aided Design (ICCAD), pages 587–591. IEEE Computer Society
Press, Nov. 1992.
[2] B. Bachman, H. Zheng, and C. J. Myers. Architectural synthesis of timed asynchronous systems. In
Proc. International Conf. Computer Design (ICCD). IEEE Computer Society Press, Oct. 1999.
[3] R. M. Badia and J. Cortadella. High-level synthesis of asynchronous systems: Scheduling and process
synchronization. In Proc. European Conference on Design Automation (EDAC), pages 70–74. IEEE
Computer Society Press, Feb. 1993.
[4] M. Benes, S. M. Nowick, and A. Wolfe. A fast asynchronous Huffman decoder for compressed-code
embedded processors. In Proc. International Symposium on Advanced Research in Asynchronous Circuits
and Systems, pages 43–56, 1998.
[5] S. M. Burns. Performance Analysis and Optimization of Asynchronous Circuits. PhD thesis, California
Institute of Technology, 1991.
[6] W.-C. Chou, P. A. Beerel, and K. Y. Yun. Average-case technology mapping of asynchronous burst-mode circuits. IEEE Transactions on Computer-Aided Design, 18(10):1418–1434, Oct. 1999.
[7] U. Cummings, A. Lines, and A. Martin. An asynchronous pipelined lattice structure filter. In Proc.
International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 126–133,
Nov. 1994.
[8] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[9] G. D. Hachtel and F. Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic
Publishers, 1996.
[10] H. Hulgaard. Timing Analysis and Verification of Timed Asynchronous Circuits. PhD thesis,
Department of Computer Science, University of Washington, 1995.
[11] C. E. Leiserson and J. B. Saxe. Optimizing synchronous systems. Journal of VLSI Computer System,
1:41–67, 1983.
[12] A. J. Martin, A. Lines, R. Manohar, M. Nystroem, P. Penzes, R. Southworth, and U. Cummings. The
design of an asynchronous MIPS R3000 microprocessor. In Advanced Research in VLSI, pages 164–181,
Sept. 1997.
[13] C. V. Ramamoorthy and G. S. Ho. Performance evaluation of asynchronous concurrent systems using
Petri nets. IEEE Transactions on Software Engineering, 6(5):440–449, September 1980.
[14] N. Shenoy. Retiming: Theory and practice. Integration, the VLSI journal, 22:1–21, 1997.
[15] F. Somenzi. Private Communications, 1999. F. Somenzi is a professor of computer science at the
University of Colorado.
[16] J. Sparsø and J. Staunstrup. Delay-insensitive multi-ring structures. Integration, the VLSI journal,
15(3):313–340, Oct. 1993.
[17] T. E. Williams. Self-Timed Rings and their Application to Division. PhD thesis, Stanford University,
June 1991.
[18] A. Xie, S. Kim, and P. A. Beerel. Bounding average time separations of events in stochastic timed
Petri nets with choice. In Proc. International Symposium on Advanced Research in Asynchronous Circuits
and Systems, pages 94–107, Apr. 1999.
ICCAD2000, Pages 303-310
Achieving Fast and Exact Hazard-Free Logic Minimization of
Extended Burst-Mode gC Finite State Machines
Hans Jacobson
Department of Computer Science, University of Utah
Chris Myers
Department of Electrical Engineering, University of Utah
Ganesh Gopalakrishnan
Department of Computer Science, University of Utah
Abstract
This paper presents a new approach to two-level hazard-free logic minimization in the
context of extended burst-mode finite state machine synthesis targeting generalized C-elements (gC). No currently available minimizers for literal-exact two-level hazard-free
logic minimization of extended burst-mode gC controllers can handle large circuits
without synthesis times ranging up over thousands of seconds. Even existing heuristic
approaches take too much time when iterative exploration over a large design space is
required and do not yield minimum results. The logic minimization approach presented in
this paper is based on state graph exploration in conjunction with single-cube cover
algorithms, an approach that has not been considered for minimization of extended burst-mode finite state machines previously. Our algorithm achieves very fast logic
minimization by introducing compacted state graphs and cover tables and an efficient
single-cube cover algorithm for single-output minimization. Our exact logic minimizer
finds minimum-literal solutions to all currently available benchmarks in less
than one second on a 333 MHz microprocessor - more than three orders of magnitude
faster than existing literal exact methods, and over an order of magnitude faster than
existing heuristic methods for the largest benchmarks. This includes a benchmark that could
not previously be solved exactly in number of literals.
References
[1] Bill Coates, Al Davis, and Ken Stevens, “The Post Office experience: Designing a large asynchronous
chip,” Integration, the VLSI journal, vol. 15, no. 3, pp. 341–366, Oct. 1993.
[2] Steven M. Nowick, Kenneth Y. Yun, and David L. Dill, “Practical asynchronous controller design,” in
Proc. International Conf. Computer Design (ICCD). Oct. 1992, pp. 341–345, IEEE Computer Society
Press.
[3] Alan Marshall, Bill Coates, and Polly Siegel, “Designing an asynchronous communications chip,” IEEE
Design & Test of Computers, vol. 11, no. 2, pp. 8–21, 1994.
[4] Kenneth Y. Yun, Peter A. Beerel, Vida Vakilotojar, Ayoob E. Dooply, and Julio Arceo, “The design
and verification of a high-performance low-control-overhead asynchronous differential equation solver,”
IEEE Transactions on VLSI Systems, vol. 6, no. 4, pp. 643–655, Dec. 1998.
[5] Hans M. Jacobson and Ganesh Gopalakrishnan, “Application-specific programmable control for high-performance asynchronous circuits,” Proceedings of the IEEE, vol. 87, no. 2, pp. 319–331, Feb. 1999.
[6] Prabhakar Kudva, Ganesh Gopalakrishnan, and Hans Jacobson, “A technique for synthesizing
distributed burst-mode circuits,” in Proc. ACM/IEEE Design Automation Conference, 1996.
[7] Robert M. Fuhrer, Sequential Optimization of Asynchronous and Synchronous Finite-State Machines,
Ph.D. thesis, Department of Computer Science, Columbia University, 1999.
[8] Michael Theobald and Steven M. Nowick, “Fast heuristic and exact algorithms for two-level hazard-free logic minimization,” IEEE Transactions on Computer-Aided Design, vol. 17, no. 11, pp. 1130–1147,
Nov. 1998.
[9] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev, “Petrify: a tool for
manipulating concurrent specifications and synthesis of asynchronous controllers,” IEICE Transactions on
Information and Systems, vol. E80-D, no. 3, pp. 315–325, Mar. 1997.
[10] Chantal Ykman-Couvreur, Bill Lin, and Hugo de Man, “Assassin: A synthesis system for
asynchronous control circuits,” Tech. Rep., IMEC, Sept. 1994, User and Tutorial manual.
[11] Chris J. Myers, Computer-Aided Synthesis and Verification of Gate-Level Timed Circuits, Ph.D. thesis,
Dept. of Elec. Eng., Stanford University, Oct. 1995.
[12] Kenneth Y. Yun and David L. Dill, “Automatic synthesis of extended burst-mode circuits: Part I
(specification and hazard-free implementation),” IEEE Transactions on Computer-Aided Design, vol. 18,
no. 2, pp. 101–117, Feb. 1999.
[13] Kuan-Jen Lin, Chi-Wen Kuo, and Chen-Shang Lin, “Synthesis of hazard-free asynchronous circuits
based on characteristic graph,” IEEE Transactions on Computers, vol. 46, no. 11, pp. 1246–1263, Nov.
1997.
[14] Enric Pastor, Jordi Cortadella, Alex Kondratyev, and Oriol Roig, “Structural methods for the synthesis
of speed-independent circuits,” IEEE Transactions on Computer-Aided Design, vol. 17, no. 11, pp. 1108–
1129, Nov. 1998.
[15] Sung Tae Jung and Chris J. Myers, “Direct synthesis of timed asynchronous circuits,” in Proc.
International Conf. Computer-Aided Design (ICCAD), Nov. 1999, pp. 332–337.
[16] Robert M. Fuhrer and Steven M. Nowick, “OPTIMISTA: State minimization of asynchronous FSMs
for optimum output logic,” in Proc. International Conf. Computer-Aided Design (ICCAD), 1999.
[17] Shai Rotem, Ken Stevens, Ran Ginosar, Peter Beerel, Chris Myers, Kenneth Yun, Rakefet Kol,
Charles Dike, Marly Roncken, and Boris Agapiev, “RAPPID: An asynchronous instruction length
decoder,” in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems,
Apr. 1999, pp. 60–70.
[18] Kenneth Y. Yun, “Automatic synthesis of extended burst-mode circuits using generalized C-elements,” in Proc. European Design Automation Conference (EURO-DAC), Sept. 1996, pp. 290–295.
[19] Kenneth Yi Yun, Synthesis of Asynchronous Controllers for Heterogeneous Systems, Ph.D. thesis,
Stanford University, Aug. 1994.
[20] K. W. James and K. Y. Yun, “Average-case optimized transistor-level technology mapping of
extended burst-mode circuits,” in Proc. International Symposium on Advanced Research in Asynchronous
Circuits and Systems, 1998, pp. 70–79.
[21] P. Beerel and T.H.-Y. Meng, “Automatic gate-level synthesis of speed-independent circuits,” in Proc.
International Conf. Computer-Aided Design (ICCAD). Nov. 1992, pp. 581–587, IEEE Computer Society
Press.
[22] P. A. Beerel, C. J. Myers, and T. H.-Y. Meng, “Covering conditions and algorithms for the synthesis
of speed-independent circuits,” IEEE Transactions on Computer-Aided Design, Mar. 1998.
[23] Alex Kondratyev, Michael Kishinevsky, Bill Lin, Peter Vanbekbergen, and Alex Yakovlev, “Basic
gate implementation of speed-independent circuits,” in Proc. ACM/IEEE Design Automation Conference,
June 1994, pp. 56–62.
[24] Alex Kondratyev, Michael Kishinevsky, and Alex Yakovlev, “Hazard-free implementation of speed-independent circuits,” IEEE Transactions on Computer-Aided Design, vol. 17, no. 9, pp. 749–771, Sept.
1998.
[25] Kenneth Y. Yun and David L. Dill, “Automatic synthesis of extended burst-mode circuits: Part II
(automatic synthesis),” IEEE Transactions on Computer-Aided Design, vol. 18, no. 2, pp. 118–132, Feb.
1999.
[26] Steven M. Nowick, Mark E. Dean, David L. Dill, and Mark Horowitz, “The design of a high-performance cache controller: a case study in asynchronous synthesis,” Integration, the VLSI journal, vol.
15, no. 3, pp. 241–262, Oct. 1993.
[27] Joep Kessels, Kees van Berkel, Ronan Burgess, Marly Roncken, and Frits Schalij, “An error decoder
for the compact disc player as an example of VLSI programming,” Tech. Rep., Philips Research
Laboratories, Eindhoven, The Netherlands, 1992.
[28] Kenneth Y. Yun and David L. Dill, “Automatic synthesis of 3D asynchronous state machines,” in
Proc. International Conf. Computer-Aided Design (ICCAD). Nov. 1992, pp. 576–580, IEEE Computer
Society Press.
[29] P R Panda and N Dutt, “1995 high level synthesis design repository,” Tech. Rep. 95-04, University of
California, Irvine, U.S.A., 1995.
[30] Prabhakar Kudva, Synthesis of Asynchronous Systems Targeting Finite State Machines, Ph.D. thesis,
Computer Science Department, University of Utah, 1995.
[31] Prabhakar Kudva, Ganesh Gopalakrishnan, Hans Jacobson, and Steven M. Nowick, “Synthesis of
hazard-free customized CMOS complex-gate networks under multiple-input changes,” in Proc. ACM/IEEE
Design Automation Conference, 1996.
ICCAD2000, Pages 312-317
Bus Optimization for Low-Power Data Path Synthesis Based on Network Flow
Method
Sungpack Hong and Taewhan Kim
Department of Electrical Engineering & Computer Science
and Advanced Information Technology Research Center (AITrc)
KAIST, Taejon, 305-701 KOREA
Abstract
Sub-micron feature sizes have caused a considerable portion of power to be dissipated
on the buses, drawing increased attention to power savings at the behavioral level
and RT level of design. This paper addresses the problem of minimizing power dissipated
in switching of the buses in data path synthesis. Unlike the previous approaches in which
minimization of the power consumed in buses has not been considered until operation
scheduling is completed, our approach integrates the bus binding problem into scheduling
to exploit the impact of scheduling on reduction of power dissipated on the buses more
fully and effectively. We accomplish this by formulating the problem into a flow problem
in a network, and devising an efficient algorithm which iteratively finds maximum flow
of minimum cost solutions in the network. Experimental results on a number of
benchmark problems show that, given resource and global timing constraints, our designs
are 22% more power-efficient than the designs produced by a random-move based solution,
and 18% more power-efficient than the designs produced by a clock-step based optimal solution.
References
[1] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Kluwer Academic
Publishers, 1995.
[2] M. R. Stan and W. P. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI, Vol.3,
No.1, 1995.
[3] H. B. Bakoglu, “Circuits, Interconnection and Packaging for VLSI,” Addison-Wesley, 1990.
[4] C. Svensson and D. Liu, “A Power Estimation Tool and Prospects of Power Savings in CMOS VLSI
Chips,” ISLPED, 1994.
[5] J.-M. Chang and M. Pedram, “Register Allocation and Binding for Low Power,” DAC, 1995.
[6] J.-M. Chang and M. Pedram, “Module Assignment for Low Power,” EDAC, 1996.
[7] A. Raghunathan and N. K. Jha, “SCALP: An Iterative Improvement Based Low-Power Data Path
Synthesis System,” IEEE Trans. CAD, Vol.16, No.11, 1997.
[8] P. R. Panda and N. D. Dutt, “Low-Power Memory Mapping Through Reducing Address Bus Activity,”
IEEE Trans. VLSI, Vol.7, No.3, 1999.
[9] S. Ramprasad, N. R. Shanbhag, and I. Hajj, “A Coding Framework for Low-Power Address and Data
Busses,” IEEE Trans. VLSI, Vol.7, No.2, 1999.
[10] A. Dasgupta and R. Karri, “Simultaneous Scheduling and Binding for Power Minimization During
Microarchitecture Synthesis,” ISLPED, 1995.
[11] A. Dasgupta and R. Karri, “High-Reliability, Low-Energy Microarchitecture Synthesis,” IEEE Trans.
CAD, Vol.17, No.12, 1998.
[12] F. N. Najm, “Transition Density, A Stochastic Measure of Activity in Digital Circuits,” DAC, 1991.
[13] R. E. Tarjan, Data Structures and Network Algorithms, Society for Industrial and Applied
Mathematics, 1983.
ICCAD2000, Pages 318-321
Coupling-Driven Signal Encoding Scheme for Low-Power Interface Design
Ki-Wook Kim, Kwang-Hyun Baek, Naresh Shanbhag, C. L. Liu† and Sung-Mo Kang
Coordinated Science Laboratory, Univ. of Illinois at Urbana-Champaign, USA
† Dept. of Computer Science, National Tsing Hua University, Taiwan
Abstract
Coupling effects between on-chip interconnects must be addressed in ultra deep
submicron VLSI and system-on-a-chip (SoC) designs. A new low-power bus encoding
scheme is proposed to minimize coupled switchings which dominate the on-chip bus
power consumption. The coupling-driven bus invert method uses a slim encoder and
decoder architecture to minimize the hardware overhead. Experimental results indicate
that our encoding methods save effective switchings as much as 30% in an 8-bit bus with
one-cycle redundancy.
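The abstract does not spell out the encoder, so the sketch below is only an illustration of the general idea under stated assumptions: count coupled switching events between adjacent wires, and transmit the inverted word together with an invert flag whenever that lowers the count. The cost function `coupled_switchings`, the decision rule in `encode`, and the choice to ignore the invert line's own coupling are hypothetical simplifications, not the authors' scheme.

```python
def to_bits(word, width):
    """Return the bits of word, least significant first."""
    return [(word >> i) & 1 for i in range(width)]


def coupled_switchings(prev, curr, width):
    """Assumed cost: sum over adjacent wire pairs of |d_i - d_(i+1)|,
    where d_i in {-1, 0, +1} is the transition on wire i.
    Same-direction switching costs 0, a single switch costs 1,
    opposite-direction switching costs 2."""
    p, c = to_bits(prev, width), to_bits(curr, width)
    d = [ci - pi for pi, ci in zip(p, c)]
    return sum(abs(d[i] - d[i + 1]) for i in range(width - 1))


def encode(prev_bus, data, width=8):
    """Pick the plain or inverted word, whichever has fewer coupled switchings."""
    mask = (1 << width) - 1
    plain = data & mask
    inverted = ~data & mask
    if coupled_switchings(prev_bus, inverted, width) < coupled_switchings(prev_bus, plain, width):
        return inverted, 1
    return plain, 0


def decode(bus, invert, width=8):
    """Undo the optional inversion using the invert flag."""
    mask = (1 << width) - 1
    return (~bus & mask) if invert else (bus & mask)


# Worst case for coupling: every wire toggles against its neighbors.
bus, flag = encode(0b01010101, 0b10101010)
print(bin(bus), flag)
```

In this example the plain word would make every adjacent pair switch in opposite directions, while the inverted word leaves the bus unchanged, so the encoder sends the inverted word with the flag set.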
References
[1] L. Benini, A. Macii, E. Macii, M. Poncino, and R. Scarsi. Synthesis of low-overhead interfaces for
power-efficient communication over wide buses. In Proc. ACM/IEEE Design Automation Conf., pages
128–133, 1999.
[2] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. D. Man. Global communication and
memory optimizing transformations for low power signal processing systems. In VLSI Signal Processing
VII, pages 178–187, 1994.
[3] J. Cong. An Interconnect-centric design flow for nanometer technologies. In Int. Symp. VLSI
Technology, Systems, and Applications, pages 54–57, June 1999.
[4] W. Fornaciari, D. Sciuto, and C. Silvano. Power estimation for architectural exploration of HW/SW
communication on system-level buses. In Int. Workshop on Hardware/Software Codesign, pages 152–156,
1999.
[5] R. Hegde and N. R. Shanbhag. Energy-efficiency in presence of deep submicron noise. In Proc.
IEEE/ACM Int. Conf. Computer Aided Design, pages 228–234, 1998.
[6] S. M. Kang and Y. Leblebici. CMOS Digital Integrated Circuits: Analysis and Design. McGraw-Hill,
2nd edition, 1998.
[7] L. Benini, G. De Micheli, E. Macii, D. Sciuto and C. Silvano. Asymptotic zero-transition activity
encoding for address busses in low-power microprocessor-based systems. In Proc. the Great Lakes Symp.
VLSI, pages 77–82, 1997.
[8] S. Landman and R. L. Russo. On a pin versus block relationship for partitions of logic paths. IEEE
Trans. Computers, C-20:1469–1479, 1971.
[9] H. Mehta, R. M. Owens, and M. J. Irwin. Some issues in Gray code addressing. In Proc. the Great
Lakes Symp. VLSI, pages 178–180, Mar. 1996.
[10] E. Musoll, T. Lang, and J. Cortadella. Working-zone encoding for reducing the energy in
microprocessor address buses. IEEE Trans. on VLSI Systems, 6(4):568–572, Dec. 1998.
[11] P. R. Panda and N. D. Dutt. Reducing address bus transitions for low power memory mapping. In
Proc. European Design and Test Conf., pages 63–37, Mar. 1996.
[12] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj. Information-theoretic bounds on average signal
transition activity. IEEE Trans. on VLSI, Sept. 1999.
[13] Semiconductor Industry Association. International technology roadmap for semiconductors.
http://notes.sematech.org/1999 SIA Roadmap/Home.htm, 1999.
[14] M. R. Stan and W. P. Burleson. Bus-invert coding for low-power I/O. IEEE Trans. on VLSI Systems,
pages 49–58, Mar. 1995.
[15] M. R. Stan and W. P. Burleson. Two-dimensional codes for low-power. In International Symposium on
Low-Power Electronics and Design, pages 335–340, Aug. 1996.
[16] C. L. Su, C. Y. Tsui, and A. M. Despain. Saving power in the control path of embedded processors.
IEEE Design and Test of Computers, 11(4):24–30, 1994.
[17] Y. Zhang, W. Ye, and M. J. Irwin. An alternative architecture for on-chip global interconnect:
segmented bus power modeling. In Asilomar Conf. on Signals, Systems, and Computers, pages 1062–1065,
1998.
ICCAD2000, Pages 322-327
Bus Energy Minimization by Transition Pattern Coding (TPC) in
Deep Sub-Micron Technologies
Paul P. Sotiriadis, Anantha Chandrakasan
Department of EECS, Massachusetts Institute of Technology
Cambridge, MA 02139
Abstract
The energy dissipation associated with driving long wires accounts for a significant
fraction of the overall system energy. This is particularly the case with the increasing
importance of the inter-wire parasitic capacitance in deep sub-micron technology. A
closed form solution for estimating the energy dissipation of a data bus is presented that
uses an elaborate parasitic wire model. This includes the distributed RLC effects of wires
as well as the coupling between wires. We also propose a general class of coding
techniques to reduce energy dissipation for data transmission by trading-off between
computation and communication costs. An algorithm is presented to design efficient
coding strategies to minimize energy. When the effects of interwire capacitance are taken
into account, the best coding strategy is not to simply minimize transitions - an approach
followed by previous research. Instead, Transition Pattern Coding (TPC) modifies the
transition profile to minimize energy, and in many cases higher transition activity can
result in lower energy. Results show up to a factor of 2 reduction in energy.
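To see why minimizing transitions alone can be suboptimal once inter-wire capacitance matters, consider a deliberately simplified energy model (an assumption for illustration, not the paper's closed-form solution): each wire has unit capacitance to ground, each adjacent pair shares a coupling capacitance `lam`, and the energy of a bus transition is taken as proportional to the sum of d_i^2 plus `lam` times the sum of (d_i - d_(i+1))^2, where d_i is the voltage change on wire i. The function names and the value of `lam` are hypothetical.

```python
def transition_energy(old, new, lam):
    """Simplified, normalized energy of one bus transition.
    old, new: lists of 0/1 wire values; lam: coupling-to-ground capacitance ratio."""
    d = [n - o for o, n in zip(old, new)]
    self_term = sum(x * x for x in d)
    coupling_term = sum((d[i] - d[i + 1]) ** 2 for i in range(len(d) - 1))
    return self_term + lam * coupling_term


def transition_count(old, new):
    """Number of wires that change value."""
    return sum(o != n for o, n in zip(old, new))


old = [0, 0, 0]
one_wire = [0, 1, 0]   # only the middle wire rises
all_wires = [1, 1, 1]  # all three wires rise together
lam = 3.0              # assumed: coupling dominates ground capacitance

for name, new in (("one_wire", one_wire), ("all_wires", all_wires)):
    print(name, transition_count(old, new), transition_energy(old, new, lam))
```

Under this model, the pattern with three transitions costs less than the pattern with one, because wires that switch together do not charge the capacitance between them. A transition-counting code would prefer the more expensive pattern.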
References
[1] H. Zhang, J. Rabaey, “Low swing interconnect interface circuits,” IEEE/ACM ISLPED, pp. 161-166,
August 1998.
[2] K.Y. Khoo, A. Willson, Jr., “Charge recovery on a databus,” IEEE/ACM International Symposium on
Low Power Electronics and Design, pp. 185-189, 1995.
[3] S. Ramprasad, N. Shanbhag, I. Hajj, “A coding framework for low-power address and data busses,”
IEEE Transactions on VLSI Systems, pp. 212-221, Vol. 7, No. 2, June 1999.
[4] M. Stan, W. Burleson, “Low-power encodings for global communication in cmos VLSI,” IEEE
Transactions on VLSI Systems, pp. 49-58, Vol. 5, No. 4, Dec. 1997.
[5] P. Sotiriadis, A. Wang, A. Chandrakasan, “Transition Pattern Coding: An approach to reduce Energy in
Interconnect”, ESSCIRC’ 2000, Stockholm, Sweden, Sept. 2000.
[6] Frank Harary, “Structural models: an introduction to the theory of directed graphs”, New York, Wiley
1965.
[7] Robert G. Gallager, “Discrete Stochastic Processes”, Kluwer Academic Publishers 1998.
[8] Roger A. Horn and Charles R. Johnson, “Matrix Analysis”, Cambridge University Press 1994.
ICCAD2000, Page 329
Panel: Why Doesn't EDA Get Enough Respect?
Moderator: A. Richard Newton – Univ. of California, Berkeley, CA
The success of EDA is a critical ingredient to the success of the semiconductor industry.
But while semiconductor companies will spend billions on fabs there is a perception that
funding for EDA is much less important. Success in EDA requires the brightest
and broadest engineers of any industry, yet EDA company market
caps and salaries do not seem to reflect this. What will it take for EDA to get more
respect?
ICCAD2000, Pages 331-337
Switching Window Computation for Static Timing Analysis in Presence of
Crosstalk Noise
Pinhong Chen
Dept. of EECS
U. C. Berkeley
Berkeley, CA 94720, USA
Desmond A. Kirkpatrick
Intel Corp.
Microprocessor Products Group
Hillsboro, OR 97124, USA
Kurt Keutzer
Dept. of EECS
U. C. Berkeley
Berkeley, CA 94720, USA
Abstract
Crosstalk effect is crucial for timing analysis in very deep submicron design. In this
paper, we present and compare multiple scheduling algorithms to compute switching
windows for static timing analysis in presence of crosstalk noise. We also introduce an
efficient technique to evaluate the worst case alignment of multiple aggressors.
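The interplay between switching windows and crosstalk delay can be sketched as a fixed-point iteration. This is a minimal illustration under stated assumptions, not one of the scheduling algorithms compared in the paper: every coupled aggressor whose window overlaps the victim's window adds a fixed extra delay delta to the victim's latest switching time, and the loop repeats until no window changes. The names switching_windows, base, coupled, and delta are hypothetical.

```python
def overlaps(a, b):
    """True when the closed intervals a=(early, late) and b=(early, late) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def switching_windows(base, coupled, delta):
    """Grow each net's window until a fixed point is reached.

    base    : {net: (early, late)} windows computed without coupling
    coupled : {net: [aggressor nets]}
    delta   : assumed extra delay per aggressor that can switch with the victim
    """
    windows = dict(base)
    changed = True
    while changed:
        changed = False
        for net, (early, late) in base.items():
            # Count aggressors whose current window overlaps this net's window.
            active = sum(
                1 for other in coupled.get(net, ())
                if overlaps(windows[net], windows[other])
            )
            grown = (early, late + active * delta)
            if grown != windows[net]:
                windows[net] = grown
                changed = True
    return windows


base = {"a": (0, 10), "b": (9, 20), "c": (21, 30)}
coupled = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = switching_windows(base, coupled, delta=2)
print(result)
```

Windows only grow from their uncoupled values, so the number of overlapping aggressors never decreases and the loop terminates. In this example net b first overlaps only a, and its enlarged window then reaches c, which is why more than one pass is needed.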
References
[1] G. Yee, R. Chandra, V. Ganesan, and C. Sechen. “Wire Delay in the Presence of Crosstalk”. In
IEEE/ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems,
pages 170–175, 1997.
[2] P. D. Gross, R. Arunachalam, K. Rajagopal, and L. T. Pileggi. “Determination of Worst-Case Aggressor
Alignment for Delay Calculation”. In Proc. of International Conferences on Computer Aided Design, pages
212–219, Nov. 1998.
[3] T. Sasao (ed). “Logic Synthesis and Optimization, Ch. 8: Delay Models and Exact Timing Analysis”.
Kluwer Academic Publishers, 1993.
[4] S. Devadas, K. Keutzer, S. Malik, and A. Wang. “Certified Timing Verification and the Transition
Delay of a Logic Circuit”. IEEE Trans. on Very Large Scale Integration Systems, 2:333–342, Sep. 1994.
[5] S. Devadas, K. Keutzer, and S. Malik. “Computation of floating mode delay in combinational networks:
Theory and algorithms”. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
12:1913–1923, Dec. 1993.
[6] K. L. Shepard. “Design Methodologies for Noise in Digital Integrated Circuits”. In Design Automation
Conference, pages 94–99, 1998.
[7] D. A. Kirkpatrick. “The Implication of Deep Sub-micron Technology on the Design of High
Performance Digital VLSI System”. PhD thesis, CAD Group Ph.D. Dissertation, U.C. Berkeley, 1997.
[8] P. Chen and K. Keutzer. “Towards True Crosstalk Noise Analysis”. In Proc. of International
Conferences on Computer Aided Design, pages 132–137, Nov. 1999.
[9] A. B. Kahng, S. Muddu, and E. Sarto. “On Switch Factor Based Analysis of Coupled RC
Interconnects”. In Design Automation Conference, pages 79–84, 2000.
[10] P. Chen, D. A. Kirkpatrick, and K. Keutzer. “Miller Factor for Gate-Level Coupling Delay
Calculation”. In Proc. of International Conferences on Computer Aided Design, 2000.
[11] B. Franzini, C. Forzan, D. Pandini, P. Scandolara, and A. D. Fabbro. “Crosstalk Aware Static Timing
Analysis: a Two Step Approach”. In IEEE of 1st International Symposium on Quality Electronic Design,
pages 499–503, Mar. 2000.
[12] P. F. Tehrani, S. W. Chyou, and U. Ekambaram. “Deep Sub-Micron Static Timing Analysis in
Presence of Crosstalk”. In IEEE of 1st International Symposium on Quality Electronic Design, pages 505–
512, Mar. 2000.
[13] R. Arunachalam, K. Rajagopal, and L. T. Pileggi. “TACO: Timing Analysis With COupling”. In Design
Automation Conference, pages 266–269, 2000.
[14] J. Qian, S. Pullela, and L. T. Pillage. “Modeling the “Effective Capacitance” for the RC Interconnect
of CMOS gates”. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 13:1526–
1535, Dec. 1994.
[15] K. L. Shepard, V. Narayanan, P. C. Elmendorf, and Gutuan Zheng. “Global Harmony: Coupled Noise
Analysis for Full-Chip RC Interconnect Network”. In Proc. of International Conference on Computer
Aided Design, pages 139–146, 1997.
ICCAD2000, Pages 338-343
Slope Propagation in Static Timing Analysis
David Blaauw, Vladimir Zolotov, Savithri Sundareswaran*, Chanhee Oh
and Rajendran Panda
Motorola Inc. Austin, TX,
*Motorola India Electronics Ltd., Bangalore, India
Abstract
Static timing analysis has traditionally used the PERT method for identifying the critical
path of a digital circuit. Due to the influence of the slope of a signal at a particular node
on the subsequent path delay, an earlier signal with a signal slope greater than the slope
of the later signal may result in a greater delay. Therefore, the traditional method for
timing analysis may identify the incorrect critical path and report an optimistic delay for
the circuit. We show that the circuit delay calculated using the traditional method is a
discontinuous function with respect to transistor and gate sizes, posing a severe problem
for circuit optimization methods. We propose a new timing analysis algorithm which
resolves both these issues. The proposed algorithm selectively propagates multiple
signals through each timing edge in cases where there exists ambiguity regarding which
arriving signal represents the critical path. The algorithm for propagating the
corresponding required times is also presented. We prove that the proposed algorithm
identifies a circuit's true critical path, where the traditional timing analysis method may
not. We also show that under this method circuit delay and node slack are continuous
functions with respect to a circuit's transistor and gate sizes. In addition, we present a
heuristic method which reduces the number of signals to be propagated at the expense of
a slight loss in accuracy. Finally, we show how the proposed algorithm was efficiently
implemented in an industrial static timing analysis and optimization tool, and present
results for a number of industrial circuits. Our results show that the traditional timing
analysis method underestimates the circuit delay by as much as 38%, while the
proposed method efficiently finds the correct circuit delay with only a slight increase in
run time.
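The failure mode described above can be reproduced with a small sketch. It assumes a hypothetical linear gate model in which delay grows with the input transition time (the "slope"); the paper's actual delay model and its selective propagation rule are not reproduced here. The sketch simply compares propagating only the latest-arriving signal against evaluating every arriving signal.

```python
def gate_delay(slope, intrinsic=2.0, slope_factor=0.5):
    """Assumed gate model: delay increases linearly with input transition time."""
    return intrinsic + slope_factor * slope


def latest_only(signals):
    """Traditional approach: keep only the latest arrival, then add the gate delay."""
    arrival, slope = max(signals, key=lambda s: s[0])
    return arrival + gate_delay(slope)


def all_signals(signals):
    """Evaluate every (arrival, slope) pair and keep the worst output arrival."""
    return max(arrival + gate_delay(slope) for arrival, slope in signals)


# (arrival time, transition time): the earlier signal has the slower edge.
signals = [(10.0, 1.0), (9.0, 5.0)]
print(latest_only(signals), all_signals(signals))
```

With these assumed numbers the earlier but slower signal produces the later output, so keeping only the latest arrival reports an optimistic delay.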
References
[1] Ayman I. Kayssi, Karem A. Sakallah, Trevor N. Mudge, The Impact of Signal Transition Time on Path
Delay Computation, IEEE Transactions on circuits and systems-II: Analog and digital signal processing,
Vol. 40, No. 5, May 1993
[2] Chandu Visweswariah, Andrew R. Conn, Formulation of Static Circuit Optimization with Reduced Size,
Degeneracy and Redundancy by Timing Graph Manipulation, Proc. IEEE/ACM ICCAD, 1999, pp. 244-251.
[3] Gill, P.E., Murray, W. and Wright, M.H., Practical Optimization, Academic Press, New York, 1983.
[4] Hitchcock, R.B. Timing verification and the Timing Analysis program, Proc., IEEE/ACM DAC, 1982,
pp.594-604
[5] Jouppi, N.P. Timing analysis for nMOS VLSI, IEEE/ACM Design Automation Conf., 1983, pp. 411-418
[6] J.P.Fishburn, A.Dunlop, “TILOS: A posynomial programming approach to transistor sizing”, ICCAD,
Nov 1985
[7] S.Devadas, K.Keutzer, S.Malik, “Computation of Floating Mode Delay in Combinational Circuit:
Theory and Algorithms”, IEEE Trans. on Computer Aided Design, Dec 1993.
[8] Y.Kukimoto, W.Gosti, A.Saldanha, R.Brayton, “Approximate Timing Analysis of Combinatorial
Circuits under XBD0 Model”, ICCAD, 1997, pp. 176-181
[9] H.Yalcin, J.P.Hayes, “Event propagation conditions in circuit delay computation”, ACM Transactions
on Design Automation of Electronic Systems, July 1997
ICCAD2000, Pages 344-348
Transistor-Level Timing Analysis Using Embedded Simulation
Pawan Kulshreshtha, Robert Palermo, Mohammad Mortazavi,
Cyrus Bamji*, Hakan Yalcin
Cadence Design Systems Inc., San Jose, CA 95134, USA
*Canesta Inc., Santa Clara, CA 95054, USA
Abstract
A high accuracy system for transistor-level static timing analysis is presented. Accurate
static timing verification requires that individual gate and interconnect delays be
accurately calculated. At the sub-micron level, calculating gate and interconnect delays
using delay models can result in reduced accuracy. Instead, the proposed method
calculates delays through numerical integration using an embedded circuit simulator. It
takes into account short circuit current and carefully chooses the set of conditions that
results in a tight upper bound of the worst case delay for each gate. Similar repeating
transistor configurations of gates in the circuit are automatically identified and a novel
interpolation based caching scheme quickly computes gate delays from the delays of
similar gates. A tight object code level integration with a commercial high speed
transistor-level circuit simulator allows efficient invocation of the simulation.
References
[1] J. Ousterhout, "A Switch-Level Timing Verifier for Digital MOS VLSI", IEEE Trans. on CAD, July
1985, pp. 336-349.
[2] A. Hirata, H. Onodera, K. Tamaru, “Proposal of a Timing Model for CMOS Logic Gates Driving a
CRC π Load”, Proc. of ICCAD, November 1998, pp. 537-544.
[3] B. Ackland, R. Clark, "Event-EMU: An Event Driven Timing Simulator for MOS VLSI Circuits", Proc.
of ICCAD, Nov. 1989, pp. 80-83.
[4] V. Rao, J. Soreff, T. Brodnax, R. Mains, “EinsTLT: Transistor Level Timing with EinsTimer”, Proc. of
Int. Workshop on Timing Issues in the Spec. and Syn. of Digital Systems (TAU), 1999, pp. 1-6.
[5] M. Ohlrich, C. Ebeling, E. Ginting, L. Sather, "SubGemini: Identifying Subcircuits using a Fast
Subgraph Isomorphism Algorithm", Proc. of IEEE/ACM DAC, 1993, pp. 31-37.
[6] J. Cherry, "A CMOS Timing Analyzer", Proc. of IEEE/ACM DAC, 1988, pp. 148-153.
[7] M. L. Yu, B. Ackland, "VLSI Timing Simulation with Selective Dynamic Regionization", 1994 ACM
0-89791-690-5/94/0011, pp. 195-199.
[8] G. Strang, "Introduction to Applied Mathematics", Wellesley-Cambridge Press., 1986.
[9] F. Dartu, L.T. Pileggi, “TETA: Transistor-Level Engine for Timing Analysis”, Proc. of IEEE/ACM
DAC, 1998, pp. 595-598.
ICCAD2000, Pages 350-356
Latency Effects of System Level Power Management Algorithms
Dinesh Ramanathan, Sandy Irani, Rajesh Gupta
Department of Information and Computer Science, University of California, Irvine, CA 92697
Abstract
A power management algorithm for an embedded system reduces system level power
dissipation by shutting off parts of the system when they are not being used and turning
them back on when they are required. Algorithms for this problem are online in nature
since they must operate without knowledge of the arrival time or service requirements of
future requests. In this paper, we present online algorithms to manage power for
embedded systems. We perform an empirical analysis of these algorithms and give
theoretical justification for the empirical results. Effective power management strategies
have an adverse impact on the latency of the system for which the strategy is designed.
Typically, the more aggressive the power management scheme, the greater the increase in
the latency of the system. In this paper, we prove an upper bound on the additional
latency of the system introduced by power management strategies. Moreover, we show
that this upper bound occurs each time the system is shut down and hence is an important
system design parameter.
In addition, service time and latencies have an effect on power management strategies
since they alter the length and occurrence of idle periods. We study this
phenomenon experimentally, by modeling the disk drive of a laptop computer as an
embedded system. The results show that if service times of arriving requests are modeled,
the relative performance of algorithms can change leading to non-adaptive algorithms
performing better than adaptive ones. We compare the performance of adaptive and non-adaptive power management algorithms. In particular, our experimental results show that
an "immediate" shutdown strategy that shuts down the system whenever it encounters an
idle period performs surprisingly better than sophisticated adaptive algorithms suggested in
the literature. We provide an analytical explanation for the effectiveness of power
management strategies.
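The comparison between an immediate shutdown and a timeout-based shutdown can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithms or the disk-drive model evaluated in the paper: the device draws idle_power while idle, zero power while shut down, and pays a fixed wakeup_energy each time it is restarted. All names and numbers are hypothetical.

```python
def idle_energy(idle_len, timeout, idle_power=1.0, wakeup_energy=5.0):
    """Energy spent over one idle period under a fixed-timeout policy.

    The device stays on for `timeout` time units, then shuts down and
    pays `wakeup_energy` when the next request arrives (assumed model).
    """
    if idle_len <= timeout:
        return idle_power * idle_len          # next request arrives before shutdown
    return idle_power * timeout + wakeup_energy


def total_energy(idle_periods, timeout):
    return sum(idle_energy(t, timeout) for t in idle_periods)


idle_periods = [1, 2, 20, 30]
immediate = total_energy(idle_periods, timeout=0)             # shut down at once
break_even = total_energy(idle_periods, timeout=5)            # wait wakeup_energy / idle_power
never = total_energy(idle_periods, timeout=float("inf"))      # no power management
print(immediate, break_even, never)
```

On this assumed trace the immediate policy spends the least energy, but it also shuts down in every idle period, so it pays the wake-up latency most often; that trade-off is the one the abstract highlights.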
References
[1] Chi-Hong Hwang, Allen C.-H. Wu. A Predictive System Shutdown Method For Energy Saving of
Event-Driven Computation. IEEE/ACM International Conference on Computer Aided Design, Nov 1997,
pages 28-32.
[2] G. A. Paleologo, L. Benini, A. Bogliolo, G. De Micheli. Policy Optimization for Dynamic Power
Management. Proc. of 35th Design Automation Conference, pp.182-187, June 1998
[3] M. B. Srivastava, A. P. Chandrakasan, R. W. Brodersen. Predictive Shutdown and Other Architectural
Techniques for Energy Efficient Programmable Computation. IEEE Trans. on VLSI Systems, vol. 4, no. 1,
pp.42-54, March 1996
[4] Karlin A. R., Manasse M.S., McGeoch L.A., Owicki S. Competitive Randomized Algorithms for
Nonuniform Problems. Algorithmica, vol. 11, no 6, pp 542-571, June 1994
[5] A. R. Karlin, M. S. Manasse, L. Rudolph, and D.D. Sleator. Competitive Snoopy Caching.
Algorithmica, 3(1):70-119, 1988.
[6] M. Pedram. Power Minimization in IC Design: Principles and Applications. ACM Trans. on Design
Automation of Electronic Systems, vol 1, no. 1, pages 3-56, January 1996
[7] S. Devadas, S. Malik. A Survey of Optimization Techniques Targeting Low Power VLSI Circuits. Proc.
of the 32nd Design Automation Conference, pages 242-247 , 1995
[8] F. N. Najm. A Survey of Power Estimation Techniques in VLSI Circuits. IEEE Trans. of VLSI
Systems, vol 2, no 4, pp 446-455, December 1994
[9] J. L. Peterson, A. Silberschatz. Operating Systems Concepts. 2nd Ed, pp.118-120, Addison-Wesley
Publishing Co. Inc.
[10] R. El-Yaniv, R. Kaniel, N. Linial. On the Equipment Rental Problem. Manuscript.
[11] D.D. Sleator, R.E. Tarjan. Self-adjusting binary search trees. Journal of the ACM, Vol. 32, No. 3,
pages 652-686, July 1985.
[12] D. Ramanathan and R. Gupta. System Level Online Power Management Algorithms. In Proceedings of
the Design Automation and Test in Europe Conference, Paris, March 2000
[13] D. Ramanathan and R. Gupta. On System Level Online Power Management Algorithms. Technical
Report, University of California, Irvine, 2000.
[14] Auspex File Traces from the NOW project, available at http://now.cs.berkeley.edu/Xfs/
AuspexTraces/auspex.html (1993)
[15] L. Benini and G. De Micheli. Dynamic Power Management: Design Techniques and CAD Tools,
Kluwer, 1997
[16] Technical specifications of hard drive IBM Travelstar VP 2.5inch, available at
http://www.storage.ibm.com/storage/oem/data/travvp.htm (1996)
ICCAD2000, Pages 357-364
Power-conscious Joint Scheduling of Periodic Task Graphs and Aperiodic Tasks in
Distributed Real-time Embedded Systems
Jiong Luo and Niraj K. Jha
Department of Electrical Engineering, Princeton University, Princeton, NJ, 08544
Abstract
In this paper, we present a power-conscious algorithm for jointly scheduling multi-rate
periodic task graphs and aperiodic tasks in distributed real-time embedded systems.
While the periodic task graphs have hard deadlines, the aperiodic tasks can have either
hard or soft deadlines. Periodic task graphs are first scheduled statically. Slots are created
in this static schedule to accommodate hard aperiodic tasks. Soft aperiodic tasks are
scheduled dynamically with an on-line scheduler. Flexibility is introduced into the static
schedule and optimized to allow the on-line scheduler to make dynamic modifications to
the static schedule. This helps minimize the response times of soft aperiodic tasks
through both resource reclaiming and slack stealing. Of course, the validity of the static
schedule is maintained. The on-line scheduler also employs dynamic voltage scaling and
power management to obtain a power-efficient schedule. Experimental results show that
the flexibility introduced into the static schedule helps improve the response times of soft
aperiodic tasks by up to 43%. Dynamic voltage scaling and power management reduce
power by up to 68%. The scheme in which the static schedule is allowed to be flexible
achieves up to 32% more power saving compared to the scheme in which no flexibility is
allowed, when both schemes are power-conscious. Our work gives an average
architecture price saving of 30% over a previous approach for embedded system
architectures synthesized with execution slots for hard aperiodic tasks present.
References
[1] W. H. Wolf, “Hardware-software co-design of embedded systems,” Proc. IEEE, vol. 82, pp. 967-989,
July 1994.
[2] C. Shen and K. Ramamritham, “Resource reclaiming in multiprocessor real-time systems,” IEEE Trans.
Parallel & Distributed Systems, vol. 4, no. 4, pp. 382-397, Apr. 1993.
[3] J. P. Lehoczky and S. Ramos-Thuel, “An optimal algorithm for scheduling soft-aperiodic tasks in fixed-priority preemptive systems,” in Proc. Real-time Systems Symp., pp. 110-123, Dec. 1992.
[4] J. Lee, S. Lee, and H. Kim, “Scheduling soft aperiodic tasks in adaptable fixed-priority systems,”
Operating Systems Review, pp. 17-28, Oct. 1996.
[5] R. P. Dick, D. L. Rhodes, and W. Wolf, “TGFF: Task graphs for free,” in Proc. Int. Workshop
Hardware/Software Codesign, pp. 97-101, Mar. 1998.
[6] J. P. Lehoczky, L. Sha, and J. K. Strosnider, “Enhanced aperiodic responsiveness in hard real-time
environments,” in Proc. Real-time Systems Symp., pp. 261-270, Dec. 1987.
[7] B. P. Dave and N. K. Jha, “CASPER: Concurrent hardware-software co-synthesis of hard real-time
aperiodic and periodic specifications of distributed embedded systems,” in Proc. Design Automation & Test
in Europe Conf., pp. 118-124, Feb. 1998.
[8] M. B. Srivastava, A. P. Chandrakasan, and R. W. Brodersen, “Predictive system shutdown and other
architectural techniques for energy efficient programmable computation,” IEEE Trans. VLSI Systems, vol.
4, no. 1, pp. 42-55, Mar. 1996.
[9] I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M. B. Srivastava, “Power optimization of variable-voltage core-based systems,” IEEE Trans. Computer-Aided Design, vol. 18, no. 12, pp. 1702-1714, Dec.
1999.
[10] C.-H. Hwang and A. C. Wu, “A predictive system shutdown method for energy saving of event-driven
computation,” in Proc. Int. Conf. Computer-Aided Design, pp. 28-32, Nov. 1997.
[11] L. Benini, G. Paleologo, A. Bogliolo, and G. De Micheli, “Policy optimization for dynamic power
management,” IEEE Trans. Computer-Aided Design, vol. 18, no. 6, pp. 813-833, June 1999.
[12] E. Y. Chung, L. Benini, and G. De Micheli, “Dynamic power management using adaptive learning
tree,” in Proc. Int. Conf. Computer-Aided Design, pp. 274-279, Nov. 1999.
[13] Q. Qiu and M. Pedram, “Dynamic power management based on continuous-time Markov decision
processes,” in Proc. Design Automation Conf., pp. 555-561, June 1999.
[14] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for reduced CPU energy,” in Proc.
USENIX Symp. Operating Systems Design & Implementation, pp. 13-23, Nov. 1994.
[15] Y. Shin and K. Choi, “Power conscious fixed priority scheduling for hard real-time systems,” in Proc.
Design Automation Conf., pp. 134-139, June 1999.
[16] T. Pering, T. Burd, and R. Brodersen, “The simulation and evaluation of dynamic voltage scaling
algorithms,” in Proc. Int. Symp. Low Power Electronics and Design, pp. 76-81, Aug. 1998.
[17] T. Burd and R. Brodersen, “Processor design for portable systems,” J. VLSI Signal Processing, vol.
13, pp. 203-222, Aug. 1996.
[18] R. P. Dick and N. K. Jha, “MOCSYN: Multiobjective core-based single-chip system synthesis,” in
Proc. Design Automation & Test in Europe Conf., pp. 263-270, Mar. 1999.
[19] E. L. Lawler and C. U. Martel, “Scheduling periodically occurring tasks on multiple processors,”
Information Processing Letters, vol. 7, pp. 9-12, Feb. 1981.
[20] http://www.arm.com/Pro+Peripherals/
[21] http://www.transmeta.com/
[22] http://www.chips.ibm.com:80/products/powerpc/chips/
[23] G. Fohler, “Joint scheduling of distributed complex periodic and hard aperiodic Tasks in statically
scheduled systems,” in Proc. Real-time Systems Symp., pp. 152-161, Dec. 1995.
[24] E. Y. Chung, L. Benini, A. Bogliolo, and G. De Micheli, “Dynamic power management for non-stationary service requests,” in Proc. Design Automation & Test in Europe Conf., pp. 77-81, Mar. 1999.
[25] J. K. Strosnider, J. P. Lehoczky, and L. Sha, "The deferrable server algorithm for enhanced aperiodic
responsiveness in hard real-time environments," IEEE Trans. Computers, vol. 44, no. 1, pp. 73-91, Jan.
1995.
ICCAD2000, Pages 365-368
Power Optimization of Real-Time Embedded Systems on Variable Speed
Processors
Youngsoo Shin*, Kiyoung Choi**, and Takayasu Sakurai*
*Center for Collaborative Research and Institute of Industrial Science,
University of Tokyo, Tokyo 106-8558, Japan
**School of Electrical Engineering and Computer Science,
Seoul National University, Seoul 151-742, Korea
Abstract
Power efficient design of real-time embedded systems based on programmable
processors becomes more important as system functionality is increasingly realized
through software. This paper presents a power optimization method for real-time
embedded applications on a variable speed processor. The method combines off-line and
on-line components. The off-line component determines the lowest possible maximum
processor speed while guaranteeing deadlines of all tasks. The on-line component
dynamically varies the processor speed or brings the processor into a power-down mode
according to the status of task set in order to exploit execution time variations and idle
intervals. Experimental results show that the proposed method obtains a significant power
reduction across several kinds of applications.
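The off-line step, finding the lowest processor speed that still meets all deadlines, can be sketched with standard response-time analysis for fixed-priority tasks. This is a minimal sketch under stated assumptions, not the paper's method: tasks are periodic with deadline equal to period, listed from highest to lowest priority, execution time scales inversely with a normalized speed, and the speed is chosen from a small discrete set. The names schedulable and lowest_speed are hypothetical; exact fractions avoid rounding problems in the ceiling terms.

```python
import math
from fractions import Fraction


def schedulable(tasks, speed):
    """Response-time test for fixed-priority tasks at a normalized speed.

    tasks : list of (wcet_at_full_speed, period), highest priority first,
            with deadline assumed equal to period
    """
    for i, (c_i, t_i) in enumerate(tasks):
        own = Fraction(c_i) / speed
        r = own
        while True:
            # Interference from every higher-priority task released within r.
            interference = sum(
                math.ceil(r / t_j) * (Fraction(c_j) / speed)
                for c_j, t_j in tasks[:i]
            )
            r_next = own + interference
            if r_next > t_i:
                return False          # deadline missed at this speed
            if r_next == r:
                break                 # fixed point reached: worst-case response time
            r = r_next
    return True


def lowest_speed(tasks, speeds):
    """Smallest candidate speed at which the task set is still schedulable."""
    for s in sorted(speeds):
        if schedulable(tasks, s):
            return s
    return None


tasks = [(1, 4), (1, 6), (2, 12)]
speeds = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]
print(lowest_speed(tasks, speeds))
```

For this assumed task set, half speed overloads the lowest-priority task, while three quarters of full speed leaves every response time within its period.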
References
[1] C. Hwang and A. Wu, “A predictive system shutdown method for energy saving of event-driven
computation,” in Proc. Int’l Conf. on Computer Aided Design, Nov. 1997, pp. 28–32.
[2] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for reduced CPU energy,” in Proc.
USENIX Symposium on Operating Systems Design and Implementation, 1994, pp. 13–23.
[3] F. Yao, A. Demers, and S. Shenker, “A scheduling model for reduced CPU energy,” in Proc. IEEE
Annual Foundations of Computer Science, 1995, pp. 374–382.
[4] Y. Shin and K. Choi, “Power conscious fixed priority scheduling for hard real-time systems,” in Proc.
Design Automat. Conf., June 1999, pp. 134–139.
[5] T. Pering, T. Burd, and R. Brodersen, “The simulation and evaluation of dynamic voltage scaling
algorithms,” in Proc. Int’l Symposium on Low Power Electronics and Design, Aug. 1998, pp. 76–81.
[6] C. L. Liu and James W. Layland, “Scheduling algorithms for multiprogramming in a hard real time
environment,” Journal of the ACM, vol. 20, no. 1, pp. 46–61, Jan. 1973.
[7] N. Audsley, A. Burns, M. Richardson, and A. Wellings, “Hard real-time scheduling: The deadline-monotonic approach,” in Proc. IEEE Workshop on Real-Time Operating Systems and Software, May 1991,
pp. 133–137.
[8] J. Lehoczky, L. Sha, and Y. Ding, “The rate monotonic scheduling algorithm: Exact characterization
and average case behavior,” in Proc. IEEE Real-Time Systems Symposium, Dec. 1989, pp. 166–171.
[9] D. Katcher, H. Arakawa, and J. Strosnider, “Engineering and analysis of fixed priority
schedulers,” IEEE Trans. on Software Eng., vol. 19, no. 9, pp. 920–934, Sept. 1993.
[10] T. Burd and R. Brodersen, “Processor design for portable systems,” Journal of VLSI Signal
Processing, vol. 13, no. 2/3, pp. 203–222, Aug. 1996.
[11] C. Locke, D. Vogel, and T. Mesler, “Building a predictable avionics platform in Ada: A case study,”
in Proc. IEEE Real-Time Systems Symposium, Dec. 1991.
[12] A. Burns, K. Tindell, and A. Wellings, “Effective analysis for engineering real-time fixed priority
schedulers,” IEEE Trans. on Software Eng., vol. 21, no. 5, pp. 475–480, May 1995.
[13] J. Liu, J. Redondo, Z. Deng, T. Tia, R. Bettati, A. Silberman, M. Storch, R. Ha, and W. Shih, “PERTS:
A prototyping environment for real-time systems,” Tech. Rep. UIUCDCS-R-93-1802, University of
Illinois, 1993.
[14] N. Kim, M. Ryu, S. Hong, M. Saksena, C. Choi, and H. Shin, “Visual assessment of a real-time system
design: A case study on a CNC controller,” in Proc. IEEE Real-Time Systems Symposium, Dec. 1996.
ICCAD2000, Pages 369-372
A Data Flow Fault Coverage Metric For Validation of
Behavioral HDL Descriptions
Qiushuang Zhang and Ian G. Harris
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003
Abstract
Behavioral HDL descriptions are commonly used to capture the high-level functionality
of a hardware circuit for simulation and synthesis. The manual process of creating a
behavioral description is error prone, so significant effort must be made to verify the
correctness of behavioral descriptions. Simulation-based validation and formal
verification are both techniques used to verify correctness. We investigate validation
because formal verification techniques are frequently intractable for large designs. The
first step toward a behavioral validation technique is the development of a validation fault
coverage metric which can be used to evaluate the likelihood of design defect detection
with a given test sequence.
We propose a validation fault coverage metric which is based on an analysis of the
control data flow description associated with the behavior. The proposed metric identifies
a subset of paths through the data flow which must be traversed during testing to detect
faults. The proposed metric is a tractable compromise between the statement coverage
metric which requires only that each statement be executed, and the path coverage metric
which requires that all data flow paths be executed. Data flow paths are identified based
on the relative code locations of definitions and uses of variables which may be assigned
incorrectly due to a design error. We propose an efficient method to compute all data
flow paths which must be traversed, and we generate coverage results for several
benchmark VHDL circuits for comparison to other approaches.
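The idea of pairing definitions with uses of a variable can be sketched for the simplest case. This is a minimal sketch under stated assumptions, not the proposed metric: the behavior is a straight-line sequence of statements with no branches, each statement defines at most one variable, and coverage is the fraction of definition-use pairs exercised by a test. The statement encoding and the function names are hypothetical.

```python
def def_use_pairs(stmts):
    """Return (def_index, use_index, variable) for every use reached by a definition.

    stmts : list of (defined_variable_or_None, [used_variables]) in program order
    """
    last_def = {}
    pairs = []
    for j, (defined, used) in enumerate(stmts):
        # Uses are read before this statement's own definition takes effect.
        for var in used:
            if var in last_def:
                pairs.append((last_def[var], j, var))
        if defined is not None:
            last_def[defined] = j
    return pairs


def coverage(all_pairs, exercised):
    """Fraction of definition-use pairs exercised by a test sequence."""
    return len(set(exercised) & set(all_pairs)) / len(all_pairs)


# a := ...; b := f(a); a := g(a, b); output(a)
stmts = [("a", []), ("b", ["a"]), ("a", ["a", "b"]), (None, ["a"])]
pairs = def_use_pairs(stmts)
print(pairs, coverage(pairs, pairs[:2]))
```

A real control data flow description contains branches and loops, so a definition may reach a use along several paths; the straight-line restriction here only keeps the illustration short.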
REFERENCES
[1] A. Gupta, S. Malik, and P. Ashar, “Toward formalizing a validation methodology using simulation
coverage”, in Design Automation Conference, pp. 740–745, 1997.
[2] F. Fallah, P. Ashar, and S. Devadas, “Simulation vector coverage”, in Proceedings of the 36th Design
Automation Conference, pp. 666–671, 1999.
[3] S. Devadas, A. Ghosh, and K. Keutzer, “An observability-based code coverage metric for functional
simulation”, in International Conference on Computer-Aided Design, pp. 418–425, 1996.
[4] G. Al Hayek and C. Robach, “From specification validation to hardware testing: a unified method”, in
International Test Conference, pp. 885–893, 1996.
[5] K. N. King and A. J. Offutt, “A fortran language system for mutation-based software testing”, Software
Practice and Experience, vol. 21, pp. 685–718, 1991.
[6] S. Rapps and E. J. Weyuker, “Selecting software test data using data flow information”, IEEE Trans. on
Software Engineering, vol. SE-11, pp. 367–375, April 1985.
[7] P. G. Frankl and E. J. Weyuker, “An applicable family of data flow testing criteria”, IEEE Trans. on
Software Engineering, vol. SE-14, pp. 1483–1498, Oct. 1988.
[8] S. C. Ntafos, “A comparison of some structural testing strategies”, IEEE Trans. on Software
Engineering, vol. SE-14, pp. 868–874, 1988.
[9] J. Laski and B. Korel, “A data flow oriented program testing strategy”, IEEE Trans. on Software
Engineering, vol. SE-9, pp. 33–43, 1983.
[10] L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil, “A formal evaluation of data flow path
selection criteria”, IEEE Trans. on Software Engineering, vol. SE-15, pp. 1318–1332, 1989.
ICCAD2000, Pages 374-378
Simultaneous Gate Sizing and Fanout Optimization
Wei Chen*, Cheng-Ta Hsieh+, Massoud Pedram*
*University of Southern California, Los Angeles, CA 90089
+Verplex Systems, Inc., Milpitas, CA 95035
Abstract
This paper describes an algorithm for simultaneous gate sizing and fanout optimization
along the timing-critical paths in a circuit. First, a continuous-variable delay model that
captures both sizing and buffering effects is presented. Next, the optimization problem is
formulated as a non-convex mathematical program. To manage the problem size, only a
small number of critical paths are considered simultaneously. The mathematical program
is solved by a non-linear programming package. Finally, a design flow based on iterative
selection and optimization of the k most critical paths in the circuit is proposed.
Experimental results show that the proposed flow reduces the circuit delay by an average
of 9.2% compared to conventional flows that separate gate sizing from fanout
optimization.
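Why sizing and buffering interact can be seen with a logical-effort style delay estimate in the spirit of reference [8]. This is a minimal sketch under stated assumptions and is not the continuous-variable delay model of the paper: each stage's delay is its logical effort times the ratio of load capacitance to input capacitance, plus a parasitic term, in normalized units. The function name and the numeric values are hypothetical.

```python
def path_delay(input_caps, load_cap, logical_efforts, parasitics):
    """Normalized path delay: sum over stages of g * C_out / C_in + p.

    input_caps      : input capacitance (size) of each stage, in path order
    load_cap        : capacitance driven by the last stage
    logical_efforts : g for each stage (1.0 for an inverter)
    parasitics      : p for each stage
    """
    caps = list(input_caps) + [load_cap]
    delay = 0.0
    for i, (g, p) in enumerate(zip(logical_efforts, parasitics)):
        delay += g * caps[i + 1] / caps[i] + p
    return delay


# One minimum inverter driving a load of 64 directly.
unbuffered = path_delay([1.0], 64.0, [1.0], [1.0])
# Three inverters sized 1, 4, 16: every stage sees an electrical effort of 4.
sized_chain = path_delay([1.0, 4.0, 16.0], 64.0, [1.0] * 3, [1.0] * 3)
# Same three stages, poorly sized: the last stage carries most of the load.
poor_chain = path_delay([1.0, 2.0, 4.0], 64.0, [1.0] * 3, [1.0] * 3)
print(unbuffered, sized_chain, poor_chain)
```

Adding stages helps only when their sizes are chosen together with the stage count, which is the motivation for treating gate sizing and fanout optimization as one problem.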
References
[1] O. Coudert, R. Haddad, "New Algorithms for Gate Sizing: a Comparative Study", Proc. of 33rd DAC,
pp.734-739, Jun 1996.
[2] M. Berkelaar, J. Jess, "Gate Sizing in MOS Digital Circuits with Linear Programming", Proc. of
European DAC, pp.217-221, 1990.
[3] C. L. Berman, J. L. Carter, K. F. Day, “The Fanout Problem: From Theory to Practice”, Advanced
Research in VLSI: Proc. of the 1989 Decennial Caltech Conference, pp. 69-99, 1989.
[4] H. Touati, “Performance-oriented Technology Mapping”, Ph.D. thesis, University of California,
Berkeley, Technical Report UCB.ERL M90/109, November 1990.
[5] K. Kodandapani, J. Grodstein, A. Dominic, H. Touati, “A Simple Algorithm for Fanout Optimization
using High-Performance Buffer Libraries”, Proc. of ICCAD, pp. 466-471, November 1993.
[6] Y. Jiang, S. Sapatnekar, C. Bamji, J. Kim, “Interleaving Buffer Insertion and Transistor Sizing into a
Single Optimization”, IEEE Transactions on VLSI Systems, vol.6, No.4, pp. 625 - 633, December 1998.
[7] W. Chen, C. T. Hsieh, M. Pedram, “Simultaneous Gate Sizing and Placement”, IEEE Transactions on
CAD, Vol.19, No.2, pp.206-214, February 2000.
[8] I. Sutherland, R. Sproull, “The Theory of Logical Effort: Designing for Speed on the Back of an
Envelope”, Advanced Research in VLSI, Santa Cruz, 1991.
[9] D. Kung, “A Fast Fanout Optimization Algorithm for Near-Continuous Buffer Libraries”, Proc. of 35th
DAC, pp. 352-355, June 1998.
[10] P. Rezvani, A. Ajami, M. Pedram, H. Savoj, “Leopard: A Logical Effort-based fanout OPtimization
for Area and Delay”, Proc. of ICCAD, pp. 516-519, November 1999.
[11] D. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, pp.180, 1984.
[12] A. R. Conn, N. I. M. Gould, P. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear
Optimization, Springer-Verlag, 1992.
ICCAD2000, Pages 379-386
Layout-driven Area-constrained Timing Optimization by Net Buffering
Rajeev Murgai
Fujitsu Laboratories of America, Inc., Sunnyvale, CA, USA.
Abstract
With the advent of deep sub-micron technologies, interconnect loads and delays are
becoming significant, and layout-driven synthesis has become the need of the day.
However, given the tight constraints imposed by the layout (e.g., area availability,
congestion), only those synthesis transforms can be made layout-driven that are local and
layout-friendly. Examples of such transforms are net buffering, gate resizing, and gate
replication.
In this paper, we address the problem of minimizing the delay of a mapped, roughly
placed, and globally-routed design by buffer insertion and/or deletion without violating
the local area constraints imposed by the layout and without overloading any buffer/cell
pins. We believe this is one of the most fundamental problems in layout-driven buffer
optimization. To the best of our knowledge, no technique has been published to date that
solves this problem. The concept of local (or block) area constraints we use in this paper
is more powerful than that of the total design area traditionally used in logic synthesis.
Our main contributions are the following: 1. We propose an exact, layout-driven net
buffering algorithm to minimize the delay of an extended net under the area constraint of
each block in the design. 2. We propose a simple yet effective scheme for applying the
single-net algorithm to an entire design. 3. We apply our technique successfully on three
real, large, industrial designs. The largest design (172K gates and 211K nets) could be
optimized in about 20 minutes. The technique is remarkably effective when the available
area in the design is small: it generates 6-9 times better delay improvements than the
unconstrained delay minimization technique [8] modified to handle area constraints. Over
an entire range of available areas, it gives about 115% better delay improvements.
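The trade-off described in the abstract above can be illustrated with a small sketch. It is not the paper's exact algorithm: it only compares the Elmore delay of a single driver-to-sink wire with and without one midpoint buffer, and rejects the buffer when the enclosing block lacks free area. The function names, the buffer dictionary keys (`cin`, `rout`, `delay`, `area`) and all numeric values are assumptions made for illustration.

```python
def wire_elmore(r, c, length, load):
    """Elmore delay of a uniform wire (r, c per unit length) driving a load."""
    R = r * length
    C = c * length
    return R * (C / 2.0 + load)


def driven_delay(r_drv, r, c, length, load):
    """Delay of a driver with output resistance r_drv driving wire plus load."""
    return r_drv * (c * length + load) + wire_elmore(r, c, length, load)


def best_delay(r_drv, r, c, length, load, buf, free_area):
    """Return (delay, buffered).

    A midpoint buffer is used only if it fits in the block's free area
    (hypothetical local area constraint) and actually reduces the delay.
    """
    unbuffered = driven_delay(r_drv, r, c, length, load)
    if buf["area"] > free_area:
        return unbuffered, False
    half = length / 2.0
    buffered = (driven_delay(r_drv, r, c, half, buf["cin"])
                + buf["delay"]
                + driven_delay(buf["rout"], r, c, half, load))
    if buffered < unbuffered:
        return buffered, True
    return unbuffered, False


# Illustrative (made-up) parameters in arbitrary units.
buf = {"cin": 1.0, "rout": 1.0, "delay": 1.0, "area": 2.0}
print(best_delay(1.0, 1.0, 1.0, 10.0, 1.0, buf, free_area=2.0))
print(best_delay(1.0, 1.0, 1.0, 10.0, 1.0, buf, free_area=1.0))
```

With enough free area the buffered solution wins; when the block is full, the same net falls back to the unbuffered delay, which is the situation the area-constrained formulation is meant to handle.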
References
[1] C. J. Alpert and A. Devgan. Wire Segmenting For Improved Buffer Insertion. In DAC, pages 588-593,
1997.
[2] C. J. Alpert, A. Devgan, and S. T. Quay. Buffer Insertion for Noise and Delay Optimization. In DAC,
pages 362-367, 1998.
[3] R. Carragher, S. Chakraborty, R. Murgai, M. Prasad, A. Srivastava, and N. Vemuri. Layout-driven
Logic Optimization. In IWLS, 2000.
[4] L. Kannan, P. Suaris, and H. G. Fang. A Methodology and Algorithms for Post-Placement Delay
Optimization. In DAC, pages 327-332, 1994.
[5] D. Kung. A Fast Fanout Optimization Algorithm for Near-Continuous Buffer Libraries. In DAC, pages
352-355, 1998.
[6] J. Lillis, C. K. Cheng, and T. T. Y. Lin. Optimal Wire Sizing and Buffer Insertion for Low Power and a
Generalized Delay Model. In ICCAD, pages 138-143, 1995.
[7] R. Murgai. Delay Constrained Area Recovery via Layout-driven Buffer Optimization. In IWLS, June
1999.
[8] Lukas P. P. P. van Ginneken. Buffer Placement in Distributed RC-tree Networks for Minimum Elmore
Delay. In ISCAS, pages 865-868, 1990.
[9] J. Rubinstein, P. Penfield, and M. A. Horowitz. Signal Delay in RC Tree Networks. In IEEE Trans. on
CAD, July 1983.
[10] H. Vaishnav and M. Pedram. Routability-Driven Fanout Optimization. In DAC, pages 230-235, 1993.
ICCAD2000, Pages 387-390
Synthesis of CMOS Domino Circuits for Charge Sharing Alleviation
Ching-Hwa Cheng, Shih-Chieh Chang, Shin-De Li, Wen-Ben Jone, Jinn-Shyan Wang*
Department of Computer Science and Information Engineering
*Department of Electrical Engineering
National Chung Cheng University, Chiayi, Taiwan, Republic of China
Abstract
The Charge Sharing (CS) problem is one of the notorious noise problems in domino circuit
design and test. In this paper, this problem is thoroughly investigated by considering
circuit topology and circuit function. The sensitivity of each domino gate to the CS
problem is represented by the concept of CS-vulnerability. A method to derive the CS-vulnerability and the test pattern for each domino gate is suggested. We also propose a
transistor reordering method to dramatically reduce the CS-vulnerabilities for all domino
gates, so that the CS problem can be alleviated. Simulation results demonstrate that our
transistor reordering method can efficiently reduce the CS-vulnerabilities for most
domino circuits.
References
[1] F. Brglez, "On Testability of Combinational Networks," Proc. of International Symposium on
Circuits and Systems, pp. 221-225, 1984.
[2] S. M. Kang, Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, McGraw-Hill
Book Co., 1996.
[3] J. A. Pretorius, A. S. Shubat, C. A. Salama, "Charge Redistribution and Noise Margins in Domino
CMOS logic", IEEE Transactions on circuits and systems, Vol. CAS-33, pp 786-793, Aug. 1986.
[4] S. C. Prasad, K. Roy, "Transistor Reordering for Power Minimization Under Delay Constraint,"
ACM Transactions on Design Automation of Electronic Systems, Vol. 1, No. 2, pp. 280-300, April
1996.
[5] J. M. Rabaey, Digital Integrated Circuits-A Design Perspective, Prentice Hall Pub. Co., 1996.
[6] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan,
R. K. Brayton, A. Sangiovanni-Vincentelli, "SIS: A System for Sequential Circuit Synthesis," U. C. Berkeley,
Technical Report UCB/ERL M92/41, May 1992.
[7] Z. Wang, G. A. Jullien, W. C. Miller, J. Wang, S. S. Bizzan, "Fast Adder Using Enhanced Multiple-Output Domino Logic," IEEE Journal of Solid-State Circuits, Vol. 32, No. 2, Feb. 1997.
ICCAD2000, Pages 392-398
Test of Future System-on-Chips
Yervant Zorian1, Sujit Dey2, Michael J. Rodgers3
1 LogicVision Inc., San Jose, CA
2 Dept. of ECE, University of California at San Diego, La Jolla, CA
3 Test Technology, Microprocessor Products Group, Intel Corp., Santa Clara, CA
Abstract
Spurred by technology leading to the availability of millions of gates per chip, system-level integration is evolving as a new paradigm, allowing entire systems to be built on a
single chip. Being able to rapidly develop, manufacture, test, debug and verify complex
SOCs is crucial for the continued success of the electronics industry. This growth is
expected to continue full force at least for the next decade, while making possible the
production of multimillion transistor chips. However, to make its production practical
and cost effective, the industry road maps identify a number of major hurdles to be
overcome. The key hurdle is related to test and diagnosis. This embedded tutorial
analyzes these hurdles, relates them to the advancements in semiconductor technology
and presents potential solutions to address them. These solutions are meant to ensure that
test and diagnosis contribute to the overall growth of the SOC industry and do not slow it
down. This embedded tutorial in addition presents the state-of-the-art in system-level
integration and addresses the strategies and current industrial practices in the test of
system-on-chip. It discusses the requirements for test reuse in hierarchical design, such as
embedded test strategies for individual cores, test access mechanisms, optimizing test
resource partitioning, and embedded test management and integration at the System-on-Chip level. Processor cores being one of the most common cores embedded in a SOC,
issues related to self-testing embedded processor cores are addressed. Future research
challenges and opportunities are discussed in enabling testing of future SOCs which use
deep submicron technologies.
References
1. The National Technology Roadmap for Semiconductors (NTRS), 1997 Edition. Semiconductor Industry
Association.
2. The International Technology Roadmap for Semiconductors, 1999 Edition, ITRS.
3. Rajesh K. Gupta and Yervant Zorian. Introducing Core-Based System Design. IEEE Design & Test of
Computers, 14(4), pp.15–25, December 1997.
4. Yervant Zorian, Erik Jan Marinissen, and Sujit Dey. Testing Embedded Core Based System Chips. In
Proceedings IEEE International Test Conference (ITC), pages 130–143, Washington, DC, IEEE Computer
Society Press, Oct. 1998.
5. Yervant Zorian, Erik Jan Marinissen, and Sujit Dey. Testing Embedded-Core-Based System Chips, IEEE
Computer, 32(6), pp. 52–60, June 1999.
6. IEEE Computer Society. IEEE Standard Test Access Port and Boundary-Scan Architecture IEEE Std.
1149.1-1990. IEEE, New York, June 1993.
7. Erik Jan Marinissen and Yervant Zorian. Challenges in Testing Core-Based System ICs. IEEE
Communications Magazine, 37(6), pp. 104–109, June 1999.
8. IEEE P1500 Web Site. http://grouper.ieee.org/groups/1500.
9. Erik Jan Marinissen, Yervant Zorian, Rohit Kapur, Tony Taylor, and Lee Whetsel. Towards a Standard
for Embedded Core Test: An Example. In Proceedings IEEE International Test Conference (ITC), pages
616–627, Atlantic City, NJ, September 1999. IEEE Computer Society Press.
10. Lee Whetsel. Addressable Test Ports: An Approach to Testing Embedded Cores. Proceedings IEEE
International Test Conference (ITC), pages 1055–1064, Atlantic City, NJ, September 1999. IEEE
Computer Society Press.
11. Erik Jan Marinissen, Sandeep Kumar Goel, and Maurice Lousberg. Wrapper Design for Embedded
Core Test, Proceedings IEEE International Test Conference (ITC), Atlantic City, NJ, October 2000. IEEE
Computer Society Press.
12. Yervant Zorian. A Distributed BIST Control Scheme for Complex VLSI Devices, Proceedings IEEE
VLSI Test Symposium (VTS), pages 6–11, Princeton, NJ, April 1993. IEEE Computer Society Press.
13. V.D. Agrawal et al. Built-in self-test for digital integrated circuits, AT&T Technical Journal, pp. 30,
Mar. 1994.
14. T.G. Foote, D. E. Hoffman, W. V. Huott, T. J. Koprowski, B. J. Robbins, and M. P. Kusko, Testing the
400 MHz IBM Generation-4 CMOS Chip, Proceedings of the International Test Conference 1997,
Washington DC, pp. 106 – 114, Nov. 1997.
15. L. Chen, S. Dey, P. Sanchez, K. Sekar, and Y. Chen. Embedded Hardware and Software Self-Testing
Methodologies for Processor Cores, Proceedings of the Design Automation Conference, Los Angeles, CA,
pp. 625 – 630, June 2000.
16. “PicoJava Microprocessor Core,” Sun Microsystems, http://www.sun.com/microelectronics/picoJava/.
17. R. Dorsch and H.-J. Wunderlich, "Accumulator based deterministic BIST," Proceedings of the
International Test Conference, Washington DC, pp. 412 - 421, Oct. 1998.
18. N. Touba and E. J. McCluskey, Synthesis of Mapping Transformed Pseudo-random Patterns for BIST,
Proceedings of International Test Conference, pp. 674-682, 1995.
19. J. Shen and J. A. Abraham, “Native mode functional test generation for processors with applications to
self test and design validation,” Proceedings of the International Test Conference 1998, Washington, DC,
Oct. 1998, pp. 990-999.
20. K. Batcher and C. Papachristou. Instruction randomization self test for processor cores, Proceedings of
the 17th IEEE VLSI Test Symposium, Dana Point, California, pp. 34 – 40, April 1999.
21. J. Rajski and J. Tyszer. Arithmetic Built-in Self-Test for Embedded Systems, Prentice Hall, 1998.
22. K. Radecka, J. Rajski, and J. Tyszer. Arithmetic built-in self-test for DSP cores, IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol.16, no.11, pp. 1358 – 69, Nov. 1997.
23. S. Hellebrand and H.-J. Wunderlich. Mixed-mode BIST using embedded processors, Proceedings of the
International Test Conference 1996, Washington DC, pp. 195 – 204, Oct. 1996.
24. L. Chen and S. Dey. DEFUSE: A Deterministic Functional Self-Test Methodology for Processors,
Proceedings of the 18th IEEE VLSI Test Symposium, pp. 255 – 262, April 2000.
25. W.-C. Lai, A. Krstic, and K.-T. Cheng. On Testing the Path Delay Faults of a Microprocessor Using its
Instruction Set. Proceedings of IEEE VLSI Test Symposium, pp. 15-20, May 2000.
26. M. Gumm. VLSI Design Course: VHDL-Modelling and Synthesis of the DLXS RISC Processor.
University of Stuttgart, Germany, December 1995.
27. W. C. Lai, A. Krstic, and K. T. Cheng. Test Program Synthesis for Path Delay Faults in Microprocessor
Cores. Proceedings of International Test Conference, Oct., 2000.
28. K. L. Shepard. Design Methodologies for Noise in Digital Integrated Circuits, Proceedings Design
Automation Conference, pp. 94-99, 1998.
29. D. S. Gao, A. T. Yang and S. M. Kang. Modeling and simulation of interconnection delays and
crosstalk in high-speed integrated circuits, IEEE Trans. on Circuits and Systems, Vol. 37, pp. 1-9, January
1990.
30. H. You and M. Soma. Crosstalk and transient analysis of high-speed interconnects and packages, IEEE
Trans. on Solid State Circuits, Vol. 26, pp. 319-30, March 1991.
31. C. Gordon and K. M. Roselle. Estimating crosstalk in multiconductor transmission lines, IEEE Trans.
On Components Packaging and Manufacturing Technology, Vol.19, May 1996.
32. K. Rahmat, J. Neves, and J. Lee. Methods for calculating coupling noise in early design: a comparative
analysis. Proceeding International Conference on Computer Design VLSI in Computers and Processors,
pp.76-81, 1998.
33. N. Itazaki, Y. Matsumoto, and K. Kinoshita. An Algorithmic Test Generation Method for crosstalk
Faults in Synchronous Sequential Circuits, Proceedings Sixth Asian Test Symposium, pp. 22-7, Nov. 1997.
34. K. T. Lee, C. Nordquist, and J. Abraham. Automatic Test Pattern generation for Crosstalk Glitches in
Digital Circuits, Proceedings IEEE VLSI Test Symposium, pp. 34-39, 1998.
35. W. Chen, S. K. Gupta, and M. A. Breuer. Test Generation in VLSI Circuits for Crosstalk Noise,
Proceedings IEEE International Test Conference, pp. 641-650, 1998.
36. W. Chen, S. K. Gupta, and M. A. Breuer. Test Generation for Crosstalk-Induced Delay in Integrated
Circuits, Proceedings IEEE International Test Conference, pp. 191-200, October 1999.
37. A. Sinha, S.K. Gupta, and M.A. Breuer. Validation and Test Generation for Oscillatory Noise in VLSI
Interconnects, Proceedings of the International Conference on Computer-Aided Design, November 1999.
38. P. Nordholz, D. Treytnar, J. Otterstedt, H. Grabinski, D. Niggemeyer, and T.W. Williams, Signal
Integrity Problems in Deep Submicron arising from Interconnects between Cores, Proceedings IEEE VLSI
Test Symposium, pages 28-33, 1998.
39. M. Cuviello, S. Dey, X. Bai, and Y. Zhao. Fault Modeling and Simulation for Crosstalk in System-on-Chip Interconnects, Proceedings of the International Conference on Computer-Aided Design, pp. 297-303,
November 1999.
40. X. Bai, S. Dey, and J. Rajski. Self-Test Methodology for At-Speed Test of Crosstalk in Chip
Interconnects, Proceedings of the Design Automation Conference, June 2000.
41. HSPICE, v98.2, Avant! Corporation, Fremont, California.
ICCAD2000, Pages 400-405
UST/DME: A Clock Tree Router For General Skew Constraints
Chung-Wen Albert Tsao
Ultima Interconnect Technology, Sunnyvale, CA 94089, USA
Cheng-Kok Koh
ECE, Purdue University, West Lafayette, IN 47907-1285, USA
ABSTRACT
In this paper, we propose new approaches for solving the useful-skew tree (UST) routing
problem [17]: Clock routing subject to general skew constraints. The clock layout
synthesis engine of our UST algorithms is based on the deferred-merge embedding
(DME) paradigm for zero-skew tree [5; 1] and bounded-skew tree [8; 2] routings; hence,
the names UST/DME and Greedy-UST/DME for our algorithms. They simultaneously
perform skew scheduling and tree routing such that each local skew range is
incrementally refined to a skew value that minimizes the wirelength during the bottom-up
merging phase of DME. The resulting skew schedule is not only feasible, but is also best
for routing in terms of wirelength. The experimental results show very encouraging
improvement over the previous BST/DME algorithm on three ISCAS89 benchmarks
under general skew constraints in terms of total wirelength.
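As a hedged illustration of what "general skew constraints" can look like, the sketch below treats each constraint as a range lo <= t[i] - t[j] <= hi on the clock arrival times of two sinks and checks feasibility with Bellman-Ford relaxation over the equivalent difference constraints (compare the textbook treatment cited as [3]). This is not the UST/DME algorithm itself, which interleaves scheduling with routing; the function name and the constraint encoding are assumptions.

```python
def feasible_skew_schedule(n, constraints):
    """Find clock arrival times satisfying pairwise skew ranges.

    constraints: list of (i, j, lo, hi) meaning lo <= t[i] - t[j] <= hi.
    Returns a list t of length n, or None if the ranges are inconsistent.
    """
    edges = []
    for i, j, lo, hi in constraints:
        edges.append((j, i, hi))   # t[i] <= t[j] + hi
        edges.append((i, j, -lo))  # t[j] <= t[i] - lo
    # Starting every node at 0 is equivalent to a virtual source with
    # zero-weight edges to all sinks.
    t = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if t[u] + w < t[v]:
                t[v] = t[u] + w
                changed = True
        if not changed:
            return t
    # Still relaxing after n passes: a negative cycle exists, so the
    # skew ranges cannot all be met.
    return None


# Sink 0 must be clocked between 1 and 2 time units later than sink 1.
print(feasible_skew_schedule(2, [(0, 1, 1.0, 2.0)]))
# Contradictory ranges: each sink is required to be later than the other.
print(feasible_skew_schedule(2, [(0, 1, 1.0, 2.0), (1, 0, 1.0, 2.0)]))
```

Any schedule returned this way is merely feasible; choosing, within each range, the skew value that minimizes wirelength is the part the abstract attributes to the bottom-up merging phase.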
REFERENCES
[1] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K. D. Boese, and A. B. Kahng. Zero skew clock routing with
minimum wirelength. IEEE Trans. on Circuits and Systems, 39(11):799–814, November 1992.
[2] J. Cong, A. B. Kahng, C.-K. Koh, and C.-W. A. Tsao. Bounded-skew clock and Steiner routing. ACM
Trans. on Design Automation of Electronics Systems, 3(3):341–388, 1998.
[3] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms, chapter 25.5, pages 539–
543. 1990.
[4] R. B. Deokar and S. S. Sapatnekar. A graph-theoretic approach to clock skew optimization. In Proc.
IEEE Int. Symp. on Circuits and Systems, pages 407–410, 1994.
[5] M. Edahiro. Minimum path-length equi-distant routing. In Proc. IEEE Asia-Pacific Conf. on Circuits
and Systems, pages 41–46, December 1992.
[6] M. Edahiro. An efficient zero-skew routing algorithm. In Proc. Design Automation Conf, pages 375–
380, June 1994.
[7] J. P. Fishburn. Clock skew optimization. IEEE Trans. on Computers, 39(7):945–951, July 1990.
[8] A. B. Kahng and C.-W. A. Tsao. Practical bounded-skew clock routing. Journal of VLSI Signal
Processing (Special issue on High Performance Clock Distribution Networks), 16(2-3):199–215, June-July
1997.
[9] J. L. Neves and E. G. Friedman. Optimal clock skew scheduling tolerant to process variations. In Proc.
Design Automation Conf, pages 623–628, 1996.
[10] K. A. Sakallah, T. N. Mudge, and O. A. Olukotun. checkTc and minTc: Timing verification and
optimal clocking of synchronous digital circuits. In Proc. Int. Conf. on Computer Aided Design, pages 552–
555, 1990.
[11] N. Shenoy, R. K. Brayton, and A. L. Sangiovanni-Vincentelli. Graph algorithms for clock schedule
optimization. In Proc. Int. Conf. on Computer Aided Design, pages 132–136, 1992.
[12] T.G. Szymanski. Computing optimal clock schedules. In Proc. Design Automation Conf., pages 399–
404, 1992.
[13] R.-S. Tsay. An exact zero-skew clock routing algorithm. IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, CAD-12(2):242–249, February 1993.
[14] A. Vittal, H. Ha, F. Brewer, and M. Marek-Sadowska. Clock skew optimization for ground bounce
control. In Proc. Int’l Conf. on Computer Aided Design, pages 395–399, Nov. 1996.
[15] P. Vuillod, L. Benini, A. Bogliolo, and G. De Micheli. Clock-skew optimization for peak current
reduction. In Proc. Int’l Symp. on Low Power Electronics and Design, pages 265–270, Aug. 1996.
[16] J. G. Xi and W. W.-M. Dai. Jitter-tolerant clock routing in two-phase synchronous systems. In Proc.
Int. Conf. on Computer Aided Design, pages 316–320, 1996.
[17] J. G. Xi and W. W.-M. Dai. Useful-skew clock routing with gate sizing for low power design. Journal
of VLSI Signal Processing Systems, 16(2/3):163–170, 1997.
ICCAD2000, Pages 406-411
A Twisted-Bundle Layout Structure for Minimizing Inductive Coupling Noise
Guoan Zhong, Cheng-Kok Koh, and Kaushik Roy
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-1285
ABSTRACT
In this paper, we propose a novel twisted-bundle layout structure for minimizing
inductive coupling noise. In this structure, we create several routing regions and re-order
the routing of nets in each of these routing regions. The purpose is to create
complementary and opposite current loops in the twisted-bundle layout structure, such
that the magnetic fluxes arising from any signal net within a twisted group cancel each
other in the current loop of a net of interest. The effectiveness of the twisted-bundle
structure in minimizing coupling inductance has been verified by the application of
FastHenry extraction on a 16-bit bus structure. We achieve about two orders of
magnitude reduction in inductive coupling. SPICE simulations also show that the 16-bit
twisted-bundle bus structure is able to maintain high signal integrity at high frequency of
operation.
REFERENCES
[1] Semiconductor Industry Association. International Technology Roadmap for Semiconductors. 1999.
[2] D. Bailey and B. Benschneider. Clocking design and analysis for a 600-MHz Alpha microprocessor.
IEEE Journal on Solid-State Circuits, 33(11):1627–1633, November 1998.
[3] M. Beattie and L. Pileggi. IC analyses including extracted inductance models. In Proc. Design
Automation Conf, pages 915–920, 1999.
[4] D. Blaauw, K. Gala, V. Zolotov, R. Panda, and J. Wang. On-chip inductance modeling. In Great Lakes
Symposium on VLSI, pages 75–80, 2000.
[5] C.K. Cheng, J. Lillis, S. Lin, and N. Chang. Interconnect Analysis and Synthesis. John Wiley, 2000.
[6] T. Gao and C. Liu. Minimum crosstalk channel routing. In Proc. Design Automation Conf, pages 692–
696, 1993.
[7] L. He, N. Chang, S. Lin, and O. S. Nakagawa. An efficient inductance modeling for on-chip
interconnects. In Proc. IEEE Custom Integrated Circuits Conference, pages 457–460, 1999.
[8] L. He and K. M. Lepak. Simultaneous shield insertion and net ordering for capacitive and inductive
coupling minimization. In Proc. Int. Symp. on Physical Design, pages 55–60, 2000.
[9] M. Kamon, M. J. Tsuk, and J. K. White. FASTHENRY: A multipole-accelerated 3-D inductance
extraction program. IEEE Journal on Microwave Theory and Techniques, 42(9):1750–1758, September
1994.
[10] B. Krauter and S. Mehrotra. Layout based frequency dependent inductance and resistance extraction
for on-chip interconnect timing analysis. In Proc. Design Automation Conf, pages 303–308, 1998.
[11] Y. Massoud, S. Majors, T. Bustami, and J. White. Layout techniques for minimizing on-chip
interconnect self inductance. In Proc. Design Automation Conf, pages 566–571, 1998.
[12] Allen Nussbaum. Electromagnetic theory for engineers and scientists. Englewood Cliffs, N.J., 1965.
[13] S. Ramo, J. Whinnery, and T. Van Duzer. Fields and waves in communication electronics. New York: John
Wiley & Sons Inc, 1984.
[14] P. Restle, A. Ruehli, and S. Walker. Dealing with inductance in high-speed chip design. In Proc.
Design Automation Conf, pages 904–909, 1999.
[15] A. E. Ruehli. Inductance calculation in a complex integrated circuit environment. IBM Journal of
Research and Development, pages 470–481, September 1972.
[16] K. L. Shepard and V. Narayanan. Noise in deep submicron digital design. In Proc. Int. Conf. on
Computer Aided Design, pages 524–531, 1996.
[17] T. Xue and E. S. Kuh. Post global routing crosstalk synthesis. IEEE Trans. on Computer-Aided Design
of Integrated Circuits and Systems, 16:1418–1430, December 1997.
ICCAD2000, Pages 412-418
Cross-talk Immune VLSI Design using a Network of PLAs Embedded in a Regular
Layout Fabric
Sunil P. Khatri
(spkhatri@colorado.edu)
Robert K. Brayton
(brayton@ic.eecs.berkeley.edu)
Alberto Sangiovanni-Vincentelli
(alberto@ic.eecs.berkeley.edu)
Abstract
We present a VLSI design methodology to address the cross-talk problem, which is
becoming increasingly important in Deep Sub-Micron (DSM) IC design. In our
approach, we implement the logic netlist in the form of a network of medium sized PLAs.
We utilize two regular layout "fabrics" in our methodology, one for areas where PLA
logic is implemented, and another for routing regions between such logic blocks. We
show that a single PLA implemented in the first fabric style is not only cross-talk
immune, but also about 2x smaller and faster than a traditional standard cell based
implementation of the same logic. The second fabric, utilized in the routing region
between individual PLAs, is also highly cross-talk immune. Additionally, in this fabric,
power and ground signals are essentially "pre-routed" all over the die.
Our synthesis flow involves decomposing the design into a network of PLAs, each of
which has a bounded width and height. The number of inputs and outputs of each PLA
are flexible as long as the resulting PLA width is bounded. We perform folding of PLAs
to achieve better logic density. Routing is performed using 2, 3, 4, 5 and 6 routing layers.
State-of-the-art commercial routing tools are utilized for the experiments involving the
use of 3, 4, 5 and 6 routing layers.
We have implemented the entire design flow using these ideas. Our scheme results in a
reduction in the cross-talk between signal wires of between one and two orders of
magnitude. As a result, for a 0.1 µm process, the delay variation due to cross-talk
dramatically drops from 2.47:1 to 1.02:1. Additionally, our methodology results in
circuits that are extremely fast and dense, with a timing improvement of about 15% and
an overall area penalty of about 3% compared to standard cells. The regular arrangement
of metal conductors in our scheme results in low and highly predictable inductive and
capacitive parasitics, resulting in highly predictable designs. The crosstalk immunity,
high speed, low area overhead and high predictability of our methodology indicate that it
is a strong candidate as the preferred design methodology in the DSM era.
References
[1] J. Grodstein, “Member, DEC Alpha microprocessor design team.” Personal communication, 1998.
[2] S. Khatri, A. Mehrotra, R. Brayton, A. Sangiovanni-Vincentelli, and R. Otten, “A novel VLSI layout
fabric for deep sub-micron applications,” in Proceedings of the Design Automation Conference, (New
Orleans), June 1999.
[3] R. Otten and R. Brayton, “Planning for performance,” in Proceedings of the Design Automation
Conference, pp. 122–127, Jun 1998.
[4] S. Posluszny, N. Aoki, D. Boerstler, J. Burns, S. Dhong, U. Ghoshal, P. Hofstee, D. LaPotin, K. Lee, D.
Meltzer, H. Ngo, K. Nowka, J. Silberman, O. Takahashi, and I. Vo, “Design methodology for a 1.0 GHz
microprocessor,” in Proceedings of the International Conference on Computer Design (ICCD), pp. 17–23,
Oct 1998.
[5] “Physical Design Modelling and Verification Project (SPACE Project).”
http://cas.et.tudelft.nl/research/space/html.
[6] L. Nagel, “Spice: A computer program to simulate computer circuits,” in University of California,
Berkeley UCB/ERL Memo M520, May 1995.
[7] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan,
R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “SIS: A System for Sequential Circuit Synthesis,” Tech.
Rep. UCB/ERL M92/41, Electronics Research Laboratory, Univ. of California, Berkeley, CA 94720, May
1992.
[8] A. Casotto, ed., Octtools-5.1 Manuals, (Electronics Research Laboratory, College of Engineering,
University of California, Berkeley, CA 94720), University of California at Berkeley, Sept. 1991.
[9] C. Sechen and A. Sangiovanni-Vincentelli, “The TimberWolf Placement and Routing Package,” IEEE
Journal of Solid-State Circuits, 1985.
[10] J. Reed, M. Santomauro, and A. Sangiovanni-Vincentelli, “A new gridless channel router: Yet another
channel router the second (YACR-II),” in Digest of Technical Papers International Conference on
Computer-Aided Design, 1984.
[11] G. T. Hamachi, R. N. Mayo, and J. K. Ousterhout, “Magic: A VLSI Layout system,” in 21st Design
Automation Conference Proceedings, 1984.
[12] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Prentice Hall Electronics and VLSI
Series, Prentice Hall, 1996.
[13] R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. L. Sangiovanni-Vincentelli, Logic
Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.
[14] V. Betz and J. Rose, “VPR: A new packing, placement and routing tool for FPGA research,” in
Proceedings of the International Workshop on Field Programmable Logic and Applications, 1997.
[15] Cadence Design Systems, Inc., 555 River Oaks Parkway, San Jose, CA 95134, USA, Envisia Silicon
Ensemble Place-and-route Reference, Nov 1999.
[16] “The National Technology Roadmap for Semiconductors.” http://notes.sematech.org/97melec.htm,
1997.
[17] P. D. Fisher, “Clock Cycle Estimation for Future Microprocessor Generations,” tech. rep.,
SEMATECH, 1997.
[18] P. McGeer, A. Saldanha, R. Brayton, and A. Sangiovanni-Vincentelli, Logic Synthesis and
Optimization, ch. Delay Models and Exact Timing Analysis, pp. 167–189. Kluwer Academic Publishers,
1993.
[19] M. S. Schmookler, “Design of large ALUs using multiple PLA macros,” IBM Journal of Research and
Development, vol. 24, pp. 2–14, Jan 1980.
[20] S. Sinha, S. Khatri, R. Brayton, and A. Sangiovanni-Vincentelli, “Binary and multivalued SPFD-based
wire removal in PLA networks,” in Proceedings of the ICCD, Sep 2000. To Appear.
[21] S. Yamashita, H. Sawada, and A. Nagoya, “A new method to express functional permissibilities for
LUT based FPGAs and its applications,” in Proceedings of the International Conference on Computer-Aided Design, pp. 254–61, Nov 1996.
[22] R. Brayton, “Understanding SPFDs: A new method for specifying flexibility,” in Workshop Notes,
International Workshop on Logic Synthesis, (Tahoe City, CA), May 1997.
[23] S. Patil and T. Welch, “A programmable logic approach for VLSI,” IEEE Transactions on Computers,
vol. c-28, pp. 594–601, Sep 1979.
[24] K. Smith, T. Carter, and C. Hunt, “Structured logic design of integrated circuits using the storage/logic
array (SLA),” IEEE Transactions on Electron Devices, vol. ED-29, pp. 765–76, Apr 1982.
ICCAD2000, Pages 420-423
Latency-Guided On-Chip Bus Network Design
Milenko Drinic, Darko Kirovski, Seapahn Meguerdichian, Miodrag Potkonjak
Computer Science Department, University of California, Los Angeles, CA 90095-1596
Abstract
Deep submicron technology scaling has two major ramifications on the design process.
First, reduced feature size significantly increases wire delay, thus resulting in critical
paths being dominated by global interconnect rather than gate delays. Second, ultra high
level of integration mandates design of systems-on-chip that encompass numerous intra-synchronous blocks with decreased functional granularity and increased communication
demands. To address these issues we have developed an on-chip bus network design
methodology and corresponding set of tools which, for the first time, close the synthesis
loop between system and physical design. The approach has three components: a
communication profiler, a bus network designer, and a fast approximate floorplanner.
The communication profiler collects run-time information about the traffic between
system cores. The bus network design component optimizes the bus network structure by
coordinating information from the other two components. The floorplanner aims to
create a feasible floorplan and to communicate information about the most constrained
parts of the network.
References
[1] C.J. Alpert, et al. Buffer insertion for noise and delay optimization. TCAD, vol.18, (no.11),
pp.1633-45, 1998.
[2] G.P. Bischoff, et al. Formal implementation verification of the bus interface unit for the Alpha
21264 microprocessor. ICCD, pp.16-24, 1997.
[3] L. Carloni, et al. A methodology for Correct-by-Construction Latency Insensitive Design. ICCAD,
pp.309-315, 1999.
[4] J. Cong and L. He. Optimal Wiresizing for Interconnects with Multiple Sources. TODAES, vol.1,
(no.4), pp.478-511, 1996.
[5] B. Davari. CMOS technology: Present and future. Symposium on VLSI Circuits, pp.5-10, 1999.
[6] J. Duato. A necessary and sufficient condition for deadlock-free routing in cut-through and store-and-forward networks. Transactions on Parallel and Distributed Systems, vol.7, (no.8), pp.841-54,
1996.
[7] M. R. Garey and D. S. Johnson. Computers and intractability: a guide to the theory of NP-completeness. W.H. Freeman, 1979.
[8] M.B. Hadim and I. Sakho. A new methodology for deriving deadlock-free routing strategies in
processor networks. International Conference on Parallel and Distributed Computing Systems, pp.385-90, 1997.
[9] A. B. Kahng, and S. Muddu. An analytical model for RLC interconnections. TCAD, vol.16,
(no.12), pp.1507-1514, 1997.
[10] D. Kirovski, and M. Potkonjak. Engineering change: methodology and applications to behavioral
and system synthesis. DAC, pp.604-9, 1999.
[11] C. Lee, et al. MediaBench: a tool for evaluating and synthesizing multimedia and
communications systems. International Symposium on Microarchitecture, pp.330-5, 1997.
[12] E. Macii and M. Poncino. Automatic synthesis of easily scalable bus arbiters with dynamic
priority assignment strategies. Computers & Electrical Engineering, vol.24, (no.3-4), pp.223-8, 1998.
[13] http://www.mentor.com/inventra/cores/catalog/index.html
[14] R.G. Pomerleau, et al. Improved delay prediction for on-chip buses. DAC, pp.497-501, 1999.
[15] A.M. Rincon, et al. The changing landscape of system-on-a-chip design. Custom Integrated
Circuits Conference, pp.83-90, 668, 1999.
[16] D. Sylvester, and K. Keutzer. Rethinking deep-submicron circuit design. Computer, vol.32,
(no.11), pp.25-33, 1999.
[17] http://www.vsi.org.
ICCAD2000, Pages 424-430
Efficient Exploration of the SoC Communication Architecture Design Space
Kanishka Lahiri
Department of ECE, UC San Diego
Anand Raghunathan
NEC USA C&C Research Labs, Princeton, NJ
Sujit Dey
Department of ECE, UC San Diego
Abstract
In this paper, we present a methodology and efficient algorithms for the design of high-performance system-on-chip communication architectures. Our methodology
automatically and optimally maps the various communications between system
components onto a target communication architecture template that can consist of an
arbitrary interconnection of shared or dedicated channels. In addition, our techniques
simultaneously configure the communication protocols of each channel in the
architecture in order to optimize system performance.
We motivate the need for systematic exploration of the communication architecture
design space, and highlight the issues involved through illustrative examples. We present
a methodology and algorithms that address these issues, including the size and
complexity of the design space. We present experimental results on example systems,
including a cell forwarding unit of an ATM switch, that demonstrate the benefits of using
the proposed techniques. Experimental results indicate that our techniques are successful
in achieving significant improvements in system performance over conventional
communication architectures (observed speedups over typical architectures such as single
shared buses averaged 53%). Moreover, we demonstrate that our design space
exploration methodology and optimization algorithms are efficient (low CPU times),
underlining their usefulness as part of any system design flow.
References
[1] D. D. Gajski, F. Vahid, S. Narayan and J. Gong, Specification and Design of Embedded Systems.
Prentice Hall, 1994.
[2] R. Ernst, J. Henkel, and T. Benner, “Hardware-software cosynthesis for microcontrollers,” IEEE Design
& Test Magazine, pp. 64–75, Dec. 1993.
[3] T. B. Ismail, M. Abid, and M. Jerraya, “COSMOS: A codesign approach for a communicating system,”
in Proc. IEEE International Workshop on Hardware/Software Codesign, pp. 17–24, 1994.
[4] A. Kalavade and E. Lee, “A globally critical/locally phase driven algorithm for the constrained
hardware/software partitioning problem,” in Proc. IEEE International Workshop on Hardware/Software
Codesign, pp. 42–48, 1994.
[5] P. H. Chou, R. B. Ortega, and G. B. Borriello, “The CHINOOK hardware/software cosynthesis
system,” in Proc. Int. Symp. System Level Synthesis, pp. 22–27, 1995.
[6] B. Lin, “A system design methodology for software/hardware codevelopment of telecommunication
network applications,” in Proc. Design Automation Conf., pp. 672–677, 1996.
[7] B. P. Dave, G. Lakshminarayana, and N. K. Jha, “COSYN: hardware-software cosynthesis of embedded
systems,” in Proc. Design Automation Conf., pp. 703–708, 1997.
[8] P. Knudsen and J. Madsen, “Integrating communication protocol selection with partitioning in
hardware/software codesign,” in Proc. Int. Symp. System Level Synthesis, pp. 111–116, Dec. 1998.
[9] T. Yen and W. Wolf, “Communication synthesis for distributed embedded systems,” in Proc. Int. Conf.
Computer-Aided Design, pp. 288–294, Nov. 1995.
[10] J. Daveau, T. B. Ismail, and A. A. Jerraya, “Synthesis of system-level communication by an allocation
based approach,” in Proc. Int. Symp. System Level Synthesis, pp. 150–155, Sept. 1995.
[11] M. Gasteier and M. Glesner, “Bus-based communication synthesis on system level,” in ACM Trans.
Design Automation Electronic Systems, pp. 1–11, Jan. 1999.
[12] R. B. Ortega and G. Borriello, “Communication synthesis for distributed embedded systems,” in Proc.
Int. Conf. Computer-Aided Design, pp. 437–444, Nov. 1998.
[13] “Sonics Integration Architecture, Sonics Inc.” http://www.sonicsinc.com.
[14] “IBM On-chip CoreConnect Bus Architecture.”
http://www.chips.ibm.com/products/coreconnect/index.html.
[15] J. A. Rowson and A. Sangiovanni-Vincentelli, “Interface Based Design,” in Proc. Design Automation
Conf., pp. 178–183, June 1997.
[16] K. Hines and G. Borriello, “Optimizing Communication in embedded system cosimulation,” in Proc.
International Workshop on Hardware/Software Codesign (CODES/CASHE), pp. 121–125, Mar. 1997.
[17] S. Dey and S. Bommu, “Performance analysis of a system of communication processes,” in Proc. Int.
Conf. Computer-Aided Design, pp. 590–597, Nov. 1997.
[18] K. Lahiri, A. Raghunathan, and S. Dey, “Fast performance analysis of bus-based system-on-chip
communication architectures,” in Proc. Int. Conf. Computer-Aided Design, pp. 566–572, Nov. 1999.
[19] K. Lahiri, A. Raghunathan, and S. Dey, “Performance analysis of systems with multi-channel
communication architectures,” in Proc. Int. Conf. VLSI Design, pp. 530–537, Jan. 2000.
[20] “Peripheral Interconnect Bus Architecture.” http://www.omimo.be.
[21] B. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs,” Bell Systems
Technical Journal, vol. 49, pp. 291–307, 1970.
[22] F. Balarin, M. Chiodo, H. Hsieh, A. Jurecska, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, E.
Sentovich, K. Suzuki and B. Tabbara, Hardware-software Co-Design of Embedded Systems: The POLIS
Approach. Kluwer Academic Publishers, Norwell, MA, 1997.
[23] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt, “Ptolemy: A framework for simulating and
prototyping heterogeneous systems,” International Journal on Computer Simulation, Special Issue on
Simulation Software Management, vol. 4, pp. 155–182, Apr. 1994.
ICCAD2000, Pages 431-437
MIST: An Algorithm for Memory Miss Traffic Management
Peter Grun, Nikil Dutt, Alex Nicolau
Center for Embedded Computer Systems, University of California, Irvine, CA 92697-3425, USA
Abstract
Cache misses represent a major bottleneck in embedded systems performance.
Traditionally, compilers optimistically treated all memory accesses as cache hits, relying
on the memory controller to account for longer miss delays. However, the memory
controller has only a local view of the program, and is not able to efficiently hide the
latency of these memory operations. Our compiler technique actively manages cache
misses, and performs global miss traffic optimizations, to better hide the latency of the
memory operations. Our memory-aware compiler scheduled several benchmarks on the
TI C6211 processor architecture with a direct mapped cache, and generated an average of
61.6% improvement over the best schedule of the traditional (memory-transparent)
optimizing compiler, demonstrating the utility of our miss traffic optimization approach.
References
[1] M. Alt, C. Ferdinand, F. Martin, and R. Wilhelm. Cache behavior prediction by abstract interpretation.
In SAS, 1996.
[2] D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In ASPLOS, 1991.
[3] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle. Custom
Memory Management Methodology. Kluwer, 1998.
[4] P. Chou, R. Ortega, and G. Borriello. Interface co-synthesis techniques for embedded systems. In
ICCAD, 1995.
[5] K.-S. Chung, R. Gupta, and C. L. Liu. Interface co-synthesis techniques for embedded systems. In
ICCAD, 1996.
[6] S. Ghosh, M. Martonosi, and S. Malik. Cache miss equations: A compiler framework for analyzing and
tuning memory behavior. TOPLAS, 21(4), 1999.
[7] E. Gornish, E. Granston, and A. Veidenbaum. Compiler-directed data prefetching in multiprocessors
with memory hierarchies. In ICS, 1990.
[8] P. Grun, N. Dutt, and A. Nicolau. Improved performance through compile-time miss traffic
optimization. Technical report, University of California, Irvine, 2000.
[9] P. Grun, N. Dutt, and A. Nicolau. Memory aware compilation through accurate timing extraction. In
DAC, 2000.
[10] P. Grun, A. Halambi, N. Dutt, and A. Nicolau. RTGEN: An algorithm for automatic generation of
reservation tables from architectural descriptions. In ISSS, 1999.
[11] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau. EXPRESSION: A language for
architecture exploration through compiler/simulator retargetability. In Proc. DATE, March 1999.
[12] F. Harmsze, A. Timmer, and J. van Meerbergen. Memory arbitration and cache management in
stream-based systems. In DATE, 2000.
[13] T. Johnson, D. Connors, M. Merten, and W. Hwu. Run-time cache bypassing. Transactions on
Computers, 48(12), 1999.
[14] N. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative
cache and prefetch buffers. In ISCA, 1990.
[15] A. Khare, N. Savoiu, A. Halambi, P. Grun, N. Dutt, and A. Nicolau. V-SAT: A visual specification
and analysis tool for system-on-chip exploration. In Proc. EUROMICRO, October 1999.
[16] T. Ly, D. Knapp, R. Miller, and D. MacMillen. Scheduling using behavioral templates. In DAC, San
Francisco, 1995.
[17] T. Mowry, M. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In
ASPLOS, 1992.
[18] A. Nicolau and S. Novack. Trailblazing: A hierarchical approach to percolation scheduling. In ICPP,
St. Charles, IL, 1993.
[19] V. Pai and S. Adve. Code transformations to improve memory parallelism. In MICRO, 1999.
[20] S. Palacharla and R. Kessler. Evaluating stream buffers as a secondary cache replacement. In ISCA,
1994.
[21] P. Panda, N. Dutt, and A. Nicolau. Memory Issues in Embedded Systems-on-Chip. Kluwer, 1999.
[22] P. R. Panda, N. D. Dutt, and A. Nicolau. Exploiting off-chip memory access modes in high-level
synthesis. In IEEE Transactions on CAD, February 1998.
[23] S. Przybylski. Sorting out the new DRAMs. In Hot Chips Tutorial, Stanford, CA, 1997.
[24] A. Veidenbaum, W. Tang, R. Gupta, A. Nicolau, and X. Ji. Adapting cache line size to application
behavior. In ICS, 1999.
[25] M. Wolf and M. Lam. A data locality optimizing algorithm. In PLDI, 1991.
ICCAD2000, Pages 439-446
Regularity Driven Logic Synthesis
Thomas Kutzschebauch, Leon Stok
IBM TJ Watson Research Center, Yorktown Heights, NY
Abstract
We present a new and innovative logic synthesis approach using regularity information of
a design to selectively apply transformations and globally guide the synthesis process.
Since traditional logic synthesis applies transformations without consideration of global
design characteristics such as regularity and dataflow, it destroys a substantial amount of
regular structures. In addition, due to the non-incremental nature of most logic
transformations, synthesis relies heavily on the computationally expensive concept of trial
and error application of transformations, a time-consuming process in the synthesis of
large designs.
The proposed approach addresses both shortcomings of traditional logic synthesis and
describes a mechanism to speed up logic synthesis and preserve regularity. It selectively
applies transformations to places with similar characteristics and to the same stage of a
regular structure, introducing a notion of dataflow-aware synthesis.
Preservation of regular structures has tremendous advantages for the subsequent physical
design stages. It yields high-density layouts, shorter wiring length and improved delay. In
addition, the layout becomes more predictable at an earlier design stage.
References
[1] S.R. Arikati and R. Varadarajan. A signature based approach to regularity extraction. In Proceedings of
the International Conference on Computer Aided Design, p. 542-545, November 1997.
[2] T. Chan et al. Challenges of CAD Development for Datapath Design, In Intel Technical Journal,
01/1999
[3] A. Chowdhary et al. A general approach for regularity extraction in datapath circuits. In Proceedings of
the International Conference on Computer Aided Design, p. 332-338, November 1998.
[4] R. F. Damiano, A. Drumm et al. Method for mapping in logic synthesis by logic classification, U.S.
Patent No. 5,537,330
[5] S.-T. Hui and D.M. Wong, Method for regular placement of data path components in VLSI circuits,
U.S. Patent No. 5,359,538
[6] V.N. Kravets and K.A. Sakallah, M32: A Constructive Multilevel Logic Synthesis System, In
Proceedings of the Design Automation Conference, p. 336-341, June 1998
[7] T. Kutzschebauch. Logic optimization using regularity extraction. In Proceedings of the Internat.
Workshop on Logic Synthesis, p. 264-270, June 1999
[8] Marshburn et al. Datapath: A CMOS datapath silicon assembler. In Proceedings of the Design
Automation Conference, p. 722-12, 1986
[9] R.X.T. Nijssen and J.A.G. Jess, Two-dimensional datapath regularity extraction. In Proceedings of the
5th ACM/IEEE Physical Design Workshop, Reston, Virginia, 1996.
[10] R.X.T. Nijssen and C.A.J. van Eijk. Regular layout generation of logically optimized datapaths. In
Proceedings of the International Symposium on Physical Design, p. 42-47, 1997.
[11] D.S. Rao and F.J. Kurdahi. On clustering for maximal regularity extraction. In IEEE Transactions on
Computer Aided Design, p. 1198-1208, Aug. 1993
[12] L. Stok, D. Brand, D. Kung et al. Booledozer: Logic synthesis for ASICs. In IBM Journal of Research
and Development, 40(4), p. 515-547, July 1996
ICCAD2000, Pages 447-450
Timing Driven Gate Duplication: Complexity Issues and Algorithms
Ankur Srivastava, Ryan Kastner, Majid Sarrafzadeh
Department of Electrical and Computer Engineering, Northwestern University,
Evanston, Illinois 60208, USA
Abstract
This paper addresses the issue of timing driven gate duplication for delay optimization.
Gate duplication has been used extensively for cutset minimization but its usefulness in
minimizing the circuit delay has not been addressed. This paper studies the complexity
issues in timing driven gate duplication and proposes an algorithm for solving the
so-called global gate duplication problem. Delay improvements over highly optimized
results from SIS have been reported.
References
[1] A. Srivastava, R. Kastner and M. Sarrafzadeh. "Complexity Issues in Gate Duplication". In
Workshop Handouts, International Workshop on Logic Synthesis, May 2000.
[2] A. Srivastava, R. Kastner and M. Sarrafzadeh. "Gate Duplication for Performance Optimization".
In Internal Memo, Northwestern University, June 2000.
[3] C. Chen and C. Tsui. "Timing Optimization of Logic Network using Gate Duplication". In Proc.
Asia and South Pacific Design Automation Conference, pages 233-236, January 1999.
[4] E.M. Sentovich, K.J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P.R.
Stephan, R.K. Brayton, A.L. Sangiovanni-Vincentelli. SIS: A System for Sequential Circuit
Synthesis. Memorandum No. UCB/ERL M92/41, Department of EECS. UC Berkeley, May 1992.
[5] I. Neumann, D. Stoffel, H. Hartje and W. Kunz. "Cell Replication and Redundancy Elimination
During Placement for Cycle Time Optimization". In Proc. International Conference on Computer
Aided Design, pages 25-30, November 1999.
[6] C. Kring and A. R. Newton. "A Cell Replication Approach to Mincut-Based Circuit Partitioning". In
Proc. International Conference on Computer Aided Design, pages 2-5, November 1991.
[7] M. Enos, S. Hauck and M. Sarrafzadeh. "Evaluation and Optimization of Replication Algorithms
for Logic Bipartitioning". In IEEE Transactions on Computer Aided Design, pages 1237-1248,
September 1999.
[8] R. Murgai. "On the Complexity of Minimum-delay Gate Resizing/Technology Mapping under
Load-dependent Delay Model". In Workshop Handouts, International Workshop on Logic
Synthesis, pages 209-211, June 1999.
[9] R. Murgai. "On the Global Fanout Optimization Problem". In Proc. International Conference on
Computer Aided Design, pages 511-515, November 1999.
ICCAD2000, Pages 451-457
An Exact Gate Assignment Algorithm for Tree Circuits Under Rise and Fall Delays
Arlindo L. Oliveira
Cadence European Labs./IST-INESC
Lisboa, Portugal.
Rajeev Murgai
Fujitsu Laboratories of America, Inc.
Sunnyvale, CA, USA.
Abstract
In most libraries, gate parameters such as the pin-to-pin intrinsic delays, load-dependent
coefficients, and input pin capacitances have different values for rising and falling
signals. The performance optimization algorithms, however, assume a single value for
each parameter.
It is known that under the load-independent delay model, the gate assignment (or
resizing) problem is solvable in time polynomial in the circuit size when a single value is
assumed for each parameter [5]. In the presence of different rise and fall parameter
values, this problem was recently shown to be NP-complete even for chain and tree
topology circuits under the simple load-independent delay model [8]. In this paper, we
propose a dynamic programming algorithm for solving this problem exactly in pseudo-polynomial time for tree circuits. More specifically, we show that the problem can be
solved in time proportional to the size of the tree circuit, the number of choices available
in the library for each gate, and the delay of the circuit. To the best of our knowledge, this
is the first pseudo-polynomial exact algorithm for the gate assignment problem for trees
in the presence of different rise and fall delays. We present a straightforward way of
extending this algorithm to general directed acyclic graphs. We present experimental
results on a set of benchmark problems using a standard commercial library and show
that our algorithm generates provably optimum delays for 72 out of 76 circuits. We also
compare our technique with two approaches traditionally used and find that it performs
slightly better than these two. Interestingly, both traditional approaches also yield delays
not far from the optimum.
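The pseudo-polynomial idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and data layout in it is an assumption made for illustration: delays are non-negative integers, all pins of a gate share one rise and one fall delay, each library choice is a hypothetical tuple (rise_delay, fall_delay, inverting), and `horizon` is an assumed upper bound on the circuit delay. For each node the table entry t[r] holds the earliest fall arrival achievable when the rise arrival is at most r, so the work per node is proportional to the number of choices times the horizon.

```python
from math import inf


def merge_fanins(tables):
    """Pointwise max: latest fall arrival over all fan-ins, given each rises by time r."""
    return [max(column) for column in zip(*tables)]


def solve(node, horizon):
    """Return t where t[r] is the earliest fall arrival at the node output with rise <= r."""
    if node["kind"] == "input":
        rise0, fall0 = node["arrival"]
        return [fall0 if r >= rise0 else inf for r in range(horizon + 1)]

    # Subtrees of a tree circuit are independent, so their tables combine pointwise.
    fanin = merge_fanins([solve(child, horizon) for child in node["fanins"]])

    table = [inf] * (horizon + 1)
    for rise_delay, fall_delay, inverting in node["choices"]:
        for in_rise, in_fall in enumerate(fanin):
            if in_fall == inf:
                continue  # no assignment of the fan-in cone meets this rise bound
            if inverting:
                # A falling input causes a rising output, and vice versa.
                out_rise, out_fall = in_fall + rise_delay, in_rise + fall_delay
            else:
                out_rise, out_fall = in_rise + rise_delay, in_fall + fall_delay
            if out_rise <= horizon:
                table[out_rise] = min(table[out_rise], out_fall)

    # Make the table monotone: a looser rise bound can never hurt the fall time.
    for r in range(1, horizon + 1):
        table[r] = min(table[r], table[r - 1])
    return table


def min_delay(root, horizon):
    """Smallest achievable max(rise, fall) at the root, provided horizon is large enough."""
    return min(max(r, f) for r, f in enumerate(solve(root, horizon)))


# Hypothetical example: two inverters in series, each with two library choices.
leaf = {"kind": "input", "arrival": (0, 0)}
inv_choices = [(1, 3, True), (3, 1, True)]  # (rise_delay, fall_delay, inverting)
inv1 = {"kind": "gate", "fanins": [leaf], "choices": inv_choices}
chain = {"kind": "gate", "fanins": [inv1], "choices": inv_choices}
print(min_delay(chain, horizon=10))
```

In this toy chain, a model that keeps only the worse of the two delays per gate would charge 3 units per stage, while tracking rise and fall separately lets the fast edge of one stage feed the slow edge of the next. If `horizon` is chosen too small, the sketch may miss the optimum, so it should be set to a safe upper bound on the circuit delay.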
References
[1] C. L. Berman, J. L. Carter, and K. F. Day. The Fanout Problem: From Theory to Practice. In C. L. Seitz,
editor, Advanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conference, pages 69-99.
MIT Press, March 1989.
[2] P. Chan. Algorithms for Library-specific Sizing of Combinational Logic. In DAC, pages 353-356,
1990.
[3] O. Coudert, R. Haddad, and S. Manne. New Algorithms for Gate Sizing: A Comparative Study. In
DAC, pages 734-739, 1996.
[4] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Mathematical Sciences Series. Freeman, 1979.
[5] Y. Kukimoto, R. K. Brayton, and P. Sawkar. Delay-Optimal Technology Mapping by DAG Covering.
In DAC, pages 348-351, 1998.
[6] J. Lillis, C. K. Cheng, and T. T. Y. Lin. Optimal Wire Sizing and Buffer Insertion for Low Power and a
Generalized Delay Model. In ICCAD, pages 138-143, 1995.
[7] R. Murgai. On The Complexity of Minimum-delay Gate Resizing/Technology Mapping Under Load-Dependent Delay Model. In IWLS, pages 209-211, 1999.
[8] R. Murgai. Performance Optimization Under Rise and Fall Parameters. In ICCAD, pages 185-190,
1999.
[9] Lukas P. P. P. van Ginneken. Buffer Placement in Distributed RC-tree Networks for Minimum Elmore
Delay. In ISCAS, pages 865-868, 1990.
[10] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli.
Sequential circuit design using synthesis and optimization. In Proceedings of the International Conference
on Computer Design, October 1992.
[11] K. J. Singh. Performance Optimization of Digital Circuits. PhD thesis, UC Berkeley, December 1992.
[12] I. Sutherland and R. Sproull. The Theory of Logical Effort: Designing for Speed on the Back of an
Envelope. In Advanced Research in VLSI, University of California, Santa Cruz, 1991.
[13] H. Touati. Performance-oriented Technology Mapping. PhD thesis, UC Berkeley, November 1990.
UCB/ERL M90/109.
[14] H. Vaishnav and M. Pedram. Routability-Driven Fanout Optimization. In DAC, pages 230-235, 1993.
ICCAD2000, Pages 459-463
Improving the Proportion of At-Speed Tests in Scan BIST
Y. Huang (1), I. Pomeranz (2), S. M. Reddy (1), J. Rajski (3)
(1) Department of Electrical & Computer Engineering, University of Iowa, Iowa City, IA 52242
(2) School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN 47907
(3) Mentor Graphics Corporation, 8005 S.W. Boeckman Rd., Wilsonville, OR 97070
Abstract
A method to select the lengths of functional sequences in a BIST scheme for scan designs
is proposed in this paper. A functional sequence is a sequence of primary input vectors
applied when the circuit operates as a sequential circuit, without using scan. These
sequences can be applied at-speed, i.e., at the normal circuit clock speed. The objectives
set for choosing the lengths of the functional sequences are to increase the number of
vectors applied at-speed, and to reduce the number of settings of functional sequence
lengths, without compromising the fault coverage achieved. The experimental results
presented demonstrate that compared to earlier methods, the proposed method achieves
the above objectives while also achieving higher fault coverages for most of the
benchmark circuits considered.
References
[1] V.D.Agrawal, C.R.Kime and K.K.Saluja, "A Tutorial on Built-In Self-Test, Part 2: Applications," IEEE
Design & Test of Computers, vol. 10, no. 2, pp.69-77, June 1993.
[2] P. C. Maxwell, R. C. Aitken, V. Johansen, and I. Chiang, “The Effect of Different Test Sets on Quality
Level Prediction: When is 80% Better Than 90%?” Proc. ITC., pp.358-364, Sept. 1991.
[3] H.C.Tsai, K.T. Cheng and S.Bhawmik, "Improving The Test Quality for Scan-Based BIST Using A
General Test Application Scheme," Proc. of DAC, pp.748-753, June 1999.
[4] I. Pomeranz and S. M. Reddy, “On Full Reset as a Design-for-Testability Technique,” VLSI Design, pp.
534-536, Jan. 1997; also in “On the Use of Fully Specified Initial States for Testing of Synchronous
Sequential Circuits,” IEEE Trans. On Computers, pp.175-182, Feb. 2000.
[5] F. Brglez, P. Pownall, and P.Hum, "Application of Testability Analysis: From ATPG to Critical Path
Tracing," Proc. IEEE Int. Test Conf., pp.705-712, Sept. 1984.
ICCAD2000, Pages 464-
Fast Test Application Technique Without Fast Scan Clocks
Seonki Kim and Bapiraju Vinnakota
ECE Department, University of Minnesota, Minneapolis, MN 55455
Abstract
Built-in self-test (BIST) schemes need to set the state of the circuit under test (CUT) for
each test vector applied. The two primary techniques by which the state is set are test-per-scan and test-per-clock. In a test-per-scan scheme, circuit states are set using one or more
scan chains. Several scan cycles are required to apply a single test vector. In very large
circuits, the time to apply each test vector may be quite high. The direct option of
reducing test time with a fast scan clock is difficult to realize in practice. In a test-per-clock scheme, all circuit flip-flops are loaded in parallel. A new test vector can be applied
in each cycle. The area overhead incurred in accessing each storage element directly is
quite significant. We propose a new Broadcast BIST (B2IST) scheme as a compromise
between the two approaches. B2IST uses time-division multiplexing (TDM) to load
multiple storage elements in a broadcast group in a single clock cycle, but through only a
single scan data input. Based on our B2IST simulation, we compare the layout overhead
and performance of B2IST with that of traditional BIST schemes on ISCAS benchmark
circuits. Thus, B2IST can achieve the performance of a test-per-clock scheme, but only
incur the overhead of a test-per-scan scheme.
REFERENCES
[1] S. Akers, “On the Use of Linear Sums in Exhaustive Testing,” Digest of Papers 15th Annual
International Fault-Tolerant Computing Symp., pp.148-153, June, 1985.
[2] M. Abramovici, M. Breuer, and A. Friedman, Digital Systems Testing and Testable Design, pp.421-448,
Rev. printing, IEEE Press, New York, 1990.
[3] P. Bardell and W. McAnney, “Self-Testing of Multichip Logic Modules,” Digest of Papers 1982
International Test Conf., pp.200-204, November, 1982.
[4] B. Konemann, J. Mucha, and G. Zwiehoff, “Built-In Logic Block Observation Technique,” Digest of
Papers 1979 Test Conf., pp.37-41, October, 1979.
[5] A. Krasniewski and S. Pilarski, “Circular Self-Test Path: A Low-Cost BIST Technique for VLSI
Circuits,” IEEE Trans. on Computer-Aided Design, vol.8, no.1, pp.46-55, January, 1989.
[6] P.H. Bardell and W.H. McAnney, “Parallel Pseudorandom Sequences for Built-In Test,” IEEE
Proceedings of International Test Conf., pp.302-308, October, 1984.
[7] J. Savir, “Scan Latch Design for Delay Test,” IEEE International Test Conf., pp.446-453, 1997.
[8] C-. Lin, Y. Zorian, and S. Bhawmik, “Integration of Partial Scan and Built-In Self-Test”, Journal of
Electronic Testing: Theory and Applications, vol.7, no.1-2, pp.125-137, Aug. 1995.
[9] H-. Tsai, S. Bhawmik, and K-. Cheng, “An Almost Full-Scan BIST Solution- High Fault Coverage and
Shorter Test Application Time,” IEEE Int’l Test Conf., pp.1065-1073, 1998.
[10] F. Gardner, “Charge-pump phase-locked loops,” IEEE Trans. on Communications, vol. COM-28,
pp.1849-1858, November, 1980.
[11] E. Sentovich et al., “SIS: A System for Sequential Circuit Synthesis,” Electr. Research Lab.
Memorandum No. UCB/ERL M92/41, EECS, University of California, Berkeley, CA, May, 1992.
ICCAD2000, Pages 468-471
Error Catch and Analysis for Semiconductor Memories Using March Tests
Chi-Feng Wu, Chih-Tsun Huang, Chih-Wea Wang,
Kuo-Liang Cheng, and Cheng-Wen Wu
Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30013
Abstract
We present an error catch and analysis (ECA) system for semiconductor memories. The
system consists of a test algorithm generator called TAGS, a fault simulator called
RAMSES, and an error analyzer (ERA). We use TAGS to generate a set of test
algorithms of different lengths and diagnostic resolutions for the memory under test, and
use RAMSES to generate the March dictionary for each test algorithm. With the March
dictionaries, ERA is able to support March algorithms for easy diagnosis of faulty RAMs.
Legacy test algorithms also can be reused. When integrated with a RAM tester, our ECA
system can generate RAM bitmaps that are similar to the RAM layout. The bitmaps
provide detailed information about the error locations and faults causing the errors. Based
on the information, diagnosis of the RAM chips for yield and reliability improvement can
be done more easily.
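To make the notion of a March test and an error bitmap concrete, the following is a minimal sketch. It does not reproduce TAGS, RAMSES, or ERA; it runs the widely known March C- algorithm on a simulated bit-oriented RAM with injected stuck-at faults. The class `FaultyRAM`, the functions `run_march` and `bitmap`, and the element encoding are all hypothetical names chosen for illustration.

```python
class FaultyRAM:
    """Simulated bit-oriented RAM; stuck_at maps an address to its forced value."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr, value):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


# March C-: each element is (address order, operations).
# "w1" writes 1; "r0" reads and expects 0.
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("down", ["r0"]),
]


def run_march(ram, size, algorithm):
    """Apply the algorithm and return (element index, op index, address) per failed read."""
    errors = []
    for elem_idx, (order, ops) in enumerate(algorithm):
        addresses = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op_idx, op in enumerate(ops):
                kind, value = op[0], int(op[1])
                if kind == "w":
                    ram.write(addr, value)
                elif ram.read(addr) != value:
                    errors.append((elem_idx, op_idx, addr))
    return errors


def bitmap(errors, size):
    """One character per address: 'X' if any read failed there, '.' otherwise."""
    failing = {addr for _, _, addr in errors}
    return "".join("X" if addr in failing else "." for addr in range(size))


# Hypothetical 8-cell RAM with a stuck-at-1 at address 2 and a stuck-at-0 at address 5.
errors = run_march(FaultyRAM(8, {2: 1, 5: 0}), 8, MARCH_C_MINUS)
print(bitmap(errors, 8))
```

In this sketch the stuck-at-1 cell fails only the elements that read 0 and the stuck-at-0 cell fails only the elements that read 1. A lookup from such failure patterns to fault types is, loosely, the kind of information a March dictionary would hold.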
References
[1] V. N. Yarmolik, Y. V. Klimets, A. J. van de Goor, and S. N. Demidenko, “RAM diagnostic tests”, in
Proc. IEEE Int. Workshop on Memory Technology, Design and Testing (MTDT), 1996, pp. 100–102.
[2] T. J. Bergfeld, D. Niggemeyer, and E. M. Rudnick, “Diagnostic testing of embedded memories using
BIST”, in Proc. Design, Automation and Test in Europe (DATE), Paris, Mar. 2000, pp. 305–309.
[3] L. Shen and B. F. Cockburn, “An optimal march test for locating faults in DRAMs”, in Proc. IEEE Int.
Workshop on Memory Testing, 1993, pp. 61–66.
[4] C.-F. Wu, C.-T. Huang, K.-L. Cheng, and C.-W. Wu, “Simulation-based test algorithm generation for
random access memories”, in Proc. IEEE VLSI Test Symp. (VTS), Montreal, Apr. 2000, pp. 291–296.
[5] C.-F. Wu, C.-T. Huang, and C.-W. Wu, “RAMSES: a fast memory fault simulator”, in Proc. IEEE Int.
Symp. Defect and Fault Tolerance in VLSI Systems (DFT), Albuquerque, Nov. 1999, pp. 165–173.
[6] A. J. van de Goor, Testing Semiconductor Memories: Theory and Practice, John Wiley & Sons,
Chichester, England, 1991.
[7] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design,
Computer Science Press, New York, 1990.
[8] R. Dekker, F. Beenker, and L. Thijssen, “Fault modeling and test algorithm development for static
random access memories”, in Proc. Int. Test Conf. (ITC), 1988, pp. 343–352.
ICCAD2000, Pages 472-475
Diagnosis of Interconnect Faults in Cluster-Based FPGA Architectures
Ian Harris and Russell Tessier
Department of Electrical and Computer Engineering, University of Massachusetts at Amherst
Abstract
Fault diagnosis has particular importance in the context of field programmable gate
arrays (FPGAs) because faults can be avoided by reconfiguration at almost no real cost.
Cluster-based FPGA architectures, in which several logic blocks are grouped together
into a coarse-grained logic block, are rapidly becoming the architecture of choice for
major FPGA manufacturers. The high density interconnect found within clusters greatly
complicates the problem of FPGA diagnosis. We propose a technique for the testing and
diagnosis of cluster-based FPGA architectures. We present a hierarchical approach to
define a set of FPGA configurations in which each fault is detectable, and each fault pair
is differentiable. The cornerstone of this work is the concise expression of the
distinguishing conditions of each fault pair. Experimental results demonstrate that nearly
100% fault coverage and diagnostic resolution are achieved with a low number of test
configurations.
REFERENCES
[1] Virtex data sheet. Xilinx Corporation, 2000.
[2] M. Abramovici and P. R. Menon. A practical approach to fault simulation and test generation for
bridging faults. IEEE Transactions on Computers, C-34(7):658–663, July 1985.
[3] M. Abramovici, C. Stroud, C. Hamilton, S. Wijesuriya, and V. Verma. Using roving STARs for on-line
testing and diagnosis of FPGAs in fault-tolerant applications. In International Test Conference, September
1999.
[4] V. Betz, J. Rose, and A. Marquardt. Architecture and CAD for Deep-Submicron FPGAs. Kluwer
Academic Publishers, 1999.
[5] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic. Field-Programmable Gate Arrays. Kluwer
Academic Publishers, 1992.
[6] Altera Corporation. Altera Apex Data Sheet. 2000.
[7] I. G. Harris and R. Tessier. Interconnect testing in cluster-based FPGA architectures. In Design
Automation Conference, June 2000.
[8] V. Lakamraju and R. Tessier. Tolerating operational faults in cluster-based FPGAs. In 8th International
ACM/SIGDA Symposium on Field Programmable Gate Arrays, February 2000.
[9] M. Renovell, J. M. Portal, J. Figueras, and Y. Zorian. SRAM-based FPGAs: Testing the LUT/RAM
modules. In International Test Conference, pages 1102–1111, October 1998.
[10] M. Renovell, J. M. Portal, J. Figueras, and Y. Zorian. Testing the interconnect of RAM-based FPGAs.
IEEE Design & Test of Computers, 15(1):45–50, January-March 1998.
[11] C. Stroud, S. Wijesuriya, C. Hamilton, and M. Abramovici. Built-in self-test of FPGA interconnect. In
International Test Conference, pages 404–411, October 1998.
[12] M. J. Y. Williams and J. B. Angel. Enhancing testability of large-scale integrated circuits via test
points and additional logic. IEEE Transactions on Computers, C-22(1):46–60, January 1973.
[13] L. Zhao, D. M. H. Walker, and F. Lombardi. Bridging fault detection in FPGA interconnects using
iDDQ. In International Symposium on Field Programmable Gate Arrays, pages 95–104, February 1998.
ICCAD2000, Pages 477-480
Fast Analysis and Optimization of Power/Ground Networks
Haihua Su, Kaushik Gala, Sachin S. Sapatnekar
Department of Electrical and Computer Engineering, University of Minnesota,
Minneapolis, MN 55455, USA
Abstract
This paper presents an efficient method for optimizing power/ground (P/G) networks. It
proposes a structured skeleton that is intermediate to the conventional method that uses
full meshes (which are hard to analyze efficiently), and tree-structured P/G networks
(which provide poor performance). As an example, we consider a P/G network structure
modeled as an overlying mesh with underlying trees originating from the mesh, which
eases the task of analysis with acceptable performance sacrifices. A fast and efficient
event-driven P/G network simulator is proposed, which hierarchically simulates the P/G
network with an adaptation of PRIMA to handle non-zero initial conditions. An adjoint
network that incorporates the variable topology of the original P/G network, as elements
switch in and out of the network, is constructed to calculate the transient adjoint sensitivity
over multiple intervals. These sensitivities are used to drive a sensitivity-based heuristic optimization
method. Experimental results show that this procedure can be used to efficiently optimize
large networks.
References
[CB90] S. Chowdhury and J. S. Barkatullah, “Estimation of maximum currents in MOS IC logic circuits,”
IEEE Transactions on Computer-Aided Design, vol. 9, pp. 642-654, June 1990.
[CL97] H. H. Chen and D. D. Ling, “Power Supply Noise Analysis Methodology for Deep-Submicron
VLSI Chip Design,” Proceedings of the ACM/IEEE Design Automation Conference, pp. 638-643, 1997.
[FD85] J. P. Fishburn and A. E. Dunlop, “TILOS: A posynomial programming approach to transistor
sizing,” Proceedings of the 1985 International Conference on Computer-Aided Design, pp. 326-328, 1985.
[FNDR91] P. Feldmann, T. V. Nguyen, S. W. Director and R. A. Rohrer, “Sensitivity Computation in
Piecewise Approximate Circuit Simulation,” IEEE Transactions on Computer-Aided Design, vol. 10, pp.
171-183, Feb. 1991.
[MK92] T. Mitsuhashi and E. S. Kuh, “ Power and Ground Network Topology Optimization for Cell-based
VLSIs,” Proceedings of the ACM/IEEE Design Automation Conference, pp. 524-529, 1992.
[OCP98] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: Passive reduced-order interconnect
macromodeling algorithm,” IEEE Transactions on Computer-Aided Design, vol. 17, pp. 645-654, Aug.
1998.
[PRV94] L. T. Pillage, R. A. Rohrer and C. Visweswariah, “Electronic Circuit and System Simulation
Methods,” McGraw-Hill, New York, NY, 1994.
[RGP91] C. L. Ratzlaff, N. Gopal, and L. T. Pillage “RICE: Rapid interconnect circuit evaluator,”
Proceedings of the ACM/IEEE Design Automation Conference, pp. 555-561, 1991.
[Rud87] W. Rudin, Real and Complex Analysis, 3rd edition, McGraw-Hill, New York, NY, 1987.
[SH90] D. Stark and M. Horowitz, “Techniques for calculating currents and voltages in VLSI power supply
networks,” IEEE Transactions on Computer-Aided Design, vol. 9, pp. 126-132, Feb. 1990.
[SRC95] B. R. Stanisic, R. A. Rutenbar, and L. R. Carley, “Addressing noise decoupling in mixed-signal
IC’s: Power distribution design and cell customization,” IEEE Journal of Solid-State Circuits, vol. 30, pp.
321-326, Mar. 1995.
[SVR+94] B. R. Stanisic, N. K. Verghese, R. A. Rutenbar, L. R. Carley, and D. J. Allstot, “Addressing
substrate coupling in mixed-mode IC’s: Simulation and power distribution synthesis,” IEEE Journal of
Solid-State Circuits, vol. 29, pp. 226-238, Mar. 1994.
[SYSH98] J. C. Shah, A. A. Younis, S. S. Sapatnekar, and M. M. Hassoun, “An algorithm for simulating
power/ground networks using Padé approximants and its symbolic implementation,” IEEE Transactions on
Circuits and Systems I: Fundamental Theory and Applications, vol. 45, pp. 1372-1382, Oct. 1998.
[TSL+99] X. Tan, C. J. R. Shi, D. Lungeanu, J. Lee and L. Yuan, “Reliability-Constrained Area
Optimization of VLSI Power/Ground Networks Via Sequence of Linear Programmings,” Proceedings of
the ACM/IEEE Design Automation Conference, pp. 78-83, 1999.
ICCAD2000, Pages 481-486
Simulation and Optimization of the Power Distribution Network in VLSI Circuits
G. Bai, S. Bobba and I. N. Hajj
Coordinated Science Lab & ECE Dept., University of Illinois at Urbana-Champaign
Urbana, Illinois 61801
Abstract
In this paper, we present simulation techniques to estimate the worst-case voltage
variation using an RC model for the power distribution network. Pattern independent
maximum envelope currents are used as a periodic input for performing the frequency-domain
steady-state simulation of the linear RC circuit to evaluate the worst-case
instantaneous voltage drop for the RC power distribution networks. The proposed
technique, unlike existing techniques, is guaranteed to give the maximum voltage drop at
nodes in the RC power distribution network. We present experimental results to compare
the frequency-domain and time-domain simulation techniques for estimating the
maximum instantaneous voltage drop. We also present frequency domain sensitivity
analysis based decoupling capacitance placement for reducing the voltage variation in the
power distribution network. Experimental results on circuits extracted from layout are
presented to validate the simulation and optimization techniques.
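The frequency-domain steady-state idea can be illustrated with a minimal sketch. It assumes a single lumped node tied to the supply through a resistance R with a capacitance C to ground, which is far simpler than the extracted networks the abstract refers to. The function names and the sampled current waveform are hypothetical, and the maximum envelope current is assumed to be given. Each harmonic of the periodic current is multiplied by the node impedance R / (1 + jwRC), and the result is transformed back to obtain one period of the voltage drop.

```python
import cmath

def dft(x):
    """Fourier-series coefficients of one sampled period (normalized by n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def idft(X):
    """Inverse of dft(): rebuild the samples from the coefficients."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            for t in range(n)]

def steady_state_drop(i_samples, period, R, C):
    """Periodic steady-state voltage drop of a one-node RC model (illustrative only)."""
    n = len(i_samples)
    I = dft(i_samples)
    V = []
    for k in range(n):
        h = k if k <= n // 2 else k - n          # signed harmonic index
        w = 2 * cmath.pi * h / period            # angular frequency of harmonic h
        V.append(I[k] * R / (1 + 1j * w * R * C))  # V = I * Z(jw)
    return [v.real for v in idft(V)]

# Hypothetical envelope current (amperes) over one 1 ns period, odd sample count.
envelope = [0.0, 0.2, 1.0, 0.6, 0.1, 0.0, 0.0, 0.0, 0.0]
drop = steady_state_drop(envelope, 1e-9, 5.0, 2e-13)
worst_case = max(drop)
```

Because the input is treated as periodic, no transient has to be simulated until it settles; the worst-case drop is read directly from one period of the result.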
References
[1] D. Stark and M. Horowitz, “Techniques for calculating currents and voltages in VLSI power supply
networks," IEEE transactions on CAD, vol. 9, no. 2, pp. 126-132, Feb. 1990.
[2] A. Dalal, L. Lev and S. Mitra, “Design of an efficient power distribution network for the UltraSPARC-I
microprocessor," in Proc. of ICCD, pp. 118-123, 1995.
[3] G. Steele, D. Overhauser, S. Rochel and S. Z. Hussain, “Full-chip verification methods for DSM power
distribution systems," in Proc. of DAC, pp. 744-749, 1998.
[4] A. Dharchoudhury, R. Panda, D. Blaauw and R. Vaidyanathan, “Design and analysis of power
distribution networks in PowerPC microprocessor," in Proc. of DAC, pp. 738-743, June 1998.
[5] H. H. Chen and D. D. Ling, “Power supply noise analysis methodology for deep-submicron VLSI chip
design," in Proc. of DAC, pp. 638-643, 1997.
[6] Y.-M. Jiang, K.-T. Cheng and A.-C. Deng, “Estimation of maximum power supply noise for deep submicron designs," in Proc. of ISLPED, pp. 233-238, Aug. 1998.
[7] G. Bai, S. Bobba and I. N. Hajj, “Power bus maximum voltage drop in digital VLSI circuits", in Proc.
of ISQED, pp. 263-268, Mar. 2000.
[8] S. Chowdhury and J. Barkatullah, “Estimation of maximum currents in MOS IC logic circuits," IEEE
Trans. on CAD, vol. 9, no. 6, pp. 642-654, June 1990.
[9] A. Krstic and K.-T. Cheng, “Vector generation for maximum instantaneous current through supply lines
for CMOS circuits," in Proc. of 34th DAC, pp. 383-388, June 1997.
[10] H. Kriplani, F. N. Najm and I. N. Hajj, “Pattern independent maximum current estimation in power
and ground buses of CMOS VLSI circuits: Algorithms, signal correlations, and their resolution," IEEE
transactions on CAD, vol. 14, no. 8, pp. 998-1012, Aug. 1995.
[11] S. Bobba and I. N. Hajj, “Estimation of maximum current envelope for power bus analysis and design,"
in Proc. of ISPD, pp. 141-146, Apr. 1998.
[12] K. S. Kundert, J. K. White and A. Sangiovanni-Vincentelli, Steady-state methods for simulating
analog and microwave circuits. Boston, MA: Kluwer Academic Publishers, 1990.
[13] J. Vlach and K. Singhal, Computer methods for circuit analysis and design. New York, NY: Van
Nostrand Reinhold, 1983.
ICCAD2000, Pages 487-492
Frequency Domain Analysis of Switching Noise on Power Supply Network
Shiyou Zhao, Kaushik Roy and Cheng-Kok Koh
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907
ABSTRACT
In this paper, we propose an approach for the analysis of power supply noise in the
frequency domain for power/ground (P/G) networks of tree topologies. We model the
P/G network as a linear time invariant (LTI) pseudo-distributed RLC network and the
gates (or cells) as time-varying current sources. Voltage fluctuation caused by the
switching events is calculated based on the effective impedances seen by the
corresponding current sources and the spatial correlation between the nodes of the power
network. Superposition is applied to the LTI system to obtain the overall noise spectrum
at any node of the power supply network. Inverse Fast Fourier Transformation (IFFT) is
then performed on the frequency domain noise spectrum to obtain the time domain noise
waveform. The proposed algorithm has a complexity of O(n^2). Experimental results show
that our approach can produce accurate noise waveforms.
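The superposition step can be sketched as follows. This is only an illustration under simplifying assumptions: every switching source is given as a list of spectral samples, and each source comes with a hypothetical transfer-impedance function. Here that function is a lumped series R-L branch in parallel with a capacitor, not the pseudo-distributed tree model of the paper. The noise spectrum at the observed node is the sum of each source spectrum multiplied by its impedance.

```python
def rlc_impedance(w, R, L, C):
    """Impedance of a series R-L branch in parallel with a capacitor C at angular frequency w."""
    series = R + 1j * w * L
    return series / (1 + 1j * w * C * series)

def noise_spectrum(freqs, sources):
    """Superpose the contributions of all sources at one node.

    sources is a list of (current_spectrum, impedance_fn) pairs, where
    current_spectrum[k] is the source's spectral sample at freqs[k].
    """
    return [sum(spec[k] * z(w) for spec, z in sources)
            for k, w in enumerate(freqs)]

# Hypothetical data: two sources seen through the same lumped impedance.
freqs = [0.0, 1e9, 2e9]
z = lambda w: rlc_impedance(w, 2.0, 1e-9, 1e-12)
src_a = [1.0, 0.5, 0.1]
src_b = [0.25, 0.1, 0.05]
total = noise_spectrum(freqs, [(src_a, z), (src_b, z)])
```

An inverse FFT of `total`, sampled on a uniform frequency grid, would then give the time-domain noise waveform.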
REFERENCES
[1] J. G. Proakis, D. G. Manolakis, Digital Signal Processing, 3rd, NJ.: Prentice-Hall, 1996.
[2] J. M. Rabaey, Digital Integrated Circuits, NJ.: Prentice-Hall, 1996.
[3] Blaauw, Cong, Tsay, “IR-Drop Analysis and Signal Net Noise Analysis,” DAC-97 Tutorial, June, 1997.
[4] H. H. Chen and S. E. Schuster, “On-Chip Decoupling Capacitor Optimization for high-performance
VLSI design,” Intl. Symposium on VLSI Technology, Systems, and Application, Proc. pp.99-103, 1995.
[5] T. Sakurai, A. Newton, “Delay Analysis of Series-Connected MOSFET Circuits,” IEEE Journal of Solid
State Circuits, Vol.26, No.2, pp. 122-131, February, 1991.
[6] H. H. Chen and D. D. Ling, “Power Supply Analysis Methodology for Deep-Submicron VLSI Chip
Design,” ACM/IEEE Design Automation Conf., pp. 638-643, June, 1997.
[7] Y.-S. Chang, S. K. Gupta, and M. A. Breuer, “Analysis of Ground Bounce in Deep-Submicron
Circuits,” Proc. of 15th IEEE VLSI Test Symposium, pp. 110-116, April, 1997.
[8] Y.-M. Jiang, K.-T. Cheng and A.-C. Deng, “Estimation of Maximum Power Supply Noise for Deep
Sub-Micron Designs,” ISLPED 98., pp. 233-238, August, 1998.
[9] K.-H. Erhard, F.M. Johannes and R. Dachauer, “Topology Optimization Techniques for Power/Ground
Networks in VLSI,” Proc. European Design Automation Conference, pp. 362-367, 1992.
[10] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, MA.: MIT Press, 1990.
[11] G. Strang, Linear Algebra and Its Application, 3rd, Saunders College Publishing, 1988.
[12] M. Zhao, R. Panda, S. Sapatnekar, T. Edwards, R. Chaudhry, D. Blaauw, “Hierarchical Analysis of
Power Distribution Network,” Proc. Design Automation Conference, pp. 150-155, 2000.
[13] S. Nassif, J. Kozhaya, “Fast Power Grid Simulation,” Proc. Design Automation Conference, pp. 156-161, 2000.
[14] S. Zhao, K. Roy, C. Koh, “Estimation of Inductive and Resistive Switching Noise on Power Supply
Network in Deep Sub-micron CMOS Circuits,” Proc. of International Conference on Computer Design,
2000, to appear.
ICCAD2000, Pages 493-496
Path Selection and Pattern Generation for Dynamic Timing Analysis
Considering Power Supply Noise Effects
Jing-Jia Liou, Angela Krstić, Yi-Min Jiang* and Kwang-Ting Cheng
Electrical and Computer Engineering Department, University of California, Santa Barbara
*Synopsys Inc., Mountain View, CA 94043
Abstract
Noise effects such as power supply noise and crosstalk can significantly affect the performance
of deep submicron designs. These delay effects are highly input pattern dependent.
Existing path selection and timing analysis techniques cannot capture the effects of noise
on cell/interconnect delays. Therefore, the selected critical paths may not be the longest
paths and predicted circuit performance might not reflect the worst-case circuit delay. In
this paper, we propose a path selection technique that can consider power supply noise
effects on the propagation delays. Next, for the selected critical paths, we propose a
pattern generation technique for dynamic timing analysis such that the patterns produce
the worst-case power supply noise effects on the delays of these paths. Our experimental
results demonstrate the difference in estimated circuit performance for the case when
power supply noise effects are considered vs. when these effects are ignored. Thus, they
validate the need for considering power supply noise effects on delays during path
selection and dynamic timing analysis.
References
[1] Y.-M. Jiang and K.-T. Cheng. Analysis of Performance Impact Caused by Power Supply Noise in Deep
Submicron Devices. Proc. DAC, pages 760–765, June 1999.
[2] Synopsys. PrimeTime User Guide. May 1999.
[3] A. Krstić, Y.-M. Jiang, and K.-T. Cheng. Delay Testing Considering Power Supply Noise Effects.
Proc. ITC, pages 181–190, September 1999.
[4] J.-J. Liou, A. Krstić, K.-T. Cheng, D. Mukherjee, and S. Kundu. Performance Sensitivity Analysis
Using Statistical Methods and Its Applications to Delay Testing. Proc. ASP DAC, pages 587–592, January
2000.
[5] R. Senthinathan and J. L. Prince. Simultaneous Switching Noise of CMOS Devices and Systems. Kluwer
Academic Publishers, Boston, MA, 1997.
[6] Y.-S. Chang, S. K. Gupta, and M. A. Breuer. Analysis of Ground Bounce in Deep Sub-Micron Circuits.
Proc. VTS, pages 110–116, April 1997.
[7] H. H. Chen and D. D. Ling. Power Supply Noise Analysis Methodology for Deep Submicron VLSI
Chip Design. Proc. DAC, pages 638–643, June 1997.
[8] Y.-M. Jiang, K.-T. Cheng, and A.-C. Deng. Estimation of Maximum Power Supply Noise for Deep
Sub-Micron Designs. Proc. of ISLPED, pages 233–238, August 1998.
[9] D. E. Goldberg and R. Burch. Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, Reading, MA, 1989.
[10] H. Edamatsu, K. Homma, M. Kakimoto, Y. Koike, and K. Tabuchi. Pre-Layout Delay Calculation
Specification for CMOS ASIC Libraries. Proc. ASP DAC, pages 241–248, February 1998.
[11] Y.-M. Jiang, K.-T. Cheng, and A. Krstić. Estimation of Maximum Power and Instantaneous Current
Using A Genetic Algorithm. Proc. CICC, pages 135–138, May 1997.
[12] Synopsys. PowerMill Reference Manual. May 1999.
ICCAD2000, Pages 498-503
Power Exploration for Embedded VLIW Architectures
Mariagiovanna Sami, Donatella Sciuto, Cristina Silvano, Vittorio Zaccaria
Politecnico di Milano, Dip. di Elettronica e Informazione, 20133 Milano, ITALY
Abstract
In this paper, we propose a system-level power exploration methodology for embedded
VLIW architectures based on an instruction-level analysis. The instruction-level energy
model targets a general pipeline scalar processor; several architectural parameters such as
number and type of pipeline stages as well as average stall/latency cycles per instruction
and inter-instruction effects are taken into account. Applying the proposed
model directly to VLIW processors is intractable in terms of both spatial and
temporal complexity (which grow exponentially w.r.t. the number of possible operations
in the ISA). To reduce this complexity, the basic model has been extended by assuming
that the energy associated with a long instruction is given by the sum of the energy
associated with the single operations of the long instruction and the single pipeline stages.
The instruction-level energy model has been applied to a simplified VLIW architecture to
demonstrate the validity of the proposed approach.
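The additive simplification described above can be written down as a short sketch. The operation names, pipeline stage names, and energy values below are hypothetical and in arbitrary units; they are not taken from the paper. The sketch only shows the structure of the estimate: the energy of a long instruction is approximated by summing the per-operation energies and the per-stage energies, so one table entry per operation and per stage is enough.

```python
def long_instruction_energy(operations, op_energy, stage_energy):
    """Additive estimate of the energy of one long instruction.

    operations   -- operations packed into the long instruction
    op_energy    -- hypothetical per-operation energy table
    stage_energy -- hypothetical per-pipeline-stage energy table
    """
    ops_part = sum(op_energy[op] for op in operations)
    stages_part = sum(stage_energy.values())
    return ops_part + stages_part

# Illustrative tables (arbitrary units, assumed for this sketch only).
OP_ENERGY = {"add": 1.0, "mul": 3.0, "nop": 0.25}
STAGE_ENERGY = {"fetch": 0.5, "decode": 0.25}

bundle_energy = long_instruction_energy(["add", "mul"], OP_ENERGY, STAGE_ENERGY)
```

The design choice is that characterization cost grows with the number of distinct operations rather than with the number of distinct long instructions.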
References
[1] A. Chandrakasan and R. Brodersen, “Minimizing Power Consumption in Digital CMOS Circuits," Proc.
of IEEE, 83(4), pp. 498-523, 1995.
[2] M. Sami, D. Sciuto, C. Silvano and V. Zaccaria, “Instruction-Level Power Estimation for Embedded
VLIW Cores," CODES 2000: Eighth Int. Workshop on Hardware/Software Codesign, pp. 34-38, San
Diego, CA, May 3-5, 2000.
[3] V. Tiwari, S. Malik and A. Wolfe, “Power Analysis of Embedded Software: A First Step Towards
Software Power Minimization," IEEE Trans. VLSI Systems, pp. 437-445, Dec. 1994.
[4] M. T.-C. Lee, V. Tiwari, S. Malik and M. Fujita, “Power Analysis and Minimization Techniques for
Embedded DSP Software," IEEE Trans. VLSI System, pp. 123-135, Mar. 1997.
[5] J. T. Russel and M. F. Jacome, “Software Power Estimation for High Performance 32-bit Embedded
Processors," Proc. of ICCD '98.
[6] D. Sarta, D. Trifone and G. Ascia, “A Data Dependent Approach to Instruction Level Power
Estimation," Low Power Design, 1999 Proc. IEEE Alessandro Volta Memorial Workshop on, pp. 182-190,
Como, Italy, Mar. 1999.
[7] B. Klass, D. E. Thomas, H. Schmit and D. F. Nagle “Modeling Inter-Instruction Energy Effects in a
Digital Signal Processor," Proc. of ISCAS '98.
[8] J. Hennessy and D. A. Patterson, “Computer Architecture: A Quantitative Approach," Morgan
Kaufmann Publishers, San Mateo, CA, Second Edition, 1996.
[9] K. Masselos et al. “Interaction between Sub-word Parallelism Exploitation and Low Power Code
Transformations for VLIW Multi-media Processors," Low Power Design, 1999 Proc. IEEE Alessandro
Volta Memorial Workshop on, pp. 52-60, Como, Italy, March 1999.
[10] H. Mehta et al. “Techniques for Low Energy Software," Proc. of ISLPED-97: ACM/IEEE Int.
Symposium on Low Power Electronics and Design, pp. 72-75, Monterey, CA, August 1997.
[11] E. Macii, "Sequential Synthesis and Optimization for Low Power", in Low Power Design in Deep
Submicron Electronics, NATO ASI Series, Series E: Applied Sciences, Vol. 337, Kluwer Academic
Publisher, pp. 321-353, 1997.
[12] G. Goossens et al. “Embedded Software in Real-Time Signal Processing Systems: Design
Technologies," Proc. of IEEE, Vol. 35, No. 3, March 1997.
[13] Trimaran Home Page, http://www.trimaran.org
[14] M. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches," Proc. of
ISLPED-97: ACM/IEEE Int. Symposium on Low Power Electronics and Design, pp. 143-148, Monterey,
CA, August 1997.
[15] H. Corporaal, “Microprocessor Architectures from VLIW to TTA," John Wiley and Sons, Chichester,
England.
ICCAD2000, Pages 504-510
Exploring Performance Tradeoffs for Clustered VLIW ASIPs
Margarida F. Jacome, Gustavo de Veciana and Viktor Lapinskii
Department of Electrical and Computer Engineering
University of Texas, Austin, TX 78712
Abstract
VLIW ASIPs provide an attractive solution for increasingly pervasive real-time
multimedia and signal processing embedded applications. In this paper we propose an
algorithm to support trade-off exploration during the early phases of the
design/specialization of VLIW ASIPs with clustered datapaths. For purposes of an early
exploration step, we define a parameterized family of clustered datapaths D(m,n), where
m and n denote interconnect capacity and cluster capacity constraints on the family.
Given a kernel, the proposed algorithm explores the space of feasible clustered datapaths
and returns: a datapath configuration; a binding and scheduling for the operations; and a
corresponding estimate for the best achievable latency over the specified family.
Moreover, we show how the parameters m and n, as well as a target latency optionally
specified by the designer, can be used to effectively explore trade-offs among delay,
power/energy, and latency. Extensive empirical evidence is provided showing that the
proposed approach is strikingly effective at attacking this complex optimization problem.
References
[1] D.C. Burger and J.R. Goodman. Billion-transistor architectures. IEEE Computer, 30(9), 1997.
[2] C.M.Chu and J.Rabaey. Hardware selection and clustering in the HYPER synthesis system. In Proc.
IEEE European Conference of Design Automation, March 1992.
[3] G. de Micheli. Synthesis and Optimization of Digital Circuits. McGraw-Hill, Inc, 1994.
[4] D.S.Rao and F.J.Kurdahi. Partitioning by regularity extraction. In Proc. IEEE/ACM Design Automation
Conference, June 1992.
[5] W. Geurts, F. Catthor, S. Vernalde, and H. DeMan. Accelerator Data-Path Synthesis for High-Throughput Signal Processing Applications. Kluwer Academic Publishers, 1997.
[6] E. Ifeachor and B. Jervis. Digital signal processing: A practical approach. Addison-Wesley, 1993.
[7] M. Jacome and G. de Veciana. Lower bound on latency for VLIW ASIPs. In Proc. of ACM/IEEE
International Conference on Computer Aided Design (ICCAD), Nov 1999.
[8] K.R.Rao and P.Yip. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic
Press, 1990.
[9] C. Liem. Retargetable compilers for embedded core processors. Kluwer Academic Publishers, 1997.
[10] P. Marwedel and Gert Goossens, editors. Code Generation for Embedded Processors. Kluwer
Academic Publishers, 1995.
[11] NOVA Project: ASIPs and retargetable compilers; CAD for embedded systems. Department of ECE,
U.T. Austin. http://horizon.ece.utexas.edu/. jacome/nova/.
[12] M. Rim and R. Jain. Lower bound performance estimation for the high-level synthesis scheduling
problem. IEEE Trans. on CAD of ICs and Systems, 13(4):451–58, 1994.
[13] S. Rixner, W. Dally, U. Kapasi, B. Khailany, A. Lopez-Lagunas, P. Mattson, and J. Owens. A
bandwidth-efficient architecture for media processing. In Proc. 31st Annual International Symposium on
Microarchitecture, pages 3–13, Nov.-Dec. 1998.
[14] S. Rixner, W. Dally, B. Khailany, P. Mattson, U. Kapasi, and J. Owens. Register organization for
media processing. In Proc. 26th International Symposium on High-Performance Computer Architecture,
May 1999.
[15] E. A. Rundensteiner, D. Gajski, and L. Bic. Component synthesis from functional descriptions. IEEE
Transactions on Computer Aided Design, 12(9), 1993.
[16] G. Tiruvuri and M. Chung. Estimation of lower bounds in scheduling algorithms for high-level
synthesis. ACM Trans. on DAES (TODAES), 3(2):162–80, 1998.
ICCAD2000, Pages 511-518
Synthesis of Operation-Centric Hardware Descriptions
James C. Hoe
Dept. of Electrical and Computer Engineering, Carnegie Mellon University
Arvind
Laboratory for Computer Science, Massachusetts Institute of Technology
Abstract
Most hardware description frameworks, whether schematic or textual, use cooperating
finite state machines (CFSM) as the underlying abstraction. In the CFSM framework, a
designer explicitly manages the concurrency by scheduling the exact cycle-by-cycle
interactions between multiple concurrent state machines. Design mistakes are common in
coordinating interactions between two state machines because transitions in different
state machines are not semantically coupled. It is also difficult to modify one state
machine without considering its interaction with the rest of the system.
This paper presents a method for hardware synthesis from an "operation centric"
description, where the behavior of a system is described as a collection of "atomic"
operations in the form of rules. Typically, a rule is defined by a predicate condition and
an effect on the state of the system. The atomicity requirement simplifies the task of
hardware description by permitting the designer to formulate each rule as if the rest of the
system is static.
An implementation can execute several rules concurrently in a clock cycle, provided
some sequential execution of those rules can reproduce the behavior of the concurrent
execution. In fact, detecting and scheduling valid concurrent execution of rules is the
central issue in hardware synthesis from operation-centric descriptions. The result of this
paper shows that an operation-centric framework offers significant reduction in design
time, without loss in implementation quality.
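The validity condition for concurrent execution can be made concrete with a small sketch. It is a brute-force check on a single state and is not the synthesis procedure of the paper. A rule is modeled, as an assumption of this sketch, by a pair of Python functions: a predicate on the state and an effect that returns the updated fields. Firing a set of rules concurrently (all reading the old state) is accepted only if some sequential order of the same rules produces the same next state.

```python
from itertools import permutations

def fire_sequentially(state, rules):
    """Apply rules one after another; None if a predicate fails along the way."""
    for predicate, effect in rules:
        if not predicate(state):
            return None
        state = {**state, **effect(state)}
    return state

def fire_concurrently(state, rules):
    """Apply all rules against the same old state; None on a failed predicate or write conflict."""
    updates = {}
    for predicate, effect in rules:
        if not predicate(state):
            return None
        delta = effect(state)
        if updates.keys() & delta.keys():   # two rules write the same field
            return None
        updates.update(delta)
    return {**state, **updates}

def concurrent_is_valid(state, rules):
    """True if the concurrent result equals the result of some sequential order."""
    result = fire_concurrently(state, rules)
    if result is None:
        return False
    return any(fire_sequentially(state, order) == result
               for order in permutations(rules))

# Hypothetical rules: "b := a" and "a := a + 1" can fire together ...
copy_a = (lambda s: True, lambda s: {"b": s["a"]})
incr_a = (lambda s: True, lambda s: {"a": s["a"] + 1})
ok = concurrent_is_valid({"a": 1, "b": 0}, [copy_a, incr_a])

# ... but the two halves of a swap cannot, since no sequential order swaps x and y.
x_gets_y = (lambda s: True, lambda s: {"x": s["y"]})
y_gets_x = (lambda s: True, lambda s: {"y": s["x"]})
bad = concurrent_is_valid({"x": 1, "y": 2}, [x_gets_y, y_gets_x])
```

A hardware synthesis tool would have to establish this property statically for all reachable states; enumerating orders on one state, as done here, only illustrates the definition.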
References
[1] Arvind and X. Shen. Using term rewriting systems to design and verify processors. IEEE Micro Special
Issue on Modeling and Validation of Microprocessors, May 1999.
[2] F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
[3] J. Babb, M. Rinard, C. A. Moritz, W. Lee, M. Frank, R. Barua, and S. Amarasinghe. Parallelizing
applications into silicon. In Proceedings of the 7th IEEE Symposium on Field-Programmable Custom
Computing Machines (FCCM’99), Napa Valley, CA, April 1999.
[4] G. Berry. The foundations of Esterel. In Proof, Language and Interaction: Essays in Honour of Robin
Milner. MIT Press, 1998.
[5] D. D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, and S. Zhao. SpecC Specification Language and
Methodology. Kluwer Academic Publishers, 2000.
[6] D. Galloway. The Transmogrifier C hardware description language and compiler for FPGAs. In
Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines (FCCM’95), Napa Valley,
CA, April 1995.
[7] M. Gokhale and E. Gomersall. High level compilation for fine grained FPGAs. In Proceedings of the
IEEE Symposium on FPGA-based for Custom Computing Machines (FCCM’97), Napa Valley, CA, April
1997.
[8] M. Gokhale and R. Minnich. FPGA computing in a data parallel C. In Proceedings of IEEE Workshop
on FPGAs for Custom Computing Machines (FCCM’93), Napa Valley, CA, April 1993.
[9] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan
Kaufmann, 2nd edition, 1996.
[10] J. C. Hoe. Operation-Centric Hardware Description and Synthesis. PhD Thesis, Massachusetts
Institute of Technology, June 2000.
[11] J. C. Hoe and Arvind. Hardware synthesis from term rewriting systems. In Proceedings of X IFIP
International Conference on VLSI (VLSI 99), Lisbon, Portugal, November 1999.
[12] G. Kane. MIPS R2000 RISC Architecture. Prentice Hall, 1987.
[13] L. Lavagno and E. Sentovich. ECL: A specification environment for system-level design. In
Proceedings of the 36th ACM/IEEE Design Automation Conference (DAC’99), New Orleans, LA, June
1999.
[14] S. Liao, S. Tjiang, and R. Gupta. An efficient implementation of reactivity for modeling hardware in
the Scenic design environment. In Proceedings of the 34th ACM/IEEE Design Automation Conference
(DAC’97), Anaheim, CA, June 1997.
[15] J. Matthews, J. Launchbury, and B. Cook. Microprocessor specification in Hawk. In Proceedings of
the 1998 International Conference on Computer Languages, Chicago, IL, 1998.
[16] Stanford University. HardwareC – A Language for Hardware Design, December 1990.
[17] D. E. Thomas, J. K. Adams, and H. Schmit. A model and methodology for hardware-software
codesign. IEEE Design and Test of Computers, September 1993.
[18] D. E. Thomas and P. R. Moorby. The Verilog Hardware Description Language. Kluwer Academic
Publishers, 3rd edition, 1996.
[19] P. J. Windley. Verifying pipelined microprocessors. In Proceedings of the 1995 IFIP Conference on
Hardware Description Languages and their Applications (CHDL’95), Tokyo, Japan, 1995.
ICCAD2000, Pages 520-525
Don’t Cares and Multi-Valued Logic Network Minimization
Yunjian Jiang, Robert K. Brayton
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
Abstract
We address optimizing multi-valued (MV) logic functions in a multi-level combinational
logic network. Each node in the network, called an MV-node, has multi-valued inputs
and a single multi-valued output. The notion of don't cares used in binary logic is
generalized to multi-valued logic. It contains two types of flexibility: incomplete
specification and non-determinism. We generalize the computation of observability don't
cares for a multi-valued function node. Methods are given to compute (a) the maximum
set of observability don't cares, and (b) the compatible set of observability don't cares for
each MV-node. We give a recursive image computation to transform the don't cares into
the space of local inputs of the node to be minimized. The methods are applied to some
experimental multi-valued networks, and demonstrate reduction in the size of the tables
that represent multi-valued logic functions.
References
[1] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan,
R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “SIS: A System for Sequential Circuit Synthesis,” Tech.
Rep. UCB/ERL M92/41, Electronics Research Laboratory, Univ. of California, Berkeley, CA 94720, May
1992.
[2] R. K. Brayton and F. Somenzi, “Minimization of Boolean Relations,” in Proc. of the Intl. Symposium
on Circuits and Systems, pp. 738–743, May 1989.
[3] Y. Watanabe and R. K. Brayton, “Heuristic minimization of multiple-valued relations,” IEEE
Transaction on CAD, 1993.
[4] B. Lin and F. Somenzi, “Minimization of Symbolic Relations,” in Proc. of the Intl. Conf. on Computer-Aided Design, pp. 88–91, Nov. 1990.
[5] A. Ghosh, S. Devadas, and A. R. Newton, “Heuristic minimization of boolean relations using testing
techniques,” IEEE Transaction on CAD, 1992.
[6] H. Savoj, R. K. Brayton, and H. Touati, “Extracting Local Don’t Cares for Network Optimization,” in
Proc. of the Intl. Conf. on Computer-Aided Design, pp. 514–517, Nov. 1991.
[7] H. Savoj, Don’t Cares in Multi-Level Network Optimization. PhD thesis, University of California
Berkeley, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, CA
94720, May 1992.
[8] M. Damiani and G. D. Micheli, “Observability don’t care sets and boolean relations,” in Proc. of the
Intl. Conf. on Computer-Aided Design, 1990.
[9] Y. Watanabe, L. M. Guerra, and R. K. Brayton, “Permissible functions for multioutput components in
combinational logic optimization,” IEEE Transaction on CAD, 1996.
[10] R. K. Brayton, “Algebraic methods for multi-valued logic,” Tech. Rep. UCB/ERL M99/62,
Electronics Research Laboratory, University of California, Berkeley, Dec. 1999.
[11] M. Gao and R. K. Brayton, “Semi-algebraic methods for multi-valued logic,” in Proc. of the Intl.
Workshop on Logic Synthesis, May 2000.
[12] A. Srinivasan, T. Kam, S. Malik, and R. K. Brayton, “Algorithms for Discrete Function
Manipulation,” in Proc. of the Intl. Conf. on Computer-Aided Design, pp. 92–95, Nov. 1990.
[13] T. Kam and R. K. Brayton, “Multi-valued decision diagrams,” Tech. Rep. UCB/ERL M90/125,
Electronics Research Lab, Univ. of California, Berkeley, CA 94720, Dec. 1990.
[14] VISgroup, “VIS: A system for verification and synthesis,” in IEEE International Conference on
Computer-Aided Verification, 1996.
[15] R. L. Rudell and A. Sangiovanni-Vincentelli, “Multiple-valued minimization for pla optimization,”
IEEE Transaction on CAD, 1988.
ICCAD2000, Pages 526-532
Generalized Symmetries in Boolean Functions
Victor N. Kravets, Karem A. Sakallah
Department of Electrical Engineering and Computer Science, University of Michigan,
Ann Arbor, MI 48109
Abstract
In this paper we take a fresh look at the notion of symmetries in Boolean functions. Our
studies are motivated by the fact that the classical characterization of symmetries based
on invariance under variable swaps is a special case of a more general invariance based
on unrestricted variable permutations. We propose a generalization of classical symmetry
that allows for the simultaneous swap of ordered and unordered groups of variables, and
show that it captures more of a function's invariant permutations without undue
computational requirements. We apply the new symmetry definition to analyze a large set
of benchmark circuits and provide extensive data showing the existence of substantial
symmetries in those circuits. Specific case studies of several of these benchmarks reveal
additional insights about their functional structure and how it might be related to their
circuit structure.
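The difference between classical swap symmetry and the more general permutation invariance can be illustrated by exhaustive checking on a small function. The sketch below is not the detection method of the paper; the helper names and the example function are assumptions made for illustration. It tests whether a Boolean function is unchanged when its inputs are reordered by a given permutation, by enumerating all input assignments.

```python
from itertools import product

def invariant_under(f, n, perm):
    """True if f(x_0..x_{n-1}) == f(x_perm[0]..x_perm[n-1]) for all 2**n inputs."""
    return all(f(*x) == f(*(x[perm[i]] for i in range(n)))
               for x in product((0, 1), repeat=n))

def swap(n, i, j):
    """Permutation of n variables that exchanges positions i and j."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return p

# Hypothetical example function: f(a, b, c, d) = ab + cd.
f = lambda a, b, c, d: (a & b) | (c & d)

classical = invariant_under(f, 4, swap(4, 0, 1))    # a <-> b: classical symmetry
single = invariant_under(f, 4, swap(4, 0, 2))       # a <-> c alone: not a symmetry
grouped = invariant_under(f, 4, [2, 3, 0, 1])       # (a, b) <-> (c, d) swapped together
```

For this function no single swap between the two product terms is a symmetry, yet exchanging the groups (a, b) and (c, d) simultaneously leaves it unchanged, which is the kind of invariance that a pairwise definition misses.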
References
[1] R. K. Brayton and C. McMullen. The decomposition and factorization of Boolean expressions. In Proc.
IEEE Int. Symp. on Circuits and Systems, pp. 49–54, May 1982.
[2] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Computers,
C-35(6):677–691, August 1986.
[3] D. I. Cheng and M. Marek-Sadowska. Verifying equivalence of functions with unknown input
correspondence. In Proc. European Design Automation Conf., pp. 81–85, February 1993.
[4] C. R. Edwards and S. L. Hurst. A digital synthesis procedure under function symmetries and mapping
methods. IEEE Trans. on Computers, C-27:985–997, 1978.
[5] M. Hall. The Theory of Groups. Macmillan, New York, 1957.
[6] G. D. Hachtel and F. Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic
Publishers, 1996.
[7] M. C. Hansen, H. Yalcin, and J. P. Hayes. Unveiling the ISCAS-85 benchmarks: A case study in reverse
engineering. IEEE Design and Test of Computers, 16(3):72-80, July 1999.
[8] U. Hinsberger and R. Kolla. Boolean matching for large libraries. In Proc. Design Automation Conf.,
pp. 206–211, June 1998.
[9] S.-W. Jeong, T.-S. Kim, and F. Somenzi. An efficient method for optimal BDD ordering
computation. In Proc. Int. Conf. on VLSI and CAD, November 1993.
[10] B.-G. Kim and D. L. Dietmeyer. Multilevel logic synthesis of symmetric switching functions. IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, 10(4):436-446, April 1991.
[11] V. N. Kravets and K. A. Sakallah. Constructive library-aware synthesis using symmetries. In Proc.
Design, Automation and Test in Europe Conf., pp. 208-213, March 2000.
[12] F. Mailhot and G. De Micheli. Technology mapping using Boolean matching. In Proc. European Design
Automation Conf., pp. 180-185, March 1990.
[13] J. Mohnke, P. Molitor, and S. Malik. Limits of using signatures for permutation independent Boolean
comparison. In Proc. Asia and South Pacific Design Automation Conf., pp. 459-464, August 1995.
[14] D. Moller, J. Mohnke, and M. Weber. Detection of symmetry of Boolean functions represented by
ROBDDs. In Proc. Int. Conf. on Computer-Aided Design, pp. 680–684, October 1993.
[15] S. Panda, F. Somenzi, and B. F. Plessier. Symmetry detection and dynamic variable ordering of
decision diagrams. In Proc. Int. Conf. on Computer-Aided Design, pp. 628–631, November 1994.
[16] I. Pomeranz and S. M. Reddy. On determining symmetries in inputs of logic circuits. IEEE Trans. on
Computer-Aided Design of Integrated Circuits, 13(11):1428-1434.
[17] H. Savoj. Don’t Cares in Multi-Level Network Optimization. Ph.D. thesis, University of California,
Berkeley, 1992.
[18] C. Scholl, D. Moller, P. Molitor, and R. Drechsler. BDD minimization using symmetries. IEEE Trans.
on Computer-Aided Design of Integrated Circuits, 18(2):81–100, February 1999.
[19] C. E. Shannon. A symbolic analysis of relay and switching circuits. AIEE Trans., 57:713–723, 1938.
[20] C. C. Tsai and M. Marek-Sadowska. Boolean matching using generalized Reed-Muller forms. In Proc.
Design Automation Conf., pp. 339-344, June 1994.
[21] C. C. Tsai and M. Marek-Sadowska. Generalized Reed-Muller forms as a tool to detect symmetries.
IEEE Trans. on Computers, C-45(1):772–781, August 1996.
[22] http://www.eecs.umich.edu/~jhayes/iscas/benchmark.html.
[23] S. Yang. Logic synthesis and optimization benchmarks user guide – version 3.0. Microelectronics
Center of North Carolina, Research Triangle Park, NC, January 1991.
ICCAD2000, Pages 533-536
Wire Reconnections Based on Implication Flow Graph
Shih-Chieh Chang, Zhong-Zhen Wu, and He-Zhe Yu
National Chung-Cheng University, Chia-Yi, Taiwan R.O.C.
Abstract
Global Flow Optimization (GFO) can perform fanout/fanin wire re-connections by
modeling the re-connection problem as a flow graph and then solving it with the
maxflow-mincut algorithm on that graph. However, the flow graph cannot fully
characterize the wire re-connections, which causes GFO to lose optimality in several
obvious cases. In addition, we find that the fanin re-connection can have more
optimization power than the fanout re-connection but requires more sophisticated
modeling. In this paper, we re-formulate the problem of the fanout/fanin
re-connections by a new graph called the implication flow graph. We show that the problem
of wire re-connections on the implication flow graph is NP-complete and also propose an
efficient heuristic on the new graph. Our experimental results are very exciting.
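The maxflow-mincut step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic Edmonds-Karp max-flow routine on a toy graph, not the GFO formulation or the implication flow graph of the paper; the function name max_flow_min_cut, the node labels, and the capacities are all hypothetical and chosen only for illustration.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.

    Returns (flow_value, cut_edges), where cut_edges are the original
    edges crossing from the source side to the sink side of a minimum cut.
    """
    # Residual graph: copy forward capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source define the source side of the cut.
    reachable = set(parent)
    cut_edges = sorted(
        (u, v)
        for u in reachable
        for v in capacity.get(u, {})
        if v not in reachable
    )
    return flow, cut_edges


if __name__ == "__main__":
    toy = {"s": {"a": 3, "b": 2}, "a": {"t": 2, "b": 1}, "b": {"t": 3}}
    print(max_flow_min_cut(toy, "s", "t"))  # (5, [('s', 'a'), ('s', 'b')])
```

By the max-flow min-cut theorem, the total capacity of the returned cut edges equals the flow value. In a flow-based re-connection setting the cut would identify the set of wires to act on, but how edges and capacities are derived from a netlist is specific to the paper and is not modeled here.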
References
[1] C. L. Berman and L. H. Trevillyan. “Global Flow Optimization in Automatic Logic Design,” IEEE
Trans. CAD 10, pp. 557-564, May 1991.
[2] S. C. Chang, M. Marek-Sadowska, and K. T. Cheng, “Perturb and Simplify: Multi-Level Boolean
Network Optimizer”, IEEE Transactions on Computer Aided Design, Vol. 15, pp. 1494-1504, Nov 1996.
[3] K. T. Cheng and L. A. Entrena, “Multi-Level Logic Optimization by Redundancy Addition and
Removal,” in Proc. European Conference On Design Automation, pp.373-377, Feb. 1993.
[4] R. Damiano and L. Berman, “Dual Global Flow”, Proc. IEEE Int. Conf. on Computer Design,
pp. 49-53, Oct. 1991.
[5] U. Glaser and K.T. Cheng, “Logic Optimization by an Improved Sequential Redundancy Addition and
Removal”, in Proc. of ASP-DAC, pp.235-240, Sept. 1995.
[6] W. Kunz and D. K. Pradhan, “Multi-Level Logic Optimization by Implication Analysis”, Digest Int.
Conf. on Computer Aided Design, pp. 6-13, Nov. 1994.
ICCAD2000, Pages 538-543
Deterministic Test Pattern Generation Techniques for Sequential Circuits
Ilker Hamzaoglu and Janak H. Patel
Center for Reliable & High-Performance Computing, University of Illinois, Urbana, IL 61801
Abstract
This paper presents new test generation techniques for improving the average-case
performance of the iterative logic array based deterministic sequential circuit test
generation algorithms. To be able to assess the effectiveness of the proposed techniques,
we have developed a new ATPG system for sequential circuits, called ATOMS, and we
have incorporated these techniques into the test generator. ATOMS achieved very high
fault coverages in a short amount of time for the ISCAS89 sequential benchmark circuits,
demonstrating the effectiveness of these techniques on the test generation performance.
References
[1] F. Brglez, D. Bryan, and K. Kozminski, “Combinational Profiles of Sequential Benchmark Circuits,”
Proc. Int. Symp. on Circuits and Systems, pp. 1929-1934, May 1989.
[2] K. T. Cheng, “Gate-Level Test Generation for Sequential Circuits,” ACM Trans. on Design Automation,
vol. 1, no. 4, pp. 405-442, October 1996.
[3] W. T. Cheng, “The BACK Algorithm for Sequential Test Generation,” Proc. Int. Conf. on Computer
Design, pp. 66-69, October 1988.
[4] W. T. Cheng and T. J. Chakraborty, “Gentest: An Automatic Test Generation System for Sequential
Circuits,” IEEE Computer, pp. 43-49, April 1989.
[5] H. Cho, G. D. Hachtel, and F. Somenzi, “Redundancy Identification/Removal and Test Generation for
Sequential Circuits Using Implicit State Enumeration,” IEEE Trans. on Computer-Aided Design, vol. 12,
no. 7, pp. 935-945, July 1993.
[6] A. Dargelas, C. Gauthron, and Y. Bertrand, “Mosaic: A Multiple-Strategy Oriented Sequential ATPG
for Integrated Circuits,” Proc. European Design and Test Conf., pp. 29-36, March 1997.
[7] A. Ghosh, S. Devadas, and A. R. Newton, “Test Generation and Verification for Highly Sequential
Circuits,” IEEE Trans. on Computer-Aided Design, vol. 10, no. 5, pp. 652-667, May 1991.
[8] P. Goel, “An Implicit Enumeration Algorithm to Generate Tests for Combinational Logic Circuits,”
IEEE Trans. on Computers, vol. C-30, no. 3, pp. 215-222, March 1981.
[9] N. Gouders and R. Kaibel, “Test Generation Techniques for Sequential Circuits,” Proc. IEEE VLSI Test
Symp., pp. 221-226, April 1991.
[10] I. Hamzaoglu and J. H. Patel, “New Techniques for Deterministic Test Pattern Generation,” Proc. IEEE
VLSI Test Symp., pp. 446-452, April 1998.
[11] I. Hamzaoglu and J. H. Patel, “New Techniques for Deterministic Test Pattern Generation,” Journal of
Electronic Testing: Theory and Applications, vol. 15, no. 1/2, pp. 63-73, October 1999.
[12] M. S. Hsiao, E. M. Rudnick, and J. H. Patel, “Alternating Strategies for Sequential Circuit ATPG,”
Proc. European Design and Test Conf., pp. 368-374, March 1996.
[13] M. S. Hsiao, E. M. Rudnick, and J. H. Patel, “Sequential Circuit Test Generation Using Dynamic State
Traversal,” Proc. European Design and Test Conf., pp. 22-28, February 1997.
[14] M. S. Hsiao, E. M. Rudnick, and J. H. Patel, “Application of Genetically-Engineered
Finite-State-Machine Sequences to Sequential Circuit ATPG,” IEEE Trans. on Computer-Aided Design,
vol. 17, no. 3, pp. 239-254, March 1998.
[15] T. P. Kelsey and K. K. Saluja, “Fast Test Generation for Sequential Circuits,” Proc. Int. Conf. on
Computer-Aided Design, pp. 354-357, November 1989.
[16] M. H. Konijnenburg, J. Th. van der Linden, and A. J. van de Goor, “Illegal State Space Identification
for Sequential Circuit Test Generation,” Proc. Design, Automation, and Test in Europe Conf., pp. 741-746,
March 1999.
[17] D. H. Lee and S. M. Reddy, “A New Test Generation Method for Sequential Circuits,” Proc. Int. Conf.
on Computer-Aided Design, pp. 446-449, November 1991.
[18] X. Lin, I. Pomeranz, and S. M. Reddy, “MIX: A Test Generation System for Synchronous Sequential
Circuits,” Proc. Int. Conf. on VLSI Design, pp. 456-463, January 1998.
[19] X. Lin, I. Pomeranz, and S. M. Reddy, “Techniques for Improving the Efficiency of Sequential Circuit
Test Generation,” Proc. Int. Conf. on Computer-Aided Design, pp. 147-151, November 1999.
[20] H. T. Ma, S. Devadas, A. R. Newton, and A. Sangiovanni-Vincentelli, “Test Generation for Sequential
Circuits,” IEEE Trans. on Computer-Aided Design, vol. 7, no. 10, pp. 1081-1093, October 1988.
[21] T. E. Marchok, Aiman El-Maleh, W. Maly and J. Rajski, “Complexity of Sequential ATPG,” Proc.
European Design and Test Conf., pp. 252-261, February 1995.
[22] R. Marlett, “An Effective Test Generation System for Sequential Circuits,” Proc. Design Automation
Conf., pp. 250-256, June 1986.
[23] P. Mazumder and E. M. Rudnick. Genetic Algorithms for VLSI Design, Layout and Test Automation.
Prentice Hall, New Jersey, 1999.
[24] T. M. Niermann and J. H. Patel, “HITEC: A Test Generation Package for Sequential Circuits,” Proc.
European Design Automation Conf., pp. 214-218, February 1991.
[25] T. M. Niermann, W. Cheng, and J. H. Patel, “PROOFS: A Fast, Memory-Efficient Sequential Circuit
Fault Simulator,” IEEE Trans. on Computer-Aided Design, vol. 11, no. 2, pp. 198-207, February 1992.
[26] P. Prinetto, M. Rebaudengo, and M. Sonza Reorda, “An Automatic Test Pattern Generator for Large
Sequential Circuits Based on Genetic Algorithms,” Proc. Int. Test Conf., pp. 240-249, October 1994.
[27] E. M. Rudnick and J. H. Patel, “Combining Deterministic and Genetic Approaches for Sequential
Circuit Test Generation,” Proc. Design Automation Conf., pp. 183-188, June 1995.
[28] E. M. Rudnick, J. H. Patel, G. S. Greenstein, and T. M. Niermann, “A Genetic Algorithm Framework
for Test Generation,” IEEE Trans. on Computer-Aided Design, vol. 16, no. 9, pp. 1034-1044, September
1997.
[29] D. G. Saab, Y. G. Saab, and J. A. Abraham, “CRIS: A Test Cultivation Program for Sequential VLSI
Circuits,” Proc. Int. Conf. on Computer-Aided Design, pp. 216-219, November 1992.
[30] D. G. Saab, Y. G. Saab, and J. A. Abraham, “Iterative Simulation-Based Genetics + Deterministic
Techniques = Complete ATPG,” Proc. Int. Conf. on Computer-Aided Design, pp. 40-43, November 1994.
[31] M. H. Schulz and E. Auth, “Essential: An Efficient Self-Learning Test Pattern Generation Algorithm
for Sequential Circuits,” Proc. Int. Test Conf., pp. 28-37, August 1989.
ICCAD2000, Pages 544-549
Simulation Based Test Generation for Scan Designs
Irith Pomeranz and Sudhakar M. Reddy
Electrical and Computer Engineering Department, University of Iowa, Iowa City, IA 52242
Abstract
We describe a simulation-based test generation procedure for scan designs. A test sequence
generated by this procedure consists of a sequence of one or more primary input vectors
embedded between a scan-in operation and a scan-out operation. We consider the set of
faults that can be detected by test sequences of this form, compared to the case where
scan is applied with every test vector. The proposed procedure constructs test sequences that
traverse as many pairs of fault-free/faulty states as possible, and thus avoids the use of
branch-and-bound test generation techniques. Additional techniques are incorporated into
this basic procedure to enhance its effectiveness.
References
[1] J. Snethen, "Simulation-Oriented Fault Test Generator", in Proc. 14th Design Autom. Conf., June 1977,
pp. 88-93.
[2] D. G. Saab, Y. G. Saab and J. A. Abraham, "CRIS: A Test Cultivation Program for Sequential VLSI
Circuits," Intl. Conf. Computer-Aided Design, Nov. 1992, pp. 216-219.
[3] E. M. Rudnick, J. H. Patel, G. S. Greenstein and T. M. Niermann, "Sequential Circuit Test Generation
in a Genetic Algorithm Framework", in Proc. Design Autom. Conf., June 1994, pp. 698-704.
[4] P. Prinetto, M. Rebaudengo and M. Sonza Reorda, "An Automatic Test Pattern Generator for Large
Sequential Circuits based on Genetic Algorithms", in Proc. 1994 Intl. Test Conf., Oct. 1994, pp. 240-249.
[5] F. Corno, P. Prinetto, M. Rebaudengo, M. Sonza Reorda and R. Mosca, "Advanced Techniques for
GA-based Sequential ATPGs", in Proc. 1996 Europ. Design & Test Conf., March 1996, pp. 375-379.
[6] D. G. Saab, Y. G. Saab and J. A. Abraham, "Iterative [Simulation-Based Genetics + Deterministic
Techniques] = Complete ATPG", in Proc. 1994 Intl. Conf. on Computer-Aided Design, Nov. 1994,
pp. 40-43.
[7] E. M. Rudnick and J. H. Patel, "Combining Deterministic and Genetic Approaches for Sequential
Circuit Test Generation", in Proc. 32nd Design Autom. Conf., June 1995, pp. 183-188.
[8] I. Pomeranz and S. M. Reddy, "On Improving Genetic Optimization based Test Generation", in Proc.
1997 European Design & Test Conf., March 1997, pp. 506-511.
[9] V. D. Agrawal, K. T. Cheng, and P. Agrawal, "A Directed Search Method for Test Generation Using
Concurrent Simulator," IEEE Trans. on Computer-Aided Design, Feb. 1989, pp. 131-138.
[10] I. Pomeranz and S. M. Reddy, "LOCSTEP: A Logic Simulation Based Test Generation Procedure", in
Proc. 25th Fault-Tolerant Computing Symp., June 1995, pp. 110-119.
[11] I. Pomeranz and S. M. Reddy, "ACTIV-LOCSTEP: A Test Generation Procedure Based on Logic
Simulation and Fault Activation", in Proc. 27th Fault-Tolerant Comp. Symp., June 1997, pp. 144-151.
[12] R. Guo, S. M. Reddy and I. Pomeranz, "PROPTEST: A Property Based Test Pattern Generator for
Sequential Circuits Using Test Compaction", in Proc. 36th Design Autom. Conf., June 1999, pp. 653-659.
[13] L. Nachman, K. K. Saluja, S. Upadhyaya and R. Reuse, "Random Pattern Testing for Sequential
Circuits Revisited", in Proc. 26th Fault-Tolerant Computing Symp., June 1996, pp. 44-52.
[14] I. Pomeranz and S. M. Reddy, "Static Test Compaction for Scan-Based Designs to Reduce Test
Application Time", in Proc. 7th Asian Test Symp., Dec. 1998, pp. 198-203.
ICCAD2000, Pages 550-556
Test Generation for Acyclic Sequential Circuits with Hold Registers
Tomoo Inoue†, Debesh Kumar Das*, Chiiho Sano‡, Takahiro Mihara**, and Hideo
Fujiwara‡
† Faculty of Information Sciences, Hiroshima City University, Hiroshima 731-3194, Japan
*Computer Science and Engineering Department, Jadavpur University, Calcutta 700 032, India
‡ Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101,
Japan
**Mitsubishi Electronic Control Software Corporation, Kobe 652-0871, Japan
Abstract
We present a method of test generation for acyclic sequential circuits with hold registers.
A complete (100% fault efficiency) test sequence for an acyclic sequential circuit can be
obtained by applying a combinational test generator to all the maximal time-expansion
models (TEMs) of the circuit. We propose a class of acyclic sequential circuits for which
the number of maximal TEMs is one, i.e., the maximum TEM exists. For a circuit in the
class, test generation can be performed by using only the maximum TEM.
The proposed class of sequential circuits with the maximum TEM properly includes
several known classes of acyclic sequential circuits such as balanced structures and
acyclic sequential circuits without hold registers for which test generation can also be
performed by using a combinational test generator. Therefore, in general, the hardware
overhead for partial scan based on the proposed structure is smaller than that based on
balanced or acyclic sequential structure without hold registers.
References
[1] H. Fujiwara, Logic Testing and Design for Testability, The MIT Press, 1985.
[2] M. Abramovici, M.A. Breuer, and A.D. Friedman, Digital Systems Testing and Testable Design,
Computer Science Press, 1990.
[3] K.-T. Cheng and V.D. Agrawal, “A partial scan method for sequential circuits with feedback,” IEEE
Trans. Comput., Vol.39, No.4, pp.544–548, Apr. 1990.
[4] D.H. Lee and S.M. Reddy, “On determining scan flip-flops in partial-scan design approach,” in Proc.
Int. Conf. Computer-Aided Design, pp.322–325, Nov. 1990.
[5] R. Gupta, R. Gupta, and M.A. Breuer, “The BALLAST methodology for structured partial scan
design,” IEEE Trans. Comput., Vol.39, No.4, pp.538–544, Apr. 1990.
[6] A. Balakrishnan and S.T. Chakradhar, “Sequential circuits with combinational test generation
complexity,” in Proc. Int. Conf. on VLSI Design, Jan. 1996, pp.111–117.
[7] T. Takasaki, T. Inoue, and H. Fujiwara, “Partial scan design methods based on internally balanced
structure,” in Proc. Asia & South Pacific Design Automation Conf., 1998, pp.211–216.
[8] R. Gupta and M.A. Breuer, “Partial scan design of register-transfer level circuits,” Journal of Electronic
Testing: Theory and Applications, Vol.7, pp.25–46, 1995.
[9] A. Kunzmann and H.-J. Wunderlich, “An analytical approach to the partial scan problem,” Journal of
Electronic Testing: Theory and Applications, Vol.1, pp.163-174, 1990.
[10] R. Gupta and M.A. Breuer, “Testability properties of acyclic structures and applications to partial scan
design,” in Proc. IEEE VLSI Test Symp., 1992, pp.49–54.
[11] Tomoo Inoue, Toshinori Hosokawa, Takahiro Mihara, and Hideo Fujiwara, “An optimal time
expansion model based on combinational test generation for RT level circuits,” in Proc. IEEE Asian Test
Symp., 1998, pp.190-197.
ICCAD2000, Pages 557-561
A Parametric Test Method for Analog Components
in Integrated Mixed-Signal Circuits
M. Pronath, V. Gloeckel, H. Graeb
Institute of Electronic Design Automation, Technical University of Munich
Abstract
In this paper, we present a novel approach to use test stimuli generated by digital
components of a mixed-signal circuit for testing its analog components. A wavelet
transform is applied to the response signal of the device under test (DUT). We show
that, in comparison to the Fourier transform or no transform at all, particular properties of this
transformation are advantageous for mixed-signal test and especially built-in self test.
We introduce a new method for test measurement selection based on a non-deterministic
parametric fault model for analog circuits. This approach allows for noise and
measurement error in testing. We show how test quality can be optimized in the
presented fault model. Our test methodology is demonstrated on an analog CMOS
bandpass filter.
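As a rough illustration of why a wavelet transform of the DUT response can be attractive, the sketch below computes a discrete Haar wavelet decomposition. The abstract does not state which wavelet the authors use, so the Haar wavelet is an assumption made purely for brevity, and the function names haar_step and haar_dwt are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise sums and differences."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Output order: [final approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details = detail + details  # coarser levels go in front
    return approx + details


if __name__ == "__main__":
    # A single-sample spike, standing in for a short transient in a response signal.
    spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    coeffs = haar_dwt(spike)
    nonzero = [i for i, c in enumerate(coeffs) if abs(c) > 1e-12]
    print(coeffs)
    print("nonzero coefficient indices:", nonzero)
```

A time-localized event such as the spike above touches only one detail coefficient per decomposition level, whereas its Fourier spectrum is spread over all frequencies. The transform uses only additions, subtractions, and a fixed scaling, which hints at why such transforms suit on-chip evaluation, though the paper's actual architecture is not reproduced here.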
References
[1] Linda S. Milor, “A tutorial introduction to research on analog and mixed-signal circuit testing,” IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 10, pp. 1389–
1407, Oct. 1998.
[2] “Mixed-signal BIST,” http://www.logicvision.com/solution/tb_msbist.htm.
[3] L. Milor and A. L. Sangiovanni-Vincentelli, “Minimizing production test time to detect faults in analog
circuits,” IEEE Transactions on Computer-Aided Design of Circuits and Systems, vol. 13, pp. 796–813,
1994.
[4] G. Hemink, B. Meijer, and H. Kerkhoff, “Testability analysis of analog systems,” IEEE Transactions
on Computer-Aided Design of Circuits and Systems, vol. 9, pp. 573–583, 1990.
[5] Zhihua Wang, Georges Gielen, and Willy Sansen, “Probabilistic fault detection and the selection of
measurements for analog integrated circuits,” IEEE Transactions on Computer-Aided Design of Circuits
and Systems, vol. 17, no. 9, pp. 862–872, 1998.
[6] Giri Devarayanadurg, Mani Soma, Prashant Goteti, and Sam D. Huynh, “Test set selection for structural
faults in analog IC’s,” IEEE Transactions on Computer-Aided Design of Circuits and Systems, vol. 18, no.
7, pp. 1026–1038, July 1999.
[7] Sam D. Huynh, Seongwon Kim, Mani Soma, and Jinyan Zhang, “Automatic analog test signal
generation using multifrequency analysis,” IEEE Transactions on Circuits and Systems II: Analog and
Digital Signal Processing, vol. 46, no. 5, pp. 565–576, May 1999.
[8] Walter M. Lindermeir, Helmut E. Graeb, and Kurt J. Antreich, “Analog testing by characteristic
observation inference,” IEEE Transactions on Computer-Aided Design of Circuits and Systems, vol. 18, no.
9, pp. 1353–1368, Sept. 1999.
[9] Chen-Yang Pan and Kwang-Ting (Tim) Cheng, “Test generation for linear time-invariant analog
circuits,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no.
5, pp. 554–564, May 1999.
[10] M. Ohletz, “Hybrid built-in self-test for mixed analog/digital integrated circuits,” in European Test
Conference (ETC), 1991, pp. 307–316.
[11] Evan M. Hawrysh and Gordon W. Roberts, “An integration of memory-based analog signal generation
into current DFT architectures,” in IEEE International Test Conference (ITC), 1996, pp. 528–537.
[12] Jeongjin Roh and Jacob A. Abraham, “Subband filtering scheme for analog and mixed-signal circuit
testing,” in IEEE International Test Conference (ITC), 1999, pp. 221–229.
[13] P. N. Variyam, A. Chatterjee, and N. Nagi, “Low-cost and efficient digital-compatible BIST for
analog circuits using pulse response sampling,” in 15th IEEE VLSI Test Symposium, 1997, pp. 261–266.
[14] K. Arabi and B. Kaminska, “Oscillation built-in self test (OBIST) scheme for functional and structural
testing of analog and mixed-signal integrated circuits,” in IEEE International Test Conference (ITC), 1997,
pp. 786–795.
[15] Ingrid Daubechies, “Orthonormal bases of compactly supported wavelets,” Comm. Pure Appl. Math.,
vol. 41, no. 7, pp. 909–996, 1988.
[16] Sven Simon, Peter Rieder, and Josef A. Nossek, “Efficient VLSI suited architectures for discrete
wavelet transforms,” in VLSI Signal Processing, vol. IX, pp. 388–397. IEEE, New York, NY, USA, 1996.
[17] B. R. Epstein, M. Czigler, and S. R. Miller, “Fault detection and classification in linear integrated
circuits: An application of discrimination analysis and hypothesis testing,” IEEE Transactions on
Computer-Aided Design of Circuits and Systems, vol. 12, pp. 102–113, 1993.
[18] Chieh-Yuan Chao, Hung-Jen Lin, and Linda Milor, “Optimal testing of VLSI analog circuits,” IEEE
Transactions on Computer-Aided Design of Circuits and Systems, vol. 16, no. 1, pp. 58–77, Jan. 1997.
[19] F. Bouwman, T. Zwemstra, S. Hartanto, and K. Baker, “Application of joint time-frequency analysis
in mixed signal testing,” in IEEE International Test Conference (ITC), 1994, pp. 747–756.
[20] Takahiro Yamaguchi, “Static testing of ADCs using wavelet transform,” in Sixth Asian Test
Symposium, 1997, pp. 188–193.
[21] G. N. Stenbakken, T. M. Souders, and G. W. Stewart, “Ambiguity groups and testability,” IEEE
Transactions on Instrumentation and Measurement, vol. 38, no. 5, pp. 941–947, Oct. 1989.
[22] J. V. Spaandonk and T. A. M. Kevenaar, “Selecting measurements to test the functional behavior of
analog circuits,” Journal of Electronic Testing, vol. 9, pp. 9–18, 1996.
[23] Stephen Sunter and Naveena Nagi, “Test metrics for analog parametric faults,” in 17th IEEE VLSI Test
Symposium, 1999, pp. 226–234.
[24] J. A. Anderson, “Separate sample logistic discrimination,” Biometrika, pp. 19–35, 1972.
[25] K. Antreich, H. Graeb, and C. Wieser, “Circuit analysis and optimization driven by worst-case
distances,” IEEE Transactions on Computer-Aided Design of Circuits and Systems, vol. 13, no. 1, pp. 57–
71, Jan. 1994.
[26] U. Feldmann, U. Wever, Q. Zheng, R. Schultz, and H. Wriedt, “Algorithms for modern circuit
simulation,” Archiv für Elektronik und Übertragungstechnik (AEÜ), vol. 46, pp. 274–285, 1992.
[27] K. Antreich, J. Eckmueller, H. Graeb, M. Pronath, F. Schenkel, R. Schwencker, and S. Zizala,
“WiCkeD: Analog circuit synthesis incorporating mismatch,” in IEEE Custom Integrated Circuits
Conference (CICC), 2000.
ICCAD2000, Pages 562-567
Partial Simulation-Driven ATPG for Detection and Diagnosis of Faults in
Analog Circuits
Sudip Chakrabarti and Abhijit Chatterjee
School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA.
Abstract
In this paper, we propose a novel fault-oriented test generation methodology for detection
and isolation of faults in analog circuits. Given the description of the circuit-under-test,
the proposed test generator computes the optimal transient test stimuli in order to detect
and isolate a given set of faults. It also computes the optimal set of test nodes to probe at,
and the time instants to make measurements. The test generation program accommodates
the effects introduced by component tolerances and measurement inaccuracy, and can be
tailored to fit the signal generation capabilities of a hardware tester. Experimental results
show that the proposed technique can be applied to generate transient tests for both linear
and non-linear analog circuits of moderate complexity in reasonably little CPU time. This
will significantly impact the test development costs for an analog circuit and will
decrease the time-to-market of a product. Finally, the short duration and the easy-to-apply
feature of the test stimuli will lead to significant reduction in production test costs.
References
[1] M. J. Marlett and J. A. Abraham, “DC-IATP: An Iterative Analog Circuit Test Generation Program for
Generating DC Single Pattern Tests,” Proc. Int’l Test Conference, 1988, pp. 839-845.
[2] L. Milor and V. Visvanathan, “Detection of Catastrophic Faults in Analog Integrated Circuits,” IEEE
Trans. on CAD, Vol. 8, 1989, pp. 114-130.
[3] N. B. Hamida and B. Kaminska, “Analog Circuit Testing based on Sensitivity Computation and New
Circuit Modeling,” Proc. Int’l Test Conference, 1993, pp. 331-343.
[4] M. Slamani, B. Kaminska and G. Quesnel, “An Integrated Approach for Analog Circuit Testing with
Minimum Number of Detected Parameters,” Proc. Int’l Test Conference, 1994, pp. 631-640.
[5] S. Tsai, “Test Vector Generation for Linear Analog Devices,” Proc. Int’l Test Conference, 1991, pp.
592-597.
[6] G. Devarayanadurg and M. Soma, “Dynamic Test Signal Design for Analog ICs,” Proc. Int’l
Conference for CAD, 1995, pp. 627-629.
[7] H. H. Zheng, A. Balivada and J. A. Abraham, “A Novel Test Generation Approach for Parametric
Faults in Linear Analog Circuits,” Proc. VLSI Test Symposium, 1996, pp. 470-475.
[8] S. Chakrabarti and A. Chatterjee, “Diagnostic Test Pattern Generation for Analog Circuits Using
Hierarchical Models,” Proceedings, International Conference on VLSI Design, January 1999, pp. 518-523.
[9] R. Voorakaranam, S. Chakrabarti, J. Hou, A. Gomes, S. Cherubal, A. Chatterjee, W. Kao, “Hierarchical
Specification-Driven Analog Fault Modeling for Efficient Fault Simulation and Diagnosis,” Proc. Int’l Test
Conference, 1997, pp. 903-912.
[10] H. Walker, S. W. Director, “VLASIC: A Catastrophic Fault Yield Simulator for Integrated Circuits,”
IEEE Trans on CAD, Vol. CAD-5, No. 4, October 1986, pp. 541-546.
[11] A.V. Gomes, R. Voorakaranam and A. Chatterjee, “Modular fault simulation of mixed signal circuits
with fault ranking by severity,” Proc. IEEE Int’l Symposium on Defect and Fault Tolerance in VLSI
Systems, 1998, pp. 341-348.
[12] J.W. Bandler and A.E. Salama, “Fault Diagnosis of Analog Circuits”, Proc. IEEE, Vol. 73, August
1985, pp. 1279-1325.
[13] J. Hou and A. Chatterjee, “CONCERT: A Concurrent Fault Simulator for Analog Circuits,” Proc. Int’l
Conference on CAD, 1998, pp. 384-391.
[14] P. M. Lin and Y. S. Elcherif, “Analog Circuits Fault Dictionary - New Approaches and
Implementation,” Circuit Theory and Applications, 1985.
ICCAD2000, Pages 569-573
System and architecture-level power reduction of microprocessor-based
communication and multi-media applications
Lode Nachtergaele
IMEC, Leuven, Belgium
Vivek Tiwari
Intel Corp., Santa Clara, CA
Nikil Dutt
U.C. Irvine, CA
Introduction
Current microprocessor architectures are increasingly dominated by the data
access bottlenecks in the cache, system bus and main memory subsystems. These also
have a major influence on the system (board-level) power consumption. In practice this
means lower energy consumption for a given throughput requirement.
In the booming domain of (largely embedded) cost-sensitive communication and multi-media
applications, more and more implementations make use of microprocessor-based
platforms for flexibility reasons.
However, in order to provide sufficiently high data throughput at reasonable power
consumption for these demanding applications, novel solutions for the memory access
and data transfer will have to be introduced. These will have to be situated both at the
processor architecture and the algorithm/compiler level.
The question we want to address in this paper and tutorial is what these solutions would
look like. We will show that they will be based on processor architecture optimizations,
on novel approaches in the application of compiler technology, and on exploiting the
interface between the system hardware and software.
References
[1] A. Chandrakasan and R. Brodersen, “Low Power Digital CMOS Design,” Kluwer Academic Press,
1998.
[2] IEEE Transaction on Computer Architecture Newsletter, special issue on “Interaction between
Compilers and Computer Architectures”, June 1997.
[3] R. Bahar, G. Albera, and S. Manne, “Power and Performance Tradeoffs Using Various Caching
Strategies,” Proc. Intl. Symposium on Low-Power Electronics and Design, 1998.
[4] M. Kin and W. Mangione-Smith, “The Filter Cache: An Energy Efficient Memory Structure,” Proc.
Micro30, 1997.
[5] M. Azam, P. Franzon, W. Liu, and T. Conte, “Low Power Data Processing by Elimination of
Redundant Computations,” Proc. Intl. Symposium on Low-Power Electronics and Design, 1997.
[6] D. Brooks and M. Martonosi, “Dynamically Exploiting Narrow Width Operands to Improve Processor
Power and Performance,” Proc. HPCA-5, 1999.
[7] S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,”
Proc. ISCA-25, 1998.
[8] N. Bellas, “Architectural and Compiler Techniques for Energy Reduction in High Performance
Microprocessors,” Ph.D. Thesis, Univ. of Illinois at Urbana-Champaign.
[9] ACPI, http://www.teleport.com/~acpi/
[10] Y-H. Lu, E-Y. Chung, T. Simunic, L. Benini, and G. De Micheli, “Quantitative Comparison of Power
Management Algorithms,” Proc. DATE, 2000.
[11] T. Burd et al., “A Dynamic Voltage Scaled Microprocessor System,” Proc. ISSCC 2000.
[12] L. Nachtergaele, T. Gijbels, J. Bormans, F. Catthoor, M. Engels, “Power and speed-efficient code
transformation of multi-media algorithms for RISC processors”, IEEE Workshop on Multimedia Signal
Processing, Los Angeles, California, USA, December 7-9, 1998, pp. 317-322.
[13] F. Catthoor, S. Wuytack, E. DeGreef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, “Custom
Memory Management Methodology,” Kluwer Academic Press, 1998.
[14] L. Benini, G. De Micheli, “System-level power optimization techniques and tools”, ACM Trans. on
Design Automation for Embedded Systems (TODAES), Vol.5, No.2, pp.115-192, April 2000.
[15] K. Danckaert, F. Catthoor, H. De Man, “Platform independent data transfer and storage exploration
illustrated on a parallel cavity detection algorithm”, Proc. ACM Conf. on Par. and Dist. Proc. Techniques
and Applications, PDPTA'99, Vol.III, pp.1669-1675, Las Vegas NV, June 1999.
[16] V. Tiwari, S. Malik, A. Wolfe, and T.C. Lee, “Instruction Level Power Analysis and Optimization of
Software”, Journal of VLSI Signal Processing Systems, Vol. 13, No. 2, August 1996.
[17] P. Panda, N. Dutt, and A. Nicolau, “Memory Issues in Embedded Systems-on-Chip: Optimizations
and Exploration,” Kluwer Academic Press, 1999.
[18] W. Shiue and C. Chakrabarti, “Memory Exploration for Low Power Embedded Systems,” Proc. 36th
Design Automation Conference, 1999.
[19] C. Kulkarni, F. Catthoor, H. De Man, “Advanced Data Layout Organization for Multi-media
Applications,” Proc. IPDPS Workshop on Parallel, Distributed Computing in Image Processing, Video
Processing and Multimedia, 2000.
[20] H. Tomiyama, T. Ishihara, A. Inoue, and H. Yasuura, “Instruction Scheduling for Power Reduction in
Processor-based System Design,” Proc. Conference on Design, Automation, and Test in Europe, 1998.
[21] P. Grun, N. Dutt and A. Nicolau, “Memory-aware Compilation through Accurate Timing Extraction,”
Proc. 37th Design Automation Conference, 2000.
[22] P. Grun, N. Dutt and A. Nicolau, “MIST: An Algorithm for Memory Miss Traffic Management,” Proc.
International Conference on Computer-Aided Design, 2000.
[23] S. Rixner, W. Dally, U. Kapasi, P. Mattson and J. Owens, “Memory Access Scheduling,” Proc. 27th
International Symposium on Computer Architecture, 2000.
[24] D. Brooks, V. Tiwari and M. Martonosi, “Wattch: A Framework for Architectural-Level Power
Analysis and Optimizations,” Proc. 27th International Symposium on Computer Architecture, 2000.
[25] N. Vijaykrishnan, M. Kandemir, M. Irwin, H. Kim, and W. Ye, “Energy-Driven Integrated
Hardware-Software Optimizations using SimplePower,” Proc. 27th International Symposium on Computer
Architecture, 2000.
[26] M. Kandemir, N. Vijaykrishnan, M. Irwin, and W. Ye, “Influence of Compiler Optimizations on
System Power,” Proc. 37th Design Automation Conference, 2000.
[27] D. Marculescu, “Profile-Driven Code Execution for Low Power Dissipation,” Proc. International
Symposium on Low Power Electronics and Design, 2000.
[28] The COPPER Project: Compiler-Controlled Continuous Power-Performance Management, The Center
for Embedded Computer Systems, University of California, Irvine. http://www.cecs.uci.edu/~copper
ICCAD2000, Page 575
Design-Manufacturing Interface for 0.13 micron and Below
Andrzej J. Strojwas
Department of ECE, Carnegie Mellon University, Pittsburgh, PA 15213
& PDF Solutions, Inc., San Jose, CA 95110
SUMMARY
Over the years, the increase in IC functionality has been achieved by a continuous drive
towards smaller feature sizes. Due to the decreasing dimensions of semiconductor
structures, the sensitivity to critical design and manufacturing parameters has risen
dramatically. Vertical integration techniques and multi-level interconnect, which are
becoming more common in modern technologies, have driven up the number of critical
processing steps to several hundreds. These trends are expected to continue for the next
several decades. The 0.13 micron technology is around the corner, as well as 300mm
wafers. The increase in IC functionality has come with skyrocketing capital spending
(more than $2 billion per fabrication facility). Moreover, the product life cycles for
leading edge IC's have become very short (less than 2 years)…