
An Automatic Circuit Design Framework For Level Shifter Circuits


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 41, NO. 12, DECEMBER 2022, p. 5169

Jiwoo Hong, Graduate Student Member, IEEE, Sunghoon Kim, Graduate Student Member, IEEE, and Dongsuk Jeon, Member, IEEE

Abstract: Although design automation is a key enabler of modern large-scale digital systems, automating the transistor-level circuit design process still remains a challenge. Some recent works suggest that deep learning algorithms could be adopted to find optimal transistor dimensions in relatively small circuitry such as analog amplifiers. However, those approaches are not capable of exploring different circuit structures to meet the given design constraints. In this work, we propose an automatic circuit design framework that can generate practical circuit structures from scratch as well as optimize the size of each transistor, considering performance and reliability. We employ the framework to design level shifter circuits, and the experimental results show that the framework produces novel level shifter circuit topologies and the automatically optimized designs achieve 2.8×–5.3× lower power-delay product (PDP) than prior art designed by human experts.

Index Terms: Circuit design automation, deep learning, evolutionary algorithm, level shifter, reinforcement learning (RL).

Manuscript received 25 June 2021; revised 10 December 2021; accepted 10 February 2022. Date of publication 1 March 2022; date of current version 22 November 2022. This work was supported in part by the National Research Foundation of Korea under Grant NRF-2020R1A4A4079177 and Grant NRF-2019R1C1C1004927, and in part by the IC Design Education Center. This article was recommended by Associate Editor P. Li. (Corresponding author: Dongsuk Jeon.)
The authors are with the Graduate School of Convergence Science and Technology, the Research Institute for Convergence Science, and the Inter-University Semiconductor Research Center, Seoul National University, Seoul 08826, South Korea (e-mail: djeon1@snu.ac.kr).
Digital Object Identifier 10.1109/TCAD.2022.3155444
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

With increasing hardware design complexity and variability of the fabrication process, design automation has been widely adopted in a large portion of the IC design process. For instance, various electronic design automation (EDA) tools are now available for designing digital blocks and System-on-Chips (SoCs). The EDA tools can generate a large block composed of millions of logic gates very efficiently using a standard cell library [1]. However, when it comes to designing integrated circuits at the transistor level, design automation remains a challenge. Most digital and analog circuits are still carefully designed by human experts due to high design complexity and reliability concerns [2].

Integrated circuit design automation can be decomposed into two design problems: 1) circuit topology selection and 2) transistor size optimization. It is crucial to choose a proper circuit topology in the first place since the topology mainly sets the limit on the performance and reliability a circuit can achieve. We also need to optimize the size of each transistor to realize its true potential. Various approaches have been reported for automatic circuit topology generation. For digital logic gates, the Boolean expression factoring method that generates series-parallel (SP) associations of transistors for the given function was suggested [3]. Possani et al. [4] proposed an improved graph-based method that creates a logic gate by introducing nonseries-parallel (NSP) arrangements into the SP structure, thus reducing the number of transistors. But these methods regard transistors as ideal switches and, hence, are only applicable to designing digital logic gates based on static operations.

There have also been several circuit topology synthesis approaches aimed at more general integrated circuits. The library-based methods [5], [6] select one of the predefined circuit structures (e.g., a two-stage amplifier) in the library based on the desired operating characteristics. However, one must construct a library containing all possible circuit structures in advance, which is a time-consuming process that also necessitates a considerable amount of human effort. Building-block-based methods [7]–[9] take a similar approach, but rely on a library of smaller building blocks, such as a current mirror and a differential input pair. They employ various algorithms to search for the best topology, such as the multiobjective evolutionary algorithm [7], framework for explorative analog topology synthesis method (FEATS) [8], and graph-grammar-based topology generation (GGTG) [9]. Since the library- or building-block-based approaches have relatively limited search space, they are suitable for the fast generation of integrated circuits using a well-established topology. However, the search space is constrained within the predefined set of circuit structures or building blocks and, hence, they are less adaptive to changes in design parameters or fabrication process. In addition, there is little possibility that they could generate a novel topology that has not been studied yet.

On the other hand, the transistor-based methods [10]–[13] do not rely on predefined components for topology generation; instead, they progressively construct a circuit by adding or removing a transistor in the topology. For instance, the circuit-constructing robot (CC-BOT) [11] starts with a single node and conditionally adds a transistor following an evolutionary algorithm. An active bot moves to a newly created node and continues adding transistors from there. The algorithm in [13] represents transistors and passive devices as a 3-node graph (hypergraph) and an edge, respectively. In each generation, it removes and adds multiple hypergraphs and edges, also following an evolutionary algorithm. The transistor-based approaches have a significantly larger search space and, as a result, are capable of generating an optimal circuit topology under different design constraints. These approaches do not require much prior knowledge on the target circuit, removing the need for the aid of human experts during the design process. However, they essentially rely on trial and error; an inefficient search algorithm results in very slow search speed, requiring an extensive number of SPICE simulations to evaluate candidate integrated circuits.
Transistor sizing is another crucial part of integrated circuit
design automation since it directly affects the performance
and reliability of a circuit. Liu et al. [14] proposed the
multiobjective uncertain optimization with ordinal optimization
LSS and parallel computation (MOOLP) optimizer based
on a differential evolutionary algorithm, whereas other prior
works suggest using particle swarm optimization [15], [16]
or Bayesian methods [17], [18]. Recent works demonstrate
promising results by applying deep learning algorithms to
transistor sizing optimization. For instance, learning to design
circuit (L2DC) [19] and AutoCkt [20] adopt reinforcement
learning (RL) for optimizing transistor sizes in analog ampli-
fiers. It was demonstrated that those RL-based approaches could
successfully optimize the integrated circuit to meet the given
design constraints, such as gain, bandwidth, and input-referred
noise. While the RL-based methods achieve significantly faster
convergence than conventional optimization algorithms, they
still need numerous SPICE simulations during optimization.
Also, both L2DC and AutoCkt utilize prior knowledge on
the circuit topology during optimization (e.g., the signal path
and tightly coupled transistors), limiting their applicability to
other types of integrated circuits. To address these problems,
Wang et al. [21] employed a graph convolutional network in RL (GCN-RL) and utilized transfer learning. They show that using a pretrained network can reduce the number of SPICE simulations in optimizing two-stage and three-stage transimpedance amplifiers. However, the initial training of the neural network still requires a large number of SPICE simulations, and the pretrained network is only effective when applied to another circuit with a similar structure.

Level shifter circuits are widely used in digital systems to convert the level of signals between voltage domains. The signals from the internal core must be boosted to communicate with external discrete components, or the data between core blocks with different supply voltages should be level converted for proper operation. Various level shifter circuit topologies have been studied, and the optimal topology can vary greatly depending on the operating conditions, such as voltage conversion range (e.g., core to core or core to I/O) and power budget. For instance, differential cascode voltage switch (DCVS) exhibits better conversion speed and energy efficiency for conversion between core voltage domains, whereas the Wilson current mirror level shifter (WCMLS) and its variant are suitable for converting subthreshold voltage input due to relaxed contention [22]–[24]. Therefore, we may have to switch to a totally different topology and start the design process again when the design constraints and operating conditions change, making conventional circuit design automation frameworks unsuitable for level shifter design.

Fig. 1. Overview of the proposed circuit design framework.

In this work, we propose a unified circuit design automation framework that can generate an optimal circuit topology from scratch as well as optimize the size of each transistor. Our key contributions are as follows.
1) A 2-stage circuit design framework that significantly speeds up the design process.
2) A new voltage-based graph representation of integrated circuits.
3) A fast circuit optimizer adopting a multiagent RL algorithm for faster convergence.
4) A process variation-aware optimization algorithm that results in a practical, robust design.
The framework was employed to design a level shifter circuit, and the resulting level shifter circuits were fabricated in a 180-nm CMOS process to validate the effectiveness of the proposed circuit design framework.

The remainder of the article elaborates on the proposed framework as follows: Section II describes the overall architecture of the framework and its distinct features. Section III discusses the experimental results, and Section IV concludes the article.

II. PROPOSED CIRCUIT DESIGN FRAMEWORK

The overall flow of the proposed circuit design framework is shown in Fig. 1. Instead of relying on a single algorithm to
HONG et al.: AUTOMATIC CIRCUIT DESIGN FRAMEWORK FOR LEVEL SHIFTER CIRCUITS 5171

design a circuit, we propose to split the design process into two distinct stages. The first stage (topology generator) employs an evolutionary algorithm to search for candidate circuit structures quickly. The second stage (circuit optimizer) performs an RL-based transistor size optimization on the generated integrated circuits to maximize performance, while guaranteeing reliable operation under process variations. Each stage is described in detail in the following sections.

A. Topology Generator

In the topology generator, we represent each circuit topology as a graph and employ a graph generation algorithm to obtain candidate circuit structures. The graph-based method in [4] gives an example of expressing digital circuits as a graph, where pull-up and pull-down networks are generated separately, and each transistor corresponds to an edge in the graph. The two nodes connected by an edge define the source and drain, whereas the gate connection is defined as one of the node properties. However, this approach is not applicable to other types of integrated circuits that do not have separate pull-up and pull-down paths, where N-channel and P-channel MOSFET devices can be placed more arbitrarily. Therefore, we propose a generalized graph representation method suitable for a broader range of integrated circuits shown in Fig. 2. In our representation method, an edge (transistor) has gate and size properties, representing the net connected to the gate and transistor size. A node has a type property that represents the net type (e.g., input port, output port, supply, ground, and internal net). Additionally, we introduce a new property, voltage, in the nodes. This property represents a relative voltage of each node and has a range of [−1, 1]. The voltage of an edge is obtained by averaging the voltages of the nodes on both ends (source and drain). The edges with positive voltage translate to P-channel MOSFET devices, whereas the edges having zero or negative voltage represent N-channel MOSFET devices. This method allows for generating more generalized circuit structures while preserving a common circuit property that P-channel MOSFET devices are typically placed near the power supply voltage, whereas N-channel MOSFET devices are biased at lower voltages to maximize operation range.

Fig. 2. Examples of proposed graph-based circuit representation. (a) Simple circuit with a pair of MOSFET devices. (b) Complex circuit where gates are connected to other nodes.

Algorithm 1 Topology Generator
Input: Population size N, Max Generations G, Mutation Probability
Output: Candidate Topologies
 1: P: Population, C: Offspring, P_0: Initial Population
 2: for g = 1, 2, ..., G do
 3:     Simulate all C_i (i ∈ N) and Calculate Fitness
 4:     Remove Stagnated Species and Extract Candidates
 5:     Calculate Fitness of Species s_k ∈ S_g
 6:     Calculate Reproduction Size R_k of each s_k
 7:     for all s_k in S_g do
 8:         Add Best Candidate in s_k to P_{g+1}
 9:         Make Parent Pool with N Top Candidates in s_k
10:         for j = 1, 2, ..., R_k do
11:             Crossover
12:             Mutate
13:             Add C_j to P_{g+1}
14:         end for
15:     end for
16:     Speciate P_{g+1}
17: end for

Since we aim to generate an optimal circuit topology without prior knowledge, we suggest employing an evolutionary algorithm in the topology generator. NeuroEvolution of Augmenting Topologies (NEAT) is a widely used evolutionary algorithm for exploring artificial neural network structures [25]. The algorithm starts from a simple network with a single fully connected layer, and the network evolves into more complex structures through crossover and mutation over generations. We modify the NEAT algorithm to make it suitable for circuit topology generation; we introduce the voltage concept into the NEAT algorithm, and the mutation functions and the properties of genes are heavily modified aimed at circuit topology generation.

Algorithm 1 details the proposed circuit topology generation algorithm. An offspring represents a candidate circuit topology and has node and connection genes. The node gene represents a node in the graph and has type, voltage and innovation number properties. The type property determines the type of the nodes (input port, output port, supply, ground, or internal net) and the voltage property represents its relative voltage, whereas the innovation number is a unique identifier. The connection genes represent edges in the graph with in, out, size, gate, and innovation number properties. The in and out properties define two endpoints of the edge (source and drain of the transistor), and the size property defines the relative strength of the transistor. As described above, since a transistor is a three-terminal device, the node to which the gate of the transistor is connected is defined by the gate property. The innovation number is a unique identifier. A population is a set of all offspring of the current generation. The population is divided into several species based on similarity. Each species has a base model, and the offspring close to the base model are included in the species.

The topology generator first creates an initial population $P_0$ which consists of offspring with only three node genes and

two connection genes: $V_{DD}$, $V_{SS}$, and an internal node connected by one P-channel MOSFET and N-channel MOSFET pair [Fig. 2(a)]. The gate of each transistor is randomly connected to a node other than $V_{DD}$ and $V_{SS}$. In each generation, evolution begins by calculating the fitness of the species in the current population. The algorithm converts each offspring into a netlist and runs a SPICE simulation to calculate the offspring's fitness based on the observed functionality and performance. Then, the fitness of the offspring included in the species is averaged to obtain the species fitness. The number of offspring that can reproduce from each species to the next generation is determined in proportion to the species fitness. During circuit topology generation, simulations are performed only at the typical (TT) corner to scan large search spaces and find the best candidates quickly.

Before reproducing a new population, the algorithm observes whether the fitness of the best offspring in each species has been improved or not in the last few generations. If the fitness of a species does not improve any further for a certain number of generations, then the species is considered stagnant, and offspring of that species are removed from the population. The evolution process is independently performed for each species. First, the offspring with the highest fitness in each species is automatically included in the population of the next generation. Next, a set of offspring with the highest fitness within each species is selected as a parent pool. Two offspring are randomly chosen from the pool and compared with each other, where the winner evolves through mutation and joins the population of the next generation.

The fitness function represents the performance and reliability of a circuit as a single value. We consider two types of design constraints for fitness calculation: 1) hard constraints and 2) soft constraints. The hard constraints are the set of design constraints that a circuit must satisfy (e.g., rail-to-rail output swing for level shifters), whereas the soft constraints indicate the design quality (e.g., power consumption and conversion delay of level shifters). The fitness of an offspring at the $x$th generation is calculated as

$$ \mathrm{fit}_x = \sum_{i \in H} \alpha_i f(q_{i,x}) + \prod_{i \in H} f(q_{i,x}) \left\{ \sum_{j \in S} \alpha_j f(q_{j,x}) \right\} \qquad (1) $$

where $\mathrm{fit}_x$ is the calculated fitness of an offspring, $q_{i,x}$ is the observed performance of the circuit in SPICE simulations corresponding to the $i$th constraint, $f(q_{i,x})$ is the score function for each constraint, $\alpha_i$ is the weight of the $i$th constraint, and $H$ and $S$ represent the sets of hard and soft constraints, respectively. This is similar to the reward function used in RL for circuit optimization in [19], but our approach has two distinct differences: 1) we use $\log(q_{i,x})$ instead of $q_{i,x}$ for the scores that have a large dynamic range and 2) the contribution of soft constraints in the fitness is regulated by the scores related to the hard constraints, instead of using a hyper-parameter manually tuned for a specific type of circuit. In early generations, it is highly likely that most offspring would fail to function properly. The scores related to the hard constraints would be very low, making the fitness largely dictated by the hard constraints. Hence, the algorithm focuses on finding feasible topologies that produce a desired output. Once the algorithm finds properly working circuit topologies, the scores related to the hard constraints saturate and do not affect the fitness. The remainder of the evolution process further modifies the topology to improve circuit performance.

The topology generator employs various mutation functions so that it can cover a wide range of circuit topologies. Note that the nodes without any connection (i.e., floating nodes) can be generated as a result of mutation. Hence, we label the nodes with one or more connections as active nodes, and only active nodes are selected for mutation. The types of mutations are discussed as follows.

1) Add Connection: This mutation randomly chooses two active nodes and connects them by adding a new edge. Since an edge corresponds to a transistor in the actual circuit, it links the gate of the new edge to one of the existing active nodes by updating the gate property.

2) Add Node: This inserts a new node in one of the edges. In other words, a single transistor is replaced with two stacked transistors. The gates of the stacked transistors are connected to the same node to which the gate of the original transistor was connected. This process is often used when designing a circuit to increase output resistance or minimize leakage current.

3) Add P-Channel MOSFET and N-Channel MOSFET Pair: A P-channel MOSFET and N-channel MOSFET pair makes a new connection between $V_{DD}$ and $V_{SS}$. If a single P-channel or N-channel MOSFET transistor is placed between $V_{DD}$ and $V_{SS}$, this will be just a current leaking path. Therefore, we place transistors as a pair of P-channel and N-channel MOSFET devices when making a new connection between the supply rails.

4) Change Gate: The gate of a transistor is connected to a different active node except for $V_{DD}$ and $V_{SS}$ nodes.

5) Remove Connection: This mutation randomly removes one of the connections, which allows for removing transistors from the current topology. This prevents the circuit from continuously becoming larger.

6) Change Size: The size of the connection genes represents the relative size (strength) of a transistor. Since our goal is to quickly go through a variety of circuit topologies and find promising candidates, we define each transistor's strength in only three steps: 1) strong; 2) medium; and 3) weak. During mutation, the transistor size randomly changes in each connection gene independently.

7) Change Output Port: This mutation changes the location of the output port. One of the active nodes is selected as an output.

In the original NEAT algorithm, each mutation function is randomly selected in each mutation. Hence, multiple types of mutations may be performed simultaneously. However, this may result in an excessive amount of change in a circuit. For instance, removing a transistor from the circuit and changing the gate connection of another transistor would produce a circuit with entirely different characteristics. Hence, we limit the mutation process to select only one of the add, remove or gate change mutations (mutations 1 through 5 above). In addition, other minor mutations (mutations 6 and 7) are independently

introduced with a certain probability. Let $P_{\mathrm{addNode}}$, $P_{\mathrm{addCon}}$, $P_{\mathrm{addPair}}$, $P_{\mathrm{changeGate}}$, and $P_{\mathrm{rmCon}}$ denote the probabilities of mutations 1 through 5 above. Then, the mutation process follows the equation below:

$$ P_{\mathrm{addNode}} + P_{\mathrm{addCon}} + P_{\mathrm{addPair}} + P_{\mathrm{changeGate}} + P_{\mathrm{rmCon}} = 1 \qquad (2) $$

During topology exploration, we do not want the algorithm to keep adding transistors indefinitely. Otherwise, the number of transistors in a circuit may explode, and the resulting circuit would be far from what we desire. For instance, an ideal analog amplifier or level shifter circuit typically has tens of transistors at most. Therefore, we balance the expected number of removed and added transistors in each mutation by enforcing the relationship as follows:

$$ 2P_{\mathrm{addNode}} + P_{\mathrm{addCon}} + 2P_{\mathrm{addPair}} - P_{\mathrm{rmCon}} = 0 \qquad (3) $$

since adding a node (a net in the circuit) adds two transistors, whereas adding or removing a connection adds or removes a single transistor in the circuit.

After a new generation is obtained by mutating all the offspring of the current generation, the newly generated offspring are grouped again into a set of species. Each offspring is compared to the base models of existing species. If the number of differences in the connection genes is below the threshold for one or more existing species, then the offspring joins the closest species. Otherwise, the offspring constitutes a new species and becomes its base model. After the grouping process is done, the population undergoes another iteration of the mutation process to obtain the next generation. This process continues until it reaches the maximum number of generations defined by the user.

The topology generator selects candidate topologies both during and at the end of the evolution. When a stagnant species is removed during evolution, the offspring with the best fitness in that species is selected and added to the candidate list if it meets all the given design constraints. When the algorithm finishes the last iteration, the same operation is performed on all the remaining species. Note that there may exist floating nodes and floating paths as a result of mutation. Before adding an offspring to the candidate list, the topology generator finds and removes the floating nodes and paths.

B. Circuit Optimizer

The topology generator is aimed at quickly finding promising circuit topologies. Hence, each transistor is only roughly sized during exploration (e.g., strong, medium, or weak). This accelerates the search process by significantly limiting the search space, but the size of each transistor must be further tuned for optimal performance. For this purpose, we employ an additional circuit optimizer as the second stage in the proposed circuit design framework.

The circuit optimizer adopts an RL algorithm to optimize candidate integrated circuits. Various RL algorithms have been used for circuit optimization. L2DC [19] and GCN-RL [21] are based on deep deterministic policy gradient (DDPG) [26], and AutoCkt [20] adopts proximal policy optimization (PPO) [27]. DDPG has an actor–critic structure and generally works well in continuous or high-dimensional action spaces. An agent collects and saves a sample into a replay memory. Then, a minibatch is randomly selected from the replay memory to train the network. While PPO also has an actor–critic structure suitable for training in continuous or high-dimensional action space, it does not have a replay memory. Instead, N agents collect samples in parallel during an episode which consists of T time steps, and a minibatch is constructed using the collected samples and used for training the algorithm. Then, all the samples are discarded. DDPG exhibits slower convergence during training since it only uses one agent contrary to PPO, but has the advantage of being able to reuse the samples stored in the replay memory. PPO trains the model more quickly by using multiple agents, but it only uses the samples collected in the current episode for training, which reduces sample efficiency. In circuit optimization, samples are obtained by running time-consuming SPICE simulations. Therefore, it is crucial to maximize sample efficiency (i.e., reduce the number of samples required for algorithm convergence) to speed up the circuit optimization process. To resolve this issue, we adopt the distributed distributional DDPG (D4PG) [28] algorithm in the circuit optimizer. D4PG supports both multiagent training and sample reuse by using a replay memory. Unlike DDPG and PPO, which express future rewards as a single scalar value, D4PG expresses rewards as a probability distribution. It models the inherent uncertainty imposed by function approximation in a continuous environment, resulting in better gradients and improving the training performance compared to DDPG. It also shows more stable performance when multiple agents are used [28].

In the RL algorithms using actor–critic structure, two different neural networks are typically employed: an actor network and a critic network. The actor network takes a state vector as an input and produces an action vector, whereas the critic network takes state and action vectors as inputs and predicts the reward value an agent is expected to receive as a result of the current and future actions. The RL algorithm trains those neural networks on the observed samples. As the complexity of the neural network increases with the dimension of input vectors, it is important to minimize the dimension of the input vector for faster optimization. Since the action vector represents relative size changes of all the transistors in the circuit, its dimension is fixed. Hence, we aim to optimize the critic network by reducing the dimension of the state vector. Specifically, we use the simulated circuit performance (e.g., power consumption and delay) and area as a state, instead of feeding each transistor's size or other characteristics (e.g., $V_{th}$, $V_{sat}$, and $\mu_0$) as done in prior works [19]–[21]. Therefore, the dimension of the state vector is independent of the number of transistors in the topology and the optimization process can be efficiently accelerated when the target circuit topology consists of many transistors.

The actor network creates an action based on the state obtained by SPICE simulations. An action represents a relative change in the size (width, length, and multiplier) of each transistor. If the target topology has $N$ transistors in total, the dimension of the action vector would be $3N$. Each dimension of the action vector has a value in [−1, 1]. Then, the amount

of change in the size of the ith transistor is

$$\Delta\mathrm{Size} = \mathrm{round}\left(\mathrm{Action} \cdot \frac{\mathrm{size}_{\max} - \mathrm{size}_{\min}}{L_{\mathrm{maxStep}}}\right) \tag{4}$$

where $\Delta\mathrm{Size}$ is the amount of change in transistor size, Action is the output of the actor network, $L_{\mathrm{maxStep}}$ is the number of steps in one episode, and $\mathrm{size}_{\max}$ and $\mathrm{size}_{\min}$ are the allowable maximum and minimum transistor sizes. This means that the maximum size change that can occur in one episode equals $\mathrm{size}_{\max} - \mathrm{size}_{\min}$. The size values are real numbers, so they are rounded to the closest values allowed in the given process before being converted to an actual circuit.

In D4PG, when a sample collected by one of the agents is stored in the replay memory, a minibatch is created by randomly choosing samples from the replay memory. However, as the training progresses, the number of samples stored in the memory becomes larger; thus, only a fraction of stored samples is used to generate a minibatch, reducing sample efficiency. In addition, the learner updates the network only once in each time step, and the SPICE simulation to obtain a new sample becomes the processing bottleneck. To address these issues, we propose to adopt a multiupdate technique that has been used for unbiased learning. When a sample is obtained by the agent and stored in the replay memory, unlike the conventional method of circuit optimization that creates one minibatch from the stored samples, we create several minibatches and update the critic and actor networks multiple times. This accelerates the circuit optimization process without time overheads since multiple updates could be performed while SPICE simulations are running. This scheme also allows for unbiased learning through random sampling that removes correlation between minibatches, reducing the possibility of overfitting.

Algorithm 2 Circuit Optimizer

Learner
Input: Number of Steps in Episode N, Batch Size M, Replay Memory Size R, Learning Rates α0 and β0, Multi-Update Parameter U
 1: Determine Network Size by Analyzing Netlist
 2: Initialize Network Weights with Kaiming Initialization
 3: for i = 1, 2, . . . , N do
 4:     Wait for Samples from Agents
 5:     for j = 1, 2, . . . , U do
 6:         Randomly Choose M Samples from Replay Memory
 7:         Compute Updates of Actor and Critic Networks Using Samples
 8:         Update Network Parameters
 9:     end for
10: end for

Agent
Input: Number of Steps in Episode N, Number of Actors P, Episode Early Stopping Interval T
 1: repeat
 2:     Initialize Episode
 3:     Copy Actor Network from Learner
 4:     for step = 0, . . . , K do
 5:         Get Action from Actor Network and Change Size
 6:         Simulate and Calculate State (s) and Reward (r)
 7:         Send Sample to Learner
 8:     end for
 9:     Increase K every T Episodes
10: until Learner Finishes
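The size update in (4) and the learner's multiupdate loop (steps 5 to 9 of the Learner in Algorithm 2) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the clamping to the allowed range, the integer rounding grid, and the numeric sizes are assumptions made for the example, and `update_fn` stands in for the actual actor and critic update.

```python
import random


def size_step(action, size_min, size_max, max_steps):
    """Eq. (4): size change for one step, given an actor output in [-1, 1]."""
    # Rounding to an integer grid is an assumption; the paper rounds to the
    # closest value allowed by the process.
    return round(action * (size_max - size_min) / max_steps)


def apply_action(size, action, size_min, size_max, max_steps):
    """Apply one step and keep the result in the allowed range (assumed clamp)."""
    new_size = size + size_step(action, size_min, size_max, max_steps)
    return max(size_min, min(size_max, new_size))


def multi_update(replay_memory, batch_size, num_updates, update_fn, rng=random):
    """Draw num_updates independent random minibatches and update on each."""
    for _ in range(num_updates):
        k = min(batch_size, len(replay_memory))
        update_fn(rng.sample(replay_memory, k))


# Hypothetical numbers: sizes between 220 and 2200, 20 steps per episode.
print(size_step(1.0, 220, 2200, 20))  # 99
```

With these hypothetical numbers, twenty consecutive steps at Action = 1 add up to 1980, which is exactly the full allowed range, matching the statement that the maximum change per episode is the difference between the maximum and minimum sizes.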
At the beginning of training, the actor network tends to generate the same action even if the state changes gradually in each step. In other words, the size of a transistor continues to increase or decrease regardless of the current state. This is because the output is close to either 1 or −1 in most cases when the actor network weights are randomly initialized. The actor network typically uses the tanh function as the activation function. In a randomly initialized network, the output of the network, which is the input to the final tanh activation function, typically has an absolute value of 2 or larger, rendering the final output close to ±1. This effect is amplified by the fact that circuit performance is converted to a state using a logarithmic function. Even if the state changes, the sign of the action, which determines the direction of the size change (increase or decrease), is likely to stay the same. In addition, the weights of the actor network in each agent are updated only when an episode ends, and they remain fixed for all the steps within an episode. Therefore, in the first few episodes, the sizes of many transistors just move to the minimum or maximum value. This severely hinders circuit optimization by moving the design far from the initial point, which is already a near-optimal design found in the circuit topology generator. To solve this problem, we propose an episode early stopping technique that limits the number of steps in an episode in the early stage of training. As the learning progresses, it gradually increases the number of steps in each episode, and the episode finally proceeds with the maximum number of steps defined by a hyperparameter. This technique allows the network to learn more stably while acquiring more meaningful samples near the initial point in early episodes.

Algorithm 2 details the proposed circuit optimizer. The exploration agent gets an action by entering the current state into the actor network in each step. The algorithm uses the network output (action) to change the size of transistors with the corresponding action values and runs a SPICE simulation to obtain the reward and the state. The reward function in the circuit optimizer is identical to the fitness function (1) employed in the topology generator, except that the scores are obtained at different process corners, as explained later in this section. The current state, the action, the next state, and the calculated reward constitute a single sample and are written to the replay memory. Each time a new sample is sent to the replay memory, the optimizer creates multiple minibatches to update the neural networks. This update process continues until it reaches a user-defined maximum number of steps. Then, the best set of the parameters found in the course of training is selected as the final design.

TABLE I
EXPERIMENTAL SETUP FOR LEVEL SHIFTER DESIGN

L2DC [19] uses a recurrent neural network (RNN) in the actor network and multilayer perceptron (MLP) as the critic network. However, RNN is typically hard to train due to the vanishing and exploding gradient problems [29]. Also, the state is composed of the observed values (e.g., gm and Vth) of each transistor, and the order is determined by the signal path of the circuit, necessitating manual examination of the circuit topology. Instead, we use an MLP as the actor, as was done in AutoCkt [20], where the specifications of topology are combined into a state vector in an arbitrary order. Also, we initialize the weights of the MLP following the method in [30].
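As an illustration of an MLP actor whose weights are drawn with the initialization of [30] (the Kaiming initialization named in Algorithm 2), consider the sketch below. It is written in plain Python and is only an assumption-laden example: the state and action dimensions, the ReLU hidden activations, the omission of bias terms, and all function names are hypothetical, while the tanh output and the three hidden layers of 200 nodes follow the description in this paper.

```python
import math
import random


def he_init(fan_in, fan_out, rng):
    """Kaiming (He) normal initialization: zero mean, std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def build_actor(state_dim, action_dim, hidden=(200, 200, 200), seed=0):
    """Return one weight matrix per layer (biases omitted for brevity)."""
    rng = random.Random(seed)
    dims = [state_dim, *hidden, action_dim]
    return [he_init(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]


def actor_forward(weights, state):
    """Map a state vector to one action per transistor, each in [-1, 1]."""
    x = list(state)
    for layer_index, w in enumerate(weights):
        x = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        if layer_index < len(weights) - 1:
            x = [max(0.0, v) for v in x]  # ReLU hidden activation (assumption)
    return [math.tanh(v) for v in x]  # tanh bounds each action to [-1, 1]


# Hypothetical dimensions: 12 state entries, 6 transistors to size.
actor = build_actor(state_dim=12, action_dim=6)
actions = actor_forward(actor, [0.5] * 12)
```

Because the state is a flat vector, the order of its entries does not need to follow the signal path of the circuit, which is the property the text relies on when choosing an MLP over an RNN.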
While the circuit optimizer primarily focuses on maximizing
circuit performance, it is also very important to guarantee that
the circuit properly operates under process variations. Contrary
to prior works on circuit optimization [19]–[21], we run SPICE simulations at five different process corners (TT, FF, SS, FS, and SF). The optimizer constantly observes if the circuit meets the hard constraints at all corners during optimization. On the contrary, the scores related to the soft constraints are only measured at the TT corner. This allows the circuit to exhibit maximum performance at the corner of most concern while still guaranteeing proper functionality in the worst cases. Note that the Monte-Carlo analysis better captures the robustness of a circuit under process variation. However, since the size of each transistor continues to change during optimization, adopting the Monte-Carlo analysis will require a large number of SPICE simulations in each time step, incurring a large time overhead. On the contrary, the corner analysis requires only a few simulations for each design point and, hence, is more suitable for fast optimization.

III. EXPERIMENT RESULT

In the previous section, we presented a unified circuit design framework that automatically generates appropriate circuit topologies and further optimizes each design through finding an optimal size of each transistor. In this section, we experimentally verify the proposed circuit design framework. By employing the framework to design level shifter circuits, we demonstrate that the topology generator produces novel level shifter topologies, and the circuit optimizer successfully improves the design. Finally, the resulting level shifter designs are fabricated and compared against prior art designed by human experts. All experiments are conducted on a workstation running CentOS 7.4 with two Intel E5-2687W v4 processors, 128-GB DRAM, and an Nvidia GTX Titan X GPU. The topology generator only uses the processors whereas the circuit optimizer uses both the processors and GPU.

A. Level Shifter Design

We choose a level shifter circuit as a test vehicle for our framework since it is an active research area where new circuit topologies are continuously developed. There are many different topologies, and an optimal topology varies with the design constraints [24]. Therefore, the effectiveness of our framework that is capable of finding optimal circuit topologies could be verified more clearly. In addition, level shifter circuits share common properties both with digital and analog circuits. For instance, level shifters operate on a rail-to-rail input signal and produce a rail-to-rail output in a higher voltage, similar to digital circuits [22]. On the other hand, the internal operation is similar to that of analog circuits such as amplifiers. In experiments, we adopt the framework to design level shifter circuits in a 180-nm CMOS process, and the resulting circuits are compared to prior designs reported in the literature.

A level shifter circuit converts a low-voltage (VDDL) digital signal to a high-voltage (VDDH) signal. Level shifters must generate a rail-to-rail swing between the ground and VDDH at the output. Therefore, we use output signal swing as a hard constraint in the framework. Because level shifters are typically expected to operate with high conversion speed and low power consumption with minimal footprint [31], we use the delay, total power (Ptotal), static power (Pstatic), and area as soft constraints. The circuit area is calculated as the number of transistors in the topology generator, whereas the circuit optimizer uses the total active area.

Table I shows the score functions used in each step. The scores related to the soft constraints are calculated as $-\log(q_{i,x})$ except for the area in the topology generator, whereas the score for the hard constraint (output swing) is calculated as the swing observed in simulation divided by VDDH. In topology generation, the area is calculated as the number of transistors in the circuit. When the number of transistors exceeds a threshold (b in Table I), the score is divided by a slope which is a hyperparameter. In the circuit optimizer, we use the worst values across all process corners when calculating the score for the hard constraint. Soft constraint scores are obtained at the TT corner.

B. Topology Generation

The topology generator runs seven SPICE simulations in parallel only at the TT corner for fast topology search. The input inverter of the level shifter is implemented using low threshold voltage (Vth) devices, whereas the other transistors are standard Vth devices. We use a minimum-sized transistor with 180-nm channel length and 220-nm channel width as a weak device. Medium and strong devices have 2× and 4× larger channel width, respectively. The initial population size is set to 450, and the population evolves for 600 generations, which takes approximately 5 h 30 min. In addition, we experiment with varying the soft constraint weights in the fitness function to observe how the topology generator performs under different design constraints. The generated circuit topologies are displayed in Fig. 3. Specifically, three cases are tested: 1) all the constraints have the same weight [C1 in Fig. 3]; 2) only the weight of static power is lowered (C2 and C3); and
3) the weight of delay is increased while the weights of static and total powers are decreased (C4 and C5). Table II summarizes the performance and fitness of the generated circuits (C1–C5). Simulation results show that the circuits generated with lower static power weight (C2 and C3) exhibit higher static power than C1. In addition, the circuits optimized for delay (C4 and C5) achieve lower delay than C1 at the expense of power consumption increases.

We also perform three independent runs of circuit topology generation to estimate algorithm stability. Fig. 4 shows the fitness and the number of species as the evolution proceeds. The best fitness, which is the fitness of the best circuit in the population, rapidly increases in the first 7–9 generations, and then gradually improves through fine tuning of the circuit topology. Note that the value of fitness is not capped at a fixed value. While the fitness of a circuit can have an arbitrary value, the generated circuits exhibit fitness values less than 30 in our experiments. The number of species is nearly constant during evolution except in the first few generations, suggesting that stagnant species are replaced with a similar number of new species.

Fig. 3. Level shifter circuit topologies generated by topology generator (a) C1. (b) C2. (c) C3. (d) C4. (e) C5.

TABLE II
RESULTS OF TOPOLOGY GENERATION

Fig. 4. Experimental results of topology generation. (a) Fitness. (b) Number of species.

C. Circuit Optimization

In experiments, we use an MLP with three hidden layers and 200 nodes in each layer as the actor network. The critic network has the same structure but has two hidden layers. First, we evaluate each of the proposed RL optimization techniques using WCMLS circuit, which is widely adopted in level shifter designs [22]–[24]. Experiments are performed using a total of seven actors, where one of them is used to estimate the performance of the optimization algorithm in real time (evaluation actor). Thirty thousand SPICE simulations are run across all the actors except the evaluation actor, which takes 2 h 20 min. Since the RL algorithm has some degree of randomness, we test each configuration on three independent runs to observe its reliability. Fig. 5 summarizes the experimental results. Fig. 5(a) shows that conventional D4PG fails to converge in two of the three runs. However, when the multiupdate technique with U = 10 is applied, the algorithm successfully finds a correct optimization direction and properly biases transistors in the circuit after about 7000 SPICE simulations [Fig. 5(c)]. Fig. 5(e) shows the optimization results when the episode early stopping method is also employed. Initially, an episode stops only after four steps, and the episode length increases by two after every five episodes in each exploration agent until it reaches the maximum length of 20. This method reduces the number of SPICE simulations required to capture the bias points from 7000 to 5000, suggesting that this technique accelerates RL training in the early stage. Note that the algorithm shows more fluctuation during optimization when the early stopping method is adopted. We suspect that the conventional approach is exposed to more “bad” samples, which are far from the initial nearly optimized design from the topology generator, in the early stages of training. Those samples exhibit very low rewards as they do not meet the hard constraints. As a result, the actor is trained to be more conservative, and once the design enters the near-optimal region where the hard constraints are satisfied, the algorithm tends to stay near that point only with fine tuning to avoid a large drop in the reward value. On the contrary, the episode early stopping method allows the design to enter the near-optimal region quickly, significantly reducing the number of bad samples during initial training. When the design approaches an optimal point during optimization, the algorithm now searches for better design points more aggressively. In other words, the algorithm is less reluctant to depart from the local optima, which helps find a global optimum.
For comparison, we experiment with the DDPG algorithm adopted in prior work [19] using the same environment. D4PG, which is employed in our framework, is similar to DDPG except that it uses multiple agents in parallel, and the output of the critic network is represented as the probability distribution. The DDPG algorithm is trained for 30 000 SPICE simulations in total, and the total running time is 14 h 30 min. This is more than six times longer than the time required for our approach to process the same number of SPICE simulations, which confirms the effectiveness of the multiagent training of D4PG. The experimental results are displayed in Fig. 5. We experimented with a vanilla DDPG algorithm [Fig. 5(b)], DDPG with multiupdate [Fig. 5(d)], and DDPG with both techniques [Fig. 5(f)]. Experimental results show that DDPG exhibits larger variations between runs and unstable training convergence compared to our approach. To observe how the type of critic affects the training performance, we experimentally apply the scalar value critic used in DDPG to our algorithm. With both multiupdate and episode early stopping applied, Fig. 6(a) shows that using the scalar value critic results in more unstable training convergence compared to Fig. 5(e).

Fig. 5. Trends of reward improvement with and without proposed algorithmic optimization techniques. Proposed techniques enable a faster and more stable optimization process. (a) D4PG. (b) DDPG. (c) D4PG with multiupdate. (d) DDPG with multiupdate. (e) D4PG with multiupdate and episode early stopping. (f) DDPG with multiupdate and episode early stopping.

The episode early stopping method effectively limits the agent’s exploration capability in the early stages of training, and a similar effect could be achieved by scaling the output of the actor network. We conducted additional experiments in which we multiplied the output of the actor network with a scaling factor before passing it to the environment. The scaling factor is set to 0.2 at first and is increased by 0.1 every five epochs, which translates to the maximum amount of size change in each episode identical to the episode early stopping method. Experimental results are displayed in Fig. 6(b). It can be seen that this scaling method results in a slower convergence. We suspect that this is because the actor network is not properly trained in early episodes due to the continuously changing scaling factor. More specifically, the actor network is trained in a way to generate the best action for the current state. However, the output of the actor network is scaled before being applied to the environment and, hence, the actor should take this into account during training. Since we are now changing the scaling factor, the actor should be trained in different directions as the optimization process continues, hindering proper training.

Fig. 6. Reward trends of alternative approaches for comparisons. (a) Scalar value critic. (b) Action scaling.

Fig. 7. Trends of output swing ratio when the circuit is optimized (a) at TT corner only and (b) at all process corners. Considering process corners significantly improves reliability.

TABLE III
EXPERIMENTS WITH DIFFERENT WEIGHTS IN CIRCUIT OPTIMIZATION

TABLE IV
RESULTS OF OPTIMIZING GENERATED CIRCUITS

Fig. 8. Layout of generated circuits and their size. (a) C1 (4.7 μm × 8.6 μm). (b) C2 (3.5 μm × 11.2 μm). (c) C3 (4.5 μm × 10.3 μm). (d) C4 (4.0 μm × 8.4 μm). (e) C5 (4.0 μm × 8.2 μm).

During optimization, our framework considers multiple process corners to make sure the circuit properly works under process variations. Fig. 7 compares our approach to the conventional method that observes the circuit performance at the TT corner only. When the circuit is optimized only at the TT corner, the relative output voltage swing reaches 0.95 at the same corner, but the design may produce much smaller swing at different corners [Fig. 7(a)]. On the other hand, if we obtain the score related to the hard constraint at the worst corner during optimization, the resulting circuit achieves >0.95 output swing at all the corners.
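The difference between scoring the hard constraint at the TT corner only and at the worst corner can be illustrated with a short sketch. It follows the score definitions given earlier (relative swing, i.e., swing divided by VDDH, for the hard constraint, and −log(q) at the TT corner for a soft constraint); the function names, the use of the natural logarithm, and every numeric value are hypothetical and are not taken from the paper's measurements.

```python
import math

CORNERS = ("TT", "FF", "SS", "FS", "SF")


def hard_score(swing_by_corner, vddh):
    """Relative output swing at the worst of the five process corners."""
    return min(swing_by_corner[c] for c in CORNERS) / vddh


def soft_score(q_tt):
    """Soft-constraint score -log(q), with q measured at the TT corner.

    The base of the logarithm is not stated here; natural log is assumed.
    """
    return -math.log(q_tt)


# Hypothetical swings (in volts) of one design point at VDDH = 1.8 V.
swings = {"TT": 1.78, "FF": 1.79, "SS": 1.60, "FS": 1.75, "SF": 1.71}
tt_only = swings["TT"] / 1.8          # looks fine when judged at TT alone
worst = hard_score(swings, vddh=1.8)  # exposes the weak SS corner
```

In this made-up example the TT-only ratio is above 0.95 while the worst-corner ratio is below 0.9, so only the worst-corner score would push the optimizer to repair the design at the SS corner.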
Similar to the topology generator, we also experiment with changing the weights of the soft constraints. The sum of the weights is fixed, and their values are allocated differently in each case. Table III summarizes experimental results for optimization with 32 000 SPICE simulations. The last column shows the actual weights of the soft constraint of interest and the others. As expected, increasing the weight for total power consumption further reduces power consumption during optimization while sacrificing delay and area since the circuit is subject to a tradeoff between delay, power, and area. Similarly, using a higher weight for delay produces a faster level shifter circuit at the expense of power and area increase.

Finally, we apply our circuit optimizer to the circuits generated by the topology generator (C1–C5). Similar to previous experiments, we use seven actors in the RL algorithm, where one of them is used as an evaluation actor. For each circuit topology, 38 000 SPICE simulations were performed except the evaluation actor, and the multiupdate constant U was set to 13. The optimizer successfully improved all the generated circuit topologies, which is verified by comparing the results in Table IV to the results in Table II. Note that the area represents the total active area, not the actual layout size.

D. Test Chip Fabrication

To validate level shifter circuits designed by our framework, we fabricated the generated and optimized circuits C1–C5 in a 180-nm process. Since the framework only provides a netlist as the output, the layout was manually drawn, as shown in Fig. 8. The input inverter supplied by VDDL is included in the layout. It is difficult to measure the conversion delay of a level shifter accurately, since parasitic components (e.g., I/O cell, PCB trace, and bond wire) also contribute to the delay. Hence, we adopt the dual-path measurement method in [32]. Two different paths with and without a level shifter are implemented, and the conversion delay is indirectly measured by subtracting their delays as depicted in Fig. 9(a). The VDDL inverter (colored gray in the figure) converts a high-voltage input to a low-voltage signal, which is later converted back to VDDH by the level shifter. Each level shifter has a dedicated power supply rail to measure its power consumption. A different level shifter can be selected by a control signal to the multiplexer and demultiplexer. The conversion delay is measured as the difference in arrival times of OUT and REF signals. Fig. 9(b) shows the top-level layout of the test chip, and Fig. 9(c) is the chip micrograph.

Table V displays measurement results and comparisons against recent level shifter circuits reported in the literature (Fig. 10). Note that the performance figures of the baseline circuits (B1–B5) are simulation results obtained from [31]. In
measurements, all of the generated circuits (C1–C5) successfully perform level conversions. Measurement results show that our designs consume much less power during conversion with similar or lower conversion delay. More specifically, our designs exhibit 2.6×–4.7× lower total power consumption than the design with the lowest power consumption (B3) and 1.0×–1.7× larger conversion delay than the fastest design (B5). In addition, our designs occupy 1.5×–2.1× smaller area than the smallest design (B1). The power-delay product (PDP) is a metric commonly used for comparing level shifter circuits [24], [31], [33], and the generated circuits achieve 2.8×–5.3× lower PDP than the baseline circuits.

Fig. 9. (a) Delay measurement circuit, (b) test chip layout, and (c) test chip micrograph.

Fig. 10. Baseline level shifter designs from prior work. (a) B1 [22]. (b) B2 [23]. (c) B3 [34]. (d) B4 [35]. (e) B5 [36].

TABLE V
MEASUREMENT RESULTS OF GENERATED CIRCUITS

VDDLmin represents the minimum input voltage that a level shifter can convert to a high voltage signal. VDDLmin was first measured for the input with 1-MHz frequency. Generated circuits (C1–C5) achieve 320 mV or lower VDDLmin, outperforming baseline circuits. To determine the lowest possible voltage that the level shifters could handle, we also experimented with a 100-Hz input signal and checked if the output shows full swing. In this case, the generated level shifters achieve a significantly lower VDDLmin of less than 100 mV.

E. Applicability of Topology Generator

We conduct further experiments to observe if the proposed topology generator could be used for designing other types of circuits. For experiments, the topology generator is tested on both digital (AND gate) and analog (differential amplifier) circuits. In both cases, the algorithm starts with a P-channel MOSFET and N-channel MOSFET pair as the initial offspring and an initial population size of 600. For the AND gate, the population evolves for 300 generations. We use a minimum-sized transistor with 180-nm channel length and 220-nm channel width as a weak device. Medium and strong devices have 2× and 4× larger channel width, respectively. The topology generator successfully produces a standard AND gate composed of a NAND gate and an inverter as shown in Fig. 11(a). The left part of the circuit in Fig. 11(b) is similar to a standard NAND gate, but the output is not fully pulled up since one of the pMOS devices is connected to an internal node. However, the additional pMOS keeper fully pulls up the output node, providing a rail-to-rail output.

Fig. 11. AND gates generated by topology generator.

For the amplifier design, the population evolves for 600 generations. Since analog circuits often require proper biasing, a bias node that supplies a dc voltage is introduced in the algorithm. In addition, five sizing options are used for topology
generation. We use a transistor with 720-nm channel length and 220-nm channel width as a baseline. Two stronger devices have 2× and 4× larger channel width, respectively, whereas two weaker devices have 2× and 4× larger channel length, respectively. The topology generator successfully generates circuit topologies that are similar to widely used amplifier circuits. The amplifier circuit in Fig. 12(a) is a self-biased 5T OTA (Operational Transconductance Amplifier) circuit [37], and the circuit in Fig. 12(b) is a low-voltage pseudodifferential amplifier [38], [39].

Fig. 12. Differential amplifiers generated by topology generator.

IV. CONCLUSION

In this work, we proposed an automatic circuit design framework for level shifter circuits. To design a circuit without preconstructed building blocks and prior knowledge, the framework implements a two-step design process using the topology generator and the circuit optimizer. We first proposed a new graph-based circuit representation, and the topology generator employs an evolutionary algorithm to search for possible circuit topologies quickly, considering the given design constraints. Then, the circuit optimizer utilizes RL to fine-tune the size of each transistor, where we adopt various algorithmic optimizations, such as multiagent training, process variation-aware optimization, multiupdate, and episode early stopping to improve sample efficiency. In experiments, the framework was applied to designing level shifter circuits. The topology generator produced novel level shifter topologies, and they are successfully optimized by the circuit optimizer. Fabricated in a 180-nm CMOS process, the test chip demonstrates that the automatically designed circuits achieve 2.8×–5.3× lower PDP than manually designed level shifter circuits reported in the literature.

REFERENCES

[1] L. Lavagno, L. Scheffer, and G. Martin, EDA for IC Implementation, Circuit Design, and Process Technology. Boca Raton, FL, USA: CRC Press, 2016.
[2] O. Aaserud and I. R. Nielsen, “Trends in current analog design—A panel debate,” Analog Integr. Circuits Signal Process., vol. 7, no. 1, pp. 5–9, 1995.
[3] M. C. Golumbic, A. Mintz, and U. Rotics, “An improvement on the complexity of factoring read-once Boolean functions,” Discr. Appl. Math., vol. 156, no. 10, pp. 1633–1636, May 2008.
[4] V. N. Possani, V. Callegaro, A. I. Reis, R. P. Ribas, F. De Souza Marques, and L. S. Da Rosa, “Graph-based transistor network generation method for supergate design,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 2, pp. 692–705, Feb. 2016.
[5] H. Y. Koh, C. H. Séquin, and P. R. Gray, “OPASYN: A compiler for CMOS operational amplifiers,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 9, no. 2, pp. 113–125, Feb. 1990.
[6] W. Kruiskamp and D. Leenaerts, “DARWIN: CMOS opamp synthesis by means of a genetic algorithm,” in Proc. 32nd Annu. ACM/IEEE Des. Autom. Conf., 2002, pp. 139–144.
[7] T. McConaghy, P. Palmers, M. Steyaert, and G. G. Gielen, “Variation-aware structural synthesis of analog circuits via hierarchical building blocks and structural homotopy,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 1, pp. 1281–1294, Sep. 2009.
[8] M. Meissner and L. Hedrich, “FEATS: Framework for explorative analog topology synthesis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 2, pp. 213–226, Feb. 2015.
[9] Z. Zhao and L. Zhang, “Graph-grammar-based analog circuit topology synthesis,” in Proc. IEEE Int. Symp. Circuits Syst., 2019, pp. 1–5.
[10] J. R. Koza, F. Dunlap, F. H. Bennett, M. A. Keane, J. Lohn, and D. Andre, “Automated synthesis of computational circuits using genetic programming,” in Proc. IEEE Int. Conf. Evol. Comput., 1997, pp. 447–452.
[11] J. D. Lohn and S. P. Colombano, “A circuit representation technique for automated circuit design,” IEEE Trans. Evol. Comput., vol. 3, no. 3, pp. 205–219, Sep. 1999.
[12] Y. Sapargaliyev and T. G. Kalganova, “Unconstrained evolution of analogue computational ‘QR’ circuit with oscillating length representation,” in Proc. Int. Conf. Evolvable Syst., Sep. 2008, pp. 1–10.
[13] J. Slezák and J. Petržela, “Evolutionary synthesis of cube root computational circuit using graph hybrid estimation of distribution algorithm,” Radioengineering, vol. 23, no. 1, p. 549, 2014.
[14] B. Liu, Q. Zhang, F. V. Fernàndez, and G. G. E. Gielen, “An efficient evolutionary algorithm for chance-constrained bi-objective stochastic optimization,” IEEE Trans. Evol. Comput., vol. 17, no. 6, pp. 786–796, Dec. 2013.
[15] P. P. Prajapati and M. V. Shah, “Two stage CMOS operational amplifier design using particle swarm optimization algorithm,” in Proc. IEEE UP Section Conf. Electr. Comput. Electron., 2015, pp. 1–5.
[16] R. A. Thakker, M. S. Baghini, and M. B. Patil, “Low-power low-voltage analog circuit design using hierarchical particle swarm optimization,” in Proc. Int. Conf. VLSI Des., 2009, pp. 427–432.
[17] W. Lyu et al., “An efficient Bayesian optimization approach for automated optimization of analog circuits,” IEEE Trans. Circuits Syst. I, Reg. Paper, vol. 65, no. 6, pp. 1954–1967, Jun. 2018.
[18] B. He, S. Zhang, F. Yang, C. Yan, D. Zhou, and X. Zeng, “An efficient Bayesian optimization approach for analog circuit synthesis via sparse Gaussian process modeling,” in Proc. Design Autom. Test Eur. Conf. Exhib., 2020, pp. 1463–1468.
[19] H. Wang, J. Yang, H.-S. Lee, and S. Han, “Learning to design circuits,” 2018, arXiv:1812.02734.
[20] K. Settaluri, A. Haj-Ali, Q. Huang, K. Hakhamaneshi, and B. Nikolic, “AutoCkt: Deep reinforcement learning of analog circuit designs,” in Proc. Design Autom. Test Eur. Conf. Exhibition, 2020, pp. 490–495.
[21] H. Wang et al., “GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning,” in Proc. Design Autom. Conf., 2020, pp. 1–6.
[22] S. Lütkemeier and U. Rückert, “A subthreshold to above-threshold level shifter comprising a Wilson current mirror,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 9, pp. 721–724, Sep. 2010.
[23] S.-C. Luo, C.-J. Huang, and Y.-H. Chu, “A wide-range level shifter using a modified wilson current mirror hybrid buffer,” IEEE Trans. Circuits Syst. I, Reg. Paper, vol. 61, no. 6, pp. 1656–1665, Jun. 2014.
[24] S. Kabirpour and M. Jalali, “A power-delay and area efficient voltage level shifter based on a reflected-output Wilson current mirror level shifter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 2, pp. 250–254, Feb. 2020.
[25] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evol. Comput., vol. 10, no. 2, pp. 99–127, 2002.
[26] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” Sep. 2015, arXiv:1509.02971.
[27] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” Jul. 2017, arXiv:1707.06347.
[28] G. Barth-Maron et al., “Distributed distributional deterministic policy gradients,” Apr. 2018, arXiv:1804.08617.
[29] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proc. Int. Conf. Mach. Learn., 2013, pp. 1310–1318.
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
Sunghoon Kim (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from Seoul National University, Seoul, South Korea, in 2018, where he is currently pursuing the Ph.D. degree with the Graduate School of Convergence Science and Technology. His current research interests include radiation hardening circuit, mixed-signal machine learning hardware, and computation in memory.

[31] S. R. Hosseini, M. Saberi, and R. Lotfi, “A high-speed and power-efficient voltage level shifter for dual-supply applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 3, pp. 1154–1158, Mar. 2017.
[32] R. Lotfi, M. Saberi, S. R. Hosseini, A. R. Ahmadi-Mehr, and R. B. Staszewski, “Energy-efficient wide-range voltage level shifters reaching 4.2 fJ/transition,” IEEE Solid-State Circuits Lett., vol. 1, no. 2, pp. 34–37, Feb. 2018.
[33] S. Kabirpour and M. Jalali, “A low-power and high-speed voltage level shifter based on a regulated cross-coupled pull-up network,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 6, pp. 909–913, Jun. 2019.
Circuits Syst. II, Exp. Briefs, vol. 66, no. 6, pp. 909–913, Jun. 2019.
[34] M. Lanuzza, P. Corsonello, and S. Perri, “Fast and wide range voltage
conversion in multisupply voltage designs,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 23, no. 2, pp. 388–391, Feb. 2015.
[35] Y. Osaki, T. Hirose, N. Kuroki, and M. Numa, “A low-power level shifter
with logic error correction for extremely low-voltage digital CMOS
LSIs,” IEEE J. Solid-State Circuits, vol. 47, no. 7, pp. 1776–1783,
Jul. 2012.
[36] S. R. Hosseini, M. Saberi, and R. Lotfi, “A low-power subthreshold to
above-threshold voltage level shifter,” IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 61, no. 10, pp. 753–757, Oct. 2014.
[37] B. A. Chappell et al., “Fast CMOS ECL receivers with 100-mV worst-
case sensitivity,” IEEE J. Solid-State Circuits, vol. 23, no. 1, pp. 59–67,
Feb. 1988.
[38] C. J. A. Gomez, H. Klimach, E. Fabris, and O. E. Mattia, “High PSRR
nano-watt MOS-only threshold voltage monitor circuit,” in Proc. IEEE
Symp. Integr. Circuits Syst. Design (SBCCI), 2015, pp. 1–6.
[39] A. Shankar, J. Silva-Martínez, and E. Sánchez-Sinencio, “A low voltage
operational transconductance amplifier using common mode feedforward
for high frequency switched capacitor circuits,” in Proc. IEEE Int. Symp.
Circuits Syst., vol. 1, 2001, pp. 643–646.

Dongsuk Jeon (Member, IEEE) received the B.S.


degree in electrical engineering from Seoul National
University, Seoul, South Korea, in 2009, and the
Jiwoo Hong (Graduate Student Member, IEEE) Ph.D. degree in electrical engineering from the
received the B.S. degree in electrical engineering University of Michigan at Ann Arbor, Ann Arbor,
from Seoul National University, Seoul, South Korea, MI, USA, in 2014.
in 2019, where he is currently pursuing the M.S. From 2014 to 2015, he was a Postdoctoral
degree with the Graduate School of Convergence Associate with the Massachusetts Institute of
Science and Technology. Technology, Cambridge, MA, USA. He is currently
His current research interests include electronic an Associate Professor with the Graduate School
design automation and circuit design automation of Convergence Science and Technology, Seoul
with machine learning. National University. His current research interests include hardware-oriented
machine learning algorithms, hardware accelerators, and low-power circuits.
