
Variation-Aware Low-Power Synthesis Methodology For Fixed-Point FIR Filters


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 28, NO. 1, JANUARY 2009, p. 87

Jung Hwan Choi, Student Member, IEEE, Nilanjan Banerjee, and Kaushik Roy, Fellow, IEEE

Abstract—In this paper, we present a novel finite-impulse response (FIR) filter synthesis technique that allows for aggressive voltage scaling by exploiting the fact that all filter coefficients are not equally important to obtain a “reasonably accurate” filter response. Our technique implements a level-constrained common-subexpression-elimination algorithm, where we can constrain the number of adder levels (ALs) required to compute each of the coefficient outputs. By specifying a tighter constraint (in terms of the number of adders in the critical path) on the important coefficients, we ensure that the later computational steps compute only the less important coefficient outputs. In case of delay variations due to voltage scaling and/or process variations, only the less important outputs are affected, resulting in graceful degradation of filter quality. The proposed architecture, therefore, lends itself to aggressive voltage scaling for low-power dissipation even under process parameter variations. Under extreme process variation and supply voltage scaling (0.8 V), filters implemented in the Predictive Technology Model (PTM) 70 nm technology show an average power savings of 25%–30% with minor degradation in filter response in terms of normalized passband/stopband ripple (0.02 at a scaled voltage of 0.8 V compared with 0.005 at a nominal supply).

Index Terms—Finite-impulse response (FIR) filter synthesis, low-power methodology, variation-aware design.

Manuscript received March 3, 2008; revised July 3, 2008. Current version published December 17, 2008. An earlier version of this paper was accepted and published in the International Symposium on Low Power Electronics and Design (ISLPED) 2007. This work was supported in part by the Gigascale Silicon Research Center (GSRC). This paper was recommended by Associate Editor S. Vrudhula.
J. H. Choi and K. Roy are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: jhchoi@purdue.edu; kaushik@purdue.edu).
N. Banerjee is with the Microprocessor Research Laboratory, Intel Corporation, Santa Clara, CA 95054 USA (e-mail: nilanjan.banerjee@intel.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2008.2009135
0278-0070/$25.00 © 2009 IEEE

I. INTRODUCTION

WITH EXPLOSIVE growth in the demand of portable computing and wireless communication systems, power dissipation is becoming an increasing concern. Higher power consumption reduces the battery lifetime of portable devices, affects device reliability, and increases cooling cost. Therefore, low-power methods are necessary for the design of these DSP-based systems. Since finite-impulse response (FIR) filters are critical to most DSP applications, an energy-aware filter design helps significantly in reducing the total power intake of the system. Reduction of hardware complexity directly relates to lower power consumption; therefore, several methods have been reported in the literature to reduce computational complexity. One popular approach for complexity mitigation is the replacement of multiplications of a FIR filter by addition and shift operations [1]. Further power optimization is obtained by minimizing the number of adders required to implement the multiplications. For this purpose, all coefficients of a transposed-form FIR filter are considered as a whole and replaced by a single multiplier block, as shown in Fig. 1. The redundancy across the coefficients in the multiplier block is then exploited to share computations and reduce the number of adders. To effectively enable such sharing, a variety of methods have been developed. The most significant among them are the graph-dependence (GD) algorithms [2]–[5] and common-subexpression-elimination (CSE) methods [6]–[10].

Fig. 1. (a) N-tap transposed FIR filter. (b) Coefficient multiplications replaced by a multiplier block.

In GD algorithms, each filter coefficient is represented by a graph. Each vertex of a graph represents a partial sum. These partial sums can be shared across coefficients, if possible. In general, the more partial sums are shared, the more adders are saved. Usually, the GD algorithms outperform the CSE algorithms in terms of the required number of adders. However, studies [3], [4] indicate that the conventional GD algorithms may produce FIR filters with long critical paths in their multiplier blocks. To reduce complexity and critical path length, CSE-based filter designs are often used.

A variety of CSE methods have been proposed in the literature [6]–[10]. Most of these implementations take filter coefficients in canonical-sign-digit (CSD) format, where coefficients are represented with a minimum number of nonzero bits [11]. The redundancy present in the CSD coefficients is then exploited to share some of the adders to further reduce the hardware complexity, thereby reducing power consumption. In [9], the CSE method is implemented using an integer linear programming formulation. A 2-D CSE approach, involving both vertical and horizontal eliminations, is proposed in [6].
The CSE method in [7] considers the length of critical path
in the multiplier block, as well as the number of required
adders. A greedy CSE algorithm with a look-ahead method for
implementing low-complexity FIR filters is developed in [10].
An optimization method considering multivariable CSE is
demonstrated in [8].
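
As a rough illustration of the CSD format and of the shift-and-add replacement of multiplications discussed above, the following Python sketch recodes an integer coefficient into CSD digits and evaluates a constant multiplication with shifts, additions, and subtractions only. The function names `to_csd` and `shift_add_multiply` are hypothetical, and the recoding rule is the standard non-adjacent-form recurrence; it is not taken from any of the cited CSE algorithms.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, most significant first.

    Each digit is +1, 0, or -1, and no two adjacent digits are nonzero.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    digits = []                      # collected least significant digit first
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits[::-1] or [0]


def shift_add_multiply(x, coefficient):
    """Multiply x by a constant using only shifts, additions, and subtractions."""
    result = 0
    for position, digit in enumerate(reversed(to_csd(coefficient))):
        if digit > 0:                # each nonzero digit contributes one shifted term
            result += x << position
        elif digit < 0:
            result -= x << position
    return result


print(to_csd(25))                  # [1, 0, -1, 0, 0, 1], i.e., 32 - 8 + 1
print(shift_add_multiply(7, 25))   # 175
```

The number of nonzero digits returned by `to_csd` is the number of shifted terms that must be combined, which is why a CSD coefficient with few nonzero digits needs few adders.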

All these algorithms are, however, implemented to reduce the number of adders and, hence, lower power at nominal operating voltages. At scaled voltages, filters developed with these techniques may fail. This paper, on the other hand, develops a procedure which ensures graceful quality degradation with voltage scaling. It is observed in filter designs that there are some coefficients which play a more important role in shaping the filter response compared with others [12]–[14]. This aspect holds true for all kinds of filters and can also be effectively exploited to design filter architectures that provide an “appropriate” tradeoff between the filter response and the energy consumption.

Low power is not the only concern for the present-day designers. Due to the inherent limitations in the fabrication processes, scaled silicon devices suffer from parameter variations, leading to delay failures in some chips [15]–[17]. To counter these delay failures and achieve high parametric yield, scaling up the Vdd or upsizing the logic gates may be required. However, such techniques come at a cost of increased power and/or die area. It is difficult, therefore, to simultaneously satisfy both process tolerance and low-power requirements. In this paper, we address this problem and propose a generic filter synthesis technique that implements filter architectures which are amenable to voltage scaling even under parameter variations. We achieve this by exploiting the fact that all filter coefficients are not equally crucial to obtain a “reasonably accurate” filter response. By “reasonably accurate,” we mean that there is a minor degradation in the filter response in terms of passband and stopband ripple values within the range tolerable to the user. We ensure that under delay variations/Vdd scaling, only the less contributive coefficients are affected, thus minimally degrading the filter response. Our contributions are as follows.

1) Propose a procedure to identify coefficients that are critical in maintaining a reasonably accurate filter response.
2) Develop a CSE-based design methodology in which the critical paths of the computations involving the important coefficients are constrained to take a fixed number of adders (for example, n); any computational adder stages beyond n (e.g., n + 1, n + 2) compute only the less important computations.
3) Algorithmic implementation of the aforementioned design technique.
4) Utilization of this algorithm to generate transposed-form FIR filter architectures, where path-delay errors are predictable under scaled supply voltage and process parameter variations, and to tolerate delay failures in such paths with minimal degradation in the filter quality.
5) Adaptive circuitry to achieve voltage scaling/process tolerance.

This paper is organized as follows. In Section II, we explain the filter design concept for concurrently addressing the conflicting issues of low power and process tolerance. The algorithm implementation of the aforementioned technique is elaborated in Section III. Results of the filter output under a scaled Vdd and process variations are presented in Sections IV and V, respectively. Section VI concludes this paper.

II. DESIGN METHODOLOGY: BASIC CONCEPT

In this section, we first provide a brief introduction to the CSE-based procedure for filter synthesis and some preliminary definitions. We then develop the basic concept behind our design technique for low power and variation tolerance.

A. CSE

CSE has been utilized as a very powerful tool in FIR filter design to reduce the number of arithmetic units (adders and shifters). We illustrate the concept of CSE with the help of the following example. Consider two functions F1 and F2, where F1 = 13X and F2 = 29X. Both F1 and F2 can be represented in the following manner: F1 = X + 4X + 8X = X + X ≪ 2 + X ≪ 3 and F2 = X + 4X + 8X + 16X = X + X ≪ 2 + X ≪ 3 + X ≪ 4, where “≪” means bitwise left shift. Both expressions F1 and F2 have some common terms D = X + X ≪ 2 + X ≪ 3. Therefore, F1 and F2 can be rewritten as F1 = D and F2 = D + X ≪ 4. Reusing D in both expressions reduces the computation overhead and the number of adders required to implement both expressions. The corresponding hardware implementation is shown in Fig. 2.

Fig. 2. (a) Multiplication without CSE. (b) Multiplication with CSE.

Two important conclusions can be drawn from the earlier example: 1) Significant power savings can be achieved by reducing number of adders using CSE (only three adders in the CSE-based implementation compared with five adders in the unshared case), and 2) CSE might increase the total number of adders in the critical path. To elucidate this point further, let us consider each of the expressions F1 and F2. Without CSE, F1 has two adders in its critical path [Fig. 2(a)]. Even after applying CSE, F1 is still available after two adder delays [Fig. 2(b)]. Therefore, there is no delay penalty for F1 in the CSE-based implementation. However, the critical path of F2 is increased from two to three adders due to CSE, resulting in a delay overhead. Therefore, there is a tradeoff between the power consumption and the frequency requirements in the case of a CSE-based implementation.

It should also be noted that the shortest critical path (SCP) of F1 and F2 consists of two adders. Since FIR filter coefficients
are usually represented in CSD format, the minimum number of adders in their critical paths is determined by the number of nonzero terms present in their CSD representation. For example, if a coefficient value is given by “10101001,” then the SCP consists of two adders. In general, if the coefficient consists of n nonzero terms, SCP is given by ⌈log2 n⌉.

However, as mentioned before, due to sharing in CSE, a coefficient output might not be computed within the SCP. Therefore, the actual critical path of the whole multiplication block [Fig. 1(b)] can be quite different from the SCP values of the individual coefficients in case of CSE. For instance in Fig. 2, although the SCP for both F1 and F2 is composed of two adders, the critical path under CSE, however, consists of three adders. Based on this observation, we introduce three terms (reintroduced from [7]).

1) AL: The AL of a coefficient is defined as the number of adders in a critical path when the coefficient is computed [AL = 3 for “10101001” in Fig. 3(b)].
2) Minimum AL (MAL): The MAL of a coefficient is defined as the number of adders in the SCP among all possible realizations of a coefficient and is given by ⌈log2 n⌉, where n is the number of nonzero terms in the coefficient [MAL = 2 for “10101001” in Fig. 3(a)].
3) Filter AL (FAL): FAL is the maximum number of adders in the critical path of the multiplier block [Fig. 1(b)] of a transposed-form FIR filter. The value of FAL is chosen based on throughput requirements and is user-specified in our case. In Fig. 3(c), the critical path of the multiplier block consists of six adders. Therefore, FAL is six in this case. It should be noted that the minimum bound on FAL is given by the maximum of MAL among all the coefficients (FAL ≥ max{MAL}).

Fig. 3. (a) Coefficient “10101001” realized in AL = MAL = 2. (b) Alternative implementation in AL = 3. (c) Filter with FAL = 6.

It is also important to note that since the throughput requirement of the filter determines the FAL (and not the MAL), it is not necessary to compute all the coefficients within their MAL delay. Moreover, even when all coefficients are computed with their respective MALs, the critical path is determined by the maximum of all these individual MALs. Therefore, FAL plays a more important role in filter synthesis. For example, as shown in Fig. 2(b), the FAL value of three determines the filter throughput. It is interesting to note that as we relax the number of adders in the critical path beyond the MAL value, CSE can be utilized more effectively to reduce the number of adders. For example, if FAL = 2, as shown in Fig. 2, the common subexpression of (X + X ≪ 2) can be shared between F1 and F2. On the other hand, FAL = 3 allows for more sharing (X + X ≪ 2 + X ≪ 3). In general, a larger FAL increases the possibility of sharing hardware for coefficients, significantly reducing the power/area overhead.

B. Analysis Procedure to Determine Critical/Less Critical Coefficients

With the help of these preliminary definitions, we present the underlying concept utilized in developing our variation-tolerant and low-power design framework for FIR filters in this section. We utilize the fact that all filter coefficients are not equally important to the filter response [12]–[14]. The principle for realizing the filters is that we perform the computations associated with the important coefficients with higher priority than the less important ones so that in the presence of delay variations, only the less important coefficients are affected. However, important/less important can only be classified based on a user-specified passband/stopband requirement. Therefore, we start this analysis procedure by noting the worst-case filter quality requirements of the user in terms of passband and stopband ripples. We then follow a three-step procedure to identify and evaluate the “important/less important coefficients” and their effects on the frequency response. To illustrate this further, we consider a 25-tap low-pass FIR filter with the following specifications (Fs = 48 kHz, Fpass = 10 kHz, and Fstop = 12 kHz, equiripple). This filter was designed using the Matlab FDA tool, and the fixed-point coefficient set c0–c24 was generated. We elucidate, in detail, our three-step analysis procedure with the help of the aforementioned example.

1) Step 1: The coefficients are individually set to zero, and the filter response is recorded. For each of these responses, we measure the maximum normalized ripple both at the passband and the stopband frequencies, as shown in Fig. 4(a). Based on these calibrated degradations, the coefficients are then arranged in ascending order of importance. The importance of coefficients is determined by the values of degradation in the pass/stopband ripples; a larger degradation indicates higher importance. Interestingly, in this example, we find that the plots of the passband/stopband ripple closely follow the absolute magnitude plot of the coefficients, asserting the fact that the coefficients with bigger absolute magnitudes affect the filter response more adversely than their lower magnitude counterparts. The passband/stopband frequency ripple values serve as our metric for estimating the importance of individual coefficients.

2) Step 2: In this step, we perform an analysis to understand the cumulative effect of setting a set of filter coefficients to zero. This is important since there can be more than one less important coefficients. For this paper, the arranged coefficients of step 1 (arranged based on an ascending order of importance) are set to zero in increasing numbers starting from the least important one, and the pass/stopband ripple is noted. This
procedure is repeated with an increasing number of coefficients (with gradually increasing importance) being set to be zero values. Fig. 4(b) shows that with the increasing numbers of coefficients having zero values, both the passband and the stopband ripple magnitudes gradually increase and degrade the filter response. In Fig. 4(c), we show the magnitude response of the original and the altered filters with 6, 10, and 14 coefficients set to zero. As expected from Fig. 4(a) and (b), the filter quality is minimally affected with six zero coefficients but gets gradually degraded with the increasing number of zero-valued coefficients, starting from the least important one. This allows us to identify how much degradation happens when a set of coefficients starting from the least important one is set to zero. These data are tabulated in this step. In case of delay variations, if the coefficient outputs which have minimal impact on filter response are affected, then there is only a minor degradation in the filter quality. Under such conditions, we set the outputs of the affected coefficients to be zero.

3) Step 3: In the third step, the user-specified degradation values are taken and matched with the results from the second step. For instance, the maximum allowed ripple of 0.12 might allow only six coefficients, starting from the least important one, to be set to zero (or assigned as less critical ones). This step enables us to isolate the critical/less critical coefficients. This three-step analysis technique has been implemented in Matlab.

Fig. 4. (a) Normalized passband/stopband ripple with setting individual coefficients = 0. (b) Normalized passband/stopband ripple with set of coefficients = 0 based on importance. (c) Filter response with 6, 10, and 14 coefficients = 0.

C. Proposed Filter Architecture

We utilize this concept of important/less important coefficients for addressing process tolerance and low power, concurrently, in the FIR filter design. We explain this with the help of an example. Let us consider a transposed-form FIR filter where the filter output is computed as

Y[n] = c0 · x[n] + c1 · x[n − 1] + c2 · x[n − 2] + · · · + cn · x[0]

where c0, c1, ..., cn denote the coefficients and x[n], ..., x[0] denote the inputs. Based on the three-step technique mentioned in the previous section, we divide the coefficients into two sets, namely, “important” and “less important.” We now need to ensure a way that these “important computations” are given a higher priority over the “less important” ones. We exploit the CSE-based implementation of the multiplier block for this purpose. Depending on the throughput requirements, suppose the FAL has been set to L (also referred to as L levels from now on), which means that the critical path of the final computed output from the multiplier block can have the maximum of L adder delays. We constrain the “important coefficients” to occupy the maximum length of L − 1, whereas the “less important coefficients” are left constrained to level L. This implies that all the outputs of the “important coefficient” set need to be computed with a critical path of L − 1 adders (L − 1 levels) or less and that the last level of computation (if any) should consist of calculations of the “less important” set. It is worth noting that these two conditions should be satisfied for level constraints: L − 1 ≥ max{MAL} among important coefficients and L ≥ max{MAL} among other coefficients.

Fig. 5 shows a conventional CSE-based implementation versus our modified CSE design, referred to as level-constrained CSE (LCCSE). In the normal CSE case, since there is no additional constraint on the “important coefficients,” there is a possibility that some of the “important coefficients” are computed with L adders (L levels). Under a scaled Vdd or large process (delay) variation, these outputs might not be computed properly due to a delay increase resulting in a large degradation in the filter quality. This is prevented in LCCSE-based filters [Fig. 5(b)] since all the “important outputs” are available one level ahead (for the example considered) of the maximum AL (L). In case of delay variations, we ensure that only “less important” parts are affected. In general, the user can specify passband/stopband ripple requirements for different values of scaled voltages. Based on these inputs, we can make a more gradual assignment for each set of coefficients. The coefficients can be separated into k subsets {S1, S2, ..., Sk} based on their cumulative effects on the passband/stopband ripple. We then assign level constraints {L1, L2, ..., Lk} corresponding to each of the subsets (L1 for S1 and so on). This ensures that any coefficient belonging to a sensitivity list S1 should have the maximum AL of L1, S2 has the maximum of L2, and so on. It should be noted that Li, the maximum AL for the subset Si, should not be smaller than the maximum MAL among the coefficients belonging to Si. We then apply LCCSE to design FIR filters under given constraints. The details of the LCCSE algorithm are presented in Section III.

III. ALGORITHM DESCRIPTION

In this section, we describe the algorithm used to implement the LCCSE-based low-power FIR filters. Before proceeding to
the description of the algorithm, there are several points that should be considered for our design methodology.

1) Our technique is generic and applicable to any constant coefficient FIR filter design (low-pass, high-pass, and bandpass).
2) Our LCCSE algorithm utilizes the CSE method proposed in [7]. While most CSE algorithms in the literature are focused on minimizing the number of adders without considering the delay of the critical path, the CSE algorithm in [7] can constrain the number of adders in the critical path. Hence, this CSE method can perform tradeoff between the critical path delay (ALs) and the hardware overhead (the number of adders) in a direct way, which is an essential property for our design methodology.
3) We might not get the same number of adders obtained by the standard CSE technique [7] in our LCCSE implementation. While in the standard CSE implementation only the maximum levels or the maximum number of adders in the critical path would be specified, in LCCSE, we constrain coefficients to certain levels based on their sensitivities. As explained in Section II, this might result in less sharing opportunities and increase the number of required adders compared with the standard CSE implementation. However, filters designed with LCCSE are robust to process variations and allow for significant power savings under voltage scaling while maintaining a minimum degradation in the passband/stopband ripple.

Fig. 5. (a) Synthesis of a FIR filter using conventional CSE. The important computations with longer delays might not be computed under process variation resulting in low filter quality. (b) Proposed design methodology where important computations are constrained by “intelligent” CSE procedure. Under process variation and voltage scaling, high filter quality is maintained.

A. Problem Formulation

Given a set of fixed-point coefficients c0–cn (that have been divided into k subsets {S1, S2, ..., Sk}) and the levels of computation {L1, L2, ..., Lk} corresponding to each subset (S1, S2, etc.), find the minimum number of adders required to implement the filter. In this problem, S1 is a subset containing the “most important coefficients,” and Sk contains the “least important ones.” As defined before, the number of computation levels for each subset (L1 for S1, etc.) denotes the maximum number of adders allowed in the critical path for any coefficients belonging to the corresponding subset. The computation levels {L1, L2, ..., Lk} are user-defined constraints, and Lk, the level for the least important subset Sk, is equal to the FAL. The individual level constraints should be satisfied, which implies that for each Li, Li ≥ max{MAL} over a respective set Si.

B. LCCSE Algorithm

As mentioned earlier, our LCCSE algorithm utilizes the CSE method in [7], which considers not only resource sharing among the filter coefficients but also the length of the critical path in the multiplier block. The proposed LCCSE algorithm also takes into consideration the level constraints of each coefficient based on its sensitivity. The difference of LCCSE from the previous CSE method [7] is the fact that different level constraints can be given for each coefficient so that we can assert tighter timing bounds for more important coefficients. Only in the case that a single level constraint (FAL) is specified for all coefficients does LCCSE yield results identical to those of the conventional CSE. Before going into the details of the proposed LCCSE technique, we define some notations used to design the algorithm (reintroduced from [7]).

1) Decomposed set (DS): the set of absolute values of CSD numbers that have been decomposed as a sum of other CSD numbers. For example, 101̄001 (25), where 1̄ denotes the digit −1, can be decomposed as (101̄ ≪ 3) + 1 or 100001 − (1 ≪ 3) (25 = 3 × 2^3 + 1 = 33 − 1 × 2^3).
2) Undecomposed set (UDS): the set of absolute values of CSD numbers waiting to be decomposed with other CSD numbers. Initially, {UDS} contains all the filter coefficients.
3) All possible combination set (APCS): the set of candidate CSD numbers which can be utilized to decompose CSD numbers in {UDS}. This set is constructed in the following manner. All the possible combinations of nonzero terms of a coefficient are extracted, and the extracted CSD numbers are continuously right-shifted until they become odd. The absolute values of these numbers (except the value of one) are added to {APCS}. For instance, from a coefficient “101̄001” (25), we can extract six CSD numbers {100001, 101̄000, 1̄001, 100000, 1̄000, 1}, which are right-shifted to become {100001, 101̄, 1̄001, 1, 1̄, 1}, respectively. Finally, their absolute values, except one, are
added to {APCS} so that {APCS} has the following three terms {101̄ (3), 100001 (33), |1̄001| (7)}. This procedure is repeated for each filter coefficient. In the final step, any CSD numbers which belong to both {UDS} and {APCS} are erased from {APCS}. After {APCS} is constructed, all the coefficients can be expressed as combinations of some elements of {APCS} ∪ {1}.
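
The construction of {APCS} described above can be sketched in Python as follows. This is only an illustrative reading of the text, not the implementation of [7]: the helper names (`csd_terms`, `odd_part`, `apcs_candidates`, `build_apcs`) are hypothetical, "all possible combinations" is assumed to mean every non-empty proper subset of the nonzero CSD terms, and {UDS} is assumed to hold the absolute values of the coefficients exactly as given.

```python
from itertools import combinations


def csd_terms(value):
    """Return the nonzero CSD terms of a positive integer as signed powers of two."""
    terms, position = [], 0
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)          # +1 or -1 (non-adjacent-form recoding)
            value -= digit
            terms.append(digit * (1 << position))
        value >>= 1
        position += 1
    return terms


def odd_part(value):
    """Right-shift |value| until it becomes odd."""
    value = abs(value)
    while value and value % 2 == 0:
        value >>= 1
    return value


def apcs_candidates(coefficient):
    """Candidates contributed by one coefficient (odd, absolute, excluding 1)."""
    terms = csd_terms(abs(coefficient))
    candidates = set()
    for size in range(1, len(terms)):        # proper subsets only (assumption)
        for subset in combinations(terms, size):
            candidate = odd_part(sum(subset))
            if candidate > 1:
                candidates.add(candidate)
    return candidates


def build_apcs(coefficients):
    """Union of all candidates, minus the numbers already present in {UDS}."""
    uds = {abs(c) for c in coefficients}
    apcs = set()
    for c in uds:
        apcs |= apcs_candidates(c)
    return apcs - uds


print(sorted(apcs_candidates(25)))           # [3, 7, 33]
```

For the coefficient 25 the sketch reproduces the three terms {3, 33, 7} listed in the text.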
The basic approach of the LCCSE algorithm is as follows.

1) Throughout the procedure, {UDS} contains CSD numbers which must be decomposed with other CSD numbers to complete the filter implementation. We are trying to decompose elements of {UDS} with other CSD numbers in {DS} ∪ {UDS} ∪ {APCS} ∪ {1}. The LCCSE routine ends when {UDS} becomes empty.
2) Once a number in {UDS} is decomposed with a pair of other CSD numbers (subexpressions), it is removed from {UDS} and added to {DS}. If elements of {APCS} are chosen as subexpressions for this decomposition, they are moved from {APCS} to {UDS}, waiting to be decomposed.
3) When we choose subexpressions for decomposition, we need to avoid using elements of {APCS} as much as possible to minimize hardware overhead. Since every decomposition requires one additional adder, using elements of {APCS} increases the number of adders required for the filter implementation.
4) We perform a timing check whenever one CSD number is decomposed. The procedure to check the level constraints is more complicated in LCCSE than in the conventional CSE [7] because in LCCSE, each coefficient is allowed to have a different level constraint. For this purpose, we have the data structure for CSD numbers contain two member variables to keep the level information, namely, currAL (currently scheduled AL) and maxAL (maximum AL limited by filter constraints). The data structure also keeps a track of the list of child terms which use this CSD number as a subexpression. For example, if 831 is decomposed to “(105 ≪ 3) − 9,” then 831 is a child term of 105 and 9. It will be described later how we use this information to check the timing constraints.

The LCCSE procedure is described by the flowchart shown in Fig. 6. The fixed-point filter coefficients c0–cn and the level constraint of each coefficient are given as inputs. Initially, {DS} is empty, and all the filter coefficients are included in {UDS}. Later, the set {APCS} is constructed from the CSD numbers present in {UDS}. The currAL values of all the CSD numbers are initialized with their MAL, and the maxAL values of elements in {UDS} are set to their level constraints given as inputs. The synthesis procedure consists of two steps.

Fig. 6. Flowchart of LCCSE algorithm.

In the first step, we try to decompose each element in {UDS} in terms of other numbers in {DS} ∪ {UDS} ∪ {1}. Decreasing k from FAL to one, we select a number Ck with the level constraint of k from {UDS} and find subexpressions d1 and d2 from {DS} ∪ {UDS} ∪ {1} such that

Ck = ±d1 · 2^n1 ± d2 · 2^n2

for some nonnegative integers n1 and n2. After finding a candidate for decomposition, we need to check the timing constraints as follows.

1) Both d1 and d2 should be available at least one level ahead of the level constraint of Ck

Ck(maxAL) ≥ max{d1(currAL), d2(currAL)} + 1

where Ck(maxAL) is the level constraint of Ck and di(currAL) means the AL in which di is currently scheduled.
2) If the condition is satisfied, we can schedule Ck in the next AL of d1 and d2

Ck(currAL) = max{d1(currAL), d2(currAL)} + 1.

3) If Ck is already being used as a subexpression of other numbers in {DS}, we need to check whether Ck is scheduled ahead of the child terms of Ck. For each child term Cchild of Ck, it should be satisfied that

Cchild(maxAL) ≥ Cchild(currAL) ≥ Ck(currAL) + 1.

4) If the aforementioned condition is satisfied, the decomposition is confirmed, and no further timing check is required. If Cchild(maxAL) ≤ Ck(currAL), this decomposition violates timing; thus, it is discarded. If Cchild(maxAL) ≥ Ck(currAL) + 1 and Cchild(currAL) ≤ Ck(currAL), we schedule Cchild in the next level of Ck

Cchild(currAL) = Ck(currAL) + 1

and perform timing check recursively for the child terms of Cchild.
5) If the decomposition of Ck is confirmed, we adjust the level constraints of subexpressions d1 and d2 to make sure that they are scheduled ahead of Ck. If di(maxAL) ≥ Ck(maxAL), we then assign

$$d_i(\mathrm{maxAL}) = C_k(\mathrm{maxAL}) - 1.$$
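The level bookkeeping in steps 2) to 5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: the `Term` record, the function names, and the extra check that Ck itself stays within its own maxAL are all hypothetical, and the initial levels in the usage lines are chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so terms can be dict keys
class Term:
    """A CSD number tracked during synthesis (hypothetical record layout)."""
    value: int
    curr_al: int                  # currAL: level at which the term is scheduled
    max_al: int                   # maxAL: latest level the term may occupy
    children: list = field(default_factory=list)  # terms that use this one


def check_children(term, pending):
    """Steps 3 and 4: make sure every child term is scheduled after `term`.

    `pending` maps terms to proposed levels; nothing is written back here,
    so a failed check leaves the schedule untouched.
    """
    level = pending.get(term, term.curr_al)
    for child in term.children:
        child_level = pending.get(child, child.curr_al)
        if child_level >= level + 1:
            continue                       # child already follows `term`
        if child.max_al <= level:
            return False                   # no legal level left: timing violation
        pending[child] = level + 1         # push the child to the next level
        if not check_children(child, pending):
            return False                   # recursive check on grandchildren
    return True


def try_decompose(ck, d1, d2):
    """Try to schedule ck = +/-d1*2^n1 +/- d2*2^n2; return True if confirmed."""
    new_level = max(d1.curr_al, d2.curr_al) + 1      # step 2
    if new_level > ck.max_al:
        return False          # assumption: ck must also meet its own constraint
    pending = {ck: new_level}
    if not check_children(ck, pending):
        return False
    for term, level in pending.items():              # commit confirmed levels
        term.curr_al = level
    for d in (d1, d2):                               # step 5
        if d.max_al >= ck.max_al:
            d.max_al = ck.max_al - 1
        if ck not in d.children:
            d.children.append(ck)
    return True


# Illustrative values in the spirit of Fig. 7; the initial levels are assumed.
one = Term(1, 0, 0)
c831 = Term(831, 3, 3)
c105 = Term(105, 2, 4)
c621 = Term(621, 3, 4)
print(try_decompose(c621, c105, c831), c621.curr_al, c105.max_al)  # True 4 3
```

Collecting proposed levels in `pending` and committing them only after every recursive check succeeds is one way to make a discarded decomposition leave no trace. The sketch does not verify that lowering a subexpression's maxAL in step 5 keeps it at or above that subexpression's currAL.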
Finally, the decomposed number Ck is moved to {DS} with the decomposition information. We repeat this procedure until there is no possible decomposition. If {UDS} is not empty at the end of the first step, we proceed to the next step.

In the second step, we search {APCS} to decompose the remaining numbers in {UDS}. As mentioned earlier, the number of extra elements from {APCS} should be minimized because additional adders are required to decompose them. This step consists of two substeps. In the first substep, we try to decompose numbers of {UDS} in terms of one element from {UDS} ∪ {DS} ∪ {1} and the other element from {APCS}. If the first substep fails to empty {UDS}, the remaining coefficients are decomposed with two numbers from {APCS} ∪ {1}. We try and record all possible decomposition candidates for CSD numbers in {UDS} to find the smallest subset {S} of {APCS} with which all the elements of {UDS} can be decomposed. The detailed procedure to determine {S} is described in [7]. After level constraints are checked, decomposed CSD numbers of {UDS} are moved to {DS}, and the elements of {S} are moved from {APCS} to {UDS}, waiting to be decomposed. This procedure continues until {UDS} is empty. Once the synthesis procedure is completed, the number of elements in {DS} determines the number of adders in the multiplier block.

C. LCCSE Example

In this section, we employ a simple example from [7] to illustrate how the LCCSE algorithm works. We focus on how to maintain level constraints during the decomposition process. We have four coefficients {105 ($10\bar{1}01001$), 621 ($10100\bar{1}0\bar{1}01$), 815 ($10\bar{1}010\bar{1}000\bar{1}$), 831 ($10\bar{1}0100000\bar{1}$)} and assume that one coefficient, 831, has the level constraint of three (important) and the other coefficients have the level constraint of four (less important).

Fig. 7(a) shows the initial state before the synthesis process begins. The coefficients are shown in Fig. 7(a), with their currAL and maxAL in parentheses. The currAL values are initialized to their MAL, and the maxAL values are set to their level constraints. In the first step, decomposition is tried with the elements in {DS} ∪ {UDS} ∪ {1}, and two numbers are decomposed and moved to {DS}

$$621 = (-1) \times (105 \ll 1) + 831$$

$$815 = (-1) \times (1 \ll 4) + 831.$$

Fig. 7(b) shows the state after the first synthesis step. Two coefficients 621 and 815 were decomposed, and their currAL values were changed to four because they should be scheduled after their subexpression 831 whose currAL is three. The maxAL of 105 is reduced to three because it should be scheduled before 621 with the currAL of four. Since two coefficients 105 and 831 remain unsynthesized in {UDS} at the end of the first step, we proceed to the second step.

Fig. 7. Example of LCCSE: (a) Initial state. (b) After the first step. (c) After the first iteration of the second step. (d) Final state after the synthesis process is done.

In the second step, we need to find the smallest subset of {APCS} to decompose the remaining coefficients {105, 831}. The subset {13} is found so that 105 and 831 are decomposed as follows:

$$105 = (13 \ll 3) + 1$$

$$831 = (13 \ll 6) - 1.$$

Fig. 7(c) shows the result of the first iteration of the second step. The currAL of 105 is increased to three because its subexpression 13 ($10\bar{1}01$) has three nonzero digits (thus, MAL = 2). The maxAL of 13 is set to two because it should be scheduled before 105 and 831 with the maxAL of three. The subexpression 13 is moved from {APCS} to {UDS} to be decomposed.

After the second iteration, we find {3} to decompose 13

$$13 = (3 \ll 2) + 1$$

and, finally, three is decomposed in the next iteration

$$3 = (1 \ll 1) + 1.$$

Fig. 7(d) shows the final result. Two coefficients {105, 831} are scheduled at the third level and the other coefficients {621, 815} at the fourth level. A total of six adders are used for the implementation.

IV. CASE STUDY: FIR FILTER EXAMPLE

We applied our design method to the 25-tap symmetric FIR filter of Section II and studied the effects of Vdd scaling and process tolerance when it is synthesized using LCCSE. The filter is symmetric and has the following coefficient values: {−2423, −113, 1564, 762, −1816, −1517, 2276, 3140, −2434, −6205, 2726, 20 680, 30 093, 20 680, 2726, −6205, −2434, 3140, 2276, −1517, −1816, 762, 1564, −113, −2423}.
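As a small aid for reading the coefficient list, the following Python sketch converts each distinct coefficient magnitude to CSD form and counts its nonzero digits. It is an illustration only: the function names are hypothetical, the digit −1 is printed as `n`, and the level formula ceil(log2(nonzero digits)) is an assumption inferred from the remark in Section III-C that 13 has three nonzero digits and therefore MAL = 2.

```python
COEFFS = [-2423, -113, 1564, 762, -1816, -1517, 2276, 3140, -2434, -6205,
          2726, 20680, 30093, 20680, 2726, -6205, -2434, 3140, 2276, -1517,
          -1816, 762, 1564, -113, -2423]


def to_csd(n):
    """CSD digits of a positive integer, most significant first, each in {-1, 0, 1}."""
    digits = []
    while n:
        if n % 2:
            d = 2 - n % 4     # +1 when n = 1 (mod 4), -1 when n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits[::-1]


def csd_string(n):
    """Render CSD digits as text, writing 'n' for the digit -1."""
    return "".join("n" if d < 0 else str(d) for d in to_csd(n))


def min_adder_level(n):
    """ceil(log2(number of nonzero CSD digits)), assumed here to be the MAL."""
    nonzero = sum(1 for d in to_csd(n) if d)
    return (nonzero - 1).bit_length()


print("symmetric:", COEFFS == COEFFS[::-1])
for magnitude in sorted({abs(c) for c in COEFFS}):
    print(magnitude, csd_string(magnitude), min_adder_level(magnitude))
```

Because the filter is symmetric, only the 13 distinct magnitudes need to be synthesized in the multiplier block; the sign and the mirrored taps reuse the same products.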
To illustrate the effectiveness of our design technique (LCCSE) compared with a conventional technique for FIR filter synthesis [7], we analyzed four cases:
A) all coefficients with level constraint LC = 4;
B) all coefficients with LC = 5;
C) “important coefficients” with LC = 3 and “less impor-
tant” ones with LC = 4;
D) more gradual level assignment: LC = 3 for important
coefficients, LC = 4 for less important ones, and LC =
5 for least important ones.
As mentioned in the previous section, Cases A and B have no
individual constraints on the coefficients, and the results of the
conventional CSE [7] and LCCSE are identical. However, in
the conventional CSE technique, it is not possible to constrain the individual levels based on their importance. The proposed LCCSE technique allows us to do so. The following case studies provide further insight to the differences between two approaches.

1) Case A (LC = 4 for All Coefficients): The number of adders for implementing the multiplication block is 25. The coefficients {30 093, −2423, 2726, 3140, 1564, 762} are computed at the fourth level, whereas the other coefficients are computed between one and three levels. Therefore, in case of delay variations where the fourth level outputs are not computed properly, the characteristic of the filter would be severely affected because 30 093 is the most important coefficient to determine the filter response.

2) Case B (LC = 5 for All Coefficients): The number of adders is 23. Coefficients {30 093, 2726, 762} are computed at level 5. The coefficients {−6205, −2423, −1517, 785, 569, 391, 227} are computed at level 4, and the rest are computed within level 3.

3) Case C (LC = 3 for "Important" Coefficients/LC = 4 for "Less Important" Coefficients): We consider three separate scenarios for this case based on the worst-case user-specified ripple requirement.

1) Nineteen "important" coefficients and six "less important" coefficients. In this case, 25 adders are required for the multiplication block. Only one coefficient {−1517} is computed at level 4, with other coefficients within 1–3 levels. The interesting point to note here is that the number of adders remains the same as in Case A. In case of delay variations, only one coefficient {−1517} is not computed.

2) Fifteen coefficients are "important," and ten coefficients are "less important." In this case, 25 adders are required. The coefficients {762, −1816} are computed at level 4.

3) Eleven coefficients are "important," and 14 coefficients are "less important." The number of adders is 25, and the coefficients {762, −1816, −2423} are computed at the fourth level.

The filter characteristics under delay variations for all these cases are shown in Fig. 8, and the corresponding error metrics (passband/stopband ripple) are in Table I. The conventionally designed filter (Case A) stops operating as a low-pass filter when the fourth-level calculation fails (Fig. 8), while other filters designed by our method work with degraded performance. It can be inferred from the results that our filter design is more stable (in terms of quality) with respect to delay failures compared with a conventional four-level filter with similar specifications. It should also be noted that assigning a relaxed constraint (larger number of ALs) to a "less important" coefficient does not necessarily imply that it would be computed at a later level. For example, when ten coefficients are constrained to the fourth level [scenario 2)], only four of them are actually computed at level 4. Another important fact is that the classification of coefficients into different groups based on their sensitivities to the filter response also plays a crucial role in determining how many and/or which coefficients may be evaluated at the later stages of computation.

Fig. 8. Frequency responses of a (Case A) conventional design with LC = 4 versus (Case C) proposed designs with LC = 3 for important/LC = 4 for less important coefficients. Under delay variation, the conventional design fails; however, the proposed designs work with degraded performance.

TABLE I
MAXIMUM RIPPLE FOR 25-TAP FILTER. CASE C (3/4) VERSUS CASE A (CONV.)

4) Case D (LC = 3 for "Important"/LC = 4 for "Less Important"/LC = 5 for "Least Important"): The division of the coefficients into broad categories of "important" and "less important" ones is discussed in the previous paragraph to prove the effectiveness of our design methodology. In general, we may follow a more gradual assignment of levels based on the coefficient sensitivities. We verify this by dividing the coefficients into three groups with the level constraints of three levels (coefficients: 30 093, 20 680, −6205, 3140, and 2726), four levels (coefficients: −2423, 2276, and −2434), and five levels (coefficients: −113, 762, −1517, −1816, and 1564). The number of adders required in this case is 24. We find that coefficient {−1517} is computed at the fifth level and coefficients {−2423, 762, −1816} at the fourth level, whereas the rest are computed between one and three levels. This feature of the LCCSE-designed filter, where only less important coefficients
are affected under delay variations, helps to provide a gradual degradation in filter quality with Vdd scaling, process variations, etc. We can also utilize this to our advantage for more effective voltage scaling. The maximum passband/stopband ripple for all cases is shown in Table II.

It should also be noted that incrementing the number of levels beyond a certain value does not necessarily result in the reduction of number of adders. For example, in this 25-tap filter, increasing the number of levels beyond six (21 adders) does not reduce the number of adders in the filter design.

TABLE II
MAXIMUM RIPPLE FOR 25-TAP FILTER. CASE D (3/4/5) VERSUS CASE B (CONV.)

TABLE III
POWER FOR NOMINAL/SCALED Vdd. CASE B (CONV.)/D (PROP.)

TABLE IV
POWER FOR NOMINAL/SCALED Vdd. CASE A (CONV.)/C (PROP.)

Fig. 9. Frequency response of (Case B) conventional design with LC = 5 versus (Case D) proposed designs with 3/4/5 levels assigned to coefficients based on sensitivities.

A. Effects Under Vdd Scaling and Process Variation

Based on the results of this section, we can infer the following results.

1) Vdd Scaling: Assigning a gradual increase in ALs for the coefficients based on sensitivities provides opportunities of low-power operation. We compare two designs in this case: a) filter designed with the CSE technique proposed in [7] (Case B, where LC = 5 for all the coefficients) and b) filter designed using our technique with 3/4/5 levels assigned, as in Case D. At a nominal voltage, there is a slight increase in power consumption (Table III) because the number of adders for our filter implementation is one more than in the normal CSE case (24 compared with 23). However, if the supply voltage is reduced for a low-power operation, the path delays of each computation would increase. For Case B, the important coefficients {30 093, etc.} computed at the fifth stage are affected under Vdd scaling, which leads to a complete failure in the filter characteristic. For Case D, when Vdd is scaled to 0.9 V, only the fifth-stage outputs are affected, and the power savings is 17.6% (Table III) with a reasonably accurate filter response. When Vdd is further reduced to 0.8 V, the outputs of levels 4 and 5 are affected by an increased delay, and only the outputs up to the third level are valid. The power improvement is 33.7% (Table III), and the filter characteristics are shown in Fig. 9, showing that under a scaled supply, we obtain a graceful degradation in the filter quality. Below 0.8 V, however, the third-level outputs are also affected and severely degrade our filter functionality.

We also compared Case A (four levels for all coefficients) and Case C (3/4 levels) where we have a similar power consumption at a nominal voltage since both implementations have the same number of adders (25). However, under voltage scaling (Vdd = 0.9 V), the filter characteristic of the normal CSE-based filter is severely affected, whereas the proposed filter design still operates with 21.8% less power consumption (Table IV) and provides a slightly degraded filter response.

2) Process Variation: Process variations affect the delay distribution of the computational paths in a design. We consider the effect of process variations at both nominal and scaled Vdd's for the design in Case D (3/4/5 levels).

1) At the nominal Vdd [1 V for predictive technology model (PTM) 70 nm technology], the conventional CSE-based filter design suffers from wrong outputs for the last computation level, and its output response is seriously degraded. In fact, Fig. 9 shows that the filter is unable to function properly under delay variations. Our proposed design, however, can tolerate such errors with minor degradation in the frequency response since the final computational stages have only "less important outputs." As mentioned before, under the scaled Vdd of 0.9 V, the outputs up to the fourth level are computed properly for the design under consideration, yielding a response shown in Fig. 9.

2) Additionally, the process variation at the scaled Vdd (0.8 V) results in computation errors at the fourth level; however, the outputs are correct up to the third level (Fig. 9). Even under a scaled Vdd and process variation, therefore, we obtain a filter response with reasonable accuracy. In this context, our methodology is different from operating a filter with a reduced number of taps for low power. Since reducing the number of taps does not reduce the critical path lengths for important coefficients, it does not guarantee that important computations would not get affected by the process variation and Vdd scaling.
TABLE V
POWER FOR NOMINAL/SCALED Vdd FOR 121/32-TAP FILTER

For example, let us consider that instead of the 25-coefficient filter, we use a 19-coefficient filter, with the lowest six less critical coefficients not considered. The frequency requirements are
the same; thus, we still have four ALs with which to compute
the 19 coefficients. In this case, if we use the conventional CSE,
we find that the most important coefficient {30 093} is still
computed at level 4. Since reducing the number of taps does not
reduce the critical path lengths for important coefficients, they
still get affected by process variation and/or Vdd scaling. Our
implementation, on the other hand, proves the effectiveness of
our design technique under voltage scaling/process variations.
To nullify the effects of Vdd scaling and/or process variation,
we need to prevent incomplete/incorrect computations arising
from the less critical coefficients from propagating to the final
output. For instance, if the less critical coefficients were c0 and
c1 , the values of c0 · x and c1 · x (where x is the filter input)
would be erroneous under voltage scaling/process variations.
In case of delay errors, we gate the outputs of affected adders
by ANDing them with zero. In this case, the final-stage adders
(Fig. 1) for the affected less crucial coefficients obtain a zero
input. However, to correctly detect the delay failures under
process variation at nominal or scaled Vdd , we need to identify
the process corner of the system. For this purpose, we use a
variation detector circuit [18] for detection of process corner
and use this information to turn off the outputs of affected
coefficients.
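The effect of gating can be explored numerically with a short Python sketch. It models a delay failure in the simplest way consistent with the description above: every tap whose product is computed after the last valid adder level is forced to zero. The late-level assignments are the ones reported for Case D; treating all remaining coefficients as level 3, the function names, and the frequency grid are assumptions made for illustration, and the response is left unnormalized because the fixed-point scaling is not stated here.

```python
import cmath
import math

COEFFS = [-2423, -113, 1564, 762, -1816, -1517, 2276, 3140, -2434, -6205,
          2726, 20680, 30093, 20680, 2726, -6205, -2434, 3140, 2276, -1517,
          -1816, 762, 1564, -113, -2423]

# Levels reported for Case D in the text; every other coefficient is
# treated as level 3 here, since the text only says "between one and three".
LATE_LEVEL = {-1517: 5, -2423: 4, 762: 4, -1816: 4}


def gate(coeffs, last_valid_level):
    """Force to zero every tap whose product is computed after last_valid_level."""
    return [c if LATE_LEVEL.get(c, 3) <= last_valid_level else 0 for c in coeffs]


def magnitude_response(coeffs, num_points=64):
    """|H(e^{jw})| sampled uniformly over 0 <= w <= pi (unnormalized)."""
    response = []
    for i in range(num_points):
        w = math.pi * i / (num_points - 1)
        h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(coeffs))
        response.append(abs(h))
    return response


for last_valid_level in (5, 4, 3):
    taps = gate(COEFFS, last_valid_level)
    response = magnitude_response(taps)
    # columns: last valid level, number of gated taps, DC gain, peak magnitude
    print(last_valid_level, taps.count(0), sum(taps), round(max(response)))
```

Gating preserves the symmetry of the tap vector, so the degraded filter keeps its linear phase; only the magnitude response changes. Ripple figures comparable to the tables would additionally require the passband and stopband edges, which are not restated in this section.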
V. RESULTS

In this section, we apply our design methodology to other filters to show the advantages of our approach. The LCCSE code was developed in C++, and it generated a hardware description for the filter design in VHDL format. The VHDL file was then synthesized using the Synopsys Design Compiler [19], and the synthesized output was converted to a SPICE netlist. The SPICE netlist was then simulated using Nanosim [20], with PTM 70 nm [21] technology. In order to show the general applicability of our technique for all kinds of filters, we choose the following two cases: a 121-tap high-pass FIR filter [22] and a 32-tap bandpass filter [23]. We have divided the coefficients of both the 121- and 32-tap filters into three groups with three sensitivity levels (3, 4, and 5) based on their relative importance. The adders required for the 121-tap filter implementation are 52 for the all-five-level case and 55 for the 3/4/5-level case, whereas the 32-tap filter requires 21 and 23 adders, respectively. The results of the conventional design (denoted by conv.), as well as of the proposed designs, have been summarized in terms of power consumption at nominal and scaled Vdd's in Table V. The filter responses for all these cases (conv. and proposed designs under nominal and scaled Vdd's) are shown in Fig. 10, and the error metrics in different cases are tabulated in Table VI. Under delay variations, the conventional FIR designs cease to operate, as seen from the response. Moreover, their passband/stopband ripples increase drastically under such conditions. The proposed FIR design, however, maintains a low ripple under variations, even at scaled voltages.

Fig. 10. Frequency responses of (a) 121-tap high-pass FIR filter and (b) 32-tap bandpass FIR filter.

TABLE VI
MAXIMUM RIPPLE FOR 121/32-TAP FILTERS

VI. CONCLUSION

We have presented a novel synthesis technique for generating FIR filter designs which simultaneously cater to
low-energy requirements and tolerance to large process variations while maintaining a reasonably accurate filter response. This is achieved by restricting the "important filter coefficients" to a smaller number of computation steps than the maximum allowed in a CSE-based filter implementation. The later computation stages are allowed to compute only the "less contributive coefficients." Under Vdd overscaling, conventional FIR architectures fail to provide the desired filter response. On the other hand, as the voltage scales down, our filter architectures provide a very graceful degradation in the output response. This enables us to operate these filters at low voltages to save power under minor quality degradation. We believe that the proposed concept can be applied to other areas of signal processing where a proper tradeoff between power and quality of service is required.

REFERENCES

[1] K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[2] D. R. Bull and D. H. Horrocks, "Primitive operator digital filters," Proc. Inst. Elect. Eng. - Circuits Devices Syst., vol. 138, no. 3, pp. 401–412, Jun. 1991.
[3] A. G. Dempster and M. D. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 9, pp. 569–577, Sep. 1995.
[4] H.-J. Kang and I.-C. Park, "FIR filter synthesis algorithms for minimizing the delay and the number of adders," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 8, pp. 770–777, Aug. 2001.
[5] A. Dempster et al., "Designing multiplier blocks with low logic depth," in Proc. ISCAS, May 2002, vol. 5, pp. 773–776.
[6] Y. Takahashi and M. Yokoyama, "New cost-effective VLSI implementation of multiplierless FIR filter using common subexpression elimination," in Proc. ISCAS, May 2005, vol. 2, pp. 1445–1448.
[7] C. Yao et al., "A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2211–2215, Nov. 2004.
[8] A. Hosangadi et al., "Algebraic methods for optimizing constant multiplications in linear systems," J. VLSI Signal Process. Syst., vol. 49, no. 1, pp. 31–50, Oct. 2007.
[9] O. Gustafsson and L. Wanhammar, "ILP modelling of the common subexpression sharing problem," in Proc. 9th IEEE ICECS, Dec. 2002, vol. 3, pp. 1171–1174.
[10] S. Vijay et al., "A greedy common subexpression elimination algorithm for implementing FIR filters," in Proc. ISCAS, May 2007, pp. 3451–3454.
[11] R. M. Hewlitt and E. S. Swartzlander, "Canonical signed digit representation for FIR digital filters," in Proc. IEEE Workshop SiPS, 2000, pp. 416–426.
[12] H. Shaffeu et al., "Improved design procedure for multiplierless FIR digital filters," Electron. Lett., vol. 27, no. 13, pp. 1142–1144, Jun. 1991.
[13] D. Ait-Boudaoud and R. Cemes, "Modified sensitivity criterion for the design of powers-of-two FIR filters," Electron. Lett., vol. 29, no. 16, pp. 1467–1469, Aug. 1993.
[14] Z. Ye and C.-H. Chang, "Local search method for FIR filter coefficients synthesis," in Proc. 2nd IEEE Int. Workshop DELTA, Jan. 2004, pp. 255–260.
[15] S. Borkar, "Designing reliable systems from unreliable components: The challenges of transistor variability and degradation," IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov./Dec. 2005.
[16] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS," in IEDM Tech. Dig., Dec. 2007, pp. 471–474.
[17] K. Bernstein et al., "High-performance CMOS variability in the 65-nm regime and beyond," IBM J. Res. Develop., vol. 50, no. 4/5, pp. 433–449, Jul. 2006.
[18] C. H. Kim et al., "On-die CMOS leakage current sensor for measuring process variation in sub-90 nm generations," in VLSI Symp. Tech. Dig., Jun. 2004, pp. 250–251.
[19] Synopsys Inc., Synopsys Design Compiler.
[20] Synopsys Inc., Nanosim.
[21] Predictive Technology Model. [Online]. Available: http://www.eas.asu.edu/~ptm
[22] Y. C. Lim and S. R. Parker, "Discrete coefficient FIR digital filter design based upon an LMS criteria," IEEE Trans. Circuits Syst., vol. CAS-30, no. 10, pp. 723–739, Oct. 1983.
[23] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications. Englewood Cliffs, NJ: Prentice–Hall, 1996.

Jung Hwan Choi (S'07) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering in the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.
In the summer of 2006 and 2007, he was with Intel Corporation, Austin, TX, as an Intern. His research interests include low-power DSP circuit design, statistical design methodologies under process variation, and thermal modeling and analysis.

Nilanjan Banerjee received the B.S. degree in electrical engineering from Jadavpur University, Calcutta, India, the M.S. degree from Arizona State University, Tempe, and the Ph.D. degree from Purdue University, West Lafayette, IN, in 2008.
He was a Software Design Engineer with Infosys Technologies, Bangalore, India. He is currently a Research Scientist with the Microprocessor Research Laboratory, Intel, Santa Clara, CA. His research interests include developing low-power and error-resilient circuits and systems.
Dr. Banerjee was the recipient of academic excellence awards in 1999 and 2002, and the Intel Fellowship and Henry Ford Scholarship Award in 2007.

Kaushik Roy (S'83–M'83–SM'95–F'02) received the B.Tech. degree in electronics and electrical communications engineering from Indian Institute of Technology, Kharagpur, India, and the Ph.D. degree from the University of Illinois, Urbana, in 1990.
He was with the Semiconductor Process and Design Center, Texas Instruments Incorporated, Dallas, TX, where he worked on field-programmable gate array (FPGA) architecture development and low-power circuit design. Since 1993, he has been with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, where he is currently a Professor and a University Faculty Scholar and holds the Roscoe H. George Chair of Electrical and Computer Engineering. He has published more than 400 papers in refereed journals and conferences. He is the coauthor of two books, namely, Low-Power CMOS VLSI Circuit Design (New York, NY: Wiley, 2000) and Low Voltage, Low Power VLSI Subsystems (New York, NY: McGraw Hill, 2004). He is the holder of eight patents. His research interests include very large scale integration (VLSI) design/computer-aided design for nanoscale silicon and nonsilicon technologies, low-power electronics for portable computing and wireless communications, VLSI testing and verification, and reconfigurable computing.
Dr. Roy has been on the editorial board of IEEE DESIGN AND TEST, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS. He was a Guest Editor for a Special Issue on Low-Power VLSI in the IEEE DESIGN AND TEST, in 1994; the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, in June 2000; and the Institution of Electrical Engineers Proceedings - Computers and Digital Techniques in July 2002. He was the recipient of the National Science Foundation Career Development Award in 1995; IBM Faculty Partnership Award; AT&T/Lucent Foundation Award; 2005 SRC Technical Excellence Award; Semiconductor Research Corporation Inventors Award; 2005 IEEE Circuits and System Society Outstanding Young Author Award (Chris Kim); 2006 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS Best Paper Award; and the Best Paper Awards at the 1997 International Test Conference, 2000 IEEE International Symposium on Quality of IC Design, 2003 IEEE Latin American Test Workshop, 2003 IEEE Nano, 2004 IEEE International Conference on Computer Design, and 2006 IEEE/ACM International Symposium on Low Power Electronics and Design.
