Low Asynchronous Adder
Abstract: Asynchronous circuits are often presented as a means to achieve low power operation. We investigate their suitability for low-energy applications, where long battery life and delay tolerance are the principal design goals, and where performance is not a critical requirement. Three adder circuits are studied: two dynamic and one based on pass-transistor logic. All adders combine dual-rail and bundled-data circuits. The circuits are simulated over a wide supply-voltage range, down to their minimal operating point. Leakage energy (at 0.18 µm) is found negligible. Transistor count is found to be an unreliable predictor of energy dissipation. Keepers in dynamic logic are eliminated when possible. A modified version of a two-bit dynamic adder (originally proposed by Chong) is found to dissipate the least amount of energy.

Index terms
Low energy, adder, asynchronous logic.

1 Introduction
Asynchronous logic has been promoted as a means to achieve low power design [1][2][6]. A number of advantages of asynchronous logic that make it appropriate for low power operation have been cited: Asynchronous circuits can stop computing when there is no new input, without the extra complexity of clock-gating logic and without the need to wait for clock restart delays. Power dissipation in large clock distribution trees is eliminated, though partly replaced by local handshake power [10]. When the circuit is speed-independent, the supply voltage can be reduced when lower performance can be tolerated, without having to retune clock frequencies [5]. More recently, asynchronous low energy (rather than low power) has been addressed [6][7], as this is a more appropriate design goal for extending battery life in mobile and other devices, as well as for minimizing heat dissipation and cooling expenses. Low power and low energy techniques for asynchronous systems are typically based on minimizing the number of transitions [1]. Other approaches include voltage scaling [5], early-open latch controllers, and data-dependent enabling of the logic [1][3][7].

We focus on simple computing circuits that must dissipate as little energy as possible in applications where performance is non-limiting and the time to complete any computing task is immaterial. A secondary goal is to be able to operate over a very wide range of supply voltage, as is typically the case with some battery-operated devices where voltage regulation is not desirable. The principal implication of a varying supply voltage is a wide range of delays, calling for the speed-independence feature of asynchronous circuits. The most robust speed-independent circuit methodology is based on dual-rail encoding and on quasi-delay-insensitive (qDI) design [1]. Unfortunately, qDI circuits are not necessarily the most energy-efficient ones.

Four-phase qDI data signaling is based on alternating valid and null values. Each data bit must toggle from valid to null and back again on every successive data value, even if the data on both sides of the null have the exact same value. Two-phase qDI protocols help reduce delays but do not improve energy consumption. Bundled-data signaling (in both synchronous and asynchronous circuits) eliminates data switching when data values do not change. However, bundled-data speed-independent logic may not be as tolerant to wide delay variations as qDI circuits, since most bundled-data schemes rely on matched delays, which risk not being long enough on the one hand, while always incurring a worst-case delay on the other.

Another low energy technique prefers large combinational blocks and minimizes the use of pipeline registers. Purely combinational logic could sometimes achieve minimum energy per computation, as long as redundant transitions are avoided.

As a basic test case we consider two-bit adders, which are commonly required for signal processing applications. Although complete CPU or DSP systems may dissipate more energy in other sections, such as their instruction fetch and decode units [7], this is typically due to performance optimization; in low-energy applications where execution rate is not an issue, the data-path is expected to become the energy bottleneck.

We investigate a hybrid bundled-data/dual-rail approach [1]. The dual-rail part provides completion indication, while the bundled-data parts help minimize energy dissipation. As an example, we apply the design methodology to a large adder, and compare it with other published low energy adders [4][9]. The various adders are presented in Section 2. The actual circuits used for our analysis are described in Section 3, and the energy dissipation and simulation results are discussed in Section 4.

2 Low Energy Adder Architectures
In order to achieve high performance in wide adders, carry look-ahead circuits are usually employed. However, such circuits dissipate extra energy. In low-energy applications where performance is not an issue, no look-ahead circuits should be used. Thus, we consider only ripple-carry adders. We also employ those hazard-free asynchronous techniques that block spurious transitions and perform their computations only after all inputs have arrived.

Another energy-related advantage of asynchronous ripple-carry adders is their relatively simple completion detection; in the circuits below, the carry-out of the last stage is considered as the indication of completion, and all sum outputs are assumed to be ready by the time the carry-out becomes valid.

2.1 The Nielsen Adder
Nielsen [2][3] combines two types of dynamic adder circuits (Figure 1). The least significant half of the n-bit adder employs carry-kill and carry-generate logic to speed up computation. The most significant half of the same adder employs ripple-carry adder circuits without any carry acceleration. All adders produce dual-rail carry-out and single-rail sum outputs.
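[Figure 1: block diagram of the N-bit Nielsen adder, a chain of FA cells with inputs a[N-1..0], b[N-1..0], Req and cin, and outputs S[N-1..0] and cout. The cells of the most significant half are marked P and those of the least significant half KG. The labels VDD, FUNC and out at this point appear to belong to a separate schematic whose caption is not in the source.]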
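The organization described in Section 2.1 can be illustrated with a small behavioral sketch in Python. It is not the authors' circuit: the function names are hypothetical, the unit-delay timing is an assumption made only for illustration, and "carry-kill and carry-generate logic" is read here as meaning that a stage with equal input bits resolves its carry-out locally, without waiting for the incoming carry. Sums are single-rail, the carry-out is dual-rail, and a valid carry-out from the last stage serves as the completion indication.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]   # (true rail, false rail)
NULL: DualRail = (0, 0)      # empty code word: neither rail asserted


def encode(bit: int) -> DualRail:
    """Valid dual-rail code word for one bit."""
    return (1, 0) if bit else (0, 1)


def is_valid(code: DualRail) -> bool:
    """A code word is valid when exactly one rail is asserted."""
    return code[0] != code[1]


def to_bits(value: int, n: int) -> List[int]:
    """LSB-first list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def nielsen_style_add(
    a: List[int], b: List[int], cin: int, req: int = 1
) -> Tuple[List[int], DualRail, Optional[int]]:
    """Behavioral model; a and b are LSB-first bit lists of equal length.

    Returns (sum_bits, carry_out, delay). carry_out is dual-rail; delay is
    the assumed time, in unit stage delays, at which carry_out turns valid.
    """
    n = len(a)
    if not req:
        # Request low: outputs stay in the null state, nothing is computed.
        return [0] * n, NULL, None
    carry, t = cin, 0
    sums = []
    for i in range(n):
        sums.append(a[i] ^ b[i] ^ carry)
        kg_stage = i < n // 2          # least significant half
        if kg_stage and a[i] == b[i]:
            # Kill (a=b=0) or generate (a=b=1): carry-out known locally,
            # one stage delay after Req (assumed timing model).
            carry, t = a[i], 1
        else:
            # Propagate, or plain ripple stage: wait for the incoming carry.
            carry = (a[i] & b[i]) | (carry & (a[i] ^ b[i]))
            t += 1
    return sums, encode(carry), t
```

For example, `nielsen_style_add(to_bits(3, 4), to_bits(1, 4), 0)` returns `([0, 0, 1, 0], (0, 1), 4)`: the sum is 4, the carry-out is a valid 0, and under the assumed timing the completion indication arrives after four stage delays.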
[Schematic for Figure 6; signal labels: s0, a0.t, a0.f, a1.t, a1.f, b1.t, b1.f, c.t, c.f, Req.]
Figure 6 Chong’s dual-bit implementation.

3.3 PTL Full-Adder
The PTL FA of [9] has been extended with a Request-enabled output inverter and adapted to produce a dual-rail carry-out (Figure 7).

[Schematic for Figure 7; signal labels: a.t, a.f, b.t, b.f, c.t, c.f, Req, VDD; outputs s.t, cout.t, cout.f.]
Figure 7 PTL FA circuit.

4 Simulation Results
For a fair comparison with Chong’s two-bit FA, all designs were simulated as two-bit circuits. All three FA circuits were designed (at the schematic transistor level) for TSMC 0.18 µm technology and simulated with Cadence Spectre. The simulated circuits included completion detection. All outputs were loaded by 10 fF capacitors. Since voltage scaling serves as the principal means for energy reduction, all simulations were conducted by sweeping VDD over 0.7 to 1.5 V (1.8 V is the nominal VDD for the technology). All 32 input combinations (of two 2-bit numbers plus carry-in) were simulated in each case, and energy dissipation was averaged over all 32 cases.

Ten cycles of valid-then-empty inputs were simulated, with a long idle period in the middle (Figure 8). Thus, measurements results [...]

Figure 8 Input conception for fair adders’ simulation (CD: Completion Detection signal).

Figure 9 shows the transistor counts for the three 2-bit FA circuits.

Figure 10 presents the energy dissipation of the three circuits versus VDD, averaged over 32 runs of ten additions each of the two-bit adders, including the idle times. Other circuits have also been simulated, but their energy consumption far exceeded that of these three circuits.

We can learn from the simulations that Chong’s adder dissipates the least amount of energy. PTL dissipates a bit more, but less than Nielsen’s FA. All three circuits demonstrate robustness to a wide variation of voltage levels. Chong’s FA produces the best result thanks to its dual-bit structure, which reduces the logic size, eliminates redundant wiring and consequently reduces the number of transitions. These observations provide a strong incentive to design larger blocks of logic in order to gain maximal energy reduction.

We checked the transistor count of the adders in order to investigate its impact on energy. The conclusion was that mere transistor count is not a sufficient predictor of energy dissipation. The PTL FA requires the largest number of transistors (40% of them are employed in the Request-enabled output buffer that was required to make it “asynchronous”). Still, the PTL FA dissipates on average 14% less energy than the dynamic FA. Also, despite the fact that the PTL FA requires 17% more transistors than Chong’s FA, it dissipates only about 10% more energy. Chong’s FA contains only 8.5% fewer transistors but consumes 20% less energy than the (single-bit) dynamic FA, thanks to producing only one carry-out signal. The dynamic FA calculates a carry-out signal for every bit, thus dissipating more energy.
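The percentages quoted in this section can be put on a common scale to see why transistor count and energy rank the circuits differently. The short Python sketch below normalizes everything to the dynamic FA (taken as 1.0) using only the figures stated in the text; the variable and function names are illustrative, and the derived ratios are implied values rather than measured ones.

```python
# All values are relative to the (single-bit) dynamic FA, taken as 1.0.
# The percentages are those quoted in the text; everything else is derived.
energy = {
    "DYNAMIC": 1.0,
    "PTL": 1.0 * (1 - 0.14),     # PTL: on average 14% less energy than dynamic
    "CHONG": 1.0 * (1 - 0.20),   # Chong: 20% less energy than dynamic
}

chong_transistors = 1.0 * (1 - 0.085)       # Chong: 8.5% fewer than dynamic
transistors = {
    "DYNAMIC": 1.0,
    "CHONG": chong_transistors,
    "PTL": chong_transistors * (1 + 0.17),  # PTL: 17% more than Chong
}


def ranking(values):
    """Circuit names ordered from smallest to largest value."""
    return sorted(values, key=values.get)


energy_order = ranking(energy)
transistor_order = ranking(transistors)
ptl_vs_chong_energy = energy["PTL"] / energy["CHONG"]
```

Under these figures the energy ranking is Chong, PTL, dynamic, while the transistor ranking is Chong, dynamic, PTL: the circuit with the most transistors is not the one that dissipates the most energy. The implied PTL-to-Chong energy ratio is about 1.08, close to the "about 10%" quoted above; the small difference is presumably due to rounding and averaging in the reported percentages.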
[Figure 9: bar chart titled "Transistor Count for Dual-bit Adder"; categories DYNAMIC, CHONG, PTL; vertical axis from 45 to 65.]

[Figure 10: energy dissipation versus VDD [V] for DYNAMIC, CHONG and PTL; horizontal axis from 0.6 to 1.5 V, vertical axis from 0.30 to 1.50 (units not recoverable).]

5 Conclusion
We have investigated some novel adder circuits and have been able to identify the low-energy ones. Delay was ignored in this analysis so as to emphasize low energy over all other parameters. Our next research goal is to investigate the Et and Et² metrics [6][9][10]. In addition, we plan to consider 2-bit and 3-bit circuits for further energy reduction.

Acknowledgment
We are grateful to Dana Amburg, Michael Moreinis, Yevgeny Perelman and Akadiy Morgenshtein, who have helped with ideas and CAD tools.
References
[1] J. Sparsø and S. Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer Academic Publishers, 2001.
[2] L.S. Nielsen and J. Sparsø, “A Low-power Asynchronous Data-path for a FIR Filter Bank,” Int. Symp. Adv. Res. Async. Circuits and Systems (ASYNC '96), pp. 18-21, 1996.
[3] L.S. Nielsen, “Low-power Asynchronous VLSI Design,” Ph.D. Thesis, Department of Information Technology, Technical University of Denmark, 1997.
[4] K.S. Chong, B.H. Gwee and J.S. Chang, “Low-voltage Asynchronous Adders for Low Power and High Speed Applications,” Int. Symp. Circuits and Systems (ISCAS), 2002.
[5] L.S. Nielsen, C. Niessen and J. Sparsø, “Low-power operation using self-timed and adaptive scaling of the supply voltage,” IEEE Trans. VLSI Systems, 2:391-397, 1994.
[6] A.J. Martin, “Remarks on low-power advantages of asynchronous circuits,” Europ. Solid-State Circuits Conf. (ESSCIRC), 1998.
[7] A.J. Martin, M. Nyström et al., “The Lutonium: A Sub-Nanojoule Asynchronous 8051 Microcontroller,” IEEE Int. Symp. Async. Systems and Circuits, May 2003.
[8] M. Munteanu, I. Bogdan et al., “Single-Ended Pass Transistor Logic for Low-Power Design,” IEEE Asilomar Conf. Signals Systems and Computing, pp. 364-368, 1999.
[9] L. Bisdounis, D. Gouvetas and O. Koufopavlou, “Circuit techniques for reducing power consumption in adders and multipliers,” in D. Soudris, C. Piguet and C. Goutis, “Designing CMOS Circuits for Low-Power,” Kluwer Academic Publishers, 2002.
[10] A.J. Martin, “An asynchronous approach to energy-efficient computing and communication,” SSGRR 2000, August 2000.