# Self-Timed Techniques for Low-Power Digital Arithmetic in GaAs VLSI

S. Lachowicz<sup>†</sup>, K. Eshraghian<sup>†</sup>, H-J. Pfleiderer<sup>‡</sup>

† Edith Cowan University, Australia, ‡ Ulm University, Germany

Key words: Gallium Arsenide, Self-Timed Systems, Very High Speed Systems

Abstract: This article presents a self-timed approach to digital Gallium Arsenide logic applicable to high performance VLSI circuits and systems. The design techniques are based on GaAs Latch-Coupled FET Logic (LCFL) in order to achieve a reasonable power-delay-area trade-off. The complexities due to clock skew are avoided and power savings are achieved through the pipelined architecture. A range of arithmetic circuits is presented and their performance evaluated.

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-0-387-35498-9\_57

L. M. Silveira et al. (eds.), VLSI: Systems on a Chip

## 1. INTRODUCTION

Clock skew is the speed-limiting factor in synchronous digital systems. Moreover, the clock distribution system can dissipate a considerable amount of power, reaching up to 40% of the total power dissipation of the system [1]. Self-timed digital systems, on the other hand, do not suffer from clock skew. The penalty, however, is the increased complexity of the system and the need to incorporate handshaking circuitry that permits reliable communication between asynchronous modules. Each module generates an event (in the form of a signal transition) when it is ready to accept data, and another event on completion of its computation. The use of transition signalling is common in self-timed applications due to the achievable time and power savings [2]. The handshaking modules can be implemented using Event Driven Logic described in [3]. Several standard circuit elements commonly required to process transition signals have been developed.
These include: Muller-C elements (AND for events), Exclusive-OR gates (EDLXOR), Inclusive-OR gates (EDLINCOR), and inverters. Implementations of these circuits in Gallium Arsenide technology are readily available [4][5]. Especially important is the Muller-C element, which is the core building block used in the two- and four-phase handshake protocols [6]. This paper presents a design methodology for Gallium Arsenide self-timed integrated systems [7] using LCFL latched logic primitives [8].

## 2. GAAS LATCH-COUPLED FET LOGIC (LCFL)

Latch-Coupled FET Logic (LCFL) [8], together with Pseudo-Dynamic Latched Logic [9], are MESFET logic families that overcome the inability of Direct Coupled FET Logic (DCFL) to support AND connection of enhancement-type transistors in the pull-down section of the GaAs logic gate, because the latch refreshing the output voltage compensates for the leakage current. The basic structure of the LCFL gate is presented in Figure 1.

Figure 1. LCFL gate: (a) gate level schematic, (b) transistor level schematic

In order to explain the basic operation of the LCFL cell, a two-cell part of a two-phase shift register, shown in Figure 2, together with the LCFL AND gate from Figure 3(a), is considered. The cells are clocked by $\Phi$ and $\overline{\Phi}$ respectively. During the first half of the clock period the signal $\Phi$ is high and, therefore, $V_{out}$ of the first stage is low and there is no interaction with the next stage. Since J5 is "on" and the output is low, J2 is "off". With J0 being "off", the internal node will be high and its voltage will be limited to approximately 0.7 V because of the J4 gate conduction. For the other case of J0 being "on", the internal node will be low. In other words, the transistor stage consisting of J0 and J1 acts as a DCFL inverter connected to the input J4 of the next inverter stage J4/J3. Depending on the input signal *in*, two combinations are possible at the end of the first half of the clock period:

1. in = H: internal = L, out = L (L/L)
2. in = L: internal = H, out = L (H/L)

Figure 2. An LCFL shift register: (a) cell circuit diagram, (b) two consecutive cells

These are the initial conditions for the latch consisting of the two inverters, J2/J1 and J4/J3, which operates during the second half of the clock period. To discuss this behaviour, the state diagram for $V_{internal}$ and $V_{out}$, shown in Figure 3, will be used. It shows the static transfer curves of the latch J1/J2, J3/J4, where the DC load presented by J2 and J4 is included. The separatrix (thick line) is superimposed on the same graph. The separatrix determines which state, (L/H) or (H/L), will be reached at the end of the second half of the clock period. For example, with the initial condition set to (L/L) in phase 1, which corresponds to $V_{in} = H$, the latch reaches the final state (L/H) in phase 2, as shown by the trajectory in Figure 3. The initial condition (H/L) is in the vicinity of the final endpoint (H/L) and, therefore, poses no problem.

The minimum distance between the starting point and the separatrix may be thought of as a noise margin for this type of latched logic, because if (L/L) is shifted to the right of the separatrix by noise sources, an incorrect state will result. By properly dimensioning transistors J0 ... J5 and taking into account the three capacitances C<sub>2</sub>, C<sub>m</sub> and C<sub>4</sub>, this distance can be made approximately 150 mV, as shown in Figure 3 for the case of the AND gate. The method of calculating the separatrix can be found in [10]. The slope of the separatrix will not change with temperature, since it is determined by ratios of transistor parameters and capacitances, which results in rather robust circuit performance.

In terms of power dissipation, LCFL is very efficient because it uses the currents of the pull-up transistors in the latch twice: during phase 1 for the logic evaluation and in phase 2 for the latch function.
Moreover, the inherent latch property decouples adjacent stages. This is a major reason why the AND function is possible in this logic, as opposed to DCFL.

Figure 3. State transition diagram of the LCFL latch used in an AND gate: (a) circuit schematic, (b) transition diagram

## 3. SELF-TIMED SYSTEMS IN MESFET GAAS

Self-timed systems require that logic cells have several control inputs and that they generate at least one control signal for handshaking. For the typical four-phase handshaking protocol the input signals are *Enable* and *Start*, and the required generated signal is *Done* (*Complete*), as shown in Figure 4. The *Done* signal triggers the *Request* input in the next stage's handshaking block.

Figure 4. A classical four-phase pipeline.

The logic path consists of the register latching the input signals and the functional block implementing the logic function. The detailed operation of this type of self-timed pipeline can be found in [6].

The GaAs LCFL logic family can be used efficiently to implement self-timed systems. The clock input can readily be used as the *Request* line, and the logic cell contains a latch which, if a proper handshaking protocol is applied, should allow the separate latches to be eliminated from the pipeline structure. The *Done* signal, indicating when the logic evaluation is completed, needs to be generated by extra hardware, as does the *Enable* line.

### 3.1 The Muller-C element

As can be seen in Figure 4, the Muller-C element is the fundamental component of the handshaking path of the self-timed pipeline. In terms of logic operation, it implements the AND function for events: if a specific transition takes place at one input and it is coincident with, or followed by, a similar transition of the other input(s), then that transition will be presented at the output [3].
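The event-AND behaviour described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the names `muller_c` and `run`, and the encoding of signal levels as 0 and 1, are introduced here for illustration and say nothing about the LCFL circuit implementation.

```python
def muller_c(a: int, b: int, y_prev: int) -> int:
    """Two-input Muller-C element (behavioural, hypothetical helper).

    The output follows the inputs when they agree and holds its
    previous value when they differ.
    """
    return a if a == b else y_prev


def run(events, y0=0):
    """Apply a sequence of (a, b) input levels; return the output after each."""
    y, trace = y0, []
    for a, b in events:
        y = muller_c(a, b, y)
        trace.append(y)
    return trace


# A rises first and B follows: the rising event reaches the output only
# after both inputs have made it. The falling event is handled the same way.
inputs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(run(inputs))
```

The output changes only at the third and fifth steps, that is, once the second input has repeated the transition already made by the first.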
In conventional logic terms its function can be described as:

$$Y(i+1) = Y(i)(A+B) + AB$$

Using an LCFL gate, this equation can be implemented in the structure presented in Figure 5.

Figure 5. LCFL implementation of the Muller-C gate

### 3.2 Self-timed GaAs LCFL pipeline

Figure 6 shows an LCFL cell for self-timed applications. As can be seen, the latch is an inherent component of the cell. This property, together with appropriate modifications to the handshake path to suit the GaAs latched logic design style, can be used to eliminate the separate latch blocks from the pipeline. The modified pipeline is shown in Figure 7.

In the self-timed LCFL cell from Figure 6, the *Complete* signal is generated by first producing the complement of the output with a NOR gate which is also controlled by the *Request* signal. This NOR gate is sized appropriately to achieve equal signal delay at the inputs of the following NOR gate, which produces the *Complete* signal. The logic cell operates as follows. When the *Request* line is high, the cell is in the reset state: both lines $Out$ and $\overline{Out}$ are low and the *Complete* line is high. When *Request* goes low and *Enable* is high, the cell evaluates the output and one of the lines $Out$ or $\overline{Out}$ goes high, depending on the result. This causes the *Complete* line to go low, which indicates that the logic evaluation has been completed.

Figure 6. LCFL cell for the self-timed pipeline

The *Enable* line is not always needed, and in the pipeline structure from Figure 7(a) the logic cells do not contain the *Enable* line. There are cases, however, when the *Enable* line cannot be omitted, as in the bubble-shift register presented in Section 4.2.

Figure 7. LCFL pipeline (a), details of the handshaking block (b).
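The reset and evaluation states of the cell described above can be captured in a level-based sketch. It is an illustrative reading of the prose, not a circuit-accurate model: the names `CellState`, `nor` and `lcfl_cell` are hypothetical, the cell is assumed to have no *Enable* input (as in the pipeline of Figure 7(a)), and gate delays and transistor sizing are ignored.

```python
from typing import NamedTuple


class CellState(NamedTuple):
    out: int
    out_bar: int
    complete: int


def nor(x: int, y: int) -> int:
    """Two-input NOR on 0/1 levels."""
    return int(not (x or y))


def lcfl_cell(request: int, value: int) -> CellState:
    """Level model of a self-timed cell without an Enable input (assumption).

    request = 1: reset state, Out is forced low.
    request = 0: the cell evaluates and Out takes the logic value.
    """
    out = 0 if request else value
    # First NOR: complement of Out, also controlled by Request,
    # so it stays low during reset.
    out_bar = nor(out, request)
    # Second NOR: Complete is high only while both rails are low.
    complete = nor(out, out_bar)
    return CellState(out, out_bar, complete)


# Reset, evaluate a 1, reset again, evaluate a 0.
for request, value in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(request, value, lcfl_cell(request, value))
```

In this model *Complete* is high only in the reset state and falls as soon as either rail goes high, which matches the described use of the falling *Complete* edge as the completion event.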
The modification of the handshake path compared with the standard four-phase handshake protocol is such that a particular LCFL logic cell is not allowed to enter the reset state until the following cell completes its evaluation, and it cannot perform the next evaluation until the cell following it enters the reset state. This handshake protocol removes the need for separate latches between the logic stages.

## 4. ARITHMETIC BUILDING BLOCKS

The following sections demonstrate the design of several arithmetic building blocks using the self-timed, pipelined approach. The operation of a standard shift register and a bubble shift register, as well as an adder and an accumulator, is presented. The accumulator design is especially interesting because it requires a special memory cell in the feedback path, as the value stored in the feedback path may have to wait for the asynchronous input data for arbitrary periods of time.

### 4.1 Shift register

The pipeline from Figure 7(a), with logic cells like the one from Figure 6 that have only one transistor in the logic block and no *Enable* transistor, forms a self-timed shift register.

Figure 8(a) presents the HSPICE simulation of the operation of a chain of 4 cells, and Figure 8(b) the performance as a function of the spread of the threshold voltage in the range $(-1.5 \div 1.5)\sigma V_T$ for the 0.6 $\mu$m MESFET technology.

Figure 8. Self-timed GaAs LCFL shift register operation

### 4.2 Bubble shift register

Sometimes the area occupied by the register may become the main constraint. If the speed margin provided by the Gallium Arsenide technology is sufficient, the register may then be implemented as a bubble-shift register, reducing the area requirement by almost 50%. The bubble shift operation is shown in Figure 9(a). There is one more cell than is needed to store all the data word bits. This cell is used as a bubble and is shifted left.
Shifting the bubble from the rightmost to the leftmost position in the register chain is equivalent to shifting the data word one position right. The implementation of the bubble shift register as a GaAs self-timed pipeline, for the simplified case of two data bits and three logic cells, is shown in Figure 9(c). The feedback path connecting the output of the register to its input is necessary if the data word needs to be preserved.

The logic cells in the bubble shift register have to use the *Enable* line because, in this case, adjacent cells store logic values, and the only interaction allowed between logic cells is when a cell holding no logic value reads the contents of its left neighbouring cell. This is equivalent to the bubble being passed from the cell currently holding it to its immediate neighbour on the left.

Figure 9. Bubble shift operation

Figure 10. Operation and performance of the bubble-shift register

The direction of the handshake path is opposite to the direction of the logic path, as the handshake protocol controls the passing of the bubble from right to left. The special handshake block is a slight modification of an LCFL cell in which two additional reset lines have been incorporated to accommodate the *Rst* and *Enable n* control signals.

TABLE 1 - Power dissipation comparison between synchronous and self-timed designs

| Design      | Power excluding clock | Power in the clock drivers | Total power |
|-------------|-----------------------|----------------------------|-------------|
| Synchronous | 4 mW                  | 3.5 mW                     | 7.5 mW      |
| Self-timed  | 4 mW                  | 0 mW                       | 4 mW        |

As in the previous case, the performance of the circuit was assessed using HSPICE. Figure 10(a) shows the waveforms at the output of the register for the simplified case of four logic cells (3-bit data word), and Figure 10(b) shows the relationship between the spread of 0.6 $\mu$m MESFET process parameters and the delay and power dissipation of the circuit.
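The bubble-shift operation described in this section can be sketched at the data level as follows. This is a minimal illustration, assuming that the bubble is represented by `None` and that it starts in the rightmost cell; the names `bubble_step` and `bubble_pass` are hypothetical, and neither the handshake signals nor the feedback path are modelled.

```python
from typing import List, Optional

Cell = Optional[int]  # None marks the bubble (a cell holding no logic value)


def bubble_step(cells: List[Cell]) -> List[Cell]:
    """Move the bubble one position left: the empty cell reads its left neighbour."""
    i = cells.index(None)
    if i == 0:
        return list(cells)  # bubble is already in the leftmost cell
    nxt = list(cells)
    nxt[i] = cells[i - 1]   # the empty cell takes over the neighbour's value
    nxt[i - 1] = None       # the neighbour now holds the bubble
    return nxt


def bubble_pass(cells: List[Cell]) -> List[List[Cell]]:
    """Shift the bubble to the leftmost position; return every intermediate state."""
    states = [list(cells)]
    while states[-1][0] is not None:
        states.append(bubble_step(states[-1]))
    return states


# A 3-bit data word in four cells, with the bubble starting on the right.
for state in bubble_pass([1, 0, 1, None]):
    print(state)
```

After the bubble has travelled from the rightmost to the leftmost cell, each data bit sits one position further to the right, which is the equivalence stated in the text.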
The comparison of the power dissipation of the 12-bit synchronous LCFL register and the self-timed 12-bit bubble shift register is shown in Table 1.

### 4.3 The Adder

The adder is the centre point of any arithmetic unit. In bit-serial calculations a one-bit adder is all that is required. A one-bit adder is also a building block for larger, pipelined stages, as is demonstrated in the next section. The appropriate self-timed architecture in GaAs MESFET is depicted in Figure 11. The details of the sum and carry blocks are shown in Figure 12.

Figure 11. A one-bit, self-timed LCFL GaAs adder

Figure 12. Details of the sum (a), and carry cell (b).

The adder circuit has been simulated using the models of the 0.6 µm E/D GaAs MESFET technology. Figure 13(a) shows the operation of the adder and Figure 13(b) presents the performance. The delay has been defined as the time between the falling edges of the *Request* and *Complete* signals. The adder exhibits a delay of 190 ps and a power dissipation of 200 $\mu$W for the typical value of $V_T$ of the MESFET process.

Figure 13. Operation and performance of the adder

### 4.4 The Accumulator

The accumulator structure for the case of 4 bits is shown in Figure 14(a). It can be observed that the handshaking hardware overhead is not significant, especially for higher numbers of bits, although some buffering might be required because of the limited fan-out of the GaAs gates.

The accumulator contains one-bit adder cells, and pre-skew and de-skew sections consisting of simple delay cells. However, because of the feedback present in the accumulator, special memory cells have to be employed. The memory cells ensure that, regardless of the delay in the input data (which can be asynchronous), the accumulator correctly adds the new set of data to the current contents. The memory cell, shown in Figure 14(b), uses the basic cell from Figure 6 and one Muller-C cell from Figure 5 to read the output of the adder, and is triggered by the $\overline{Complete}$ signal from the adder cell.

Figure 14. A 4 bit accumulator (a) and a special memory cell (b).

Figure 15(a) shows the waveforms at the output of the accumulator for a 4-bit data word 0011, and Figure 15(b) shows the relationship between the spread of 0.6 µm GaAs MESFET process parameters and the power dissipation of the circuit. The circuit throughput is 0.5 Gsps and does not depend on the data word width. It is expected that this value will further increase for the 0.4 µm process.

Figure 15. Accumulator simulation results: (a) output waveforms for 0011 data word, (b) power dissipation as a function of $V_T$ spread

## 5. CONCLUSIONS

The paper demonstrates a unified design methodology for Gallium Arsenide MESFET self-timed integrated systems useful for high performance computing. The four-phase handshake protocol has been modified specifically to suit the requirements of GaAs latched logic circuits. The resulting circuits are inherently delay insensitive and power efficient, as the clock signal has been entirely eliminated. The latches present between the logic stages of the classic micropipeline have been eliminated using the inherent latching property of the LCFL GaAs logic family, leading to further savings in power dissipation and die area. A range of design examples of various arithmetic circuits has been included.

## 6. ACKNOWLEDGMENT

The support of the Australian Research Council, the German Academic Exchange Office (DAAD) and the Centre for Very High Speed Microelectronic Systems at Edith Cowan University is gratefully acknowledged.

## 7. REFERENCES

- [1] W. Bowhill, A 300 MHz Quad-Issue CMOS RISC Microprocessor, Technical Digest of the 1995 ISSCC Conference, San Francisco, February 1995.
- [2] N.R. Poole, Self-timed logic circuits, Electronics & Communication Engineering Journal, pp. 261-270, December 1994.
- [3] D.A. Pucknell, Event-driven logic (EDL) approach to digital system representation and related system design processes, IEE Proceedings E, Vol. 140(2), pp. 119-126, 1993.
- [4] K. Eshraghian, S.W. Lachowicz, J.F. López, and R. Sarmiento, Efficient design of gallium arsenide Muller-C element, Electronics Letters, Vol. 33(9), pp. 757-759, April 1997.
- [5] S.W. Lachowicz, K. Eshraghian, J.F. López, R. Sarmiento, and H-J. Pfleiderer, Low Power GaAs Methodologies for Asynchronous Systems, Chapter in: "Mixed Design of Integrated Circuits and Systems", Kluwer Academic Publishers, London, pp. 157-162, 1998.
- [6] I.E. Sutherland, Micropipelines, Communications of the ACM, Vol. 32(6), pp. 720-738, June 1989.
- [7] S.W. Lachowicz, K. Eshraghian, J.F. López, and H-J. Pfleiderer, Design of Self-Timed Gallium Arsenide Integrated Systems for Multimedia Computing, Proceedings of the MIXDES'98 Conference, Lodz (Poland), pp. 93-98, June 1998.
- [8] S.W. Lachowicz, K. Eshraghian, and H-J. Pfleiderer, Efficient low power circuit design with Gallium Arsenide latched logic, Proceedings of the ISIC-97 Conference, Singapore, pp. 158-161, September 1997.
- [9] J.F. López, K. Eshraghian, R. Sarmiento, A. Núñez, and D. Abbott, Gallium Arsenide pseudo-dynamic latched logic for high performance processor cores, IEEE Journal of Solid-State Circuits, Vol. 32(8), pp. 1297-1303, August 1997.
- [10] K.O. Jeppson, Comments on the Metastable Behavior of Mismatched CMOS Latches, IEEE Journal of Solid-State Circuits, Vol. 31, pp. 275-277, 1996.