MIXDES 2005: Wideband MCML Basic Cells in 0.35 µm CMOS
R. M. Ramzan, J. J. Dąbrowski
ABSTRACT: This paper presents the design of three basic wideband MCML (MOS Current Mode Logic) cells needed in almost every optical communication transceiver: a frequency divider, a 2:1 multiplexer and a 1:2 demultiplexer, all in 0.35 µm CMOS technology. A step-by-step optimization procedure is presented for these cells. For demonstration and verification purposes, three test chips have been designed, including input/output data buffers that drive a 50 Ω off-chip load to emulate real-world conditions. Post-layout simulation results using AMS RF-CMOS models are promising: the frequency divider operates from 500 MHz to 2.5 GHz, while the 2:1 MUX and 1:2 DEMUX operate at data rates from 400 Mb/s to 2 Gb/s, all of them driving 50 Ω off-chip loads. The chips are being sent to Austria Micro Systems (AMS) for manufacturing.
INTRODUCTION

The explosive demand for high-speed data rates has revitalized optical communication, motivating extensive research in high-speed devices, circuits and systems. High-speed data standards such as Gigabit Ethernet and SONET (Synchronous Optical Network) demand very cheap integrated circuit solutions.

Modular general-purpose blocks implementing these standards are gradually being replaced by end-to-end solutions that benefit from device/circuit/architecture co-design, and mainstream CMOS technologies continue to take over the territories thus far claimed by GaAs and InP devices [1][3].

In this paper we present three basic cells essential to the implementation of any high-bit-rate communication system, so that the mature and cheap 0.35 µm CMOS digital technology can be used for applications such as Gigabit Ethernet on a single chip.

The cells are based on the static MCML D-latch structure. Cascading D-latches and varying their interconnection transforms the latch into a frequency divider, a 2:1 MUX or a 1:2 DEMUX. The MUX and DEMUX cells have additional latches in the data path to retime the data stream and align it with the CLK signal for proper operation.

A wideband structure is necessary to make the library cells generic for different applications, especially for operation at low frequency for backward compatibility. The major problem with wideband design is the increase in rise and fall times as the frequency decreases. Dynamic MCML [7], which is inherently more power efficient at relatively low frequencies, suffers from two disadvantages. First, operation at lower frequencies, where the rise and fall times become long, is not reliable and robust. Second, it generates more noise due to its dynamic nature of operation.

In this paper a step-by-step optimization procedure is presented for the static MCML frequency divider. The same procedure is applied to the 2:1 MUX and 1:2 DEMUX, yielding promising results. Other designs reported in the literature with high operating frequencies, such as [9], do not drive any load and are hence less useful in practice. Even with on-chip integration, a load of at least one similar cell is inevitable, which degrades circuit performance significantly.

STATIC CMOS, MCML AND DyMCML LOGIC BUFFERS

The block diagrams of static CMOS, MCML and DyMCML (dynamic MCML) buffers are shown in Fig.1. At low frequencies static CMOS has the best energy-delay performance. Static CMOS draws current while switching, so its power consumption increases with switching frequency. In contrast, MCML consumes a fixed amount of power [2]. The current is switched from one leg of the differential pair to the other, keeping the power supply current almost constant over time irrespective of switching speed. This feature is very attractive in mixed-signal environments, where noise coupling through the supply lines is very high and is actually a limiting factor for many system parameters.

Fig1: a) Static CMOS b) MCML and c) DyMCML Buffers
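The contrast between frequency-dependent CMOS power and constant MCML power can be illustrated with a first-order sketch. It uses the textbook relation P = a·C·Vdd²·f for static CMOS and P = I·Vdd for MCML. The 3.3 V supply and 400 µA bias are taken from the paper; the switched capacitance C_LOAD and the activity factor are assumptions chosen only for illustration, so the resulting crossover frequency is not a result of this work.

```python
import math

# Assumed, illustrative model: not the authors' simulation setup.


def cmos_dynamic_power(c_load, vdd, freq, activity=1.0):
    """First-order dynamic power of a static CMOS gate: a * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


def mcml_static_power(i_tail, vdd):
    """MCML power: the tail current flows continuously, independent of frequency."""
    return i_tail * vdd


def crossover_frequency(c_load, vdd, i_tail, activity=1.0):
    """Frequency at which both power figures are equal."""
    return i_tail / (activity * c_load * vdd)


VDD = 3.3         # V, supply voltage quoted in the paper
I_TAIL = 400e-6   # A, bias current quoted in Table 1
C_LOAD = 100e-15  # F, ASSUMED switched capacitance (hypothetical value)

if __name__ == "__main__":
    f_x = crossover_frequency(C_LOAD, VDD, I_TAIL)
    print(f"MCML power: {mcml_static_power(I_TAIL, VDD) * 1e3:.2f} mW")
    print(f"Crossover frequency: {f_x / 1e9:.2f} GHz")
    for f in (100e6, 1e9, 2.5e9):
        p = cmos_dynamic_power(C_LOAD, VDD, f)
        print(f"CMOS power at {f / 1e9:.1f} GHz: {p * 1e3:.2f} mW")
```

Below the crossover frequency the CMOS gate dissipates less than the MCML gate, and above it more, which is the qualitative behaviour described above.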
At higher speeds the energy-delay product of MCML logic is better than that of static CMOS [2][3]. However, the constant power consumption of MCML is a disadvantage in very large MCML circuits in which only a small number of logic blocks operate at high frequencies. MCML circuits are also not suitable for power-down mode. Because of the reduced swing, the logic depth of MCML circuits has to be limited to guarantee robust circuit operation. DyMCML logic circuits overcome the latter disadvantage at the expense of dynamic power consumption and a higher number of transistors [4]. Our target cells are meant for low-noise mixed-signal environments, so MCML is a reasonable choice.

Assume that the MCML buffer in Fig.1 (b) is biased with zero input differential voltage so that all voltages are at their common-mode value [5]:

$$V_{swing} = I_D R_D \quad \text{and} \quad A_v = g_m R_D = R_D \sqrt{\mu C_{ox} \left(\frac{W}{L}\right) I_D} \qquad (1)$$

There are no standard values for the output swing and the internal swing of MCML logic circuits. We used an 800 mV logic swing to comply with the CML standard and 400 mV as a low-power option. The low swing was achieved by halving the current through the output buffer current mirrors, which cuts the power dissipation almost in half.

$$f_T = \frac{1}{2\pi}\frac{g_m}{C_g} = \frac{1}{2\pi}\frac{\mu_n V_{on}}{L^2} \qquad (2)$$

Fig2: Block Diagram of Frequency Divider

In any practical application the divider circuit has to drive at least one similar load. This means that the second latch has to drive double the load of the first one. The best way to optimize this circuit is to optimize the whole structure together with the output buffer, so that the effect of non-symmetric loading on L1 and L2 is taken into account. The necessary condition for the latching pair is gm(latch transistor)·RD > 1, so that it retains the state for half a cycle.

Fig3: ½ Frequency Divider circuit

A typical sensitivity curve is shown in Fig.4. The frequency fso is defined as the divider's natural oscillation frequency when the clock amplitude is set close to zero.
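As a rough numerical check of equation (1) and of the latch hold condition gm·RD > 1, the following sketch evaluates the swing and the small-signal gain for the bias point and device size listed in Table 1 (400 µA, 2400 Ω, 5 µm/0.35 µm). The process transconductance K_PRIME = µ·Cox is an assumed, typical-looking value that does not come from the paper, so the gain figure is only indicative.

```python
import math

# Illustrative sketch; K_PRIME is an ASSUMED value, not an AMS process parameter.


def swing(i_d, r_d):
    """Voltage swing of an MCML stage: V_swing = I_D * R_D."""
    return i_d * r_d


def gm_balanced(k_prime, w, l, i_d):
    """gm at the balanced point, following equation (1): sqrt(uCox * (W/L) * I_D)."""
    return math.sqrt(k_prime * (w / l) * i_d)


def gain(k_prime, w, l, i_d, r_d):
    """Small-signal gain A_v = gm * R_D."""
    return gm_balanced(k_prime, w, l, i_d) * r_d


def latch_holds(k_prime, w, l, i_d, r_d):
    """Hold condition for the latching pair: gm * R_D > 1."""
    return gain(k_prime, w, l, i_d, r_d) > 1.0


I_D = 400e-6       # A, from Table 1
R_D = 2400.0       # ohm, from Table 1
W, L = 5e-6, 0.35e-6  # m, latch transistor size from Table 1
K_PRIME = 170e-6   # A/V^2, ASSUMED uCox (hypothetical)

if __name__ == "__main__":
    print(f"Swing: {swing(I_D, R_D):.2f} V")
    print(f"Gain gm*RD: {gain(K_PRIME, W, L, I_D, R_D):.2f}")
    print(f"Latch holds: {latch_holds(K_PRIME, W, L, I_D, R_D)}")
    # Halving the current halves the swing, as in the low-power option.
    print(f"Swing at half current: {swing(I_D / 2, R_D):.2f} V")
```

Because gm scales with the square root of the current, halving I_D halves the swing but lowers the gain only by a factor of √2, which is why the hold condition can survive the low-power setting if enough margin is designed in.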
The voltage VCLK,diff is defined as the maximum DC differential voltage applied to the CLK inputs for which the frequency divider oscillates. VCLK,diff is a function of transistor size; it depends strongly on the size of the clock input transistors M5 and M6. As their size is made smaller, a larger DC voltage is required at the input of the differential pair for the current to be fully switched. VCLK,diff also depends on the size of the latch transistors [7]: the larger the transistor, the more easily the frequency divider circuit oscillates. All of these observations are applied during the iterative optimization procedure in the next section.

½ Frequency Divider Optimization

The optimization of the frequency divider is an iterative process. The external voltage swing has a selectable fixed value of either 400 mV or 800 mV. The internal voltage swing is close to 600 mV at frequencies higher than 2 GHz and increases as the switching frequency decreases. The parameters to be optimized are the bias current, RD, the transistor sizes, the common-mode input voltage VCLK,CM, and the supply voltage.

The block diagram in Fig. 5 depicts the step-by-step procedure adopted to optimize the parameters. Initially the bias current and RD were selected so that their product equals the internal swing. In the second cycle the transistor size was changed. A smaller transistor size has a positive effect on the performance parameters, but this improvement ceases below a certain critical size that depends on other circuit requirements. The use of the smallest possible transistors is limited by the current density of the metal-1 trace that connects the drain and source of the finger-type AMS RF CMOS transistor. Small transistors are fast but demand a higher common-mode voltage to switch the current completely [7] in the frequency divider structure. The transistor length is always the minimum, 0.35 µm, for the obvious reason of attaining a higher fT (1).

The third iteration concerns VCLK,CM. Equation (2) implies that the higher VCLK,CM, the higher the circuit operating frequency. The upper limit of VCLK,CM is set by the need to keep transistors M1 and M2 in saturation when Q or Qn is high, for complete current switching.

Fig5: Optimization Procedure for Frequency Divider

During the fourth step of optimization it was observed that, for this particular circuit topology, scaling the power supply voltage from 3.6 V to 3.3 V has no significant effect on the maximum frequency and swing of the frequency divider. Further scaling down to the next standard supply of 2.5 V deteriorates the response.

At high temperatures the gm of the transistors drops and the product gmRD is no longer greater than unity. As a result the hold stage inside the D-latch of the frequency divider fails to hold the data for half the clock cycle. This results in a wrong frequency division factor or even complete failure of the frequency divider. For correct operation at higher temperatures, both the supply current and RD must be increased to satisfy the condition gmRD > 1. The results after optimization are tabulated in Table 1.

TABLE 1. Optimized Frequency divider parameters

Parameter     Value
ID, RD        400 µA, 2400 Ω
VCM,inout     2 V
MD, ML        5 µm/0.35 µm
MC, MBias     10 µm/0.35 µm

2:1 MUX AND 1:2 DEMUX CIRCUITS

Fig. 6 shows the transistor-level diagram of the 2:1 MUX circuit. This MUX topology is among the fastest circuits in any given technology. The necessary condition for correct operation is that the input streams be offset from each other in time, so as to avoid simultaneous transitions on both inputs of the MUX. The optimum value of this offset is half a CLK period. This half-period offset is obtained by introducing two latches in data path D1 and three in data path D2 [Fig.9]. This arrangement aligns any arbitrary data with the input CLK edges, with an offset of half a CLK cycle between D1 and D2.

Fig. 7 shows the 1:2 DEMUX, whose operation is simple and straightforward. The data rate is twice the CLK frequency. The circuit works in both the positive and the negative half cycle of the CLK.

Fig6: 2:1 MUX core circuit
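The logical function of the 2:1 MUX and 1:2 DEMUX can be sketched at bit level as follows. This is an idealized behavioural model only: it ignores the retiming latches, analog swing and timing, and the choice of which CLK half selects D1 is an assumption made for illustration. The function names are hypothetical and do not come from the paper.

```python
# Idealized bit-level model; not a description of the transistor-level circuit.


def mux_2to1(d1_bits, d2_bits):
    """Interleave two half-rate streams into one full-rate stream.

    ASSUMPTION: D1 is selected in the first CLK half cycle and D2 in the
    second, so each CLK period carries one bit from each input.
    """
    if len(d1_bits) != len(d2_bits):
        raise ValueError("both input streams must have the same length")
    serial = []
    for b1, b2 in zip(d1_bits, d2_bits):
        serial.append(b1)  # first half cycle: D1 path
        serial.append(b2)  # second half cycle: D2 path
    return serial


def demux_1to2(serial_bits):
    """Split a full-rate stream into two half-rate streams.

    Bits sampled in alternating CLK half cycles go to alternating outputs,
    so the serial data rate is twice the CLK frequency.
    """
    if len(serial_bits) % 2 != 0:
        raise ValueError("serial stream must contain an even number of bits")
    return list(serial_bits[0::2]), list(serial_bits[1::2])


if __name__ == "__main__":
    d1 = [1, 0, 1]
    d2 = [0, 0, 1]
    serial = mux_2to1(d1, d2)
    print("serial:", serial)
    print("recovered:", demux_1to2(serial))
```

Feeding the MUX output into the DEMUX recovers the two original streams, which is a convenient sanity check when the cells are used back to back in a transceiver model.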
The shaded flip-flop gets one phase of the clock and the unshaded flip-flop gets the inverted clock. Latch L1 is transparent in the first half of the CLK cycle and latches the data for the second half. L2 latches the data after the rising edge of CLKn and keeps it for the next half cycle. Latches L4 and L5 work in a similar manner but in the opposite CLK halves. L3 aligns the data with L5, so that the data is available at the same time for a parallel synchronous circuit to read.

The 1:2 DEMUX can be realized with three latches, but that circuit demands that the incoming data be edge-aligned with the clock. The circuit with five latches adjusts better to any possible misalignment between the data and CLK signals.

$$I_D = \frac{1}{2}\mu C_{ox}\frac{W}{L}V_{in,min}^2 \;\Rightarrow\; V_{in,min} = \sqrt{\frac{2 I_D}{\mu C_{ox}\frac{W}{L}}} \qquad (3)$$

Multiplying equation (1) by equation (3) gives

$$A_v V_{in,min} = \sqrt{2}\, I_D R_D \;\Rightarrow\; \frac{I_D R_D}{V_{in,min}} = \frac{A_v}{\sqrt{2}} \qquad (4)$$

For digital operation $A_v \geq \sqrt{2}$, so that the output swing $I_D R_D$ is at least $V_{in,min}$. The RD and ID of the last buffer stage are determined by the load resistance. For example, for a 50 Ω off-chip resistive load and a voltage swing of 800 mV, ID is 16 mA. The termination is outside the chip, and to avoid reflections a 100-200 Ω resistor is used on chip. This makes the equivalent RD = 40 Ω, further increasing ID. The design equation for the optimal number of stages is very similar to that of staged static CMOS buffers [8].

In this design a single-stage MCML buffer feeds the CLK and input signal to the frequency divider, while a three-stage MCML buffer drives the 50 Ω off-chip load. The buffers are designed to provide the optimized common-mode voltage and swing to the circuit in the next stage. The MUX and DEMUX circuits have three input CLK buffers, connected as shown in Fig. 9, to reduce the maximum fanout to three. The data input buffers are single-stage buffers with one similar load. The data output buffers of both circuits are multistage and designed to drive 50 Ω off-chip loads. The 50 Ω on-chip input resistors in the CLK and data input paths are implemented using HPOLY (high-resistive poly resistors), and all other resistors are implemented in POLY2 in the standard 0.35 µm AMS CMOS process.

Fig9: Floor Plan of 2:1 MUX
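The output-stage arithmetic quoted for the buffers can be reproduced with a few lines. The sketch computes the tail current needed for an 800 mV swing into 50 Ω, the equivalent load when a 200 Ω on-chip resistor appears in parallel with the 50 Ω termination, and numerically checks the product of equations (1) and (3). The values of K_PRIME (µ·Cox) and W_OVER_L used in that check are arbitrary assumptions, since the identity holds for any positive values.

```python
import math

# Illustrative sketch; K_PRIME and W_OVER_L below are ASSUMED values.


def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def tail_current(v_swing, r_load):
    """Current needed to develop v_swing across r_load: I_D = V_swing / R_D."""
    return v_swing / r_load


def gain(k_prime, w_over_l, i_d, r_d):
    """Equation (1): A_v = R_D * sqrt(uCox * (W/L) * I_D)."""
    return r_d * math.sqrt(k_prime * w_over_l * i_d)


def vin_min(k_prime, w_over_l, i_d):
    """Equation (3): V_in,min = sqrt(2 * I_D / (uCox * (W/L)))."""
    return math.sqrt(2.0 * i_d / (k_prime * w_over_l))


if __name__ == "__main__":
    i_50 = tail_current(0.8, 50.0)
    r_eq = parallel(200.0, 50.0)
    i_eq = tail_current(0.8, r_eq)
    print(f"I_D for 800 mV into 50 ohm: {i_50 * 1e3:.1f} mA")
    print(f"200 ohm || 50 ohm = {r_eq:.1f} ohm")
    print(f"I_D for 800 mV into {r_eq:.0f} ohm: {i_eq * 1e3:.1f} mA")

    K_PRIME, W_OVER_L = 170e-6, 100.0  # ASSUMED, for the identity check only
    lhs = gain(K_PRIME, W_OVER_L, i_eq, r_eq) * vin_min(K_PRIME, W_OVER_L, i_eq)
    rhs = math.sqrt(2.0) * i_eq * r_eq
    print(f"A_v * V_in,min = {lhs:.4f} V, sqrt(2) * I_D * R_D = {rhs:.4f} V")
```

The first result matches the 16 mA figure in the text, and the 40 Ω equivalent load raises the required current to 20 mA for the same swing.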
Fig10: Layout of Frequency divider and 2:1 MUX with IO and CLK Buffers

The layout is symmetrical and balanced. Wide Metal2 and Metal1 traces are placed one above the other to provide on-chip decoupling capacitance.

Post layout Simulation Results

The post-layout results of the frequency divider, with a chip area of 1230 µm × 1260 µm, are tabulated here. The circuit operates correctly from 500 MHz to 2.5 GHz. At a VCC of 3.3 V the total power consumption of the divider cell is 4.62 mW and the total chip power consumption is 120 mW with

wideband cells were presented in this paper. Post-layout results are promising. If the other analog cells, such as the PLL and CLK recovery circuits, show the same performance, a complete 1 Gbit/s Ethernet transceiver might be possible on a single chip using inexpensive and mature 0.35 µm digital CMOS technology.

One major factor that hampers the high-speed operation of these test chips is the 50 Ω off-chip load on each cell. In an integrated transceiver application the cells have to drive a smaller on-chip load than the 50 Ω off-chip loads, which improves the speed of the circuit. Most of the power is dissipated in the CLK, data input and data output buffers. In any on-chip transceiver the core cells will be replicated without buffers, so the total power dissipation will be reduced.

Acknowledgements: This work was sponsored by Fraunhofer Institute of Integrated Circuits, Germany.

THE AUTHORS

Rashad M. Ramzan and Dr. hab. Jerzy J. Dąbrowski are with Dept. of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden; Dr. Dąbrowski is also with Institute of Electronics, Silesian University of Technology, PL-44 100 Gliwice, Poland. E-mail: rashad@isy.liu.se