IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 54, NO. 12, DECEMBER 2007 1067

A “Flying-Adder” On-Chip Frequency Generator for Complex SoC Environment

Liming Xiu, Senior Member, IEEE

Abstract—The spirit of the system-on-chip (SoC) approach is to integrate more and more system functions into a single chip. Consequently, on-chip clock requirements can become very complicated because of the many functions the chip has to support. To fulfill those clock needs, it is not uncommon for one such large chip to contain several phase-locked loops (PLLs). Designing these on-chip PLLs is a very challenging task in terms of cost and performance. To solve this problem for an HDTV SoC of over 50 million transistors, a PLL based on the “flying-adder” architecture (FAPLL) is constructed. This generic FAPLL is instantiated multiple times in the SoC for different functions, resulting in a significant reduction in chip cost.

Index Terms—Flying-adder (FA), frequency synthesis, phase-locked loop (PLL), voltage-controlled oscillator (VCO).

I. INTRODUCTION

In today’s system-on-chip (SoC) environment, a chip can include many subsystems. As a result, clock requirements can be very complicated: many frequencies must be generated on-chip to support sophisticated operations. One example is a multimillion-gate HDTV chip that integrates an MPEG2 decoder, an NTSC video decoder, on-screen display (OSD), a graphics accelerator, and an AC3 audio processor. To supply the many frequencies these functions need, several PLLs are required. If care is not taken, the implementation difficulty associated with these PLLs can easily make the chip impractical; the cost and technical barriers of the clock implementation alone can be enough to kill the viability of the chip. During the design of this large SoC, integer-N, fractional-N, and all-digital PLL (ADPLL) architectures were investigated [1]–[3].
To generate all the required frequencies, either several cascaded PLLs must be used (integer-N PLL) or a compensation circuit must be incorporated inside the PLL (fractional-N PLL). The drawbacks of these implementations are larger size or greater analog complexity without a noticeable performance gain. In this brief, a design approach based on the “flying-adder” PLL (FAPLL) [4]–[6] is demonstrated as an elegant alternative. Compared with conventional PLL-based synthesis techniques, it has many unique features. Its most prominent advantage is its capability of generating a large number of frequencies, as depicted graphically in Fig. 1. As shown, the “flying-adder” frequency synthesizer can be viewed as a phase divider that provides an additional level of frequency resolution. Moreover, with the aid of a technique called “post divider fractional bit recovery (PDFR),” even more frequencies can be generated. Furthermore, if fractional numbers are allowed in the frequency control word, virtually any requested frequency can be produced. In this SoC, this distinguishing feature of ample frequencies is used to solve many difficult problems.

Fig. 1. “Flying-adder” architecture: more frequencies.

Manuscript received June 19, 2007. This brief was recommended by Associate Editor S. Pavan. The author is with Texas Instruments Incorporated, Dallas, TX 75243 USA (e-mail: limingxiu@ti.com). Digital Object Identifier 10.1109/TCSII.2007.906943

The five new contributions of this brief are:
1) PDFR for generating more frequencies;
2) PDFR for reducing the number of voltage-controlled oscillator (VCO) stages;
3) building multiple synthesizers on one VCO;
4) using the fine resolution of this architecture to achieve the voltage-controlled crystal oscillator (VCXO) function;
5) using the fine resolution for frame rate synchronization.

In this brief, Section II describes the working principle and structure of the FAPLL. Section III discusses the new contributions. Section IV reports the measurement results.
Section V is the conclusion.

II. FAPLL ARCHITECTURE

A. Working Principle of FAPLL

As depicted in Fig. 2, the FAPLL is based on a conventional PLL plus a “flying-adder” synthesizer. The frequency transfer function of the synthesizer can be expressed as [6]

$$T_{FA} = F \cdot \Delta, \qquad f_{FA} = \frac{1}{F \cdot \Delta} \tag{1}$$

where T_FA (f_FA) is the desired period (frequency), F is the frequency control word, and Δ is the time difference between any two adjacent VCO output phases. When the VCO is locked to the input reference f_r, the following relationships are established:

$$f_{VCO} = \frac{N}{P}\, f_r \tag{2}$$

$$\Delta = \frac{T_{VCO}}{K} = \frac{1}{K \cdot f_{VCO}} \tag{3}$$

$$f_o = \frac{f_{FA}}{M} = \frac{K \cdot N}{P \cdot F \cdot M}\, f_r \tag{4}$$

where P and N are the input and feedback divider ratios of the PLL, K is the number of VCO output phases, and M is the post-divider ratio. Without the “flying-adder” synthesizer, the PLL’s output-input relationship is

$$f_o = \frac{N}{P \cdot M}\, f_r \tag{5}$$

Comparing (4) and (5), the FAPLL’s advantage can be appreciated immediately. Since two more variables (K and F) have been introduced into the equation, the solution space is greatly enriched. Furthermore, if a fractional number is allowed in F, the FAPLL can achieve virtually any desired frequency.

Fig. 2. FAPLL.

1549-7747/$25.00 © 2007 IEEE. Authorized licensed use limited to: TEXAS INSTRUMENTS VIRTUAL LIBRARY. Downloaded on April 9, 2009 at 23:19 from IEEE Xplore. Restrictions apply.

B. VCO Structure and the Loop

The VCO structure chosen for this FAPLL is shown in Fig. 3. As shown, the basic delay stage inside the VCO consists of two cross-coupled NAND gates. Four of those stages form the VCO, which provides eight outputs P1–P8 (K = 8). The NANDs are supplied from the loop control voltage, so the VCO oscillation frequency can be adjusted through that voltage. The VCO is optimized for the range of 700 MHz to 1.4 GHz to keep silicon area small. The number of delay stages is also minimized to four, so that area is further reduced and layout matching among the stages is easier to achieve. Although the VCO is designed for 0.7 to 1.4 GHz, the synthesizer can boost its output well above 2 GHz. Unlike a conventional PLL, where the highest output frequency is limited by the VCO’s oscillation capability, the FAPLL can generate frequencies much higher than the VCO frequency because it acts as a “phase divider.” Its highest achievable frequency is limited only by the speed of the digital logic.

The loop bandwidth is designed around 2 MHz because the input is a 27-MHz crystal. The loop is designed so that the VCO locks to the input reference in less than 10 µs after power-up. The loop is also targeted primarily at minimizing the impact of noise from the input side. The VCO is intentionally constructed to operate over a narrow range, making it less sensitive to the control voltage so that better overall jitter performance can be achieved.

Fig. 3. PLL/VCO structure.

C. “Flying-Adder” Cycle-Prolong and Its Impact

When a fractional number is used, the output clock signal has a prolonged cycle from time to time. Assume F = I + r, where I is an integer and r is a fraction; the average period of the output signal is then (I + r)·Δ, i.e., f_FA = 1/[(I + r)·Δ]. Structurally, the output waveform is composed of two types of cycles: I·Δ and (I + 1)·Δ. Whenever an (I + 1)·Δ cycle occurs, there is a cycle-to-cycle jitter of Δ. However, this introduced “jitter” is deterministic and lies in the safe direction for digital operation, because a longer cycle makes the setup constraint easier to meet. For the hold check, the cycle-prolong is irrelevant, since the hold check uses only one clock edge. Therefore, the “jitter” associated with the fraction has no impact on digital operation. This is the crucial difference between the “flying-adder” cycle-prolong and conventional jitter.

III. NEW CONTRIBUTIONS AND THEIR APPLICATIONS

A. PDFR

As discussed above, when the frequency control word contains a fractional part, the synthesizer’s output bears cycle-prolong due to the periodic carry-in from the accumulator. In certain cases, the prolonged cycles can be recovered by the post divider M. The working mechanism is demonstrated by the example of Fig. 4. In Fig. 4, the top waveform shows the output signal at f_FA with a control word whose fractional part is 0.25, i.e., F = I + 0.25. Because of the fraction 0.25, the signal has one long cycle of (I + 1)·Δ and three short cycles of I·Δ in every four cycles; thus, it contains a cycle-to-cycle “jitter” of Δ. However, if the post divider M is set to 4, the output signal at f_o has a fixed period of (4I + 1)·Δ for all its cycles, free of such jitter. This technique is called PDFR.

Fig. 4. PDFR.

In general, if the fractional part of F is the inverse 1/d of one of M’s factors d, the irregular cycle caused by the fraction can be recovered by the post divider M. For any given M, we can first find all its factors; then the inverses of all these factors, and their one’s complements, can safely be used as the fractional part of F without negative impact. For example, for M = 16 the factors are 2, 4, 8, and 16, so the fractions 0.5, 0.25, 0.75, 0.125, 0.875, 0.0625, and 0.9375 can all be used. This feature can be used to produce more frequency points. Fig. 5 shows the number of recoverable fractions for M between 2 and 1000. For M = 2, there is only one recoverable fraction, 0.5, since 1/2 and its one’s complement coincide. For every prime M, there are only two recoverable fractions (1/M and its one’s complement).

From (4), it follows that f_o = C/F, where C = K·N·f_r/(P·M) is a constant when all its factors are fixed. Therefore, the new frequencies added by the recoverable fractions lie between C/(I + 1) and C/I. Furthermore, these frequencies are distributed almost linearly between C/I and C/(I + 1), in proportion to the magnitude of r. This can be shown as follows:

$$f_o = \frac{C}{I + r} = \frac{C}{I} \cdot \frac{1}{1 + r/I} \approx \frac{C}{I} - \frac{C}{I^2}\, r, \qquad 0 \le r < 1,\; I \gg 1$$

Fig. 5. Number of recoverable fractions for each M.

Fig. 6. Multiple independent clocks from one VCO.

TABLE I. COMPARISON BETWEEN 32 OUTPUTS AND 8 OUTPUTS

Fig. 7. Using FAPLL to achieve VCXO function.

B. Using PDFR to Reduce VCO Stages

The “flying-adder” architecture is based on multiple VCO outputs, or stages.
The more stages there are, the more frequencies can be generated. However, more stages translate into higher implementation cost. PDFR can be used to reduce the number of VCO stages without sacrificing resolution (assuming the same stage delay Δ). This is demonstrated in Table I. In this table, the first two columns show the relationship between the control word and the output frequency for VCO32 (a VCO with 32 outputs). When the control word F32[5:0] varies from 000010b to 111111b, the output period (Period32) changes from 2Δ to 63Δ. The remaining columns are for VCO8 (a VCO with 8 outputs). When F8[3:0] varies from 0010b to 1111b, the output period (Period8) changes from 2Δ to 15Δ. However, if a divide-by-4 post divider is added after the synthesizer and two bits are borrowed from the fractional part (F8_f[1:0]), periods from 8Δ to 63Δ can be produced in steps of Δ. This technique has been used in this SoC’s video PLL to mimic a VCO32 FAPLL used in previous projects (to accommodate legacy issues).

C. Building Multiple Synthesizers on One VCO

Unlike conventional PLLs, where each PLL can produce only one independent clock, an FAPLL can support multiple independent clocks from one VCO, since several synthesizers can be constructed on the same VCO. This reduces the PLL count and is extremely helpful in trimming the cost of large SoCs. Fig. 6 shows one example of such an implementation. From one VCO, three independently controllable clocks are generated: clk_usb, which is derived directly from the VCO, and clk_ddr and clk_arm, which come from two synthesizers controlled separately by their own frequency control words.

D. Using Fine Resolution to Achieve VCXO Function

From (4), it follows that fine frequency resolution can be achieved at f_o if a fraction is used in F. Furthermore, it can be proved that when F varies within a small region, the clock frequency follows the change in F linearly. An example is shown in Fig. 7.
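The PDFR behavior described in Sections III-A and III-B can be checked with a small behavioral model. The sketch below assumes an idealized flying-adder: an accumulator that starts at 0 and adds F once per output cycle, with each cycle lasting a whole number of phase steps Δ, consistent with (1). It verifies that F = 4.25 produces the three-short/one-long cycle pattern, that a divide-by-4 post divider removes it, and that VCO8 plus two fractional bits plus a divide-by-4 reaches every period from 8Δ to 63Δ. All function names are illustrative, and jitter, phase mismatch, and duty cycle are not modeled.

```python
from fractions import Fraction
from math import floor


def fa_cycles(F, n):
    """Lengths, in units of Delta, of n consecutive flying-adder output cycles.

    Idealized model: the accumulator starts at 0 and adds F once per output
    cycle; its integer part selects the VCO phase, so cycle k spans
    floor((k + 1) * F) - floor(k * F) phase steps (I or I + 1 when F = I + r).
    """
    return [floor((k + 1) * F) - floor(k * F) for k in range(n)]


def post_divided_periods(F, M, n):
    """Periods, in units of Delta, of n cycles after a divide-by-M post divider."""
    cycles = fa_cycles(F, n * M)
    return [sum(cycles[i * M:(i + 1) * M]) for i in range(n)]


def recoverable_fractions(M):
    """Fractions 1/d and 1 - 1/d for every factor d > 1 of M (rule of Section III-A)."""
    fracs = set()
    for d in range(2, M + 1):
        if M % d == 0:
            fracs.update({Fraction(1, d), 1 - Fraction(1, d)})
    return sorted(fracs)


def vco8_pdfr_periods():
    """Periods reachable with VCO8, two fractional bits and a divide-by-4 (Table I idea)."""
    periods = set()
    for i in range(2, 16):        # F8[3:0] = 0010b .. 1111b
        for k in range(4):        # F8_f[1:0] = 00b .. 11b
            p = post_divided_periods(i + Fraction(k, 4), 4, 8)
            if len(set(p)) != 1:  # PDFR must leave every divided cycle identical
                raise ValueError(f"cycle-prolong not recovered for F = {i + k / 4}")
            periods.add(p[0])
    return periods


# F = 4.25: three short (4*Delta) cycles and one long (5*Delta) cycle ...
print(fa_cycles(Fraction(17, 4), 8))                # [4, 4, 4, 5, 4, 4, 4, 5]
# ... which a divide-by-4 post divider turns into a constant 17*Delta period.
print(post_divided_periods(Fraction(17, 4), 4, 4))  # [17, 17, 17, 17]
print(vco8_pdfr_periods() == set(range(8, 64)))     # True
```

In this model the divided period is constant whenever M·F is an integer, which holds for every fraction returned by `recoverable_fractions(M)`. By contrast, F = 13/3 with M = 4 yields divided periods of 17Δ, 17Δ, 18Δ, so that cycle-prolong is not recovered.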
The FAPLL’s fine frequency tuning can replace an external VCXO chip or an on-chip VCXO component in a clock recovery system. It has been used in this SoC with significant cost reduction, since the feature comes “free” with the FAPLL.

Fig. 8. Overall on-chip clock structure.

Fig. 9. Jitter measurement at 148.5 MHz.

E. Fine Resolution and Frame Rate Synchronization

The fine resolution of the FAPLL can also solve problems in systems that require nonstandard frequencies, such as the frame rate synchronization problem in HDTV applications. In TV broadcasting, different content (movies, advertisements, sports, etc.) can be broadcast at different frame rates, such as 60 or 59.94 Hz. However, the pixel clock that drives the display device is usually obtained from a PLL that can produce only a few commonly used frequencies. For example, in HDTV 720p mode, the pixel clock required for the 60-Hz frame rate is 74.25 MHz, and the PLL is designed for this frequency. For 59.94 Hz, however, the required frequency is 74.25 × 59.94/60 = 74.17575 MHz. One workaround is to add or delete a video frame once in a while to match the rate, but this results in visible artifacts. Another is to add a dedicated crystal or VCXO just to accommodate the odd frame rates. With its fine resolution, the FAPLL solves this problem easily by generating 74.17575 MHz precisely.

F. FAPLLs in This SoC

This large HDTV SoC has several subsystems, each with its own clock requirements. The key design challenge is to reduce the number of PLLs without sacrificing performance. Fig. 8 shows the overall on-chip clock structure of this SoC. This section describes the FAPLL-based implementations.
The system-clock domain presents the simplest design constraint: it requires only 216, 108, 54, 27, and 13.5 MHz. Thus, the VCO is set to run at 864 MHz, and five dividers (÷4, ÷8, ÷16, ÷32, and ÷64), together with a phase alignment circuit and glitch-free clock switches, produce those frequencies simultaneously. For this clock domain, the challenge is to continuously adjust the PLL to track the clock embedded in the video stream. This is elegantly realized by the FAPLL’s VCXO capability of Section III-D.

In terms of frequencies required, the audio-clock domain presents the toughest challenge. Table II shows all the frequencies needed for this SoC’s audio application. The first column is the sampling frequency in kilohertz, and the first row is the oversampling rate; the entries are the audio clock frequencies in megahertz. Conventionally, generating all these frequencies requires two or three cascaded PLLs. With the FAPLL, the problem is solved by a single component. As shown in Fig. 8, the input reference of this audio PLL is 86.4 MHz, obtained by dividing the system PLL’s 864-MHz VCO frequency by 10.

TABLE II. THE AUDIO CLOCK FREQUENCIES

By setting the input divider P, the feedback divider N, the frequency control word F, and the post divider M appropriately, all the required frequencies can be generated under two constraints: the VCO frequency stays in the optimized range of 700 MHz to 1.2 GHz, and the phase-frequency detector (PFD) frequency f_r/P stays in the range of 17.28 to 28.8 MHz. For example, 45.1584 MHz can be generated with one such setting. During the design, an algorithm was developed to search for the P, N, F, and M settings based on (4). The PDFR technique introduced in Section III-A is also incorporated in the algorithm. For a requested frequency, the algorithm will likely produce more than one solution.
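The search can be sketched as a brute-force enumeration. This is a minimal illustration, not the author’s algorithm. It assumes (4) has the form f_o = K·N·f_r/(P·F·M) with K = 8. It takes the 86.4-MHz reference, the 700 MHz to 1.2 GHz VCO range, and the 17.28 to 28.8 MHz PFD range from the text. The remaining bounds are hypothetical: P up to 16, M up to 32, and an integer part of F between 2 and 31. F may be an integer or carry a fraction that M can recover under the PDFR rule of Section III-A. Exact rational arithmetic avoids floating-point misses on values such as 45.1584 MHz. The ranking key is likewise an assumption that covers only part of the criteria (duty cycle is not modeled), and the synthesizer output is not checked against the logic speed limit.

```python
from fractions import Fraction
from math import ceil, floor

F_REF = Fraction("86.4")                                # audio PLL reference, MHz
K = 8                                                   # VCO output phases P1-P8
VCO_MIN, VCO_MAX = Fraction(700), Fraction(1200)        # optimized VCO range, MHz
PFD_MIN, PFD_MAX = Fraction("17.28"), Fraction("28.8")  # allowed f_r / P, MHz


def recoverable(M):
    """Fractional parts of F allowed with post divider M (0 means integer F)."""
    fr = {Fraction(0)}
    for d in range(2, M + 1):
        if M % d == 0:
            fr.update({Fraction(1, d), 1 - Fraction(1, d)})
    return fr


def search(target, p_max=16, m_max=32, int_range=range(2, 32)):
    """All exact (P, N, F, M, f_vco, f_pfd, f_fa) settings that hit target via Eq. (4)."""
    found = []
    for P in range(1, p_max + 1):
        pfd = F_REF / P
        if not PFD_MIN <= pfd <= PFD_MAX:
            continue
        for N in range(ceil(VCO_MIN / pfd), floor(VCO_MAX / pfd) + 1):
            vco = pfd * N
            for M in range(1, m_max + 1):
                F = K * vco / (M * target)  # Eq. (4) solved for F
                if floor(F) in int_range and F - floor(F) in recoverable(M):
                    found.append((P, N, F, M, vco, pfd, K * vco / F))
    return found


def rank(found):
    """Hypothetical priority: VCO nearest mid-range, then higher PFD, then lower f_FA."""
    mid = (VCO_MIN + VCO_MAX) / 2
    return sorted(found, key=lambda s: (abs(s[4] - mid), -s[5], s[6]))


P, N, F, M, vco, pfd, ffa = rank(search(Fraction("45.1584")))[0]
print(f"P={P} N={N} F={float(F)} M={M} f_vco={float(vco)} MHz f_fa={float(ffa)} MHz")
```

Exhaustive enumeration is affordable here because only a few P values satisfy the PFD constraint and N is bounded by the VCO range. A production tool would also score duty cycle and could cap f_FA at the logic speed limit.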
When more than one solution exists, the algorithm selects the best-fit one based on user-provided criteria: the VCO frequency lying in the middle of the optimized range, the PFD frequency being reasonably high, the synthesizer output frequency being as low as possible, the output signal’s duty cycle being as close to 50% as possible, and so on. The priority of these criteria determines which solution wins.

The ARM/DDR PLL needs to support three independent clocks: clk_usb, clk_ddr, and clk_arm. Several frequencies between 100 and 250 MHz are required. However, the precise frequency is not important as long as enough frequency points are available in this range to guarantee system operation; the more frequencies in the range, the more flexibility the system has. The structure of multiple synthesizers on one VCO (Fig. 6) fulfills this need at minimum cost. Moreover, PDFR increases the number of available frequency points in this range from 21 to 60.

Fig. 10. Spectrum measurement for three frequencies.

The display PLL drives the display engine, which supports display modes such as 480i, 480p, 720p, 1080i, and 1080p. The major frequencies required are 13.5, 27, 31.5, 36, 33.75, 40, 50, 49.5, 56.25, 74.25, 75, 78.75, 85.5, 94.5, 108, 148.5, 135, 156, 157.5, and 162 MHz. All of these can be generated by the FAPLL without using fractional bits in F. Additional frequencies, such as 25.175, 35.5, 50, 56.25, 65, 68.25, 75, 79.5, 101, 102.25, 117.5, and 121.75 MHz, are required for some graphics modes; they can be produced with the help of fractions. The fine resolution of the FAPLL has also been used to solve the frame rate synchronization problem discussed in Section III-E.
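The frame-rate case of Section III-E can be worked through numerically. The sketch below assumes (4) reduces to f_o = K·f_VCO/(F·M) once the PLL is locked. It uses K = 8 from Section II-B but hypothetical display-PLL values: a VCO at 891 MHz, M = 8, and 16 fractional bits in F. With these values, 74.25 MHz needs F = 12 exactly. The 59.94-Hz case needs F ≈ 12.012, which must be rounded to the available fractional bits. That fraction is not recoverable by M = 8, so the synthesizer output still shows the cycle-prolong of Section II-C. The numbers below are average frequencies. The sketch also evaluates the strict broadcast rate of 60/1.001 Hz, whose pixel clock is 74.25/1.001 ≈ 74.1758 MHz.

```python
from fractions import Fraction

K = 8                   # VCO output phases P1-P8
F_VCO = Fraction(891)   # MHz; hypothetical display-PLL VCO frequency
M = 8                   # hypothetical post-divider ratio
FRAC_BITS = 16          # hypothetical number of fractional bits in F
LSB = Fraction(1, 2 ** FRAC_BITS)


def f_out(F):
    """Average output frequency (MHz) from Eq. (4) with the locked PLL folded into F_VCO."""
    return K * F_VCO / (F * M)


def control_word(target_mhz):
    """F nearest to the exact solution of Eq. (4), rounded to FRAC_BITS fractional bits."""
    exact = K * F_VCO / (M * target_mhz)
    return round(exact / LSB) * LSB


def ppm(F, target_mhz):
    """Frequency error of control word F relative to the target, in parts per million."""
    return float((f_out(F) - target_mhz) / target_mhz * 10 ** 6)


for rate, target in [("60 Hz", Fraction("74.25")),
                     ("59.94 Hz", Fraction("74.17575")),
                     ("60/1.001 Hz", Fraction("74.25") / Fraction("1.001"))]:
    F = control_word(target)
    print(f"{rate}: F = {float(F):.6f}, error = {ppm(F, target):+.3f} ppm")

# Near F = 12, one LSB moves f_o by -1/(12 * 2**16 + 1), roughly -1.27 ppm:
# the small, almost linear pull a VCXO would otherwise provide (Section III-D).
print(f"one-LSB step: {ppm(Fraction(12) + LSB, Fraction('74.25')):+.3f} ppm")
```

Because f_o is proportional to 1/F, a small change δF shifts the frequency by about −δF/F in relative terms. This is why stepping F over a narrow range behaves like a linear VCXO pull.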
The video PLL serves an on-chip video decoder that converts an analog NTSC composite signal to a digital component video signal. In this application, the required frequencies are not predetermined and can take any value. Because of this special requirement, the FAPLL is the only practical choice; there are hardly any alternatives. As mentioned in Section III-B, the PDFR technique is used in the video PLL to make the 8-phase FAPLL mimic the legacy 32-phase FAPLL of previous projects.

IV. THE TEST REPORT

The SoC containing multiple FAPLLs has been used in real TV applications with no clock-related problems reported. The FAPLL (Fig. 2) is implemented in a 90-nm CMOS process. Its supplies are 1.8 V for the analog circuitry and 1.1 V for the digital blocks. The power consumption is around 10 mW. The jitter measured at the HDTV frequency of 148.5 MHz is 90 ps peak-to-peak, as shown in Fig. 9. Spectrum measurements have been performed for all the audio frequencies listed in Table II. Fig. 10 plots three of them: 6.144 MHz (left), 45.1584 MHz (center), and 65.536 MHz (right). The 65.536-MHz case uses a control word F with a fractional part, and the fraction is recovered by the post divider M. The spectra of all the other frequencies look similar. These measurements show that: 1) there is no significant layout mismatch (any mismatch would be deterministic and would show up as spurs); 2) PDFR faithfully recovers the “flying-adder” cycle-prolong, with no resulting spurs in the output clock.

V. CONCLUSION

The “flying-adder” frequency synthesis architecture was invented several years ago, and many circuit-level improvements have been made since then. In this brief, an FAPLL design approach has been established at the system level, opening a new direction in PLL design, mainly because of the enriched design-solution space and the fine resolution the approach provides. The strength of this FAPLL method has been demonstrated through a real example of a large SoC, achieving the goal of “cheaper, better, and faster.”

ACKNOWLEDGMENT

The author thanks G. Cook, D. Dudek, J. Nave, G. Xu, and B. Parthasarthy for their invaluable help.

REFERENCES

[1] J. Hakkinen and J. Kostamovaara, “Speeding up an integer-N PLL by controlling the loop filter charge,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 50, no. 7, pp. 343–354, Jul. 2003.
[2] S. E. Meninger and M. H. Perrott, “A 1-MHz bandwidth 3.6-GHz 0.18-µm CMOS fractional-N synthesizer utilizing a hybrid PFD/DAC structure for reduced broadband phase noise,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 966–980, Apr. 2006.
[3] R. B. Staszewski and P. T. Balsara, “Phase-domain all-digital phase-locked loop,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 3, pp. 159–163, Mar. 2005.
[4] H. Mair and L. Xiu, “An architecture of high-performance frequency and phase synthesis,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 835–846, Jun. 2000.
[5] L. Xiu and Z. You, “A ‘Flying-Adder’ architecture of frequency and phase synthesis with scalability,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 5, pp. 637–649, Oct. 2002.
[6] L. Xiu and Z. You, “A new frequency synthesis method based on ‘Flying-Adder’ architecture,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., pp. 130–134, Mar. 2003.