Numerically Controlled Oscillator With Spur Reduction: Hans-Jörg Pfleiderer Stefan Lachowicz
Hans-Jörg Pfleiderer*
Institute of Microelectronics
Ulm University
Ulm, Germany
hans-joerg.pfleiderer@uni-ulm.de
Stefan Lachowicz
School of Engineering
Edith Cowan University
Perth, Western Australia
s.lachowicz@ecu.edu.au
Abstract: This paper presents a novel method of reducing the
spurious signal content in a digitally synthesized sine wave at the
output of a numerically controlled oscillator (NCO). The
proposed method uses a linear approximation subsystem with a
reduced size look-up table (LUT). Two NCO architectures are
considered. Architecture 0, the standard architecture in which the
accumulator word length is longer than the LUT address word, is
compared with Architecture 1, where the accumulator bits
beyond the LUT address space are used for the linear
approximation of the value in between the entries of the LUT.
Analysis of both architectures demonstrates that the spurious
free dynamic range (SFDR) in Architecture 1 equates to 12 dBc
per bit of the address space of the LUT as opposed to 6 dBc for
Architecture 0. The system was implemented and tested using the
Xilinx Spartan 3 platform.
Index Terms: Numerically controlled oscillator; Digital
waveform synthesis; Spurious free dynamic range; FPGA
I. INTRODUCTION
Field Programmable Gate Arrays (FPGA) are becoming
more widely used in Digital Signal Processing (DSP) systems.
Modern FPGAs offer very high performance, a very large
number of I/Os and high-capacity on-chip memory. The
development is rapid and risks are low since the IP can be
easily modified. For the low to medium production volume
range, it is not economical to develop an ASIC and, if needed,
there are companies who offer convenient solutions to allow
for the seamless migration from FPGA to ASIC.
The evolution of FPGAs has made it possible to implement more
and more sophisticated DSP subsystems in the form of
dedicated hardware on the device. In the late 1990s, FPGAs usually
had just a few hardware multipliers per chip. In the early
2000s, hardwired multipliers in large numbers (up to ~500),
with clock speeds of up to 100 MHz, became available. The
mid 2000s saw DSP algorithms such as full FIR filters implemented,
and currently, powerful arithmetic functions, such as fast square
root and divide, floating-point operations and many more, are being
implemented.
In DSP systems, nonlinear functions, such as trigonometric,
square root or Bessel functions, are often encountered. There
are several ways to evaluate them, for instance, using a
CORDIC core or look-up tables (LUTs). A great deal of effort
is expended to optimize LUTs to maintain the precision
required, whilst minimizing the actual size of the table.
Figure 1. Conventional digital sinusoid generation (Architecture 0).
For some functions, an existing approach is to utilize
hierarchical LUTs. In this work, we present a numerically
controlled oscillator (NCO) - sometimes referred to as a direct
digital frequency synthesizer (DDFS) - using a LUT and a
linear approximation circuit for the generation of the sine wave
values in between the entries of the LUT.
Today's advanced DDFS solutions are fast becoming an
alternative to traditional analog synthesizers. The main
advantages of a DDFS system are [1]:
• Micro-Hertz tuning resolution of the output frequency
and sub-degree phase tuning capability.
• Extremely fast "hopping speed" with continuous
phase.
• No manual system tuning as in analog systems.
• Unparalleled matching and control of I and Q signals
in a quadrature synthesizer.
*The support of the School of Engineering at Edith Cowan University is
gratefully acknowledged.
MIXDES 2009, 16th International Conference "Mixed Design of Integrated Circuits and Systems", June 25-27, 2009, Łódź, Poland
Copyright © 2009 by Department of Microelectronics & Computer Science, Technical University of Lodz
The conventional digital sinusoid generation in an NCO,
referred to as Architecture 0, is presented in Fig. 1. In this
system, there are several sources of spurious frequencies.
Firstly, the precision of the samples stored in the LUT creates a
noise floor corresponding to the number of bits used for storing
the samples. If the NCO is followed by a digital-to-analog
converter (DAC), the precision of the converter further limits
the signal-to-noise ratio (SNR). The influence of the DAC will
not be considered further in this paper. In the digital domain,
this quantization noise can be at least partially suppressed by
dithering [2]. The second, and most dominant source of
spurious frequencies, severely limiting the spurious free
dynamic range (SFDR) is the truncation of the phase word
related to the size limit of the LUT. Because both the phase and
amplitude samples are periodic sequences, their finite word
length representations contain periodic error sequences, which
cause spurs. The spur signal level in Architecture 0 is
approximately 6 dB per bit of representation below the
amplitude of the desired sinusoidal signal [3]. Example
output spectra of Architecture 0 are shown in Fig. 2 for
LUTs with 11-bit and 12-bit address spaces (in both cases
truncated from a 16-bit phase register). As expected, the
improvement in SFDR is approximately 6 dB.
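The effect of phase truncation in Architecture 0 can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' test bench: the function names, the scaled-down 12-bit accumulator, the 7-bit and 8-bit address widths and the tuning word of 5 are all assumptions chosen to keep the run short, and the LUT samples are kept in floating point so that only phase truncation contributes spurs. It is not intended to reproduce the values in Fig. 2.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def arch0_sfdr_db(acc_bits, addr_bits, fcw):
    """SFDR (dB) of a phase-truncating NCO over one accumulator period.

    fcw is the (hypothetical) frequency control word; it should be odd
    and smaller than 2**(acc_bits - 1).
    """
    n = 1 << acc_bits
    drop = acc_bits - addr_bits            # accumulator bits discarded before the LUT
    lut = [math.cos(2 * math.pi * i / (1 << addr_bits))
           for i in range(1 << addr_bits)]
    acc, samples = 0, []
    for _ in range(n):
        samples.append(lut[acc >> drop])   # phase truncation
        acc = (acc + fcw) & (n - 1)        # modulo 2**acc_bits accumulation
    mags = [abs(v) for v in fft(samples)[: n // 2]]
    carrier = mags[fcw]
    worst_spur = max(m for k, m in enumerate(mags) if k not in (0, fcw))
    return 20 * math.log10(carrier / worst_spur)

if __name__ == "__main__":
    for addr_bits in (7, 8):
        print(addr_bits, "address bits:", round(arch0_sfdr_db(12, addr_bits, 5), 2), "dB")
```

Under these assumptions, each additional address bit is expected to improve the SFDR by close to 6 dB, in line with the behaviour described for Architecture 0.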
Figure 2. Output spectrum of Architecture 0: (a) 11-bit LUT, (b) 12-bit LUT.
The remainder of the paper is organized as follows. Section
II presents the method of linear approximation as applied in the
NCO, referred to as Architecture 1. Section III deals with the
analysis of the performance of Architecture 1. Section IV
presents the results of the simulations and hardware
implementation, and the obtained output spectra of the NCO. It
is followed by conclusions.
Figure 3. Linear approximation of a nonlinear function, panels (a) and (b).
II. NUMERICALLY CONTROLLED OSCILLATOR ARCHITECTURE
WITH LINEAR APPROXIMATION
A. Linear Approximation with Look-Up Tables
One way to enhance LUT-based generation of a
nonlinear function, such as a cosine, is to add a linear
approximation circuit. A function can be approximated by
a piecewise linear function, as presented in Fig. 3 (a). Within a
segment (that is, between two node points) the function can be
expressed using the line equation:
$$ f(x) \approx y_i + \frac{y_{i+1} - y_i}{x_{i+1} - x_i}\,(x - x_i) \qquad (1) $$
There are two ways to select the values of the function at the
node points (segment boundaries). The first is simply to
choose the exact values (within a given precision), as
represented by the node points of the linear segment in
Fig. 3 (a). In this case, the error, taken here as the value on
the linear segment minus the exact function value, is nonnegative
over the range of x where the function is convex, and
nonpositive over the range of x where the function is concave.
The second way is to adjust the boundary points so
that they are shifted from the exact values by half of the
average of the maximum errors in the segments to the left and
to the right of the segment boundary. In this case, the
maximum error magnitude in the segment is
halved and the sign of the error changes within the segment.
However, in the special case of a harmonic function, this
adjustment in the time domain will mainly affect the magnitude
of the respective frequency components, which can instead be
corrected by adjusting the gain slightly. Therefore, in our
system, the former method of selection is used.
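The two node-selection options can be compared numerically on a single segment. The following Python sketch is only an illustration: the helper name `chord_error_range`, the segment [0.2, 0.3] and the use of one isolated segment (rather than averaging the errors of the left and right neighbours) are assumptions. The error is measured as the linear segment minus the exact cosine.

```python
import math

def chord_error_range(f, x0, x1, shift=0.0, steps=1000):
    """Return (min, max) of (linear segment - f) over [x0, x1].

    Both node values are offset from the exact f(x0), f(x1) by `shift`.
    """
    y0, y1 = f(x0) + shift, f(x1) + shift
    errs = []
    for k in range(steps + 1):
        x = x0 + (x1 - x0) * k / steps
        line = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        errs.append(line - f(x))
    return min(errs), max(errs)

if __name__ == "__main__":
    # Exact node values: the cosine is concave here, so the error is one-sided.
    lo, hi = chord_error_range(math.cos, 0.2, 0.3)
    print("exact nodes:  ", lo, hi)
    # Nodes shifted by half of the maximum error: the error becomes two-sided.
    lo2, hi2 = chord_error_range(math.cos, 0.2, 0.3, shift=-lo / 2)
    print("shifted nodes:", lo2, hi2)
```

With exact node values the error keeps one sign across the segment. After the shift it is spread symmetrically about zero, with half the peak magnitude.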
[Figure 2 plots: magnitude (dB) versus harmonic (0 to 30). The data cursor at harmonic 3 reads -60.48 dB in the first panel and -66.79 dB in the second.]
Figure 4. NCO Architecture 1
The hardware mapping is presented in Fig. 3 (b). Here, the
segments of the LUT are evenly distributed, so the input word
x is split into two parts: the more significant part is an address
for the LUT and the less significant part is the increment of x
within the current segment of the LUT. The node values $y_i$ and
$y_{i+1}$ are read from the LUT, the increment $y_{i+1} - y_i$ is
calculated and is subsequently multiplied by the increment $\Delta x$.
A shift replaces the division in (1), as the difference
$x_{i+1} - x_i$ is constant and a power of two.
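The data path described above can be modelled in a few lines of integer arithmetic. The following Python sketch is an illustration only: the bit widths `ADDR_BITS` and `FRAC_BITS`, the full-period table and the names used are assumptions, not the implemented design.

```python
import math

ADDR_BITS = 6    # LUT address width (hypothetical, kept small for illustration)
FRAC_BITS = 10   # low-order bits: increment of x within a segment
AMP_BITS = 20    # sample precision in bits

SCALE = (1 << (AMP_BITS - 1)) - 1
# One extra entry so that y[i+1] exists for the last segment.
LUT = [round(SCALE * math.cos(2 * math.pi * i / (1 << ADDR_BITS)))
       for i in range((1 << ADDR_BITS) + 1)]

def interp_cos(phase):
    """Approximate SCALE*cos(2*pi*phase/2**(ADDR_BITS+FRAC_BITS))."""
    i = phase >> FRAC_BITS                  # more significant part: LUT address
    dx = phase & ((1 << FRAC_BITS) - 1)     # less significant part: increment of x
    y0, y1 = LUT[i], LUT[i + 1]             # node values read from the LUT
    # Multiply the node difference by dx; the shift replaces the division in (1).
    return y0 + (((y1 - y0) * dx) >> FRAC_BITS)

if __name__ == "__main__":
    total = 1 << (ADDR_BITS + FRAC_BITS)
    worst = max(abs(interp_cos(p) - SCALE * math.cos(2 * math.pi * p / total))
                for p in range(total))
    print("worst-case error in LSBs:", worst)
```

Because the segment length is a power of two, the only arithmetic needed per sample is one subtraction, one multiplication, one shift and one addition.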
B. Numerically Controlled Oscillator Architecture
Fig. 4 presents the proposed Architecture 1 of an NCO with
an embedded linear approximation circuit. All accumulator
bits are fed into the phase-to-amplitude converter. The top
portion of the accumulator bits is used to address the cosine
LUT and the remaining bits are used for linear approximation.
The LUT contains the cosine samples in the first quadrant, and
some simple logic is used for address sequence reversal in
quadrants 2 and 4, and sign change in quadrants 2 and 3 to
obtain the full period of the waveform.
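The quadrant logic can be sketched as follows. This Python fragment is a behavioural illustration under stated assumptions: `Q_BITS`, the floating-point table and the extra table entry at the end of the quadrant are simplifications introduced here, and quadrants are numbered 0 to 3 in the code (corresponding to quadrants 1 to 4 in the text).

```python
import math

Q_BITS = 8                      # address bits within one quadrant (hypothetical)
Q = 1 << Q_BITS
# First-quadrant cosine samples; the extra entry LUT[Q] = cos(pi/2)
# keeps the reversed address Q - a inside the table.
QUARTER_LUT = [math.cos(0.5 * math.pi * a / Q) for a in range(Q + 1)]

def full_wave_cos(phase):
    """Cosine for a (Q_BITS + 2)-bit phase word using only the quarter table."""
    quadrant = (phase >> Q_BITS) & 0b11     # two top bits select the quadrant
    a = phase & (Q - 1)                     # address within the quadrant
    if quadrant in (1, 3):                  # quadrants 2 and 4: address reversal
        a = Q - a
    value = QUARTER_LUT[a]
    if quadrant in (1, 2):                  # quadrants 2 and 3: sign change
        value = -value
    return value

if __name__ == "__main__":
    n = 4 * Q
    worst = max(abs(full_wave_cos(p) - math.cos(2 * math.pi * p / n))
                for p in range(n))
    print("largest deviation from math.cos:", worst)
```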
III. NCO PERFORMANCE
In [3], the following analysis yields an upper bound on the
size of the largest frequency component in the spectrum of
$e_A[n]$ (here, n stands for the sample number), the error
sequence of a mid-tread quantizer, applicable to Architecture 0.
Assuming that the quantizer is not saturated by the input signal
x[n], the maximum possible quantization error is $\Delta_A/2$,
where $\Delta_A$ is the amplitude quantization step size. The
total power in $e_A[n]$ is then bounded by $\Delta_A^2/4$. By
Parseval's relation, the sum of the spur powers in the spectrum
of $e_A[n]$ equals the power in $e_A[n]$.
In order to maximize the power in a given spur, the total
number of spurs must be minimized. Since $e_A[n]$ is real, the
maximum power in a spur occurs when there are two frequency
components at $+\omega_{spur}$ and $-\omega_{spur}$, with equal
power (excluding DC offsets and half sampling rate spurs, which
can be easily eliminated). With two frequency components the
power in a single spur is $\le \Delta_A^2/8$.
Since x[n] is real, its spectrum consists of a positive and a
negative frequency component, each having power $A^2/4$. Using
the above bound on spur power, the SFDR is
$\le \Delta_A^2/(2A^2)$.
If $A \approx 1/2$, provided b is not small, then in dBc (decibels
with respect to the carrier), SFDR $\le 3 - 6b$ dBc, where
$\Delta_A = 2^{-b}$, and b is the word length in bits. In summary,
this upper bound on the power in a spur caused by amplitude
quantization exhibits -6 dBc per bit behaviour.
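For completeness, the arithmetic behind the last step can be written out. This is a worked restatement of the bound above, assuming exactly $A = 1/2$ and $\Delta_A = 2^{-b}$:

```latex
10\log_{10}\frac{\Delta_A^{2}}{2A^{2}}
  = 10\log_{10}\frac{2^{-2b}}{2\cdot\frac{1}{4}}
  = 10\log_{10}2 \;-\; 20\,b\,\log_{10}2
  \approx 3.01 - 6.02\,b \ \text{dBc}
```

Each additional bit of word length therefore lowers the bound by about 6 dB.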
In [4], it is shown that for linear approximation, the worst
case approximation error is bounded by
$$ \varepsilon_{\max} = \frac{h^2}{8}\,\max\left|f''(x)\right| \qquad (2) $$
where $h = x_{k+1} - x_k$ and $x \in [x_k, x_{k+1})$, as in Fig. 3 (a).
For a cosine:
$$ \max\left|f''(x)\right| = 1 \qquad (3) $$
Thus, $\varepsilon_{\max} \sim h^2$. Consequently, doubling the LUT size, and
therefore halving h, reduces the maximum error by a factor of
4, i.e. 2 bits. Therefore, if the quantization step corresponds to
the maximum error, the result is -12 dBc per bit behaviour. Fig.
5 presents the output spectrum of the NCO, with a 12-bit
address-space LUT and precision of 20 bits. As expected, the
SFDR is approximately 12 × 12 = 144 dB.
Figure 5. Output spectrum of an NCO with 12-bit address space and 20-bit
precision
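The quadratic dependence of the error on the segment length can be checked numerically. The following Python sketch is illustrative only: the function name, the 6-bit and 7-bit table sizes and the number of probe points per segment are assumptions. It measures the time-domain interpolation error of an ideal (unquantized) table, not the SFDR of the implemented NCO.

```python
import math

def max_interp_error(addr_bits, probes=64):
    """Largest |linear interpolation - cos| over one period.

    The period is divided into 2**addr_bits equal segments with exact
    (unquantized) node values; each segment is probed at probes+1 points.
    """
    n = 1 << addr_bits
    h = 2 * math.pi / n
    worst = 0.0
    for i in range(n):
        y0, y1 = math.cos(i * h), math.cos((i + 1) * h)
        for k in range(probes + 1):
            t = k / probes
            err = abs(y0 + (y1 - y0) * t - math.cos((i + t) * h))
            worst = max(worst, err)
    return worst

if __name__ == "__main__":
    e6, e7 = max_interp_error(6), max_interp_error(7)
    h6 = 2 * math.pi / 64
    print("bound h^2/8 for 6 bits:", h6 * h6 / 8)
    print("measured, 6 bits:      ", e6)
    print("ratio 6 bits / 7 bits: ", e6 / e7)
    print("improvement per bit:   ", 20 * math.log10(e6 / e7), "dB")
```

The measured worst-case error stays within the bound in (2), and adding one address bit reduces it by a factor close to 4, which corresponds to roughly 12 dB.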
IV. IMPLEMENTATION
The reduced-spur NCO has been designed, synthesized for
the Spartan 3 platform, and simulated and tested using Xilinx
ISE tools and ModelSim. The LUT was constrained by the
available size of the block RAM in a Spartan 3 chip. The size
of the block RAM is 512 × 32 bits. Only one quadrant needs
to be stored in RAM. Additionally, one memory location was
split into a 20-bit value and a 12-bit increment of the
successive sample. Therefore, effectively the LUT stored 1024
values for one quadrant of the cosine, yielding a 12-bit
addressing space for the full period of the waveform. The
design required the following resources in an xc3s1000 chip:
• 325 slices (1.3 %)
• One block RAM (4 %)
• 4 hardware multipliers (16 %) forming one combined
multiplier for word lengths greater than 16.
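The split of one 32-bit memory location into a 20-bit value and a 12-bit increment, as described above, can be sketched in Python. The field order (value in the upper bits), the unsigned full-scale amplitude, the storage of the increment as the magnitude of the decrease towards the next node, and all names below are assumptions made for illustration. The exact table organisation of the implemented design is not reproduced here.

```python
import math

VALUE_BITS = 20   # sample value field (from the text)
INCR_BITS = 12    # increment field (from the text)
DEPTH = 512       # block RAM depth (from the text)

def pack(value, incr):
    """Pack a 20-bit value and a 12-bit increment into one 32-bit word."""
    assert 0 <= value < (1 << VALUE_BITS) and 0 <= incr < (1 << INCR_BITS)
    return (value << INCR_BITS) | incr      # assumed layout: value in upper bits

def unpack(word):
    """Split a 32-bit word back into (value, increment)."""
    return word >> INCR_BITS, word & ((1 << INCR_BITS) - 1)

# Hypothetical table contents: first-quadrant cosine nodes and the
# magnitude of the decrease to the next node.
SCALE = (1 << VALUE_BITS) - 1
nodes = [round(SCALE * math.cos(0.5 * math.pi * i / DEPTH))
         for i in range(DEPTH + 1)]
ROM = [pack(nodes[i], nodes[i] - nodes[i + 1]) for i in range(DEPTH)]

if __name__ == "__main__":
    value, incr = unpack(ROM[DEPTH - 1])
    print("last word: value =", value, "increment =", incr)
    print("largest increment:", max(unpack(w)[1] for w in ROM))
```

Storing the increment alongside the value means that a single memory read supplies both operands of the interpolation, so the subtraction $y_{i+1} - y_i$ does not need a second read port.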
The master clock frequency was 50 MHz and the latency
of the NCO was 1 clock cycle, which would allow very fast
frequency hopping and compares favourably with the DDFS
system presented in [5], which requires 11 clock cycles, while
using a smaller LUT. The RTL schematic of the proposed
NCO is shown in Fig. 6.
[Figure 5 plot: magnitude (dB, 0 to -200) versus harmonic (0 to 4500). The data cursor at harmonic 4095 reads -144.6 dB.]
V. CONCLUSIONS
A novel architecture of a numerically controlled oscillator
has been developed. A distinguishing feature of the proposed
solution is the application of linear approximation, leading to
significant improvements of the spurious performance of the
NCO. It is shown that if all the bits of the phase accumulator
of the NCO are used for phase-to-amplitude conversion using
the linear approximation, as is the case of Architecture 1, the
SFDR increases at a rate of 12 dBc per bit of the LUT
addressing space, as opposed to 6 dBc in the standard
architecture (Architecture 0). Future work will include the
application of variable step LUTs [6] and second-order
polynomial interpolation.
REFERENCES
[1] Analog Devices, "A technical tutorial on digital signal synthesis,"
http://www.analog.com/static/imported-files/tutorials/450968421DDS_Tutorial_rev12-2-99.pdf, 1999.
[2] L. Schuchman, "Dither signals and their effects on quantization noise,"
IEEE Trans. Commun. Technol., Vol. COM-12, Dec. 1964, pp. 162-165.
[3] M. J. Flanagan and G. A. Zimmerman, "Spur-reduced digital sinusoid
synthesis," IEEE Trans. Communications, Vol. 43, No. 7, July 1995, pp.
2254-2262.
[4] D-U. Lee, R. C. C. Cheung, W. Luk and J. D. Villasenor, "Hardware
implementation trade-offs of polynomial approximations and
interpolations," IEEE Trans. Computers, Vol. 57, No. 5, May 2008, pp.
686-701.
[5] Y. Song and B. Kim, "Quadrature direct digital frequency synthesizers
using interpolation-based angle rotation," IEEE Trans. VLSI Syst., Vol.
12, No. 7, July 2004, pp. 701-710.
[6] S. Lachowicz and H-J. Pfleiderer, "Fast evaluation of nonlinear
functions using FPGAs," Proc. 4th Intl. Symposium on Electronic
Design, Test and Applications DELTA 2008, Hong Kong, Jan 2008.
Figure 6. An RTL schematic of the Architecture 1 NCO