A Ce Lte Almmse Impl
Di Wu, Johan Eilert
$$ \ldots^{T}, \tag{2} $$
$$ x = P \tilde{x}, \tag{3} $$
$$ X = \operatorname{diag}(x). \tag{4} $$
The vectors y, h and w in Eq. (1) can be divided according
to Eq. (2) into two parts:
1) a part corresponding to the pilot symbol positions,
2) a part corresponding to the remaining data symbol
positions.
III. CHANNEL ESTIMATION IN LTE
In this section, the different types of channel estimation
techniques considered in this paper are explained.
A. LS
The Least-Squares (LS) channel estimator for the subcarriers
on which pilot symbols are located is given by
$$ \hat{h}_p^{\mathrm{LS}} = X_p^{H} y_p. \tag{5} $$
The remaining channel coefficients have to be obtained by
interpolation. In this work, we apply linear interpolation.
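The LS step and the subsequent interpolation can be illustrated with a minimal sketch. It assumes unit-modulus pilot symbols (so that multiplying by the conjugate pilot in Eq. (5) removes the pilot) and extends the outermost pilot pair linearly towards the band edges; the function names and the edge handling are illustrative assumptions, not taken from the paper.

```python
def ls_pilot_estimate(x_p, y_p):
    """LS estimate on pilot subcarriers, Eq. (5): conj(x_p) * y_p.

    Assumes |x_p| = 1 for every pilot symbol (assumption of this sketch).
    """
    return [xp.conjugate() * yp for xp, yp in zip(x_p, y_p)]


def interpolate_linear(pilot_idx, h_p, n_sub):
    """Linearly interpolate pilot estimates to all n_sub subcarriers.

    pilot_idx: increasing 0-based pilot positions (at least two).
    Subcarriers outside the pilot range reuse the nearest pilot pair
    (linear extrapolation), which is an assumption of this sketch.
    """
    h = []
    for k in range(n_sub):
        # Find the pilot pair enclosing k, or the nearest pair at the edges.
        j = 0
        while j < len(pilot_idx) - 2 and pilot_idx[j + 1] <= k:
            j += 1
        k0, k1 = pilot_idx[j], pilot_idx[j + 1]
        t = (k - k0) / (k1 - k0)
        h.append(h_p[j] + t * (h_p[j + 1] - h_p[j]))
    return h
```

For example, `interpolate_linear([0, 3, 6], h_p, 8)` fills eight subcarriers from three pilot estimates spaced three subcarriers apart.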
B. LMMSE
The Linear Minimum-Mean-Square-Error (LMMSE) channel
estimator requires knowledge of the second-order
statistics of the channel and the noise. It performs better
than the LS estimator, but at a higher computational
complexity. The LMMSE channel estimate can be obtained
by filtering the LS estimate:
$$ \hat{h}_{\mathrm{LMMSE}} = R_{h,h_p} \left( R_{h_p,h_p} + \sigma_w^2 I \right)^{-1} \hat{h}_p^{\mathrm{LS}}, \tag{6} $$
where $R_{h_p,h_p}$ is the autocorrelation matrix of the channel
at the pilot symbol positions and $R_{h,h_p}$ is the cross-correlation
matrix between the channel at the data symbol positions and
the channel at the pilot symbol positions.
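As a rough illustration of Eq. (6), the sketch below applies the LMMSE filter to a small LS estimate. It solves the linear system instead of forming the inverse explicitly, which is a common numerical choice and not something the paper prescribes; the helper `solve` and all argument names are assumptions of this example.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Works for real or complex entries; raises ZeroDivisionError if A is
    singular.
    """
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def lmmse_filter(R_h_hp, R_hp_hp, sigma2_w, h_ls_p):
    """Eq. (6): R_h_hp (R_hp_hp + sigma2_w I)^-1 h_ls_p."""
    n = len(h_ls_p)
    A = [[R_hp_hp[i][j] + (sigma2_w if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    z = solve(A, h_ls_p)  # z = (R_hp_hp + sigma2_w I)^-1 h_ls_p
    return [sum(row[j] * z[j] for j in range(n)) for row in R_h_hp]
```

The cost of the solve grows cubically with the number of pilots, which is the complexity that the approximate estimator of the next subsection tries to avoid.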
C. ALMMSE
The performance of the LMMSE estimator is in general
superior to that of the LS estimator [1] at the cost of higher
computational complexity because of the matrix inversion
in Eq. (6). In a real-time implementation, a reduction of
complexity is desired while preserving the performance of
the LMMSE estimator. In this section, we discuss a low
complexity estimator originally proposed in [2], where the
authors applied this estimator in WiMAX. The main difference
between LTE and WiMAX from the channel estimation point
of view is that LTE utilizes distributed pilot symbols for the
channel estimation instead of a preamble utilized in WiMAX
[4], [5]. Consequently, the ALMMSE estimator presented in
[2] has to be adapted for application in LTE.
The two main ideas of the ALMMSE estimator in [2] are:
1) Calculate the LMMSE filtering matrix by using only the
correlation between L neighboring subcarriers instead of
the full correlation between all subcarriers as in the case
of LMMSE estimation.
2) Assume that the correlation is frequency independent
and estimate a full-rank $L \times L$ autocorrelation matrix
utilizing the LS channel estimate.
The ALMMSE algorithm adapted for LTE consists of the
following steps:
1) Choose the correlation length L that defines the dimension
of $\hat{R}_h^{(L)}$ to be $L \times L$. Due to the pilot structure
in LTE, L is bounded by $3 \le L \le K_{\mathrm{sub}}$
($K_{\mathrm{sub}}$ is the number of subcarriers). A small L is
generally desirable from the complexity point of view. However,
the performance of the estimator improves with increasing L.
If $L = K_{\mathrm{sub}}$ is chosen, the ALMMSE
estimator is equal to the LMMSE estimator.
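2) Choose the interval $I_k$ of L consecutive subcarrier
indices according to the following rule (k is the subcarrier
index of the channel coefficient to be estimated):
$$ I_k = \begin{cases}
[1, \ldots, L] ; & k \le \frac{L+1}{2} \\
\left[ k - \left\lfloor \frac{L-1}{2} \right\rfloor, \ldots, k + \left\lceil \frac{L-1}{2} \right\rceil \right] ; & \text{otherwise} \\
[K_{\mathrm{sub}} - L + 1, \ldots, K_{\mathrm{sub}}] ; & k \ge K_{\mathrm{sub}} - \frac{L-1}{2}
\end{cases} \tag{7} $$
Let $h^{(I_k)}$ be the channel vector for the subcarriers from
the chosen interval $I_k$:
$$ h^{(I_k)} = \left[ h_{I_k(1)}, \ldots, h_{I_k(L)} \right]^{T}. \tag{8} $$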
3) Find the $K_p^{(L)} = L/3$ subcarriers on which the pilot
symbols are located within the chosen interval $I_k$. Let
$h_p^{(I_k)}$ be the vector of channel coefficients on the pilot
symbol positions.
4) Create a permutation matrix P of dimension $L \times L$ with
$$ \tilde{h}^{(I_k)} = \left[ \left( h_p^{(I_k)} \right)^{T} \; \left( h_d^{(I_k)} \right)^{T} \right]^{T} = P^{T} h^{(I_k)}, \tag{9} $$
where $h_d^{(I_k)}$ is the channel vector on the data positions
within the chosen interval $I_k$.
5) Permute $\hat{R}_h^{(L)}$ with P:
$$ \tilde{R}_h^{(L)} = P^{T} \hat{R}_h^{(L)} P. \tag{10} $$
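6) Extract $\hat{R}^{(L)}_{h_{\mathrm{LS}}}$ and $\hat{R}^{(L)}_{h,h_{\mathrm{LS}}}$ from $\tilde{R}_h^{(L)}$ as
$$ \hat{R}^{(L)}_{h_{\mathrm{LS}}} = \left( \tilde{R}_h^{(L)} \right)_{K_p^{(L)},\,K_p^{(L)}}, \tag{11} $$
$$ \hat{R}^{(L)}_{h,h_{\mathrm{LS}}} = \left( \tilde{R}_h^{(L)} \right)_{L,\,K_p^{(L)}}. \tag{12} $$
(The operator $(A)_{M,N}$ creates a submatrix of matrix A
which is given by the first M rows and first N columns.)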
7) Calculate the filtering matrix $\hat{F}^{(L)}$:
$$ \hat{F}^{(L)} = \hat{R}^{(L)}_{h,h_{\mathrm{LS}}} \left( \hat{R}^{(L)}_{h_{\mathrm{LS}}} + \sigma_w^2 I \right)^{-1}. \tag{13} $$
8) Obtain an estimate of the channel coefficients by
multiplying (filtering) the LS estimate on the pilot positions
from the chosen interval $I_k$ with $\hat{F}^{(L)}$ and permuting
(multiplying by $P^{T}$). Finally, the k-th element has to be
selected:
$$ q = \underbrace{P^{T} \hat{F}^{(L)}}_{F_l} \, \hat{h}^{(I_k)}_{\mathrm{LS}}, \tag{14} $$
$$ \hat{h}_{\mathrm{ALMMSE},k} = \begin{cases}
[q]_{k} ; & k \le \frac{L+1}{2} \\
[q]_{\left\lfloor \frac{L+1}{2} \right\rfloor} ; & \text{otherwise} \\
[q]_{L+k-K_{\mathrm{sub}}} ; & k \ge K_{\mathrm{sub}} - \frac{L-1}{2}
\end{cases} \tag{15} $$
$[q]_k$ means that the k-th element of vector q is selected.
Because in LTE at least one pilot symbol is transmitted
on every third subcarrier, just three different
permutations of the matrix $\hat{F}^{(L)}$ have to be calculated
if the channel is quasi-static within one OFDM symbol.
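The index bookkeeping of steps 2), 3) and 8) can be sketched as follows. The sketch uses 1-based subcarrier indices as in Eqs. (7) and (15) and assumes that the middle case of Eq. (7) spans from k minus floor((L-1)/2) to k plus ceil((L-1)/2), so that the interval always contains exactly L indices. The pilot grid in `pilot_positions` (pilots on every third subcarrier with a configurable offset) is a hypothetical simplification used only to show why three filtering matrices suffice.

```python
def interval(k, L, K_sub):
    """Return the L consecutive 1-based subcarrier indices I_k, Eq. (7)."""
    if k <= (L + 1) / 2:
        start = 1                      # lower band edge
    elif k >= K_sub - (L - 1) / 2:
        start = K_sub - L + 1          # upper band edge
    else:
        start = k - (L - 1) // 2       # window roughly centred on k
    return list(range(start, start + L))


def position_in_q(k, L, K_sub):
    """1-based position of subcarrier k inside the vector q, Eq. (15)."""
    if k <= (L + 1) / 2:
        return k
    if k >= K_sub - (L - 1) / 2:
        return L + k - K_sub
    return (L + 1) // 2


def pilot_positions(idx, offset=1):
    """0-based positions of pilots inside an interval.

    Assumes pilots on subcarriers k with k % 3 == offset % 3
    (hypothetical grid, chosen only for illustration).
    """
    return [i for i, k in enumerate(idx) if k % 3 == offset % 3]


if __name__ == "__main__":
    L, K_sub = 12, 300
    patterns = {tuple(pilot_positions(interval(k, L, K_sub)))
                for k in range(1, K_sub + 1)}
    # The pilot pattern depends only on the interval start modulo 3.
    print(len(patterns), "distinct pilot patterns")
```

Because the pilot pattern inside an interval only depends on the interval start modulo 3, at most three different permuted filtering matrices occur under this assumption.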
IV. PROCESSING FLOW
The algorithm presented in Sec. III-C needs further
optimizations for real-time implementation. As illustrated in
Fig. 2, the processing flow of the LTE channel estimator
includes both LS and ALMMSE channel estimation. The first
stage is the scaling in Eq. (5), which computes the channel
response on pilot symbol subcarriers. ALMMSE is only used
when the UE is in low-speed mode (the mode in which
MIMO is more likely to be used). When the UE velocity is
higher than the threshold velocity $K_v$, LS channel estimation
is used. When the UE velocity is lower than $K_v$, ALMMSE
estimation is used. Meanwhile, to estimate the correlation
matrices $\hat{R}^{(L)}_{h_{\mathrm{LS}}}$ and $\hat{R}^{(L)}_{h,h_{\mathrm{LS}}}$,
a number of subframes (those
with subframe number < N after the UE enters Connected
Mode) are used for training. During the training phase,
the channel estimator works in LS mode and calculates the
correlation matrix using the result of LS estimation. The
correlation matrix is stored in a buffer and updated
every subframe. The computation of $\hat{R}^{(L)}_{h_{\mathrm{LS}}}$ and
$\hat{R}^{(L)}_{h,h_{\mathrm{LS}}}$ is only
needed when the training phase is finished and the estimator
enters ALMMSE mode.
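The mode decision and the training phase described above might be organized as in the following sketch. The paper does not spell out how the correlation matrix is accumulated; averaging outer products of length-L windows of the LS estimate is an assumption made here, motivated by the frequency-independence assumption of the ALMMSE estimator. The names `select_mode` and `CorrelationTrainer`, and the treatment of a speed exactly equal to the threshold, are likewise illustrative.

```python
def select_mode(speed, k_v, subframe_no, n_train):
    """Return "LS" or "ALMMSE" for the current subframe.

    LS is used above the speed threshold k_v and during the first
    n_train training subframes; otherwise ALMMSE is used.
    """
    if speed > k_v or subframe_no < n_train:
        return "LS"
    return "ALMMSE"


class CorrelationTrainer:
    """Running estimate of an L x L autocorrelation matrix (assumed form)."""

    def __init__(self, L):
        self.L = L
        self.acc = [[0j] * L for _ in range(L)]
        self.count = 0

    def update(self, h_ls):
        """Accumulate w w^H over all length-L windows w of one LS estimate."""
        L = self.L
        for s in range(len(h_ls) - L + 1):
            w = h_ls[s:s + L]
            for i in range(L):
                for j in range(L):
                    self.acc[i][j] += w[i] * w[j].conjugate()
            self.count += 1

    def estimate(self):
        """Return the averaged matrix; requires at least one update."""
        if self.count == 0:
            raise ValueError("no training data accumulated")
        return [[v / self.count for v in row] for row in self.acc]
```

Calling `update` once per training subframe keeps the per-subframe cost low, and `estimate` is only needed once the estimator switches to ALMMSE mode.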
In ALMMSE mode, three coefficient matrices $F_1$, $F_2$, $F_3$
need to be computed from the correlation matrices. This
involves the inversion of $4 \times 4$ matrices (for $4 \times 2$ and
$2 \times 2$ MIMO) assuming L = 12. However, such an operation is
only needed when the SNR changes significantly. If the SNR
estimation is subframe based, it will not change within one
subframe. Thus, the major computational cost of ALMMSE
estimation is the matrix multiplication in Eq. (14).
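The idea of recomputing the filtering matrices only after a significant SNR change can be sketched as a small cache. The threshold value and the `compute` callback are placeholders, since the paper does not state them; the comparison against the SNR at the time of the last computation is also an assumption.

```python
class FilterCache:
    """Recompute the filtering matrices only when the SNR moves enough."""

    def __init__(self, compute, threshold_db):
        self.compute = compute            # hypothetical callback: snr_db -> (F1, F2, F3)
        self.threshold_db = threshold_db  # hypothetical threshold in dB
        self.snr_db = None
        self.filters = None

    def get(self, snr_db):
        """Return cached matrices unless |snr_db - cached SNR| > threshold."""
        if self.filters is None or abs(snr_db - self.snr_db) > self.threshold_db:
            self.filters = self.compute(snr_db)  # involves the matrix inversions
            self.snr_db = snr_db
        return self.filters
```

With such a cache, most subframes only pay for the matrix-vector products of Eq. (14).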
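[Fig. 2: Processing flow of the LTE channel estimator (flowchart). Blocks: scaling; FIR interpolation; LS; ALMMSE; update the correlation matrix; calculate the correlation matrices $\hat{R}_h^{(L)}$, $\hat{R}^{(L)}_{h,h_{\mathrm{LS}}}$ and $\hat{R}^{(L)}_{h_{\mathrm{LS}}}$; calculate filtering matrices $F_1$, $F_2$, $F_3$. Decisions (YES/NO): speed > $K_v$; connected_time <= N; connected_time == N; ||SNR_new - SNR_old|| <= ... . Signals: $\hat{h}_p^{\mathrm{LS}}$, $\hat{h}_{\mathrm{ALMMSE},k}$.]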
The main operation involved is matrix-vector multiplication.
The IU mainly consists of a 16-bit fixed-point four-way
Single Instruction Multiple Data (SIMD) Complex
Multiply-ACcumulate (CMAC) unit. Thanks to the regularity of the LTE
RS locations, the interpolation coefficients can be efficiently
stored in a small look-up table to reduce the hardware cost.
The implementation of the four-way SIMD CMAC unit is
similar to the one presented in [6].
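A behavioural sketch of a matrix-vector product on a four-way CMAC unit is given below. It assumes that the four lanes accumulate four products of the same output row per cycle; the actual lane assignment of the hardware is not described here, and fixed-point quantization is deliberately left out. The function name and the cycle counter are illustrative.

```python
LANES = 4  # four-way SIMD, as stated in the text


def cmac_matvec(F, h):
    """Multiply matrix F by vector h, four complex MACs per modelled cycle.

    Returns the result vector and the number of modelled cycles.
    Floating-point arithmetic is used; word-length effects are ignored.
    """
    out, cycles = [], 0
    for row in F:
        acc = 0j
        for s in range(0, len(row), LANES):
            # One cycle: up to four multiply-accumulates in parallel.
            acc += sum(f * x for f, x in zip(row[s:s + LANES], h[s:s + LANES]))
            cycles += 1
        out.append(acc)
    return out, cycles
```

Under this model, a filtering matrix with four pilot taps per row (L = 12) yields one channel coefficient per cycle.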
Note that in this paper, the channel estimator is designed
as a hardware accelerator in order to focus on the signal
processing of channel estimation and to make it easier to
quantify the silicon cost. For more flexibility and hardware
reuse, the channel estimator can also be mapped in pure
software to a baseband DSP processor such as [16].
VI. LINK-LEVEL PERFORMANCE
In order to evaluate the performance of the channel
estimation algorithms, simulations are carried out using a
standard-compliant 3GPP LTE simulator [3]. The simulator is
implemented partly in Matlab and partly in C. It includes the
complete physical layer signal processing such as timing/frequency
synchronization [15], channel estimation, subcarrier demapping,
rate-matching [14] and turbo decoding. H-ARQ [13]
based on the CRC of coded blocks is also enabled to support
up to three retransmissions. Ped-B [7] is selected as the channel
model. It is assumed that the channel is quasi-static within
one OFDM symbol duration. The bandwidth is set to 5 MHz
in the simulation, and the velocity of the user equipment is 3 km/h.
The parameter L is set to 12. Perfect synchronization and
ML detection are assumed in order to focus the simulation on
channel estimation performance.
Fig. 4 shows the throughput results. When a 16-bit
floating-point datatype is used, the performance degrades
severely at high SNR compared to IEEE double precision.
This is mainly because the matrices involved in the
ALMMSE processing are nearly singular and require a
sufficiently high numerical precision. When 64-bit floating point
is used, no degradation is observed. In order to minimize the
number of bits and the hardware cost, SNR under-estimation
(regularization) is used, which sets a fixed $\sigma_w^2$ value when the
SNR is higher than a threshold (12 dB in this paper). This also
reduces the amount of processing needed in ALMMSE. The
result shows that a 16-bit datatype with SNR under-estimation
incurs a negligible degradation compared to 64-bit processing
without SNR under-estimation.
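SNR under-estimation can be read as a floor on the noise variance that enters the regularized inverse. The sketch below assumes unit signal power and assumes that the fixed value is the noise variance corresponding to the 12 dB threshold; the paper does not state the fixed value explicitly, so both are assumptions, as is the function name.

```python
def noise_variance(snr_db, cap_db=12.0, signal_power=1.0):
    """Noise variance used in the ALMMSE filter with SNR under-estimation.

    Above cap_db the SNR is clamped, so the diagonal loading sigma_w^2
    never drops below the value at the threshold.
    """
    effective_snr_db = min(snr_db, cap_db)
    return signal_power / 10 ** (effective_snr_db / 10)
```

For instance, `noise_variance(20.0)` returns the same value as `noise_variance(12.0)`, so the matrix to be inverted keeps a minimum diagonal loading at high SNR, which is what keeps the short-wordlength inversion well conditioned.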
[Fig. 4 plot. x-axis: SNR [dB]; y-axis: Throughput [Mbit/s]; curves: PERFECT, LMMSE, ALMMSE 64-bit, ALMMSE 16-bit regularized, ALMMSE 16-bit, LS]
Fig. 4: Coded Throughput (rate 0.602, 16-QAM)
Fig. 5 shows the MSE for different LTE channel estimators.
The LMMSE channel estimator outperforms the remaining
channel estimators. However, its hardware implementation
requires the most computational power. The ALMMSE channel
estimator using IEEE double precision shows a 4 dB SNR
improvement compared to the LS channel estimator.
Unfortunately, an implementation using IEEE double precision
is too costly. The 16-bit implementation of the ALMMSE channel
estimator is unstable at high SNR. However, with SNR
under-estimation its performance is close to that of the ALMMSE
with IEEE double precision. This channel estimator offers a good
performance-complexity trade-off.
VII. IMPLEMENTATION
The channel estimator is implemented using the ST CMOS
65 nm process libraries and Synopsys low-power design flow.
[Fig. 5 plot. x-axis: SNR [dB]; y-axis: MSE (logarithmic scale); curves: LMMSE, ALMMSE 64-bit, ALMMSE 16-bit regularized, ALMMSE 16-bit, LS]
Fig. 5: MSE of the different channel estimators
Table I lists the synthesized gate count and the working
frequency.
Area of Controller (kgate)    17
Area of IU (kgate)            32
Total Area (kgate)            49
Working Frequency (MHz)       200
TABLE I: Implementation Cost Estimate
In LTE, assuming the channel estimation has to be performed
every subframe (1 ms), up to 1200 channel coefficients
have to be estimated per transmit-receive antenna pair to
support the full 20 MHz downlink bandwidth. The proposed
architecture running at 200 MHz can handle ALMMSE
estimation for up to $4 \times 2$ MIMO systems in real-time.
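The real-time claim can be checked with a back-of-the-envelope cycle budget. The sketch below takes the figures quoted above (200 MHz clock, 1 ms subframe, 1200 coefficients per antenna pair) and assumes that a 4 × 2 system has 4 · 2 = 8 transmit-receive antenna pairs and that each coefficient is estimated once per subframe; if the estimate had to be repeated for several OFDM symbols per subframe, the budget would shrink proportionally.

```python
def cycle_budget(clock_hz=200_000_000, subframe_ms=1,
                 coeffs_per_pair=1200, n_tx=4, n_rx=2):
    """Return (cycles per subframe, coefficients per subframe, cycles per coefficient)."""
    cycles = clock_hz * subframe_ms // 1000
    coeffs = coeffs_per_pair * n_tx * n_rx
    return cycles, coeffs, cycles / coeffs


if __name__ == "__main__":
    cycles, coeffs, per_coeff = cycle_budget()
    print(cycles, coeffs, round(per_coeff, 1))
```

Under these assumptions there are 200000 cycles for 9600 coefficients, i.e. roughly 20 cycles per coefficient.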
VIII. CONCLUSIONS
The results show that algorithm-architecture co-optimization
can further simplify the ALMMSE channel estimation
algorithm. A short wordlength can be used to allow a
low-cost ASIC implementation. With SNR under-estimation, a
16-bit floating-point datatype provides sufficient precision to
support the $4 \times 4$ matrix inversion involved in ALMMSE with
negligible degradation of performance. A trade-off between
performance and complexity has been reached at a feasible
silicon cost with a 1 dB throughput gain compared to LS
channel estimation.
IX. ACKNOWLEDGEMENT
This work has been funded by the Christian Doppler
Laboratory for Wireless Technologies for Sustainable Mobility under the
supervision of Christoph Mecklenbräuker. The authors thank their
industrial partners A1 Telekom Austria AG and KATHREIN-Werke
KG. Furthermore, the financial support by the Federal Ministry
of Economy, Family and Youth and the National Foundation for
Research, Technology and Development is gratefully acknowledged.
The work of J. Eilert, D. Wu and D. Liu was funded by EU FP7
MultiBase Project in partnership with Ericsson AB et al.
REFERENCES
[1] Q. Wang, D. Wu, J. Eilert and D. Liu, "Cost Analysis of Channel
Estimation in MIMO-OFDM for Software Defined Radio," in
Proc. IEEE Wireless Communications & Networking Conference,
April 2008.
[2] C. Mehlführer, S. Caban and M. Rupp, "An Accurate and Low
Complex Channel Estimator for OFDM WiMAX," in Proc. IEEE
ISCCSP, March 2008.
[3] C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska and M.
Rupp, "Simulating the Long Term Evolution Physical Layer,"
in Proc. of the 17th European Signal Processing Conference
(EUSIPCO 2009), Glasgow, Scotland, Aug. 2009.
[4] 3GPP, "Technical Specification Group Radio Access Network;
Evolved Universal Terrestrial Radio Access (E-UTRA); Physical
Channels and Modulation" (Tech. Spec. 36.211 V8.4.0), Sept. 2008.
[5] IEEE, "IEEE Standard for Local and Metropolitan Area
Networks Part 16: Air Interface for Fixed Broadband Wireless Access
Systems," 2004.
[6] J. Eilert, D. Wu and D. Liu, "Implementation of a Programmable
Linear MMSE Detector for MIMO-OFDM," in Proc. IEEE
ICASSP, 2008.
[7] ITU, "Recommendation ITU-R M.1225: Guidelines for
Evaluation of Radio Transmission Technologies for IMT-2000 Systems,"
1998.
[8] M. Šimko, C. Mehlführer, M. Wrulich and M. Rupp, "Doubly
Dispersive Channel Estimation with Scalable Complexity," in
Proc. IEEE WSA, 2010.
[9] S. Haene, D. Perels and A. Burg, "A Real-Time 4-Stream
MIMO-OFDM Transceiver: System Design, FPGA Implementation, and
Characterization," IEEE Journal on Selected Areas in
Communications, vol. 26, no. 6, pp. 877-889, August 2008.
[10] J. Lofgren, S. Mehmood, N. Khan, B. Masood, M. Awan,
I. Khan, N. A. Chisty and P. Nilsson, "Hardware implementation
of an SVD based MIMO OFDM channel estimator," in Proc.
NORCHIP, 2009.
[11] J. Berkmann, C. Carbonelli, F. Dietrich, C. Drewes and W. Xu, "On
3G LTE Terminal Implementation - Standard, Algorithms,
Complexities and Challenges," in Proc. IWCMC, 2008.
[12] M. Šimko, C. Mehlführer, T. Zemen and M. Rupp,
"Inter-Carrier Interference Estimation in MIMO OFDM Systems with
Arbitrary Pilot Structure," in Proc. IEEE VTC Spring, 2011.
[13] J. C. Ikuno, C. Mehlführer and M. Rupp, "A Novel LEP Model
for OFDM Systems with HARQ," in Proc. IEEE ICC, 2011.
[14] J. C. Ikuno, S. Schwarz and M. Šimko, "LTE Rate Matching
Performance with Code Block Balancing," in Proc. European
Wireless Conference (EW), 2011.
[15] Q. Wang, C. Mehlführer and M. Rupp, "Carrier Frequency
Synchronization in the Downlink of 3GPP LTE," in Proc. IEEE
PIMRC, 2010.
[16] A. Nilsson, E. Tell and D. Liu, "An 11 mm² 70 mW
Fully-Programmable Baseband Processor for Mobile WiMAX and
DVB-T/H in 0.12 µm CMOS," IEEE Journal of Solid-State Circuits,
vol. 44, no. 1, pp. 90-97, 2009.