This document discusses using support vector machines (SVM) for nonlinear equalization. It introduces SVM techniques, describing how SVMs find optimal separating hyperplanes to perform classification. The document presents a system model using an SVM equalizer for a nonlinear channel. Simulation results show decision boundaries and bit error rate performance for the SVM equalizer under different configurations and noise scenarios. It is found that a bank of SVMs, each trained for a different signal-to-noise ratio, better handles unknown channel and SNR conditions compared to a single SVM.
Support Vector Machine Techniques for Nonlinear Equalization
2. Contents
I. Detection and Equalization
II. Support Vector Machine (SVM) technique
III. System Model
IV. Simulation Results – Decision Boundaries
V. BER Analysis
VI. Summary
3. Equalization – Nonlinear Equalization
Equalization
• Removes ISI and noise effects of the channel
• Located at the receiver
Under severe channel effects, linear equalization methods suffer from noise enhancement
• This is the premise for nonlinear equalization
Nonlinear equalization challenges
• Architectures may be unmanageably complex
• Loss of information – a nonlinear system may be non-invertible
• Computationally intensive
Why not think of it as a classification problem?
4. Why SVM
Trains with small amounts of data
Training is straightforward
• Less ad hoc input from the designer
Detection stage is efficient
Results comparable to Volterra filters and neural networks
• Volterra filters – model dimension grows quickly
• Neural networks – network parameters are determined in an ad hoc fashion
5. Intro to SVM
Separate clouds of data using an optimal hyperplane
Maximum margin classifiers don’t work well with outliers
[Figure: a maximum-margin boundary dragged toward a single outlier – low bias, high variance]
6. Intro to SVM
Soft Margins and Outliers
Separate clouds of data using an optimal hyperplane
Maximum margin classifiers don’t work well with outliers – support vector classifiers do
Allow for misclassifications
[Figure: a soft-margin boundary tolerating the outlier – higher bias, low variance]
7. Intro to SVM
Linear Classifier Limited
Separate clouds of data using an optimal hyperplane
In 2 dimensions, the support vector classifier is a line
Support vector machines – deal with data with high amounts of overlap
[Figure: two categories with no obvious linear classifier to separate them – no matter where you place the margin, you will obtain a lot of errors]
8. Intro to SVM
Non-Linear Classifier
Separate clouds of data using an optimal hyperplane
In 2 dimensions, the support vector classifier is a line
Support vector machines – deal with data with high amounts of overlap by applying a nonlinear mapping from the pattern space to a higher-dimensional feature space, creating linearly separable clouds of data
Move the data to a higher dimension
Kernel functions – find support vector classifiers in higher dimensions
9. Support Vector Machines
Hyperplanes and decision criteria
Objective – find the weights (w) and bias (b) that define a hyperplane:
$$\mathbf{w}^T \mathbf{x} + b = 0$$
Optimal hyperplane – a hyperplane for which the margin of separation is maximized
The margin of separation is maximum when the norm of the weight vector is minimized.
[Figure: the optimal hyperplane, with d₊ and d₋ the distances to the closest positive and negative examples]
10. Support Vector Machines
Lagrangian Optimization Problem
Primal problem:
$$\min L_p = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{l} a_i y_i (\mathbf{x}_i \cdot \mathbf{w} + b) + \sum_{i=1}^{l} a_i$$
Dual optimization problem:
$$\max L_d(a) = \sum_{i=1}^{l} a_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$
under the constraints
$$\sum_{i=1}^{l} a_i y_i = 0 \quad \text{and} \quad 0 \le a_i \le C$$
Why the dual?
Lets us solve the problem by computing just the inner products
$K(\cdot,\cdot)$: Kernel
Polynomial: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$
Radial basis function: $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)$
Sigmoid: $K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$
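A minimal sketch of these three kernels in NumPy may make the formulas concrete; the parameter names (p, sigma, kappa, delta) follow the slide's notation, and the test vectors are invented for illustration.

```python
import numpy as np

def poly_kernel(x, y, p=3):
    """Polynomial kernel K(x, y) = (x . y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa * x . y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)

x, y = np.array([1.0, -1.0]), np.array([0.5, 2.0])
print(poly_kernel(x, y), rbf_kernel(x, y), sigmoid_kernel(x, y))
```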
11. SVM Classification
Equalization
$$\hat{y} = \operatorname{sign}(f(\mathbf{x}))$$
$\hat{y}$: estimate of the classification
$$f(\mathbf{x}) = \sum_{i \in S} \alpha_i y_i \,\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) + b = \sum_{i \in S} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$$
• $\{\alpha_i\}$ – Lagrange multipliers
• $S$ – set of indices $i$ for which $\mathbf{x}_i$ is a support vector
• $K(\cdot,\cdot)$ – kernel satisfying the conditions of Mercer's theorem
• $b$ – affine offset
The training set consists of
$$\mathbf{x}_i \in \mathbf{R}^M, \quad y_i \in \{-1, 1\}, \quad i = 1, \dots, L$$
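A minimal sketch of this decision rule: ŷ = sign(f(x)) with f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b. The support vectors, multipliers, and offset below are made-up placeholders; in practice they come out of the dual optimization.

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    # f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x), plus b
    f = sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, support_vectors))
    return np.sign(f + b)

# Placeholder "trained" model (hypothetical values for illustration).
sv = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
alphas, labels, b = [0.8, 0.8], [+1, -1], 0.0
print(svm_decision(np.array([0.5, 0.9]), sv, alphas, labels, b))  # -> 1.0
```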
12. System Model
[Block diagram: u(n) ∈ {±1} → NN{∙} → x(n) → SVM → f(x); a delay z^{−D} applied to u(n) produces the desired output yₙ]
NN{∙} – Nonlinear system
x(n) – Nonlinear system output
u(n) – Training sequence
yₙ – Desired output (delayed version of the training sequence)
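A minimal sketch of this system model: a binary training sequence u(n) passes through a nonlinear channel NN{∙}, and the SVM is trained to recover the delayed symbol u(n − D) from a window of channel outputs. The channel taps and the cubic nonlinearity are assumed for illustration; the slides do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, D = 500, 2, 1                      # training length, equalizer dim, lag
u = rng.choice([-1.0, 1.0], size=L)      # u(n) in {+-1}

h = np.array([1.0, 0.5])                 # assumed linear channel taps
v = np.convolve(u, h, mode="full")[:L]   # linear ISI part
x = v + 0.2 * v ** 3                     # assumed memoryless nonlinearity NN{.}

# Feature vectors: sliding windows [x(n), ..., x(n-M+1)]; target y_n = u(n-D).
X = np.array([x[n - M + 1 : n + 1][::-1] for n in range(M - 1, L)])
y = u[M - 1 - D : L - D]
print(X.shape, y.shape)
```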
19. Decision Boundaries and SNR
Polynomial Kernel
$$K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z} + 1)^d$$
d = polynomial order
Includes all polynomials up to degree d
For our simulation, d = 3
Computing $(\mathbf{x}^T \mathbf{z} + 1)^d$ is an $O(n)$ computation
The feature space might be non-unique
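A minimal sketch of training an SVM with this kernel at d = 3. scikit-learn's polynomial kernel is (gamma·xᵀz + coef0)^degree, so gamma=1, coef0=1 reproduces (xᵀz + 1)^d; the XOR-like toy data set is invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])           # nonlinearly separable labels

clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```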
20. Decision Boundaries and SNR
RBF Kernel
$$K(\mathbf{x}, \mathbf{z}) = \exp\left(-\gamma \|\mathbf{x} - \mathbf{z}\|_2^2\right)$$
Infinite-dimensional feature space
Parameter: γ
As γ increases, the model overfits
As γ decreases, the model underfits
For our simulation, γ = 1
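A minimal sketch contrasting γ values for the RBF kernel, showing the overfit/underfit behavior described above. The data and the γ grid are invented for illustration; the slide's simulation uses γ = 1.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)   # circular true boundary

for gamma in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", gamma=gamma)
    cv = cross_val_score(clf, X, y, cv=5).mean()
    print(f"gamma={gamma:>6}: cross-val accuracy = {cv:.3f}")
```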
21. Decision Boundaries and SNR
Sigmoid Kernel
$$K(\mathbf{x}, \mathbf{z}) = \tanh(k\,\mathbf{x}^T \mathbf{z} - \delta)$$
k = slope
δ = intercept
For our simulation, k = 10, δ = 10
An SVM with a sigmoid kernel can be thought of as a multi-layer perceptron
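A minimal sketch of this kernel with the slide's k = 10, δ = 10. scikit-learn's sigmoid kernel is tanh(gamma·xᵀz + coef0), so gamma=10, coef0=−10 matches tanh(k·xᵀz − δ); the toy data is invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + X[:, 1])

clf = SVC(kernel="sigmoid", gamma=10.0, coef0=-10.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))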
24. Offline Training
Generalization over different SNRs
Training SNRs = 1:20 dB
Testing SNRs = 1:20 dB
A single SVM does not generalize well over different SNR values and multiple channels
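A minimal sketch of this experiment: train one SVM at a single SNR, then test it across a range of SNRs. The channel (a linear ISI channel plus AWGN) and all parameter values are assumed for illustration; the slides specify only the 1–20 dB sweep.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_data(snr_db, n=2000, h=(1.0, 0.5), D=1):
    u = rng.choice([-1.0, 1.0], size=n)
    x = np.convolve(u, h, mode="full")[:n]
    noise_var = np.mean(x ** 2) / 10 ** (snr_db / 10)
    x = x + rng.normal(scale=np.sqrt(noise_var), size=n)
    X = np.column_stack([x[1:], x[:-1]])      # 2-tap observation window
    return X, u[1 - D : n - D] if D else u[1:]

X_tr, y_tr = make_data(snr_db=10)             # train at 10 dB only
clf = SVC(kernel="rbf", gamma=1.0).fit(X_tr, y_tr)

for snr in [1, 5, 10, 15, 20]:                # test across 1-20 dB
    X_te, y_te = make_data(snr)
    print(f"SNR {snr:>2} dB: BER = {1.0 - clf.score(X_te, y_te):.4f}")
```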
25. SVM-Bank for Different SNR Signals
[Block diagram: u(n) ∈ {±1} → NN{∙} → x(n); a noise variance estimator drives selection among SVM (SNR₁), SVM (SNR₂), …, SVM (SNR_N), each trained at a different SNR]
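A minimal sketch of the SVM-bank idea: one SVM per training SNR, with a noise-variance estimate used to pick the closest model at run time. The channel, the estimator output, and all parameters below are assumed for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_data(snr_db, n=2000, h=(1.0, 0.5)):
    u = rng.choice([-1.0, 1.0], size=n)
    x = np.convolve(u, h, mode="full")[:n]
    sig_pow = np.mean(x ** 2)
    x = x + rng.normal(scale=np.sqrt(sig_pow / 10 ** (snr_db / 10)), size=n)
    return np.column_stack([x[1:], x[:-1]]), u[:-1]

# Train one SVM per SNR in the bank.
bank_snrs = [2, 6, 10, 14, 18]
bank = {s: SVC(kernel="rbf", gamma=1.0).fit(*make_data(s)) for s in bank_snrs}

# At run time, estimate the SNR and select the nearest model; the
# estimate below is a stand-in for the noise variance estimator block.
X_rx, y_rx = make_data(snr_db=11)
est_snr = 11.3
chosen = min(bank_snrs, key=lambda s: abs(s - est_snr))
print(f"selected SVM (SNR {chosen} dB), BER =",
      1.0 - bank[chosen].score(X_rx, y_rx))
```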
27. Summary
We looked at the SVM as a dual Lagrangian optimization problem and how it fits into the nonlinear equalization problem.
We developed a nonlinear-channel communication system and applied an SVM equalizer to it.
For different values of the detector delay (D) and different SVM kernels, we found different BER performance for the SVM equalizer.
For an unknown SNR, a single SVM equalizer does not generalize well to unknown channel and SNR conditions.
To address this, we proposed a bank of SVMs, with models trained at different SNR values. After receiving the signal, a noise-variance estimator block selects the appropriate SVM model for equalization.
We look at the received signals. In its simplest form, using a two-tap channel filter, the symbols appear in a two-dimensional space. Using the ground-truth values we know, namely the transmitted symbols, we come up with a decision boundary so that we can predict future symbols from that boundary.
Volterra filters – a multiplicative structure creates cross-products of all filter states. These cross-products are then weighted linearly, and the problem is to find the optimum weighting that minimizes some cost. The dimension of the model grows quickly, and it becomes necessary to apply some sort of heuristic to limit the model.
Neural networks – trained by an iterative, gradient-descent-like algorithm that is not guaranteed to find a global optimum and may settle in a local optimum. Neural networks are susceptible to overtraining, and the number of layers, the number of neurons per layer, and when to stop adapting must all be determined in an ad hoc fashion.
Support vectors – data points near the optimal hyperplane
Soft margin – the margin when we allow for misclassifications
How do we choose the soft margin? Why should the margin lie between those particular blue and green circles? To decide, we look at the relationships among all the data points. We use cross-validation to determine how many misclassifications and observations to allow inside the soft margin to get the best classification, as sketched below. If, for the validation data, performance is best when working with the green and blue data points shown in the lower figure, then we would allow one misclassification. Using a soft-margin classifier means using a support vector classifier.
The data points (observations) on the edge and within the soft margin are called support vectors.
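A minimal sketch of that cross-validation step: in scikit-learn, the regularization parameter C controls how many margin violations are tolerated (small C gives a softer margin). The data and the C grid are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200))  # noisy, overlapping classes

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="linear", C=C)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"C={C:>6}: cross-val accuracy = {score:.3f}")
```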
There are different types of kernels that can be used. This paper uses a polynomial kernel. A polynomial kernel systematically increases the dimension by setting d, the degree of the polynomial, and the relationship between each pair of points is used to find a support vector classifier in that dimension. A good value of the degree is obtained by cross-validation, as in the sketch below.
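A minimal sketch of picking the degree d by cross-validation, here with scikit-learn's GridSearchCV; the data set and candidate degrees are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])

search = GridSearchCV(SVC(kernel="poly", gamma=1.0, coef0=1.0),
                      param_grid={"degree": [1, 2, 3, 4, 5]}, cv=5)
search.fit(X, y)
print("best degree:", search.best_params_["degree"])
```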
By differentiating the primal problem and setting the derivatives to zero, we get solutions for the weight vector w and the bias b, which, when substituted back into the primal, give the dual form, where there is no longer any dependence on w and b.
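A worked version of that step, using the primal L_p from slide 10 (standard SVM algebra, stated here for completeness):

```latex
% Stationarity of the primal L_p with respect to w and b:
\frac{\partial L_p}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{l} a_i y_i \mathbf{x}_i = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{l} a_i y_i \mathbf{x}_i
\qquad
\frac{\partial L_p}{\partial b} = -\sum_{i=1}^{l} a_i y_i = 0

% Substituting w back into L_p removes all dependence on w and b:
L_d(a) = \sum_{i=1}^{l} a_i
       - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j\,(\mathbf{x}_i \cdot \mathbf{x}_j)

% The kernel trick then replaces the inner product x_i . x_j with
% K(x_i, x_j), giving the dual stated on slide 10.
```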
What is a kernel, and what are the different options for it?
M – Equalizer dimension, D – Lag, d – Polynomial Kernel Order
The decision boundary is similar to the optimum and is logical in terms of the training data. The optimum for this example includes a disconnected region, and the SVM cannot match the polygonal nature of the optimum.