Support Vector Machine Techniques for Nonlinear Equalization
Bhaskar Raj Upadhyay
Shamman Noor Shoudha
Contents
I. Detection and Equalization
II. Support Vector Machine (SVM) technique
III. System Model
IV. Simulation Results – Decision Boundaries
V. BER Analysis
VI. Summary
Equalization – Nonlinear Equalization
Equalization
• Removes ISI and noise effects of the channel
• Located at the receiver
Under severe channel effects, linear equalization methods suffer from noise enhancement
• Premise for nonlinear equalization
Nonlinear equalization challenges
• Architectures may be unmanageably complex
• Loss of information – a nonlinear system may be non-invertible
• Computationally intensive
Why not think of it as a classification problem?
Why SVM
Trains with small amounts of data
Training is straightforward
• Less ad hoc input from the designer
Detection stage is efficient
Results comparable to Volterra filters and neural networks
• Volterra filters – dimension grows quickly
• Neural networks – network parameters determined in an ad hoc fashion
Intro to SVM
Separate clouds of data using an optimal hyperplane
Maximum margin classifiers don't work well with outliers
[Figure: maximum-margin classifier pulled by a single outlier – low bias, high variance]
Intro to SVM
Soft Margins and Outliers
Separate clouds of data using an optimal hyperplane
Maximum margin classifiers don't work well with outliers – support vector classifiers do
Allowing for misclassifications (a soft margin) gives higher bias but lower variance
[Figure: soft-margin classifier tolerating the outlier]
Intro to SVM
Linear Classifier Limited
 Separate clouds of data using an optimal hyperplane
 In two dimensions, the support vector classifier is a line
 Support vector machines – deal with data with a high degree of overlap
No matter where you place the margin, you will obtain many errors
Two categories, but no obvious linear classifier to separate them
Intro to SVM
Nonlinear Classifier
 Separate clouds of data using an optimal hyperplane
 In two dimensions, the support vector classifier is a line
 Support vector machines – deal with data with a high degree of overlap: a nonlinear mapping from the pattern space to a higher-dimensional feature space creates linearly separable clouds of data
Move the data to a higher dimension
Kernel functions – find support vector classifiers in higher dimensions
Support Vector Machines
Hyperplanes and decision criteria
 Objective – Find the weights (w) and bias (b) that define a hyperplane: $\mathbf{w}^T \mathbf{x} + b = 0$
Optimal hyperplane – a hyperplane for which the margin of separation is maximized
The margin of separation is maximum when the norm of the weight vector is minimized
[Figure: optimal hyperplane with margins $d_+$ and $d_-$]
Support Vector Machines
Lagrangian Optimization Problem
Primal problem
$$\min_{\mathbf{w},\,b}\; L_p = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{l} a_i y_i\,(\mathbf{x}_i \cdot \mathbf{w} + b) + \sum_{i=1}^{l} a_i$$
Dual optimization problem
$$\max_{a}\; L_d(a) = \sum_{i=1}^{l} a_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j)$$
under the constraints
$$\sum_{i=1}^{l} a_i y_i = 0 \quad\text{and}\quad 0 \le a_i \le C$$
Why the dual? It lets us solve the problem by computing just the inner products.
$K(\cdot,\cdot)$ : Kernel
Polynomial: $K(x, y) = (x \cdot y + 1)^p$
Radial basis function: $K(x, y) = \exp\!\left(-\dfrac{\|x - y\|^2}{2\sigma^2}\right)$
Sigmoid: $K(x, y) = \tanh(\kappa\, x \cdot y - \delta)$
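For concreteness, here is a minimal NumPy sketch of the three kernels listed above (illustrative only; the function names and default parameter values are my own, not from the slides):

```python
import numpy as np

def poly_kernel(x, y, p=3):
    """Polynomial kernel K(x, y) = (x . y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa * x . y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)
```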
SVM Classification
Equalization
$$\hat{y} = \operatorname{sign}\!\big(f(\mathbf{x})\big)$$
 $\hat{y}$ : estimate of the classification
 $f(\mathbf{x}) = \sum_{i \in S} \alpha_i\, y_i\, \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) + b = \sum_{i \in S} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b$
• $\{\alpha_i\}$ – Lagrange multipliers
• $S$ – set of indices $i$ for which $\mathbf{x}_i$ is a support vector
• $K(\cdot,\cdot)$ – kernel satisfying the conditions of Mercer's theorem
• $b$ – affine offset
 Training set consists of
 $\mathbf{x}_i \in \mathbf{R}^M$
 $y_i \in \{-1, 1\},\ i = 1, \ldots, L$
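As a sketch (not from the slides), the decision rule above can be evaluated directly once the multipliers, support vectors, and offset are known; all names below are assumed for illustration:

```python
import numpy as np

def svm_decide(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate y_hat = sign(f(x)) with f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    support_vectors : array of shape (n_sv, M)
    alphas, labels  : Lagrange multipliers and +/-1 labels of the support vectors
    kernel          : callable K(x_i, x)
    """
    f = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, labels, support_vectors)) + b
    return np.sign(f)
```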
System Model
[Block diagram: u(n) ∈ {±1} → NN{·} → x(n) → SVM → f(x); a z^{-D} delay of u(n) provides the desired output y_n]
 NN {∙} – Nonlinear system
 𝑥(𝑛) – Nonlinear system output
 𝑢(𝑛) – Training sequence
 𝑦𝑛 – Desired output (delayed version of training sequence)
Nonlinear Transmission System
[Block diagram: PAM symbols u(n) → nonlinear channel → x(n), with additive noise e(n); a tapped delay line x(n), x(n−1), …, x(n−M+1) feeds the SVM equalizer, which outputs û(n−D)]
 x(n) – Nonlinear channel output
 e(n) – Additive noise
 (M − 1) – Feed-forward delay (number of past channel outputs utilized)
 û(n − D) – Equalizer detection output (goal: mimic u(n − D))
System Structure and Parameters
$\tilde{x}(n) = u(n) + 0.5\,u(n-1)$ (linear channel)
$x(n) = \tilde{x}(n) - 0.9\,\tilde{x}^3(n)$ (memoryless nonlinearity)
$e(n) \sim N(0, \sigma_e^2) = N(0, 0.2)$ (additive noise)
SVM Parameters
• C = 5 (constraint)
• d = 3 (equalizer kernel order)
• M = 2 (equalizer dimension)
• Kernel = Polynomial
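A hedged, minimal Python/scikit-learn sketch of this setup (sequence length, seed, and variable names are assumptions, not taken from the slides):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N, M, D = 1000, 2, 0                        # training symbols, equalizer dimension, detector delay

u = rng.choice([-1.0, 1.0], size=N)                    # binary PAM training sequence
x_lin = u + 0.5 * np.concatenate(([0.0], u[:-1]))      # linear channel u(n) + 0.5 u(n-1)
x = x_lin - 0.9 * x_lin ** 3                           # memoryless cubic nonlinearity
x += rng.normal(0.0, np.sqrt(0.2), size=N)             # additive noise, sigma_e^2 = 0.2

idx = np.arange(max(M - 1, D), N)
X = np.column_stack([x[idx - k] for k in range(M)])    # features [x(n), x(n-1)]
y = u[idx - D]                                         # desired symbol u(n - D)

# Polynomial kernel (x.z + 1)^3 via gamma=1, coef0=1, degree=3; C = 5 as on the slide.
svm = SVC(C=5, kernel="poly", degree=3, gamma=1.0, coef0=1.0).fit(X, y)
print("training-set symbol error rate:", np.mean(svm.predict(X) != y))
```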
Simulation Results
Typical classification regions of an SVM
[Decision-region plots of the SVM equalizer for D = 0, D = 1, and D = 2]
 $\tilde{x}(n) = u(n) + 0.5\,u(n-1)$
 $x(n) = \tilde{x}(n) - 0.9\,\tilde{x}^3(n)$
 $e(n) \sim N(0, \sigma_e^2) = N(0, 0.2)$
 C = 5 (constraint)
 d = 3 (equalizer kernel order)
 M = 2 (equalizer dimension)
 Kernel = Polynomial
Results – Decision Boundaries
Colored Noise
[Decision-boundary plots: correlated noise, SVM equalizer vs. optimum equalizer]
 Noise correlation matrix: $\sigma_e^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ with $\rho = 0.48$
 $M = 2,\; D = 0,\; d = 3$
 $\tilde{x}(n) = 0.5\,u(n) + u(n-1)$
 $x(n) = \tilde{x}(n) + 0.1\,\tilde{x}^2(n) + 0.05\,\tilde{x}^3(n)$
 $\sigma_e^2 = 0.2$
Ref: Chen et al.
Results – BER
Colored Noise vs. AWGN
Results – BER
For different values of D=0, 1, 2
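To make the BER comparison concrete, here is a hedged Monte-Carlo sketch (sequence lengths, SNR grid, and the SNR-to-noise-variance mapping are assumptions for illustration, not the authors' exact procedure):

```python
import numpy as np
from sklearn.svm import SVC

def channel(u, sigma2, rng):
    """Linear stage u(n) + 0.5 u(n-1), cubic nonlinearity, additive Gaussian noise."""
    xl = u + 0.5 * np.concatenate(([0.0], u[:-1]))
    return xl - 0.9 * xl ** 3 + rng.normal(0.0, np.sqrt(sigma2), size=u.size)

def dataset(n, sigma2, M, D, rng):
    """Feature vectors [x(n), ..., x(n-M+1)] and desired symbols u(n-D)."""
    u = rng.choice([-1.0, 1.0], size=n)
    x = channel(u, sigma2, rng)
    idx = np.arange(max(M - 1, D), n)
    return np.column_stack([x[idx - k] for k in range(M)]), u[idx - D]

def ber_for_delay(D, snr_db, M=2, n_train=500, n_test=20000, seed=0):
    rng = np.random.default_rng(seed)
    sigma2 = 10.0 ** (-snr_db / 10.0)          # assumes unit symbol power
    Xtr, ytr = dataset(n_train, sigma2, M, D, rng)
    Xte, yte = dataset(n_test, sigma2, M, D, rng)
    svm = SVC(C=5, kernel="poly", degree=3, gamma=1.0, coef0=1.0).fit(Xtr, ytr)
    return np.mean(svm.predict(Xte) != yte)

for D in (0, 1, 2):
    print("D =", D, [round(ber_for_delay(D, snr), 4) for snr in (4, 8, 12, 16)])
```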
Decision Boundaries and SNR
Polynomial Kernel
 $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z} + 1)^d$
 $d$ = polynomial order
 All polynomials up to degree $d$
 For our simulation, $d = 3$
 Computing $(\mathbf{x}^T \mathbf{z} + 1)^d$ is an $O(n)$ computation
 Feature space might be non-unique
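To illustrate the $O(n)$ point, the small sketch below (hypothetical; the coefficients of the implicit feature map are omitted) contrasts the size of the explicit polynomial feature space with the single inner product the kernel needs:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_feature_count(n, d=3):
    """Number of monomials of degree <= d in n variables: C(n + d, d)."""
    return sum(1 for degree in range(d + 1)
               for _ in combinations_with_replacement(range(n), degree))

x = np.random.randn(10)
z = np.random.randn(10)
print(poly_feature_count(x.size))        # 286 implicit features for n = 10, d = 3
print((np.dot(x, z) + 1.0) ** 3)         # the kernel value itself: one O(n) dot product
```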
Decision Boundaries and SNR
RBF Kernel
 $K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma\,\|\mathbf{x} - \mathbf{z}\|_2^2\right)$
 Infinite-dimensional feature space
 Parameter: $\gamma$
 As $\gamma$ increases, the model overfits
 As $\gamma$ decreases, the model underfits
 For our simulation, $\gamma = 1$
Decision Boundaries and SNR
Sigmoid Kernel
 $K(\mathbf{x}, \mathbf{z}) = \tanh(k\,\mathbf{x}^T \mathbf{z} - \delta)$
 $k$ = slope
 $\delta$ = intercept
 For our simulation, $k = 10$, $\delta = 10$
 Sigmoidal kernels can be thought of as a multi-layer perceptron
Results – BER
For different SVM Kernels
Offline Training
Generalization over different channels
Case 1 – mismatch in the linear channel:
 Train: $\tilde{x}_{\mathrm{train}}(n) = u(n) + \mathbf{0.9}\,u(n-1)$,  $x_{\mathrm{train}}(n) = \tilde{x}_{\mathrm{train}}(n) - 0.9\,\tilde{x}_{\mathrm{train}}^3(n)$
 Test: $\tilde{x}_{\mathrm{test}}(n) = u(n) + \mathbf{0.5}\,u(n-1)$,  $x_{\mathrm{test}}(n) = \tilde{x}_{\mathrm{test}}(n) - 0.9\,\tilde{x}_{\mathrm{test}}^3(n)$
Case 2 – mismatch in the nonlinearity:
 Train: $\tilde{x}_{\mathrm{train}}(n) = u(n) + 0.6\,u(n-1)$,  $x_{\mathrm{train}}(n) = \tilde{x}_{\mathrm{train}}(n) - \mathbf{0.5}\,\tilde{x}_{\mathrm{train}}^3(n)$
 Test: $\tilde{x}_{\mathrm{test}}(n) = u(n) + 0.6\,u(n-1)$,  $x_{\mathrm{test}}(n) = \tilde{x}_{\mathrm{test}}(n) - \mathbf{0.3}\,\tilde{x}_{\mathrm{test}}^3(n)$
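A hedged sketch of the first mismatch experiment (train on the 0.9 linear channel, detect on the 0.5 one); sequence lengths and helper names are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def make_data(lin_coef, cubic_coef, n, sigma2=0.2, M=2, D=0, seed=0):
    """Symbols through u(n) + lin_coef*u(n-1), then x - cubic_coef*x^3, plus noise."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=n)
    xl = u + lin_coef * np.concatenate(([0.0], u[:-1]))
    x = xl - cubic_coef * xl ** 3 + rng.normal(0.0, np.sqrt(sigma2), size=n)
    idx = np.arange(max(M - 1, D), n)
    return np.column_stack([x[idx - k] for k in range(M)]), u[idx - D]

# Case 1: train on the 0.9 linear channel, test on the 0.5 linear channel.
Xtr, ytr = make_data(0.9, 0.9, 500, seed=1)
Xte, yte = make_data(0.5, 0.9, 20000, seed=2)
svm = SVC(C=5, kernel="poly", degree=3, gamma=1.0, coef0=1.0).fit(Xtr, ytr)
print("BER on the mismatched channel:", np.mean(svm.predict(Xte) != yte))
```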
Offline Training
Generalization over different SNRs
 Training SNRs = 1 to 20 dB
 Testing SNRs = 1 to 20 dB
 Does not generalize well over different SNR values and multiple channels
SVM-Bank for Different SNR signals
[Block diagram: u(n) ∈ {±1} → NN{·} → x(n) → bank of SVMs trained at SNR₁, SNR₂, …, SNR_N; a noise-variance estimator selects which SVM output becomes û(n)]
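A minimal sketch of how such a bank could be organized (the class name, selection rule, and unit-signal-power assumption are mine, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

class SVMBank:
    """Bank of SVM equalizers, one per training SNR; a noise-variance estimate
    picks the closest model at detection time (selection rule assumed)."""

    def __init__(self, snrs_db, C=5, degree=3):
        self.snrs_db = np.asarray(snrs_db, dtype=float)
        self.models = [SVC(C=C, kernel="poly", degree=degree, gamma=1.0, coef0=1.0)
                       for _ in snrs_db]

    def fit(self, datasets):
        """datasets: list of (X, y) pairs, one generated at each training SNR."""
        for model, (X, y) in zip(self.models, datasets):
            model.fit(X, y)
        return self

    def predict(self, X, noise_var_estimate):
        """Choose the SVM whose training SNR best matches the estimated noise variance."""
        est_snr_db = -10.0 * np.log10(noise_var_estimate)   # unit signal power assumed
        k = int(np.argmin(np.abs(self.snrs_db - est_snr_db)))
        return self.models[k].predict(X)
```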
Results – BER
For Bank of SVM
Summary
We looked at the SVM as a dual Lagrangian optimization problem and at how it fits the nonlinear equalization problem.
We developed a nonlinear-channel communication system and applied an SVM equalizer to it.
We compared the BER performance of the SVM equalizer for different values of the detector delay (D) and for different SVM kernels.
Trained offline, the SVM equalizer does not generalize well to unknown channels and unknown SNRs.
To address the SNR issue, we proposed a bank of SVM models trained at different SNR values; after receiving the signal, a noise-variance estimator block selects the appropriate SVM model for equalization.
Thank You
Editor's Notes
  1. We look at the received signals. In the simplest form, using a two-tap channel filter, the symbols appear in a two-dimensional space. Using ground-truth values that we know, namely the transmitted symbols, we come up with a decision boundary so that we can predict future symbols based on that boundary.
  2. Volterra filters – a multiplicative structure creates cross-products of all filter states. These cross-products are then weighted linearly, and the problem is to find the optimum weighting that minimizes some cost. The dimension of the model grows quickly, and it becomes necessary to apply some heuristic to limit the model. Neural networks – an iterative, gradient-descent-like algorithm that is not guaranteed to find a global optimum and may converge to a local optimum. Neural networks are susceptible to overtraining, and the number of layers, the number of neurons per layer, and when to stop adapting must be determined in an ad hoc fashion.
  3. Support vectors – data points near the optimal hyperplane
  4. Soft margin – the margin when we allow for misclassifications. How do we choose the soft margin? Why should the margin lie between those particular blue and green circles? For that we look at the relationship between all data points. We use cross-validation to determine how many misclassified observations to allow inside the soft margin to get the best classification. If, for the validation data, the performance is best when working with the green and blue data points as shown in the lower figure, then we would allow one misclassification. Using a soft-margin classifier means using a support vector classifier. The data points (observations) on the edge of and within the soft margin are called support vectors.
  5. (Same note as 4.)
  6. There are different types of kernels that can be used. This paper uses a polynomial kernel. A polynomial kernel systematically increases the dimension by setting d, the degree of the polynomial, and the relationship between each pair of points is used to find a support vector classifier in that dimension. A good value of the degree is obtained by cross-validation.
  7. By differentiating the primal problem and equating the derivatives to zero, we get solutions for the weight vector w and the bias b; substituting them back into the primal gives the dual form, which no longer depends on w and b (a short derivation sketch follows these notes).
  8. What is a kernel, and what are the different options for it?
  9. M – equalizer dimension, D – lag, d – polynomial kernel order. The decision boundary is similar to the optimum and is logical in terms of the training data. The optimum for this example includes a disconnected region, but the SVM cannot match the polygonal nature of the optimum.
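As referenced in note 7, a standard derivation sketch of the primal-to-dual step (a reconstruction, not reproduced from the slides):

```latex
% Stationarity of the primal Lagrangian L_p with respect to w and b
\frac{\partial L_p}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{l} a_i y_i \mathbf{x}_i = 0
  \quad\Rightarrow\quad \mathbf{w} = \sum_{i=1}^{l} a_i y_i \mathbf{x}_i ,
\qquad
\frac{\partial L_p}{\partial b} = -\sum_{i=1}^{l} a_i y_i = 0 .
% Substituting these back into L_p eliminates w and b and leaves the dual
L_d(a) = \sum_{i=1}^{l} a_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} a_i a_j y_i y_j \,(\mathbf{x}_i \cdot \mathbf{x}_j) .
```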