Feature Selection in Machine Learning
Machine learning works on a simple rule: if you put garbage in, you will only get garbage out. By garbage here, I mean noise in the data.
This becomes even more important when the number of features is very large. You need
not use every feature at your disposal for creating an algorithm. You can assist your
algorithm by feeding in only those features that are really important.
This is useful not only in competitions but in industrial applications as well.
You not only reduce the training time and the evaluation time, you also have fewer things to
worry about!
Next, we’ll discuss various methodologies and techniques that you can use to subset your
feature space and help your models perform better and more efficiently. So, let’s get started.
Filter Methods
Filter methods are generally used as a preprocessing step. The selection of features is
independent of any machine learning algorithm. Instead, features are selected on the basis
of their scores in various statistical tests of their correlation with the outcome variable.
Correlation is a subjective term here: which test is appropriate depends on the types of the
feature and the response variable. For basic guidance, you can refer to the following table:

Feature type    Response type    Suggested measure
Continuous      Continuous       Pearson's Correlation
Categorical     Continuous       ANOVA
Pearson’s Correlation: It is used as a measure for quantifying the linear dependence
between two continuous variables X and Y. Its value varies from -1 to +1. Pearson’s
correlation is given as:

ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)
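As a quick illustration, here is a minimal sketch (on made-up toy data) of scoring features against a continuous target by their Pearson correlation, using scipy:

```python
import numpy as np
from scipy.stats import pearsonr

# Toy data: 100 samples, 3 features, and a continuous target
# that depends on the first two features only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# Score each feature by its Pearson correlation with the target.
for i in range(X.shape[1]):
    r, p = pearsonr(X[:, i], y)
    print(f"feature {i}: r = {r:+.3f}, p = {p:.3g}")
```

Features whose correlation is close to zero (with a large p-value) are candidates for removal.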
ANOVA: ANOVA stands for Analysis of Variance. It is similar to LDA (Linear Discriminant
Analysis) except that it operates on one or more categorical independent features and one
continuous dependent feature. It provides a statistical test of whether the means of
several groups are equal or not.
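In the same spirit, a one-way ANOVA F-test can score a continuous variable against a categorical grouping. A minimal sketch with three made-up groups, using scipy:

```python
import numpy as np
from scipy.stats import f_oneway

# Toy data: one continuous measurement observed under
# three levels of a categorical feature.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, size=40)
group_b = rng.normal(loc=0.5, size=40)
group_c = rng.normal(loc=1.0, size=40)

# One-way ANOVA: tests whether all group means are equal.
f_stat, p = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.3g}")  # small p => the means differ
```

(When selecting features for classification, scikit-learn's f_classif applies the same test with the class label as the grouping variable.)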
One thing that should be kept in mind is that filter methods do not remove multicollinearity.
So, you must also deal with multicollinearity among features before training models on your
data.
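One simple way to handle this is to drop one feature from every highly correlated pair before modeling. A sketch using pandas, with a purely illustrative threshold of 0.9:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from each pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```

The threshold is a judgment call and should be tuned to your data.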
Wrapper Methods
In wrapper methods, we try to use a subset of features and train a model using them. Based
on the inferences that we draw from the previous model, we decide to add or remove
features from the subset. The problem is essentially reduced to a search problem, and these
methods are usually computationally very expensive.
Some common examples of wrapper methods are forward feature selection, backward
feature elimination, recursive feature elimination, etc.
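For example, recursive feature elimination is available in scikit-learn as RFE, which repeatedly fits a model and discards the weakest features. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic classification data: 10 features, only 3 informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Recursively drop the lowest-ranked feature until 3 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=3)
rfe.fit(X, y)

print("selected mask:", rfe.support_)
print("feature ranking:", rfe.ranking_)  # 1 = selected
```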
One of the best ways to implement feature selection with wrapper methods is to use the
Boruta package, which finds the importance of a feature by creating shadow features. It
works as follows (a code sketch follows the steps below):
1. Firstly, it adds randomness to the given data set by creating shuffled copies of all
features (which are called shadow features).
2. Then, it trains a random forest classifier on the extended data set and applies a
feature importance measure (the default is Mean Decrease Accuracy) to evaluate
the importance of each feature, where higher means more important.
3. At every iteration, it checks whether a real feature has a higher importance than the
best of its shadow features (i.e. whether the feature has a higher Z-score than the
maximum Z-score of its shadow features) and constantly removes features which
are deemed highly unimportant.
4. Finally, the algorithm stops either when all features get confirmed or rejected or it
reaches a specified limit of random forest runs.
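Here is a sketch in Python, assuming the BorutaPy port of the R Boruta package is installed (pip install Boruta); note that BorutaPy ranks features with the random forest's built-in importance rather than Mean Decrease Accuracy:

```python
import numpy as np
from boruta import BorutaPy  # Python port of the R Boruta package
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 features, only 3 carry real signal.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=0)
boruta = BorutaPy(rf, n_estimators='auto', random_state=0)
boruta.fit(X, y)  # BorutaPy expects numpy arrays, not DataFrames

print("confirmed:", np.where(boruta.support_)[0])
print("tentative:", np.where(boruta.support_weak_)[0])
```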
Embedded Methods
Embedded methods combine the qualities of filter and wrapper methods. They are
implemented by algorithms that have their own built-in feature selection methods.
Some of the most popular examples of these methods are LASSO and Ridge regression,
which have built-in penalization functions to reduce overfitting. LASSO's L1 penalty, in
particular, can shrink coefficients exactly to zero, which effectively selects features.
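A sketch of LASSO-based selection with scikit-learn, using SelectFromModel to keep the features whose coefficients survive the L1 penalty:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Synthetic regression data: 10 features, 3 informative.
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=1.0, random_state=0)

# LassoCV picks the penalty strength by cross-validation; the L1
# penalty drives uninformative coefficients to exactly zero, and
# SelectFromModel keeps the features with non-negligible weights.
selector = SelectFromModel(LassoCV(cv=5, random_state=0)).fit(X, y)

print("selected features:", np.where(selector.get_support())[0])
```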
To summarize, here are the key differences between filter and wrapper methods:

- Filter methods measure the relevance of features by their correlation with the
dependent variable, while wrapper methods measure the usefulness of a subset of
features by actually training a model on it.
- Filter methods are much faster than wrapper methods, as they do not involve
training models. Wrapper methods, on the other hand, are computationally very
expensive.
- Filter methods use statistical tests to evaluate features, while wrapper methods
use cross-validation.
- Filter methods may fail to find the best subset of features on many occasions,
whereas wrapper methods, which search the subset space directly, can usually find a
better-performing one.
- Using the subset of features from wrapper methods makes the model more prone
to overfitting than using the subset of features from filter methods.