Feature Selection in PR
Feature selection is a way of selecting the subset of the most
relevant features from the original feature set by removing the
redundant, irrelevant, or noisy features.
While developing a machine learning model, only a few of the variables in the dataset are
useful for building it; the rest of the features are either redundant or irrelevant.
If we feed the dataset with all these redundant and irrelevant features into the model, it may
negatively impact the model's overall performance and accuracy.
Hence, it is very important to identify and select the most appropriate features from the
data and remove the irrelevant or less important ones, which is done with the help
of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning and highly
impacts the performance of the model. Since machine learning works on the principle of
"Garbage In, Garbage Out", we always need to feed the most appropriate and
relevant dataset to the model in order to get a better result.
So, we can define feature selection as "the process of automatically or manually
selecting the subset of the most appropriate and relevant features to be used in
model building." Feature selection is performed by either including the important
features or excluding the irrelevant features from the dataset, without changing the features themselves.
1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem in which
different combinations of features are made, evaluated, and compared with one another.
The algorithm is trained iteratively on subsets of features: on the basis of the model's
output, features are added or removed, and the model is trained again with the new
feature set.
Some techniques of wrapper methods, illustrated in the sketch after this list, are:
o Forward selection - Forward selection is an iterative process that begins
with an empty set of features. In each iteration, it adds one more feature
and evaluates whether the performance improves. The process continues
until adding a new variable/feature no longer improves the performance of
the model.
o Backward elimination - Backward elimination is also an iterative approach,
but it works in the opposite direction to forward selection. It begins with
all the features and removes the least significant feature in each iteration.
The elimination process continues until removing a feature no longer improves
the performance of the model.
o Exhaustive Feature Selection - Exhaustive feature selection evaluates every
possible feature set by brute force. It tries every possible combination of
features and returns the best-performing feature set, which makes it thorough
but computationally expensive.
o Recursive Feature Elimination -
Recursive feature elimination is a greedy optimization approach in which
features are selected by recursively considering smaller and smaller subsets
of features. An estimator is trained on each subset, and the importance of
each feature is determined from the estimator's coef_ or
feature_importances_ attribute.
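Below is a minimal scikit-learn sketch of three of the wrapper techniques above, assuming a generic classification dataset; the breast-cancer dataset, the logistic-regression estimator, and the choice of keeping 10 features are illustrative assumptions, not part of the methods themselves. Exhaustive feature selection is omitted because it would train a model for every possible feature combination.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # scale so the estimator converges
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start from an empty set and greedily add features.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=10, direction="forward", cv=5
).fit(X, y)

# Backward elimination: start from all features and greedily drop the weakest.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=10, direction="backward", cv=5
).fit(X, y)

# Recursive feature elimination: rank features by coef_ and prune iteratively.
rfe = RFE(estimator, n_features_to_select=10).fit(X, y)

print("Forward selection   :", forward.get_support(indices=True))
print("Backward elimination:", backward.get_support(indices=True))
print("RFE                 :", rfe.get_support(indices=True))

Because every candidate subset is evaluated by retraining the model, the cost of wrapper methods grows quickly with the number of features, which is the usual reason to prefer filter methods on very wide datasets.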
2. Filter Methods
In the filter method, features are selected on the basis of statistical measures. This method
does not depend on the learning algorithm; it chooses the features as a pre-
processing step.
The filter method filters out the irrelevant features and redundant columns from the
model by ranking them with different metrics.
The advantage of filter methods is that they need little computational time and
do not overfit the data.
Some common techniques of filter methods, illustrated in the sketch after this list, are as follows:
o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio
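A minimal sketch of the filter measures above, assuming a small dataset with non-negative features (the chi-square test requires them); the iris dataset, the value of k, and the 60% missing-value cutoff are illustrative assumptions. Fisher's score is not shown because scikit-learn has no built-in selector for it.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Chi-square test: scores the dependence between each feature and the class.
chi2_selector = SelectKBest(chi2, k=2).fit(X, y)
print("Chi-square keeps:", chi2_selector.get_support(indices=True))

# Information gain: mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)
print("Mutual information per feature:", np.round(mi, 3))

# Missing value ratio: drop columns whose fraction of NaNs exceeds a threshold.
X_missing = X.copy()
X_missing[::2, 0] = np.nan                     # inject NaNs for illustration
missing_ratio = np.isnan(X_missing).mean(axis=0)
print("Columns kept by missing-value ratio:", np.where(missing_ratio < 0.6)[0])

Note that these scores are computed once, before any model is trained, which is why filter methods are cheap but cannot account for interactions between features.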
3. Embedded Methods
Embedded methods are also iterative: they evaluate each training iteration and optimally
find the features that contribute most to that round of training.
Some techniques of embedded methods, illustrated in the sketch after this list, are:
o Regularization - Regularization adds a penalty term to the parameters of
the machine learning model to avoid overfitting. The penalty is applied to the
coefficients, and with an L1 penalty it shrinks some coefficients exactly to zero;
the features with zero coefficients can then be removed from the dataset. The
regularization techniques used for this purpose are L1 regularization (Lasso
regularization) and Elastic Net (a combination of L1 and L2 regularization).
o Random Forest Importance - Tree-based methods provide feature-importance
scores that give us a direct way of selecting features. Here, feature
importance indicates which features matter most for model building or have
the greatest impact on the target variable.
Random Forest is one such tree-based method: a bagging algorithm that
aggregates a number of decision trees. It automatically ranks the nodes by
their performance, i.e., the decrease in impurity (Gini impurity) over all
the trees. Nodes are arranged by their impurity values, which allows the
trees to be pruned below a chosen node; the remaining nodes correspond to a
subset of the most important features.
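A minimal sketch of the two embedded techniques above, again assuming a generic classification dataset; the Lasso penalty strength, the standardization step, and the "keep the top half" cutoff on the forest importances are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# L1 regularization (Lasso) shrinks some coefficients exactly to zero;
# SelectFromModel keeps only the features with non-zero coefficients.
# Features are standardized first so the penalty treats them comparably.
X_scaled = StandardScaler().fit_transform(X)
lasso_selector = SelectFromModel(Lasso(alpha=0.01)).fit(X_scaled, y)
print("Lasso keeps:", lasso_selector.get_support(indices=True))

# Random forest importance: mean decrease in Gini impurity across all trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_
top_half = np.sort(np.argsort(importances)[::-1][: X.shape[1] // 2])
print("Top random-forest features:", top_half)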
To choose an appropriate feature selection technique, and in particular the right
statistical measure for a filter method, we first need to identify the types of the
input and output variables (see the sketch after this list). In machine learning,
variables are mainly of two types:
o Numerical Variables: variables with continuous values, such as integers or floats.
o Categorical Variables: variables with categorical values, such as Boolean,
ordinal, or nominal variables.
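A minimal pandas sketch, using a small hypothetical DataFrame (the column names are made up for illustration), of separating numerical from categorical columns before choosing a statistical measure.

import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],                  # numerical (integer)
    "income": [40.5, 55.0, 72.3],         # numerical (float)
    "owns_car": [True, False, True],      # categorical (Boolean)
    "city": ["Pune", "Delhi", "Pune"],    # categorical (nominal)
})

numerical = df.select_dtypes(include=["number"]).columns.tolist()
categorical = df.select_dtypes(exclude=["number"]).columns.tolist()
print("Numerical  :", numerical)    # ['age', 'income']
print("Categorical:", categorical)  # ['owns_car', 'city']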