Feature Selection in PR
Feature selection is a way of selecting the subset of the most
relevant features from the original feature set by removing the
redundant, irrelevant, or noisy features.
While developing a machine learning model, only a few of the variables in the dataset are
useful for building it; the rest of the features are either redundant or irrelevant.
If we feed the dataset with all these redundant and irrelevant features into the model, it may
negatively impact the model's overall performance and accuracy.
Hence, it is very important to identify and select the most appropriate features from the
data and remove the irrelevant or less important ones, which is done with the help
of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning and highly
impacts the performance of the model. Since machine learning works on the principle of
"Garbage In, Garbage Out", we always need to feed the most appropriate and
relevant dataset to the model in order to get a better result.
So, we can define feature selection as "the process of automatically or manually
selecting the subset of the most appropriate and relevant features to be used in
model building." Feature selection is performed by either including the important
features or excluding the irrelevant features from the dataset, without changing the features themselves.
1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem in which
different combinations of features are made, evaluated, and compared with one another.
The algorithm is trained iteratively on subsets of features: on the basis of the model's
output, features are added or removed, and the model is trained again with the new
feature set.
Some techniques of wrapper methods, illustrated in the sketch after this list, are:
o Forward selection - Forward selection is an iterative process that begins
with an empty set of features. In each iteration, it adds one more feature
and evaluates whether the performance improves. The process continues
until adding a new variable/feature no longer improves the performance of
the model.
o Backward elimination - Backward elimination is also an iterative approach,
but it works in the opposite direction to forward selection. It begins with
all the features and removes the least significant feature in each iteration.
The elimination process continues until removing a feature no longer improves
the performance of the model.
o Exhaustive Feature Selection - Exhaustive feature selection evaluates every
possible feature set by brute force. It tries every possible combination of
features and returns the best-performing feature set, which makes it thorough
but computationally expensive.
o Recursive Feature Elimination -
Recursive feature elimination is a greedy optimization approach in which
features are selected by recursively considering smaller and smaller subsets
of features. An estimator is trained on each subset, and the importance of
each feature is determined from the estimator's coef_ or
feature_importances_ attribute.
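Below is a minimal scikit-learn sketch of three of the wrapper techniques above, assuming a generic classification dataset; the breast-cancer dataset, the logistic-regression estimator, and the choice of keeping 10 features are illustrative assumptions, not part of the methods themselves. Exhaustive feature selection is omitted because it would train a model for every possible feature combination.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # scale so the estimator converges
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start from an empty set and greedily add features.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=10, direction="forward", cv=5
).fit(X, y)

# Backward elimination: start from all features and greedily drop the weakest.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=10, direction="backward", cv=5
).fit(X, y)

# Recursive feature elimination: rank features by coef_ and prune iteratively.
rfe = RFE(estimator, n_features_to_select=10).fit(X, y)

print("Forward selection   :", forward.get_support(indices=True))
print("Backward elimination:", backward.get_support(indices=True))
print("RFE                 :", rfe.get_support(indices=True))

Because every candidate subset is evaluated by retraining the model, the cost of wrapper methods grows quickly with the number of features, which is the usual reason to prefer filter methods on very wide datasets.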
2. Filter Methods
In the filter method, features are selected on the basis of statistical measures. This method
does not depend on the learning algorithm; it chooses the features as a pre-
processing step.
The filter method filters out the irrelevant features and redundant columns from the
model by ranking them with different metrics.
The advantage of filter methods is that they need little computational time and
do not overfit the data.
Some common techniques of filter methods, illustrated in the sketch after this list, are as follows:
o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio
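A minimal sketch of the filter measures above, assuming a small dataset with non-negative features (the chi-square test requires them); the iris dataset, the value of k, and the 60% missing-value cutoff are illustrative assumptions. Fisher's score is not shown because scikit-learn has no built-in selector for it.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Chi-square test: scores the dependence between each feature and the class.
chi2_selector = SelectKBest(chi2, k=2).fit(X, y)
print("Chi-square keeps:", chi2_selector.get_support(indices=True))

# Information gain: mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)
print("Mutual information per feature:", np.round(mi, 3))

# Missing value ratio: drop columns whose fraction of NaNs exceeds a threshold.
X_missing = X.copy()
X_missing[::2, 0] = np.nan                     # inject NaNs for illustration
missing_ratio = np.isnan(X_missing).mean(axis=0)
print("Columns kept by missing-value ratio:", np.where(missing_ratio < 0.6)[0])

Note that these scores are computed once, before any model is trained, which is why filter methods are cheap but cannot account for interactions between features.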
3. Embedded Methods
Embedded methods are also iterative: they evaluate each training iteration and optimally
find the features that contribute most to that round of training.
Some techniques of embedded methods, illustrated in the sketch after this list, are:
o Regularization - Regularization adds a penalty term to the parameters of
the machine learning model to avoid overfitting. The penalty is applied to the
coefficients, and with an L1 penalty it shrinks some coefficients exactly to zero;
the features with zero coefficients can then be removed from the dataset. The
regularization techniques used for this purpose are L1 regularization (Lasso
regularization) and Elastic Net (a combination of L1 and L2 regularization).
o Random Forest Importance - Tree-based methods provide feature-importance
scores that give us a direct way of selecting features. Here, feature
importance indicates which features matter most for model building or have
the greatest impact on the target variable.
Random Forest is one such tree-based method: a bagging algorithm that
aggregates a number of decision trees. It automatically ranks the nodes by
their performance, i.e., the decrease in impurity (Gini impurity) over all
the trees. Nodes are arranged by their impurity values, which allows the
trees to be pruned below a chosen node; the remaining nodes correspond to a
subset of the most important features.
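A minimal sketch of the two embedded techniques above, again assuming a generic classification dataset; the Lasso penalty strength, the standardization step, and the "keep the top half" cutoff on the forest importances are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# L1 regularization (Lasso) shrinks some coefficients exactly to zero;
# SelectFromModel keeps only the features with non-zero coefficients.
# Features are standardized first so the penalty treats them comparably.
X_scaled = StandardScaler().fit_transform(X)
lasso_selector = SelectFromModel(Lasso(alpha=0.01)).fit(X_scaled, y)
print("Lasso keeps:", lasso_selector.get_support(indices=True))

# Random forest importance: mean decrease in Gini impurity across all trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_
top_half = np.sort(np.argsort(importances)[::-1][: X.shape[1] // 2])
print("Top random-forest features:", top_half)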
To choose an appropriate feature selection technique, and in particular the right
statistical measure for a filter method, we first need to identify the types of the
input and output variables (see the sketch after this list). In machine learning,
variables are mainly of two types:
o Numerical Variables: variables with continuous values, such as integers or floats.
o Categorical Variables: variables with categorical values, such as Boolean,
ordinal, or nominal variables.
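A minimal pandas sketch, using a small hypothetical DataFrame (the column names are made up for illustration), of separating numerical from categorical columns before choosing a statistical measure.

import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],                  # numerical (integer)
    "income": [40.5, 55.0, 72.3],         # numerical (float)
    "owns_car": [True, False, True],      # categorical (Boolean)
    "city": ["Pune", "Delhi", "Pune"],    # categorical (nominal)
})

numerical = df.select_dtypes(include=["number"]).columns.tolist()
categorical = df.select_dtypes(exclude=["number"]).columns.tolist()
print("Numerical  :", numerical)    # ['age', 'income']
print("Categorical:", categorical)  # ['owns_car', 'city']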