Summary Chap 1 & 2
Feature engineering is the process of formulating the most appropriate features given the data, the
model, and the task.
The number of features is also important:
• If there are not enough informative features, then the model will be unable to perform the
ultimate task.
• If there are too many features, or if most of them are irrelevant, then the model will be more
expensive and tricky to train. Something might go awry in the training process that impacts
the model’s performance.
In a machine learning workflow, we pick not only the model, but also the features.
• Good features make the subsequent modeling step easy and the resulting model more capable of
completing the desired task.
• Bad features may require a much more complicated model to achieve the same level of
performance.
Min-Max normalization: X' = (X − X_min) / (X_max − X_min)
Case 1: If X is the minimum value, the numerator is 0, so the normalized value is also 0.
Case 2: If X is the maximum value, the numerator equals the denominator, so the normalized value is 1.
Case 3: If X is neither the maximum nor the minimum, the normalized value lies between 0 and 1.
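As a minimal sketch (assuming NumPy and an illustrative array of values), Min-Max scaling can be computed directly from this formula:

```python
import numpy as np

def min_max_scale(x):
    # Min-Max normalization: the minimum maps to 0, the maximum maps to 1.
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 20.0, 35.0, 50.0])  # illustrative values only
print(min_max_scale(x))                  # [0.    0.25  0.625 1.   ]
```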
Standardization scaling:
x' = (x − μ) / σ, where μ is the mean of the feature values and σ is their standard deviation.
However, unlike the Min-Max scaling technique, standardization does not restrict feature values to a
specific range.
This technique is helpful for machine learning algorithms that use distance measures, such as KNN,
K-means clustering, and principal component analysis (PCA).
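A matching sketch of standardization on the same illustrative values (after standardization the values have mean 0 and standard deviation 1):

```python
import numpy as np

def standardize(x):
    # Z-score standardization: subtract the mean, divide by the standard deviation.
    return (x - x.mean()) / x.std()

x = np.array([10.0, 20.0, 35.0, 50.0])  # illustrative values only
z = standardize(x)
print(round(z.mean(), 6), round(z.std(), 6))  # ~0.0 and 1.0
```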
Important: written
Normalization vs. Standardization:
• Normalization scales values into a range such as [0, 1] or [-1, 1]; standardization does not restrict values to a specific range.
• Normalization is affected by outliers; standardization is comparatively less affected by outliers.
• Normalization is also called scaling normalization; standardization is known as Z-score normalization.
• Normalization is useful when the feature distribution is unknown; standardization is useful when the feature distribution is normal.
Normalization: a transformation technique that helps to improve the performance and accuracy of your
model. Normalization is useful when you do not know the feature distribution exactly, in other words,
when the data does not follow a Gaussian (bell-curve) distribution.
Normalization maps values into a bounded range, so if there are outliers in the data, the normalized
values will be affected by them.
It is also useful for algorithms that are sensitive to the scale of the features, such as KNN and
artificial neural networks.
Standardization: useful when your data follows a Gaussian distribution; however, this does not have to
be strictly true.
Standardization does not have a bounded range, so if there are outliers in the data, standardization is
not strongly affected by them.
It is also useful when features have varying scales and the algorithm assumes a Gaussian distribution,
such as linear regression, logistic regression, and linear discriminant analysis.
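The contrast in outlier sensitivity can be seen in a small sketch (the values are made up; the last one is a deliberate outlier):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up values; 100.0 is the outlier

min_max = (x - x.min()) / (x.max() - x.min())
z_score = (x - x.mean()) / x.std()

print(min_max)  # ~[0.    0.01  0.02  0.03  1.  ]: non-outlier values are squeezed near 0
print(z_score)  # ~[-0.54 -0.51 -0.49 -0.46 2.0 ]: no fixed range; the outlier sits ~2 std above the mean
```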
Feature Selection (see the sketch after this list):
• Filtering: filtering techniques preprocess features to remove ones that are unlikely to be useful
for the model.
• Wrapper methods: these techniques are expensive, but they allow you to try out subsets of
features.
• Embedded methods: these methods perform feature selection as part of the model training
process.
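As a hedged sketch of what each family can look like in practice (assuming scikit-learn and its built-in breast-cancer dataset; the specific estimators shown are just one possible choice per family, not the only ones):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filtering: score each feature independently of any model, keep the top k.
X_filter = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# Wrapper: repeatedly fit a model on candidate feature subsets (expensive, but model-aware).
X_wrapper = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit_transform(X, y)

# Embedded: let the model's own coefficients drive the selection during training (L1 penalty).
X_embedded = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
).fit_transform(X, y)

print(X.shape, X_filter.shape, X_wrapper.shape, X_embedded.shape)
```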