
Machine Learning

UNIT - 2
Basics of Data Preprocessing and Feature Engineering
Feature Transformation, Feature scaling, Feature Construction, and Feature subset
selection, Dimensionality reduction, Exploratory data analysis, Hyperparameter
tuning, Introduction to SK learn package

2.1. Feature Transformation:


Feature transformation techniques in machine learning are used to change data
from one form to another while preserving its underlying meaning. In simpler
terms, these techniques are functions applied to data that is not normally
distributed; after applying them, the data is more likely to be normally
distributed. There are three types of feature transformation techniques:
Function Transformers, Power Transformers, and Quantile Transformers.

1) Function Transformers:
Function transformers are a type of feature transformation technique that
applies specific mathematical functions to the data observations so that they
follow a distribution closer to normal.

There is no strict rule for choosing the right function transformer; the choice
often depends on domain knowledge of the data. However, five common function
transformers are generally effective at making data more normally distributed:
Log Transform, Square Transform, Square Root Transform, Reciprocal Transform,
and Custom Transform.

a) Log Transform:
Log transformation is a simple technique where you apply the logarithm to
each value in the data. This transformed data is then used for machine
learning algorithms.

b) Square Transform:
Square transformation is a technique where you apply the square function
to each data value. In simple terms, this means you replace each value
with its square, and then use these squared values as the final
transformed data for your machine learning algorithms.

c) Square Root Transform:
In square root transformation, you calculate the square root of each data
value. This technique works especially well for right-skewed data,
effectively turning it into data that is more normally distributed.

d) Reciprocal Transform:
In reciprocal transformation, you take the reciprocal (or inverse) of each
data value, which means you replace each value with 1/x. This method
can be useful for some datasets, as applying this transformation can help
achieve a more normal distribution.


e) Custom Transform:
Log and square root transformations might not work for every dataset
because each dataset can have unique patterns and complexities. Instead,
you can apply custom transformations based on your understanding of the
data. These custom transformations can include any function or operation,
such as sine, cosine, tangent, or cubing, to help transform the data into a
normal distribution.
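As a hedged illustration of these transforms, the sketch below applies them to a small, made-up NumPy array of positive values (the array x and the choice of log(1 + x) as the custom function are assumptions for demonstration only):

import numpy as np
from sklearn.preprocessing import FunctionTransformer

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 100.0])    # illustrative right-skewed data

log_x        = np.log(x)       # log transform (requires x > 0)
square_x     = np.square(x)    # square transform
sqrt_x       = np.sqrt(x)      # square root transform (requires x >= 0)
reciprocal_x = 1.0 / x         # reciprocal transform (requires x != 0)

# A custom transform can be wrapped in scikit-learn's FunctionTransformer
custom   = FunctionTransformer(np.log1p)              # log(1 + x) chosen as an example
custom_x = custom.fit_transform(x.reshape(-1, 1))     # scikit-learn expects a 2-D array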

2) Power Transformers:
Power transformation techniques apply a mathematical power to data
observations to transform the data. There are two main types of power
transformation techniques:

a) Box-Cox Transform:
This technique is used to stabilize variance and make the data more
normally distributed. It requires the data to be positive, and it finds the
best power transformation to apply.

b) Yeo-Johnson Transform:
This technique extends the Box-Cox transform to handle both positive and
negative data. It also aims to stabilize variance and make the data more
normally distributed.
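A minimal sketch of both power transforms using scikit-learn's PowerTransformer; the small arrays below are illustrative only:

import numpy as np
from sklearn.preprocessing import PowerTransformer

X_pos = np.array([[1.0], [2.0], [5.0], [10.0], [50.0]])    # strictly positive data

# Box-Cox: valid only for positive values
X_bc = PowerTransformer(method='box-cox').fit_transform(X_pos)

# Yeo-Johnson (the default): also handles zero and negative values
X_mixed = np.array([[-3.0], [0.0], [2.0], [7.0]])
X_yj = PowerTransformer(method='yeo-johnson').fit_transform(X_mixed)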

3) Quantile Transformers:
Quantile transformation techniques are used to transform numerical data
observations. This technique can be implemented using libraries like scikit-
learn.

In this transformation, the input data is mapped, based on its quantiles, onto
a target distribution (typically a normal distribution). This makes the data
more suitable for many machine learning algorithms.
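A brief sketch using scikit-learn's QuantileTransformer, assuming a small randomly generated, skewed input (n_quantiles is lowered to match the sample size):

import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.random.exponential(size=(100, 1))     # illustrative skewed data

qt = QuantileTransformer(output_distribution='normal', n_quantiles=100)
X_normal = qt.fit_transform(X)               # output approximately follows a normal distribution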

2.2. Feature scaling:


Feature scaling is a crucial preprocessing step in machine learning that involves
adjusting the range of independent variables (features) in your dataset. It is
used to standardize or normalize features so that they are on a comparable
scale, improving the performance of many machine learning algorithms.

Why is Feature Scaling Important?


In machine learning, certain algorithms are sensitive to the scale of input data.
For instance, algorithms that rely on distance metrics (such as k-nearest
neighbors or support vector machines) can be significantly affected by the
magnitude of features. If one feature has a large range (e.g., income in
thousands), while another has a small range (e.g., age in years), the algorithm
might give more importance to the feature with the larger range, skewing the
results.

Feature scaling solves this by bringing all features to a similar scale, ensuring
that no single feature dominates the learning process.

Types of Feature Scaling


There are several methods to scale features; the most common are:

• Min-Max Scaling (Normalization): rescales each feature to a fixed range, usually [0, 1].
• Standardization (Z-score Scaling): rescales each feature to have a mean of zero and a standard deviation of one.
• Robust Scaling: uses the median and interquartile range, making it less sensitive to outliers.
• Max-Abs Scaling: divides each feature by its maximum absolute value, preserving sparsity.
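A minimal sketch of the first two methods using scikit-learn; the small age/income matrix is an illustrative assumption:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[25.0,  50000.0],
              [32.0,  64000.0],
              [47.0, 120000.0]])                # e.g. age and income columns

X_std    = StandardScaler().fit_transform(X)    # mean 0, standard deviation 1 per column
X_minmax = MinMaxScaler().fit_transform(X)      # each column rescaled to [0, 1]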


2.3. Feature Construction:


Feature construction is the process of creating new features or variables from
existing data that can be used to improve the performance of machine learning
models. Features are the input variables used by machine learning algorithms to
make predictions, and feature construction involves transforming the raw data
into a more meaningful representation that captures the underlying patterns and
relationships in the data.

Here are some common techniques for feature construction:


• Feature Extraction: This involves selecting a subset of the original features
or transforming them into a different representation.

• Feature Scaling: This involves scaling the values of the features to a
specific range or standardizing them to have a mean of zero and a standard
deviation of one.

• Feature Engineering: This involves creating new features from existing ones.

• Feature Selection: This involves selecting a subset of the original features
that are most relevant to the target variable.
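As a hedged example of feature construction, the sketch below derives new columns from existing ones with pandas; the column names and values are hypothetical:

import pandas as pd

df = pd.DataFrame({
    'height_m':  [1.60, 1.75, 1.82],
    'weight_kg': [55.0, 72.0, 90.0],
    'date':      pd.to_datetime(['2024-01-05', '2024-06-20', '2024-11-30']),
})

# Combine existing columns into a more informative feature
df['bmi'] = df['weight_kg'] / df['height_m'] ** 2

# Decompose a raw datetime column into simpler, model-friendly features
df['month']       = df['date'].dt.month
df['day_of_week'] = df['date'].dt.dayofweek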


Here are some of the main advantages of feature construction:


• Improved model accuracy: By creating new features that capture the
underlying patterns and relationships in the data, feature construction can
help to improve the accuracy of machine learning models.

• Reduced overfitting: Feature construction can help to reduce overfitting,
which occurs when a model fits the training data too closely and fails to
generalize to new data. By creating more robust and representative features,
feature construction can help to reduce the complexity of the model and
prevent overfitting.

• Faster model training: By reducing the number of features or transforming
them into a more meaningful representation, feature construction can help to
speed up the training process and make it more efficient.

• Increased interpretability: Feature construction can also help to increase
the interpretability of machine learning models by creating features that are
more easily understood and explainable by humans.

• Better generalization: By capturing the underlying patterns and
relationships in the data, feature construction can help to improve the
generalization performance of machine learning models, enabling them to
perform well on new and unseen data.

2.4. Feature subset selection:


Feature subset selection is an important process in machine learning where a
subset of relevant features (or variables) is selected from the original feature set
to improve model performance and reduce computational complexity. The
primary goal is to select the most relevant features that contribute to predicting
the target variable, while discarding redundant or irrelevant ones. This improves
the model’s accuracy, reduces overfitting, and speeds up the learning process.

Importance:
• Dimensionality Reduction: Many machine learning datasets have a high
number of features, but not all features are useful. Reducing the number of
features can simplify the model.

• Improved Model Generalization: By selecting relevant features, the model
becomes less prone to overfitting, improving its performance on unseen data.

• Better Interpretability: A smaller set of features makes the model easier
to understand and interpret.

1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem:
different combinations of features are generated, evaluated, and compared
with one another. The learning algorithm is trained iteratively on each
candidate subset of features. Based on the model's output, features are
added or removed, and the model is trained again with the updated feature set.
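A hedged sketch of a wrapper approach using recursive feature elimination (RFE) from scikit-learn; the synthetic dataset is illustrative only:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# RFE repeatedly trains the estimator and drops the weakest features
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)
print(selector.support_)        # boolean mask of the selected features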


2. Filter Methods
In filter methods, features are selected on the basis of statistical
measures. This method does not depend on the learning algorithm and chooses
the features as a pre-processing step. The filter method removes irrelevant
features and redundant columns from the model by ranking them with different
metrics. The advantage of filter methods is that they require little
computational time and do not overfit the data.
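A minimal sketch of a filter approach, ranking features with the ANOVA F-test via SelectKBest; the synthetic data is illustrative only:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=4)   # keep the 4 highest-scoring features
X_selected = selector.fit_transform(X, y)
print(selector.scores_)                             # per-feature F-scores used for ranking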

3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods
by considering interactions between features while keeping computational
cost low. They are fast, like filter methods, but generally more accurate.
These methods are also iterative: each training iteration is evaluated, and
the features that contribute most to that iteration are retained as the most
important ones.
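A brief sketch of an embedded approach, where L1-regularized (Lasso) coefficients drive the selection inside SelectFromModel; the synthetic regression data is illustrative only:

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Features whose Lasso coefficients shrink to zero are discarded
selector = SelectFromModel(Lasso(alpha=1.0))
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)         # only the surviving features remain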


2.5. Dimensionality reduction:


Dimensionality reduction is a technique used to reduce the number of features in
a dataset while retaining as much of the important information as possible. In
other words, it is a process of transforming high-dimensional data into a lower-
dimensional space that still preserves the essence of the original data. This is
often done to:
• Reduce computational cost: High-dimensional data can be computationally
expensive to process. By reducing dimensionality, we can significantly speed
up algorithms.

• Improve model performance: In some cases, high-dimensional data can
lead to overfitting. Reducing dimensionality can help prevent overfitting and
improve generalization.

• Visualize data: It is easier to visualize data in two or three dimensions.
Dimensionality reduction can help us understand the underlying structure of
high-dimensional data.

Common Dimensionality Reduction Techniques


• Principal Component Analysis (PCA): Finds a new set of uncorrelated
variables (principal components) that capture the most variance in the data.
The first few principal components often contain most of the information.

• Linear Discriminant Analysis (LDA): Like PCA, but specifically designed to
maximize class separation. Useful for classification tasks.

• t-SNE: A non-linear technique that maps high-dimensional data to a
lower-dimensional space while preserving local structure. Good for
visualizing high-dimensional data.


• Autoencoders: Neural networks trained to reconstruct their input data. The
latent representation learned by the autoencoder can be used as a
reduced-dimensional representation.

• Factor Analysis: Assumes that observed variables are linear combinations
of a smaller set of latent variables. Like PCA, but with a probabilistic
interpretation.
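As an example of the first technique above, here is a brief PCA sketch on the Iris dataset, reducing four features to two with scikit-learn:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # variance captured by each component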

2.6. Exploratory data analysis:


Exploratory Data Analysis (EDA) is a crucial initial step in data science projects.
It involves analyzing and visualizing data to understand its key characteristics,
uncover patterns, locate outliers, and identify relationships between variables.
EDA is normally carried out as a preliminary step before undertaking more formal
statistical analyses or modeling.

Key aspects of EDA include:


• Distribution of Data: Examining the distribution of data points to
understand their range, central tendencies (mean, median), and dispersion
(variance, standard deviation).

• Graphical Representations: Utilizing charts such as histograms, box plots,
scatter plots, and bar charts to visualize relationships within the data and
distributions of variables.

• Outlier Detection: Identifying unusual values that deviate from other data
points. Outliers can influence statistical analyses and might indicate data
entry errors or unique cases.

• Correlation Analysis: Checking the relationships between variables to
understand how they might affect each other. This includes computing
correlation coefficients and creating correlation matrices.


• Handling Missing Values: Detecting and deciding how to address missing
data points, whether by imputation or removal, depending on their impact
and the amount of missing data.

• Summary Statistics: Calculating key statistics that provide insight into data
trends and nuances.

• Testing Assumptions: Many statistical tests and models assume the data
meet certain conditions (like normality or homoscedasticity). EDA helps verify
these assumptions.
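A short, illustrative EDA sketch with pandas and matplotlib covering several of the aspects above; 'data.csv' is a placeholder file name, not a dataset referenced in these notes:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')            # hypothetical dataset

print(df.describe())                    # summary statistics (mean, std, quartiles)
print(df.isnull().sum())                # missing values per column
print(df.corr(numeric_only=True))       # correlation matrix of numeric columns

df.hist(figsize=(10, 8))                # distributions of numeric features
df.plot.box()                           # quick visual check for outliers
plt.show()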

2.7. Hyperparameter tuning:


A Machine Learning model is defined as a mathematical model with several
parameters that need to be learned from the data. By training a model with
existing data, we can fit the model parameters.
However, there is another kind of parameter, known as hyperparameters, that
cannot be learned directly from the regular training process. They are usually
fixed before the actual training process begins. These parameters express
important properties of the model, such as its complexity or how fast it should
learn. This section explores various strategies for tuning the hyperparameters
of machine learning models.

Steps to Perform Hyperparameter Tuning (see the sketch after this list):


• Select the right type of model.
• Review the list of hyperparameters of the model and build the hyperparameter space.
• Choose a method for searching the hyperparameter space (e.g., grid search or random search).
• Apply a cross-validation scheme.
• Assess the model score to evaluate the model.
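A hedged sketch of these steps using scikit-learn, tuning a small, illustrative hyperparameter space for a support vector classifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Grid search: evaluates every combination with 5-fold cross-validation
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# Random search: samples a fixed number of combinations instead
rand = RandomizedSearchCV(SVC(), param_grid, n_iter=4, cv=5, random_state=0)
rand.fit(X, y)
print(rand.best_params_, rand.best_score_)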


Common Hyperparameters in Machine Learning:


• Learning Rate: Controls how much to change the model in response to the
estimated error at each update step.

• Batch Size: Number of training samples to work through before updating the
model’s parameters.

• Regularization Parameters: Parameters like L1 and L2 regularization help in
preventing overfitting by penalizing complex models.

• Dropout Rate: Used in neural networks to randomly ignore a fraction of
neurons during training to prevent overfitting.

• Number of Hidden Layers and Units: In neural networks, the architecture
can have multiple layers and neurons per layer, affecting the capacity to learn
complex representations.

• Tree Depth (in Decision Trees/Random Forests): Limits the depth of the
trees to prevent overfitting.

• Number of Neighbors (in k-NN): Determines how many neighbors to
consider for classification or regression tasks.

2.8. Introduction to SK learn package:


Scikit-learn (sklearn) is one of the most useful and robust libraries for machine
learning in Python. It provides a selection of efficient tools for machine learning
and statistical modeling, including classification, regression, clustering, and
dimensionality reduction, via a consistent interface in Python.

Features
Rather than focusing on loading, manipulating, and summarizing data, the
scikit-learn library is focused on modeling the data. Some of the most popular
groups of models provided by sklearn are as follows:

• Supervised Learning algorithms − Almost all the popular supervised
learning algorithms, such as Linear Regression, Support Vector Machines (SVM),
and Decision Trees, are part of scikit-learn.

• Unsupervised Learning algorithms − It also has all the popular unsupervised
learning algorithms, from clustering, factor analysis, and PCA (Principal
Component Analysis) to unsupervised neural networks.

• Clustering − This model is used for grouping unlabeled data.

• Cross Validation − It is used to check the accuracy of supervised models on
unseen data.

• Dimensionality Reduction − It is used for reducing the number of
attributes in data, which can be further used for summarization, visualization,
and feature selection.
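A compact, illustrative workflow showing this consistent interface: load a built-in dataset, fit a supervised model, and score it with cross-validation:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)     # 5-fold cross-validation
print(scores.mean())                            # average accuracy across the folds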


Installation
If you have already installed NumPy and SciPy, the following are the two easiest
ways to install scikit-learn:

Using pip: the following command installs scikit-learn via pip
pip install -U scikit-learn

Using conda: the following command installs scikit-learn via conda
conda install scikit-learn

2.9. Exercises:
1. Which of the following is an example of a feature transformation technique?
a) Principal Component Analysis (PCA) [ ]
b) Min-Max Scaling
c) One-Hot Encoding
d) Removing outliers

2. What is the main goal of feature scaling? [ ]


a) To reduce the number of features
b) To transform categorical variables into numerical ones
c) To bring all features to a comparable range
d) To select the most important features from the dataset

3. Which of the following is NOT a dimensionality reduction technique? [ ]


a) Principal Component Analysis (PCA)
b) Linear Discriminant Analysis (LDA)
c) Decision Trees
d) Singular Value Decomposition (SVD)

4. Feature subset selection is mainly used for which of the following purposes?
[ ]
a) Reducing the size of the feature set by removing irrelevant or
redundant features
b) Transforming categorical features into numerical features
c) Balancing the dataset with respect to class distribution
d) Creating new features based on existing ones

5. In scikit-learn, which class would you use to standardize features by removing
the mean and scaling to unit variance? [ ]
a) MinMaxScaler
b) Normalizer
c) LabelEncoder
d) StandardScaler

6. Which of the following is a key component of Exploratory Data Analysis
(EDA)? [ ]
a) Model training
b) Creating visualizations to understand patterns in data
c) Performing hyperparameter tuning
d) Feature engineering


7. When performing hyperparameter tuning, which method tries random
combinations of parameters rather than testing every possible combination? [ ]
a) Grid Search
b) Bayesian Optimization
c) Random Search
d) Cross-Validation

8. In feature construction, which of the following refers to creating new features
by combining or transforming existing ones? [ ]
a) One-hot encoding
b) Feature engineering
c) Dimensionality reduction
d) Feature scaling

9. Dimensionality reduction techniques help reduce overfitting by decreasing the
number of features used in the model.
True / False

10. Hyperparameter tuning is only necessary for models with high computational
complexity.
True / False

Answer Key:
1 C 2 C 3 C 4 A 5 D 6 B
7 C 8 B 9 TRUE 10 FALSE

2.10. Review Questions:


1. What is the purpose of feature scaling in machine learning?
2. Name two techniques used for dimensionality reduction in machine learning.
3. How does dimensionality reduction help improve the performance of machine
learning models?
4. How does exploratory data analysis (EDA) contribute to building a robust
machine learning model?
5. Analyse how hyperparameter tuning and feature subset selection can work
together to optimize a machine learning model.
6. Analyse the role of feature subset selection in preventing overfitting and
improving model generalization.
