
MACHINE LEARNING

UNIT 1

What is Machine Learning?


Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden
patterns within datasets, allowing them to make predictions on new, similar data without
explicit programming for each task. Traditional machine learning combines data with statistical
tools to predict outputs, yielding actionable insights. This technology finds applications in
diverse fields such as image and speech recognition, natural language processing,
recommendation systems, fraud detection, portfolio optimization, and automating tasks.
For instance, recommender systems use historical data to personalize suggestions. Netflix, for
example, employs collaborative and content-based filtering to recommend movies and TV
shows based on user viewing history, ratings, and genre preferences. Reinforcement learning
further enhances these systems by enabling agents to make decisions based on environmental
feedback, continually refining recommendations.
Machine learning’s impact extends to autonomous vehicles, drones, and robots, enhancing their
adaptability in dynamic environments. This approach marks a breakthrough where machines
learn from data examples to generate accurate outcomes, closely intertwined with data mining
and data science.

How machine learning algorithms work


A machine learning algorithm works by learning patterns and relationships from data
to make predictions or decisions without being explicitly programmed for each task.
Here’s a simplified overview of how a typical machine learning algorithm works:

1. Data Collection:
First, relevant data is collected or curated. This data could include examples, features, or
attributes that are important for the task at hand, such as images, text, numerical data, etc.

2. Data Preprocessing:
Before feeding the data into the algorithm, it often needs to be pre-processed. This step may
involve cleaning the data (handling missing values, outliers), transforming the data
(normalization, scaling), and splitting it into training and test sets.
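As a concrete illustration of this step, here is a minimal preprocessing sketch; the pandas/scikit-learn calls and the column names ("age", "income", "label") are illustrative assumptions, not prescribed by the text:

```python
# A minimal preprocessing sketch: clean, transform, and split a toy dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age":    [25, 32, None, 40, 28, 51],
    "income": [40_000, 52_000, 61_000, None, 45_000, 80_000],
    "label":  [0, 1, 1, 1, 0, 1],
})

# Clean: fill missing values with each column's median
df = df.fillna(df.median(numeric_only=True))

# Split into features/target, then into training and test sets
X, y = df[["age", "income"]], df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Transform: scale features to zero mean and unit variance
scaler = StandardScaler().fit(X_train)   # fit on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # reuse the training statistics
```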

3. Choosing a Model:
Depending on the task (e.g., classification, regression, clustering), a suitable machine
learning model is chosen. Examples include decision trees, neural networks, support vector
machines, and more advanced models like deep learning architectures.

4. Training the Model:


The selected model is trained using the training data. During training, the algorithm learns
patterns and relationships in the data. This involves adjusting model parameters iteratively to
minimize the difference between predicted outputs and actual outputs (labels or targets) in the
training data.

5. Evaluating the Model:


Once trained, the model is evaluated using the test data to assess its performance. Metrics
such as accuracy, precision, recall, or mean squared error are used to evaluate how well the
model generalizes to new, unseen data.

6. Fine-tuning:
Models may be fine-tuned by adjusting hyperparameters (parameters that are not directly
learned during training, like learning rate or number of hidden layers in a neural network) to
improve performance.

7. Prediction or Inference:
Finally, the trained model is used to make predictions or decisions on new data. This process
involves applying the learned patterns to new inputs to generate outputs, such as class labels
in classification tasks or numerical values in regression tasks.
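Putting steps 3-7 together, a minimal end-to-end sketch with scikit-learn might look like the following; the iris dataset and the decision tree are illustrative choices, not prescribed by the text:

```python
# A minimal sketch of steps 3-7 on scikit-learn's built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(max_depth=3)   # choose a model (step 3)
model.fit(X_train, y_train)                   # train it (step 4)

y_pred = model.predict(X_test)                # predict on unseen data (step 7)
print("test accuracy:", accuracy_score(y_test, y_pred))  # evaluate (step 5)
```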

Machine Learning lifecycle:


The lifecycle of a machine learning project involves a series of steps that include:

1. Study the Problem:


The first step is to study the problem. This step involves understanding the business problem
and defining the objectives of the model.

2. Data Collection:
When the problem is well-defined, we can collect the relevant data required for the model.
The data could come from various sources such as databases, APIs, or web scraping.

3. Data Preparation:
Once the problem-related data has been collected, it should be checked carefully and put into the desired format so that the model can use it to find the hidden patterns.

This can be done in the following steps:


● Data cleaning
● Data Transformation
● Exploratory Data Analysis and Feature Engineering
● Split the dataset for training and testing.

4. Model Selection:
The next step is to select the appropriate machine learning algorithm that is suitable for our
problem. This step requires knowledge of the strengths and weaknesses of different
algorithms. Sometimes we use multiple models and compare their results and select the best
model as per our requirements.

5. Model building and Training:


● After selecting the algorithm, we have to build the model.
● In the case of traditional machine learning, building the model is straightforward; it mostly comes down to a few hyperparameter settings.
● In the case of deep learning, we have to define the layer-wise architecture along with the input and output sizes, the number of nodes in each layer, the loss function, the gradient descent optimizer, etc.
● After that, the model is trained using the preprocessed dataset.

6. Model Evaluation:
Once the model is trained, it is evaluated on the test dataset to determine its accuracy and performance using different techniques, such as the classification report, F1 score, precision, recall, ROC curve, mean squared error, mean absolute error, etc.

7. Model Tuning:
Based on the evaluation results, the model may need to be tuned or optimized to improve its
performance. This involves tweaking the hyperparameters of the model.

8. Deployment:
Once the model is trained and tuned, it can be deployed in a production environment to make
predictions on new data. This step requires integrating the model into an existing software
system or creating a new system for the model.

9. Monitoring and Maintenance:


Finally, it is essential to monitor the model’s performance in the production environment and
perform maintenance tasks as required. This involves monitoring for data drift, retraining the
model as needed, and updating the model as new data becomes available.

Major Challenges in Machine Learning


1. Poor Quality of Data
Data plays a significant role in the machine learning process, and unclean, noisy data can make the whole process extremely difficult. The quality of the data is therefore essential to the quality of the output. We need to ensure that data preprocessing, which includes removing outliers, handling missing values, and removing unwanted features, is done as carefully as possible.

2. Underfitting of Training Data


Underfitting occurs when the model is unable to establish an accurate relationship between the input and output variables; the model is too simple to capture the underlying pattern precisely. To overcome this issue:
● Increase the training time of the model
● Increase the complexity of the model
● Add more features to the data
● Reduce the regularization parameters

3. Overfitting of Training Data


Overfitting occurs when a machine learning model is trained on noisy or biased data, even in massive amounts, which negatively affects its performance. Consider a model trained to differentiate between a cat, a rabbit, a dog, and a tiger, where the training data contains 1,000 cats, 1,000 dogs, 1,000 tigers, and 4,000 rabbits. There is then a considerable probability that the model will identify a cat as a rabbit: the amount of data was vast, but it was biased, so the prediction was negatively affected.
We can tackle this issue by:
● Analyzing the data carefully before training
● Using data augmentation techniques
● Removing outliers from the training set

4. Machine Learning is a Complex Process


The machine learning field is young and continuously changing, and rapid trial-and-error experimentation is the norm. Because the process keeps transforming, there is a high chance of error, which makes the learning complex. It includes analyzing the data, removing data bias, training the model, applying complex mathematical calculations, and much more, making it a genuinely complicated process and another big challenge for machine learning professionals.

5. Lack of Training Data


The most important task in the machine learning process is to train the model on enough data to achieve accurate output. Too little training data will produce inaccurate or overly biased predictions. Training a machine learning algorithm is similar to teaching a child: to explain how to distinguish an apple from a watermelon, you show the child both fruits and point out the differences in color, shape, and taste, and the child soon learns to tell them apart. A machine learning algorithm, on the other hand, needs a great deal of data to make the same distinction; for complex problems it may require millions of examples. We therefore need to ensure that machine learning algorithms are trained with sufficient amounts of data.

6. Slow Implementation
Machine learning models can be highly effective at providing accurate results, but producing those results takes a tremendous amount of time. Slow programs, data overload, and excessive requirements all add to that time, and the models require constant monitoring and maintenance to deliver the best output.

Types of Machine Learning


● Supervised Machine Learning
● Unsupervised Machine Learning
● Reinforcement Machine Learning

1. Supervised Machine Learning:


Supervised learning is a type of machine learning in which the algorithm is trained on the
labeled dataset. It learns to map input features to targets based on labeled training data. In
supervised learning, the algorithm is provided with input features and corresponding output
labels, and it learns to generalize from this data to make predictions on new, unseen data.

There are two main types of supervised learning:


● Regression: Regression is a type of supervised learning where the algorithm learns
to predict continuous values based on input features. The output labels in regression
are continuous values, such as stock prices, and housing prices. The different
regression algorithms in machine learning are: Linear Regression, Polynomial
Regression, Ridge Regression, Decision Tree Regression, Random Forest
Regression, Support Vector Regression, etc

● Classification: Classification is a type of supervised learning where the algorithm learns to assign input data to a specific category or class based on input features. The output labels in classification are discrete values. Classification algorithms can be binary, where the output is one of two possible classes, or multiclass, where the output can be one of several classes. The different classification algorithms in machine learning are: Logistic Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), etc. Both regression and classification are illustrated in the short sketch below.
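A minimal sketch of both supervised tasks; the toy data and the choice of LinearRegression and LogisticRegression are illustrative assumptions, not prescribed by the text:

```python
# One regressor (continuous target) and one classifier (discrete target).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])          # roughly y = 2x
reg = LinearRegression().fit(X, y)
print(reg.predict([[5.0]]))                  # close to 10

# Classification: predict a discrete class label
Xc = np.array([[0.1], [0.4], [2.1], [2.5]])
yc = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(Xc, yc)
print(clf.predict([[2.0]]))                  # class 1
```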

2. Unsupervised Machine Learning:


Unsupervised learning is a type of machine learning where the algorithm learns to recognize
patterns in data without being explicitly trained using labeled examples. The goal of
unsupervised learning is to discover the underlying structure or distribution in the data.
There are two main types of unsupervised learning:

● Clustering: Clustering algorithms group similar data points together based on their
characteristics. The goal is to identify groups, or clusters, of data points that are
similar to each other, while being distinct from other groups. Some popular
clustering algorithms include K-means, Hierarchical clustering, and DBSCAN.
● Dimensionality reduction: Dimensionality reduction algorithms reduce the
number of input variables in a dataset while preserving as much of the original
information as possible. This is useful for reducing the complexity of a dataset and
making it easier to visualize and analyze. Some popular dimensionality reduction algorithms include Principal Component Analysis (PCA), t-SNE, and Autoencoders. Both clustering and dimensionality reduction are demonstrated in the sketch below.
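A short sketch of both unsupervised tasks with scikit-learn; the iris data, K-means, and PCA settings here are illustrative assumptions:

```python
# Clustering and dimensionality reduction on the same unlabeled data.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # labels ignored: unsupervised

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", kmeans.labels_[:10])

pca = PCA(n_components=2).fit(X)             # 4 features -> 2 components
print("explained variance ratio:", pca.explained_variance_ratio_)
```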

3. Reinforcement Machine Learning


Reinforcement learning is a type of machine learning where an agent learns to interact with an
environment by performing actions and receiving rewards or penalties based on its actions. The
goal of reinforcement learning is to learn a policy, which is a mapping from states to actions,
that maximizes the expected cumulative reward over time.

There are two main types of reinforcement learning:


● Model-based reinforcement learning: In model-based reinforcement learning, the
agent learns a model of the environment, including the transition probabilities
between states and the rewards associated with each state-action pair. The agent
then uses this model to plan its actions in order to maximize its expected reward.
Some popular model-based reinforcement learning algorithms include Value
Iteration and Policy Iteration.

● Model-free reinforcement learning: In model-free reinforcement learning, the agent learns a policy directly from experience without explicitly building a model of the environment. The agent interacts with the environment and updates its policy based on the rewards it receives. Some popular model-free reinforcement learning algorithms include Q-Learning, SARSA, and deep reinforcement learning. A minimal tabular Q-learning sketch follows this list.
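Every detail of this tiny environment (states, rewards, hyperparameters) is an illustrative assumption:

```python
# A tiny tabular Q-learning sketch on a made-up 5-state chain: the agent
# moves left/right and is rewarded for reaching the rightmost state.
# Actions are chosen at random here; Q-learning is off-policy, so it still
# learns the greedy policy from this random experience.
import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

for episode in range(2000):
    s = 0
    while s != n_states - 1:                     # episode ends at the goal state
        a = int(rng.integers(n_actions))         # random exploratory behavior
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned policy prefers "right" in every non-goal state
```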

Need for machine learning:


Machine learning is important because it allows computers to learn from data and improve their
performance on specific tasks without being explicitly programmed. This ability to learn from
data and adapt to new situations makes machine learning particularly useful for tasks that
involve large amounts of data, complex decision-making, and dynamic environments.

Here are some specific areas where machine learning is being used:

● Predictive modeling: Machine learning can be used to build predictive models that
can help businesses make better decisions. For example, machine learning can be
used to predict which customers are most likely to buy a particular product, or which
patients are most likely to develop a certain disease.

● Natural language processing: Machine learning is used to build systems that can
understand and interpret human language. This is important for applications such
as voice recognition, chatbots, and language translation.

● Computer vision: Machine learning is used to build systems that can recognize and
interpret images and videos. This is important for applications such as self-driving
cars, surveillance systems, and medical imaging.
● Fraud detection: Machine learning can be used to detect fraudulent behavior in
financial transactions, online advertising, and other areas.

● Recommendation systems: Machine learning can be used to build recommendation systems that suggest products, services, or content to users based on their past behavior and preferences.

Various Applications of Machine Learning


● Automation: Machine learning can work entirely autonomously in many fields without the need for human intervention. For example, robots perform the essential process steps in manufacturing plants.
● Finance Industry: Machine learning is growing in popularity in the finance industry. Banks mainly use ML to find patterns in their data, but also to prevent fraud.
● Government organization: Governments use ML to manage public safety and utilities. Take the example of China, with its massive face recognition systems: the government uses artificial intelligence to prevent jaywalking.
● Healthcare industry: Healthcare was one of the first industries to use machine learning, with image detection.
● Marketing: AI is used broadly in marketing thanks to abundant access to data. Before the age of mass data, researchers developed advanced mathematical tools like Bayesian analysis to estimate the value of a customer. With the boom of data, marketing departments rely on AI to optimize customer relationships and marketing campaigns.
● Retail industry: Machine learning is used in the retail industry to analyze customer
behavior, predict demand, and manage inventory. It also helps retailers to
personalize the shopping experience for each customer by recommending products
based on their past purchases and preferences.
● Transportation: Machine learning is used in the transportation industry to optimize
routes, reduce fuel consumption, and improve the overall efficiency of
transportation systems. It also plays a role in autonomous vehicles, where ML
algorithms are used to make decisions about navigation and safety.

Overall, machine learning has become an essential tool for many businesses and industries, as
it enables them to make better use of data, improve their decision-making processes, and deliver
more personalized experiences to their customers.

Linear Algebra Operations for Machine Learning


Linear algebra is the backbone of many machine learning algorithms and techniques. At its
core, linear algebra provides a framework for handling and manipulating data, which is often
represented as vectors and matrices. These mathematical constructs enable efficient
computation and provide insights into the underlying patterns and structures within the data.
In machine learning, linear algebra operations are used extensively in various stages, from data
preprocessing to model training and evaluation. For instance, operations such as matrix
multiplication, eigenvalue decomposition, and singular value decomposition are pivotal in
dimensionality reduction techniques like Principal Component Analysis (PCA). Similarly, the
concepts of vector spaces and linear transformations are integral to understanding neural
networks and optimization algorithms.

Basics of Linear Algebra


Linear algebra serves as the backbone of machine learning, providing the mathematical
foundation for understanding and implementing various algorithms.

A. Definition of Linear Algebra


Linear algebra is the branch of mathematics that deals with vector spaces and linear mappings
between these spaces. It encompasses the study of vectors, matrices, linear equations, and their
properties.
B. Fundamental Concepts
1. Vectors
● Vectors are quantities that have both magnitude and direction, often represented as
arrows in space.

2. Matrices
● Matrices are rectangular arrays of numbers, arranged in rows and columns.
● Matrices are used to represent linear transformations, systems of linear equations,
and data transformations in machine learning.

3. Scalars
● Scalars are single numerical values that have magnitude only, with no direction.
● Scalars are used to scale vectors or matrices through operations like
multiplication.
C. Operations in Linear Algebra
1. Addition and Subtraction
● Addition and subtraction of vectors or matrices involve adding or subtracting
corresponding elements.

2. Scalar Multiplication
● Scalar multiplication involves multiplying each element of a vector or
matrix by a scalar.

3. Dot Product (Vector Multiplication)


● The dot product of two vectors measures the similarity of their directions.
● It is computed by multiplying corresponding elements of two vectors and
summing the results.
4. Cross Product (Vector Multiplication)
● The cross product of two vectors in three-dimensional space produces a vector
orthogonal to the plane containing the original vectors.
● It is used less frequently in machine learning compared to the dot product; both appear, along with addition and scalar multiplication, in the sketch after this list.
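A short NumPy sketch of the four operations above:

```python
# Vector addition, scalar multiplication, dot product, and cross product.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v)            # addition: [5. 7. 9.]
print(2 * u)            # scalar multiplication: [2. 4. 6.]
print(np.dot(u, v))     # dot product: 1*4 + 2*5 + 3*6 = 32
print(np.cross(u, v))   # cross product, orthogonal to u and v: [-3. 6. -3.]
```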

Linear Transformations
Linear transformations are fundamental operations in linear algebra that involve the
transformation of vectors and matrices while preserving certain properties such as linearity and
proportionality. In the context of machine learning, linear transformations play a crucial role
in data preprocessing, feature engineering, and model training. In this section, we explore the
definition, types, and applications of linear transformations.

A. Definition and Explanation


Linear transformations are functions that map vectors from one vector space to another in a linear manner. Formally, a transformation T is considered linear if it satisfies two properties:
1. Additivity: T(u + v) = T(u) + T(v) for all vectors u and v.
2. Homogeneity: T(kv) = kT(v) for all vectors v and scalars k.
Linear transformations can be represented by matrices, and their properties are closely related
to the properties of matrices.

B. Common Linear Transformations in Machine Learning


1. Translation:
● Translation involves shifting the position of vectors without changing
their orientation or magnitude.
● In machine learning, translation is commonly used for data
normalization and centering, where the mean of the data is subtracted
from each data point.
2. Scaling:
● Scaling involves stretching or compressing vectors along each
dimension.
● Scaling is frequently applied in feature scaling, where features are
scaled to have similar ranges to prevent dominance of certain features
in machine learning models.
3. Rotation:
● Rotation involves rotating vectors around an axis or point in space.
● While less common in basic machine learning algorithms, rotation can be useful in advanced applications such as computer vision and robotics. The sketch after this list applies centering, scaling, and rotation to a small set of points.
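A small NumPy sketch of these transformations on 2-D points; the data and the rotation angle are illustrative:

```python
# Centering (translation), scaling, and rotation of 2-D points.
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # three 2-D points

centered = X - X.mean(axis=0)        # translation: shift the data to zero mean

S = np.diag([2.0, 0.5])              # scaling matrix: stretch x, compress y
scaled = centered @ S

theta = np.pi / 4                    # rotation by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = centered @ R.T             # rotate each (row) point
print(rotated)
```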
Matrix Operations
Matrix operations form the cornerstone of linear algebra, providing essential tools for
manipulating and analyzing data in machine learning. In this section, we explore key matrix
operations, including multiplication, transpose, inverse, and determinant, along with their
significance and applications.

A. Matrix Multiplication

Matrix multiplication is a fundamental operation in linear algebra, involving the multiplication of two matrices to produce a new matrix. The resulting matrix's dimensions are determined by the number of rows in the first matrix and the number of columns in the second matrix.
● Definition: Given two matrices A and B, the product matrix C=A⋅B is computed
by taking the dot product of each row of matrix A with each column of matrix B.
● Significance: Matrix multiplication is widely used in machine learning for various
tasks, including transformation of feature vectors, computation of model
parameters, and neural network operations such as feedforward and
backpropagation.

B. Transpose and Inverse of Matrices


1. Transpose:
● The transpose of a matrix involves flipping its rows and columns,
resulting in a new matrix where the rows become columns and vice
versa.
● It is denoted by A^T, and its dimensions are the reverse of those of the original matrix.
● Transpose is used in applications such as solving systems of linear
equations, computing matrix derivatives, and performing matrix
factorization.
2. Inverse:
● The inverse of a square matrix A is another matrix, denoted A^(-1), such that A·A^(-1) = A^(-1)·A = I, where I is the identity matrix.
● Not all matrices have inverses; only square matrices whose determinant is not equal to zero are invertible.
● Inverse matrices are used in solving systems of linear equations,
computing solutions to optimization problems, and performing
transformations.

C. Determinants

The determinant of a square matrix is a scalar value that encodes various properties of the
matrix, such as its volume, orientation, and invertibility.
● Significance: The determinant is used to determine whether a matrix is invertible, to calculate the volume of the parallelepiped spanned by a matrix's vectors, and to analyze the stability of numerical algorithms.
● Properties: The determinant satisfies several properties, including multilinearity in the rows, multiplicativity, and the property that a matrix is invertible if and only if its determinant is non-zero. Multiplication, transpose, inverse, and determinant are all demonstrated in the sketch below.
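A NumPy sketch of the operations in this section; the matrices are arbitrary illustrative choices:

```python
# Matrix multiplication, transpose, determinant, and inverse with NumPy.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

C = A @ B                      # product: rows of A dotted with columns of B
print(C)                       # [[19. 22.] [43. 50.]]

print(A.T)                     # transpose: rows become columns

d = np.linalg.det(A)           # determinant: 1*4 - 2*3 = -2
print(d)

if abs(d) > 1e-12:             # invertible only when the determinant is non-zero
    A_inv = np.linalg.inv(A)
    print(A @ A_inv)           # approximately the identity matrix
```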

Eigenvalues and Eigenvectors


Eigenvalues and eigenvectors are fundamental concepts in linear algebra that play a significant
role in machine learning algorithms and applications. In this section, we explore the definition,
significance, and applications of eigenvalues and eigenvectors.

A. Definition and Significance


1. Eigenvalues:
● Eigenvalues of a square matrix A are scalar values that represent how the transformation represented by A stretches or compresses vectors in certain directions.
● Eigenvalues quantify the scale of transformation along the
corresponding eigenvectors and are crucial for understanding the
behavior of linear transformations.

2. Eigenvectors:
● Eigenvectors are non-zero vectors that are transformed by a matrix
only by a scalar factor, known as the eigenvalue.
● They represent the directions in which a linear transformation
represented by a matrix stretches or compresses space.
● Eigenvectors corresponding to distinct eigenvalues are linearly independent and can form a basis for the vector space; the sketch below verifies the eigenvalue equation numerically.
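A small NumPy check of the eigenvalue equation Av = λv; the matrix is an arbitrary illustrative choice:

```python
# Verify numerically that A v = lambda v for each eigenpair.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

for i in range(len(eigvals)):
    v = eigvecs[:, i]                            # eigenvectors are the columns
    print(np.allclose(A @ v, eigvals[i] * v))    # True: A only rescales v
```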

B. Applications in Machine Learning


1. Dimensionality Reduction:
● Techniques such as Principal Component Analysis (PCA) utilize
eigenvalues and eigenvectors to identify the principal components
(directions of maximum variance) in high-dimensional data and project
it onto a lower-dimensional subspace.
● Eigenvalues represent the amount of variance explained by each
principal component, allowing for effective dimensionality reduction
while preserving as much information as possible.
2. Graph-based Algorithms:
● Eigenvalues and eigenvectors play a crucial role in graph-based
algorithms such as spectral clustering and PageRank.
● In spectral clustering, eigenvalues and eigenvectors of the graph
Laplacian matrix are used to partition data into clusters based on
spectral properties.
3. Matrix Factorization:
● Techniques like Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF) factorize matrices into lower-dimensional representations; SVD in particular is closely related to eigenvalue decomposition.
● Eigenvalue decomposition facilitates the extraction of meaningful
features or components from high-dimensional data matrices, enabling
efficient data representation and analysis.

Applications of Linear Algebra in Machine Learning


Linear algebra serves as the backbone of many machine learning algorithms, providing
powerful tools for data manipulation, model representation, and optimization. In this section,
we explore some of the key applications of linear algebra in machine learning, including
principal component analysis (PCA), singular value decomposition (SVD), linear regression,
support vector machines (SVM), and neural networks.

A. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique that utilizes linear algebra to identify the principal components in high-dimensional data. The main steps of PCA involve:
1. Covariance Matrix Calculation:
● Compute the covariance matrix of the data to understand the
relationships between different features.
2. Eigenvalue Decomposition:
● Decompose the covariance matrix into its eigenvalues and eigenvectors
to identify the principal components.
3. Projection onto Principal Components:
● Project the original data onto the principal components to reduce the dimensionality while preserving the maximum variance; these three steps are written out in the sketch below.
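The three steps written out with NumPy on synthetic data (the data itself is an illustrative assumption):

```python
# PCA via covariance matrix, eigendecomposition, and projection.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy data: 100 samples, 3 features
Xc = X - X.mean(axis=0)                       # center the data first

cov = np.cov(Xc, rowvar=False)                # step 1: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)        # step 2: eigendecomposition (symmetric)

order = np.argsort(eigvals)[::-1]             # sort components by variance explained
components = eigvecs[:, order[:2]]            # keep the top 2 principal components

X_reduced = Xc @ components                   # step 3: project onto the components
print(X_reduced.shape)                        # (100, 2)
```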

B. Singular Value Decomposition (SVD)

Singular Value Decomposition (SVD) is a matrix factorization technique widely used in machine learning for dimensionality reduction, data compression, and noise reduction. The key steps of SVD include:
1. Decomposition:
● Decompose the original matrix into the product of three matrices: A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values.
2. Dimensionality Reduction:
● Retain only the most significant singular values and their corresponding columns of U and V to reduce the dimensionality of the data, as in the sketch below.
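A short NumPy sketch of truncated SVD; the matrix size and the choice k = 2 are illustrative:

```python
# Truncated SVD: keep only the largest singular values.
import numpy as np

A = np.random.default_rng(0).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

k = 2                                              # keep the top-k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation of A
print(np.linalg.norm(A - A_k))                     # reconstruction error
```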

C. Linear Regression

Linear regression is a supervised learning algorithm used for modeling the relationship between
a dependent variable and one or more independent variables. Linear algebra plays a crucial role
in solving the linear regression problem efficiently through techniques such as:
1. Matrix Formulation:
● Representing the linear regression problem in matrix form as Y = Xβ + ϵ, where Y is the dependent variable, X is the matrix of independent variables, β is the vector of coefficients, and ϵ is the error term.
2. Normal Equation:
● Solving for the coefficients in closed form via the normal equation β̂ = (X^T X)^(-1) X^T Y, which follows from minimizing the squared error; a short sketch follows.
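A minimal NumPy sketch of the normal equation; the synthetic data and true coefficients are illustrative assumptions (in practice, np.linalg.lstsq is numerically preferable to forming the inverse explicitly):

```python
# Solving least squares via the normal equation.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept column + 1 feature
true_beta = np.array([1.0, 3.0])
Y = X @ true_beta + 0.1 * rng.normal(size=50)            # Y = X beta + noise

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y              # (X^T X)^(-1) X^T Y
print(beta_hat)                                          # close to [1.0, 3.0]
```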

D. Support Vector Machines (SVM)


Support Vector Machines (SVM) are powerful supervised learning models used for
classification and regression tasks. Linear algebra plays a crucial role in SVMs through:
1. Kernel Trick:
● Utilizing linear algebraic operations efficiently in kernelized SVMs to
map data into higher-dimensional feature spaces for nonlinear
classification.
2. Optimization:
● Formulating the SVM optimization problem as a quadratic
programming problem and solving it efficiently using linear algebraic
techniques such as convex optimization and quadratic programming
solvers.

E. Neural Networks

Neural networks, especially deep learning models, heavily rely on linear algebra for model
representation, parameter optimization, and forward/backward propagation. Key linear
algebraic operations in neural networks include:
1. Matrix Multiplication:
● Performing matrix multiplication operations between input features
and weight matrices in different layers of the neural network during the
forward pass.
2. Gradient Descent:
● Computing gradients efficiently using backpropagation and updating
network parameters using gradient descent optimization algorithms,
which involve various linear algebraic operations.
3. Weight Initialization:
● Initializing network weights using techniques such as Xavier
initialization and He initialization, which rely on linear algebraic
properties for proper scaling of weight matrices.

What is a Dataset?
A dataset is a collection of data with which developers can work to meet their goals. In a dataset, the rows represent individual data points and the columns represent the features of the dataset. Datasets are mostly used in fields like machine learning, business, and government to gain insights, make informed decisions, or train algorithms. They vary in size and complexity, and they mostly require cleaning and preprocessing to ensure data quality and suitability for analysis or modeling.

Datasets can be stored in multiple formats. The most common ones are CSV, Excel, JSON,
and zip files for large datasets such as image datasets.

Why are datasets used?


Datasets are used to train and test AI models, analyze trends, and gain insights from data.
They provide the raw material for computers to learn patterns and make predictions.

Types of Datasets
There are various types of datasets available out there. They are:
● Numerical Dataset: They include numerical data points that can be solved with
equations. These include temperature, humidity, marks and so on.
● Categorical Dataset: These include categories such as colour, gender, occupation,
games, sports and so on.
● Web Dataset: These include datasets created by calling APIs using HTTP requests
and populating them with values for data analysis. These are mostly stored in JSON
(JavaScript Object Notation) formats.
● Time series Dataset: These include data collected over a period of time, for example, changes in geographical terrain over time.
● Image Dataset: It includes a dataset consisting of images. This is mostly used to
differentiate the types of diseases, heart conditions and so on.
● Ordered Dataset: These datasets contain data that are ordered in ranks, for
example, customer reviews, movie ratings and so on.
● Partitioned Dataset: These datasets have data points segregated into different
members or different partitions.
● File-Based Datasets: These datasets are stored in files, in Excel as .csv, or .xlsx
files.
● Bivariate Dataset: In this dataset, two variables or features are directly related to each other; for example, height and weight in a dataset are directly related.
● Multivariate Dataset: In these datasets, as the name suggests, two or more variables are related to each other; for example, attendance and assignment grades are directly correlated with a student's overall grade.

Data Preprocessing
Pre-processing refers to the transformations applied to our data before feeding it to the algorithm. Data preprocessing is a technique used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources, it is collected in a raw format that is not feasible for analysis.
Need for Data Preprocessing
● To achieve better results from the applied model in machine learning projects, the data has to be in a proper format. Some machine learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so to execute the Random Forest algorithm, null values have to be managed in the original raw data set.
● Another aspect is that the data set should be formatted in such a way that more than one machine learning or deep learning algorithm can be executed on the same data set, and the best of them can be chosen.

Bias and Variance in Machine Learning


Bias and Variance help us in parameter tuning and deciding better-fitted models among several
built.
Bias is one type of error that occurs due to wrong assumptions about the data, such as assuming the data is linear when in reality it follows a complex function. Variance, on the other hand, is introduced by high sensitivity to variations in the training data; it too is a type of error, since we want our model to be robust against noise. There are two kinds of error in machine learning, reducible and irreducible, and bias and variance both belong to the reducible kind.

What is Bias?
Bias is simply the inability of a model to capture the true relationship, because of which some difference or error occurs between the model's predicted value and the actual value. These differences between the actual or expected values and the predicted values are known as bias error, or error due to bias. Bias is a systematic error that occurs due to wrong assumptions in the machine learning process.

● Low Bias: Low bias value means fewer assumptions are taken to build the target
function. In this case, the model will closely match the training dataset.
● High Bias: High bias value means more assumptions are taken to build the target
function. In this case, the model will not match the training dataset closely.
The high-bias model will not be able to capture the dataset trend. It is considered as the
underfitting model which has a high error rate. It is due to a very simplified algorithm.
For example, a linear regression model may have a high bias if the data has a non-linear
relationship.

Ways to reduce high bias in Machine Learning:


● Use a more complex model: One of the main reasons for high bias is an overly simplified model that cannot capture the complexity of the data. In such cases, we can make our model more complex, for instance by increasing the number of hidden layers in a deep neural network, or by using a more complex model such as polynomial regression for non-linear datasets, CNNs for image processing, or RNNs for sequence learning.
● Increase the number of features: Adding more features to the training dataset will increase the complexity of the model and improve its ability to capture the underlying patterns in the data.
● Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve the generalization ability of the model, but if the model has high bias, reducing the strength of regularization or removing it altogether can improve performance.
● Increase the size of the training data: Increasing the size of the training data can
help to reduce bias by providing the model with more examples to learn from the
dataset.

What is Variance?
Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data; that is, how sensitive the model is to a particular subset of the training data and how much it adjusts to a new subset.
● Low variance: Low variance means that the model is less sensitive to changes in the training data and can produce consistent estimates of the target function with different subsets of data from the same distribution. Combined with high bias, this is the underfitting case, where the model fails to generalize on both training and test data.
● High variance: High variance means that the model is very sensitive to changes in the training data and can show significant changes in the estimate of the target function when trained on different subsets of data from the same distribution. This is the case of overfitting, when the model performs well on the training data but poorly on new, unseen test data; it fits the training data so closely that it fails to generalize.

Ways to Reduce Variance in Machine Learning:


● Cross-validation: By splitting the data into training and testing sets multiple
times, cross-validation can help identify if a model is overfitting or underfitting
and can be used to tune hyperparameters to reduce variance.
● Feature selection: Choosing only the relevant features decreases the model's complexity and can reduce the variance error.
● Regularization: We can use L1 or L2 regularization to reduce variance in
machine learning models
● Ensemble methods: It will combine multiple models to improve generalization
performance. Bagging, boosting, and stacking are common ensemble methods that
can help reduce variance and improve generalization performance.
● Simplifying the model: Reducing the complexity of the model, such as
decreasing the number of parameters or layers in a neural network, can also help
reduce variance and improve generalization performance.
● Early stopping: Early stopping is a technique used to prevent overfitting by
stopping the training of the deep learning model when the performance on the
validation set stops improving.

Bias Variance Tradeoff


If the algorithm is too simple (a hypothesis with a linear equation), it may be in a high bias and low variance condition and thus be error-prone. If the algorithm fits too complex a hypothesis (one with a high-degree equation), it may be in a high variance and low bias condition; in the latter case, new entries will not be predicted well. There is a balance between these two conditions, known as the Trade-off or Bias-Variance Trade-off. This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm can't be more complex and less complex at the same time. The sketch below contrasts models on either side of the trade-off.
[Figure: Bias-Variance Tradeoff]
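A small scikit-learn sketch contrasting the two sides of the trade-off; the sine-shaped data and the polynomial degrees are illustrative assumptions:

```python
# A too-simple fit (high bias), a balanced fit, and a too-complex fit (high variance).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + 0.2 * rng.normal(size=100)   # noisy non-linear data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):                            # high bias, balanced, high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # a large gap between train and test score signals high variance
    print(degree, round(model.score(X_tr, y_tr), 2), round(model.score(X_te, y_te), 2))
```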

Function Approximation
Function approximation is a critical concept in reinforcement learning (RL), enabling
algorithms to generalize from limited experience to a broader set of states and actions. This
capability is essential when dealing with complex environments where the state and action
spaces are vast or continuous.

Significance of Function Approximation


In reinforcement learning, the agent’s goal is to learn a policy that maximizes cumulative
reward over time. This involves estimating value functions, which predict future rewards, or
directly approximating the policy, which maps states to actions. In many practical problems,
the state or action spaces are too large to allow for an exact representation of value functions
or policies. Function approximation addresses this issue by enabling the use of parameterized
functions to represent these components compactly.

1. Handling Complexity: In many real-world problems, the state and action spaces
are too vast to enumerate or store explicitly. Function approximation allows RL
algorithms to represent value functions or policies compactly using parameterized
functions.
2. Generalization: Function approximation enables RL agents to generalize from
limited experience to unseen states and actions. This is crucial for robust
performance in environments where exhaustive exploration is impractical.
3. Efficiency: By approximating value functions or policies, RL algorithms can
operate efficiently even in high-dimensional spaces. This efficiency is essential
for scaling RL to complex tasks such as robotic control or game playing.

Key Concepts in Function Approximation for Reinforcement Learning


1. Features: These are characteristics extracted from the agent’s state that represent
relevant information for making decisions. Choosing informative features is
crucial for accurate value estimation.
2. Learning Algorithm: This algorithm updates the parameters of the chosen
function to minimise the difference between the estimated value and the actual
value experienced by the agent (temporal-difference learning). Common
algorithms include linear regression, gradient descent variants, or policy gradient
methods depending on the function class.
3. Function Class: This refers to the type of function used for approximation.
Common choices include linear functions, neural networks, decision trees, or a combination of these. The complexity of the function class should be balanced against the available data and computational resources. A minimal linear sketch follows this list.
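A hedged sketch of linear value-function approximation with TD(0) on a made-up random-walk task; the features, environment, and hyperparameters are all illustrative assumptions:

```python
# Linear value-function approximation: V(s) = phi[s] @ w, trained with TD(0).
import numpy as np

n_states, n_features = 10, 4
rng = np.random.default_rng(0)
phi = rng.normal(size=(n_states, n_features))   # fixed random features per state
w = np.zeros(n_features)                        # learned weight vector
alpha, gamma = 0.05, 0.95

for episode in range(200):
    s = n_states // 2
    while 0 < s < n_states - 1:                 # terminate at either end of the walk
        s_next = s + rng.choice([-1, 1])        # unbiased random walk
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next in (0, n_states - 1) else phi[s_next] @ w
        td_error = r + gamma * v_next - phi[s] @ w
        w += alpha * td_error * phi[s]          # gradient step on the TD error
        s = s_next

# values rise toward the rewarded right end
print([round(float(phi[s] @ w), 2) for s in range(n_states)])
```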

Applications of Function Approximation in Reinforcement Learning


1. Robotics Control: Imagine a robot arm learning to manipulate objects. The state space could include the arm's joint positions, the object's location and orientation, and sensor readings like gripper force.
2. Playing Atari Games: The state space is vast when we are dealing with complex environments like Atari games. Function approximation using deep neural networks becomes essential to capture the intricate relationships between the visual inputs and the optimal actions.
3. Stock Market Trading: An RL agent learns to buy and sell stocks to maximize
profit. The state space could involve various financial indicators like stock prices,
moving averages, and market sentiment.

Benefits of Function Approximation


● Generalization: Agents can make good decisions even in unseen states based on
what they have learned from similar states.
● Scalability: Function approximation allows agents to handle problems with large
or continuous state spaces.
● Sample Efficiency: By learning patterns from a smaller set of experiences, agents
can make better decisions with less data.

Challenges in Function Approximation


1. Bias-Variance Trade-off: Choosing the right complexity for the function
approximator is crucial. Too simple a model introduces high bias, while too
complex a model leads to high variance. Balancing this trade-off is essential for
stable and efficient learning.
2. Exploration vs. Exploitation: Function approximators must generalize well from
limited exploration data. Ensuring sufficient exploration to prevent overfitting to
the initial experiences is a major challenge.
3. Stability and Convergence: Particularly with non-linear approximators like
neural networks, ensuring stability and convergence during training is difficult.
Techniques like experience replay and target networks in DQNs have been
developed to mitigate these issues.
4. Sample Efficiency: Function approximation methods need to be sample efficient,
especially in environments where obtaining samples is costly or time-consuming.
Methods like transfer learning and meta-learning are being explored to enhance
sample efficiency.

Overfitting in Machine Learning


In the real world, datasets are never clean and perfect; each contains impurities such as noisy data, outliers, missing values, or imbalanced classes. These impurities cause various problems that affect the accuracy and the performance of the model. One such problem is overfitting, a problem a model can exhibit.
A statistical model is said to be overfitted if it can’t generalize well with unseen data.
Before understanding overfitting, we need to know some basic terms, which are:
Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the performance
of the model if it is not removed.
Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the
machine learning algorithms. Or it is the difference between the predicted values and the actual
values.
Variance: If the machine learning model performs well with the training dataset, but does not
perform well with the test dataset, then variance occurs.
Generalization: It shows how well a model is trained to predict unseen data.

○ Overfitting & underfitting are the two main errors/problems in the machine learning
model, which cause poor performance in Machine Learning.
○ Overfitting occurs when the model fits the training data more closely than required, trying to capture each and every data point fed to it. Hence it starts capturing noise and inaccurate data from the dataset, which degrades the performance of the model.
○ An overfitted model doesn't perform accurately with the test/unseen dataset and can’t
generalize well.
○ An overfitted model is said to have low bias and high variance.

How to detect Overfitting?


Overfitting can only be detected once we test the model on unseen data. To detect the issue, we can perform a train/test split.
With a train-test split, we divide our dataset into random training and test subsets. We train the model with the training dataset, which is about 80% of the total data, and after training we test it with the test dataset, which is the remaining 20%.

Now, if the model performs well with the training dataset but not with the test dataset, it is likely to have an overfitting issue. For example, if the model shows 85% accuracy on the training data but only 50% accuracy on the test dataset, the model is not generalizing well. The sketch below shows this check in practice.
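A minimal sketch of this train-versus-test check with scikit-learn; the dataset and the unpruned decision tree are illustrative choices:

```python
# Detect overfitting by comparing training and test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unpruned tree
print("train accuracy:", model.score(X_tr, y_tr))   # typically near 1.0
print("test accuracy:", model.score(X_te, y_te))    # lower: a sign of overfitting
```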

Ways to prevent Overfitting


Although overfitting is an error in machine learning that reduces the performance of the model, we can prevent it in several ways. Using a linear model helps avoid overfitting, but many real-world problems are non-linear ones, so it is important to prevent our models from overfitting. Below are several ways that can be used to prevent overfitting:
1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization

Early Stopping

In this technique, training is paused before the model starts learning the noise in the data. While training the model iteratively, we measure its performance after each iteration and continue only as long as a new iteration still improves the performance of the model.
After that point, the model begins to overfit the training data, so we need to stop the process before the learner passes it. Stopping the training process before the model starts capturing noise from the data is known as early stopping.

However, this technique may lead to the underfitting problem if training is paused too early.
So, it is very important to find that "sweet spot" between underfitting and overfitting.
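A framework-agnostic sketch of an early-stopping loop; train_one_epoch and validation_score are hypothetical placeholders standing in for your own training and validation code:

```python
# Generic early stopping: stop once validation performance stops improving.
def fit_with_early_stopping(model, train_one_epoch, validation_score,
                            patience=5, max_epochs=100):
    best_score, epochs_without_improvement = float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)                 # hypothetical training step
        score = validation_score(model)        # hypothetical validation metric
        if score > best_score:
            best_score, epochs_without_improvement = score, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:   # the "sweet spot" has passed
            break
    return model
```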

Train with More data

Increasing the training set by including more data can enhance the accuracy of the model, as it provides more chances to discover the relationship between input and output variables.
It may not always prevent overfitting, but it helps the algorithm detect the signal better and minimise the errors. When a model is fed more training data, it becomes unable to overfit all the samples and is forced to generalise.
In some cases, however, the additional data may add more noise to the model, so we need to make sure the data is clean and free from inconsistencies before feeding it to the model.

Feature Selection

While building an ML model, we have a number of parameters or features that are used to predict the outcome. However, some of these features may be redundant or less important for the prediction, and the feature selection process is applied to deal with them. In feature selection, we identify the most important features within the training data and remove the others. This process helps to simplify the model and reduces noise from the data. Some algorithms perform feature selection automatically; if not, we can perform it manually, as in the sketch below.
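A minimal feature-selection sketch with scikit-learn; the dataset and the choice k = 10 are illustrative:

```python
# Keep only the most informative features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)          # 30 features
selector = SelectKBest(score_func=f_classif, k=10)  # keep the 10 best features
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)              # (569, 30) -> (569, 10)
```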

Cross-Validation
Cross-validation is one of the most powerful techniques to prevent overfitting.
In the general k-fold cross-validation technique, we divide the dataset into k equal-sized subsets of data, known as folds; each fold takes a turn as the validation set while the model is trained on the remaining folds.
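A minimal 5-fold cross-validation sketch with scikit-learn; the dataset and model are illustrative choices:

```python
# 5-fold cross-validation: every sample is used for both training and validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())   # a stable mean with low spread is a good sign
```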

Data Augmentation

Data augmentation is a data analysis technique that is an alternative to adding more data to prevent overfitting. In this technique, instead of adding more training data, slightly modified copies of already existing data are added to the dataset.
Data augmentation makes each data sample appear slightly different every time it is processed by the model. Hence each sample appears unique to the model, which prevents overfitting.
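A tiny augmentation sketch for image-like arrays using NumPy only; the 3x3 array stands in for a real image:

```python
# Create slightly modified copies of an existing sample.
import numpy as np

image = np.arange(9).reshape(3, 3)        # stand-in for a real image

flipped = np.fliplr(image)                # horizontal flip
rotated = np.rot90(image)                 # 90-degree rotation
noisy = image + np.random.default_rng(0).normal(0, 0.1, image.shape)

augmented_dataset = [image, flipped, rotated, noisy]   # modified copies added
```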

Regularization

If overfitting occurs because a model is too complex, we can reduce the number of features. However, overfitting may also occur with a simpler model, more specifically a linear model, and for such cases regularization techniques are very helpful.
Regularization is the most popular technique to prevent overfitting. It is a group of methods that forces the learning algorithm to make the model simpler. Applying regularization may slightly increase the bias but reduces the variance. In this technique, we modify the objective function by adding a penalty term whose value is higher for more complex models.
The two commonly used regularization techniques are L1 Regularization and L2
Regularization.
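A short sketch of L2 (Ridge) and L1 (Lasso) regularization with scikit-learn; the synthetic data and alpha = 1.0 are illustrative assumptions:

```python
# L2 shrinks coefficients toward zero; L1 can drive some exactly to zero.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty; alpha sets its strength
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty; also performs feature selection

print("zero coefficients (ridge):", (ridge.coef_ == 0).sum())
print("zero coefficients (lasso):", (lasso.coef_ == 0).sum())
```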

Ensemble Methods
In ensemble methods, predictions from different machine learning models are combined to identify the most popular result.

The most commonly used ensemble methods are bagging and boosting.
In bagging, individual data points can be selected more than once. After several sample datasets are collected, a model is trained on each one independently, and, depending on the type of task (i.e., regression or classification), the average or majority of those predictions is used to produce a more accurate result. Moreover, bagging reduces the chances of overfitting in complex models.

In boosting, a large number of weak learners arranged in a sequence are trained in such a way that each learner in the sequence learns from the mistakes of the learner before it. Boosting combines all the weak learners into one strong learner, and it improves the predictive flexibility of simple models. Both approaches are sketched below.
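A short sketch of both approaches with scikit-learn's built-in ensembles; the dataset and hyperparameters are illustrative choices:

```python
# Bagging (parallel learners) and boosting (sequential learners).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(n_estimators=50, random_state=0)
boosting = GradientBoostingClassifier(random_state=0)

print("bagging:", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting:", cross_val_score(boosting, X, y, cv=5).mean())
```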
