
Machine Learning Unit-1


Unit I:

Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning Systems,
Main Challenges of Machine Learning.

Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in
Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator, Empirical Risk
Minimization
……………………………………………………………………………………………………………………………
Need For Machine Learning

Ever since the technical revolution, we’ve been generating an immeasurable amount of data. As per research, we generate around 2.5 quintillion bytes of data every single day! It was estimated that by 2020, 1.7 MB of data would be created every second for every person on earth.

With the availability of so much data, it is finally possible to build predictive models that can study and
analyze complex data to find useful insights and deliver more accurate results.

Top-tier companies such as Netflix and Amazon build such Machine Learning models using tons of data in order to identify profitable opportunities and avoid unwanted risks.

Here’s a list of reasons why Machine Learning is so important:

 Increase in Data Generation: Due to excessive production of data, we need a method that can be
used to structure, analyze and draw useful insights from data. This is where Machine Learning
comes in. It uses data to solve problems and find solutions to the most complex tasks faced by
organizations.
 Improve Decision Making: By making use of various algorithms, Machine Learning can be used to make
better business decisions. For example, Machine Learning is used to forecast sales, predict downfalls
in the stock market, identify risks and anomalies, etc.

 Uncover patterns & trends in data: Finding hidden patterns and extracting key insights from data is the
most essential part of Machine Learning. By building predictive models and using statistical techniques,
Machine Learning allows you to dig beneath the surface and explore the data at a minute scale.
Understanding data and extracting patterns manually will take days, whereas Machine Learning
algorithms can perform such computations in less than a second.

 Solve complex problems: From detecting the genes linked to the deadly ALS disease to building self-
driving cars, Machine Learning can be used to solve the most complex problems.

To give you a better understanding of how important Machine Learning is, let’s list a couple of Machine Learning applications:
 Netflix’s Recommendation Engine: The core of Netflix is its famous recommendation engine. Over 75% of what you watch is recommended by Netflix, and these recommendations are made by implementing Machine Learning.

 Facebook’s Auto-tagging feature: The logic behind Facebook’s DeepFace face verification system is Machine Learning and Neural Networks. DeepFace studies the facial features in an image to tag your friends and family.

 Amazon’s Alexa: The famous Alexa, which is based on Natural Language Processing and Machine Learning, is an advanced virtual assistant that does more than just play songs from your playlist. It can book you an Uber, connect with other IoT devices at home, track your health, etc.

 Google’s Spam Filter: Gmail makes use of Machine Learning to filter out spam messages. It uses Machine
Learning algorithms and Natural Language Processing to analyze emails in real-time and classify
them as either spam or non-spam.

Introduction To Machine Learning

The term Machine Learning was first coined by Arthur Samuel in the year 1959. Looking back, that year
was probably the most significant in terms of technological advancements.

If you browse through the net about ‘what is Machine Learning’, you’ll get at least 100 different definitions.
However, the very first formal definition was given by Tom M. Mitchell:

“A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

In simple terms, Machine Learning is a subset of Artificial Intelligence (AI) which gives machines the ability to learn automatically and improve from experience without being explicitly programmed to do so. In essence, it is the practice of getting machines to solve problems by gaining the ability to think.

But wait, can a machine think or make decisions? Well, if you feed a machine a good amount of data, it will
learn how to interpret, process and analyze this data by using Machine Learning Algorithms, in order to
solve real-world problems.
Machine Learning Definitions

Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used to learn patterns from
data and draw significant information from it. It is the logic behind a Machine Learning model. An example
of a Machine Learning algorithm is the Linear Regression algorithm.

Model: A model is the main component of Machine Learning. A model is trained by using a Machine
Learning Algorithm. An algorithm maps all the decisions that a model is supposed to take based on the
given input, in order to get the correct output.

Predictor Variable: A feature (or features) of the data that can be used to predict the output.

Response Variable: It is the feature or the output variable that needs to be predicted by using the predictor
variable(s).

Training Data: The Machine Learning model is built using the training data. The training data helps the
model to identify key trends and patterns essential to predict the output.

Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome.
This is done by the testing data set.
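To make these definitions concrete, here is a minimal sketch (assuming scikit-learn is available, and using its built-in Iris data purely as a stand-in dataset) of how a data set is divided into training data and testing data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # predictor variables X, response variable y

# Hold out 20% of the data for testing; the rest trains the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))    # 120 training rows, 30 testing rows
```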

To sum it up: a Machine Learning process begins by feeding the machine lots of data. Using this data, the machine is trained to detect hidden insights and trends, and these insights are then used to build a Machine Learning model, by means of an algorithm, in order to solve a problem.

The next topic in this Introduction to Machine Learning blog is the Machine Learning Process.

Machine Learning Process

The Machine Learning process involves building a Predictive model that can be used to find a solution for a
Problem Statement. To understand the Machine Learning process let’s assume that you have been given a
problem that needs to be solved by using Machine Learning.

The problem is to predict the occurrence of rain in your local area by using Machine Learning.
The below steps are followed in a Machine Learning process:

Step 1: Define the objective of the Problem Statement

At this step, we must understand what exactly needs to be predicted. In our case, the objective is to
predict the possibility of rain by studying weather conditions. At this stage, it is also essential to take
mental notes on what kind of data can be used to solve this problem or the type of approach you must
follow to get to the solution.

Step 2: Data Gathering

At this stage, you must be asking questions such as,


 What kind of data is needed to solve this problem?
 Is the data available?

 How can I get the data?

Once you know the type of data that is required, you must understand how you can obtain it. Data collection can be done manually or by web scraping. However, if you’re a beginner just looking to learn Machine Learning, you don’t have to worry about getting the data; there are thousands of data resources on the web, so you can just download a data set and get going.

Coming back to the problem at hand, the data needed for weather forecasting includes measures such as
humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such data must
be collected and stored for analysis.
Step 3: Data Preparation

The data you collect is almost never in the right format. You will encounter a lot of inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is essential because they might lead to wrongful computations and predictions. Therefore, at this stage, you scan the data set for inconsistencies and fix them then and there.

Step 4: Exploratory Data Analysis

Grab your detective glasses because this stage is all about diving deep into data and finding all the hidden data
mysteries. EDA or Exploratory Data Analysis is the brainstorming stage of Machine Learning. Data
Exploration involves understanding the patterns and trends in the data. At this stage, all the useful insights
are drawn and correlations between the variables are understood.

For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if the
temperature has fallen low. Such correlations must be understood and mapped at this stage.

Step 5: Building a Machine Learning Model

All the insights and patterns derived during Data Exploration are used to build the Machine Learning model. This stage always begins by splitting the data set into two parts: training data and testing data. The training data is used to build and analyze the model. The logic of the model is based on the Machine Learning algorithm that is being implemented.

In the case of predicting rainfall, since the output will be in the form of True (if it will rain tomorrow) or
False (no rain tomorrow), we can use a Classification Algorithm such as Logistic Regression.

Choosing the right algorithm depends on the type of problem you’re trying to solve, the data set, and the level of complexity of the problem. In the upcoming sections, we will discuss the different types of problems that can be solved by using Machine Learning.
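As a rough illustration of this step, here is a hedged sketch of training a Logistic Regression classifier; the weather features, the toy rain rule, and all numbers are synthetic stand-ins made up for demonstration, not a real forecasting data set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic weather data: humidity (%) and temperature (deg C).
X = np.column_stack([rng.uniform(20, 100, 500), rng.uniform(5, 40, 500)])
y = (X[:, 0] > 70) & (X[:, 1] < 20)          # toy rule: humid and cool -> rain

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(clf.score(X_te, y_te))                 # accuracy on unseen data
print(clf.predict([[85.0, 15.0]]))           # [ True] -> rain predicted
```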


Step 6: Model Evaluation & Optimization

After building a model by using the training data set, it is finally time to put the model to a test. The testing
data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once
the accuracy is calculated, any further improvements in the model can be implemented at this stage.
Methods like parameter tuning and cross-validation can be used to improve the performance of the model.
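As a sketch of what parameter tuning with cross-validation can look like (assuming scikit-learn, on a made-up synthetic data set), GridSearchCV tries several candidate hyperparameter values and scores each one by cross-validation before the test set is ever touched:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Try several regularization strengths; each is scored by 5-fold cross-validation.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.best_score_, 3))   # best C and its CV accuracy
print(round(grid.score(X_te, y_te), 3))                # final check on held-out test data
```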

Step 7: Predictions

Once the model is evaluated and improved, it is finally used to make predictions. The final output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the predicted value of a stock).

In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.

So that was the entire Machine Learning process. Now it’s time to learn about the different ways in which
Machines can learn.
2. Machine Learning Types

A machine can learn to solve a problem by following any one of the following three approaches. These are
the ways in which a machine can learn:

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

Supervised Learning

Supervised learning is a technique in which we teach or train the machine using data which is well labeled.

To understand Supervised Learning let’s consider an analogy. As kids we all needed guidance to solve math
problems. Our teachers helped us understand what addition is and how it is done. Similarly, you can think
of supervised learning as a type of Machine Learning that involves a guide. The labeled data set is the
teacher that will train you to understand patterns in the data. The labeled data set is nothing but the
training data set.

Consider the Tom and Jerry example: we feed the machine images of Tom and Jerry, and the goal is for the machine to identify and classify the images into two groups (Tom images and Jerry images). The training data set that is fed to the model is labeled, as in, we’re telling the machine, ‘this is how Tom looks and this is Jerry’. By doing so you’re training the machine using labeled data. In Supervised Learning, there is a well-defined training phase done with the help of labeled data.

Supervised learning can be grouped further into two categories of algorithms:

o Classification

o Regression

Applications:

Advertisement Popularity: Selecting advertisements that will perform well is often a supervised learning task. Many of the ads you see as you browse the internet are placed there because a learning algorithm said that they were of reasonable popularity (and clickability). Furthermore, an ad’s placement on a certain site or alongside a certain query (if you find yourself using a search engine) is largely due to a learned algorithm saying that the matching between ad and placement will be effective.

Spam Classification: If you use a modern email system, chances are you’ve encountered a spam
filter. That spam filter is a supervised learning system. Fed email examples and labels (spam/not
spam), these systems learn how to pre-emptively filter out malicious emails so that their user is not
harassed by them. Many of these also behave in such a way that a user can provide new labels to
the system and it can learn user preference.

Face Recognition: Most likely your face has been used in a supervised learning algorithm that is trained to recognize your face. Having a system that takes a photo, finds faces, and guesses who is in the photo (suggesting a tag) is a supervised process. It has multiple layers to it, finding faces and then identifying them, but it is still supervised nonetheless.

Unsupervised Learning:

Unsupervised learning involves training by using unlabeled data and allowing the model to act on that
information without guidance.

Think of unsupervised learning as a smart kid that learns without any guidance. In this type of Machine Learning, the model is not fed labeled data; the model has no clue that ‘this image is Tom and this is Jerry’. It figures out patterns and the differences between Tom and Jerry on its own by taking in tons of data.

For example, it identifies prominent features of Tom such as pointy ears, bigger size, etc, to understand
that this image is of type 1. Similarly, it finds such features in Jerry and knows that this image is of type 2.
Therefore, it classifies the images into two different classes without knowing who Tom is or Jerry is.

It can be further classified into two categories of algorithms:

o Clustering

o Association
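As a small illustration of clustering (assuming scikit-learn; the blob data is synthetic), k-means groups unlabeled points into clusters without ever seeing a label:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: 300 points drawn around 3 unknown group centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)
print(km.labels_[:10])        # cluster assignments found without any labels
print(km.cluster_centers_)    # the 3 centers the algorithm discovered
```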
Applications:

Recommender Systems: If you’ve ever used YouTube or Netflix, you’ve most likely encountered a video recommendation system. These systems are often placed in the unsupervised domain. We know things about videos, maybe their length, their genre, etc. We also know the watch history of many users. Taking into account users that have watched similar videos as you and then enjoyed other videos that you have yet to see, a recommender system can see this relationship in the data and prompt you with such a suggestion.

Buying Habits: It is likely that your buying habits are contained in a database somewhere
and that data is being bought and sold actively at this time. These buying habits can be
used in unsupervised learning algorithms to group customers into similar purchasing
segments. This helps companies market to these grouped segments and can even resemble
recommender systems.
Grouping User Logs: Less user-facing, but still very relevant, we can use unsupervised learning to group user logs and issues. This can help companies identify central themes in the issues their customers face and rectify them, by improving a product or designing an FAQ to handle common issues. Either way, it is something that is actively done, and if you’ve ever submitted an issue with a product or filed a bug report, it is likely that it was fed to an unsupervised learning algorithm to cluster it with other similar issues.

Reinforcement Learning

Reinforcement Learning is a part of Machine Learning where an agent is put in an environment and learns to behave in this environment by performing certain actions and observing the rewards it gets from those actions.
This type of Machine Learning is comparatively different. Imagine that you were dropped off on an isolated island! What would you do?
Panic? Yes, of course, initially we all would. But as time passes, you would learn how to live on the island. You would explore the environment, understand the climate conditions, the type of food that grows there, the dangers of the island, etc. This is exactly how Reinforcement Learning works: it involves an agent (you, stuck on the island) that is put in an unknown environment (the island), where it must learn by observing and performing actions that result in rewards.

Reinforcement Learning is mainly used in advanced Machine Learning areas such as self-driving cars, AlphaGo, etc.
Applications:

Video Games: One of the most common places to see reinforcement learning is in learning to play games. Look at DeepMind’s reinforcement learning applications AlphaZero and AlphaGo, which learned to play the game Go. A Mario-playing agent is another common example. Currently, I don’t know of any production-grade game that has a reinforcement learning agent deployed as its game AI, but I can imagine that this will soon be an interesting option for game devs to employ.
Industrial Simulation: For many robotic applications (think assembly lines), it is useful to
have our machines learn to complete their tasks without having to hardcode their processes.
This can be a cheaper and safer option; it can even be less prone to failure. We can also
incentivize our machines to use less electricity, so as to save us money. More than that, we
can start this all within a simulation so as to not waste money if we potentially break our
machine.

Resource Management: Reinforcement learning is good for navigating complex environments, and it can handle the need to balance certain requirements. Take, for example, Google’s data centers. They used reinforcement learning to balance the need to satisfy power requirements while doing it as efficiently as possible, cutting major costs. How does this affect the average person? Cheaper data storage costs for us as well, and less of an impact on the environment we all share.

1. Artificial Intelligence, Machine Learning, Deep learning


Deep Learning, Machine Learning, and Artificial Intelligence are among the most used terms on the internet for IT folks, and all three technologies are connected with each other. Artificial Intelligence (AI) can be understood as an umbrella that covers both machine learning and deep learning; or, we can say, deep learning and machine learning are both subsets of artificial intelligence.

As these technologies look similar, many people have the misconception that deep learning, machine learning, and Artificial Intelligence are all the same. But in reality, although all three technologies are used to build intelligent machines or applications that behave like a human, they differ in their functionality and scope.

That is, these three terms are often used interchangeably, but they do not refer to quite the same things. Let’s understand the fundamental difference between deep learning, machine learning, and Artificial Intelligence.

Artificial Intelligence is a branch of computer science that helps us create smart, intelligent machines. ML is a subfield of AI that helps teach machines and build AI-driven applications. Deep learning, in turn, is a sub-branch of ML that trains ML models with huge amounts of input data and complex algorithms, and mainly works with neural networks.
What is Artificial Intelligence (AI)?

Artificial Intelligence is defined as a field of science and engineering that deals with making intelligent machines or computers that perform human-like activities.
John McCarthy is known as the father of this amazing invention. There are some popular definitions of AI, which are as follows:

"AI is defined as the capability of machines to imitate intelligent human behavior."

"A computer system able to perform tasks that normally require human intelligence, such as visual
perception, speech recognition, decision-making, and translation between languages."

Goals Of Artificial Intelligence:


Following are the main goals of Artificial Intelligence:
1. Replicate human intelligence
2. Solve knowledge-intensive tasks
3. Form an intelligent connection of perception and action
4. Build machines which can perform tasks that require human intelligence, such as:
o Proving a theorem
o Playing chess
o Planning a surgical operation
o Driving a car in traffic
5. Create systems which can exhibit intelligent behavior, learn new things by themselves, demonstrate, explain, and advise their users.

Advantages of Artificial Intelligence:


Following are some main advantages of Artificial Intelligence:

o High accuracy with fewer errors: AI machines or systems make fewer errors and achieve high accuracy, as they take decisions based on prior experience or information.
o High speed: AI systems can make decisions very fast; this is why an AI system can beat a chess champion at chess.

o High reliability: AI machines are highly reliable and can perform the same action multiple times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human can be risky.
o Digital assistance: AI can be very useful in providing digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products matching customer requirements.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars which can make our journeys safer and hassle-free, facial recognition for security purposes, Natural Language Processing to communicate with humans in human language, etc.

Disadvantages Of Artificial Intelligence:

Every technology has some disadvantages, and the same goes for Artificial Intelligence. Despite being such an advantageous technology, it still has some disadvantages which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:

o High cost: The hardware and software requirements of AI are very costly, as AI systems require lots of maintenance to meet current world requirements.
o Can't think outside the box: Even though we are making smarter machines with AI, they still cannot think outside the box; a robot will only do the work for which it is trained or programmed.
o No feelings and emotions: An AI machine can be an outstanding performer, but it does not have feelings, so it cannot form any kind of emotional attachment with humans, and it may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people are becoming more dependent on devices and hence are losing their mental capabilities.
o No original creativity: Humans are creative and can imagine new ideas, but AI machines cannot match this power of human intelligence and cannot be creative and imaginative.
What is Deep Learning?
"Deep learning is defined as the subset of machine learning and artificial intelligence that is based on
artificial neural networks". In deep learning, the deep word refers to the number of layers in a
neural network.

Deep Learning is a set of algorithms inspired by the structure and function of the human brain. It
uses a huge amount of structured as well as unstructured data to teach computers and predicts
accurate results. The main difference between machine learning and deep learning technologies is
of presentation of data. Machine learning uses structured/unstructured data for learning, while
deep learning uses neural networks for learning models.

Types of Deep Learning:

1. Feed Forward Neural Network: A feed-forward neural network is simply an Artificial Neural Network in which the nodes do not form a cycle. In this kind of neural network, all the perceptrons are organized in layers, such that the input layer takes the input and the output layer generates the output. Since the hidden layers do not link with the outside world, they are called hidden layers. Each perceptron in one layer is connected to every node in the subsequent layer, so all of the nodes are fully connected. There are no connections between nodes in the same layer, and there are no back-loops in the feed-forward network. To minimize the prediction error, the backpropagation algorithm can be used to update the weight values.
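To make the layer-by-layer, no-cycles structure concrete, here is a minimal sketch of a single forward pass in NumPy; the layer sizes and the random (untrained) weights are arbitrary choices for illustration, and in practice backpropagation would update W1 and W2:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 4 inputs -> 8 hidden units -> 1 output, no cycles.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer: fully connected, no intra-layer links
    return sigmoid(h @ W2 + b2)  # output layer

x = rng.normal(size=(3, 4))      # a batch of 3 examples
print(forward(x))                # probabilities, shape (3, 1)
```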

2. Recurrent Neural Network: Recurrent neural networks are a variation of feed-forward networks in which each of the neurons in the hidden layers receives an input with a specific delay in time. A recurrent neural network mainly accesses preceding information from earlier iterations; for example, to guess the succeeding word in a sentence, one must know the words that were previously used. It not only processes the inputs but also shares weights across time, so the size of the model does not increase with the size of the input. However, recurrent neural networks have slow computational speed, do not consider any future input for the current state, and have problems remembering information from far in the past.

3. Convolutional Neural Network: Convolutional Neural Networks are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. These deep networks enable the construction of hierarchical image representations. To achieve the best accuracy on such tasks, deep convolutional neural networks are preferred over other neural networks.

4. Restricted Boltzmann Machine: RBMs are a variant of Boltzmann Machines. Here the neurons in the input layer and the hidden layer have symmetric connections between them, but there are no connections within a layer. In contrast, general Boltzmann machines do have internal connections inside the hidden layer. These restrictions in RBMs help the model train efficiently.

5. Autoencoders: An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here the number of hidden cells is smaller than the number of input cells, while the number of input cells is the same as the number of output cells. An autoencoder network is trained to make the output similar to the fed input, which forces AEs to find common patterns and generalize the data. Autoencoders are mainly used to learn a smaller representation of the input, which helps reconstruct the original data from the compressed data. The algorithm is comparatively simple, as it only requires the output to be identical to the input.

o Encoder: converts the input data into a lower-dimensional representation.

o Decoder: reconstructs the input from the compressed representation.
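A minimal sketch of this idea, assuming PyTorch is available and using arbitrary toy dimensions (a 20-dimensional input compressed to a 4-dimensional code), trains the network to reproduce its own input:

```python
import torch
import torch.nn as nn

# Toy autoencoder: 20-dim input -> 4-dim code -> 20-dim reconstruction.
model = nn.Sequential(
    nn.Linear(20, 4), nn.ReLU(),   # encoder: compress to 4 dimensions
    nn.Linear(4, 20),              # decoder: reconstruct the input
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(256, 20)           # unlabeled data: the input is its own target
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), x)    # reconstruction error
    loss.backward()
    opt.step()
print(loss.item())                 # should shrink as the code captures the patterns
```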
Applications:

o Self-Driving Cars: A self-driving car captures images of its surroundings and processes a huge amount of data, and then decides which action to take: turn left, turn right, or stop. By deciding the right action to take, it can reduce the accidents that happen every year.

o Voice-Controlled Assistance: When we talk about voice-controlled assistance, Siri is the first thing that comes to mind. You can tell Siri whatever you want it to do, and it will search for it and display the result for you.

o Automatic Image Caption Generation: Whatever image you upload, the algorithm will generate a caption accordingly. If you show it a blue-colored eye, it will display the image with a caption like "a blue-colored eye" at the bottom.

o Automatic Machine Translation: With the help of automatic machine translation and deep learning, we are able to convert text in one language into another.

Advantages:

o It lessens the need for feature engineering.
o It eliminates needless costs.
o It easily identifies difficult defects.
o It results in best-in-class performance on many problems.

Disadvantages:

o It requires an ample amount of data.
o It is quite expensive to train.
o It does not have a strong theoretical foundation.
3. Main Challenges of Machine Learning
During the development phase, our focus is to select a learning algorithm and train it on some data; the two things that might be a problem are a bad algorithm or bad data, or perhaps both of them.

The following are some of the challenges of ML:

1. Not enough training data.

Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work
properly. Even for very simple problems you typically need thousands of examples, and for complex
problems such as image or speech recognition you may need millions of examples.

2. Poor Quality of data:

Obviously, if your training data has lots of errors, outliers, and noise, it will be impossible for your machine learning model to detect a proper underlying pattern, and it will not perform well.

So put every ounce of effort into cleaning up your training data. No matter how good you are at selecting and hyper-tuning the model, this part plays a major role in helping us make an accurate machine learning model.

“Most Data Scientists spend a significant part of their time in cleaning data.”

There are a couple of cases when you’d want to clean up the data:

 If you see that some instances are clear outliers, just discard them or fix them manually.

 If some instances are missing a feature (e.g., 2% of users did not specify their age), you can either ignore these instances, fill the missing values with the median age (as sketched below), or train one model with the feature and one without it and compare.
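As a small sketch of the median-fill option (assuming pandas, with a made-up 'age' column):

```python
import pandas as pd

# Hypothetical user data with a missing 'age' feature.
df = pd.DataFrame({
    "age": [23, 31, None, 45, None, 52],
    "clicked": [0, 1, 0, 1, 1, 0],
})

median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age)  # fill missing values with the median
print(df)
```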

3. Irrelevant Features:

“Garbage in, garbage out (GIGO).”

"Garbage in, garbage out" means that even if our model is excellent, if we feed it garbage data the result will also be garbage (output). Our training data must always contain more relevant features and few to no irrelevant ones.
Much of the credit for a successful machine learning project goes to coming up with a good set of features to train on (often referred to as feature engineering), which includes feature selection, feature extraction, and creating new features, other interesting topics to be covered in upcoming blogs.

4. Nonrepresentative training data:

To make sure that our model generalizes well, we have to make sure that our training data is representative of the new cases that we want to generalize to.

If we train our model using a nonrepresentative training set, it won’t be accurate in its predictions; it will be biased toward one class or group.
For example, let us say you are trying to build a model that recognizes the genre of music. One way to build your training set is to search YouTube and use the resulting data. Here we assume that YouTube’s search engine is providing representative data, but in reality the search will be biased towards popular artists, and maybe even the artists that are popular in your location (if you live in India, you will mostly get the music of Arijit Singh, Sonu Nigam, etc.).

So use representative data during training, so your model won’t be biased toward one or two classes when it works on testing data.

5. Overfitting the Training Data

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The possible solutions are:

• To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model

• To gather more training data

• To reduce the noise in the training data (e.g., fix data errors and remove outliers)

6. Underfitting the Training Data

Underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the
underlying structure of the data. For example, a linear model of life satisfaction is prone to
underfit; reality is just more complex than the model, so its predictions are bound to be
inaccurate, even on the training examples.

The main options to fix this problem are:

• Selecting a more powerful model, with more parameters

• Feeding better features to the learning algorithm (feature engineering)

• Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
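To see both failure modes in one place, here is a hedged sketch (assuming scikit-learn, on synthetic data; all numbers are made up) that fits polynomial models of increasing degree. A degree-1 model underfits (high error on both sets), while a degree-15 model overfits (low training error, high test error):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=120)   # noisy nonlinear data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```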

Statistical Learning:

Statistical learning refers to a vast set of tools for understanding data. These tools can be
classified as supervised or unsupervised. Broadly speaking, supervised statistical learning
involves building a statistical model for predicting, or estimating, an output based on one or
more inputs. Problems of this nature occur in fields as diverse as business, medicine,
astrophysics, and public policy. With unsupervised statistical learning, there are inputs but
no supervising output; nevertheless we can learn relationships and structure from such
data.

Supervised Learning and Unsupervised Learning:


Supervised Machine Learning:

Supervised learning is a machine learning method in which models are trained using labeled data. In supervised learning, models need to find the mapping function that maps the input variables (X) to the output variable (Y).
Supervised learning needs supervision to train the model, similar to how a student learns in the presence of a teacher. Supervised learning can be used for two types of problems: Classification and Regression.

Example: Suppose we have images of different types of fruits. The task of our supervised learning model is to identify the fruits and classify them accordingly. To identify the images in supervised learning, we will give the input data as well as the output; that is, we will train the model on the shape, size, color, and taste of each fruit. Once training is completed, we will test the model by giving it a new set of fruits. The model will identify the fruit and predict the output using a suitable algorithm.

Unsupervised Machine Learning:

Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the input data. Unsupervised learning does not need any supervision; instead, it finds patterns in the data on its own.

Unsupervised learning can be used for two types of problems: Clustering and
Association.

Example: To understand unsupervised learning, we will use the example given above. Unlike in supervised learning, here we will not provide any supervision to the model. We will just provide the input dataset and allow the model to find patterns in the data. With the help of a suitable algorithm, the model will train itself and divide the fruits into different groups according to their most similar features.
The main differences between Supervised and Unsupervised learning are given below:

o Supervised learning algorithms are trained using labeled data; unsupervised learning algorithms are trained using unlabeled data.

o A supervised learning model takes direct feedback to check whether it is predicting the correct output; an unsupervised learning model does not take any feedback.

o A supervised learning model predicts the output; an unsupervised learning model finds hidden patterns in the data.

o In supervised learning, input data is provided to the model along with the output; in unsupervised learning, only input data is provided.

o The goal of supervised learning is to train the model so that it can predict the output when given new data; the goal of unsupervised learning is to find hidden patterns and useful insights in an unknown dataset.

o Supervised learning needs supervision to train the model; unsupervised learning does not need any supervision.

o Supervised learning problems can be categorized into Classification and Regression; unsupervised learning problems can be categorized into Clustering and Association.

o Supervised learning can be used for cases where we know both the inputs and the corresponding outputs; unsupervised learning can be used for cases where we have only input data and no corresponding output data.

o A supervised learning model generally produces accurate results; an unsupervised learning model may give less accurate results in comparison.

o Supervised learning is not close to true Artificial Intelligence, as we must first train the model on each labeled example before it can predict the correct output; unsupervised learning is closer to true Artificial Intelligence, as it learns on its own, much as a child learns daily routine things from experience.
Training and Test Loss:

Machine learning is a branch of Artificial Intelligence which allows machines to perform data analysis and make predictions. However, if the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance. In machine learning, these errors will always be present, as there is always a slight difference between the model’s predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.

Errors in Machine Learning

In machine learning, an error is a measure of how accurately an algorithm can make predictions for a previously unknown dataset. On the basis of these errors, we select the machine learning model that performs best on the particular dataset. There are mainly two types of errors in machine learning:

Reducible errors: These errors can be reduced to improve the model's accuracy. They can further be classified into bias and variance.

Irreducible errors: These errors will always be present in the model regardless of which algorithm is used. Their cause is unknown variables whose influence can't be reduced.

What is Bias?

In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values, and this difference is known as bias error, or error due to bias. It can be defined as the inability of machine learning algorithms such as Linear Regression to capture the true relationship between the data points. Each algorithm begins with some amount of bias, because bias arises from assumptions in the model that make the target function simpler to learn. A model has either:

o Low Bias: A low bias model will make fewer assumptions about the form of
the target function.
o High Bias: A model with a high bias makes more assumptions, and the
model becomes unable to capture the important features of our dataset. A
high bias model also cannot perform well on new data.

Generally, a linear algorithm has high bias, as this is what makes it learn fast. The simpler the algorithm, the more bias it is likely to introduce. A nonlinear algorithm, on the other hand, often has low bias.

Some examples of machine learning algorithms with low bias are Decision Trees,
k-Nearest Neighbours and Support Vector Machines. At the same time, an
algorithm with high bias is Linear Regression, Linear Discriminant Analysis
and Logistic Regression.

Ways to reduce High Bias:

High bias mainly occurs due to an overly simple model. Below are some ways to reduce high bias:

o Increase the number of input features, as the model is underfitted.
o Decrease the regularization term.
o Use more complex models, for example by including some polynomial features.

What is a Variance Error?

Variance specifies the amount by which the prediction would change if different training data were used. In simple words, variance tells how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between input and output variables. Variance errors are classified as either low variance or high variance.

Low variance means there is a small variation in the prediction of the target function with
changes in the training data set. At the same time, High variance shows a large variation
in the prediction of the target function with changes in the training dataset.

A model that shows high variance learns a lot from the training dataset and performs well on it, but does not generalize well to unseen data. As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it leads to overfitting of the model. A model with high variance has the following problems:

o A high variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility to fit the model, have high variance.

Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis. At the same time, algorithms with high variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.

Ways to Reduce High Variance:

o Reduce the number of input features or parameters, as the model is overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term.

Different Combinations of Bias-Variance

There are four possible combinations of bias and variance:

1. Low Bias, Low Variance: The combination of low bias and low variance gives an ideal machine learning model. However, it is not practically possible.
2. Low Bias, High Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns a large number of parameters, and hence leads to overfitting.
3. High Bias, Low Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses very few parameters. It leads to underfitting problems in the model.
4. High Bias, High Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.

Tradeoffs in Statistical Learning:

Bias vs variance: A trade-off

While building a machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting. If the model is very simple, with fewer parameters, it may have low variance and high bias. Whereas, if the model has a large number of parameters, it will have high variance and low bias. So it is necessary to strike a balance between bias and variance errors, and this balance between bias error and variance error is known as the Bias-Variance trade-off.

For an accurate prediction of the model, algorithms need low variance and low bias. But this is not possible, because bias and variance are related to each other:

o If we decrease the variance, it will increase the bias.

o If we decrease the bias, it will increase the variance.
The Bias-Variance trade-off is a central issue in supervised learning. Ideally, we want a model that accurately captures the regularities in its training data and simultaneously generalizes well to unseen data. Unfortunately, doing both simultaneously is not possible: a high-variance algorithm may perform well on training data but may overfit noisy data, whereas a high-bias algorithm generates a much simpler model that may not even capture important regularities in the data. So we need to find a sweet spot between bias and variance to make an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot that strikes a balance between bias and variance errors.
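One way to make this trade-off visible is to simulate it: train the same model class on many training sets drawn from the same source, then measure how far the average prediction is from the truth (bias) and how much predictions scatter across training sets (variance). The sketch below (assuming scikit-learn and NumPy, with a synthetic sin-shaped target) typically shows high bias for the simplest model and high variance for the most complex one:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)                     # true target function
x_test = np.linspace(-3, 3, 50)[:, None]

for degree in (1, 4, 15):
    preds = []
    for _ in range(200):                    # many training sets from the same source
        X = rng.uniform(-3, 3, size=(30, 1))
        y = f(X).ravel() + rng.normal(scale=0.3, size=30)
        m = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        preds.append(m.predict(x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test).ravel()) ** 2)  # squared bias
    var = preds.var(axis=0).mean()                                   # variance
    print(f"degree {degree:2d}: bias^2 {bias2:.3f}, variance {var:.3f}")
```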

Estimating risk statistics:

Nowadays, Machine Learning plays a big role in helping organizations in different aspects, such as analyzing structured and unstructured data, detecting risks, automating manual tasks, and making data-driven decisions for business growth. It can replace a huge amount of human labour by applying automation, and it provides insights that help assess, monitor, and reduce risks for an organization.

Although machine learning can be used as a risk management tool, it also carries many risks itself. While 49% of companies are exploring or planning to use machine learning, only a small minority recognize the risks it poses: only 41% of organizations in a global McKinsey survey say they can comprehensively identify and prioritize machine learning risks. Hence, it is necessary to be aware of some of the risks of machine learning, and how they can be adequately evaluated and managed.

Below are a few risks associated with Machine Learning:

1. Poor Data

As we know, a machine learning model only works on the data that we provide to it; it depends completely on the human-given training data. What we put in is what we get out, so if we enter poor data, the ML model will generate erratic output. Poor data, or dirty data, includes errors in the training data, outliers, and unstructured data that cannot be adequately interpreted by the model.

2. Overfitting

Overfitting is commonly found in non-parametric and non-linear models, which are more flexible in learning the target function. An overfitted model fits the training data so perfectly that it fails to learn the underlying variability; this means it won't generalize well when it is tested on real data.

3. Biased data

Biased data means that human biases can creep into your datasets and spoil outcomes. For instance, the popular selfie editor FaceApp was initially inadvertently trained to make faces "hotter" by lightening the skin tone, a result of having been fed a much larger quantity of photos of people with lighter skin tones.

4. Lack of strategy and experience:

Machine learning is a very new technology in the IT sector; hence, the scarcity of trained and skilled people is a very big issue for industry. Further, the lack of strategy and experience that comes with fewer resources leads to wasted time and money and negatively affects an organization's production and revenue. According to a survey of over 2000 people, 860 reported a lack of clear strategy and 840 reported a lack of talent with appropriate skill sets. This shows how a lack of strategy and relevant experience creates a barrier to the development of machine learning in organizations.

5. Security Risks

Security of data is one of the major issues for the IT world, and it also affects organizations' production and revenue. When it comes to machine learning, various types of security risks exist that can compromise machine learning algorithms and systems. Data scientists and machine learning experts have reported three types of attacks, primarily against machine learning models. These are as follows:

Evasion attacks: These attacks commonly arise from adversarial input introduced into the models; hence they are also known as adversarial attacks. An evasion attack happens when the network is given adversarial examples as input that can influence the classifiers, i.e., disrupt the ML model. Such a security violation involves supplying malicious data that gets classified as genuine; a targeted attack attempts to allow a specific intrusion or disruption, or alternatively to create general mayhem.

Data Poisoning attacks: In data poisoning attacks, the attacker knows the source of the raw data used to train the ML models, and strives to bias or "poison" that data to compromise the resulting machine learning model's accuracy. The effects of these attacks can be countered by prevention and detection; through proper monitoring, we can protect ML models from data poisoning. Model skewing is one of the most common types of data poisoning attack, in which spammers trick the classifier into categorizing bad input as good.

Model Stealing: Model stealing is one of the most important security risks in machine learning. Model stealing techniques are used to create a clone model based on information or data used in the training of a base model. Model stealing is a major concern for ML experts because ML models are valuable intellectual property of organizations and are built on sensitive user data such as account details, transactions, financial information, etc. Attackers use the public API and sample data of the original model to reconstruct another model with a similar look and feel.

6. Data privacy and confidentiality

Data is one of the key ingredients in developing machine learning models. We know machine learning requires a huge amount of structured and unstructured data for training models so they can make accurate predictions in the future. Hence, to achieve good results, we need to secure data by defining privacy terms and conditions and keeping it confidential. Hackers can launch data extraction attacks that fly under the radar and can put your entire machine learning system at risk.

7. Third-party risks

These types of security risks are less well known in industry, as the chances of them occurring are minimal. Third-party risks generally exist when someone outsources their business to third-party service providers who may fail to properly govern a machine learning solution. This leads to various types of data breaches in the ML industry.

8. Regulatory challenges

Regulatory challenges occur whenever there is a knowledge gap in an organization, such as team members not being aware of how ML algorithms work and make decisions. A lack of knowledge to justify decisions to regulators can thus also be a major security risk for industries.

Sampling Distribution of the Estimator:

In statistics, the sampling distribution is the probability distribution of a given statistic estimated on the basis of a random sample. It provides a generalized approach to statistical inference. An estimator is the mathematical rule used to calculate a sample statistic; an estimate is the result of the estimation.

The sampling distribution of an estimator depends on the sample size, and the effect of changing the sample size has to be determined. An estimate has a single numerical value, and hence such estimates are called point estimates. There are various estimators, such as the sample mean, sample standard deviation, proportion, variance, range, etc.

Sampling distribution of the mean: This is the distribution of sample means computed from samples drawn from the population. For all sample sizes, it is likely to be normal if the population distribution is normal, and its mean is equal to the population mean. The standard deviation of the sampling distribution of the mean (the standard error) is:

    σ_x̄ = σ / √n

where σ_x̄ is the standard deviation of the sampling mean, σ is the population standard deviation, and n is the sample size.
As the size of the sample increases, the spread of the sampling distribution of the mean
decreases. But the mean of the distribution remains the same and it is not affected by the
sample size.

The standard error of the standard deviation describes the sampling distribution of the sample standard deviation. For a normal population it is approximately:

    σ_s ≈ σ / √(2n)

where σ_s is the standard deviation of the sampling distribution of the standard deviation. This distribution is positively skewed for small n but becomes approximately normal for sample sizes greater than 30.
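A short simulation (NumPy, with made-up population parameters) illustrates both facts about the sampling distribution of the mean: its mean matches the population mean, and its standard deviation shrinks like σ/√n:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, trials = 2.0, 50, 10_000

# Draw many samples of size n and record each sample mean.
means = rng.normal(loc=10.0, scale=sigma, size=(trials, n)).mean(axis=1)

print(means.mean())            # close to the population mean, 10.0
print(means.std(ddof=0))       # close to sigma / sqrt(n)
print(sigma / np.sqrt(n))      # the theoretical standard error, ~0.283
```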

Empirical Risk Minimization(ERM):

The Empirical Risk Minimization (ERM) principle is a learning paradigm which consists in
selecting the model with minimal average error over the training set. This so-called training
error can be seen as an estimate of the risk (due to the law of large numbers), hence the
alternative name of empirical risk.

By minimizing the empirical risk, we hope to obtain a model with a low value of the risk. The
larger the training set size is, the closer to the true risk the empirical risk is.

If we were to apply the ERM principle without more care, we would end up learning by heart (memorizing the training set), which we know is bad. This issue is more generally related to the overfitting phenomenon, which can be avoided by restricting the space of possible models when searching for the one with minimal error. The most severe, and yet most common, restriction is encountered in the contexts of linear classification or linear regression. Another approach consists in controlling the complexity of the model by regularization.

While building our machine learning model, we choose a function that reduces the
differences between the actual and the predicted output i.e. empirical risk. We aim to
reduce/minimize the empirical risk as an attempt to minimize the true risk by hoping that
the empirical risk is almost the same as the true risk.
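As a concrete sketch (NumPy only, with synthetic data and a hypothetical empirical_risk helper), the empirical risk is just the average loss over the training set, and ERM picks the parameters that minimize it; for squared-error linear regression this is the least-squares solution:

```python
import numpy as np

# Empirical risk: the average loss of a model over the training set.
def empirical_risk(w, b, X, y):
    preds = X @ w + b
    return np.mean((preds - y) ** 2)          # squared-error loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# ERM for linear regression has a closed form (the least-squares solution).
Xb = np.c_[X, np.ones(len(X))]                # append a column for the intercept
w_hat = np.linalg.lstsq(Xb, y, rcond=None)[0]
print(empirical_risk(w_hat[:3], w_hat[3], X, y))   # the minimal average training loss
```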

Empirical risk minimization depends on four factors:

 The size of the dataset: the more data we get, the closer the empirical risk approaches the true risk.

 The complexity of the true distribution: if the underlying distribution is too complex, we might need more data to get a good approximation of it.

 The class of functions we consider: if this class is too small, the approximation error will be high; if it is too large, the empirical risk can be far from the true risk.

 The loss function: it can cause trouble if it gives very high loss in certain conditions.
The L2 Regularization is an example of empirical risk minimization.

L2 Regularization
In order to handle the problem of overfitting, we use regularization techniques. A regression problem using L2 regularization is also known as ridge regression.
In ridge regression, insignificant predictors are penalized. This method shrinks the coefficients, which helps deal with independent variables that are highly correlated. Ridge regression adds the "squared magnitude" of the coefficients, i.e., the sum of squares of the weights of all features, as a penalty term to the loss function:

    Loss = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ wⱼ²

Here, λ is the regularization parameter that controls the strength of the penalty.
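A brief sketch (assuming scikit-learn, with synthetic data containing two nearly identical predictors) shows how the penalty stabilizes the coefficients; the alpha argument plays the role of λ here:

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)   # two highly correlated predictors
y = X[:, 0] + rng.normal(scale=0.1, size=100)

print(LinearRegression().fit(X, y).coef_)   # unstable: can give large opposite-signed weights
print(Ridge(alpha=1.0).fit(X, y).coef_)     # penalty shrinks and stabilizes the coefficients
```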
