Machine Learning Introduction - A Comprehensive Guide
This is the first of a series of articles in which I will describe machine learning concepts,
types, algorithms and Python implementations.
https://towardsdatascience.com/machine-learning-introduction-a-comprehensive-guide-af6712cf68a3 1/13
2/2/2020 Machine Learning Introduction: A Comprehensive Guide
The information presented in this series comes from several sources. The main ones
are:
Python Machine Learning book (by Sebastian Raschka & Vahid Mirjalili)
Python Data Science and Machine Learning course by Jose Portilla (Udemy)
Machine Learning y Data Science con Python course by Manuel Garrido (Udemy)
Machine Learning is one of the subfields of Artificial Intelligence and can be described
as:
“Machine Learning is the science of getting computers to learn and act like humans do, and
improve their learning over time in autonomous fashion, by feeding them data and
information in the form of observations and real-world interactions.” — Dan Fagella
Machine learning offers an efficient way of capturing knowledge in data to gradually
improve the performance of predictive models and make data-driven decisions. It has
become a ubiquitous technology and we enjoy its benefits in e-mail spam filters,
self-driving cars, image and voice recognition, and world-class Go players.
The next video shows a machine learning application for real-time event detection in
video surveillance.
Usually there is one column (or feature) that we will call the target, label or
response, and it is the value or class that we are trying to predict.
Machine learning algorithms usually have some settings that are chosen before training
rather than learned from the data. For example, in Decision Trees there are settings like
the maximum depth, the number of nodes and the number of leaves. These settings are
called hyperparameters.
Supervised learning
Unsupervised learning
Deep learning
In this series we will explore and study all of the mentioned types of machine learning,
and we will also dig deeper into “reinforcement learning”, a family of techniques that is
often combined with deep learning.
Supervised Learning
Supervised learning refers to machine learning models that are trained with a set of
samples where the desired output signals (or labels) are already known. The models
learn from these known results and adjust their inner parameters to fit the input data.
Once a model is properly trained, it can make accurate predictions about unseen or
future data.
There are two main applications of supervised learning: classification and regression.
1. Classification:
An example of binary classification: there are 2 classes, circles and crosses, and 2
features, X1 and X2. The model is able to find the relationship between the features of
each data point and its class, and to set a boundary line between the classes, so that,
when given a new data point, it can estimate the class the point belongs to from its features.
In this case, the new data point falls into the circle subspace and, therefore, the model
will predict its class to be a circle.
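As a minimal sketch of this idea, the snippet below trains a nearest-centroid classifier on two features. The article does not prescribe a specific algorithm here, so the choice of classifier, the data points and the helper names `fit_centroids` and `predict` are all assumptions made for illustration. With two classes, the boundary implied by this classifier is a straight line halfway between the two centroids.

```python
import math

def fit_centroids(samples, labels):
    """Learn one centroid (mean point) per class from labeled samples."""
    sums, counts = {}, {}
    for (x1, x2), label in zip(samples, labels):
        sx1, sx2 = sums.get(label, (0.0, 0.0))
        sums[label] = (sx1 + x1, sx2 + x2)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx1 / counts[label], sx2 / counts[label])
            for label, (sx1, sx2) in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

# Hypothetical training data: (X1, X2) features with known classes.
samples = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.0), (4.0, 4.2), (4.5, 3.8), (3.9, 4.1)]
labels = ["circle", "circle", "circle", "cross", "cross", "cross"]

centroids = fit_centroids(samples, labels)
new_point = (1.2, 0.9)  # an unseen data point
predicted_class = predict(centroids, new_point)
print(predicted_class)
```

The new point lies much closer to the circles than to the crosses, so it ends up on the circle side of the boundary.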
2. Regression:
Regression is used to predict continuous outcomes rather than categories. In this type
of learning we are given a number of predictor (explanatory) variables and a continuous
response variable (outcome), and we try to find a relationship between those variables
that allows us to predict a continuous outcome.
An example of linear regression: given X and Y, we fit a straight line that minimizes the
distance (under some criterion, such as the sum of squared errors, SSE) between the
sample points and the fitted line. Then we use the learned intercept and slope of the
fitted line to predict the outcome of new data.
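The fit described above can be sketched in a few lines of plain Python using the closed-form least-squares solution for a single predictor. The data values and the function names `fit_line` and `sse` are invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    # This choice of slope and intercept minimizes the SSE.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared errors between the samples and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 6.0  # predict the outcome for a new X
print(intercept, slope, prediction)
```

Any other line, for example one with a slightly different slope, gives a larger SSE on the same points.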
Unsupervised Learning
In unsupervised learning we deal with unlabeled data of unknown structure and the
goal is to explore the structure of the data to extract meaningful information, without
the reference of a known outcome variable.
1. Clustering:
2. Dimensionality Reduction:
It is common to work with data in which each observation comes with a high number
of features, in other words, data with high dimensionality. This can be a challenge for
the computational performance of machine learning algorithms, so dimensionality
reduction is one of the techniques used for dealing with this issue.
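To make this concrete, here is a small sketch that compresses 2-dimensional data into 1 dimension by projecting it onto its direction of largest variance (the first principal component). The article does not name a specific method at this point, so using principal component analysis, the power-iteration approach, the data and the function names are all assumptions for illustration.

```python
import math

def first_principal_component(points, iterations=100):
    """Return the data mean and the unit direction of largest variance (2-D data)."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [[p[0] - mean[0], p[1] - mean[1]] for p in points]
    # 2x2 sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(2)]
           for i in range(2)]
    # Power iteration: repeatedly multiplying a vector by the covariance
    # matrix and renormalizing converges to its dominant eigenvector.
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return mean, v

def project(points, mean, direction):
    """Replace each 2-D point by its coordinate along the given direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

# Hypothetical data with two strongly correlated features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mean, direction = first_principal_component(points)
compressed = project(points, mean, direction)  # one number per observation
print(direction, compressed)
```

Because the two features move together, a single coordinate along the diagonal direction keeps most of the information in the original pairs.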
Deep Learning
Deep learning is a subfield of machine learning that uses a hierarchical structure of
artificial neural networks, which are loosely inspired by the human brain, with
the neuron nodes connected like a web. That architecture allows the model to tackle
data analysis in a non-linear way.
The first layer of the neural network takes raw data as an input, processes it, extracts
some information and passes it to the next layer as an output. Each layer then
processes the information given by the previous one and repeats, until data reaches the
final layer, which makes a prediction.
This prediction is compared with the known result and then, by a method called
backpropagation, the model is able to learn the weights that yield accurate outputs.
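The forward pass and the backpropagation update can be sketched for a very small network with one hidden layer. This is only an illustration under several assumptions: the network size, the sigmoid activation, the squared-error loss, the absence of bias terms, the starting weights and the single training example are all chosen here for brevity and do not come from the article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Pass the input through one hidden layer and one output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)))
              for weights in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (output - target)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to each hidden neuron.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    new_w_out = [w_out[j] - lr * delta_out * hidden[j]
                 for j in range(len(hidden))]
    new_w_hidden = [[w_hidden[j][i] - lr * delta_hidden[j] * x[i]
                     for i in range(len(x))]
                    for j in range(len(hidden))]
    return new_w_hidden, new_w_out

# Hypothetical single training example and starting weights.
x, target = [1.0, 0.5], 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]

_, before = forward(x, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print(before, after)
```

After the updates, the network's prediction for this example is closer to the known result than it was with the starting weights.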
Reinforcement Learning
Reinforcement learning is one of the most important branches of machine learning, and
it is often combined with Deep Learning. The goal is to build a model in which an agent
takes actions and improves its performance over time. This improvement is achieved by
giving a specific reward each time the agent performs an action that belongs to the set
of actions that the developer wants the agent to perform.
The reward is a measurement of how good the action was for achieving a
predefined goal. The agent then uses this feedback to adjust its future behaviour, with
the objective of obtaining the most reward.
One common example is a chess engine, where the agent chooses among a series of
possible actions depending on the arrangement of the board (which is the environment’s
state), and the reward is given when winning or losing the game.
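Chess is far too large for a short example, so the sketch below uses a toy stand-in: an agent in a corridor of five positions that is rewarded only when it reaches the last one. It uses tabular Q-learning with epsilon-greedy exploration, which is one common reinforcement learning algorithm and not necessarily the one this series will use. The environment, the constants and the function names are hypothetical.

```python
import random

N_STATES = 5        # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy agent."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                best = max(q[state])             # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy action learned for every non-goal position.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

Here the board's state is replaced by the agent's position, and the only feedback is the reward at the goal, yet the agent learns to prefer stepping right from every position.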
Preprocessing:
Preprocessing is one of the most crucial steps in any Machine Learning application.
Data usually comes in a format that is not optimal (or is even inadequate) for the model
to process. In those cases preprocessing is mandatory.
Many algorithms require the features to be on the same scale (for example, in the
[0,1] range) to perform optimally, and this is often done by applying
normalization or standardization techniques to the data.
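The two scaling techniques can be sketched as follows. Normalization is taken here to mean min-max scaling to the [0,1] range, and standardization to mean subtracting the mean and dividing by the standard deviation. The feature values and function names are made up for the example.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Center values on 0 and scale them to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature column.
ages = [18.0, 25.0, 32.0, 60.0]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
print(normalized)
print(standardized)
```

Both functions assume the column is not constant, because a constant column would lead to a division by zero.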
We can also find in some cases that the selected features are correlated and, therefore,
partly redundant for extracting meaningful information. In those cases we can use
dimensionality reduction techniques to compress the features into a lower-dimensional
subspace.
Finally, we’ll randomly split our original dataset into training and testing subsets.
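A random split can be sketched with the standard library alone. The 25% test fraction, the fixed seed and the function name `train_test_split` are assumptions chosen for this illustration. Libraries such as scikit-learn offer a helper with the same name, but this is a hand-written version.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 observations, represented by their ids.
dataset = list(range(20))
train_set, test_set = train_test_split(dataset)
print(len(train_set), len(test_set))
```

Every observation ends up in exactly one of the two subsets, so the test subset stays unseen during training.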
We will also use a technique called cross-validation to estimate how well our model will
perform on real-world data before using the testing subset for the final evaluation
of the model.
This technique divides the training dataset into smaller training and validation subsets,
then estimates the generalization ability of the model, in other words, how well it can
predict outcomes when provided with new data. It repeats the process K times, each
time holding out a different validation subset, and computes the average performance
of the model by dividing the sum of the obtained metrics by the number of iterations, K.
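A minimal sketch of K-fold cross-validation is shown below. The "model" is deliberately trivial (it always predicts the mean of the training targets), and the data, the metric and all function names are hypothetical. The folds are taken in order without shuffling, which is a simplification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = [i for i in range(n_samples)
                    if i < start or i >= start + size]
        yield training, validation
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Average the validation score over k train/validate rounds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model,
                            [xs[i] for i in val_idx],
                            [ys[i] for i in val_idx]))
    return sum(scores) / k  # sum of the metrics divided by K

# Hypothetical "model": always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_squared_error(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)

xs = list(range(10))
ys = [2.0 * x for x in xs]
average_mse = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_squared_error)
print(average_mse)
```

Each observation is used for validation exactly once and for training in the other K-1 rounds.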
In general, the default parameters of the machine learning algorithms provided by the
libraries are not the best ones to use with our data, so we will use hyperparameter
optimization techniques to fine-tune the model’s performance.
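One simple hyperparameter optimization technique is grid search: try every candidate value and keep the one that scores best on validation data. The sketch below tunes the number of neighbors `k` of a small hand-written k-nearest-neighbors classifier. The classifier, the one-feature data set (with two deliberately noisy labels) and the candidate values are all assumptions for illustration.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train_x, train_y, val_x, val_y, k):
    """Fraction of validation points that are classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(val_x, val_y))
    return hits / len(val_x)

def grid_search(train_x, train_y, val_x, val_y, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # first candidate with the top score
    return best_k, scores

# Hypothetical one-feature data; two training labels are noisy on purpose.
train_x = [1.0, 1.5, 2.0, 2.2, 6.0, 6.5, 7.0, 7.5]
train_y = [0, 0, 0, 1, 1, 1, 1, 0]
val_x = [1.2, 1.9, 6.2, 7.4]
val_y = [0, 0, 1, 1]

best_k, scores = grid_search(train_x, train_y, val_x, val_y, candidates=(1, 3, 5))
print(best_k, scores)
```

With a single neighbor the classifier copies a noisy label, while voting over more neighbors smooths it out, which is why the search prefers a larger `k` here. In practice the score for each candidate would usually come from cross-validation rather than from one fixed validation set.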
If we are satisfied with the value of the metric obtained, we can then use the model to
make predictions on future data.
Wrap Up
In this article we learned what Machine Learning is, painting a big picture of its nature,
motivation and applications.
We also learned some basic notations and terminology and the different kinds of
machine learning algorithms:
Preprocessing.
Selecting a model.
Evaluating.
As stated at the beginning of the article, this is the first of a series of articles, and it
was a general introduction. This will be an exciting journey as we will learn how to apply
many powerful techniques.
This will be a technical series and, therefore, we will explore some calculus, linear
algebra, statistics and Python concepts, as they will be necessary to understand the
main concepts and how the algorithms work. But do not worry, we will take a gentle
approach to all these concepts so that, even if you do not have a technical background,
you will not feel overwhelmed.
Stay tuned!