Machine Learning Introduction - A Comprehensive Guide

This document provides an introduction to machine learning concepts. It defines machine learning as getting computers to learn like humans by feeding them data. There are three main types of machine learning: supervised learning which uses labeled training data, unsupervised learning which finds patterns in unlabeled data, and deep learning which uses neural networks. Supervised learning includes classification to predict categories and regression to predict continuous values. Unsupervised learning includes clustering to group similar data and dimensionality reduction to reduce redundant features.

Uploaded by

Nischay Gowda


2/2/2020 Machine Learning Introduction: A Comprehensive Guide

Machine Learning Introduction: A Comprehensive Guide
Victor Roman
Dec 3, 2018 · 10 min read

This is the first of a series of articles in which I will describe machine learning concepts,
types, algorithms and Python implementations.

The main goals of this series are:

1. Creating a comprehensive guide to machine learning theory and intuition.

https://towardsdatascience.com/machine-learning-introduction-a-comprehensive-guide-af6712cf68a3 1/13

2. Sharing and explaining machine learning projects, developed in Python, to show in
a practical way the concepts and algorithms explained, as well as how they can be
applied to real-world problems.

3. Leaving a digital footprint of my knowledge of the subject and inspiring others to
learn and apply machine learning in their own fields.

The information presented in this series comes from several sources, the main
ones being:

Machine Learning Engineer NanoDegree (Udacity)

Python Machine Learning book (by Sebastian Raschka & Vahid Mirjalili)

Deep Learning with Python book (by Francois Chollet)

Machine Learning Mastery with Python book (by Jason Brownlee)

Python Data Science and Machine Learning course by Jose Portilla (Udemy)

Machine Learning y Data Science con Python course by Manuel Garrido (Udemy)

What Is Machine Learning?

Thanks to the sharp drop in the price of technology and sensors, we can now create, store
and send more data than ever before. By some widely cited estimates, up to ninety percent
of the data in the world today has been created in the last two years alone, and about
2.5 quintillion bytes of data are created each day at our current pace, a pace that is
only expected to grow. This data feeds machine learning models and is the main driver of
the boom that this science has experienced in recent years.

Machine Learning is one of the subfields of Artificial Intelligence and can be described
as:

“Machine Learning is the science of getting computers to learn and act like humans do, and
improve their learning over time in autonomous fashion, by feeding them data and
information in the form of observations and real-world interactions.” — Dan Fagella

Machine learning offers an efficient way of capturing knowledge in data to gradually
improve the performance of predictive models and make data-driven decisions. It has
become a ubiquitous technology and we enjoy its benefits in e-mail spam filters,
self-driving cars, image and voice recognition and world-class Go players.

The next video shows a machine learning application: real-time event detection for video
surveillance.

Real-time event detection for video surveillance applications

Basic Terminology and Notations

Machine learning generally uses matrix and vector notation to refer to the
data. The data is normally arranged in matrix form, where:

Each separate row of the matrix is a sample, observation or data point.

Each column is a feature (or attribute) of that observation.

Usually there is one column (or feature), which we will call the target, label or
response, and it is the value or class that we are trying to predict.
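As a minimal sketch of this notation, the snippet below stores a small dataset as a list of rows and separates the features from the target. The house data and column meanings are made up for illustration and do not come from any of the sources of this series.

```python
# Hypothetical toy dataset: each row is one sample (here, one house).
# Columns: area in square meters, number of rooms, price (the target).
data = [
    [50.0, 2, 110_000.0],
    [80.0, 3, 175_000.0],
    [120.0, 4, 260_000.0],
]

# Split every row into its features (all columns but the last)
# and its target (the last column).
X = [row[:-1] for row in data]   # feature matrix: one row per sample
y = [row[-1] for row in data]    # target vector: one value per sample

n_samples = len(X)
n_features = len(X[0])
print(n_samples, n_features)     # 3 2
print(X[1], y[1])                # features and target of the second sample
```

In practice this matrix would usually live in a NumPy array or a pandas DataFrame, but the row and column roles stay the same.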


To train a machine learning model is to provide a machine learning algorithm with
training data to learn from.

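Machine learning algorithms usually have settings that are chosen before training rather
than learned from the data. For example, in Decision Trees there are settings like the
maximum depth, number of nodes, number of leaves… These settings are called
hyperparameters, as opposed to the inner parameters that the model learns during training.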

Generalization is the ability of the model to make predictions on new data.

Types of machine learning

The types of machine learning that will be studied through this series are:

Supervised learning

Unsupervised learning

Deep learning.

In this series we will explore and study all of the mentioned types of machine learning,
and we will also dig deeper into reinforcement learning, a family of techniques that is
often combined with deep learning.

Supervised Learning
Supervised learning refers to machine learning models that are trained with a
set of samples where the desired output signals (or labels) are already known. The
models learn from these already known results and adjust their inner
parameters to fit the input data. Once a model is properly trained,
it can make accurate predictions about unseen or future data.

An overview of the general process:


There are two main applications of supervised learning: classification and regression.

1. Classification:

Classification is a subcategory of supervised learning where the goal is to predict the
categorical class labels (discrete, unordered values, group membership) of new
instances based on past observations. The typical example is e-mail spam detection,
which is a binary classification (an e-mail either is spam, 1, or is not, 0). There is also
multi-class classification, such as handwritten digit recognition (where classes go
from 0 to 9).

An example of binary classification: There are 2 classes, circles and crosses, and 2
features, X1 and X2. The model is able to find the relationship between the features of
each data point and its class, and to set a boundary line between the classes, so when
provided with new data, it can estimate the class to which it belongs, given its features.


In this case, the new data point falls into the circle subspace and, therefore, the model
will predict its class to be a circle.
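The circles and crosses example can be sketched with a perceptron, one of the simplest algorithms that learns a boundary line. This is only an illustration under assumptions: the points, the label encoding (1 for circle, 0 for cross) and the learning rate are made up, and the figure in the original article may have been produced with a different model.

```python
# Hypothetical, linearly separable toy data: two features (x1, x2) per point.
# Label 1 stands for "circle" and label 0 for "cross".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (4.0, 4.5), (5.0, 4.0), (4.5, 5.5)]
labels = [1, 1, 1, 0, 0, 0]

def predict(weights, bias, point):
    """Return 1 if the point lies on the positive side of the boundary line."""
    score = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if score > 0 else 0

def train_perceptron(points, labels, learning_rate=0.1, epochs=100):
    """Learn a linear boundary with the classic perceptron update rule."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for point, label in zip(points, labels):
            error = label - predict(weights, bias, point)  # -1, 0 or +1
            if error != 0:
                weights[0] += learning_rate * error * point[0]
                weights[1] += learning_rate * error * point[1]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return weights, bias

weights, bias = train_perceptron(points, labels)
new_point = (1.5, 1.3)  # unseen point that lies among the circles
print(predict(weights, bias, new_point))  # 1, i.e. circle
```

The learned weights and bias define the boundary line w1*x1 + w2*x2 + b = 0; new points are classified by the side of the line on which they fall.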

2. Regression:

Regression is a subcategory of supervised learning where the goal is to predict a
continuous value rather than a category. In this type of learning
we are given a number of predictor (explanatory) variables and a continuous response
variable (outcome), and we try to find a relationship between those variables that
allows us to predict a continuous outcome.

An example of linear regression: given X and Y, we fit a straight line that minimizes the
distance (with some criterion such as the sum of squared errors, SSE) between the sample
points and the fitted line. Then, we’ll use the learned intercept and slope of the fitted
line to predict the outcome of new data.
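To make the fitting step concrete, here is a minimal sketch of ordinary least squares for a single predictor, using the closed-form formulas for slope and intercept. The data points are made up for illustration.

```python
# Hypothetical toy data that roughly follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """SSE: sum of squared vertical distances between points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept   # predicted outcome for a new x = 6
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

Any other straight line, for example one with slope 2 and intercept 1, gives an SSE on these points that is at least as large as the SSE of the fitted line.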

Unsupervised Learning
In unsupervised learning we deal with unlabeled data of unknown structure and the
goal is to explore the structure of the data to extract meaningful information, without
the reference of a known outcome variable.

There are two main categories: clustering and dimensionality reduction.


1. Clustering:

Clustering is an exploratory data analysis technique used for organizing information
into meaningful clusters or subgroups without any prior knowledge of its structure.
Each cluster is a group of similar objects that differ from the objects of the other clusters.
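One way to illustrate clustering is k-means, which this introduction does not name but which is a common first choice. The sketch below is a plain implementation under assumptions: the points are made up, the number of clusters is fixed at two, and the starting centroids are picked by hand instead of at random.

```python
import math

# Hypothetical unlabeled 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]

def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        assignments = [closest(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                # Move the centroid to the mean of its members, axis by axis.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep a centroid that attracted no points where it is.
                new_centroids.append(centroids[k])
        centroids = new_centroids
    assignments = [closest(p, centroids) for p in points]
    return assignments, centroids

# Assumption: the first and last points are used as starting centroids.
assignments, centroids = k_means(points, [points[0], points[-1]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels were given to the algorithm; the two groups emerge only from the distances between the points.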

2. Dimensionality Reduction:

It is common to work with data in which each observation comes with a high number
of features, in other words, that have high dimensionality. This can be a challenge for
the computational performance of Machine Learning algorithms, so dimensionality
reduction is one of the techniques used for dealing with this issue.

Dimensionality reduction methods work by finding correlations between the features,
which indicate that there is redundant information, as some features can be
partially explained by the others. They can remove noise from the data (which can also
decrease the model’s performance) and compress the data into a smaller subspace while
retaining most of the relevant information.
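A small sketch can show how correlated features are compressed. It applies the idea behind principal component analysis (PCA) to two features only, where the direction of largest variance has a closed form. PCA is my choice for illustration, and the data is made up.

```python
import math

# Hypothetical data with two strongly correlated features (x2 is roughly 2 * x1).
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

n = len(samples)
mean_1 = sum(s[0] for s in samples) / n
mean_2 = sum(s[1] for s in samples) / n
centered = [(s[0] - mean_1, s[1] - mean_2) for s in samples]

# Entries of the 2x2 covariance matrix.
var_1 = sum(a * a for a, _ in centered) / n
var_2 = sum(b * b for _, b in centered) / n
cov_12 = sum(a * b for a, b in centered) / n

# Direction of largest variance (first principal component) of a 2x2
# covariance matrix, obtained in closed form from its orientation angle.
angle = 0.5 * math.atan2(2 * cov_12, var_1 - var_2)
component = (math.cos(angle), math.sin(angle))

# Compress each 2-D sample into one number: its coordinate along the component.
projected = [a * component[0] + b * component[1] for a, b in centered]

# Share of the total variance that survives the compression (close to 1 here).
retained = (sum(p * p for p in projected) / n) / (var_1 + var_2)
print(len(projected), retained)
```

Because the second feature is almost fully explained by the first, a single number per sample keeps nearly all of the variance.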


Deep Learning
Deep learning is a subfield of machine learning that uses a hierarchical structure of
artificial neural networks, which are loosely inspired by the human brain, with
the neuron nodes connected as a web. That architecture allows us to tackle the data
analysis in a non-linear way.

The first layer of the neural network takes raw data as an input, processes it, extracts
some information and passes it to the next layer as an output. Each layer then
processes the information given by the previous one and repeats, until data reaches the
final layer, which makes a prediction.

This prediction is compared with the known result and then, by a method called
backpropagation, the model is able to learn the weights that yield accurate outputs.
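The forward pass and the backpropagation step described above can be sketched with a very small network: two inputs, one hidden layer of two neurons and one output neuron. Everything specific here is an assumption made for illustration: the sigmoid activation, the squared-error loss, the absence of bias terms, the starting weights and the single training sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Pass one input through the hidden layer and then the output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One backpropagation step on a single sample, using squared error."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron for the loss 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to each hidden neuron.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Gradient descent update of both layers.
    new_w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - learning_rate * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

# Assumed input, known result and arbitrary starting weights.
x, target = [1.0, 0.5], 1.0
w_hidden = [[0.1, 0.4], [0.2, 0.3]]
w_out = [0.3, 0.6]

_, initial_output = forward(x, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = train_step(x, target, w_hidden, w_out)
_, final_output = forward(x, w_hidden, w_out)
print(round(initial_output, 3), round(final_output, 3))
```

After the repeated updates the prediction moves closer to the known result, which is the behaviour that backpropagation is meant to produce.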

Reinforcement Learning


Reinforcement learning is one of the main branches of machine learning, and it is often
combined with deep learning. The
goal is to build a model in which there is an agent that takes actions and where the aim
is to improve its performance. This improvement is achieved by giving a specific reward
each time the agent performs an action that belongs to the set of actions that the
developer wants the agent to perform.

The reward is a measurement of how good the action was for achieving a
predefined goal. The agent then uses this feedback to adjust its future behaviour, with
the objective of obtaining the most reward.

One common example is a chess engine, where the agent chooses from a series of
possible actions, depending on the position on the board (which is the environment’s
state), and the reward is given when winning or losing the game.
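A chess engine is too large for a short example, so the sketch below uses a much simpler, hypothetical environment: a corridor of five positions where the agent is rewarded only when it reaches the rightmost one. The learning rule is tabular Q-learning, which is one of several possible algorithms; the learning rate, discount factor and number of episodes are assumed values.

```python
import random

N_STATES = 5            # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
rng = random.Random(0)  # fixed seed so the run is reproducible

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done

# q_table[state][action_index] estimates the future reward of an action.
q_table = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9   # learning rate and discount factor (assumed values)

for _ in range(500):                       # episodes
    state, done = 0, False
    while not done:
        action_index = rng.randrange(2)    # explore with random actions
        next_state, reward, done = step(state, ACTIONS[action_index])
        # Q-learning update: move the estimate towards the reward plus the
        # discounted value of the best action in the next state.
        best_next = max(q_table[next_state])
        td_target = reward + gamma * best_next
        q_table[state][action_index] += alpha * (td_target - q_table[state][action_index])
        state = next_state

# Greedy policy learned from the feedback: best action for each non-goal state.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)  # [1, 1, 1, 1]
```

The agent is never told that moving right is correct; it infers this from the reward alone, which is the feedback loop described above.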

General Methodology for Building Machine Learning Models


Preprocessing:
It is one of the most crucial steps in any machine learning application. Data usually
comes in a format that is not optimal (or is even inadequate) for the model to process.
In those cases preprocessing is mandatory.

Many algorithms require the features to be on the same scale (for example, in the
[0,1] range) to perform optimally, and this is often done by applying
normalization or standardization techniques to the data.
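Both scaling techniques are short enough to write by hand. The sketch below applies min-max normalization and standardization to a single feature; the values are made up, and the population standard deviation is used as an assumption.

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical values of one feature

def min_max_scale(values):
    """Normalization: rescale the values linearly into the [0, 1] range."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Standardization: shift to zero mean and scale to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

normalized = min_max_scale(values)
standardized = standardize(values)
print(normalized)     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)   # symmetric around 0
```

Normalization bounds the feature to a fixed range, while standardization keeps the shape of the distribution and is less affected by a single extreme value.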

In some cases we can also find that the selected features are correlated and therefore
redundant. We can then use
dimensionality reduction techniques to compress the features into lower-dimensional
subspaces.

Finally, we’ll randomly split our original dataset into training and testing subsets.
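A random split can be sketched in a few lines. The function name, the 25% test fraction and the seed are assumptions chosen for the example.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=42):
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

dataset = list(range(20))   # stand-in for 20 samples
train, test = train_test_split(dataset)
print(len(train), len(test))  # 15 5
```

Every sample ends up in exactly one of the two subsets, so the testing subset stays unseen during training.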

Training and Selecting a Model

It is essential to compare several different algorithms in order to train and select the
best performing one. To do so, it is necessary to select a metric for measuring the
model’s performance. One commonly used in classification problems is classification
accuracy, which is the proportion of correctly classified instances. In regression
problems one of the most popular is the Mean Squared Error (MSE), which measures the
average squared difference between the estimated values and the real values.
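Both metrics follow directly from their definitions, as this short sketch shows. The labels and values passed to the functions are made up.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of instances whose predicted label matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def mean_squared_error(true_values, predicted_values):
    """Average squared difference between real and estimated values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return sum(squared) / len(squared)

acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(acc)   # 0.75, three of four labels are correct
print(mse)   # (1 + 0 + 4) / 3
```

A higher accuracy is better, while a lower MSE is better, which matters when the metric is used to rank models.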


Finally, we will use a technique called cross-validation to make sure that our model will
perform well on real-world data before using the testing subset for the final evaluation
of the model.

This technique divides the training dataset into K smaller subsets (folds). In each
iteration, one fold is held out for validation and the model is trained on the rest,
which estimates the generalization ability of the model, in other words, how
well it can predict outcomes when provided with new data. The process is repeated
K times, once per fold, and the average performance of the model is computed by
dividing the sum of the metrics obtained by K, the number of iterations.
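The procedure can be sketched as follows. To keep the example short, the "model" is a hypothetical baseline that always predicts the mean of its training targets, the metric is the MSE, and the data is not shuffled before being cut into folds (in practice it usually is).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / k

# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mse_score(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)

xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
average_mse = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse_score)
print(average_mse)  # 6.25, the mean of the fold scores 9.25, 0.25 and 9.25
```

Passing `fit` and `score` as functions keeps the cross-validation loop independent of the model, so different algorithms can be compared with the same folds.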

In general, the default parameters of the machine learning algorithms provided by the
libraries are not the best ones to use with our data, so we will use hyperparameter
optimization techniques to help us fine-tune the model’s performance.
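One of the simplest hyperparameter optimization techniques is a grid search: try every candidate value and keep the best one. The sketch below tunes the number of neighbours k of a small k-nearest-neighbours classifier. The classifier, the data (with two deliberately noisy training labels), the candidate grid and the use of a single validation set instead of cross-validation are all assumptions made to keep the example short.

```python
# Hypothetical 1-D training and validation sets as (feature, label) pairs.
# Two training labels, at 3.6 and 6.4, are deliberately noisy.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.6, 1), (4.0, 0),
         (6.0, 1), (6.4, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
validation = [(2.5, 0), (3.5, 0), (6.5, 1), (7.5, 1)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def validation_accuracy(k):
    """Share of validation points that are classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)

# Grid search: evaluate every candidate value of the hyperparameter k
# and keep the first one that reaches the best validation accuracy.
grid = [1, 3, 5]
scores = {k: validation_accuracy(k) for k in grid}
best_k = max(grid, key=lambda k: scores[k])
print(scores, best_k)  # {1: 0.5, 3: 1.0, 5: 1.0} 3
```

With k = 1 the classifier copies the noisy labels, while larger values of k smooth them out, which is why the default or smallest setting is not always the best one.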

Evaluating Models and Predicting with New Data

Once we have selected a model and fitted it to our training dataset, we can use the testing
dataset to estimate its performance on unseen data, which gives us an estimate
of the generalization error of the model. We can also evaluate it using some other metric.


If we are satisfied with the value of the metric obtained, we can then use the model to
make predictions on future data.

Wrap Up
In this article we learned what machine learning is, painting a big picture of its nature,
motivation and applications.

We also learned some basic notations and terminology and the different kinds of
machine learning algorithms:

Supervised learning, with classification and regression problems.

Unsupervised learning, with clustering and dimensionality reduction.

Reinforcement learning, where the agent learns from its environment.

Deep learning and its artificial neural networks.

Finally, we introduced the typical methodology for building machine
learning models and explained its main tasks:

Preprocessing.

Training and testing.

Selecting a model.

Evaluating.

As stated at the beginning of the article, this is the first of a series of articles, and it was a
general introduction. This will be an exciting journey as we will learn how to apply
many powerful techniques.

This will be a technical series and therefore we will explore some calculus, linear
algebra, statistics and Python concepts, as they will be necessary to understand the
main concepts and how the algorithms work. But do not worry, we will take a gentle
approach to all these concepts so that, even if you do not have a technical background, you
will not feel overwhelmed.

In the next posts I will explain how to set up a Python programming
environment with the appropriate libraries. Then we will be ready to start off by
making a deep study of supervised learning.

Stay tuned!

Machine Learning Deep Learning AI Supervised Learning Unsupervised Learning

