
ML Links


1. Introduction
- https://www.javatpoint.com/machine-learning
- https://www.javatpoint.com/applications-of-machine-learning
- https://www.javatpoint.com/machine-learning-life-cycle
- https://www.javatpoint.com/difference-between-artificial-intelligence-and-machine-learning
- https://www.geeksforgeeks.org/kdd-process-in-data-mining/
2. Supervised Learning
- https://www.javatpoint.com/supervised-machine-learning
- https://blog.dataiku.com/top-machine-learning-algorithms-how-they-work-in-plain-english-1
- https://www.scaler.com/topics/linear-models-in-machine-learning/
- https://www.techtarget.com/searchenterpriseai/definition/supervised-learning
3. Unsupervised Learning
- https://www.guru99.com/unsupervised-machine-learning.html
- https://www.javatpoint.com/unsupervised-machine-learning
- https://www.javatpoint.com/clustering-in-machine-learning
- https://www.geeksforgeeks.org/what-is-reinforcement-learning/
4. Artificial Neural Network (ANN) (use the slides and the URLs below)
- https://www.analyticsvidhya.com/blog/2021/05/beginners-guide-to-artificial-neural-network/
- https://medium.com/analytics-vidhya/how-do-neural-networks-really-work-in-the-deep-learning-72f0e8c4c419
- https://www.nickmccullum.com/python-deep-learning/how-do-neural-networks-really-work/
- https://towardsdatascience.com/a-beginner-friendly-explanation-of-how-neural-networks-work-55064db60df4
- https://www.datacamp.com/tutorial/convolutional-neural-networks-python
- https://www.analyticsvidhya.com/blog/2021/08/beginners-guide-to-convolutional-neural-network-with-implementation-in-python/
- https://medium.com/machine-learning-researcher/convlutional-neural-network-cnn-2fc4faa7bb63
5. Model Evaluation (use the slides only)
Machine Learning Tutorial

This machine learning tutorial provides basic and advanced concepts of
machine learning. Our machine learning tutorial is designed for
students and working professionals.

Machine learning is a growing technology that enables computers to
learn automatically from past data. Machine learning uses various
algorithms to build mathematical models and make predictions using
historical data or information. Currently, it is being used for tasks
such as image recognition, speech recognition, email filtering,
Facebook auto-tagging, recommender systems, and many more.

This machine learning tutorial gives you an introduction to machine
learning along with a wide range of machine learning techniques such
as supervised, unsupervised, and reinforcement learning. You will
learn about regression and classification models, clustering methods,
hidden Markov models, and various sequential models.

What is Machine Learning

In the real world, we are surrounded by humans who can learn
everything from their experiences, and we have computers or machines
which work on our instructions. But can a machine also learn from
experiences or past data like a human does? This is where machine
learning comes in.

Machine learning is a subset of artificial intelligence that is
mainly concerned with the development of algorithms which allow a
computer to learn from data and past experiences on its own. The term
machine learning was first introduced by Arthur Samuel in 1959. We
can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data,
improve performance from experience, and predict things without being
explicitly programmed.

With the help of sample historical data, known as training data,
machine learning algorithms build a mathematical model that helps in
making predictions or decisions without being explicitly programmed.
Machine learning brings computer science and statistics together to
create predictive models. Machine learning constructs or uses
algorithms that learn from historical data: the more information we
provide, the better the performance.

A machine has the ability to learn if it can improve its performance
by gaining more data.

How does Machine Learning work

A machine learning system learns from historical data, builds
prediction models, and, whenever it receives new data, predicts the
output for it. The accuracy of the predicted output depends largely
on the amount of data: a larger dataset helps to build a better model
that predicts the output more accurately.

Suppose we have a complex problem in which we need to make
predictions. Instead of writing code for it directly, we just need to
feed the data to generic algorithms, and with the help of these
algorithms the machine builds the logic from the data and predicts
the output. Machine learning has changed our way of thinking about
such problems. The block diagram below explains the working of a
machine learning algorithm:
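For readers who know a little Python, the same "feed data to a
generic algorithm, get predictions" flow can be sketched in a few
lines with scikit-learn; the toy dataset and the choice of linear
regression below are our own illustrative assumptions, not part of
the tutorial:

```python
# A minimal sketch of the "data in, predictions out" idea.
# The hours-studied data and model choice are invented for illustration.
from sklearn.linear_model import LinearRegression

X_train = [[1], [2], [3], [4], [5]]   # input feature: hours studied
y_train = [52, 58, 66, 71, 80]        # known outputs: exam scores

model = LinearRegression()
model.fit(X_train, y_train)           # the algorithm builds the logic from data

print(model.predict([[6]]))           # predict the output for new, unseen data
```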

Features of Machine Learning:

o Machine learning uses data to detect various patterns in a given
dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is quite similar to data mining, as it also deals
with huge amounts of data.

Need for Machine Learning

The need for machine learning is increasing day by day. The reason is
that machine learning is capable of doing tasks that are too complex
for a person to implement directly. As humans, we have limitations:
we cannot process huge amounts of data manually. For this we need
computer systems, and this is where machine learning makes things
easy for us.

We can train machine learning algorithms by providing them with huge
amounts of data, letting them explore the data, construct models, and
predict the required output automatically. The performance of a
machine learning algorithm depends on the amount of data, and it can
be measured by a cost function. With the help of machine learning, we
can save both time and money.

The importance of machine learning can be easily understood from its
use cases. Currently, machine learning is used in self-driving cars,
cyber fraud detection, face recognition, friend suggestions on
Facebook, and more. Various top companies such as Netflix and Amazon
have built machine learning models that use vast amounts of data to
analyze user interests and recommend products accordingly.

Following are some key points which show the importance of machine
learning:

o Rapid increase in the production of data
o Solving complex problems which are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning

At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which
we provide sample labeled data to the machine learning system in
order to train it, and on that basis, it predicts the output.

The system creates a model using labeled data to understand the
datasets and learn about each type of data. Once the training and
processing are done, we test the model by providing sample data to
check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data.
Supervised learning is based on supervision, in the same way that a
student learns under the supervision of a teacher. An example of
supervised learning is spam filtering.

Supervised learning can be grouped further into two categories of
algorithms, illustrated in the sketch after this list:

o Classification
o Regression
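A minimal Python sketch of supervised classification follows; the toy
spam-style features and the decision-tree choice are assumptions made
purely for illustration:

```python
# Supervised learning: labeled examples teach the model an input -> output map.
# The feature encoding [num_links, num_capital_words] is invented for this demo.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 1], [1, 0], [8, 9], [7, 6], [0, 2], [9, 8]]
y_train = [0, 0, 1, 1, 0, 1]          # labels: 1 = spam, 0 = not spam

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)             # learn from the labeled data

print(clf.predict([[6, 7]]))          # predicted label for a new example
```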

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine
learns without any supervision.
The training is provided to the machine with the set of data that has
not been labeled, classified, or categorized, and the algorithm needs
to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result; the
machine tries to find useful insights from large amounts of data.
Unsupervised learning can be further classified into two categories
of algorithms, illustrated in the sketch after this list:

o Clustering
o Association
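A minimal clustering sketch with scikit-learn's KMeans is shown
below; the data points and the choice of two clusters are
illustrative assumptions:

```python
# Unsupervised learning: no labels are given; the algorithm finds structure.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]  # unlabeled data

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)        # group similar points together

print(labels)                         # e.g. [0 0 0 1 1 1] -- two discovered groups
```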

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a
learning agent gets a reward for each right action and a penalty for
each wrong action. The agent learns automatically from this feedback
and improves its performance. In reinforcement learning, the agent
interacts with the environment and explores it. The goal of the agent
is to accumulate the most reward points and thereby improve its
performance.

A robotic dog which automatically learns the movement of its limbs is
an example of reinforcement learning.
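The reward/penalty loop can be sketched with tabular Q-learning on a
tiny invented world; everything about this environment (states,
actions, rewards) is an assumption chosen only to show the feedback
idea:

```python
# Toy Q-learning: states 0..4 in a line; reaching state 4 is rewarded,
# every other step carries a small penalty. All values here are invented.
import random

n_states, goal = 5, 4
Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]; 0 = left, 1 = right

for _ in range(500):                        # episodes of interaction
    s = 0
    while s != goal:
        if random.random() < 0.1:           # occasional exploration
            a = random.randint(0, 1)
        else:                                # otherwise act greedily
            a = Q[s].index(max(Q[s]))
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s2 == goal else -0.01    # reward for right action, penalty otherwise
        Q[s][a] += 0.5 * (r + 0.9 * max(Q[s2]) - Q[s][a])  # learn from feedback
        s = s2

print([q.index(max(q)) for q in Q])          # learned policy: mostly "move right" (1)
```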

Note: We will learn about the above types of machine learning in
detail in later chapters.

History of Machine Learning

Some 40-50 years ago, machine learning was science fiction; today it
is part of our daily life, making everyday tasks easier, from
self-driving cars to Amazon's virtual assistant "Alexa". The idea
behind machine learning, however, is old and has a long history.
Below are some milestones in the history of machine learning:
The early history of Machine Learning (Pre-1940):

o 1834: Charles Babbage, the father of the computer, conceived a
device that could be programmed with punch cards. Although the
machine was never built, all modern computers rely on its logical
structure.
o 1936: Alan Turing described how a machine can determine and execute
a set of instructions.

The era of stored-program computers:

o 1943: A neural network was first modeled with an electrical
circuit. Around 1950, scientists started applying this idea and
analyzing how human neurons might work.
o 1945: "ENIAC", the first electronic general-purpose computer, was
completed. It was followed by stored-program computers such as EDSAC
(1949) and EDVAC (1951).

Computing machinery and intelligence:

o 1950: Alan Turing published a seminal paper, "Computing Machinery
and Intelligence," on the topic of artificial intelligence. In this
paper, he asked, "Can machines think?"

Machine intelligence in Games:

o 1952: Arthur Samuel, a pioneer of machine learning, created a
program that helped an IBM computer play checkers. It performed
better the more it played.
o 1959: The term "machine learning" was first coined by Arthur
Samuel.

The first "AI" winter:

o The period from 1974 to 1980 was a tough time for AI and ML
researchers, and it became known as the "AI winter."
o During this period, machine translation failed, and people lost
interest in AI, which led to reduced government funding for research.

Machine Learning from theory to reality

o 1959: The first neural network was applied to a real-world problem,
using an adaptive filter to remove echoes over phone lines.
o 1985: Terry Sejnowski and Charles Rosenberg invented a neural
network, NETtalk, which was able to teach itself how to correctly
pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue won a chess match against the chess champion
Garry Kasparov, becoming the first computer to beat a human chess
expert.

Machine Learning in the 21st century

o 2006: Computer scientist Geoffrey Hinton gave neural net research
the new name "deep learning," which has since become one of the most
trending technologies.
o 2012: Google created a deep neural network which learned to
recognize images of humans and cats in YouTube videos.
o 2014: The chatbot "Eugene Goostman" passed the Turing Test. It was
the first chatbot to convince 33% of human judges that it was not a
machine.
o 2014: DeepFace, a deep neural network created by Facebook, was
claimed to recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's number two Go player, Lee Sedol. In
2017 it beat the number one player, Ke Jie.
o 2017: Alphabet's Jigsaw team built an intelligent system to learn
about online trolling. It read millions of comments from different
websites in order to learn to stop online trolling.

Machine Learning at present:

Machine learning has now made great advances in research, and it is
present everywhere around us, in self-driving cars, Amazon Alexa,
chatbots, recommender systems, and many more. It includes supervised,
unsupervised, and reinforcement learning with clustering,
classification, decision tree, and SVM algorithms, among others.

Modern machine learning models can be used to make various
predictions, including weather prediction, disease prediction, stock
market analysis, etc.

Prerequisites
Before learning machine learning, you must have basic knowledge of
the following so that you can easily understand the concepts:

o Fundamental knowledge of probability and linear algebra.
o The ability to code in any computer language, especially Python.
o Knowledge of calculus, especially derivatives of single-variable
and multivariate functions.

Audience
Our machine learning tutorial is designed to help beginners and
professionals.

Problems
We assure you that you will not find any difficulty while learning our
Machine learning tutorial. But if there is any mistake in this tutorial,
kindly post the problem or error in the contact form so that we can
improve it.
Applications of Machine Learning
Machine learning is a buzzword in today's technology, and it is
growing very rapidly day by day. We use machine learning in our daily
lives, often without knowing it, in Google Maps, Google Assistant,
Alexa, etc. Below are some of the most trending real-world
applications of machine learning:

1. Image Recognition:
Image recognition is one of the most common applications of machine
learning. It is used to identify objects, persons, places, etc. in
digital images. A popular use case of image recognition and face
detection is automatic friend tagging suggestions:

Facebook provides a feature of automatic friend tagging suggestions.
Whenever we upload a photo with our Facebook friends, we
automatically get tagging suggestions with names; the technology
behind this is machine learning's face detection and recognition
algorithms.

It is based on the Facebook project named "DeepFace," which is
responsible for face recognition and person identification in
pictures.


2. Speech Recognition
While using Google, we get an option to "search by voice"; this falls
under speech recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions
into text, and it is also known as "speech to text" or "computer
speech recognition." At present, machine learning algorithms are
widely used in speech recognition applications. Google Assistant,
Siri, Cortana, and Alexa use speech recognition technology to follow
voice instructions.

3. Traffic prediction:
If we want to visit a new place, we can take the help of Google Maps,
which shows us the correct path with the shortest route and predicts
the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear,
slow-moving, or heavily congested, with the help of two things:

o the real-time location of vehicles from the Google Maps app and
sensors
o the average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better. It
takes information from the user and sends it back to its database to
improve performance.

4. Product recommendations:
Machine learning is widely used by various e-commerce and
entertainment companies such as Amazon, Netflix, etc., for product
recommendations. Whenever we search for a product on Amazon, we start
getting advertisements for the same product while surfing the
internet in the same browser, and this is because of machine
learning.

Google infers user interest using various machine learning algorithms
and suggests products according to customer interest.

Similarly, when we use Netflix, we find recommendations for series,
movies, etc., and this is also done with the help of machine
learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is
self-driving cars. Machine learning plays a significant role in
self-driving cars. Tesla, the most popular electric car manufacturer,
is working on self-driving cars, using machine learning methods to
train car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is automatically filtered as
important, normal, or spam. We always receive important mail in our
inbox, marked with the important symbol, and spam emails in our spam
box; the technology behind this is machine learning. Below are some
spam filters used by Gmail:

o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms, such as the Multi-Layer Perceptron,
decision trees, and the Naïve Bayes classifier, are used for email
spam filtering and malware detection, as in the sketch below.
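As an illustration, a Naïve Bayes spam filter can be sketched in a
few lines of scikit-learn; the four-email corpus below is an
assumption chosen only to show the workflow:

```python
# A toy Naive Bayes spam filter: text -> word counts -> classifier.
# The tiny corpus and labels are invented for demonstration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money click now", "project report attached"]
labels = [1, 0, 1, 0]                 # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(emails)         # turn email text into word counts

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["claim your free prize"])))  # -> [1] (spam)
```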
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant,
Alexa, Cortana, and Siri. As the name suggests, they help us find
information using voice instructions. These assistants can help us in
various ways just through our voice instructions, such as playing
music, calling someone, opening an email, scheduling an appointment,
etc.

Machine learning algorithms are an important part of these virtual
assistants.

These assistants record our voice instructions, send them to a server
in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by
detecting fraudulent transactions. Whenever we perform an online
transaction, there are various ways a fraudulent transaction can take
place, such as fake accounts, fake IDs, and money stolen in the
middle of a transaction. To detect this, a feed-forward neural
network helps us by checking whether a transaction is genuine or
fraudulent.

For each genuine transaction, the output is converted into hash
values, and these values become the input for the next round. Each
genuine transaction follows a specific pattern, which changes for a
fraudulent transaction; the network detects this and makes our online
transactions more secure.

9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock
market, there is always a risk of ups and downs in shares, so machine
learning's long short-term memory (LSTM) neural network is used for
the prediction of stock market trends.

10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis.
With it, medical technology is growing quickly and is able to build
3D models that can predict the exact position of lesions in the
brain. This helps in finding brain tumors and other brain-related
diseases easily.

11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language,
it is not a problem at all: machine learning helps us here too by
converting text into languages we know. Google's GNMT (Google Neural
Machine Translation) provides this feature; it is a neural machine
translation system that translates text into our familiar language,
and this is called automatic translation.

The technology behind automatic translation is a
sequence-to-sequence learning algorithm, which is used together with
image recognition to translate text from one language to another.

Machine Learning Life Cycle

Machine learning has given computer systems the ability to learn
automatically without being explicitly programmed. But how does a
machine learning system work? It can be described using the machine
learning life cycle, a cyclic process for building an efficient
machine learning project. The main purpose of the life cycle is to
find a solution to the problem or project.
The machine learning life cycle involves seven major steps, which are
given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

The most important thing in the complete process is to understand the
problem and know its purpose. Therefore, before starting the life
cycle, we need to understand the problem, because a good result
depends on a good understanding of the problem.

In the complete life cycle process, to solve a problem we create a
machine learning system called a "model", and this model is created
through "training". But to train a model we need data; hence, the
life cycle starts with collecting data.

1. Gathering Data:
Data gathering is the first step of the machine learning life cycle.
The goal of this step is to identify and obtain all the data related
to the problem.

In this step, we need to identify the different data sources, as data
can be collected from sources such as files, databases, the internet,
or mobile devices. It is one of the most important steps of the life
cycle. The quantity and quality of the collected data will determine
the efficiency of the output: the more data, the more accurate the
prediction.

This step includes the tasks below:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing these tasks, we get a coherent set of data, also called
a dataset. It will be used in further steps.

2. Data preparation
After collecting the data, we need to prepare it for further steps.
Data preparation is a step where we put our data into a suitable
place and prepare it to use in our machine learning training.

In this step, first, we put all data together, and then randomize the
ordering of data.

This step can be further divided into two processes:

o Data exploration:
This is used to understand the nature of the data we have to work
with. We need to understand its characteristics, format, and quality.
A better understanding of data leads to an effective outcome. Here we
find correlations, general trends, and outliers.
o Data pre-processing:
The next step is preprocessing the data for analysis.
3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data
into a useable format. It is the process of cleaning the data,
selecting the variable to use, and transforming the data in a proper
format to make it more suitable for analysis in the next step. It is
one of the most important steps of the complete process. Cleaning
of data is required to address the quality issues.

The data we have collected is not necessarily all useful, as some of
it may not be relevant. In real-world applications, collected data
may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is mandatory to detect and remove these issues because they can
negatively affect the quality of the outcome, as in the sketch below.
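As a concrete illustration of these wrangling fixes, here is a short
pandas sketch; the toy records and cleaning choices are assumptions
made for demonstration:

```python
# Wrangling the issues listed above: missing values, duplicates, invalid data.
# The records below are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 25, None, 40, -3],       # None = missing, -3 = invalid
    "salary": [50_000, 50_000, 60_000, 80_000, 75_000],
})

df = df.drop_duplicates()                       # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].mean())  # fill missing values
df = df[df["age"] > 0]                          # drop rows with invalid ages

print(df)
```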

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step.
This step involves:

o Selection of analytical techniques
o Building models
o Reviewing the result

The aim of this step is to build a machine learning model that
analyzes the data using various analytical techniques, and then to
review the outcome. It starts with determining the type of problem,
where we select machine learning techniques such as classification,
regression, cluster analysis, association, etc.; we then build the
model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning
algorithms to build the model.
5. Train Model
The next step is to train the model. In this step we train our model
to improve its performance for a better outcome on the problem.

We use datasets to train the model with various machine learning
algorithms. Training a model is required so that it can learn the
various patterns, rules, and features.

6. Test Model
Once our machine learning model has been trained on a given dataset,
we test the model. In this step, we check the accuracy of our model
by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model as
per the requirements of the project or problem, as in the sketch
below.
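For Python readers, here is a brief scikit-learn sketch of the
train/test workflow described in steps 5 and 6; the dataset
(scikit-learn's built-in iris data) and the decision-tree model are
illustrative assumptions:

```python
# Train on one part of the data, test accuracy on the held-out part.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)   # 5. train the model
accuracy = accuracy_score(y_test, model.predict(X_test)) # 6. test the model

print(f"Test accuracy: {accuracy:.2%}")
```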

7. Deployment
The last step of the machine learning life cycle is deployment, where
we deploy the model in a real-world system.

If the model produces accurate results as per our requirements, with
acceptable speed, then we deploy it in the real system. But before
deploying the project, we check whether it keeps improving its
performance using the available data. The deployment phase is similar
to making the final report for a project.
Difference between Artificial Intelligence
and Machine Learning
Artificial intelligence and machine learning are parts of computer
science that are correlated with each other. These two technologies
are among the most trending technologies used for creating
intelligent systems.

Although these are two related technologies and people sometimes use
them as synonyms, they are still two different terms in various
respects.

On a broad level, we can differentiate both AI and ML as:

AI is the bigger concept: creating intelligent machines that can
simulate human thinking capability and behavior, whereas machine
learning is an application or subset of AI that allows machines to
learn from data without being explicitly programmed.
Below are some main differences between AI and machine learning
along with the overview of Artificial intelligence and machine
learning.


Artificial Intelligence
Artificial intelligence is a field of computer science concerned with
making computer systems that can mimic human intelligence. It is
composed of two words, "artificial" and "intelligence", meaning "a
human-made thinking power." Hence we can define it as:

Artificial intelligence is a technology with which we can create
intelligent systems that can simulate human intelligence.

An artificial intelligence system does not need to be pre-programmed;
instead, it uses algorithms which can work with their own
intelligence, such as reinforcement learning algorithms and deep
learning neural networks. AI is used in many places, such as Siri,
Google's AlphaGo, AI in chess playing, etc.

Based on capabilities, AI can be classified into three types:

o Weak AI
o General AI
o Strong AI

Currently, we are working with weak AI and general AI. The future of
AI is strong AI, which is said to be the point at which machines will
be more intelligent than humans.

Machine learning
Machine learning is about extracting knowledge from the data. It
can be defined as,

Machine learning is a subfield of artificial intelligence which
enables machines to learn from past data or experiences without being
explicitly programmed.

Machine learning enables a computer system to make predictions or
take decisions using historical data without being explicitly
programmed. Machine learning uses massive amounts of structured and
semi-structured data so that a machine learning model can generate
accurate results or give predictions based on that data.

Machine learning works with algorithms which learn on their own using
historical data. It works only for specific domains: if we create a
machine learning model to detect pictures of dogs, it will only give
results for dog images; if we provide new data, such as a cat image,
it will become unresponsive. Machine learning is used in various
places, such as online recommender systems, Google search algorithms,
email spam filters, Facebook auto friend tagging suggestions, etc.

It can be divided into three types:

o Supervised learning
o Reinforcement learning
o Unsupervised learning

Key differences between Artificial Intelligence (AI) and Machine
Learning (ML):

o Artificial intelligence is a technology which enables a machine to
simulate human behavior. Machine learning is a subset of AI which
allows a machine to learn automatically from past data without being
programmed explicitly.
o The goal of AI is to make a smart computer system, like humans, to
solve complex problems. The goal of ML is to allow machines to learn
from data so that they can give accurate output.
o In AI, we make intelligent systems to perform any task like a
human. In ML, we teach machines with data to perform a particular
task and give an accurate result.
o Machine learning and deep learning are the two main subsets of AI.
Deep learning is the main subset of machine learning.
o AI has a very wide scope. Machine learning has a limited scope.
o AI works to create an intelligent system which can perform various
complex tasks. Machine learning works to create machines that can
perform only the specific tasks for which they are trained.
o An AI system is concerned with maximizing the chances of success.
Machine learning is mainly concerned with accuracy and patterns.
o The main applications of AI are Siri, customer support using
chatbots, expert systems, online game playing, intelligent humanoid
robots, etc. The main applications of machine learning are online
recommender systems, Google search algorithms, Facebook auto friend
tagging suggestions, etc.
o On the basis of capabilities, AI can be divided into three types:
Weak AI, General AI, and Strong AI. Machine learning can also be
divided into three types: supervised learning, unsupervised learning,
and reinforcement learning.
o AI includes learning, reasoning, and self-correction. Machine
learning includes learning and self-correction when introduced to new
data.
o AI deals with structured, semi-structured, and unstructured data.
Machine learning deals with structured and semi-structured data.

KDD Process in Data Mining


Data Mining – Knowledge Discovery in Databases (KDD).
KDD (Knowledge Discovery in Databases) is a process that
involves the extraction of useful, previously unknown, and
potentially valuable information from large datasets. The KDD
process in data mining typically involves the following steps:
1. Selection: Select a relevant subset of the data for analysis.
2. Pre-processing: Clean and transform the data to make it
ready for analysis. This may include tasks such as data
normalization, missing value handling, and data integration.
3. Transformation: Transform the data into a format suitable
for data mining, such as a matrix or a graph.
4. Data Mining: Apply data mining techniques and algorithms to
the data to extract useful information and insights. This may
include tasks such as clustering, classification, association rule
mining, and anomaly detection.
5. Interpretation: Interpret the results and extract knowledge
from the data. This may include tasks such as visualizing the
results, evaluating the quality of the discovered patterns and
identifying relationships and associations among the data.
6. Evaluation: Evaluate the results to ensure that the extracted
knowledge is useful, accurate, and meaningful.
7. Deployment: Use the discovered knowledge to solve the
business problem and make decisions.
The KDD process is an iterative process and it requires multiple
iterations of the above steps to extract accurate knowledge from
the data.
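To make these stages concrete for Python readers, the sketch below
maps the KDD steps onto a small pandas/scikit-learn pipeline; the toy
data and the choice of KMeans for the mining step are our own
assumptions:

```python
# A compact, illustrative mapping of the KDD steps onto code.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

raw = pd.DataFrame({"income": [30, 32, 31, 90, 95, None],
                    "spend":  [5, 6, 5, 40, 42, 41]})

subset = raw[["income", "spend"]]                 # 1. Selection
clean = subset.dropna()                           # 2. Pre-processing
matrix = StandardScaler().fit_transform(clean)    # 3. Transformation
labels = KMeans(n_clusters=2, n_init=10).fit_predict(matrix)  # 4. Data mining

print(labels)   # 5-7. interpret, evaluate, and deploy the discovered groups
```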
Why do we need Data Mining?
The volume of information is increasing every day, more than we can
handle, from business transactions, scientific data, sensor data,
pictures, videos, etc. So, we need a system capable of extracting the
essence of the information available, one that can automatically
generate reports, views, or summaries of the data for better
decision-making.
Why is Data Mining used in Business?
Data mining is used in business to make better managerial decisions
by:

 Automatically summarizing data
 Extracting the essence of the information stored
 Discovering patterns in raw data

Data Mining, also known as Knowledge Discovery in Databases, refers
to the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data stored in databases.

Steps Involved in the KDD Process:

1. Data Cleaning: Data cleaning is defined as the removal of noisy
and irrelevant data from the collection.
 Cleaning in case of missing values.
 Cleaning noisy data, where noise is a random or variance error.
 Cleaning with data discrepancy detection and data transformation
tools.
2. Data Integration: Data integration is defined as heterogeneous
data from multiple sources combined into a common source (data
warehouse).
 Data integration using data migration tools.
 Data integration using data synchronization tools.
 Data integration using the ETL (Extract-Transform-Load) process.
3. Data Selection: Data selection is defined as the process where
data relevant to the analysis is decided upon and retrieved from the
data collection.
 Data selection using neural networks.
 Data selection using decision trees.
 Data selection using Naive Bayes.
 Data selection using clustering, regression, etc.
4. Data Transformation: Data transformation is defined as the process
of transforming data into the appropriate form required by the mining
procedure. It is a two-step process:
 Data mapping: assigning elements from the source base to the
destination to capture transformations.
 Code generation: creation of the actual transformation program.
5. Data Mining: Data mining is defined as the application of clever
techniques to extract potentially useful patterns.
 Transforms task-relevant data into patterns.
 Decides the purpose of the model using classification or
characterization.
6. Pattern Evaluation: Pattern evaluation is defined as identifying
strictly increasing patterns representing knowledge based on given
measures.
 Finds an interestingness score for each pattern.
 Uses summarization and visualization to make the data
understandable by the user.
7. Knowledge Representation: Knowledge representation is defined as a
technique which utilizes visualization tools to represent data mining
results.
 Generate reports.
 Generate tables.
 Generate discriminant rules, classification rules, characterization
rules, etc.
Note:

 KDD is an iterative process where evaluation measures can be
enhanced, mining can be refined, and new data can be integrated and
transformed in order to get different and more appropriate results.
 Preprocessing of databases consists of data cleaning and data
integration.
Advantages and Disadvantages:
Advantages of KDD:

1. Improved decision-making: KDD provides valuable insights and
knowledge that can help organizations make better decisions.
2. Increased efficiency: KDD automates repetitive and time-
consuming tasks and makes the data ready for analysis, which
saves time and money.
3. Better customer service: KDD helps organizations gain a better
understanding of their customers’ needs and preferences,
which can help them provide better customer service.
4. Fraud detection: KDD can be used to detect fraudulent
activities by identifying patterns and anomalies in the data
that may indicate fraud.
5. Predictive modeling: KDD can be used to build predictive
models that can forecast future trends and patterns.

Disadvantages of KDD:

1. Privacy concerns: KDD can raise privacy concerns, as it involves
collecting and analyzing large amounts of data, which can include
sensitive information about individuals.
2. Complexity: KDD can be a complex process that requires
specialized skills and knowledge to implement and interpret
the results.
3. Unintended consequences: KDD can lead to unintended
consequences, such as bias or discrimination, if the data or
models are not properly understood or used.
4. Data quality: The KDD process depends heavily on the quality of
the data; if the data is not accurate or consistent, the results can
be misleading.
5. High cost: KDD can be an expensive process, requiring
significant investments in hardware, software, and personnel.
6. Overfitting: The KDD process can lead to overfitting, a common
problem in machine learning where a model learns the detail and noise
in the training data to the extent that it negatively impacts the
model's performance on new, unseen data.

Difference Between KDD and Data Mining

Definition: KDD refers to a process of identifying valid, novel,
potentially useful, and ultimately understandable patterns and
relationships in data. Data mining refers to a process of extracting
useful and valuable information or patterns from large data sets.

Objective: KDD aims to find useful knowledge from data. Data mining
aims to extract useful information from data.

Techniques used: KDD uses data cleaning, data integration, data
selection, data transformation, data mining, pattern evaluation, and
knowledge representation and visualization. Data mining uses
association rules, classification, clustering, regression, decision
trees, neural networks, and dimensionality reduction.

Output: KDD produces structured information, such as rules and
models, that can be used to make decisions or predictions. Data
mining produces patterns, associations, or insights that can be used
to improve decision-making or understanding.

Focus: KDD focuses on the discovery of useful knowledge, rather than
simply finding patterns in data. Data mining focuses on the discovery
of patterns or relationships in data.

Role of domain expertise: Domain expertise is important in KDD, as it
helps in defining the goals of the process, choosing appropriate
data, and interpreting the results. Domain expertise is less critical
in data mining, as the algorithms are designed to identify patterns
without relying on prior knowledge.
CHAPTER 2
Supervised Learning
Supervised Machine Learning
Supervised learning is the type of machine learning in which machines
are trained using well-"labelled" training data, and on the basis of
that data, machines predict the output. Labelled data means that some
input data is already tagged with the correct output.

In supervised learning, the training data provided to the machines
works as the supervisor that teaches the machines to predict the
output correctly. It applies the same concept as a student learning
under the supervision of a teacher.

Supervised learning is a process of providing input data as well as
correct output data to the machine learning model. The aim of a
supervised learning algorithm is to find a mapping function that maps
the input variable (x) to the output variable (y).
In the real-world, supervised learning can be used for Risk
Assessment, Image classification, Fraud Detection, spam
filtering, etc.


How Supervised Learning Works

In supervised learning, models are trained using a labelled dataset,
where the model learns about each type of data. Once the training
process is completed, the model is tested on the basis of test data
(a held-out subset of the full dataset), and then it predicts the
output.

The working of supervised learning can be easily understood through
the example and diagram below:

Suppose we have a dataset of different types of shapes, including
squares, rectangles, triangles, and polygons. The first step is to
train the model on each shape:

o If the given shape has four sides, and all the sides are equal,
then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a
triangle.
o If the given shape has six equal sides, then it will be labelled as
a hexagon.

Now, after training, we test our model using the test set, and the
task of the model is to identify the shape.

The machine is already trained on all types of shapes, and when it
encounters a new shape, it classifies the shape on the basis of its
number of sides and predicts the output, as in the sketch below.
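A toy Python version of this shape classifier, assuming we encode
each shape as [number_of_sides, all_sides_equal] and use
scikit-learn's nearest-neighbour classifier (both our own
assumptions), might look like this:

```python
# Toy shape classifier: features are [number_of_sides, all_sides_equal (0/1)].
from sklearn.neighbors import KNeighborsClassifier

X_train = [[4, 1], [4, 1], [3, 0], [3, 1], [6, 1], [6, 1]]
y_train = ["square", "square", "triangle", "triangle", "hexagon", "hexagon"]

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(model.predict([[4, 1]]))   # a new four-equal-sided shape -> ['square']
```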

Steps Involved in Supervised Learning:

o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the dataset into a training dataset, a test dataset, and a
validation dataset.
o Determine the input features of the training dataset, which should
carry enough information for the model to accurately predict the
output.
o Determine a suitable algorithm for the model, such as a support
vector machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need
validation sets to tune the control parameters; these are subsets of
the training dataset.
o Evaluate the accuracy of the model by providing the test set. If
the model predicts the correct outputs, the model is accurate.

Types of Supervised Machine Learning Algorithms:
Supervised learning can be further divided into two types of
problems:
1. Regression

Regression algorithms are used when there is a relationship between
the input variable and the output variable and the output is
continuous. They are used for the prediction of continuous variables,
as in weather forecasting, market trends, etc. Below are some popular
regression algorithms which come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is
categorical, with classes such as Yes-No, Male-Female, True-False,
etc. Spam filtering is one example. Popular classification algorithms
include:

o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Note: We will discuss these algorithms in detail in later chapters.

Advantages of Supervised Learning:

o With the help of supervised learning, the model can predict the
output on the basis of prior experience.
o In supervised learning, we can have an exact idea about the classes
of objects.
o Supervised learning models help us solve various real-world
problems such as fraud detection, spam filtering, etc.

Disadvantages of Supervised Learning:

o Supervised learning models are not suitable for handling complex
tasks.
o Supervised learning cannot predict the correct output if the test
data is too different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes
of objects.

Machine Learning and Linear Models: How They Work (In Plain English)
February 20, 2020

Data Basics | Katie Gross

This series will go over at a high level what machine learning is plus take a deeper
dive into some of the top algorithms and how they work — in plain English!

High Level: What Is Machine Learning?
Before we get into machine learning (ML), let’s take a step back and discuss artificial
intelligence (AI) more broadly. AI is actually just an umbrella term for any computer
program that does something smart that we previously thought only humans could do.
This can even include something as simple as a computer program that uses a set of
predefined rules to play checkers, although when we talk about AI today, we are
usually referring to more advanced applications.


Specifically, we're usually talking about machine learning, which means teaching a
machine to learn from experience without explicitly programming it to do so. Deep
learning, another hot topic, is a subset of machine learning and has been largely
responsible for the AI boom of the last 10 years. In a nutshell, deep learning is an
advanced type of ML that can handle complex tasks like image and sound
recognition. We’ll discuss it in more detail in a later post.
One other thing worth mentioning about AI is that we sort of have this "Instagram vs.
reality" scenario, if you will. That is, the way AI is portrayed in pop culture is not
necessarily representative of where we’re at today. The examples of AI that we see in
the media are usually “Artificial General Intelligence” or “Strong AI,” which refer to
AI with the full intellectual capacity of a human, including thoughts and self-
awareness. Think “Westworld,” “The Terminator,” “Ex Machina,” etc.

The good news is you can sleep soundly tonight knowing that this does not currently
exist, and we’re probably still pretty far away from it — if it is even possible at all,
which is up for debate. The closest thing we have to Strong AI today is voice
assistants like Amazon's Alexa and Apple's Siri, but they’re pretty far away from
having thoughts and feelings, and there are obviously serious concerns around ever
creating AI with this level of human intelligence.

The AI/ML that we actually interact with in our day-to-day lives is usually “Weak
AI,” which means that it is programmed to do one specific task. This includes things
like credit card fraud detection, spam email classification, and movie
recommendations on Netflix.

We can break machine learning into two key subcategories:

 Supervised ML, which uses a set of input variables to predict the value of an output variable.
 Unsupervised ML, which infers patterns from an unlabeled dataset. Here, you aren’t trying to predict
anything, you’re just trying to understand patterns and groupings in the data.

We will focus on supervised ML in this post. The idea is that we will look at historical
data to train a model to learn the relationships between features, or variables, and
a target, the thing we’re trying to predict. This way, when new data comes in, we can
use the feature values to make a good prediction of the target, whose value we do not
yet know.

Supervised learning can be further split into regression (predicting numerical values)
and classification (predicting categorical values). Some algorithms can only be used
for regression, others only for classification, and many for both.

Algorithms 101
We hear — and talk — a lot about algorithms, but I find that the definition is
sometimes a bit of a blur. An algorithm is actually just a set of rules used to solve a
problem. If you’ve ever taken a simple BuzzFeed quiz to answer important questions
in your life, like what “Sound of Music” character matches your personality — you
may notice that it’s really just asking a series of questions and using some set logic to
generate an answer. Let’s explore the key categories of supervised learning
algorithms.

Many of the most popular supervised learning algorithms fall into three key
categories:
1. Linear models, which use a simple formula to find a best-fit line
through a set of data points.
2. Tree-based models, which use a series of "if-then" rules to
generate predictions from one or more decision trees, similar to the
BuzzFeed quiz example.
3. Artificial neural networks, which are modeled after the way that
neurons interact in the human brain to interpret information and
solve problems. This is also often referred to as deep learning.
We will look into each of these algorithm categories throughout the series, but this
post will focus on linear models.

Machine Learning Algorithms in Action: Practical Examples
Let's say we're the owners of a candy store, Willy Wonka’s Candy, and we want to do
a better job of predicting how much our customers will spend this week, in order to
stock our shelves more appropriately. To get even more specific, let’s explore one
specific customer named George. George is a 65-year-old mechanic who has children
and spent $10 at our store last week. We’re going to try to predict the following:

 How much George will spend this week (hint: this is regression, because it is a dollar amount).
 Whether George will be a “high spender,” which we’ve defined as someone who will spend at least
$25 at Willy Wonka's Candy this week (hint: this is a classification, because we’re predicting a distinct
category, high spender or not).

Linear Models
So now let’s dive in and see how we can use a linear model. Remember, linear models
generate a formula to create a best-fit line to predict unknown values. Linear models
are considered “old school” and often not as predictive as newer algorithm classes,
but they can be trained relatively quickly and are generally more straightforward to
interpret, which can be a big plus!

We’ll explore two types of linear models:

. Linear regression, which is used for regression (numerical predictions).


. Logistic regression, which is used for classification (categorical predictions). Don’t get thrown off by
the word “regression” in the name. I know, it’s confusing. I didn’t make the rules.
Linear Regression

Okay, let’s imagine we have a simple model in which we’re trying to just use age to
predict how much George will spend at Willy Wonka’s Candy this week.

The data points we used to train our model are in blue. The red line is the line of best
fit, which the model generated, and captures the direction of those points as best as
possible.

Here, it looks like the older somebody is, the more money they will spend. We know
George is 65, so we’ll find 65 on the x-axis and follow the green dotted line up until
we meet the red “line of best fit.” Now we can follow the second dotted line across to
the y-axis, and land on our prediction — we would predict that George will spend $33
this week.

Where does this red “line of best fit” come from? Well, you may be familiar with the
formula y = mx + b, the formula for a straight line. This is the foundation of linear
regression. All we need to do is reformat a few variables, add an error term (e) to
account for randomness, and fill in our target ($ spent) and features (age).

We’ll train a model to learn the relationship between age and dollars spent this week
from past data points. Our model will determine the values of m1 and b that best
predict the dollars spent this week, given the age. We can easily add in more features,
such as has_kids, and the model will then learn the value of m2 as well.
In the real world, of course, building a straight line like this is usually not realistic, as
we often have more complex, non-linear relationships. We can manipulate our
features manually to deal with this, but that can be cumbersome, and we’ll often miss
out on some more complex relationships. However, the benefit is that it’s quite
straightforward to interpret — with a certain increase in age, we can expect a specific
corresponding increase in dollars spent.
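For readers who want to try this, a short scikit-learn sketch of the
age-to-dollars line of best fit follows; the training points are
invented, so the prediction only roughly echoes the $33 figure above:

```python
# Linear regression: learn m (slope) and b (intercept) in y = mx + b.
# The (age, dollars spent) training points are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

ages  = np.array([[20], [30], [40], [50], [60], [70]])
spent = np.array([12, 17, 22, 26, 31, 36])

reg = LinearRegression().fit(ages, spent)
print(reg.coef_[0], reg.intercept_)   # the learned m and b
print(reg.predict([[65]]))            # predicted spend for George, age 65
```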

Logistic Regression

Now, rather than trying to predict George’s exact spending, let’s just try to predict
whether or not George will be a high spender. We can use logistic regression, an
adaptation of linear regression for classification problems, to solve this.

The black dots at the top and bottom are the data points we used to train our model,
and the S-shaped line is the line of best fit.

You may have noticed that all data points in the above chart are either a 0 or a 1. This
is because each point is marked as either a low spender (0) or a high spender (1).
Now, we will use a logistic function to generate an S-shaped line of best fit, also
called a Sigmoid curve, to predict the likelihood of a data point belonging to one
category, in this case high spender. We also could have predicted the likelihood of
being a low spender, it doesn’t matter. We’ll then use a predefined threshold to make
a final prediction.

Let's predict for George again: we'll find 65 on the x-axis, map it
up to the S-shaped line, and then across. Now, we think there is a
60% chance that George is a high spender. We'll now use our
threshold, indicated by the black dotted line in the chart above, to
decide whether we will predict that he is a high spender or not.

Our threshold is 50%, so since our point is above that line, we'll
predict that George is a high spender. For this use case, a 50%
threshold makes sense, but that's not always the case. For example,
in the case of credit card fraud, a bank might only want to predict
that a transaction is fraudulent if it is, say, 95% sure, so it
doesn't annoy customers by frequently declining valid transactions.
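Here is a small scikit-learn sketch of the probability-plus-threshold
idea; the ages and high-spender labels are invented for illustration,
so the exact probability will differ from the 60% in the example:

```python
# Logistic regression: predict a probability, then apply a threshold.
# Training data is invented: 0 = low spender, 1 = high spender.
from sklearn.linear_model import LogisticRegression

ages = [[18], [25], [33], [41], [52], [60], [68], [75]]
high_spender = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression().fit(ages, high_spender)

p = clf.predict_proba([[65]])[0, 1]   # probability George is a high spender
threshold = 0.5                       # could be raised (e.g. 0.95) for fraud-style use cases
print(p, p >= threshold)
```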

Recap
Machine learning is really all about using past data to either make predictions or
understand general groupings in your dataset. Linear models tend to be the simplest
class of algorithms, and work by generating a line of best fit. They’re not always as
accurate as newer algorithm classes, but are still used quite a bit, mostly because
they’re fast to train and fairly straightforward to interpret.

More and more often, analysts and business teams are breaking down the historically
high barrier of entry to AI. Whether you have coding experience or not, you can
expand your machine learning knowledge and learn to build the right model for a
given project.

We hope that you find this high-level overview of machine learning and linear models
helpful. Be on the lookout for future posts from this series discussing other families of
algorithms, including but not limited to tree-based models, neural networks, and
clustering.

What are Linear Models in Machine Learning?

Overview
The Linear Model is one of the most straightforward models in machine
learning. It is the building block for many complex machine learning
algorithms, including deep neural networks. Linear models predict the target
variable using a linear function of the input features. In this
article, we will cover two crucial linear models in machine learning:
linear regression and logistic regression. Linear regression is used
for regression tasks, whereas logistic regression is a classification
algorithm. We will also discuss some examples of linear models, which
have essential applications in industry.

Scope
 This article will cover linear models in machine learning.
 A brief discussion on linear regression and logistic regression will also
be presented here.
 Details on these topics will be covered in the subsequent article.

Introduction to Linear Models

The linear model is one of the simplest models in machine learning.
It assumes that the data is linearly separable and tries to learn a
weight for each feature. Mathematically, it can be written as

    Y = W^T X,

where X is the feature matrix, Y is the target variable, and W is the
learned weight vector. For classification problems, we apply a
transformation function or a threshold to convert the
continuous-valued variable Y into a discrete category. Here we will
briefly cover linear and logistic regression, which are models for
regression and classification tasks, respectively.


Linear models in machine learning are easy to implement and interpret and
are helpful in solving many real-life use cases.
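To make the Y = W^T X form concrete, here is a minimal numpy sketch;
the weights are set by hand rather than learned, purely to show the
linear form and the thresholding step:

```python
# The linear model Y = W^T X as a dot product of weights and features.
# Weights and features here are hand-picked assumptions, not learned values.
import numpy as np

W = np.array([0.5, -1.2, 3.0])        # one weight per feature
X = np.array([2.0, 1.0, 0.5])         # one sample's feature vector

Y = W @ X                             # weighted sum (for 1-D arrays, W.T == W)
print(Y)                              # 0.5*2.0 - 1.2*1.0 + 3.0*0.5 = 1.3

# For classification, apply a threshold to the continuous output:
label = int(Y > 0)
print(label)
```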

Types of Linear Models


Among many linear models, this article will cover linear regression and logistic
regression.

Linear Regression
Linear Regression is a statistical approach that predicts the result of a
response variable by combining numerous influencing factors. It attempts to
represent the linear connection between features (independent variables) and
the target (dependent variables). The cost function enables us to find the best
possible values for the model parameters. A detailed discussion on linear
regression is presented in a different article.


Example: An analyst might be interested in seeing how market movement
influences the price of ExxonMobil (XOM). The value of the S&P 500
index would be the independent variable, or predictor, in this
example, while the price of XOM would be the dependent variable. In
reality, various elements influence an event's result, so we usually
have many independent features.

Logistic Regression

Logistic regression is an extension of linear regression. The sigmoid
function first transforms the linear regression output into a value
between 0 and 1, which can be interpreted as the probability of the
positive class. A predefined threshold then converts this probability
into a class label: values above the threshold are assigned to class
1, whereas values below the threshold are assigned to class 0. A
separate article dives deeper into the mathematics behind the
logistic regression model.

SEE TABLE

Example: A bank wants to predict if a customer will default on their loan
based on their credit score and income. The independent variables would be
credit score and income, while the dependent variable would be whether the
customer defaults (1) or not (0).
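A hedged sketch of this example with scikit-learn; the credit scores,
incomes (in thousands), and default labels below are invented for
illustration only:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: [credit score, income in $1000s]; 1 = default, 0 = no default
X = np.array([[580, 25], [620, 32], [700, 48],
              [750, 60], [640, 30], [780, 75]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba([[690, 45]]))  # sigmoid output: P(class 0), P(class 1)
print(clf.predict([[690, 45]]))        # class after the default 0.5 threshold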

Applications of Linear Models


Several real-life scenarios follow linear relations between dependent and
independent variables. Some of the examples are:

 The relationship between the boiling point of water and change in
altitude.
 The relationship between spending on advertising and the revenue of
an organization.
 The relationship between the amount of fertilizer used and crop yields.
 Performance of athletes and their training regimen.

SEE TABLE

Conclusion
 In this article, we have covered linear models in machine learning.
 Linear and logistic regression were also discussed in brief here.
 Some real-life applications of linear models are also presented here.
 A detailed discussion on linear and logistic regression will be presented
in a subsequent article.

Supervised Learning
What is supervised learning?
Supervised learning is an approach to creating artificial intelligence (AI),
where a computer algorithm is trained on input data that has been labeled for
a particular output. The model is trained until it can detect the underlying
patterns and relationships between the input data and the output labels,
enabling it to yield accurate labeling results when presented with never-
before-seen data.

Supervised learning is good at classification and regression problems, such
as determining what category a news article belongs to or predicting the
volume of sales for a given future date. In supervised learning, the aim is to
make sense of data within the context of a specific question.

In contrast to supervised learning is unsupervised learning. In this approach,
the algorithm is presented with unlabeled data and is designed to detect
patterns or similarities on its own, a process described in more detail below.

How does supervised learning work?

Like all machine learning algorithms, supervised learning is based on
training. During its training phase, the system is fed with labeled data sets,
which instruct the system what output is related to each specific input value.
The trained model is then presented with test data: this is data that has been
labeled, but the labels have not been revealed to the algorithm. The aim of
the testing data is to measure how accurately the algorithm will perform on
unlabeled data.
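A minimal sketch of this train-then-test workflow with scikit-learn; the iris
dataset and decision tree classifier here are stand-ins chosen for
illustration, not part of the original article:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Labeled dataset: inputs X and their known output labels y
X, y = load_iris(return_X_y=True)

# Hold out labeled test data whose labels stay hidden from the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)

# Measure how accurately the trained model performs on unseen data
print(accuracy_score(y_test, model.predict(X_test)))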


In neural network algorithms, the supervised learning process is improved
by constantly measuring the resulting outputs of the model and fine-tuning
the system to get closer to its target accuracy. The level of accuracy
obtainable depends on two things: the available labeled data and the
algorithm that is used. In addition:

 Training data must be balanced and cleaned. Garbage or duplicate data
will skew the AI's understanding -- hence, data scientists must be
careful with the data the model is trained on.
 The diversity of the data determines how well the AI will perform when
presented with new cases; if there are not enough samples in the
training data set, the model will falter and fail to yield reliable answers.
 High accuracy, paradoxically, is not necessarily a good indication; it
could also mean the model is suffering from overfitting -- i.e., it is
overtuned to its particular training data set. Such a data set might
perform well in test scenarios but fail miserably when presented with
real-world challenges. To avoid overfitting, it is important that the test
data is different from the training data to ensure the model is not
drawing answers from its previous experience, but instead that the
model's inference is generalized.
 The algorithm, on the other hand, determines how that data can be put
in use. For instance, deep learning algorithms can be trained with
billions of parameters learned from their data and reach unprecedented
levels of accuracy, as demonstrated by OpenAI's GPT-3.

Apart from neural networks, there are many other supervised learning
algorithms (see below). Supervised learning algorithms primarily generate
two kinds of results: classification and regression.

Classification algorithms
A classification algorithm aims to sort inputs into a given number of
categories or classes, based on the labeled data it was trained on.
Classification algorithms can be used for binary classifications such as
filtering email into spam or non-spam and categorizing customer feedback
as positive or negative. Feature recognition, such as recognizing handwritten
letters and numbers or classifying drugs into many different categories, is
another classification problem solved by supervised learning.

Regression models
Regression tasks are different, as they expect the model to produce a
numerical relationship between the input and output data. Examples of
regression models include predicting real estate prices based on zip code, or
predicting click rates in online ads in relation to time of day, or determining
how much customers would be willing to pay for a certain product based on
their age.

Algorithms commonly used in supervised learning programs include the
following:

 linear regression
 logistic regression
 neural networks
 linear discriminant analysis
 decision trees
 similarity learning
 Bayesian logic
 support vector machines (SVMs)
 random forests

When choosing a supervised learning algorithm, there are a few things that
should be considered. The first is the bias and variance that exist within the
algorithm, as there is a fine line between being flexible enough and too
flexible. Another is the complexity of the model or function that the system
is trying to learn. As noted, the heterogeneity, accuracy, redundancy and
linearity of the data should also be analyzed before choosing an algorithm.

Learn more about supervised learning algorithms and how they are best
applied in this supervised learning primer from Arcitura Education.

Supervised vs. unsupervised learning

The chief difference between unsupervised and supervised learning is in
how the algorithm learns. In unsupervised learning, the algorithm is given
unlabeled data as a training set. Unlike in supervised learning, there are no
correct output values; the algorithm determines the patterns and similarities
within the data, as opposed to relating it to some external measurement. In
other words, algorithms are able to function freely in order to learn more
about the data and find interesting or unexpected findings that human beings
weren't looking for. Unsupervised learning is popular in applications of
clustering (the act of uncovering groups within data) and association (the act
of predicting rules that describe the data).
[Figure: Algorithms used in supervised, unsupervised and semi-supervised learning.]
Benefits and limitations
Supervised learning models have some advantages over the unsupervised
approach, but they also have limitations. Supervised learning systems are
more likely to make judgments that humans can relate to, for example,
because humans have provided the basis for decisions.

However, in the case of a retrieval-based method, supervised learning
systems have trouble dealing with new information. If a system with
categories for cars and trucks is presented with a bicycle, for example, it
would have to be incorrectly lumped in one category or the other. If the AI
system was generative (that is, unsupervised), however, it may not know
what the bicycle is, but it would be able to recognize it as belonging to a
separate category.

Supervised learning also typically requires large amounts of correctly
labeled data to reach acceptable performance levels, and such data may not
always be available. Unsupervised learning does not suffer from this
problem and can work with unlabeled data as well.

Semi-supervised learning
In cases where supervised learning is needed but there is a lack of quality
data, semi-supervised learning may be the appropriate learning method. This
learning model resides between supervised learning and unsupervised; it
accepts data that is partially labeled -- i.e., the majority of the data lacks
labels.

Semi-supervised learning determines the correlations between the data
points -- just like unsupervised learning -- and then uses the labeled data to
mark those data points. Finally, the entire model is trained based on the
newly applied labels.

Semi-supervised learning has proven to yield accurate results and is
applicable to many real-world problems where the small amount of labeled
data would prevent supervised learning algorithms from functioning
properly. As a rule of thumb, a data set with at least 25% labeled data is
suitable for semi-supervised learning.

Facial recognition, for instance, is ideal for semi-supervised learning; the
vast number of images of different people is clustered by similarity and then
made sense of with a labeled picture giving identity to the clustered photos.

Example of a supervised learning project

Consider the news categorization problem from earlier. One approach is to
determine which category each piece of news belongs to, such as business,
finance, technology or sports. To solve this problem, a supervised model
would be the best fit.

Humans would present the model with various news articles and their
categories and have the model learn what kind of news belongs to each
category. This way, the model becomes capable of recognizing the news
category of any article it looks at based on its previous training experience.

However, humans might also come to the conclusion that classifying news
based on the predetermined categories is not sufficiently informative or
flexible, as some news may talk about climate change technologies or the
workforce problems in an industry. There are billions of news articles out
there, and separating them into 40 or 50 categories may be an
oversimplification. Instead, a better approach would be to find the
similarities between the news articles and group the news accordingly. That
would be looking at news clusters instead, where similar articles would be
grouped together. There are no specific categories anymore.
This is what unsupervised learning achieves: It determines the patterns and
similarities within the data, as opposed to relating it to some external
measurement.

Learn about how semi-supervised learning and the new "one-shot learning"
approach aim to reduce the need for large data sets and human
intervention.

CHAPTER 3
Unsupervised Learning
Unsupervised Machine Learning: Algorithms, Types with Example
By Daniel Johnson, updated January 21, 2023

What is Unsupervised Learning?


Unsupervised Learning is a machine learning technique in
which the users do not need to supervise the model. Instead,
it allows the model to work on its own to discover patterns
and information that was previously undetected. It mainly
deals with the unlabelled data.

Unsupervised Learning Algorithms


Unsupervised Learning Algorithms allow users to
perform more complex processing tasks compared to
supervised learning. However, unsupervised learning can be
more unpredictable compared with other natural learning
methods. Unsupervised learning algorithms include
clustering, anomaly detection, neural networks, etc.
In this tutorial, you will learn:

 Example of Unsupervised Machine Learning
 Why Unsupervised Learning?
 Clustering Types of Unsupervised Learning Algorithms
 Clustering
 Clustering Types
 Association
 Supervised vs. Unsupervised Machine Learning
 Applications of Unsupervised Machine Learning
 Disadvantages of Unsupervised Learning

Example of Unsupervised Machine Learning

Let's take an example of Unsupervised Learning for a baby
and her family dog. She knows and identifies this dog. A few
weeks later a family friend brings along a dog and tries to
play with the baby.


The baby has not seen this dog before. But it recognizes that
many of its features (two ears, eyes, walking on four legs)
are like those of her pet dog. She identifies the new animal
as a dog. This is unsupervised learning, where you are not
taught but you learn from the data (in this case, data about
a dog). Had this been supervised learning, the family friend
would have told the baby that it's a dog.

Why Unsupervised Learning?

Here are the prime reasons for using Unsupervised Learning
in Machine Learning:

 Unsupervised machine learning finds all kinds of unknown
patterns in data.
 Unsupervised methods help you to find features which
can be useful for categorization.
 It takes place in real time, so all the input data can be
analyzed and labeled in the presence of learners.
 It is easier to get unlabeled data from a computer than
labeled data, which needs manual intervention.

Clustering Types of Unsupervised Learning Algorithms

Below are the clustering types of Unsupervised Machine
Learning algorithms. Unsupervised learning problems are
further grouped into clustering and association problems.

Clustering
Clustering is an important concept when it comes to
unsupervised learning. It mainly deals with finding a
structure or pattern in a collection of uncategorized data.
Unsupervised Learning Clustering algorithms will process
your data and find natural clusters(groups) if they exist in
the data. You can also modify how many clusters your
algorithms should identify. It allows you to adjust the
granularity of these groups.

There are different types of clustering you can utilize:

Exclusive (partitioning)
In this clustering method, data are grouped in such a way
that each data point can belong to one cluster only.

Example: K-means

Agglomerative
In this clustering technique, every data point starts as its
own cluster. Iterative unions between the two nearest
clusters reduce the number of clusters.

Example: Hierarchical clustering

Overlapping
In this technique, fuzzy sets are used to cluster data. Each
point may belong to two or more clusters with separate
degrees of membership. Here, each data point is associated
with an appropriate membership value. Example: Fuzzy C-Means

Probabilistic
This technique uses probability distributions to create the
clusters.

Example: the following keywords

 “man’s shoe.”
 “women’s shoe.”
 “women’s glove.”
 “man’s glove.”

can be clustered into two categories “shoe” and “glove” or
“man” and “women.”

Clustering Types
Following are the clustering types of Machine Learning:

 Hierarchical clustering
 K-means clustering
 K-NN (k nearest neighbors)
 Principal Component Analysis
 Singular Value Decomposition
 Independent Component Analysis

Hierarchical Clustering
Hierarchical clustering is an algorithm which builds a
hierarchy of clusters. It begins with each data point assigned
to a cluster of its own. At each step, the two closest clusters
are merged into the same cluster. This algorithm ends when
there is only one cluster left.

K-means Clustering
K-means is an iterative clustering algorithm which helps you
find the best cluster assignment in every iteration. Initially,
the desired number of clusters is selected. In this clustering
method, you need to cluster the data points into k groups. A
larger k means smaller groups with more granularity; a lower
k means larger groups with less granularity.

The output of the algorithm is a group of “labels”: it assigns
each data point to one of the k groups. In k-means clustering,
each group is defined by creating a centroid for each group.
The centroids are like the heart of the cluster; each one
captures the points closest to it and adds them to the
cluster.
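A minimal sketch of this with scikit-learn; the 2-D points below are
invented so that they form two obvious groups:

import numpy as np
from sklearn.cluster import KMeans

# Invented 2-D points forming two loose groups
points = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
                   [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# Choose k = 2: the algorithm assigns each point a group "label"
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster label for each point
print(kmeans.cluster_centers_)  # the centroid ("heart") of each cluster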

Closely related to clustering are two further concepts:

 Agglomerative clustering
 Dendrogram

Agglomerative clustering
Unlike K-means, this clustering method does not require the
number of clusters K as an input. The agglomeration process
starts by treating each data point as a single cluster. Using
some distance measure, it reduces the number of clusters
(by one in each iteration) through a merging process. Lastly,
we have one big cluster that contains all the objects.

Dendrogram
In the dendrogram clustering method, each level represents
a possible cluster. The height of the dendrogram shows the
level of similarity between two joined clusters. The closer to
the bottom of the tree two clusters join, the more similar
they are. Note that finding groups from a dendrogram is not
always natural and is mostly subjective.
K-Nearest Neighbors
K-nearest neighbors is the simplest of all machine learning
classifiers. It differs from other machine learning techniques
in that it doesn't produce a model. It is a simple algorithm
which stores all available cases and classifies new instances
based on a similarity measure.

It works very well when there is a meaningful distance
measure between examples. The learning speed is slow when
the training set is large, and the distance calculation is
nontrivial.

Principal Components Analysis

Principal Component Analysis is used when you want to
represent your data in a lower-dimensional space. You select
a basis for that space and keep only the most important
scores of that basis; these basis vectors are known as the
principal components. The subset you select constitutes a
new space which is small in size compared to the original
space, yet it maintains as much of the complexity (variance)
of the data as possible.
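A hedged sketch with scikit-learn; the 3-D data below is randomly
generated so that it mostly varies along one direction:

import numpy as np
from sklearn.decomposition import PCA

# Invented 3-D data that mostly varies along a single direction
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.1 * rng.normal(size=(100, 3))

# Keep only the 2 most important components (the new, smaller basis)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2): smaller than the original space
print(pca.explained_variance_ratio_)   # variance kept by each component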

Association
Association rules allow you to establish associations
amongst data objects inside large databases. This
unsupervised technique is about discovering interesting
relationships between variables in large databases. For
example, people who buy a new home are most likely to buy
new furniture.

Other examples:

 Subgroups of cancer patients grouped by their gene
expression measurements
 Groups of shoppers based on their browsing and
purchasing histories
 Movies grouped by the ratings given by viewers

Supervised vs. Unsupervised Machine Learning

Here is the main difference between Supervised and
Unsupervised Learning:

Parameters: Supervised machine learning technique | Unsupervised machine learning technique
Input Data: Algorithms are trained using labeled data. | Algorithms are used against data which is not labelled.
Computational Complexity: Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Accuracy: Highly accurate and trustworthy method. | Less accurate and trustworthy method.

Applications of Unsupervised Machine Learning
Some applications of Unsupervised Learning techniques are:

 Clustering automatically splits the dataset into groups
based on their similarities
 Anomaly detection can discover unusual data points in
your dataset. It is useful for finding fraudulent
transactions
 Association mining identifies sets of items which often
occur together in your dataset
 Latent variable models are widely used for data
preprocessing, like reducing the number of features in
a dataset or decomposing the dataset into multiple
components

Disadvantages of Unsupervised
Learning
 You cannot get precise information regarding data
sorting or the output, as the data used in unsupervised
learning is not labeled and not known in advance
 The results are less accurate because the input data is
not known and not labeled by people in advance. This
means that the machine has to do this itself.
 The spectral classes do not always correspond to
informational classes.
 The user needs to spend time interpreting and labeling
the classes which follow that classification.
 Spectral properties of classes can also change over
time, so you can't have the same class information
while moving from one image to another.
Summary
 Unsupervised learning is a machine learning technique
where you do not need to supervise the model.
 Unsupervised machine learning helps you find all kinds
of unknown patterns in data.
 Clustering and Association are two types of
Unsupervised learning.
 Four types of clustering methods are 1) Exclusive 2)
Agglomerative 3) Overlapping 4) Probabilistic.
 Important clustering types are: 1)Hierarchical
clustering 2) K-means clustering 3) K-NN 4) Principal
Component Analysis 5) Singular Value Decomposition
6) Independent Component Analysis.
 Association rules allow you to establish associations
amongst data objects inside large databases.
 In Supervised learning, Algorithms are trained using
labelled data while in Unsupervised learning Algorithms
are used against data which is not labelled.
 Anomaly detection can discover important data points
in your dataset which is useful for finding fraudulent
transactions.
 The biggest drawback of Unsupervised learning is that
you cannot get precise information regarding data
sorting

Unsupervised Machine Learning


In the previous topic, we learned supervised machine learning in
which models are trained using labeled data under the supervision
of training data. But there may be many cases in which we do not
have labeled data and need to find the hidden patterns from the
given dataset. So, to solve such types of cases in machine learning,
we need unsupervised learning techniques.

What is Unsupervised Learning?

As the name suggests, unsupervised learning is a machine learning
technique in which models are not supervised using a training
dataset. Instead, the models themselves find the hidden patterns and
insights from the given data. It can be compared to the learning which
takes place in the human brain while learning new things. It can be
defined as:

Unsupervised learning is a type of machine learning in which
models are trained using an unlabeled dataset and are allowed to
act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or
classification problem because, unlike supervised learning, we have
the input data but no corresponding output data. The goal of
unsupervised learning is to find the underlying structure of the
dataset, group that data according to similarities, and
represent that dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an
input dataset containing images of different types of cats and dogs.
The algorithm is never trained on the given dataset, which means
it does not have any idea about the features of the dataset. The task
of the unsupervised learning algorithm is to identify the image
features on its own. The unsupervised learning algorithm will perform
this task by clustering the image dataset into groups according
to the similarities between images.


Why use Unsupervised Learning?

Below are some main reasons which describe the importance of
unsupervised learning:

o Unsupervised learning is helpful for finding useful insights from
the data.
o Unsupervised learning is very similar to how a human learns to
think by their own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized
data, which makes unsupervised learning more important.
o In the real world, we do not always have input data with the
corresponding output, so to solve such cases, we need
unsupervised learning.
Working of Unsupervised Learning
The working of unsupervised learning can be understood by the below
diagram:

Here, we have taken unlabeled input data, which means it is not
categorized and corresponding outputs are also not given. Now, this
unlabeled input data is fed to the machine learning model in order
to train it. First, it will interpret the raw data to find the hidden
patterns in the data and then will apply suitable algorithms such
as k-means clustering, hierarchical clustering, etc.

Once it applies the suitable algorithm, the algorithm divides the
data objects into groups according to the similarities and differences
between the objects.

Types of Unsupervised Learning Algorithm:


The unsupervised learning algorithm can be further categorized into
two types of problems:
o Clustering: Clustering is a method of grouping the objects
into clusters such that objects with the most similarities remain
in a group and have few or no similarities with the objects of
another group. Cluster analysis finds the commonalities
between the data objects and categorizes them as per the
presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning
method which is used for finding the relationships between
variables in a large database. It determines the set of items
that occur together in the dataset. Association rules make
marketing strategy more effective; for example, people who buy
an X item (say, bread) also tend to purchase a Y item
(butter/jam). A typical example of an association rule is Market
Basket Analysis (see the sketch below).

Note: We will learn these algorithms in later chapters.
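A minimal sketch of the idea behind association mining; the transactions
are invented, and this simple pair-counting stands in for a full algorithm
such as Apriori:

from itertools import combinations
from collections import Counter

# Invented market-basket transactions
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

# Count how often each pair of items occurs together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Report pairs that co-occur in at least half of the transactions
for pair, count in pair_counts.items():
    if count / len(transactions) >= 0.5:
        print(pair, count)  # e.g. ('bread', 'butter') occurs in 2 of 4 baskets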

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition

Advantages of Unsupervised Learning


o Unsupervised learning is used for more complex tasks as
compared to supervised learning because, in unsupervised
learning, we don't have labeled input data.
o Unsupervised learning is preferable as it is easy to get
unlabeled data in comparison to labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than
supervised learning as it does not have corresponding output.
o The result of the unsupervised learning algorithm might be
less accurate as input data is not labeled, and algorithms do
not know the exact output in advance.

Reinforcement learning
Reinforcement learning is an area of Machine Learning. It is about
taking suitable action to maximize reward in a particular
situation. It is employed by various software and machines to find
the best possible behavior or path to take in a specific
situation. Reinforcement learning differs from supervised learning in
that in supervised learning the training data has the answer key with
it, so the model is trained with the correct answer itself, whereas in
reinforcement learning there is no answer; the reinforcement agent
decides what to do to perform the given task. In the absence of a
training dataset, it is bound to learn from its experience.
Example: The problem is as follows: we have an agent and a
reward, with many hurdles in between. The agent is supposed to
find the best possible path to reach the reward. The following
example explains the problem more easily.
The above image shows a robot, a diamond, and fire. The goal of
the robot is to get the reward, that is the diamond, and avoid the
hurdles, which are fire. The robot learns by trying all the possible
paths and then choosing the path which gives it the reward
with the fewest hurdles. Each right step will give the robot a reward
and each wrong step will subtract from the robot's reward. The
total reward is calculated when it reaches the final reward,
that is the diamond.
Main points in Reinforcement learning –
 Input: The input should be an initial state from which the
model will start.
 Output: There are many possible outputs, as there are a
variety of solutions to a particular problem.
 Training: The training is based upon the input. The model will
return a state, and the user will decide to reward or punish the
model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward.

Difference between Reinforcement learning and Supervised learning:

Reinforcement learning | Supervised learning
Reinforcement learning is all about making decisions sequentially. In simple words, we can say that the output depends on the state of the current input, and the next input depends on the output of the previous input. | In supervised learning, the decision is made on the initial input or the input given at the start.
In reinforcement learning, decisions are dependent, so we give labels to sequences of dependent decisions. | In supervised learning, decisions are independent of each other, so labels are given to each decision.
Example: Chess game | Example: Object recognition

Types of Reinforcement: There are two types of reinforcement:

1. Positive –
Positive reinforcement is defined as when an event that occurs
due to a particular behavior increases the strength and
frequency of the behavior. In other words, it has a positive
effect on behavior.
Advantages:
 Maximizes performance
 Sustains change for a long period of time
Disadvantage: Too much reinforcement can lead to an overload
of states, which can diminish the results.
2. Negative –
Negative reinforcement is defined as the strengthening of a
behavior because a negative condition is stopped or avoided.
Advantages:
 Increases behavior
 Provides defiance to a minimum standard of performance
Disadvantage: It only provides enough to meet the minimum
behavior.

Various practical applications of Reinforcement Learning:

 RL can be used in robotics for industrial automation.
 RL can be used in machine learning and data processing.
 RL can be used to create training systems that provide custom
instruction and materials according to the requirements of
students.

RL can be used in large environments in the following situations:

1. A model of the environment is known, but an analytic solution
is not available;
2. Only a simulation model of the environment is given (the
subject of simulation-based optimization);
3. The only way to collect information about the environment is
to interact with it.

EXAMPLE AND IMPLEMENTATION:

Python3

import gym
import numpy as np

# Create a discrete environment. The original snippet did not define env,
# state_size, action_size, or num_episodes; FrozenLake-v1 is assumed here,
# together with the classic Gym API (gym < 0.26).
env = gym.make("FrozenLake-v1")
state_size = env.observation_space.n
action_size = env.action_space.n
num_episodes = 2000

# Define the Q-table and learning rate
q_table = np.zeros((state_size, action_size))
alpha = 0.8   # learning rate
gamma = 0.95  # discount factor

# Train the Q-learning algorithm
for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        # Choose an action: greedy on the Q-table plus decaying
        # random noise for exploration
        action = np.argmax(q_table[state, :] +
                           np.random.randn(1, action_size) * (1. / (episode + 1)))

        # Take the action and observe the new state and reward
        next_state, reward, done, _ = env.step(action)

        # Update the Q-table with the Bellman update rule
        q_table[state, action] = (1 - alpha) * q_table[state, action] + \
            alpha * (reward + gamma * np.max(q_table[next_state, :]))

        state = next_state

# Test the trained Q-learning algorithm
state = env.reset()
done = False
while not done:
    # Choose the best action from the learned Q-table
    action = np.argmax(q_table[state, :])

    # Take the action
    state, reward, done, _ = env.step(action)
    env.render()

CHAPTER 4
Artificial Neural Network (ANN)
(Use the Slide and the below URLS)
Beginners Guide to Artificial
Neural Network

Deepanshi — Published On May 25, 2021 and Last Modified On May 31st, 2021

Introduction

Deep Learning, which is a subset of Machine Learning, is the
human brain embedded in a machine. It is inspired by the
working of the human brain and therefore is a set of neural
network algorithms which try to mimic the working of the
human brain and learn from experience.
In this article, we are going to learn how a basic neural
network works and how it improves itself to make the best
predictions.

Table of Content

1. Neural networks and their components

2. Perceptron and Multilayer Perceptron

3. Step by Step Working of Neural Network

4. Back Propagation and how it works

5. Brief about Activation Functions

Artificial Neural Networks and Their Components

Neural networks are computational learning systems that use
a network of functions to understand and translate a data input of
one form into a desired output, usually in another form. The
concept of the artificial neural network was inspired by human
biology and the way neurons of the human brain function
together to understand inputs from the human senses.

In simple words, neural networks are a set of algorithms that
try to recognize the patterns, relationships, and information
in the data through a process which is inspired by and
works like the human brain.

Components / Architecture of a Neural Network

A simple neural network consists of three components:



 Input layer
 Hidden layer
 Output layer
Source: Wikipedia

Input Layer: Also known as input nodes, these carry the
inputs/information from the outside world that is provided to the
model to learn and derive conclusions from. Input nodes pass
the information to the next layer, i.e., the hidden layer.

Hidden Layer: The hidden layer is the set of neurons where all the
computations are performed on the input data. There can be
any number of hidden layers in a neural network. The simplest
network consists of a single hidden layer.

Output Layer: The output layer holds the output/conclusions of the
model derived from all the computations performed. There can
be single or multiple nodes in the output layer. If we have a
binary classification problem, the output layer has one node, but
in the case of multi-class classification, the output nodes can be
more than one.

Perceptron and Multi-Layer Perceptron

A perceptron is the simplest form of a neural network and
consists of a single layer where all the mathematical
computations are performed.

Source: kindsonthegenius.com

A Multilayer Perceptron, also known as an Artificial Neural
Network, consists of more than one perceptron grouped
together to form a multiple-layer neural network.

Source: Medium

In the above image, the Artificial Neural Network
consists of four layers interconnected with each other:

 An input layer, with 6 input nodes
 Hidden Layer 1, with 4 hidden nodes/4 perceptrons
 Hidden layer 2, with 4 hidden nodes
 An output layer with 1 output node

Step-by-Step Working of the Artificial Neural Network

Source: Xenonstack.com

1. In the first step, the input units are passed, i.e., data is passed
with some weights attached to it to the hidden layer. We can have
any number of hidden layers. In the above image, the inputs
x1, x2, x3, ..., xn are passed.

2. Each hidden layer consists of neurons. All the inputs are
connected to each neuron.

3. After passing on the inputs, all the computation is performed
in the hidden layer (blue oval in the picture). The computation
performed in hidden layers is done in two steps:

 First of all, all the inputs are multiplied by their weights. The
weight is the gradient or coefficient of each variable; it shows
the strength of the particular input. After assigning the weights,
a bias variable is added. The bias is a constant that helps the
model fit in the best way possible.

Z1 = W1*In1 + W2*In2 + W3*In3 + W4*In4 + W5*In5 + b

W1, W2, W3, W4, W5 are the weights assigned to the inputs In1,
In2, In3, In4, In5, and b is the bias.

 Then, in the second step, the activation function is applied
to the linear equation Z1. The activation function is a nonlinear
transformation that is applied to the input before sending it to
the next layer of neurons. The importance of the activation
function is to inculcate nonlinearity in the model. Several
activation functions are listed in the next section.

4. The whole process described in step 3 is performed in each
hidden layer. After passing through every hidden layer, we move
to the last layer, i.e., our output layer, which gives us the final
output.

5. The process explained above is known as forward propagation.

6. After getting the predictions from the output layer, the error is
calculated, i.e., the difference between the actual and the
predicted output. If the error is large, steps are taken to minimize
the error, and for this purpose, back propagation is performed.

What is Back Propagation and How Does it Work?

Back propagation is the process of updating and finding
the optimal values of the weights or coefficients, which helps
the model minimize the error, i.e., the difference between
the actual and predicted values.

But the question here is: how are the weights updated and
the new weights calculated?

The weights are updated with the help of optimizers.
Optimizers are methods/mathematical formulations that
change the attributes of the neural network, i.e., the weights,
to minimize the error.

Back Propagation with Gradient Descent

Gradient descent is one of the optimizers which helps in
calculating the new weights. Let's understand step by step how
gradient descent optimizes the cost function.

In the image below, the curve is our cost function curve, and our
aim is to minimize the error such that Jmin, i.e., the global
minimum, is achieved.
Source: Quora

Steps to achieve the global minimum:

1. First, the weights are initialized randomly, i.e., random values
of the weights and intercepts are assigned to the model during
forward propagation, and the errors are calculated after all the
computation (as discussed above).

2. Then the gradient is calculated, i.e., the derivative of the error
with respect to the current weights.

3. Then the new weights are calculated with the update rule

W_new = W_old - a * (dError / dW_old)

(Source: hmkcode.com), where a is the learning rate, the
parameter also known as step size, which controls the speed or
steps of backpropagation. It gives additional control over how
fast we want to move on the curve to reach the global minimum.

4. This process of calculating the new weights, then errors from
the new weights, and then updating the weights continues until
we reach the global minimum and the loss is minimized.

A point to note here is that the learning rate, i.e., a in our
weight-update equation, should be chosen wisely. The learning
rate is the amount of change or step size taken towards
reaching the global minimum. It should not be very small, as it
will take a long time to converge, and it should not be so large
that the global minimum is never reached at all. Therefore, the
learning rate is a hyperparameter that we have to choose based
on the model.
Source: Educative.io

To know the detailed maths and the chain rule of
backpropagation, refer to the attached tutorial.
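A hedged sketch of the update rule in isolation: a toy one-dimensional
cost function stands in for the network's error surface (the function,
starting weight, and learning rate are all invented):

# Toy cost function J(w) = (w - 3)^2 with gradient dJ/dw = 2*(w - 3);
# its global minimum is at w = 3
def grad(w):
    return 2.0 * (w - 3.0)

w = 10.0   # step 1: an initial weight (fixed here for reproducibility)
a = 0.1    # learning rate / step size

for step in range(100):
    w = w - a * grad(w)   # steps 2-3: compute gradient, update weight

print(w)  # converges close to 3.0, the global minimum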

Brief about Activation Functions

Activation functions are attached to each neuron and are
mathematical equations that determine whether a neuron
should be activated or not, based on whether the neuron's input
is relevant for the model's prediction. The purpose of the
activation function is to introduce nonlinearity into the data.

Various types of activation functions are listed below; a minimal
code sketch follows the list:

 Sigmoid Activation Function
 TanH / Hyperbolic Tangent Activation Function
 Rectified Linear Unit Function (ReLU)
 Leaky ReLU
 Softmax
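These functions are standard and can be written out directly in NumPy
(the sample input vector is invented):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), softmax(z))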
How Do Neural Networks Really Work in Deep Learning?

Introduction to Deep Learning:

As a subset of artificial intelligence, deep learning lies
at the heart of various innovations: self-driving cars,
natural language processing, image recognition, and
so on.
AI vs ML vs DL

Deep learning is one of the subsets of machine learning;
it uses deep neural networks to implicitly come up with
important conclusions based on input data.

Usually, deep learning is unsupervised or semi-supervised.
Deep learning is based on representation learning: instead
of using task-specific algorithms, it learns from
representative examples. For example, if you want to build
a model that recognizes cats by species, you need to
prepare a database that includes a lot of different cat
images.

The main architectures of deep learning are:

 Convolutional neural networks
 Recurrent neural networks
 Generative adversarial networks
 Recursive neural networks

We are going to talk about them in more detail later in
this text.

Difference between machine learning and deep learning

Machine learning attempts to extract new knowledge
from a large set of pre-processed data loaded into the
system. Programmers need to formulate the rules for
the machine, and it learns based on them. Sometimes,
a human might intervene to correct its errors.

However, deep learning is a bit different:


ML vs DL

What are artificial neural networks?

“Artificial neural networks” and “deep learning” are
often used interchangeably, which isn’t really correct.
Not all neural networks are “deep”, meaning “with
many hidden layers”, and not all deep learning
architectures are neural networks. There are
also deep belief networks, for example.
However, since neural networks are the most hyped
algorithms right now and are, in fact, very useful for
solving complex tasks, we are going to talk about
them in this post.

Definition of an ANN

An artificial neural network represents the structure
of a human brain modeled on the computer. It consists
of neurons and synapses organized into layers.
An ANN can have millions of neurons connected into one
system, which makes it extremely successful at
analyzing and even memorizing various information.

Here is a video for those who want to dive deeper into
the technical details of how artificial neural networks work.

Components of Neural Networks

There are different types of neural networks, but they
always consist of the same components: neurons,
synapses, weights, biases, and functions.

Neurons

A neuron or a node is a basic unit of neural networks
that receives information, performs simple
calculations, and passes it further.

All neurons in a net are divided into three groups:

 Input neurons that receive information from the outside world;
 Hidden neurons that process that information;
 Output neurons that produce a conclusion.
In a large neural network with many neurons and
connections between them, neurons are organized in
layers. There is an input layer that receives
information, a number of hidden layers, and the
output layer that provides valuable results. Every
neuron performs transformation on the input
information.

Neurons only operate on numbers in the range [0,1] or
[-1,1]. In order to turn data into something that a
neuron can work with, we need normalization.

Wait, but how do neurons communicate? Through synapses.

Synapses and weights

A synapse is what connects the neurons, like an
electrical cable. Every synapse has a weight. The
weights also add to the changes in the input
information. The results of the neuron with the
greater weight will be dominant in the next neuron,
while information from less ‘weighty’ neurons will not
be passed over. One can say that the matrix of
weights governs the whole neural system.

How do you know which neuron has the biggest
weight? During initialization (the first launch of the
NN), the weights are randomly assigned, but then you
will have to optimize them.
Bias

A bias neuron allows for more variations of weights to
be stored. Biases add a richer representation of the
input space to the model’s weights.

In the case of neural networks, a bias neuron is added
to every layer. It plays a vital role by making it
possible to move the activation function to the left or
right on the graph.
It is true that ANNs can work without bias neurons.
However, they are almost always added and counted
as an indispensable part of the overall model.

How ANNs work

Every neuron processes input data to extract a
feature. Let’s imagine that we have three features and
three neurons, each of which is connected to all
these features.

Each of the neurons has its own weights that are used
to weight the features. During the training of the
network, you need to select such weights for each of
the neurons that the output provided by the whole
network is true-to-life.

To perform transformations and get an output, every
neuron has an activation function. This combination of
functions performs a transformation that is described
by a common function F — this describes the formula
behind the NN’s magic.
There are a lot of activation functions. The most
common ones are linear, sigmoid, and hyperbolic
tangent. Their main difference is the range of values
they work with.

How do you train an algorithm?

Neural networks are trained like any other algorithm.
You want to get some results and provide information
to the network to learn from. For example, say we want
our neural network to distinguish between photos of
cats and dogs, so we provide plenty of examples.

Delta is the difference between the data and the
output of the neural network. We use calculus magic
and repeatedly optimize the weights of the network
until the delta is zero. Once the delta is zero or close
to it, our model is correctly able to predict our
example data.

Iteration

This is a kind of counter that increases every time the
neural network goes through one training set. In other
words, this is the total number of training sets
completed by the neural network.

Epoch

The epoch count increases each time we go through the
entire set of training examples. The more epochs there
are, the better the training of the model.

Batch

Batch size is equal to the number of training examples
in one forward/backward pass. The higher the batch
size, the more memory space you’ll need.

What is the difference between an iteration and an epoch?

 One epoch is one forward pass and one backward pass
of all the training examples;
 The number of iterations is the number of passes, each
pass using [batch size] training examples. To be clear,
one pass equals one forward pass + one backward pass
(we do not count the forward pass and backward pass as
two different passes).

A quick worked example of these quantities follows.
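The numbers below are invented simply to show the arithmetic: with
1,000 training examples and a batch size of 100, one epoch takes 10
iterations.

# Invented numbers to show the epoch/iteration/batch relationship
num_examples = 1000
batch_size = 100

iterations_per_epoch = num_examples // batch_size
print(iterations_per_epoch)  # 10 passes (forward + backward) per epoch

num_epochs = 5
total_iterations = num_epochs * iterations_per_epoch
print(total_iterations)      # 50 iterations over the whole training run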

And what about errors?

Error is a deviation that reflects the discrepancy
between expected and received output. The error
should become smaller after every epoch. If this does
not happen, then you are doing something wrong.
The error can be calculated in different ways, but we
will consider only two main ways: Arctan and Mean
Squared Error.

There is no restriction on which one to use, and you
are free to choose whichever method gives you the
best results. But each method counts errors in
different ways:

 With Arctan, the error will almost always be larger:

Error = (arctan^2(i1 - a1) + ... + arctan^2(in - an)) / n

 MSE is more balanced and is used more often:

Error = ((i1 - a1)^2 + (i2 - a2)^2 + ... + (in - an)^2) / n

where i is the expected (ideal) output, a is the actual output of
the network, and n is the number of outputs.

What kinds of neural networks exist?

There are so many different neural networks out there
that it is simply impossible to mention them all. If you
want to learn more about this variety, visit the neural
network zoo, where you can see them all represented
graphically.
Feed-forward neural networks

This is the simplest neural network algorithm. A
feed-forward network doesn’t have any memory. That is,
there is no going back in a feed-forward network. In
many tasks, this approach is not very applicable. For
example, when we work with text, the words form a
certain sequence, and we want the machine to
understand it.
Feedforward neural networks can be applied in
supervised learning when the data that you work with
is not sequential or time-dependent. You can also use
it if you don’t know how the output should be
structured but want to build a relatively fast and easy
NN.

Recurrent neural networks

A recurrent neural network can process texts, videos,
or sets of images and become more precise every time
because it remembers the results of the previous
iteration and can use that information to make better
decisions.

Recurrent neural networks are widely used in natural
language processing and speech recognition.

Convolutional neural networks

Convolutional neural networks are the standard of
today’s deep machine learning and are used to solve
the majority of problems. Convolutional neural
networks can be either feed-forward or recurrent.

Let’s see how they work. Imagine we have an image of
Albert Einstein. We can assign a neuron to each pixel in
the input image.
But there is a big problem here: if you connect each
neuron to all pixels, then, firstly, you will get a lot of
weights. Hence, it will be a very computationally
intensive operation and take a very long time. Then,
there will be so many weights that this method will be
very unstable to overfitting. It will predict everything
well on the training example but work badly on other
images.

Therefore, programmers came up with a different
architecture where each of the neurons is connected
only to a small square in the image. All these neurons
will have the same weights, and this design is called
image convolution. We can say that we have
transformed the picture, walked through it with a
filter, simplifying the process. Fewer weights, faster to
count, less prone to overfitting.

For an awesome explanation of how convolutional
neural networks work, watch this video by Luis Serrano.

Generative adversarial neural networks

A generative adversarial network is an unsupervised
machine learning algorithm that is a combination of
two neural networks, one of which (network G)
generates patterns while the other (network A) tries to
distinguish genuine samples from fake ones. Since the
networks have opposite goals — to create samples and
reject samples — they start an antagonistic game that
turns out to be quite effective.

GANs are used, for example, to generate photographs
that are perceived by the human eye as natural
images, or deepfakes (videos where real people say
and do things they have never done in real life).

What kind of problems do NNs solve?

Neural networks are used to solve complex problems
that require analytical calculations similar to those of
the human brain. The most common uses for neural
networks are:
 Classification. NNs label the data into classes by
implicitly analyzing its parameters. For example, a
neural network can analyse the parameters of a
bank client such as age, solvency, credit history
and decide whether to loan them money.
 Prediction. The algorithm has the ability to make
predictions. For example, it can foresee the rise or
fall of a stock based on the situation in the stock
market.
 Recognition. This is currently the widest
application of neural networks. For example, a
security system can use face recognition to only let
authorized people into the building.

Summary

Deep learning and neural networks are useful
technologies that expand human intelligence and
skills. Neural networks are just one type of deep
learning architecture. However, they have become
widely known because NNs can effectively solve a
huge variety of tasks and cope with them better than
other algorithms.
How Do Neural Networks Really
Work?

In separate articles, we have discussed two of the building
blocks for building neural networks:
 Neurons
 Activation functions
However, you're probably still a bit confused as to how neural
networks really work.
This tutorial will put together the pieces we've already
discussed so that you can understand how neural networks
work in practice.

Table of Contents
You can skip to a specific section of this deep learning tutorial
using the table of contents below:
 The Example We'll Be Using In This Tutorial
 The Parameters In Our Data Set
 The Most Basic Form of a Neural Network
 The Purpose of Neurons in the Hidden Layer of a Neural
Network
 How Neurons Determine Their Input Values
 Visualizing A Neural Net's Prediction Process
 Final Thoughts

The Example We'll Be Using In This Tutorial

This tutorial will work through a real-world example step-by-step
so that you can understand how neural networks make
predictions.
More specifically, we will be dealing with property valuations.
You probably already know that there are a ton of factors that
influence house prices, including the economy, interest rates,
the number of bedrooms/bathrooms, and the location.
The high dimensionality of this data set makes it an interesting
candidate for building and training a neural network on.
One caveat about this tutorial is that the neural network we will
be using to make predictions has already been trained. We'll
explore the process for training a new neural network in the
next section of this course.

The Parameters In Our Data Set

Let's start by discussing the parameters in our data set. More
specifically, let's imagine that the data set contains the
following parameters:
 Square footage
 Bedrooms
 Distance to city center
 House age
These four parameters will form the input layer of the artificial
neural network. Note that in reality, there are likely many more
parameters that you could use to train a neural network to
predict housing prices. We have constrained this number to
four to keep the example reasonably simple.

The Most Basic Form of a Neural Network

In its most basic form, a neural network only has two layers -
the input layer and the output layer. The output layer is the
component of the neural net that actually makes predictions.
For example, if you wanted to make predictions using a simple
weighted sum (also called linear regression) model, your
neural network would take the following form:
While this diagram is a bit abstract, the point is that most
neural networks can be visualized in this manner:
 An input layer
 Possibly some hidden layers
 An output layer
It is the hidden layer of neurons that causes neural networks
to be so powerful for calculating predictions.
For each neuron in a hidden layer, it performs calculations
using some (or all) of the neurons in the last layer of the
neural network. These values are then used in the next layer
of the neural network.
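As a minimal sketch of the simple weighted-sum model mentioned above
(every weight and feature value below is invented, not a trained model):

# Invented weights for the four input parameters (for illustration only)
weights = {
    "square_footage": 150.0,        # dollars per square foot
    "bedrooms": 10000.0,            # dollars per bedroom
    "distance_to_center": -2000.0,  # dollars lost per km from the center
    "house_age": -500.0,            # dollars lost per year of age
}

house = {"square_footage": 120.0, "bedrooms": 3,
         "distance_to_center": 5.0, "house_age": 20.0}

# Output layer of the simplest network: a weighted sum of the inputs
price = sum(weights[k] * house[k] for k in weights)
print(price)  # 150*120 + 10000*3 - 2000*5 - 500*20 = 28000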

The Purpose of Neurons in the Hidden Layer of a Neural Network

You are probably wondering - what exactly does each neuron
in the hidden layer mean? Said differently, how should
machine learning practitioners interpret these values?
Generally speaking, neurons in the hidden layers of a neural
net are activated (meaning their activation function returns 1)
for an input value that satisfies certain sub-properties.
For our housing price prediction model, one example might be
5-bedroom houses with small distances to the city center.
In most other cases, describing the characteristics that would
cause a neuron in a hidden layer to activate is not so easy.

How Neurons Determine Their Input Values

Earlier in this tutorial, I wrote "For each neuron in a hidden
layer, it performs calculations using some (or all) of the
neurons in the last layer of the neural network."
This illustrates an important point - that each neuron in a
neural net does not need to use every neuron in the preceding
layer.
The process through which neurons determine which input
values to use from the preceding layer of the neural net is
called training the model. We will learn more about training
neural nets in the next section of this course.
Visualizing A Neural Net's Prediction Process
When visualizing a neural network, we generally draw lines
from the previous layer to the current layer whenever the
preceding neuron has a weight above 0 in the weighted sum
formula for the current neuron.
The following image will help visualize this:
As you can see, not every neuron-neuron pair has a
synapse. x4 only feeds three out of the five neurons in the
hidden layer, as an example. This illustrates an important
point when building neural networks - that not every neuron in
a preceding layer must be used in the next layer of a neural
network.

A Beginner-Friendly Explanation of
How Neural Networks Work
Understanding Neural Network Fundamentals for
Five-Year-Olds


Table of Contents

1. Preface
2. Artificial Intelligence, Machine Learning, and
Neural Networks
3. The Mechanics of a Basic Neural Network
4. Types of Neural Networks
5. Neural Network Applications

Preface

A few weeks ago, when I started to learn about neural
networks, I found that quality introductory
information for such a complex topic was hard to come by. I
frequently read that neural networks are algorithms
that mimic the brain or have a brain-like structure,
which didn’t really help me at all. Therefore, this
article aims to teach the fundamentals of a neural
network in a manner that is digestible for anyone,
especially those who are new to machine learning.

Artificial Intelligence, Machine Learning, and Neural Networks

Before understanding what neural networks are, we
need to take a few steps back and understand what
artificial intelligence and machine learning are.

Artificial Intelligence and Machine Learning

Again, it’s frustrating because when you Google what
artificial intelligence means, you get definitions like
“it is the simulation of human intelligence by
machines”, which, although it may be true, can be
quite misleading for new learners.
In the simplest sense, artificial intelligence
(AI) refers to the idea of giving machines or software
the ability to make its own decisions based on
predefined rules or pattern recognition models. The
idea of pattern recognition models leads to machine
learning models, which are algorithms that build
models based on sample data to make predictions on
new data. Notice that machine learning is a subset of
artificial intelligence.

There are a number of machine learning models, like
linear regression, support vector machines, random
forests, and of course, neural networks. This now
leads us back to our original question: what are neural
networks?

Neural Networks
At its roots, a Neural Network is essentially a
network of mathematical equations. It takes one
or more input variables, and by going through a
network of equations, results in one or more output
variables. You can also say that a neural network
takes in a vector of inputs and returns a vector of
outputs, but I won’t get into matrices in this article.

The Mechanics of a Basic Neural Network

Again, I don’t want to get too deep into the
mechanics, but it’s worthwhile to show you what the
structure of a basic neural network looks like.

In a neural network, there’s an input layer, one or
more hidden layers, and an output layer. The input
layer consists of one or more feature variables (or
input variables or independent variables) denoted as
x1, x2, …, xn. The hidden layer consists of one or
more hidden nodes or hidden units. A node is simply
one of the circles in the diagram above. Similarly, the
output layer consists of one or more output units.

A given layer can have many nodes, like the image
above.

As well, a given neural network can have many layers.
Generally, more nodes and more layers allow the
neural network to make much more complex
calculations.
Above is an example of a potential neural network. It
has three input variables, Lot Size, # of Bedrooms,
and Avg. Family Income. By feeding this neural
network these three pieces of information, it will
return an output, House Price. So how exactly does it
do this?

Like I said at the beginning of the article, a neural
network is nothing more than a network of equations.
Each node in a neural network is composed of two
functions, a linear function and an activation function.
This is where things can get a little confusing, but for
now, think of the linear function as some line of best
fit. Also, think of the activation function like a light
switch, which results in a number between 0 and 1.

What happens is that the input features (x) are fed
into the linear function of each node, resulting in a
value, z. Then, the value z is fed into the activation
function, which determines if the light switch turns on
or not (between 0 and 1).

Thus, each node ultimately determines which nodes in
the following layer get activated, until it reaches an
output. Conceptually, that is the essence of a neural
network.
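As a rough sketch (not from the original article), here is what a single node might look like in Python: a linear function followed by a sigmoid activation. The function name and values are illustrative assumptions.

import math

def node(inputs, weights, bias):
    # Linear function: weighted sum of the inputs plus a bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation: squashes z into a value between 0 and 1,
    # acting like the "light switch" described above.
    return 1 / (1 + math.exp(-z))

print(node([2.0, 3.0], [0.5, -0.25], 0.1))  # some value between 0 and 1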

If you want to learn about the different types of
activation functions, how a neural network determines
the parameters of the linear functions, and how it
behaves like a ‘machine learning’ model that self-
learns, there are full courses specifically on neural
networks that you can find online!
Types of Neural Networks

Neural networks have advanced so much that there
are now several types of neural networks, but below
are the three main types of neural networks that you’ll
probably hear about often.

Artificial Neural Networks (ANN)

Artificial neural networks, or ANNs, are like the
neural networks in the images above: a collection
of connected nodes that takes an input or a set of
inputs and returns an output. This is the most
fundamental type of neural network that you’ll
probably first learn about if you ever take a course.
ANNs are composed of everything we talked about, as
well as propagation functions, learning rates, cost
functions, and backpropagation.

Convolutional Neural Networks (CNN)

A convolutional neural network (CNN) is a type of
neural network that uses a mathematical operation
called convolution. Wikipedia defines convolution as
a mathematical operation on two functions that
produces a third function expressing how the shape of
one is modified by the other. Thus, CNNs use
convolution instead of general matrix multiplication in
at least one of their layers.

Recurrent Neural Networks (RNN)


Recurrent neural networks (RNNs) are a type of
ANN where connections between the nodes form
a directed graph along a temporal sequence, allowing
them to use their internal memory to process variable-
length sequences of inputs. Because of this
characteristic, RNNs are exceptional at handling
sequence data, like text recognition or audio
recognition.

Neural Network Applications

Neural networks are powerful algorithms that have
led to some revolutionary applications that were not
previously possible, including but not limited to the
following:

 Image and video recognition: Because of image
recognition capabilities, we now have things like
facial recognition for security and Bixby Vision.
 Recommender systems: Ever wonder how Netflix
is always able to recommend shows and movies
that you ACTUALLY like? They’re most likely
leveraging neural networks to provide that
experience.
 Audio recognition: In case you haven’t noticed,
‘OK Google’ and Siri have gotten tremendously
better at understanding our questions and what we
say. This success can be attributed to neural
networks.
 Autonomous driving: Lastly, our progression
towards perfecting autonomous driving is largely
due to the advancements in artificial intelligence
and neural networks.

Summary

To summarize, here are the main points:

 Neural networks are a type of machine learning
model, or a subset of machine learning, and
machine learning is a subset of artificial
intelligence.
 A neural network is a network of equations that
takes in an input (or a set of inputs) and returns an
output (or a set of outputs).
 Neural networks are composed of various
components like an input layer, hidden layers, an
output layer, and nodes.
 Each node is composed of a linear function and an
activation function, which ultimately determines
which nodes in the following layer get activated.
 There are various types of neural networks, like
ANNs, CNNs, and RNNs.
Convolutional Neural Network (CNN)
with Practical Implementation

In this third chapter of the Deep Learning book, we will
discuss the convolutional neural network. It is a
supervised deep learning technique, and we will
discuss both the theory and a practical implementation
from scratch.
This chapter spans 4 parts:

1. What is a Convolutional Neural Network?
2. Structure of a Convolutional Neural Network.
3. How does a Convolutional Neural Network work?
4. Practical Implementation of a Convolutional Neural
Network.

1. What are Convolutional Neural Networks?

1.1: Introduction

Convolutional neural networks. Sounds like a weird
combination of biology and math with a little CS
sprinkled in, but these networks have been some of
the most influential innovations in the field of
computer vision. 2012 was the first year that neural
nets grew to prominence as Alex Krizhevsky used
them to win that year’s ImageNet competition
(basically, the annual Olympics of computer vision),
dropping the classification error record from 26% to
15%, an astounding improvement at the time. Ever
since then, a host of companies have been using deep
learning at the core of their services. Facebook uses
neural nets for their automatic tagging algorithms,
Google for their photo search, Amazon for their
product recommendations, Pinterest for their home
feed personalization, and Instagram for their search
infrastructure.
1.2: The Problem Space

Image classification is the task of taking an input
image and outputting a class (a cat, dog, etc.) or a
probability of classes that best describes the image.
For humans, this task of recognition is one of the first
skills we learn from the moment we are born and is
one that comes naturally and effortlessly as adults.
Without even thinking twice, we’re able to quickly and
seamlessly identify the environment we are in as well
as the objects that surround us. When we see an
image or just when we look at the world around us,
most of the time we can immediately characterize the
scene and give each object a label, all without even
consciously noticing. These skills of being able to
quickly recognize patterns, generalize from prior
knowledge, and adapt to different image
environments are ones that we do not share with our
fellow machines.
1.3: Input and Output

When a computer sees an image (takes an image as
input), it will see an array of pixel values. Depending
on the resolution and size of the image, it will see a 32
x 32 x 3 array of numbers (The 3 refers to RGB
values). Just to drive home the point, let’s say we have
a color image in JPG form and its size is 480 x 480.
The representative array will be 480 x 480 x 3. Each
of these numbers is given a value from 0 to 255 which
describes the pixel intensity at that point. These
numbers, while meaningless to us when we perform
image classification, are the only inputs available to
the computer. The idea is that you give the computer
this array of numbers and it will output numbers that
describe the probability of the image being a certain
class (.80 for a cat, .15 for a dog, .05 for a bird, etc).

1.4: What We Want the Computer to Do

Now that we know the problem as well as the inputs
and outputs, let’s think about how to approach this.
What we want the computer to do is to be able to
differentiate between all the images it’s given and
figure out the unique features that make a dog a dog
or that make a cat a cat. This is the process that goes
on in our minds subconsciously as well. When we look
at a picture of a dog, we can classify it as such if the
picture has identifiable features such as paws or 4
legs. Similarly, the computer can perform image
classification by looking for low-level features such as
edges and curves and then building up to more
abstract concepts through a series of convolutional
layers. This is a general overview of what CNN does.
Let’s get into the specifics.

1.5: Biological Connection.

But first, a little background. When you first heard of
the term convolutional neural networks, you may have
thought of something related to neuroscience or
biology, and you would be right. Sort of. CNNs do
take a biological inspiration from the visual cortex.
The visual cortex has small regions of cells that are
sensitive to specific regions of the visual field. This
idea was expanded upon by a fascinating experiment
by Hubel and Wiesel in 1962 where they showed that
some individual neuronal cells in the brain responded
(or fired) only in the presence of edges of a certain
orientation. For example, some neurons fired when
exposed to vertical edges and some when shown
horizontal or diagonal edges. Hubel and Wiesel found
out that all of these neurons were organized in a
columnar architecture and that together, they were
able to produce visual perception. This idea of
specialized components inside of a system having
specific tasks (the neuronal cells in the visual cortex
looking for specific characteristics) is one that
machines use as well and is the basis behind CNNs.

2. Structure of Convolutional Neural Network.

A more detailed overview of what CNNs do would be
that you take the image, pass it through a series of
convolutional, nonlinear, pooling (downsampling), and
fully connected layers, and get an output. As we said
earlier, the output can be a single class or a
probability of classes that best describes the image.
Now, the hard part is understanding what each of these
layers does. So let’s get into the most important one.
2.1: Convolutional

ConvNets derive their name from the “convolution
operation”. The purpose of convolution in a ConvNet is to
extract features from the input images. Convolution
preserves the spatial relationships between pixels by
learning image features using small squares of input
data. As we discussed above, every image can be
considered as a matrix of pixel values. Consider a
5×5 image whose pixel values are only 0 and 1 (note
that for a grayscale image, pixel values normally range
from 0 to 255; restricting them to 0 and 1 keeps this
example simple).
The feature detector slides over every part of the input
image, and the result is recorded in the feature map,
which reflects how well each region of the input matches
the feature detector.

In CNN terminology, the 3×3 matrix is called a ‘filter’ or
‘kernel’ or ‘feature detector’, and the matrix formed by
sliding the filter over the image and computing the
dot product is called the ‘Convolved Feature’ or
‘Activation Map’ or the ‘Feature Map’. It is important
to note that filters act as feature detectors on the
original input image.
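As an illustration (assuming a stride of 1 and no padding), here is a minimal NumPy sketch of sliding a 3×3 filter over a 5×5 binary image to compute a feature map:

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; at each position, take the
    # element-wise product with the underlying patch and sum it.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(convolve2d(image, kernel))  # a 3x3 feature map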

Depth:

Depth corresponds to the number of filters we use for
the convolution operation. In the network shown in
the figure below, we perform the convolution of
the original boat image using three distinct filters,
thus producing three different feature maps as shown.
You can think of these three feature maps as stacked
2D matrices, so the ‘depth’ of the feature map would
be three. In general, more filters let the network
extract more features from the image.

Stride:

Stride is the number of pixels by which we slide our
filter matrix over the input matrix. When the stride is
1, we move the filters one pixel at a time. When
the stride is 2, the filters jump 2 pixels at a time
as we slide them around. Having a larger stride will
produce smaller feature maps.

Zero-Padding:

Sometimes it is convenient to pad the input matrix
with zeros around the border, so that we can apply
the filter to the bordering elements of our input image
matrix. A nice feature of zero-padding is that it
allows us to control the size of the feature maps.
Adding zero-padding is also called wide
convolution, and not using zero-padding is called
narrow convolution.

The ‘valid’ option means no padding, while the ‘same’
option adds just enough zero-padding to make the output
size equal to the input size. That is the benefit of
padding: input size = output size.
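Putting stride and padding together, the output size of a convolution follows the standard formula (input − filter + 2 × padding) / stride + 1. A quick sketch:

def conv_output_size(n, f, p, s):
    # output = (input - filter + 2 * padding) / stride + 1
    return (n - f + 2 * p) // s + 1

print(conv_output_size(5, 3, 0, 1))  # 3 -> narrow ('valid'): no padding shrinks the map
print(conv_output_size(5, 3, 1, 1))  # 5 -> wide ('same'): output size equals input size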

2.1a: ReLU Layer

An additional operation called ReLU is used
after every convolution operation. ReLU stands for
Rectified Linear Unit and is a non-linear
operation. Its output is given by f(x) = max(0, x).

ReLU is an element-wise operation (applied per pixel)
that replaces all negative pixel values in the feature
map with zero. The purpose of ReLU is to introduce
non-linearity into our ConvNet, since most of the real-
world data we would want our ConvNet to learn
is non-linear (convolution is a linear operation,
element-wise matrix multiplication and addition, so
we account for non-linearity by introducing a non-
linear function like ReLU).

As shown in the image below, applying the ReLU
operation replaces all the negative numbers with 0.
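For instance, a one-line NumPy version of this element-wise operation might look like this (the example values are made up):

import numpy as np

feature_map = np.array([[-3, 2], [5, -1]])
rectified = np.maximum(feature_map, 0)  # all negative values become 0
print(rectified)  # [[0 2] [5 0]]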
2.2: Max Pooling

Spatial pooling (also called subsampling or
downsampling) reduces the dimensionality of each
feature map but retains the most important
information. Spatial pooling can be of different types:
max, average, sum, etc.

In the case of max pooling, we define a spatial
neighborhood (for example, a 2×2 window) and take
the largest element from the rectified feature map
within that window. Instead of taking the largest
element we could also take the average (average
pooling) or the sum of all elements in that window. In
practice, max pooling has been shown to work better.

Below is an example of the max pooling operation on
a rectified feature map (obtained after the convolution +
ReLU operation) using a 2×2 window.
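A minimal NumPy sketch of 2×2 max pooling (stride 2, assuming even dimensions; the input values are made up) could look like this:

import numpy as np

def max_pool_2x2(fmap):
    h, w = fmap.shape
    # Group the map into non-overlapping 2x2 windows and keep the max of each.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rectified = np.array([[1, 1, 2, 4],
                      [5, 6, 7, 8],
                      [3, 2, 1, 0],
                      [1, 2, 3, 4]])
print(max_pool_2x2(rectified))  # [[6 8] [3 4]]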
2.3: Flattening

Flattening is the process of converting all the resultant
2-dimensional arrays into a single long continuous
linear vector.
The process of building a CNN involves four major
steps

1. Convolution
2. Pooling
3. Flattening
4. Full Connection
The flattened vector then becomes the input of an
artificial neural network, which is trained using the
backpropagation method.

2.4: Full Connection

The fully connected layer is a traditional multi-layer
perceptron that uses a softmax activation function in
the output layer (other classifiers like SVM can also
be used, but we will stick to softmax in this post). The
term “fully connected” implies that every neuron in
the previous layer is connected to every neuron in the
next layer.

The output from the convolutional and pooling layers
represents high-level features of the input image. The
purpose of the fully connected layer is to use these
features to classify the input image into various
classes based on the training dataset. For example,
the image classification task we set out to perform has
four possible outputs, as shown in the figure below (note
that the figure does not show connections between
the nodes in the fully connected layer).

Apart from classification, adding a fully connected
layer is also a (usually) cheap way of learning non-
linear combinations of these features. Most of the
features from the convolutional and pooling layers may
be good for the classification task, but combinations of
those features might be even better. The sum of the
output probabilities from the fully connected layer is
1. This is ensured by using softmax as the
activation function in the output layer of the fully
connected layer. The softmax function takes a vector
of arbitrary real-valued scores and squashes it to a
vector of values between zero and one that sum to
one.
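As a quick sketch, the softmax squashing can be written in a few lines of NumPy (the input scores here are arbitrary examples):

import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # values in (0, 1) that sum to 1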

2.5: Putting It All Together: Training Using Backpropagation

As discussed above, the convolution + pooling layers
act as feature extractors from the input image, while the
fully connected layer acts as the classifier.
Note that in the figure below, since the input image is a
boat, the target probability is 1 for the Boat class and 0
for the other three classes, i.e.

Input Image = Boat

Target Vector = [0, 0, 1, 0]

The overall training process of the convolution
network may be summarized as below:

Step 1: We initialize all filters and parameters/weights
with random values.

Step 2: The network takes a training image as input,
goes through the forward propagation step
(convolution, ReLU, and pooling operations, along with
forward propagation in the fully connected layer)
and finds the output probabilities for each class.

Let’s say the output probabilities for the boat image
above are [0.2, 0.4, 0.1, 0.3].

Since weights are randomly assigned for the first
training example, output probabilities are also
random.

Step 3: Calculate the total error at the output layer
(summation over all 4 classes):

Total Error = ∑ ½ (target probability − output probability)²
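Using the example probabilities above, the total error works out as follows:

target = [0, 0, 1, 0]          # boat
output = [0.2, 0.4, 0.1, 0.3]  # random initial prediction
total_error = sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))
print(total_error)  # 0.5 * (0.04 + 0.16 + 0.81 + 0.09) = 0.55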

Step 4: Use backpropagation to calculate
the gradients of the error with respect to all weights in
the network, and use gradient descent to update all
filter values/weights and parameter values to
minimize the output error.
The weights are adjusted in proportion to their
contribution to the total error.

When the same image is input again, the output
probabilities might now be [0.1, 0.1, 0.7, 0.1], which is
closer to the target vector [0, 0, 1, 0].

This means that the network has learned to classify
this particular image correctly by adjusting its
weights/filters such that the output error is reduced.

Parameters like the number of filters, the filter sizes, the
architecture of the network, etc. have all been fixed
before Step 1 and do not change during the training
process; only the values of the filter matrices and
connection weights get updated.

Step 5: Repeat steps 2–4 with all images in the
training set.

3: How Does a Convolutional Neural Network Work?

Approach

 Build a small convolutional neural network as
defined in the architecture below.
 Select images to train the convolutional neural
network.
 Extraction of feature filters/feature maps.
 Implementation of the convolutional layer.
 Apply the ReLU activation function on the
convolutional layer to convert all negative values to
zero.
 Then apply max pooling on convolutional layers.
 Make a fully connected layer.
 Then input an image into the CNN to predict the
image content.
 Use backpropagation to calculate the error rate.

Selection of Image

Let’s take images of the letters “X” and “O”.

Train the Convolutional Neural Network for Image X

Feature filter extraction from image X

In convolutional networks, you look at an image
through a smaller window and move that window to
the right and down. That way, you can find features in
that window, for example, a horizontal line, a
vertical line, or a curve. What exactly a
convolutional neural network considers an important
feature is defined while learning.

Wherever you find those features, you report that in
the feature maps. A certain combination of features in
a certain area can signal that a larger, more complex
feature exists there.

For example, your first feature map could look for
curves. The next feature map could look at a
combination of curves that build circles.
3.1 Convolutional Layers

3.1.1 Convolutional Layer 1 (Image X with filter 1)

In the CNN convolutional layer, the 3×3 matrix called the
‘feature filter’ or ‘kernel’ or ‘feature detector’ slides
over the image, and the matrix formed is
the convolutional layer. It is important to note that
filters act as feature detectors on the original input
image. Image X is matched with filter #1 with
a stride of 1.

The pixel values of the highlighted region are
multiplied with their corresponding pixel values of the
filter, and the results are then averaged.
Here you can see how the filter shifts across the pixels
with a stride of 1, continuing in the same way for each
position (one position, for example, yields a value of 0.77).
3.1.2 Convolutional Layer 2 (Image X with filter 2)

After repeating the same steps (as we did for filter 1)
of the convolutional layer on image “X” with filter 2,
we get

3.1.3 Convolutional Layer 3 (Image X with filter 3)

After repeating the same steps of the convolutional
layer on image “X” with filter 3, we get

3.2 ReLU Layer:

Apply the ReLU activation function on the convolutional
layers: convert all negative values to zero.

3.2.1 ReLU Layer for Convolutional Layer 1:

Apply the ReLU activation function on Convolutional
Layer 1 to convert all the negative values to zero.

3.2.2 ReLU Layer for Convolutional Layer 2:

Apply the ReLU activation function on Convolutional
Layer 2 to convert all the negative values to zero.

3.2.3 ReLU Layer for Convolutional Layer 3:

Apply the ReLU activation function on Convolutional
Layer 3 to convert all the negative values to zero.
3.3 Max Pooling:

After applying the convolutional and ReLU layers
respectively, we now apply max pooling to
convolutional layers 1, 2 & 3 and extract the maximum
features from the image.

3.3.1 Max Pooling for Convolutional Layer 1
3.3.2 Max Pooling for Convolutional Layer 2

Repeating the same max pooling steps for
convolutional layer 2, we get

3.3.3 Max Pooling for Convolutional Layer 3

Repeating the same max pooling steps for
convolutional layer 3, we get

3.4 Further Max Pooling:

Further max pooling for use in the fully connected layer:

3.4.1 Further max pooling for convolutional layer 1
3.4.2 Further max pooling for convolutional layer 2
3.4.3 Further max pooling for convolutional layer 3

3.5 Flattening:
In this step, we convert all the resultant 2-
dimensional arrays into a single long continuous
linear vector.

These single long continuous linear vectors become the
input nodes of our fully connected layer.

3.6 Fully Connected Layer (for X):

White cells are votes that depend on how strongly a
value can predict “X”.

These are the input nodes; after this, we perform
backpropagation. For backpropagation, see chapter 2,
where we explain backpropagation in detail (we are not
covering backpropagation here because we already did
so in chapter 2).

Train the Convolutional Neural Network for
Image ‘O’

Feature filter extraction from image ‘O’

After repeating the procedure of training the
convolutional neural network, the same as we did for
training image X, we get

White cells are votes that depend on how strongly a
value can predict “O”.
Prediction Example

We feed an image into the convolutional neural
network, and we don’t know whether it is X or O.

We get a new input, we don’t know what it is, and we
want to decide. The way this works is that the input
goes through all of our convolutional, rectified
linear unit, and pooling layers, and at the end we get,

Now calculate the prediction for X.

So our CNN predicts the input image as X with a
prediction score of 91 percent.

Now calculate the prediction for O.

So our CNN predicts the input image as O with a
prediction score of 51 percent.

As

Prediction score for X > Prediction score for O

91 > 51

So, the image we input for prediction into our
convolutional neural network is X.


4: Practical Implementation of a Convolutional Neural
Network

Image Recognition (Cat & Dog dataset)

In this part, we will create a convolutional neural
network that can detect various objects in images. We
will implement this deep learning model to recognize
a cat or a dog in a set of pictures. However, this
model can be reused to detect anything else, and we
will show you how to do it: by simply changing the
pictures in the input folder.

For example, you will be able to train the same model
on a set of brain images to detect whether they contain a
tumor. But if you want to keep it fitted to cats
and dogs, you will be able to take a picture of
your cat or your dog, and your model will predict
which pet you have.

Dataset sample:

The dataset contains 10,000 images of cats and dogs.
Part 1: Building the CNN model

1.1 Import the Libraries:

In this step, we import the Keras library and packages for
building the convolutional neural network model.
First, we import the Sequential module, which is
used for initializing our model. Second
is Convolution2D, the package that we’ll use for
the first step of making the CNN, the
convolution step, in which we add convolutional layers.
Here we are working with images, which are only 2-
dimensional, so we use Convolution2D (in the case of
video we would use 3D). The next package
is MaxPooling2D; that’s the package we’ll use to
proceed to the pooling step, which will add our
pooling layers. The next package is Flatten; this is the
package that we will use in step 3, flattening, in which
we convert all the pooled feature maps that we
created through convolution and max pooling into the
large feature vector that then becomes the input of
our fully connected layers. And finally, the last
package is the Dense package; this is the package we
use to add the fully connected layers.
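Since the original code screenshots are not reproduced here, the imports might look like the following sketch, using the modern tf.keras API (where Convolution2D is named Conv2D):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense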

1.2 Initialize our model:

In this step, we initialize our convolutional neural
network model; to do that, we use the Sequential module.

1.3 Convolution Step:

In this step, we add the convolution step, the
first step of a CNN. Here in Convolution2D we pass
several arguments. The first arguments
we pass are the feature detector settings (32, 3, 3), which
mean we create 32 feature detectors of three-by-three
dimensions, so our convolutional layer is
composed of 32 feature maps. The next argument
is the input shape, the shape of the input images on
which we are going to apply the feature detectors through
the convolution operation. With this argument we
convert all images into the same format; here 64 and 64
are the dimensions and 3 is the number of channels, because
we are working with colored images (we use 3
channels corresponding to RGB; in the case of
black & white images we would use 1 channel).
The last argument we pass here
is the relu activation function; relu is used to
replace the negative pixel values with 0 (as we saw
in the mathematical part of the CNN intuition).
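Assuming the model object is named classifier (an assumption; the original code is not shown), a sketch of this step with the modern Conv2D API might read:

classifier = Sequential()  # initialize the model (step 1.2)
# 32 filters of size 3x3, applied to 64x64 RGB images, with ReLU activation.
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))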

1.4 Pooling Step:

In this step, we add the pooling layer of our CNN
model. The pooling layer reduces the size of the feature
map (the size of the feature map is divided by 2 when
we apply the pooling layer). Here we pass only one
argument, the pool size, and define a 2-by-2
dimension, because we don’t want to lose
information about the image, so we take the
minimum size of the pool.
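In the same sketch, this step would be:

# Halve each spatial dimension of the feature maps with a 2x2 window.
classifier.add(MaxPooling2D(pool_size=(2, 2)))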
1.5 Add a second convolution layer and pooling
layer:

In this step, we add a second convolution layer and
pooling layer to make the model more efficient and
produce better accuracy.
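A sketch of the second block (input_shape is only needed on the first layer):

classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))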

1.6 Flatten Step:

In this step, we convert the max-pooled 2D feature maps
into a single long continuous linear vector that forms the
input nodes of the fully connected layers.
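Continuing the sketch:

# Flatten the pooled feature maps into one long feature vector.
classifier.add(Flatten())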

1.7 Full Connection Layer:

In this step, we use the Dense module to add the
layers. The first parameter we pass here is
output_dim=128, which defines a hidden layer with 128
nodes, and the last argument is activation=relu; in the
hidden layer we use the relu activation.
After adding the hidden layer, we add an output layer to
the full connection layer with output_dim=1, which means
one output node. Here we use the sigmoid function, because
our target attribute has a binary class, cat or
dog; that’s why we use the sigmoid activation function.
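A sketch of these two layers (modern Keras uses units=... instead of the older output_dim=... argument):

# Hidden layer with 128 nodes, then a single sigmoid output node (cat vs. dog).
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))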

1.8 Compiling the CNN

In this step, we compile the CNN. To do that, we use the
compile method and add several parameters. The first
parameter is optimizer=Adam, which searches for the
optimal set of weights. Among the various stochastic
gradient descent algorithms for choosing the optimal
weights, Adam is a very efficient one, so that’s why we
use the Adam optimizer here. The second parameter is
loss; this corresponds to the loss function. Here we
use binary_crossentropy because the target
attribute of our dataset contains binary values;
that’s why we choose binary cross-entropy. The
final parameter is metrics, a list of metrics to be
evaluated by the model; here we choose the
accuracy metric.
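As a one-line sketch:

classifier.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])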

Part 2: Image Preprocessing

2.1 Import the library

First, we import a class that will allow us to use the
image data generator function. This class is called
ImageDataGenerator, and we import it from the Keras
image preprocessing module.

2.2: Image Augmentation

2.2.1: Train_datagen

This step is like the data preprocessing we did in
chapter 2 for the ANN, but now we are working with
images, so we do image augmentation. Here
we pass several arguments. The first is rescale, which is
like the feature scaling done in data preprocessing;
our images have 0 to 255 pixel values, and we rescale
them to between 0 and 1. Next is shear_range (the shear
angle in the counter-clockwise direction, in degrees),
which applies random shear transformations; we keep this
at 0.2. zoom_range applies some random zoom,
and we keep a 0.2 value. Last
is horizontal_flip, which randomly flips inputs horizontally.

2.2.2: Test_datagen

Same as for the training set, but here we only rescale our images.
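A sketch of both generators:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,   # scale pixels from 0-255 to 0-1
                                   shear_range=0.2,  # random shear transformations
                                   zoom_range=0.2,   # random zoom
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)    # only rescaling for the test set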

2.3 Training set & Test set

In this step, we import the images that we want to
train on. Remember, our dataset contains 10,000 images:
8,000 for training and 2,000 for testing. First we
import our images from the working directory that we
want to train on. Then the target size, here 64 by 64, is
what the CNN expects, as we defined earlier when building
the CNN. Then the batch size is the size of the batches in
which random samples of our images will be
included; it is the number of images that
will go through the CNN after which the weights will
be updated. Finally, the class mode is the
parameter indicating whether the dependent
variable is binary or has more than two categories;
since we have two classes here, cat & dog, the
class mode we use is binary.
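A sketch of loading both sets; the directory paths below are assumptions about how the dataset is laid out on disk:

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')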
2.4 Fit the CNN model

The first argument we pass here is the training
set, which contains 8,000 images belonging to 2 classes.
The second is steps_per_epoch, the number of
images in our training set, because all
the observations of the training set pass through the CNN
during each epoch; since we have 8,000 images, that’s
why we use 8,000 here. Then comes the number of epochs
you want to use to train the CNN; here we take only 1,
because our purpose is just to understand, but we suggest
taking more than 50 for good accuracy. Then
validation_data corresponds to the test set
on which we want to evaluate the performance of our
CNN. And last, validation_steps corresponds to
the number of images in our test set, which is
2,000.
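A sketch of the training call. Note one assumption: the article counts images (8,000 train / 2,000 test), while modern Keras counts batches for these arguments, so with batch_size=32 that is roughly 250 and 63 steps respectively:

classifier.fit(training_set,
               steps_per_epoch=250,   # 8000 images / batch_size 32
               epochs=1,              # the article suggests 50+ for good accuracy
               validation_data=test_set,
               validation_steps=63)   # 2000 images / batch_size 32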
Here we go: we obtain 80% accuracy on the training set
and 81% on the test set.

Part 3: Making a new prediction

3.1 Import the library

In this step, we import the libraries. The first library
we import is numpy, used for multidimensional
arrays, and the second is the image package from Keras,
used to import the image from the working directory.

3.2 Import the Image

In this step, we import a single image; we don’t
know whether it is a cat image or a dog image (like in
the mathematical part, when we trained images O and X
and then predicted X). Here we pass two
arguments: the first is the image to load, and the second
is the target size. Here we set the target size to 64 by
64, as expected by our CNN model.
3.3 Set the Dimensions of the Image

In this step, we convert our 2D image array
(64, 64) into a 3D array (64, 64, 3), exactly matching
the input shape of the model’s architecture.

3.4 Add the Batch Dimension

Since we are working with a single image, making a
prediction directly would raise an error saying that
convolution2d_input_1 is expected to have 4 dimensions,
so we expand the test image to 4 dimensions.

3.5 Make a prediction

In this step, we predict the class of the single image.
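Putting steps 3.1-3.5 together, a sketch might read as follows; the file path is an assumption about where the single test image lives:

import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg',
                            target_size=(64, 64))    # 3.2: load at 64x64
test_image = image.img_to_array(test_image)          # 3.3: (64, 64) -> (64, 64, 3)
test_image = np.expand_dims(test_image, axis=0)      # 3.4: add the batch dimension
result = classifier.predict(test_image)              # 3.5: sigmoid output, 0 = cat, 1 = dog
print('dog' if result[0][0] > 0.5 else 'cat')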

So here we go: cat corresponds to 0 and dog
corresponds to 1, so our image is a dog. Perfect; that
means the prediction of our CNN model for the
first cat & dog image is correct, because the image
contains a dog.


If you want the dataset and code, you can also check out
my GitHub profile.
