UNIT-3 | Introduction to Machine Learning Algorithms

Decision Tree Classification Algorithm

o A Decision Tree is a supervised learning technique that can be used for both classification and regression problems, though it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the decision node and the leaf node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further splits into subtrees.
o The diagram below explains the general structure of a decision tree:

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are two reasons for using a decision tree:

o Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-
like structure.

Decision Tree Terminologies

 Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.

 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after a leaf node is reached.

 Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

 Branch/Sub-Tree: A subtree formed by splitting the tree.

 Pruning: Pruning is the process of removing unwanted branches from the tree.

 Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with those of the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; each such final node is a leaf node.

Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the diagram below:
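As a rough, illustrative sketch of this process (not part of the original notes), the snippet below trains a decision tree classifier on an invented job-offer dataset using scikit-learn; the feature names, values, and encoding are assumptions made purely for illustration:

```python
# Illustrative sketch: a decision tree for the job-offer example above.
# The dataset is invented; the feature encoding is an assumption.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: salary_ok (1 = above expectation), distance_km, cab (1 = cab provided)
X = [
    [1, 5, 1],
    [1, 25, 0],
    [0, 5, 1],
    [1, 12, 1],
    [0, 30, 0],
    [1, 28, 1],
]
y = ["accept", "decline", "decline", "accept", "decline", "accept"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# Print the learned decision rules as text, mirroring the root/decision/leaf structure
print(export_text(tree, feature_names=["salary_ok", "distance_km", "cab"]))

# Predict for a new candidate: good salary, 10 km away, cab provided
print(tree.predict([[1, 10, 1]]))
```

Reading the printed rules shows the root node, the decision nodes, and the leaf nodes described above.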

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:

o Information Gain
o Gini Index

1. Information Gain:
o Information gain is the measure of the change in entropy after a dataset is segmented on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision
tree.
o A decision tree algorithm always tries to maximize the value of information gain, and
a node/attribute having the highest information gain is split first. It can be calculated
using the below formula:

Information Gain = Entropy(S) − [Weighted Avg × Entropy(each feature)]

Entropy: Entropy is a metric that measures the impurity in a given attribute. It specifies the randomness in the data. Entropy can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)

Where,

o S = the total number of samples
o P(yes) = the probability of yes
o P(no) = the probability of no
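As a minimal sketch (not part of the original notes), the helper functions below compute entropy and information gain directly from these formulas; the toy labels and feature values are invented for illustration:

```python
# Illustrative sketch: entropy and information gain from the formulas above.
from math import log2
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over classes of p * log2(p)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    # Entropy(S) minus the weighted average entropy of each feature-value subset
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [l for l, v in zip(labels, feature_values) if v == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

# Invented example: how much does "outlook" tell us about "play"?
play    = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(entropy(play))                    # 1.0 (maximally impure)
print(information_gain(play, outlook))  # about 0.667
```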

2. Gini Index:
o The Gini index is a measure of impurity or purity used while creating a decision tree with the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o The CART algorithm creates only binary splits, and it uses the Gini index to create them.
o The Gini index can be calculated using the formula below:

Gini Index = 1 − ∑j (Pj)²
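A minimal sketch of this formula in code (illustrative only, with invented labels):

```python
# Illustrative sketch: Gini index of a set of class labels.
from collections import Counter

def gini(labels):
    # Gini = 1 - sum over classes of p_j squared
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))  # 0.5: maximally impure for two classes
print(gini(["yes", "yes", "yes"]))       # 0.0: a pure node, preferred for splits
```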

Advantages of the Decision Tree

o It is simple to understand, as it follows the same process a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

o A decision tree can contain many layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.

o With more class labels, the computational complexity of the decision tree may increase.

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future. This best decision boundary is called
a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the diagram below, in which two different categories are classified using a decision boundary, or hyperplane:

The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.

Possible hyperplanes

To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e., the maximum distance between data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.

Hyperplanes in 2D and 3D feature space

Hyperplanes are decision boundaries that help classify the data points. Data points
falling on either side of the hyperplane can be attributed to different classes. Also,
the dimension of the hyperplane depends upon the number of features. If the
number of input features is 2, then the hyperplane is just a line. If the number of
input features is 3, then the hyperplane becomes a two-dimensional plane. It
becomes difficult to imagine when the number of features exceeds 3.
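As a minimal sketch (not from the original notes), the snippet below fits a linear SVM on invented 2-dimensional points, so the hyperplane is just a line w·x + b = 0 that can be read off the fitted model:

```python
# Illustrative sketch: a linear SVM on invented 2-D data.
from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3],   # class 0
     [6, 5], [7, 8], [8, 6]]   # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# With 2 input features, the hyperplane is the line w.x + b = 0
print("weights w:", clf.coef_[0])
print("bias b:", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)  # the extreme points
print(clf.predict([[4, 4]]))                     # classify a new point
```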

K-Nearest Neighbor (KNN) Algorithm

o K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image most similar to those of the cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need the K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below:

How does K-NN work?

The working of K-NN can be explained on the basis of the algorithm below:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to the existing data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category with the maximum number of neighbors.
o Step-6: Our model is ready.

Suppose we have a new data point that we need to put in the required category. Consider the image below:

o Firstly, we will choose the number of neighbors; say we choose k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:
Euclidean Distance between A(x1, y1) and B(x2, y2) = √((x2 − x1)² + (y2 − y1)²)

o By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the image below:

o As 3 of the 5 nearest neighbors are from category A, the new data point must belong to category A.
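A minimal sketch of the whole procedure (illustrative only; the coordinates are invented):

```python
# Illustrative sketch: K-NN classification with k = 5, as in the example above.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 1], [2, 3], [3, 2],   # category A
     [6, 6], [7, 5], [6, 7], [8, 6]]   # category B
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=5)  # Euclidean distance by default
knn.fit(X, y)

# The majority among the 5 nearest neighbors decides the category
print(knn.predict([[3, 3]]))  # ['A']
```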

How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the K-NN algorithm:

o There is no particular way to determine the best value for K, so we need to try several values to find the best of them; a small sketch of this search follows below. The most commonly preferred value for K is 5.
o A very low value for K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
o Large values for K reduce the effect of noise, but can make the boundaries between categories less distinct and increase computation.
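A minimal sketch of such a search (illustrative only, using the built-in iris dataset) simply tries a few values of K with cross-validation and compares the scores:

```python
# Illustrative sketch: comparing a few values of K by cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```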

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective when the training data is large.

Disadvantages of KNN Algorithm:

o The value of K always needs to be determined, which can be complex at times.
o The computation cost is high, because the distance to every training sample must be calculated for each prediction.

Time Series Forecasting

A time series is a sequence of data observations that a system collects over specific periods of time, e.g., daily, monthly, or yearly. Specialized models are used to analyze the collected time-series data: to describe and interpret it, as well as to make certain assumptions based on shifts and irregularities in the collection. These shifts and irregularities may include changes of trend, seasonal spikes in demand, certain repetitive changes, non-systematic shifts in usual patterns, etc.

All previously and currently collected data is used as input for time series forecasting, where future trends, seasonal changes, irregularities, and the like are worked out by math-driven algorithms. With machine learning, time series forecasting becomes faster, more precise, and more efficient in the long run. ML has proven to help better process both structured and unstructured data flows, swiftly capturing accurate patterns within masses of data.
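As a minimal sketch of the idea (not part of the original text), one common ML formulation turns a series into (previous values → next value) pairs and fits a regression model on them; the monthly figures below are purely illustrative:

```python
# Illustrative sketch: forecasting the next value of a series from lag features.
from sklearn.linear_model import LinearRegression

series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
          115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

LAGS = 3  # predict the next value from the previous 3 observations
X = [series[i:i + LAGS] for i in range(len(series) - LAGS)]
y = series[LAGS:]

model = LinearRegression().fit(X, y)
print("next value forecast:", model.predict([series[-LAGS:]])[0])
```

Real forecasting systems add seasonality, trend, and irregularity handling on top of this basic supervised setup.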

Applications of Machine Learning Time Series Forecasting

 Stock prices forecasting — the data on the history of stock prices combined with the
data on both regular and irregular stock market spikes and drops can be used to gain
insightful predictions of the most probable upcoming stock price shifts.

 Demand and sales forecasting — customer behaviour patterns data along with inputs
from the history of purchases, timeline of demand, seasonal impact, etc., enable ML
models to point out the most potentially demanded products and hit the spot in the
dynamic market.

 Web traffic forecasting — common data on usual traffic rates among competitor websites is combined with input data on traffic-related patterns in order to predict web traffic rates during certain periods.

 Climate and weather prediction — time-based data is regularly gathered from numerous interconnected weather stations worldwide, while ML techniques allow it to be thoroughly analyzed and interpreted for future forecasts based on statistical dynamics.

 Demographic and economic forecasting — there are tons of statistical inputs in demographics and economics, which are efficiently used for ML-based time-series predictions. As a result, the most fitting target audience can be picked, and the most effective ways to interact with that particular audience can be worked out.

 Scientific studies forecasting — ML and deep learning principles dramatically accelerate the rate at which scientific innovations are refined and introduced. For instance, scientific data that requires an indefinite number of analytical iterations can be processed much faster with the help of patterns automated by machine learning.

Principal Component Analysis (PCA)

Principal component analysis, or PCA, is a dimensionality-reduction method: it transforms a large set of variables into a smaller one that still contains most of the information in the large set.

Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity. Smaller data sets are easier to explore and visualize, and they make analyzing data much easier and faster for machine learning algorithms, since there are no extraneous variables to process.

So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
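A minimal sketch of PCA in practice (illustrative only, using the built-in iris dataset):

```python
# Illustrative sketch: reducing 4 variables to 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2): fewer variables
print(pca.explained_variance_ratio_)  # information kept per component
```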

Clustering in Machine Learning

Clustering, or cluster analysis, is a machine learning technique which groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters, each consisting of similar data points. Objects with possible similarities remain in a group that has few or no similarities with another group."

It does this by finding similar patterns in the unlabelled dataset, such as shape, size, colour, or behaviour, and dividing the data points according to the presence or absence of those patterns.

It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset.

After applying this clustering technique, each cluster or group is given a cluster-ID. The ML system can use this ID to simplify the processing of large and complex datasets.

The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhat similar to classification, but the difference is the type of dataset being used. In classification, we work with a labelled dataset, whereas in clustering, we work with an unlabelled dataset.

Example: Let's understand the clustering technique with the real-world example of a shopping mall: when we visit a shopping mall, we can observe that things with similar usage are grouped together, such as t-shirts in one section and trousers in another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are grouped separately so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents by topic.

The clustering technique can be widely used in various tasks. Some most common
uses of this technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.

Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.

The diagram below explains the working of the clustering algorithm. We can see that different fruits are divided into several groups with similar properties.

Types of Clustering Methods

Clustering methods are broadly divided into hard clustering (a data point belongs to only one group) and soft clustering (data points can belong to more than one group). Various other approaches to clustering also exist. Below are the main clustering methods used in machine learning (a minimal sketch of the first one follows the list):

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
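As a minimal sketch of the first method (illustrative only; the points are invented), k-means partitions unlabelled points into a chosen number of clusters and assigns each point a cluster-ID:

```python
# Illustrative sketch: partitioning clustering with k-means.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [2, 3],    # one natural group
     [8, 8], [9, 10], [8, 9]]   # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster-ID assigned to each data point
print(kmeans.cluster_centers_)   # the two group centres
print(kmeans.predict([[0, 0]]))  # which cluster a new point falls into
```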

Applications of Clustering
Below are some commonly known applications of the clustering technique in machine learning:

o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets
into different groups.

o In Search Engines: Search engines also work on the clustering technique. The search result appears based on the object closest to the search query, which is achieved by grouping similar data objects in one group, far from the dissimilar objects. The accuracy of a query's result depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment customers based on their choices and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for determining the purpose for which a particular piece of land is most suitable.
