Python Machine Learning: The Beginner's Guide - Lilly Trinity
INTRODUCTION
ENVIRONMENT CONFIGURATION
Installation
To create applications, you need another application, unless you want to
work at a low level and write applications in machine code, a tough
experience that even seasoned programmers avoid whenever possible.
Writing an application in the Python programming language requires a
few supporting applications. These applications allow you to work with
Python by creating Python code, providing the necessary help
information, and allowing you to execute the code you write.
This chapter helps you get a copy of the Python application, install it on
your hard drive, look for installed applications so you can use them, and
test your installation to see how it works.
Download the version you need
Each platform (a combination of hardware and operating system
software) is governed by special rules when running applications. The
Python application hides these details. You enter the code that runs on
any platform supported by Python, and Python applications translate
that code into something that the platform can understand. However,
for the translation to take place, you must have a version of Python that
works on your specific platform. Python supports these platforms:
Amiga Research OS (AROS)
IBM Advanced Unix (AIX)
Application System 400 (AS/400)
Hewlett-Packard Unix (HP-UX)
BeOS
Linux
Microsoft Disk Operating System (MS-DOS)
Mac OS X (pre-installed with the operating system)
MorphOS
OS/390 and z/OS
Operating System 2 (OS/2)
PalmOS
Psion
PlayStation
QNX
Series 60
Windows CE/Pocket PC
RISC OS (originally Acorn)
Solaris
32-bit Windows (XP and later)
Virtual Memory System (VMS)
64-bit Windows
Wow, that's a lot of different platforms! This book has been tested on the
Windows, Mac OS X, and Linux platforms. However, the examples should
also work on these other platforms because they do not depend on any
platform-specific code.
To get the correct version for your platform, you need to access
http://www.python.org/download/releases/3.3.4/. Since the download
section is initially hidden, you must scroll down the page.
If you want to use another platform, click the second link on the left
side of the page. You will find a list of Python installations for other
platforms. Many of these installations are maintained by volunteers
rather than by the people who create the Python versions for Windows,
Mac OS X, and Linux. Be sure to contact these people when you have
questions about the installation, because they know best how to help you
get a good setup on your platform.
Install Python
After downloading your copy of Python, it's time to install it on your
system.
The downloaded file contains everything you need to get started:
• Python interpreter
• Help files (documentation)
• Command-line access
• IDLE (Integrated Development Environment) application
• Uninstaller (only on platforms that need it)
Work with Windows
The process of installation on a Windows system follows the same
procedure that is used for other types of applications. The main
difference is finding the file you downloaded so that you can start the
installation process. The following procedure should work correctly on
any Windows system, whether you use the 32-bit or 64-bit version of
Python.
Find the downloaded copy of Python on your system.
The name of this file varies, but it usually appears under the following
names: python-3.3.4.amd64.msi for 64-bit systems and python-
3.3.4.msi for 32-bit systems. The version number is inserted in the file
name. In this case, the file name refers to version 3.3.4, which is the
version used for this book.
Double-click the installation file.
(You may see an Open File - Security Warning dialog box asking whether
you want to run this file. Click Run if this dialog box appears.) A Python
Setup dialog box similar to the one shown in Figure 2-3 appears. The
precise dialog box you see depends on the version of the Python
installer you downloaded.
Choose a user installation option, and then click Next.
The installer prompts you for the name of an installation directory
for Python. Using the default destination saves you effort and time
later. However, you can install Python anywhere.
Using the Windows \Program Files or \Program Files (x86) folder is
problematic for two reasons. First, the folder name contains a space,
which makes the folder difficult to access from within an application.
Second, the folder usually requires administrator access, so you will
have to fight constantly with the Windows User Account Control (UAC)
feature if you install Python in either of these folders.
Type a destination folder name, if needed, and then click Next. Python
asks you to customize your installation.
Enabling the Add python.exe to Path option will save you time. This
feature allows you to access Python from the command prompt window. Do
not worry too much about how you use this feature at the moment; it's
simply a good feature to have installed. The book assumes that you have
enabled it. Do not worry about the other features you see in Figure 2-5.
They are all enabled by default, giving you maximum access to Python
features.
(Optional) Click the down arrow next to the Add python.exe to Path
option and choose Will Be Installed on Local Hard Drive.
Click Next.
You see the installation process begin. A User Account Control dialog
box may appear, asking whether you want to perform the installation. If
you see this dialog box, click Yes. The installer continues, and a Setup
Complete dialog box appears.
Click Finish.
Python is ready to use.
Work with Mac
Python is probably already on your Mac system. However, that
installation is usually a few years old, regardless of the age of your
system. For this book, the existing installation will probably work
fine. You will not be testing the limits of Python programming
technology; you will be learning how to use Python.
The latest version of OS X at the time of this writing (Mavericks, or
10.9) comes with Python 2.7, which works well for the book's examples.
Depending on how you use Python, you may want to update your
installation at some point. Part of this process involves installing the
GCC (GNU Compiler Collection) tools so that Python has access to the
low-level resources it needs. The following steps begin the installation
of a new version of Python on a Mac OS X system.
Click on the link for your version of OS X:
• Python 3.3.5 Mac OS X 32-bit i386/PPC installer, for 32-bit versions
on PowerPC processors
• Python 3.3.5 Mac OS X 64-bit/32-bit x86-64/i386 installer, for 32-bit
or 64-bit versions on Intel processors
The Python disk image starts to download. Be patient: downloading the
disk image takes several minutes. You can easily see how long the
download will take because most browsers provide a way to monitor the
download process. Once the download is complete, your Mac automatically
opens the disk image for you.
The disk image looks like a folder. In this folder, you see several files like
python.mpkg. The python.mpkg file contains the Python application.
Text files contain information about the latest compilation, licenses, and
annotations.
Double-click on python.mpkg.
You see a welcome dialog that informs you about this particular Python
build.
Click Continue three times.
The installer displays the latest notes on Python, the license
information (click Accept when asked about license information), and
finally a target dialog box.
Select the volume (hard disk or other media) that you want to use to
install Python, and then click Continue .
The Installation Type dialog box appears. This dialog box performs two
tasks:
Click Customize to change the feature set installed on your system.
Click Change Installation Location to change the location where the
installer places Python.
The book assumes that you perform a default installation and do not
change the installation location. However, you can use these options if
you want.
Click Install.
The installer may request your administrator password. Enter the
administrator name and password, if necessary, in the dialog box and
click OK. You see a Python Installation dialog box. The contents of this
dialog box change as the installation process progresses, telling you
which part of Python the installer is currently working with.
When the software has been installed successfully, you see a Setup
Successful dialog box.
Click Close.
The Python software is ready to use. (You can close the disk image at
this point and delete it from your system.)
Work with Linux
Python comes with some versions of Linux. For example, if you have an
RPM (Red Hat Package Manager)-based distribution (such as CentOS, SUSE,
Yellow Dog, Red Hat, or Fedora Core), you probably already have Python
on your system, and you have nothing else to do.
Depending on the version of Linux that you use, the version of Python
varies, and some systems do not include the IDLE (Integrated Development
Environment) application. If you have an earlier version of Python
(version 2.5.1 or earlier), you may want to install a newer version to
gain access to IDLE. Most of the book's exercises require IDLE.
Using the default Linux installation
The default Linux installation procedure works on any system. However,
you must work in the terminal and enter commands to complete it.
Some of the actual commands may vary depending on the version of
Linux. The information on http://docs.python.org/3/install/ provides
useful tips that you can use in addition to the following procedure.
Click the link that matches your Linux version:
Python 3.3.4 compressed source archive (any version of Linux)
Python 3.3.4 xzipped source archive (better compression and faster
download)
You will be prompted to either open or save the file; choose Save. The
Python source files download. Be patient: downloading the source files
takes a minute or two.
Double-click on the downloaded file.
The Archive Manager window opens. Once the files are extracted, you
see the Python 3.3.4 folder in the file manager window.
Double-click the Python 3.3.4 folder.
The file manager extracts the files into the Python 3.3.4 subfolder of
your folder.
Open a copy of the terminal.
The terminal window appears. If you have never built software on your
system before, you must install the build essentials, SQLite, and bzip2;
otherwise, the Python installation will fail. If these packages are
already installed, you can skip to step 10 and start building Python
without delay.
Type sudo apt-get install build-essential and press Enter.
Linux installs the necessary Build Essential support for creating
packages (see https://packages.debian.org/squeeze/build-essential for
more details).
Type sudo apt-get install libsqlite3-dev and press Enter.
Linux installs the SQLite support that Python needs for database
manipulation (see https://packages.debian.org/squeeze/libsqlite3-dev for
more details).
Type sudo apt-get install libbz2-dev and press Enter.
Linux installs the bzip2 support that Python needs for file manipulation
(see https://packages.debian.org/sid/libbz2-dev for more details).
Type cd Python-3.3.4 in the Terminal window and press Enter. The
terminal changes directories to the Python 3.3.4 folder on your system.
Type ./configure and press Enter.
The script starts by checking the system build type and then performs a
series of tasks depending on the system you are using. This process can
take one or two minutes because there is a long list of items to check.
Type make and press Enter.
Linux runs the build script to create the Python application software.
The build process may take a minute or so, depending on the processing
speed of your system.
Type sudo make altinstall and press Enter .
The system may prompt you for your administrator password. Enter your
password and press Enter. At this point, a number of tasks take place as
the system installs Python.
CONCEPTS OF LEARNING
Learning involves the process of transforming experience into knowledge
or expertise.
As indicated below, learning can be broadly categorized into three
categories, depending on the nature of the learning data and the
interaction between the learner and the environment:
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
Likewise, there are four categories of machine learning algorithms, as
shown below -
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
However, the most used are supervised and unsupervised learning.
Supervised learning
Supervised learning is commonly used in real-world applications such
as facial and voice recognition, product or film recommendations, and
sales forecasting. Supervised learning is classified into two types -
Regression and Classification.
Regression trains on and predicts a continuous-valued response, for
example, the prediction of real estate prices.
Classification attempts to find the appropriate class label, for
example, analyzing positive/negative sentiment, male and female persons,
benign and malignant tumors, secure and unsecured loans, and so on.
In supervised learning, the learning data is accompanied by a
description, labels, goals, or desired results. The goal is to find a general
rule that maps inputs to outputs. This type of learning data is called
labeled data. The learned rule is then used to tag new data with
unknown outputs.
Supervised learning entails building a machine learning model based on
labeled samples. For instance, if we build a system to estimate the
price of land or a house based on various characteristics such as size
and location, we must first create a database and then label it. We need
to teach the algorithm which features correspond to which prices. Based
on this data, the algorithm learns how to calculate the price of a
property using the values of its input features.
Supervised learning involves learning a function from available training
data. Here, a learning algorithm analyzes the learning data and
produces a derived function that can be used to map new examples.
There are many supervised learning algorithms, such as naive Bayes
classifiers, logistic regression, neural networks, and support vector
machines.
Common examples of supervised learning include sorting e-mails into spam
and non-spam categories, labeling web pages based on their content, and
voice recognition.
Unsupervised learning
Unsupervised learning is used to detect anomalies and exceptions, such
as fraud or defective equipment, or to group customers with similar
behaviors for a sales campaign. It is the opposite of supervised
learning: there is no labeled data here.
When the training data contains only inputs, without any description or
tag, it is up to the coder or the algorithm to find the underlying
structure of the data, to determine how to describe it, or to discover
hidden patterns. This type of learning data is called unlabeled data.
Suppose we have multiple data points and want to classify them into
several groups. We may not know exactly what the classification criteria
should be. Thus, an unsupervised learning algorithm attempts to classify
the given dataset into a certain number of groups in an optimal way.
Unsupervised learning algorithms are remarkably powerful tools for
analyzing data and identifying patterns and trends. They are most often
used to group similar entries into logical clusters. Unsupervised
learning algorithms include k-means, hierarchical clustering, and so on.
Semi-supervised learning
If some learning samples are labeled but others are not, this is semi-
supervised learning. It uses a large amount of unlabeled learning data
together with a small amount of labeled data. Semi-supervised learning
is applied in cases where acquiring a fully labeled dataset is
expensive, while labeling a small subset is more practical. For example,
classifying remote sensing images usually requires qualified experts and
many field experiments, such as locating oil at a specific location,
while acquiring unlabeled data is relatively easy.
Reinforcement learning
Here, the training data provides the system with feedback that allows it
to adapt to the dynamic conditions for achieving a given goal. The
system evaluates its performance according to the responses received
and reacts accordingly. The best-known examples include autonomous cars
and the AlphaGo Master algorithm.
Purpose of machine learning
Machine learning can be considered a branch of artificial intelligence
(AI), because the ability to transform experience into expertise or to
detect patterns in complex data is a hallmark of human and animal
intelligence.
As a scientific domain, machine learning shares common concepts with
other disciplines, such as statistics, information theory, game theory,
and optimization.
As a sub-domain of computer science, its goal is to program machines so
that they are able to learn.
However, it should be noted that the purpose of machine learning is not
to create an automated duplicate of intelligent behavior, but to use the
power of computers to complement and supplement human intelligence. For
example, machine learning programs can analyze and process huge
databases, detecting patterns that are beyond the reach of human
perception.
CHAPTER FOUR
In the real world, much raw data is not suitable for direct processing
by machine learning algorithms. We need to preprocess the raw data
before feeding it into various machine learning algorithms. This chapter
discusses various data preprocessing techniques in Python machine
learning.
Pre-Processing of data
This section discusses how to preprocess data in Python.
First, open a file with a .py extension, such as prefoo.py, in a text editor
such as Notepad.
Then add the following code snippet to this file –
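A minimal sketch of such a snippet, assuming the scikit-learn
preprocessing module and illustrative sample values (the book's own data
is not shown), is:

import numpy as np
from sklearn import preprocessing

# Illustrative sample data (not the book's original array).
data = np.array([[3.0, -1.5, 2.0, -5.4],
                 [0.0, 4.0, -0.3, 2.1],
                 [1.0, 3.3, -1.9, -4.3]])

# Mean removal (standardization): center each feature on 0 with unit variance.
data_standardized = preprocessing.scale(data)
print("Mean =", data_standardized.mean(axis=0))
print("Std deviation =", data_standardized.std(axis=0))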
Note that in the output, the mean is almost equal to 0 and the standard
deviation is 1.
Scaling
The values of each feature in a data point can vary over an arbitrary
range. Therefore, it is important to scale them so that they conform to
specified rules.
For scaling, make use of the below code snippet –
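Continuing in prefoo.py, and reusing the preprocessing import and the
data array from the sketch above, min-max scaling (an assumption about
which scaler is intended) might look like this:

# Min-max scaling: map each feature into the range [0, 1].
data_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled = data_scaler.fit_transform(data)
print("Min-max scaled data:\n", data_scaled)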
Now run the code above, and you can observe the following output −
Note that all the values have been scaled into the given interval.
Normalization
Normalization consists of adjusting the values of a feature vector so
that they can be measured on a common scale. In normalization, the
values of a feature vector are adjusted so that they sum to 1. Add the
following lines for normalization to the prefoo.py file –
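A minimal sketch, assuming L1 normalization with scikit-learn (so that
the absolute values in each row sum to 1), is:

# L1 normalization: scale each row so that its absolute values sum to 1.
data_normalized = preprocessing.normalize(data, norm='l1')
print("L1-normalized data:\n", data_normalized)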
Now run the code above, and you can observe the following output −
Normalization is used to ensure that data points do not get boosted
artificially because of the nature of their features.
Binarization
Binarization is used to convert a numerical feature vector into a
Boolean vector. For binarization, you can use the following code –
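A minimal sketch, assuming scikit-learn's Binarizer and an illustrative
threshold of 1.4, is:

# Binarization: values above the threshold become 1; the rest become 0.
data_binarized = preprocessing.Binarizer(threshold=1.4).transform(data)
print("Binarized data:\n", data_binarized)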
Now run the code above, and you can observe the following output −
Label encoding
In classification, labels are usually words, and they must be converted
into numeric form before most algorithms can use them. For label
encoding, you can use the following code –
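A minimal sketch, assuming scikit-learn's LabelEncoder and illustrative
word labels, is:

from sklearn import preprocessing

# Fit the encoder on a set of word labels (illustrative values).
label_encoder = preprocessing.LabelEncoder()
input_classes = ['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw']
label_encoder.fit(input_classes)

# Each distinct word is mapped to a 0-indexed number.
labels = ['toyota', 'ford', 'audi']
encoded_labels = label_encoder.transform(labels)
print("Labels =", labels)
print("Encoded labels =", list(encoded_labels))

# Decoding the numbers recovers the original words.
decoded_labels = label_encoder.inverse_transform(encoded_labels)
print("Decoded labels =", list(decoded_labels))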
Now run the code above, and you can observe the following output −
As displayed in the output, the words have been converted into 0-indexed
numbers. When the encoded labels are decoded again, you will notice that
the mapping is preserved perfectly.
Data analysis
This section details the analysis of data for machine learning in Python.
Loading the dataset
We can download the data directly from the UCI Machine Learning
Repository. Note that we use pandas here to load the data. We will also
use pandas to explore the data, both with descriptive statistics and
with data visualization. Observe the following code, and note that we
specify the name of each column when loading the data.
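A minimal sketch, assuming pandas and that the file name and column
names below match your local copy of the data (both are assumptions),
is:

import pandas as pd

# Column names for the Pima Indians diabetes data (assumed abbreviations).
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Load the local CSV file into a DataFrame.
dataset = pd.read_csv('pima_indians.csv', names=names)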
After running the code, you may notice that the dataset is loaded and
ready for analysis. Here we downloaded the pima_indians.csv file,
moved it to our working directory and loaded it using the local file
name.
Summarize the dataset
The data can be summarized in several ways, as follows:
• Check the dimensions of the dataset
• List the data itself
• View the statistical summary of all attributes
• Break the data down by class variable
Dimensions of the dataset
You can use the command below to check the number of instances (rows)
and attributes (columns) that the data contains, using the shape
property. For the code we have been discussing, this shows 759 instances
and six attributes –
The describe() command provides the output below, which shows a
statistical summary of each attribute −
You are then shown the number of instances in each class, as shown −
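A minimal sketch of these summaries, assuming the dataset DataFrame
loaded above, is:

# Dimensions of the dataset: (rows, columns).
print(dataset.shape)

# Statistical summary of each attribute.
print(dataset.describe())

# Number of instances in each class.
print(dataset.groupby('class').size())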
Data visualization
You can visualize the data using two types of plots, as shown -
Univariate plots to understand each attribute
Multivariate plots to understand the relationships that exist between
attributes
Univariate Plots
These are plots of each individual variable. Consider the case where the
input variables are numeric and we need to create box-and-whisker plots.
The following code can be used for this purpose.
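A minimal sketch, assuming matplotlib and the DataFrame from above (the
layout values are illustrative), is:

import matplotlib.pyplot as plt

# One box-and-whisker plot per input variable.
dataset.plot(kind='box', subplots=True, layout=(3, 3), sharex=False, sharey=False)
plt.show()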
The output gives a clearer idea of the distribution of the input
attributes, as shown -
Histograms
A histogram of each input variable can be created to get an idea of the
distributions by making use of the commands below –
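A minimal sketch, again assuming the same DataFrame and matplotlib, is:

# One histogram per input variable.
dataset.hist()
plt.show()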
In the output, it can be seen that two of the input variables have a
Gaussian distribution. These plots thus help give an idea of which
algorithms we can use in our program.
Multivariate Plots
Multivariate plots help us understand the interactions between
variables.
Scatter Plot Matrix
First, let's examine the scatter plots of all attribute pairs. This can be
useful for identifying structured relationships between input variables.
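A minimal sketch, assuming pandas' scatter_matrix helper and the same
DataFrame, is:

from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

# Scatter plots of all attribute pairs.
scatter_matrix(dataset)
plt.show()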
Output
Example
Here we make use of the banknote authentication dataset to determine the
accuracy.
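The original code is not shown here; a minimal sketch, assuming a linear
support vector machine from scikit-learn plus an assumed local file name
and column names for the UCI banknote data, is:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Banknote authentication data (file and column names are assumptions).
names = ['variance', 'skewness', 'curtosis', 'entropy', 'class']
data = pd.read_csv('banknote_authentication.csv', names=names)
X = data.drop('class', axis=1)
y = data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A linear SVM looks for the separating line (hyperplane) with the widest margin.
classifier = SVC(kernel='linear')
classifier.fit(X_train, y_train)
print("Accuracy =", accuracy_score(y_test, classifier.predict(X_test)))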
When the code given above is executed, you can observe the output as
follows −
Now find a line that divides the data between the two differently sorted
groups of data. The line is chosen so that its distances from the
nearest point in each of the two groups are as large as possible.
In the example above, the line dividing the data into two differently
sorted groups is the black line, because the two closest points are the
farthest from the line. This line is our classifier. Then, depending on
which side of the line the test data lands on, we can sort the new data.
You can take note of the following result and plot after running the code
shown above −
Naïve Bayes Algorithm
It is a classification technique based on Bayes' theorem, with an
assumption of independence between the predictor variables. Succinctly
put, a naive Bayes classifier assumes that the presence of a particular
feature in a class is unrelated to the presence of any other feature.
For example, a fruit may be considered an orange if it is orange, round,
and about 3 inches in diameter. Even if these characteristics depend on
each other or on the existence of other characteristics, a naive Bayes
classifier would consider all of them to contribute independently to the
probability that the fruit is an orange.
The naive Bayes model is easy to build and particularly useful for very
large datasets. In addition to being simple, naive Bayes is known to
outperform even highly sophisticated classification methods.
Bayes' theorem provides a means of calculating the posterior probability
P(c|x) from P(c), P(x), and P(x|c). Observe the equation given here:
P(c|x) = P(x|c) P(c) / P(x)
where:
P(c|x) is the posterior probability of class c (target) given predictor
x (attribute).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, that is, the probability of the predictor
given the class.
P(x) is the prior probability of the predictor.
Consider the example below for a better understanding.
Suppose we have a training dataset of Weather observations and the
corresponding target variable Play. We must now determine whether
players will play or not, depending on the weather conditions. To do so,
follow the steps below -
Step 1 - Convert the dataset into the frequency table.
Step 2 - Create a likelihood table by finding probabilities, such as
P(Overcast) = 0.29 and P(Play = Yes) = 0.64.
Step 3 - Use the naive Bayesian equation to calculate the posterior
probability of each class. The result of the forecast will be the class with
the highest probability.
Problem - Players will play if the weather is sunny; is this statement
correct?
Solution - We can solve it using the method described above:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and
P(Yes) = 9/14 = 0.64.
Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher
probability.
Naive Bayes uses a similar technique to predict the probabilities of
different classes based on various attributes. This algorithm is mostly
used for text classification and for problems with multiple classes.
The code snippet below shows an example of a naive Bayes
implementation –
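A minimal sketch, assuming scikit-learn's GaussianNB and the bundled
iris data as a stand-in dataset, is:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Gaussian naive Bayes on the iris data (a stand-in for the book's data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)
print("Accuracy =", accuracy_score(y_test, model.predict(X_test)))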
After running the code given above, you can observe the following
output –
KNN (K-Nearest Neighbors)
K-Nearest Neighbors, KNN for short, is a supervised learning algorithm
that specializes in classification. KNN is a simple algorithm that
stores all available cases and classifies new cases by a majority vote
of its k neighbors. The case assigned to a class is the most common case
among its k nearest neighbors, as measured by a distance function. These
distance functions can be the Euclidean, Manhattan, Minkowski, or
Hamming distance. The first three are used for continuous variables and
the fourth (Hamming) for categorical variables. If K = 1, the case is
simply assigned to the class of its nearest neighbor. Choosing K can
sometimes be difficult during KNN modeling.
The algorithm examines different centroids and compares distances using
a function (usually Euclidean), analyzes the results, and assigns each
point to the group in which it is optimally placed alongside all of its
nearest points.
You can use KNN for classification and regression problems. However, it
is more commonly used for classification problems in industry. KNN can
easily be mapped to our real lives.
You should note the following points before selecting KNN:
• KNN is computationally expensive.
• Variables must be normalized; otherwise, variables with larger ranges
can bias the result.
• KNN requires more work at the preprocessing stage, such as outlier and
noise removal, before it is applied.
Observe the following code to understand KNN better –
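A minimal sketch, assuming scikit-learn's KNeighborsClassifier and the
iris data as a stand-in, is:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# K = 5 neighbors with the default Euclidean (Minkowski, p = 2) distance.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy =", knn.score(X_test, y_test))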
The code given above will give the output below −
K-Means
This is a kind of unsupervised algorithm that handles clustering
problems. Its procedure follows an easy and simple way to classify a
given dataset into a certain number of clusters (suppose k clusters).
The data points inside a cluster are homogeneous and are heterogeneous
with respect to other clusters.
How K-means forms a cluster
K-means forms clusters in the steps below -
• K-means picks k points, one for each cluster, called centroids.
• Each data point forms a cluster with the nearest centroid, giving k
clusters.
• K-means finds the centroid of each cluster based on the existing
cluster members; these become the new centroids.
Since we have new centroids, repeat steps 2 and 3: find the nearest
distance of each data point from the new centroids and associate it with
the new k clusters. Keep repeating this process until convergence
occurs, that is, until the centroids no longer change.
Determination of the value of K
In K-means, there are clusters, and every cluster has its own centroid.
The sum of the squares of the differences between the centroid and the
data points within a cluster is the within-cluster sum of squares for
that cluster. Also, when the within-cluster sums of squares of all the
clusters are added, the result is the total within-cluster sum of
squares for the cluster solution.
We know that this value continues to decrease as the number of clusters
increases, but if you plot the result, you will see that the sum of
squared distances decreases sharply up to some value of k, and then much
more slowly after that. Here, we can find the optimal number of
clusters.
Note the following code –
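A minimal sketch, assuming scikit-learn's KMeans on synthetic 2-D points
(illustrative data), where inertia_ is the total within-cluster sum of
squares discussed above, is:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points in two blobs (illustrative data).
rng = np.random.RandomState(0)
points = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                    rng.normal(3.0, 0.5, size=(50, 2))])

# Fit k-means for several values of k and watch the total
# within-cluster sum of squares drop sharply at the right k.
for k in range(1, 6):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    print(k, kmeans.inertia_)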
You can see the following output after running the code given above -
Random forest
Random Forest is a popular supervised ensemble learning algorithm.
"Ensemble" means that it takes a number of "weak learners" and has them
work together to form one strong predictor. In this case, the weak
learners are randomly built decision trees that are pooled to form the
strong predictor: a random forest.
Note the following code –
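A minimal sketch, assuming scikit-learn's RandomForestClassifier and the
iris data as a stand-in, is:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 randomized decision trees pooled into one ensemble predictor.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy =", forest.score(X_test, y_test))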
Below is the output for the code −
Over the last six years, there has been an exponential increase in data
capture at every possible level and point. Government agencies, research
organizations, and companies not only tap new sources of data; they also
capture very detailed data at different times and stages.
For example, e-commerce companies capture more details about customers,
such as demographics, browsing history, likes or dislikes, purchase
history, and reviews, in order to give them personalized attention.
The available data can now have thousands of features, and reducing
those features while keeping as much information as possible is
difficult. In such situations, dimensionality reduction helps a lot.
Boosting Algorithms
The term "boosting" denotes a family of algorithms that convert weak
learners into strong learners. Let's understand this definition by
solving a spam classification problem, as shown below -
What procedure should be followed to sort an e-mail as spam or not? In
the initial approach, we would identify spam and non-spam using criteria
such as the following:
• The e-mail has only one image file (an advertising image); it is spam.
• The e-mail has only one link; it is spam.
• The body of the e-mail consists of a sentence such as "You have won a
cash prize of xxxxxx $"; it is spam.
• E-mails from our official domain "Tutorialspoint.com" are not spam.
• E-mails from a known source are not spam.
Above, we defined several rules for classifying an e-mail as spam or
non-spam. However, individually, these rules are not strong enough to
classify e-mails successfully. Therefore, these rules are called weak
learners.
To convert a weak learner into a strong learner, we combine the
predictions of the weak learners using methods such as:
• Using a weighted average
• Considering the prediction that has the most votes
For example, suppose we have defined seven weak learners. If five of
these seven vote "SPAM" and two vote "Not SPAM," then by default we will
consider the e-mail spam, because we have more votes (5) for "SPAM."
How it works
Boosting combines weak learners (also called base learners) to form a
strong rule. This section explains how boosting identifies weak rules.
To find a weak rule, we apply base (machine) learning algorithms, each
time with a different distribution over the data. Each time a base
learning algorithm is applied, it generates a new weak prediction rule.
This is an iterative process repeated many times. After many iterations,
the boosting algorithm combines these weak rules into a single strong
prediction rule.
To choose the right distribution for each round, follow these steps -
Step 1 - The base learner takes all of the distributions and assigns
equal weight to each observation.
Step 2 - If the first base learning algorithm causes any prediction
errors, we give a higher weight to the observations with prediction
errors. Then we apply the next base learning algorithm.
We repeat Step 2 until the limit of the base learning algorithm is
reached or higher accuracy is achieved.
Finally, boosting combines the outputs of the weak learners into a
strong learner that improves the predictive power of the model. Boosting
puts more emphasis on examples that are misclassified or that have
larger errors under the preceding weak rules.
Types of Boosting algorithms
There are several types of engines used in boosting algorithms -
decision stumps, margin-maximizing classification algorithms, and so on.
Different boosting algorithms are listed here -
• AdaBoost (Adaptive Boosting)
• Gradient tree boosting
• XGBoost
This section focuses on AdaBoost and gradient boosting, followed by
their respective algorithms.
AdaBoost
Observe the following figure, which explains the AdaBoost algorithm. It
is explained box by box below –
Box 1 - You can see that we assign equal weights to every data point and
apply a decision stump to classify the points as + (plus) or - (minus).
The decision stump (D1) has created a vertical line on the left side to
sort the data points. This vertical line incorrectly predicts three +
(plus) points as - (minus). So we assign higher weights to these three +
(plus) points and apply another decision stump.
Box 2 - Here, you can see that the three incorrectly predicted + (plus)
data points are drawn larger than the rest of the data points. In this
case, the second decision stump (D2) tries to predict them correctly.
Now, a vertical line (D2) on the right side of this box has classified
the three misclassified + (plus) points correctly. But again, it has
made other misclassifications, this time with three - (minus) data
points. Again, we assign higher weights to the three - (minus) data
points and apply another decision stump.
Box 3 - Here, the three - (minus) data points receive larger weights. A
decision stump (D3) is applied to predict these misclassified
observations correctly. This time, a horizontal line is produced to
classify the + (plus) and - (minus) data points, based on the higher
weights of the misclassified observations.
Box 4 - Here, we combine D1, D2, and D3 to form a strong prediction with
more complex rules than any of the individual weak learners. It can be
seen that this combination classifies the observations better than any
of the individual weak learners.
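A minimal sketch of AdaBoost in code, assuming scikit-learn's
AdaBoostClassifier (whose default base learner is a decision stump) and
the bundled breast cancer data as a stand-in, is:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 50 decision stumps trained in sequence; each round reweights the
# misclassified examples so the next stump focuses on them.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("Test accuracy =", booster.score(X_test, y_test))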