Quantitative Methods Module 1

COURSE OVERVIEW

Course Code: IT 206
Descriptive Title: Quantitative Methods
Credit Units: 3 Units
School Year/Term: 2nd Semester, AY 2021-2022
Mode of Delivery: Synchronous & Asynchronous
Name of Instructor/Professor: Julius Amfil Dublado, Arman M. Masangkay
Course Description: The course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools, as well as its general mind-set. Students will learn the concepts, techniques and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modelling, descriptive modelling, data product creation, evaluation, and effective communication.
Course Outcomes: Knowledge (Think)
1. Effectively analyze innovative or common classification and prediction techniques and/or mathematical tools in decision-making.
SLSU Vision: A high-quality corporate university of Science, Technology and Innovation.
SLSU Mission: SLSU will a. develop Science, Technology and Innovation leaders and professionals; b. produce high-impact technologies from research and innovations; c. contribute to sustainable development through responsive community engagement programs; d. generate revenues to be self-sufficient and financially viable.
MODULE GUIDE

This activity module is made and developed primarily for you (especially those students who have no access to the internet) as a learning guide as you engage in the subject this Second Semester of A.Y. 2021-2022. In the absence of a face-to-face mode of learning, this learning material can help supplement your learning needs for the subject. The tasks throughout the module, which you are required to do, are purposely designed so that course outcomes and intended learning outcomes can still be attained even in distance learning.
Before going through the tasks, you are advised to answer the pre-test first to measure your prior knowledge. As an activity module, a series of tasks is provided in the effort to satisfy both course and learning outcomes. After answering the series of tasks/activities in the module, you will answer the post-test found in the last part to measure how much you have learned from the subject as a whole.
At the end of the module, the requirements for answering are clearly stipulated, and the methods of sending your answers, both online and offline, are described.
Furthermore, please be guided that learning this module will be an advantage on your part, specifically in understanding the basic context of quantitative methods, its classifications and common prediction techniques.
For more information and concerns you may contact:

Name of Instructor Contact Number Facebook Account

Julius Amfil E. Dublado 09066056202 Fhel Dublado


Arman M. Masangkay 09385051210 Arman Macasuhot Masangkay

PRE-TEST

Direction: Identify whether each given statement is a PRO or a CON of Naive Bayes.


1. It is easy and fast to predict the class of a test data set. PROS
2. Naive Bayes is also known as a bad estimator. CONS
3. Naive Bayes classifiers perform better compared to other models when the independence assumption holds. PROS
4. It assumes independent predictors. CONS
5. It performs well with categorical input variables compared to numerical variables. PROS

LESSON ONE NAÏVE BAYES

LEARNING PLAN

Intended Learning Outcome:

Describe innovative or common classification and prediction techniques or mathematical tools in decision-making.

Enabled Learning Outcomes

In this lesson, you will learn about


1. Common classification and prediction techniques
2. Data Types
3. Classifiers

Introduction

KEYS TO REMEMBER:

 Bayes Theorem
 Data types
 Classifiers
 Naive Bayes
 K-Nearest Neighbors
 Support Vector Machine
 Decision Tree
 Probability

DISCUSSIONS

Quantitative approaches focus on objective measurements and statistical, mathematical, or numerical analysis of data acquired through polls, questionnaires, and surveys, as well as modifying pre-existing statistical data using computing techniques. Quantitative research is concerned with collecting numerical data and generalizing it across groups of people or explaining a phenomenon.

A classifier is a machine learning model that distinguishes between objects based on particular characteristics.

Examples of Classifiers are:

1. Naive Bayes classifiers
- are a group of straightforward "probabilistic classifiers" based on Bayes' theorem and strong (naive) independence assumptions between features.
- are a simple technique for building classifiers: models that give class labels to problem instances, represented as vectors of feature values, with the class labels selected from a finite set of labels. There is no single algorithm for training such classifiers, but rather a variety of algorithms based on the same principle: all naive Bayes classifiers assume that the value of one feature is independent of the value of any other feature, given the class variable. For example, if a fruit is red, round, and around 10 cm in diameter, it is termed an apple. Regardless of any possible relationships between these traits, a naive Bayes classifier considers each of them to contribute independently to the likelihood that this fruit is an apple.

Principle of Naïve Bayes Classifier

A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on the Bayes theorem.

Bayes Theorem:

P(A | B) = P(B | A) P(A) / P(B)

- Posterior Probability, P(A | B) – represents the degree to which we believe a given model accurately describes the situation, given the available data and all of our prior information.
- Likelihood, P(B | A) – describes how well the model predicts the data.
- Prior Probability, P(A) – describes the degree to which we believe the model accurately describes reality based on all of our prior information.
- Normalizing Constant, P(B) – the constant that makes the posterior density integrate to one.

Using Bayes theorem, we can find the probability of A happening, given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent; that is, the presence of one particular feature does not affect the others. Hence it is called naive.

Example:
Let us take an example to get some better intuition. Consider the problem of playing golf. The dataset is represented below.

Example 1, Table 1 (the dataset appears in full in the Day/Outlook table further below):
We classify whether the day is suitable for playing golf, given the features of the day. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy. We make two assumptions here. One, as stated above, is that these predictors are independent; that is, if the temperature is hot, it does not necessarily mean that the humidity is high. The other assumption is that all the predictors have an equal effect on the outcome; that is, the day being windy does not have more importance in deciding whether to play golf or not.

According to this example, Bayes theorem can be rewritten as:

P(y | X) = P(X | y) P(y) / P(X)

The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.

X is given as:

X = (x1, x2, ..., xn)

Here x1, x2, ..., xn represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule, we get:

P(y | x1, ..., xn) = [P(x1 | y) P(x2 | y) ... P(xn | y) P(y)] / [P(x1) P(x2) ... P(xn)]

Now, you can obtain the values for each term by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and a proportionality introduced:

P(y | x1, ..., xn) ∝ P(y) ∏ P(xi | y), for i = 1, ..., n

In our case, the class variable (y) has only two outcomes, yes or no. There could be cases where the classification is multivariate. Therefore, we need to find the class y with maximum probability:

y = argmax over y of P(y) ∏ P(xi | y)

Using the above function, we can obtain the class, given the predictors.

Types of Naïve Bayes Classifier

Multinomial Naive Bayes: This is mostly used for document classification problems, i.e. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.

Bernoulli Naive Bayes: This is similar to the multinomial naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take up only the values yes or no, for example whether a word occurs in the text or not.

Gaussian Naive Bayes: When the predictors take up a continuous value and are not discrete, we assume that these values are sampled from a Gaussian distribution. Since the way the values are present in the dataset changes, the formula for conditional probability changes to:

P(xi | y) = (1 / √(2π σy²)) exp(−(xi − μy)² / (2σy²))
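
To make the three variants concrete, here is a minimal sketch using scikit-learn (an assumed library, not one prescribed by this module); the tiny arrays are invented purely for illustration:

```python
# A minimal sketch of the three Naive Bayes variants using scikit-learn.
# The toy arrays below are hypothetical, invented only for illustration.
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

# Word-count features (e.g. word frequencies per document) -> Multinomial NB
X_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 3]])
# Boolean features (word present or not) -> Bernoulli NB
X_bool = (X_counts > 0).astype(int)
# Continuous features (e.g. measurements) -> Gaussian NB
X_cont = np.array([[1.2, 3.4], [0.8, 2.9], [4.5, 0.3], [4.1, 0.7]])
y = np.array([0, 0, 1, 1])  # class labels for the four samples

for model, X in [(MultinomialNB(), X_counts),
                 (BernoulliNB(), X_bool),
                 (GaussianNB(), X_cont)]:
    model.fit(X, y)                               # learn P(y) and P(x_i | y)
    print(type(model).__name__, model.predict(X[:1]))
```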

Naïve Bayes Applications

- Credit Scoring
- Medical
- Real-time Prediction
- Multi-class Prediction
- Text Classification
- Spam Filtering
- Sentiment Analysis
- Recommendation Systems
Example: Given the data set with the variables Outlook, Temperature, Humidity, Windy and Play, we have to calculate the probability of whether we can play or cannot play on day 15.

DAY  OUTLOOK   TEMPERATURE  HUMIDITY  WINDY  PLAY
1    Sunny     Hot          High      False  No
2    Sunny     Hot          High      True   No
3    Overcast  Hot          High      False  Yes
4    Rainy     Mild         High      False  Yes
5    Rainy     Cool         Normal    False  Yes
6    Rainy     Cool         Normal    True   No
7    Overcast  Cool         Normal    True   Yes
8    Sunny     Mild         High      False  No
9    Sunny     Cool         Normal    False  Yes
10   Rainy     Mild         Normal    False  Yes
11   Sunny     Mild         Normal    True   Yes
12   Overcast  Mild         High      True   Yes
13   Overcast  Hot          Normal    False  Yes
14   Rainy     Mild         High      True   No
15   Sunny     Cool         High      True   ?

Solution:
1. Determine/count the probability of YES and NO:
P(YES) = 9/14
P(NO) = 5/14
2. Count the YES and NO entries of the data set for each value of the variables Outlook, Temperature, Humidity and Windy, i.e.:

OUTLOOK      Yes  No  P(Yes)  P(No)
Sunny        2    3   2/9     3/5
Overcast     4    0   4/9     0/5
Rainy        3    2   3/9     2/5
Total        9    5   100%    100%

TEMPERATURE  Yes  No  P(Yes)  P(No)
Hot          2    2   2/9     2/5
Mild         4    2   4/9     2/5
Cool         3    1   3/9     1/5
Total        9    5   100%    100%

HUMIDITY     Yes  No  P(Yes)  P(No)
High         3    4   3/9     4/5
Normal       6    1   6/9     1/5
Total        9    5   100%    100%

WINDY        Yes  No  P(Yes)  P(No)
False        6    2   6/9     2/5
True         3    3   3/9     3/5
Total        9    5   100%    100%

PLAY         Count  Probability
Yes          9      9/14
No           5      5/14
Total        14     100%

3. Probability that we can play the game:

- P(Outlook = Sunny | Play = Yes) = 2/9
- P(Temperature = Cool | Play = Yes) = 3/9
- P(Humidity = High | Play = Yes) = 3/9
- P(Windy = True | Play = Yes) = 3/9
- P(Play = Yes) = 9/14

Probability that we cannot play the game:

- P(Outlook = Sunny | Play = No) = 3/5
- P(Temperature = Cool | Play = No) = 1/5
- P(Humidity = High | Play = No) = 4/5
- P(Windy = True | Play = No) = 3/5
- P(Play = No) = 5/14

4. Calculate the overall likelihoods for YES and for NO:

P(X | Play = Yes) P(Play = Yes) = (2/9) × (3/9) × (3/9) × (3/9) × (9/14) = 0.0053

P(X | Play = No) P(Play = No) = (3/5) × (1/5) × (4/5) × (3/5) × (5/14) = 0.0206

5. Multiply the probabilities of the evidence variables:

- P(X) = P(Outlook = Sunny) × P(Temperature = Cool) × P(Humidity = High) × P(Windy = True)
- P(X) = (5/14) × (4/14) × (7/14) × (6/14)
- P(X) = 0.02186

6. Divide the results by this value:

- P(Play = Yes | X) = 0.0053 / 0.02186 = 0.2424
- P(Play = No | X) = 0.0206 / 0.02186 = 0.9424

Therefore, since 0.9424 is greater than 0.2424, the answer is NO.
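
For readers following along in code, the day-15 computation above can be reproduced with a short Python sketch; this is a minimal hand-rolled illustration of the counting steps, not a library implementation:

```python
# Minimal sketch: reproduce the day-15 Naive Bayes computation by counting.
# Rows are days 1-14 from the table: (Outlook, Temperature, Humidity, Windy) -> Play.
data = [
    (("Sunny", "Hot", "High", False), "No"),
    (("Sunny", "Hot", "High", True), "No"),
    (("Overcast", "Hot", "High", False), "Yes"),
    (("Rainy", "Mild", "High", False), "Yes"),
    (("Rainy", "Cool", "Normal", False), "Yes"),
    (("Rainy", "Cool", "Normal", True), "No"),
    (("Overcast", "Cool", "Normal", True), "Yes"),
    (("Sunny", "Mild", "High", False), "No"),
    (("Sunny", "Cool", "Normal", False), "Yes"),
    (("Rainy", "Mild", "Normal", False), "Yes"),
    (("Sunny", "Mild", "Normal", True), "Yes"),
    (("Overcast", "Mild", "High", True), "Yes"),
    (("Overcast", "Hot", "Normal", False), "Yes"),
    (("Rainy", "Mild", "High", True), "No"),
]

def score(query, label):
    """P(X | Play=label) * P(Play=label), multiplying per-feature ratios."""
    rows = [x for x, play in data if play == label]
    p = len(rows) / len(data)                     # prior, e.g. 9/14 for Yes
    for i, value in enumerate(query):
        p *= sum(1 for x in rows if x[i] == value) / len(rows)
    return p

day15 = ("Sunny", "Cool", "High", True)
for label in ("Yes", "No"):
    print(label, round(score(day15, label), 4))   # Yes -> 0.0053, No -> 0.0206
```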

ASSESSMENT TASK

1. Given the data set above for Naïve Bayes (i.e. the Example), with the variables Outlook, Temperature, Humidity and Windy: since on the 15th day the answer for the variable Play is NO, determine whether on the 16th day you can PLAY or cannot PLAY given the data:

Outlook = Rainy
Temperature = Cool
Humidity = Normal
Windy = True
Play =?

LESSON TWO K-NEAREST NEIGHBORS

K-Nearest Neighbors (KNN)

- is one of the most basic machine learning methods for regression and classification problems.
- makes use of data to classify new data points using similarity measures (e.g. a distance function). A majority vote of a point's neighbors is used to classify it.
- operates by calculating the distances between a query and all the instances in the data, selecting the K closest examples to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).

'k' in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process. ... Let's say k = 5: the new data point is classified by the majority of votes from its five neighbours, and the new point would be classified as red if four out of five neighbours are red.

Advantages:

1. The algorithm is simple and easy to implement.
2. There's no need to build a model, tune several parameters, or make additional assumptions.
3. The algorithm is versatile. It can be used for classification, regression, and search.

Disadvantages:

1. The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increases.

Example:

1. Given the dataset, we have to classify which sport the person belongs to. For that, we have to use the distance formula and determine the closest distances.

NAME      AGE  GENDER  DISTANCE  SPORTS
Ajay      32   M       27.02     Football
Mark      40   M       ?         Neither
Sara      16   F       ?         Cricket
Zaira     34   F       ?         Cricket
Sachin    55   M       ?         Neither
Rahul     40   F       ?         Cricket
Poja      20   F       ?         Neither
Smith     15   M       ?         Cricket
Laxmi     55   F       ?         Football
Michael   15   M       ?         Football
Angelina  5    F       ?         ?

Solution:
1. We have to scale the Gender in order to calculate using the Euclidean Distance formula, i.e.:

Male = 0
Female = 1

NAME      AGE  GENDER  DISTANCE  SPORTS
Ajay      32   0       27.02     Football
Mark      40   0       ?         Neither
Sara      16   1       ?         Cricket
Zaira     34   1       ?         Cricket
Sachin    55   0       ?         Neither
Rahul     40   1       ?         Cricket
Poja      20   1       ?         Neither
Smith     15   0       ?         Cricket
Laxmi     55   1       ?         Football
Michael   15   0       ?         Football
Angelina  5    1       ?         ?

2. We have to determine the distance from Angelina to Ajay, and so on, using the distance formula:

√((x1 − x2)² + (y1 − y2)²)  (Euclidean Distance formula)

So,

√((5 − 32)² + (1 − 0)²)
= √(729 + 1)
= 27.02, the distance from Angelina to Ajay
3. To determine which sport Angelina would play, we have to set the value of K. Suppose K = 3.

NAME      AGE  GENDER  DISTANCE  SPORTS
Ajay      32   0       27.02     Football
Mark      40   0       35.01     Neither
Sara      16   1       11.00     Cricket
Zaira     34   1       29.00     Cricket
Sachin    55   0       50.01     Neither
Rahul     40   1       35.00     Cricket
Poja      20   1       15.00     Neither
Smith     15   0       10.05     Cricket
Laxmi     55   1       50.00     Football
Michael   15   0       10.05     Football
Angelina  5    1                 Cricket

4. Since our K = 3, we take the three (3) closest distances:
Smith = 10.05 – Cricket
Michael = 10.05 – Football
Sara = 11.00 – Cricket

Therefore, Angelina's sport is CRICKET, because Cricket appears twice among the three nearest neighbors.
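
The same K = 3 classification can be expressed as a minimal Python sketch (hand-rolled for illustration; math.dist computes the Euclidean distance used above):

```python
# Minimal sketch: KNN with K = 3 for the Angelina example above.
import math

# (name, age, gender, sport) with Male = 0, Female = 1
people = [
    ("Ajay", 32, 0, "Football"), ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"), ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"), ("Rahul", 40, 1, "Cricket"),
    ("Poja", 20, 1, "Neither"), ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"), ("Michael", 15, 0, "Football"),
]

def knn(query, k=3):
    # Euclidean distance from the query point to every labelled point
    dists = sorted((math.dist(query, (age, g)), sport)
                   for _, age, g, sport in people)
    votes = [sport for _, sport in dists[:k]]     # labels of the K nearest
    return max(set(votes), key=votes.count)       # majority vote

print(knn((5, 1)))  # Angelina (age 5, F) -> Cricket
```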

ASSESSMENT TASK

1. Given the data set above for K-NN (i.e. the Example), with the variables Name, Age, Gender, Distance and Sports, and given that the answer for the example was CRICKET, determine or classify which sport the following person belongs to:

Name = Janjan
Age = 30
Gender = F
Sports = ?
LESSON THREE SUPPORT VECTOR MACHINE (SVM)

"Support Vector Machine" (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

Support Vectors are simply the coordinates of individual observations. The SVM classifier is a frontier which best segregates the two classes (hyper-plane/line).
Let's understand:

 Identify the right hyper-plane (Scenario 1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify star and circle. You need to remember a thumb rule to identify the right hyper-plane: "Select the hyper-plane which segregates the two classes better." In this scenario, hyper-plane "B" has excellently performed this job.

 Identify the right hyper-plane (Scenario 2): Here, we have three hyper-planes (A, B and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane? Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin. The margin for hyper-plane C is high as compared to both A and B. Hence, we name C the right hyper-plane. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.

 Identify the right hyper-plane (Scenario 3): Hint: use the rules discussed in the previous scenarios to identify the right hyper-plane. Some of you may have selected hyper-plane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.

 Can we classify two classes (Scenario 4)? Here, we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier. As mentioned, one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.

 Find the hyper-plane to segregate two classes (Scenario 5): In this scenario, we can't have a linear hyper-plane between the two classes, so how does SVM classify these two classes? Until now, we have only looked at linear hyper-planes.

 SVM can solve this problem easily, by introducing an additional feature. Here, we will add a new feature z = x² + y². Now, let's plot the data points on the x and z axes.

In this plot, points to consider are:

o All values of z will always be positive, because z is the squared sum of both x and y.
o In the original plot, red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.

In the SVM classifier, it is easy to have a linear hyper-plane between these two classes. But another burning question arises: do we need to add this feature manually to obtain a hyper-plane? No, the SVM algorithm has a technique called the kernel trick. An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, it does some extremely complex data transformations, then finds out the process to separate the data based on the labels or outputs you've defined. When we look at this hyper-plane in the original input space, it looks like a circle.
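
As a minimal sketch of the kernel trick (assuming scikit-learn; the ring-shaped toy data is invented to mirror the circle-versus-star picture described above):

```python
# Minimal sketch: a linear SVM struggles on ring-shaped data, while an RBF
# kernel separates it, which is what the z = x^2 + y^2 mapping achieves
# implicitly via the kernel trick.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 40)
inner = np.c_[np.cos(angles[:20]), np.sin(angles[:20])] * 1.0   # "circles"
outer = np.c_[np.cos(angles[20:]), np.sin(angles[20:])] * 3.0   # "stars"
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

linear = SVC(kernel="linear").fit(X, y)   # no separating straight line exists
rbf = SVC(kernel="rbf").fit(X, y)         # kernel trick: no manual z feature
print("linear accuracy:", linear.score(X, y))   # well below 1.0 on this data
print("rbf accuracy:", rbf.score(X, y))         # typically 1.0 here
```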

ASSESSMENT TASK

As you read and understand lesson three (3), which is Support Vector Machine (SVM), kindly jot down the main goal of SVM. (10 pts)
Answer: The goal of a support vector machine is not only to draw hyper-planes and divide data points, but to draw the hyper-plane that separates data points with the largest margin, that is, with the most space between the dividing line and any given data point.

LESSON FOUR DECISION TREE

A Decision Tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is continuously split according to a certain parameter.
A Decision Tree consists of:
1. Nodes: test for the value of a certain attribute.
2. Edges/Branches: correspond to the outcome of a test and connect to the next node or leaf.
3. Leaf nodes: terminal nodes that predict the outcome (represent class labels or class distributions).

To understand the concept of a Decision Tree, consider this example: say you want to predict whether a person is fit or unfit, given information like age, eating habits, physical activity, etc. The decision nodes are questions like 'What's the age?', 'Does he exercise?', 'Does he eat a lot of pizza?', and the leaves represent outcomes like 'fit' or 'unfit'.
There are two main types of Decision Trees:
1. Classification Trees.
2. Regression Trees.

1. Classification trees (Yes/No types): What we've seen above is an example of a classification tree, where the outcome was a variable like 'fit' or 'unfit'. Here the decision variable is categorical/discrete. Such a tree is built through a process known as binary recursive partitioning: an iterative process of splitting the data into partitions, and then splitting it up further on each of the branches.

Example of a Classification Tree

2. Regression trees (continuous data types): Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees (e.g. the price of a house, or a patient's length of stay in a hospital).

Example of a Regression Tree

Advantages of Classification with Decision Trees:

1. Inexpensive to construct.
2. Extremely fast at classifying unknown records.
3. Easy to interpret for small-sized trees.
4. Accuracy comparable to other classification techniques for many simple data sets.
5. Excludes unimportant features.

Disadvantages of Classification with Decision Trees:
1. Easy to overfit.
2. The decision boundary is restricted to being parallel to the attribute axes.
3. Decision tree models are often biased toward splits on features having a large number of levels.
4. Small changes in the training data can result in large changes to the decision logic.
5. Large trees can be difficult to interpret, and the decisions they make may seem counter-intuitive.

Applications of Decision Trees in real life:
1. Biomedical engineering (decision trees for identifying features to be used in implantable devices).
2. Financial analysis (customer satisfaction with a product or service).
3. Astronomy (classifying galaxies).
4. System control.
5. Manufacturing and production (quality control, semiconductor manufacturing, etc.).
6. Medicine (diagnosis, cardiology, psychiatry).
7. Physics (particle detection).

Example:

DAY  OUTLOOK   TEMP  HUMIDITY  WIND    PLAY TENNIS
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No

Solution:
1. Determine the attribute among Outlook, Temp, Humidity and Wind.

Attribute: Outlook
Values(Outlook) = Sunny, Overcast, Rain

Calculate the Entropy:
S = [9+, 5−], where 9 = Yes and 5 = No
Entropy(S) = −9/14 log2(9/14) − 5/14 log2(5/14) = 0.94
Ssunny = [2+, 3−]: Entropy(Ssunny) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.971
Sovercast = [4+, 0−]: Entropy(Sovercast) = −4/4 log2(4/4) − 0/4 log2(0/4) = 0
Srain = [3+, 2−]: Entropy(Srain) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.971

Calculate the Gain:
Gain(S, Outlook) = Entropy(S) − ∑ over v ∊ {Sunny, Overcast, Rain} of (|Sv| / |S|) Entropy(Sv)
Gain(S, Outlook) = Entropy(S) − 5/14 Entropy(Ssunny) − 4/14 Entropy(Sovercast) − 5/14 Entropy(Srain)
Gain(S, Outlook) = 0.94 − 5/14 (0.971) − 4/14 (0) − 5/14 (0.971) = 0.2464

2. For the other attributes, calculate the Entropy and Gain with the same solution as in number 1 (as sketched in the code below). So:
Gain(S, Outlook) = 0.2464
Gain(S, Temp) = 0.0289
Gain(S, Humidity) = 0.1516
Gain(S, Wind) = 0.0478
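
As a minimal sketch, the entropy and gain computations above can be reproduced in Python and reused for the other attributes:

```python
# Minimal sketch: entropy and information gain for the Play Tennis table.
import math
from collections import Counter

# Play Tennis labels for days D1-D14 and each day's Outlook value
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(attribute, labels):
    g = entropy(labels)
    for v in set(attribute):                      # e.g. Sunny, Overcast, Rain
        subset = [l for a, l in zip(attribute, labels) if a == v]
        g -= (len(subset) / len(labels)) * entropy(subset)
    return g

print(round(entropy(play), 2))        # 0.94
print(round(gain(outlook, play), 4))  # 0.2467 (the text's 0.2464 uses the rounded 0.94)
```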

Since Outlook has the highest gain, it becomes the root node, splitting {D1, D2, ..., D14} [9+, 5−] into:
- Sunny: {D1, D2, D8, D9, D11} [2+, 3−] → still impure (?)
- Overcast: {D3, D7, D12, D13} [4+, 0−] → Yes
- Rain: {D4, D5, D6, D10, D14} [3+, 2−] → still impure (?)

DAY  TEMP  HUMIDITY  WIND    PLAY TENNIS
D1   Hot   High      Weak    No
D2   Hot   High      Strong  No
D8   Mild  High      Weak    No
D9   Cool  Normal    Weak    Yes
D11  Mild  Normal    Strong  Yes

Solution:
Values(Temp) = Hot, Mild, Cool
Ssunny = [2+, 3−]: Entropy(Ssunny) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.971
Shot = [0+, 2−]: Entropy(Shot) = 0.0
Smild = [1+, 1−]: Entropy(Smild) = 1.0
Scool = [1+, 0−]: Entropy(Scool) = 0.0

Calculate the Gain:
Gain(Ssunny, Temp) = Entropy(Ssunny) − ∑ over v ∊ {Hot, Mild, Cool} of (|Sv| / |Ssunny|) Entropy(Sv)
Gain(Ssunny, Temp) = Entropy(Ssunny) − 2/5 Entropy(Shot) − 2/5 Entropy(Smild) − 1/5 Entropy(Scool)
Gain(Ssunny, Temp) = 0.971 − 2/5 (0.0) − 2/5 (1) − 1/5 (0.0) = 0.571

2. The same calculation applies for Gain(Ssunny, Humidity) and Gain(Ssunny, Wind). So:
Gain(Ssunny, Temp) = 0.571
Gain(Ssunny, Humidity) = 0.971
Gain(Ssunny, Wind) = 0.0192

Since Humidity has the highest gain for the Sunny branch, it becomes the next decision node:
- Outlook = Sunny, Humidity = High: {D1, D2, D8} → No
- Outlook = Sunny, Humidity = Normal: {D9, D11} → Yes
- Outlook = Overcast: {D3, D7, D12, D13} [4+, 0−] → Yes
- Outlook = Rain: {D4, D5, D6, D10, D14} [3+, 2−] → still impure (?)

ASSESSMENT TASK
1. As you read lesson four (4), which is the Decision Tree, and based on your understanding of the given example: what would be the objective of a Decision Tree? (10 pts)

Answer: The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). In Decision Trees, for predicting a class label for a record, we start from the root of the tree and follow the branches matching the record's attribute values down to a leaf.

SUMMARY
Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, and this hinders the performance of the classifier.

The k-nearest neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. It is easy to implement and understand, but has a major drawback of becoming significantly slower as the size of the data in use grows.

KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression). In both classification and regression, we saw that choosing the right K for our data is done by trying several values of K and picking the one that works best.

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. The advantages of support vector machines are that they are effective in high-dimensional spaces, and remain effective in cases where the number of dimensions is greater than the number of samples.

Decision trees use multiple algorithms to decide whether to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. In other words, we can say that the purity of the node increases with respect to the target variable.

A decision tree is a very specific type of probability tree that enables you to make a decision about some kind of process. For example, you might want to choose between manufacturing item A or item B, or investing in choice 1, choice 2, or choice 3.

POST-TEST
Direction: Answer the given problems and show your solutions. Write your solutions on a separate sheet of paper.

1. Given the data set below, by using the Naïve Bayes principle, determine whether on the 11th day you can play or cannot play.
DAY  OUTLOOK   TEMPERATURE  HUMIDITY  WINDY  PLAY
1    Sunny     Hot          High      False  No
2    Sunny     Hot          High      True   No
3    Overcast  Hot          High      False  Yes
4    Rainy     Mild         High      False  Yes
5    Rainy     Cool         Normal    False  Yes
6    Rainy     Cool         Normal    True   No
7    Overcast  Cool         Normal    True   Yes
8    Sunny     Mild         High      False  No
9    Sunny     Cool         Normal    False  Yes
10   Rainy     Mild         Normal    False  Yes
11   Overcast  Mild         High      True   ?

2. Given the data set below, by using the KNN principle, determine which sport the person belongs to, with K = 5 (M = 0, F = 1).

NAME     AGE  GENDER  DISTANCE  SPORTS
Juan     20   M       ?         Neither
Aj       35   M       ?         Football
Paul     15   M       ?         Cricket
Melchor  25   M       ?         Cricket
Sarah    30   F       ?         Neither
James    36   M       ?         Football
Sam      30   F       ?         Neither
Alex     35   F       ?         Cricket
Laiza    21   F       ?         Football
John     15   M       ?         ?

3. Using the Decision Tree principle, calculate the entropy and gain for the remaining branches and determine which attribute splits each one, starting from the root:

{D1, D2, ..., D14} [9+, 5−] → Outlook
REFERENCES

Books

1. Aidley, D. (2018). Introducing Quantitative Methods: A Practical Guide (1st ed.). Red Globe Press.
2. Bruce, N., Pope, D., & Stanistreet, D. (2018). Quantitative Methods for Health Research. Wiley.
3. Richert, W., & Coelho, L. P. (2013). Building Machine Learning Systems with Python. Birmingham: Packt Publishing.
4. Raschka, S. (2015). Python Machine Learning.
5. James, G. (2013). An Introduction to Statistical Learning.
6. Yu, P. S., Han, J., & Faloutsos, C. (Eds.). (2010). Link Mining: Models, Algorithms, and Applications. Springer.
7. Machine Learning.

Journals

8. Data science and prediction (https://dl.acm.org/doi/abs/10.1145/2500499)
9. Data science and its relationship to big data and data-driven decision making (https://www.liebertpub.com/doi/abs/10.1089/big.2013.1508)
10. SSVM: A simple SVM algorithm (http://ieeexplore.ieee.org/abstract/document/1007516/)
11. The quantified self: Fundamental disruption in big data science and biological discovery (https://www.liebertpub.com/doi/abs/10.1089/big2012.0002)

Methods of sending your answers (offline) by hand delivery or courier:

1. Compile all your outputs accordingly and place them in a brown envelope.
2. Write your full name, course and section, and instructor's name on the back.
Ex. Juan Dela Cruz, Jr.
BSInfo.Tech 1-A
Dr. Albert C. Einstien
3. Send your outputs through LBC, JRS Express or J&T, or have them hand-delivered by your friends or peers, addressed to SLSU-Main, Brgy. San Roque, Sogod, Southern Leyte, or just submit them to the nearest SLSU-LGU Link.

For more information and concerns you may contact the following:

Name of Instructor        Contact Number   Facebook Account
Julius Amfil E. Dublado   09066056202      Fhel Dublado
Arman M. Masangkay        09385051210      Arman Macasuhot Masangkay
