Quantitative Methods Module 1
MODULE GUIDE
This activity module is developed primarily for you (especially students who have no access to the internet) as a learning guide as you engage in the subject this Second Semester of A.Y. 2021-2022. In the absence of the face-to-face mode of learning, this learning material can help supplement your learning needs for the subject. The tasks throughout the module, which you are required to do, are purposely designed so that the course outcomes and intended learning outcomes can still be attained even in distance learning.
Before going through the tasks, you are advised to answer the pre-test first to measure your prior knowledge. As an activity module, a series of tasks is provided in the effort to satisfy both course and learning outcomes. After answering the series of tasks/activities in the module, you will answer the post-test found in the last part to measure how much you have learned from the subject as a whole.
At the end of the module, the requirements for answering are clearly stipulated. The methods of sending your answers, both online and offline, are also given.
Furthermore, please be guided that learning this module will be an advantage on your part, specifically in understanding the basic context of quantitative methods, its classifications and common prediction techniques.
For more information and concerns you may contact:
PRE-TEST
LEARNING PLAN
LESSON ONE NAIVE BAYES
Introduction
KEYS TO REMEMBER:
DISCUSSIONS
Naive Bayes classifiers:
- are a family of simple "probabilistic classifiers" based on Bayes' theorem with strong (naive) independence assumptions between the features.
- are a simple technique for building classifiers: models that assign class labels to problem instances, represented as vectors of feature values, with the class labels selected from a finite set of labels.
For training such classifiers, there is no single algorithm, but rather a variety of algorithms based on the same principle: all naive Bayes classifiers assume that the value of one feature is independent of the value of any other feature, given the class variable. For example, a fruit may be termed an apple if it is red, round, and around 10 cm in diameter. Regardless of any possible relationships between these traits, a naive Bayes classifier considers each of them to contribute independently to the likelihood that this fruit is an apple.
A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on Bayes' theorem.
Bayes' Theorem:
P(A | B) = P(B | A) P(A) / P(B)
Using Bayes' theorem, we can find the probability of A happening, given that B has occurred. Here, B is the evidence and A is the hypothesis. In this formula:
Likelihood, P(B | A) – describes how well the model predicts the data.
Prior Probability, P(A) – describes the degree to which we believe the model accurately describes reality based on all of our prior information.
The assumption made here is that the predictors/features are independent; that is, the presence of one particular feature does not affect the others. Hence it is called naive.
Example:
Let us take an example to get better intuition. Consider the problem of playing golf. The dataset is represented below.
Example 1, Table 1:
We classify whether the day is suitable for playing golf, given the features of the day. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy. We make two assumptions here. First, as stated above, we consider that these predictors are independent; that is, if the temperature is hot, it does not necessarily mean that the humidity is high. Second, all the predictors have an equal effect on the outcome; that is, the day being windy does not have more importance in deciding whether to play golf or not.
The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features:
X = (x_1, x_2, …, x_n)
Here x_1, x_2, …, x_n represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule together with the naive independence assumption, we get:
P(y | x_1, …, x_n) = P(x_1 | y) P(x_2 | y) … P(x_n | y) P(y) / [P(x_1) P(x_2) … P(x_n)]
Now, you can obtain the values for each term by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and a proportionality can be introduced:
P(y | x_1, …, x_n) ∝ P(y) ∏ P(x_i | y), for i = 1, …, n
In our case, the class variable (y) has only two outcomes, yes or no. There could be cases where the classification is multiclass. Therefore, we need to find the class y with maximum probability:
y = argmax_y P(y) ∏ P(x_i | y)
Using the above function, we can obtain the class, given the predictors.
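To make this decision rule concrete, here is a minimal Python sketch of a categorical naive Bayes classifier built from raw frequency counts. The function names (train_naive_bayes, predict) and the tiny dataset are invented for illustration, not taken from the module.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    # Estimate class priors P(y) and feature likelihoods P(x_i | y) from counts.
    class_counts = Counter(labels)
    feature_counts = defaultdict(int)  # (feature index, value, class) -> count
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, y)] += 1
    return class_counts, feature_counts

def predict(row, class_counts, feature_counts):
    # Return argmax over y of P(y) * product over i of P(x_i | y);
    # the denominator P(X) is dropped, as derived above.
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(y)
        for i, value in enumerate(row):
            score *= feature_counts[(i, value, y)] / n_y  # likelihood P(x_i | y)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Tiny illustration with (outlook, temperature, humidity, windy) rows:
rows = [("rainy", "hot", "high", False), ("sunny", "cool", "normal", False),
        ("overcast", "hot", "high", False), ("sunny", "mild", "high", True)]
labels = ["no", "yes", "yes", "no"]
cc, fc = train_naive_bayes(rows, labels)
print(predict(("sunny", "cool", "normal", False), cc, fc))  # -> "yes"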
Multinomial Naive Bayes:
This is mostly used for document classification problems, i.e. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.
Bernoulli Naive Bayes:
This is similar to the multinomial naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take up only the values yes or no, for example whether a word occurs in the text or not.
Gaussian Naive Bayes:
When the predictors take up a continuous value and are not discrete, we assume that these values are sampled from a Gaussian distribution.
Since the way the values are present in the dataset changes, the formula for conditional probability changes to:
P(x_i | y) = (1 / √(2π σ_y²)) · exp(−(x_i − μ_y)² / (2 σ_y²))
where μ_y and σ_y² are the mean and variance of feature x_i among the training examples of class y.
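As a quick illustration, here is a minimal Python sketch of that Gaussian likelihood; the function name gaussian_likelihood and the sample temperatures are hypothetical.

import math

def gaussian_likelihood(x, mean, var):
    # Density of x under the Gaussian fitted to the class-y values of this feature
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical temperatures observed on "play = yes" days:
temps_yes = [21.0, 23.5, 20.0, 22.5]
mean = sum(temps_yes) / len(temps_yes)
var = sum((t - mean) ** 2 for t in temps_yes) / len(temps_yes)
print(gaussian_likelihood(22.0, mean, var))  # P(temp = 22 | play = yes) as a density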
[Table: the 14-day play-golf dataset with columns Day, Outlook, Temperature, Humidity, Windy and Play; e.g. day 3: Overcast, Hot, High, False, Yes]
Solution:
1. Determine/count the probability of YES and NO.
P(YES) = 9/14
P(NO) = 5/14
2. Count the YES & NO of the every the data set under the variables of Outlook, Temp., Humidity,
Windy & Play.
i.e.:
OUTLOOK
TEMPERATURE
HUMIDITY
WINDY
Play   Count   Probability
Yes    9       9/14
No     5       5/14
Total  14      14/14 = 100%
For the new day X = (Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True):
P(X | Play = Yes) P(Play = Yes) = (2/9) × (3/9) × (3/9) × (3/9) × (9/14) = 0.0053
P(X | Play = No) P(Play = No) = (3/5) × (1/5) × (4/5) × (3/5) × (5/14) = 0.0206
P(X) = P(Sunny) × P(Cool) × P(High) × P(True) = (5/14) × (4/14) × (7/14) × (6/14) = 0.02186
Since 0.0206 > 0.0053, the class with the maximum probability is No: the day is not suitable for playing golf.
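A quick Python check of this arithmetic (a sketch; the query day X = (Sunny, Cool, High, True) is read off from the counts above):

# Joint scores P(X | class) * P(class) for X = (Sunny, Cool, High, True)
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)  # ≈ 0.0053
p_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206
print("P(X|Yes)P(Yes) =", round(p_yes, 4))
print("P(X|No)P(No) =", round(p_no, 4))
print("Prediction:", "Yes" if p_yes > p_no else "No")  # -> No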
ASSESSMENT TASK
1. Given the Naïve Bayes data set above (i.e. the Example), with the variables Outlook, Temperature, Humidity and Windy, and given that on the 15th day the value of the variable Play is NO, determine whether on the 16th day you can PLAY or cannot PLAY given the data:
Outlook = Rainy
Temperature = Cool
Humidity = Normal
Windy = True
Play =?
LESSON TWO K-NEAREST NEIGHBOURS (KNN)
The k-nearest neighbours (KNN) algorithm classifies a new data point based on its closest neighbours in the dataset, choosing the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
'k' in KNN is a parameter that refers to the number of nearest neighbours included in the majority voting process. For example, let k = 5: the new data point is classified by the majority of votes from its five nearest neighbours, and the new point would be classified as red if four out of its five neighbours are red.
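The voting step can be written in a couple of lines of Python; a minimal sketch (the neighbour labels below are invented to mirror the red/blue example):

from collections import Counter

# Labels of the k = 5 nearest neighbours of the new data point
neighbour_labels = ["red", "red", "blue", "red", "red"]
prediction = Counter(neighbour_labels).most_common(1)[0][0]
print(prediction)  # -> "red", since four of the five neighbours are red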
Example:
1. Given the dataset, we have to classify which sport the person belongs to. To do that, we have to use the distance formula and determine the closest distances.
Solution:
1. We have to scale the Gender values so that we can calculate with the Euclidean distance formula, i.e.:
Male = 0
Female = 1
Name      Age  Gender  Distance  Sports
Rahul     40   1       ?         Cricket
Poja      20   1       ?         Neither
Smith     15   0       ?         Cricket
Laxmi     55   1       ?         Football
Michael   15   0       ?         Football
Angelina  5    1       ?         ?
2. We have to determine the distance from Angelina to Ajay, and so on, using the Euclidean distance formula:
distance(Angelina, Ajay) = √((age difference)² + (gender difference)²)
= √(729 + 1)
= √730 ≈ 27.02 – distance from Angelina to Ajay
3. To determine which sport Angelina belongs to, we have to choose the value of K. Suppose the value of K = 3.
4. Since our K = 3, we take the three (3) closest distances:
Zaira = 9 – Cricket
Smith = 10 – Cricket
Michael = 10.05 – Football
Two of these three nearest neighbours play Cricket, so by majority vote Angelina is classified as Cricket.
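Putting the distance step and the voting step together, here is a minimal, self-contained Python sketch of the classifier. The dataset inside it is patterned on the example table but partly hypothetical: the ages of Ajay and Zaira are not recoverable from this extract, so a stand-in row is used.

import math
from collections import Counter

def knn_classify(query, dataset, k=3):
    # Sort the labelled points by Euclidean distance to the query,
    # then take a majority vote among the k nearest labels.
    distances = sorted((math.dist(query, features), sport) for features, sport in dataset)
    k_labels = [sport for _, sport in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Rows patterned on the example (gender scaled as Male = 0, Female = 1):
dataset = [
    ((40, 1), "Cricket"),   # Rahul
    ((20, 1), "Neither"),   # Poja
    ((15, 0), "Cricket"),   # Smith
    ((55, 1), "Football"),  # Laxmi
    ((15, 0), "Football"),  # Michael
    ((12, 1), "Cricket"),   # hypothetical stand-in for Zaira
]
print(knn_classify((5, 1), dataset, k=3))  # Angelina -> "Cricket"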
ASSESSMENT TASK
1. Given the K-NN data set above (i.e. the Example), with the variables Name, Age, Gender, Distance and Sports, and knowing that the answer for the given example is CRICKET, determine or classify which sport the following person belongs to:
Name = Janjan
Age = 30
Gender = F
Sports =?
LESSON THREE SUPPORT VECTOR MACHINE (SVM)
Support vectors are simply the coordinates of individual observations. The SVM classifier is a frontier which best segregates the two classes (a hyper-plane/line).
Let’s understand:
Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin. Let's look at the snapshot below:
Above, you can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane C. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.
Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.
As mentioned above, one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.
But what if we cannot draw a linear hyper-plane between the two classes? Till now, we have only looked at the linear hyper-plane. SVM can solve this problem. Easily! It solves this problem by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on the x and z axes:
In this plot, it is easy to have a linear hyper-plane between these two classes. But another burning question arises: do we need to add this feature manually to have a hyper-plane? No, the SVM algorithm has a technique called the kernel trick. The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, it does some extremely complex data transformations, then finds out the process to separate the data based on the labels or outputs you've defined.
When we look at the hyper-plane in the original input space, it looks like a circle.
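A minimal Python sketch of the z = x^2 + y^2 idea: a hand-rolled feature map followed by a linear decision rule. The sample points and the radius-2 boundary are invented for illustration; in practice a library kernel (e.g. an RBF kernel) performs this lifting implicitly.

def feature_map(x, y):
    # Lift a 2-D point (x, y) into 3-D as (x, y, z) with z = x**2 + y**2
    return (x, y, x ** 2 + y ** 2)

# Points inside a circle of radius 2 form one class, points outside the other.
points = [(0.5, 0.5), (1.0, -1.0), (3.0, 0.0), (-2.5, 2.5)]
for x, y in points:
    z = feature_map(x, y)[2]
    # The circular boundary x**2 + y**2 = 4 in the original space becomes
    # the linear hyper-plane z = 4 in the lifted space.
    label = "inside" if z < 4.0 else "outside"
    print((x, y), "-> z =", z, "->", label)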
ASSESSMENT TASK
Now that you have read and understood lesson three (3), which is the support vector machine (SVM), kindly jot down the main goal of SVM. (10 pts)
Answer: The goal of a support vector machine is not only to draw hyper-planes and divide data points, but to draw the hyper-plane that separates data points with the largest margin, that is, with the most space between the dividing line and any given data point.
LESSON FOUR DECISION TREE
To understand the concept of a Decision Tree, consider the example above. Let's say you want to predict whether a person is fit or unfit, given information like age, eating habits, physical activity, etc. The decision nodes are questions like 'What's the age?', 'Does he exercise?', 'Does he eat a lot of pizza?', and the leaves represent outcomes like 'fit' or 'unfit'.
There are two main types of Decision Trees:
1. Classification Trees.
2. Regression Trees.
1. Classification trees (Yes/No types):
What we've seen above is an example of a classification tree, where the outcome was a variable like 'fit' or 'unfit'. Here the decision variable is categorical/discrete.
Such a tree is built through a process known as binary recursive partitioning. This is an iterative
process of splitting the data into partitions, and then splitting it up further on each of the branches.
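As a quick illustration of such a tree being built and queried, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the fit/unfit toy data is invented for illustration.

from sklearn.tree import DecisionTreeClassifier

# Features: [age, exercises (0/1), eats lots of pizza (0/1)]; labels: fit/unfit
X = [[25, 1, 0], [40, 0, 1], [30, 1, 1], [55, 0, 0], [22, 1, 0], [60, 0, 1]]
y = ["fit", "unfit", "fit", "unfit", "fit", "unfit"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)
print(tree.predict([[35, 1, 0]]))  # e.g. -> ['fit']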
Example of a Classification Tree
5. Large trees can be difficult to interpret, and the decisions they make may seem counterintuitive.
Applications of Decision Trees in real life:
1. Biomedical engineering (decision trees for identifying features to be used in implantable devices).
2. Financial analysis (customer satisfaction with a product or service).
3. Astronomy (classifying galaxies).
4. System control.
5. Manufacturing and production (quality control, semiconductor manufacturing, etc.).
6. Medicine (diagnosis, cardiology, psychiatry).
7. Physics (particle detection).
Example:
Solution:
1. Determine the entropy and gain for each of the attributes Outlook, Temp, Humidity and Wind.
Attribute: Outlook
Values (Outlook) = Sunny, Overcast, Rain
Calculate the Entropy:
S = [9+, 5-], where 9 = yes and 5 = no
Entropy(S) = -9/14 log2 9/14 - 5/14 log2 5/14 = 0.94
Ssunny = [2+, 3-]: Entropy(Ssunny) = -2/5 log2 2/5 - 3/5 log2 3/5 = 0.971
Sovercast = [4+, 0-]: Entropy(Sovercast) = -4/4 log2 4/4 - 0/4 log2 0/4 = 0
Srain = [3+, 2-]: Entropy(Srain) = -3/5 log2 3/5 - 2/5 log2 2/5 = 0.971
2. For the other attributes, calculate the Entropy and Gain with the same solution as in number 1, where the gain of an attribute A is
Gain(S, A) = Entropy(S) - ∑ (|Sv| / |S|) Entropy(Sv), summed over v ∊ Values(A)
For Outlook: Gain(S, Outlook) = 0.94 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) = 0.2464
So:
Gain(S, Outlook) = 0.2464
Gain(S, Temp) = 0.0289
Gain(S, Humidity) = 0.1516
Gain(S, Wind) = 0.0478
Since Outlook gives the largest gain, it becomes the root node of the tree.
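These numbers can be verified in a few lines of Python; a minimal sketch (the helper functions entropy and gain are written here for illustration):

import math

def entropy(pos, neg):
    # Entropy of a set with `pos` positive and `neg` negative examples
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

def gain(parent, subsets):
    # Information gain: entropy of the parent minus the weighted subset entropies
    total = sum(p + n for p, n in subsets)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n) for p, n in subsets)

# Outlook splits S = [9+, 5-] into Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-]
print(round(entropy(9, 5), 2))                           # 0.94
print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))  # 0.247 (the module's 0.2464 uses rounded entropies)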
                          Outlook
        Sunny             Overcast              Rain
{D1, D2, D8, D9, D11}  {D3, D7, D12, D13}  {D4, D5, D6, D10, D14}
      [2+, 3-]             [4+, 0-]             [3+, 2-]
         ?                   Yes                   ?
Solution:
Attribute: Temperature
Values (Temp) = Hot, Mild, Cool
Ssunny = [2+, 3-]: Entropy(Ssunny) = -2/5 log2 2/5 - 3/5 log2 3/5 = 0.971
Shot = [0+, 2-]: Entropy(Shot) = 0.0
Smild = [1+, 1-]: Entropy(Smild) = 1.0
Scool = [1+, 0-]: Entropy(Scool) = 0.0
Calculate the Gain:
Gain(Ssunny, Temp) = Entropy(Ssunny) - ∑ (|Sv| / |Ssunny|) Entropy(Sv), summed over v ∊ {hot, mild, cool}
2. The same calculation gives Gain(Ssunny, Humidity) and Gain(Ssunny, Wind).
So:
Gain(Ssunny, Temp) = 0.570
Gain(Ssunny, Humidity) = 0.97
Gain(Ssunny, Wind) = 0.0192
Since Humidity gives the largest gain, it becomes the decision node under the Sunny branch.
                          Outlook
        Sunny             Overcast              Rain
      Humidity         {D3, D7, D12, D13}  {D4, D5, D6, D10, D14}
                           [4+, 0-]             [3+, 2-]
                             Yes                   ?
    High        Normal
{D1, D2, D8}   {D9, D11}
     NO           YES
ASSESSMENT TASK
1. Now that you have read lesson four (4), which is the Decision Tree, and based on your understanding of the given example: what would be the objective of a Decision Tree? (10 pts)
Answer: The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). In Decision Trees, for predicting a class label for a record, we start from the root of the tree and follow the branches that match the record's attribute values until we reach a leaf.
SUMMARY
Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, and this hinders the performance of the classifier.
Decision trees use multiple algorithms to decide whether to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. In other words, we can say that the purity of the node increases with respect to the target variable.
POST-TEST
Direction: Answer the given problems and show your solutions. Write your solutions on a separate sheet of paper.
1. Given the data set below, use the Naïve Bayes principle to determine whether on the 11th day you can play or cannot play.
DAY OUTLOOK TEMPERATURE HUMIDITY WINDY PLAY
2. Given the data set below, use the KNN principle with K = 5 to determine which sport the person belongs to. (M = 0, F = 1)
NAME AGE GENDER DISTANCE SPORTS
Juan 20 M ? Neither
Aj 35 M ? Football
Paul 15 M ? Cricket
Melchor 25 M ? Cricket
Sarah 30 F ? Neither
James 36 M ? Football
Sam 30 F ? Neither
Alex 35 F ? Cricket
Laiza 21 F ? Football
John 15 M ? ?
3. Using the Decision Tree principle, calculate the entropy and gain for the remaining branch of the Outlook tree above and determine which attribute it is.