Module 2 PPT
Bhaktavatsala Shivaram
Adjunct Faculty
21CS54, AI&ML - Bhaktavatsala Shivaram, Adjunct Faculty, CSE 3
Module 2
Informed Search Strategies
• Greedy best-first search
• A* search
• Heuristic function
Machine Learning
• Introduction
• Understanding Data
Module 2: Informed Search Strategies – Greedy Best First Search

Greedy Search Example 1

f(n) = h(n) = straight-line distance heuristic

[Graph figure: nodes A–I with edge costs 75, 118, 140, 111, 99, 80, 97, 101, 211]

Step  Open         Closed
1.    [A]          []
2.    [E,C,B]      [A]
3.    [F,G,C,B]    [A, E]
4.    [I,G,C,B]    [A, E, F]
5.    [E,B,D]      [A, E, F, I]

Traverse Path = A -> E -> F -> I
Distance (A-E-F-I) = 253 + 178 + 0 = 431
Path Cost (A-E-F-I) = 140 + 99 + 211 = 450
Module 2: Informed Search Strategies – Greedy Best First Search

Greedy Search Example 2

f(n) = h(n) = straight-line distance heuristic

[Graph figure: same nodes A–I; the heuristic value at D is marked ** 220]

Step  Open       Closed
1.    [A]        []
2.    [C,E,B]    [A]
3.    [D,E,B]    [A, C]

Result: INFINITE LOOP
Module 2: Informed Search Strategies – Greedy Best First Search

• Greedy best-first search can start down an infinite path and never return to try other possibilities; it is incomplete.
• Because of its greediness, the search makes choices that can lead to a dead end; then one backs up in the search tree to the deepest unexpanded node.
  ▪ Greedy best-first search resembles depth-first search in the way it prefers to follow a single path all the way to the goal, but it will back up when it hits a dead end.
  ▪ The quality of the heuristic function determines the practical usability of greedy search.
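The strategy described above can be sketched as a priority queue ordered by h(n) alone; the graph and heuristic values below are hypothetical illustrations, not the lecture's examples.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Always expand the node with the smallest heuristic h(n); path cost g(n) is ignored."""
    open_list = [(h[start], start, [start])]
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for nbr in graph.get(node, []):
            if nbr not in closed:
                heapq.heappush(open_list, (h[nbr], nbr, path + [nbr]))
    return None  # incomplete: may fail (or loop forever on infinite graphs) without closed-list checks

# Hypothetical graph and heuristic values (not the slide's example)
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['G']}
h = {'A': 10, 'B': 6, 'C': 4, 'D': 2, 'G': 0}
print(greedy_best_first(graph, h, 'A', 'G'))  # ['A', 'C', 'D', 'G']
```

Note the closed set plays the role of the Closed column in the traces above; without it, Example 2's infinite loop can occur.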
Module 2: Informed Search Strategies – Greedy Best First Search

Greedy Search Example 3

[Graph figure: nodes A–H and goal G, with edge costs 7, 11, 14, 18, 25, 10, 15, 8, 20, 9, 10]

Straight-line distances given (say):
A -> G = 40, B -> G = 32, C -> G = 25, D -> G = 35, E -> G = 19, F -> G = 17, G -> G = 0, H -> G = 10

Step  Open        Closed
1.    [A]         []
2.    [C,B,D]     [A]
3.    [F,E,B,D]   [A, C]
4.    [G,E,B,D]   [A, C, F]
5.    [E,B,D]     [A, C, F, G]

Traverse Path = A -> C -> F -> G
Path Cost (A-C-F-G) = 14 + 10 + 20 = 44
Distance (A-C-F-G) = 40 + 25 + 17 + 0 = 82
Module 2: Informed Search Strategies – Greedy Best First Search

Greedy Search Example 4

[Graph figure: start S, goal G, nodes A–I, with edge costs 3, 2, 4, 1, 3, 1, 5, 2, 3, 10]

Node (n) -> h(n):
S -> 13, A -> 12, B -> 4, C -> 7, D -> 3, E -> 8, F -> 2, G -> 0, H -> 4, I -> 9

Step  Open        Closed
1.    [S]         []
2.    [B,A]       [S]
3.    [F,E,A]     [S, B]
4.    [G,I,E,A]   [S, B, F]
5.    [I,E,A]     [S, B, F, G]

Traverse Path = S -> B -> F -> G
Path Cost (S-B-F-G) = 2 + 1 + 3 = 6
Distance (S-B-F-G) = 13 + 4 + 2 + 0 = 19
Module 2: Informed Search Strategies – Greedy Best First Search

• Greedy search is not optimal.
• Greedy search is incomplete without systematic checking of repeated states.
• In the worst case, the time and space complexity of greedy search are both O(b^m), where b is the branching factor and m is the maximum path length of the search space.
Module 2: Informed Search Strategies – Greedy Best First Search

Advantages:
• Simple and easy to implement
• Fast and efficient
• Low memory requirements
• Flexible

Disadvantages:
• Inaccurate results
• Local optima
• Dependence on the heuristic function
• Lack of completeness
Module 2: Informed Search Strategies – Greedy Best First Search

Few applications of Greedy Best First Search

Pathfinding: used to find the shortest path between two points in a graph, in many applications such as video games, robotics, and navigation systems.
Machine Learning: used in machine learning algorithms to find the most promising path through a search space.
Optimization: used to optimize the parameters of a system in order to achieve the desired result.
Game AI: used in game AI to evaluate potential moves and choose the best one.
Navigation: used to find the shortest path between two locations.
Natural Language Processing: used in tasks such as language translation or speech recognition to generate the most likely sequence of words.
Image Processing: used to segment an image into regions of interest.
Module 2: Informed Search Strategies – A* Search

A* Search

Greedy best-first search minimizes a heuristic h(n), the estimated cost from the current state n to the goal state.
Greedy best-first search is efficient, but it is neither optimal nor complete.

Uniform cost search minimizes the cost g(n) from the initial state to the current state n.
Uniform cost search is optimal and complete, but not efficient.

A* SEARCH:
It combines greedy best-first search and uniform cost search to get an efficient algorithm that is complete and optimal: it expands the node with the lowest f(n) = g(n) + h(n).
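The combination of g(n) and h(n) can be sketched as below; the weighted graph and heuristic values are hypothetical, not the lecture's map.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n)."""
    open_list = [(h[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float('inf')):  # found a cheaper route to nbr
                best_g[nbr] = g2
                heapq.heappush(open_list, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float('inf')

# Hypothetical weighted graph with an admissible heuristic (not the slide's example)
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 1), ('G', 6)], 'C': [('G', 2)]}
h = {'A': 4, 'B': 3, 'C': 2, 'G': 0}
print(a_star(graph, h, 'A', 'G'))  # (['A', 'B', 'C', 'G'], 4)
```

Because the heuristic here is admissible, A* returns the optimal path, unlike greedy best-first search which would commit to whichever neighbor looks closest.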
[Figure: A* worked example — node expansions with f-values [415] and [418]; Traverse Path = A -> E -> G -> H -> I]

[Figure: map exercise — Start: Arad, Goal Node: Bucharest, Traverse Path = ?]

[Figure: City A (Start) and City B (Goal), connected by a 20 km road and a 215 km road (route R2, [215])]
Module 2: Heuristic Function

Types of Heuristic

Admissible:
• Never overestimates the cost of reaching the goal.
• h(n) is always less than or equal to the actual cost of the lowest-cost path from node n to the goal.

Non-Admissible:
• Overestimates.
• h(n) is greater than the actual cost of the lowest-cost path from node n to the goal.
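The admissibility condition above can be checked directly when the true lowest costs to the goal are known; the cost and heuristic values below are hypothetical.

```python
# A heuristic is admissible if h(n) <= true cost from n to the goal, for every node n.
# Hypothetical true lowest costs to goal G and a candidate heuristic, for illustration.
true_cost = {'A': 10, 'B': 7, 'C': 3, 'G': 0}
h = {'A': 9, 'B': 7, 'C': 2, 'G': 0}

def is_admissible(h, true_cost):
    return all(h[n] <= true_cost[n] for n in true_cost)

print(is_admissible(h, true_cost))              # True: never overestimates
print(is_admissible({**h, 'C': 5}, true_cost))  # False: h(C) = 5 overestimates the true cost 3
```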
Module 2: Machine Learning - Introduction

Data
• DATA – its full potential is not utilized at most businesses.
• Data is scattered across different archive systems, and organizations are not able to integrate these sources fully.
• There is a lack of awareness about software tools that could help unearth the useful information from data.
• To improve efficiency and productivity, business organizations are now adopting the latest technology: machine learning.

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. – Arthur Samuel
Systems should learn by themselves without explicit programming.

Heuristics: mental shortcuts that allow people to solve problems and make judgments quickly and efficiently.
Module 2: Machine Learning

Machine Learning

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. – Arthur Samuel
Systems should learn by themselves without explicit programming.

A learned model may take the form of:
• a mathematical equation,
• relational diagrams like trees/graphs,
• logical if/else rules, or
• groupings called clusters.
Module 2: Machine Learning

Models

• Machine learning models can be understood as programs that have been trained to find patterns within new data and make predictions.
• A model can be a formula, procedure or representation that can generate decisions from data.
• A model is generated automatically from the given data.

Pattern: it is local and applicable only to certain attributes.
Model: it is global and fits the entire dataset.
Module 2: Machine Learning

Models

• Models of computer systems are equivalent to human experience.
• Experience is based on data.
• Humans gain experience by various means:
  • They gain knowledge by rote learning.
  • They observe others and imitate them.
  • They gain a lot of knowledge from teachers and books.
  • They learn many things by trial and error.
• Once knowledge is gained, when a new problem is encountered, humans search for similar past situations, then formulate heuristics and use them for prediction.
Module 2: Machine Learning

Models – Experience

• Collection of data.
• Once data is gathered, abstract concepts are formed out of that data. Abstraction is used to generate concepts. This is equivalent to humans' idea of objects; for example, we have some idea of what an elephant looks like.
• Generalization converts the abstraction into an actionable form of intelligence. It can be viewed as an ordering of all possible concepts. So, generalization involves ranking of concepts, inferencing from them and formation of heuristics, an actionable aspect of intelligence.
• Heuristics are educated guesses for all tasks. For example, if one runs on encountering a danger, it is the result of human experience or heuristics formation. In machines, it happens the same way.
• Heuristics normally work! But, occasionally, they may fail too. It is not the fault of heuristics, as a heuristic is just a 'rule of thumb'. Course correction is done by taking evaluation measures. Evaluation checks the thoroughness of the models and does course correction, if necessary, to generate better formulations.
Module 2: Machine Learning

Relation to other fields

ML uses concepts of:
• Artificial Intelligence
• Data Science (Big Data, Data Mining, Data Analytics, Pattern Recognition)
• Statistics

ML is the resultant of combined ideas from diverse fields.
Module 2: Machine Learning - Relation to other fields

Machine Learning and Artificial Intelligence

ML is a branch of Artificial Intelligence.
• AI aims to develop intelligent agents.
• Early AI focused on logic and logical inference, with ups and downs (AI winters).

The resurgence of AI is due to the development of data-driven systems, whose aim is to find the relations and regularities present in data.

Machine Learning
• Sub-branch of AI
• Aim: extract patterns and make predictions
  - learning from examples
  - reinforcement learning

Deep Learning
• Sub-branch of Machine Learning
• Models constructed using neural network technology
  - neural networks based on human neuron models
Module 2: Machine Learning - Relation to other fields

Machine Learning – Data Science, Data Mining & Data Analytics

Data science encompasses many fields, and machine learning starts with data, so data science and machine learning are interlinked.

ML is a branch of data science. Data science deals with the gathering of data for analysis. Its fields include:
▪ Big Data
▪ Data Mining
▪ Data Analytics
▪ Pattern Recognition
Module 2: Machine Learning - Relation to other fields

Machine Learning & Data Science

Big Data is a field of data science that deals with data having the following characteristics:
▪ Volume – huge amount of data
▪ Variety – data in a variety of forms like images, videos, etc.
▪ Velocity – the speed at which data is generated and processed

➢ Big Data is used in many machine learning algorithms, for applications such as language translation and image recognition.
➢ Big Data influences the growth of deep learning.
Module 2: Machine Learning - Relation to other fields

Machine Learning & Data Mining

Data mining aims to extract the hidden patterns that are present in the data.
Machine learning aims to use data mining for prediction.
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning - Regression

• Regression predicts continuous variables like price.
• Linear regression takes the training set and tries to fit it with a line, e.g. product sales = 0.66 × Week + 0.54.
• The advantage of this model is that a prediction for product sales (y) can be made for unknown week data (x).
• For example, the prediction for the unknown eighth week can be made by substituting x = 8 in the regression formula to get y.
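The substitution described above can be done directly with the slide's fitted line:

```python
# Prediction with the fitted line from the slide: product_sales = 0.66 * week + 0.54
def predict_sales(week):
    return 0.66 * week + 0.54

# Predicted sales for the unknown eighth week
print(round(predict_sales(8), 2))  # 5.82
```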
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Classification vs Regression

• Regression models predict continuous variables such as product price, while
• classification models concentrate on assigning labels such as a class.

Nominal data is the simplest data type. It classifies data purely by labeling or naming values, e.g. measuring marital status, hair, or eye color. It has no hierarchy to it.
Ordinal data classifies data while introducing an order, or ranking. For instance, measuring economic status using the hierarchy: 'wealthy', 'middle income' or 'poor'.
Module 2: Machine Learning – Descriptive Statistics

Data types

[Fig 2.1 Types of Data: Data Type → Categorical Data (Nominal Data, Ordinal Data) and Numerical Data (Interval Data, Ratio Data)]

Quantitative data is, quite simply, information that can be quantified. It can be counted or measured, and given a numerical value, such as length in centimeters or revenue in dollars. Quantitative data tends to be structured in nature and is suitable for statistical analysis. If you have questions such as "How many?", "How often?" or "How much?", you'll find the answers in quantitative data.

[Figures: dot plot and area chart of students' marks, Student IDs 1–5]
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Central Tendency)

1. Mean – a measure of central tendency that represents the 'center' of the dataset.
   Arithmetic Mean (with an alternative form for larger computing cases).

2. Geometric Mean – the average value or mean which signifies the central tendency of a set of numbers by finding the product of their values; equivalently, it is defined as the nth root of the product of n numbers.
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Central Tendency)

3. Harmonic Mean
• The harmonic mean is calculated as the number of values N divided by the sum of the reciprocals of the values (1 over each value).
• Harmonic Mean = N / (1/x1 + 1/x2 + … + 1/xN)

Tips:
• If values have the same units: use the arithmetic mean.
• If values have differing units: use the geometric mean.
• If values are rates: use the harmonic mean.
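The three means above can be computed with Python's standard library; the sample list here is an arbitrary illustration.

```python
import statistics

data = [4, 8, 16]

# Arithmetic mean: sum of values divided by their count
print(statistics.mean(data))            # ~9.333
# Geometric mean: nth root of the product of n values (Python 3.8+)
print(statistics.geometric_mean(data))  # 8.0, since (4*8*16) ** (1/3) = 512 ** (1/3)
# Harmonic mean: N / (1/x1 + 1/x2 + ... + 1/xN)
print(statistics.harmonic_mean(data))   # ~6.857, i.e. 3 / (1/4 + 1/8 + 1/16)
```

Note harmonic mean ≤ geometric mean ≤ arithmetic mean, which this example illustrates.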
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measurement of Spread)

[Worked example output:]
Range: 30
Variance: 121.6
Standard Deviation: 11.027239001672177
[Example dataset: league standings. The columns are standings points (PTS; teams earn three points for a win and one point for a tie), wins (W), losses (L), ties (T), goals scored by that team (GF), and goals scored against that team (GA).]

Team                 PTS  W   L   T   GF  GA
OL Reign             42   13  8   3   37  24
Washington Spirit    39   11  7   6   29  26
Chicago Red Stars    38   11  8   5   28  28
NJ/NY Gotham FC      35   8   5   11  29  21
Kansas City Current  16   3   14  7   15  36

1. Compute the standard deviation and range of points.

Five-number summary example:
Min 8, Q1 15, Q2 22, Q3 30, Max 54
IQR = (Q3 - Q1) = 30 - 15 = 15
SIQR = IQR/2 = (Q3 - Q1)/2 = 7.5
Outliers lie outside [(Q1 - 1.5·IQR), (Q3 + 1.5·IQR)] = [-7.5, 52.5]
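The IQR, SIQR and outlier fences in the example above follow directly from Q1 and Q3:

```python
# Five-number summary values from the example: Min 8, Q1 15, Q2 22, Q3 30, Max 54
q1, q3 = 15, 30

iqr = q3 - q1                # inter-quartile range
siqr = iqr / 2               # semi inter-quartile range
low_fence = q1 - 1.5 * iqr   # values below this are outliers
high_fence = q3 + 1.5 * iqr  # values above this are outliers

print(iqr, siqr, low_fence, high_fence)  # 15 7.5 -7.5 52.5
```

Note the maximum (54) lies above the upper fence (52.5), so it would be flagged as an outlier.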
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measurement of Spread)

Quartiles:
• Quartiles are the three values that divide a list of numerical data into four quarters.
• The middle quartile measures the central point of the distribution and shows the data which are near the central point.
• The lower quartile marks off the half of the data set that falls below the median, and the upper quartile marks off the remaining half, which falls above the median.

Min, Q1, Q2 (Median), Q3, Max

Inter-Quartile Range (IQR): the difference between Q3 and Q1, IQR = Q3 - Q1.
Semi Inter-Quartile Range: SIQR = IQR/2 = (Q3 - Q1)/2.

Outliers:
These are normally the values falling apart by at least 1.5 times the IQR above the third quartile or below the first quartile, i.e. outside [(Q1 - 1.5·IQR), (Q3 + 1.5·IQR)].
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measurement of Spread)

Five-point summary and box plots:
The median, quartiles Q1 and Q3, and the minimum and maximum, written in the order < Minimum, Q1, Median, Q3, Maximum >, is known as the five-point summary.

Box plots (also called box-and-whisker plots):
• Suitable for continuous variables and nominal variables
• Used to illustrate data distributions and summarize data
• The box contains the bulk of the data; these data are between the first and third quartiles.
• The line inside the box indicates the median of the data (mostly).
• If the median is not equidistant from the box ends, the data is skewed.
• The whiskers that project from the ends of the box indicate the spread of the tails and the maximum and minimum data values.
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Example:
Find the five-point summary of the list {13, 11, 2, 3, 4, 8, 9}.
Sorted: {2, 3, 4, 8, 9, 11, 13}; Q1 = 3, Median = 8, Q3 = 11.
Five-point summary: < 2, 3, 8, 11, 13 >
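The example above can be reproduced with a short helper, using one common quartile convention (median of each half, excluding the overall median from the halves):

```python
import statistics

def five_point_summary(values):
    """<min, Q1, median, Q3, max> using the median-of-halves rule for quartiles."""
    s = sorted(values)
    n = len(s)
    lower = s[:n // 2]        # half below the median
    upper = s[(n + 1) // 2:]  # half above the median
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

print(five_point_summary([13, 11, 2, 3, 4, 8, 9]))  # (2, 3, 8, 11, 13)
```

Other quartile conventions (e.g. linear interpolation) can give slightly different Q1/Q3 values for small lists.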
Rule of thumb:
• For skewness values between -0.5 and 0.5, the data exhibit approximate symmetry.
• Skewness values within the range of -1 to -0.5 (negatively skewed) or 0.5 to 1 (positively skewed) indicate slightly skewed data distributions.
• Data with skewness values less than -1 (negatively skewed) or greater than 1 (positively skewed) are considered highly skewed.
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measure of Shape)

Kurtosis
• Kurtosis is a statistical measure that quantifies the shape of a probability distribution.
• It provides information about the tails and peakedness of the distribution compared to a normal distribution.
• Positive kurtosis indicates heavier tails and a more peaked distribution.
• Negative kurtosis suggests lighter tails and a flatter distribution.
• Kurtosis helps in analyzing the characteristics and outliers of a dataset.
• The measure of kurtosis refers to the tailedness of a distribution; tailedness refers to how often outliers occur.

In the kurtosis formula, Yi is the i-th data value, Ȳ is the mean, and s is the standard deviation.

For more details: https://www.educba.com/kurtosis-formula/
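The skewness and kurtosis measures above can be sketched with the standard moment-based (population) definitions; small-sample corrected formulas, as used by some packages, differ slightly.

```python
def skewness(x):
    """Moment skewness: mean of cubed standardized deviations (population form)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return sum(((v - mean) / sd) ** 3 for v in x) / n

def kurtosis(x):
    """Moment kurtosis: fourth moment over squared variance (equals 3 for a normal)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return sum((v - mean) ** 4 for v in x) / n / var ** 2

print(skewness([1, 2, 3]))  # 0.0 — symmetric data
print(kurtosis([1, 2, 3]))  # 1.5 — platykurtic (< 3)
```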
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measure of Shape)

Leptokurtic (Kurtosis > 3)
A leptokurtic distribution has very long and thick tails, which means there are more chances of outliers. Positive values of excess kurtosis indicate that the distribution is peaked and possesses thick tails. Extremely positive kurtosis indicates a distribution where more numbers are located in the tails of the distribution instead of around the mean.

Platykurtic (Kurtosis < 3)
A platykurtic distribution has thin tails and is stretched around the center, meaning most data points are present in high proximity to the mean. A platykurtic distribution is flatter (less peaked) when compared with the normal distribution.

Mesokurtic (Kurtosis = 3)
A mesokurtic distribution is the same as the normal distribution, which means its excess kurtosis is near 0. Mesokurtic distributions are moderate in breadth, and their curves have a medium peaked height.

For more details: https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics

Third way of classification – based on the number of variables (category) in the dataset
1. Univariate – analysis of a dataset with one variable (Measure of Shape)

Mean Absolute Deviation (MAD):
• It is another dispersion measure and is robust to outliers.
• It is a measure of variability that indicates the average distance between observations and their mean.

MAD = Σ|X – µ| / N

where:
X = the value of a data point
µ = mean
|X – µ| = absolute deviation
N = sample size
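The MAD formula above translates directly to code; the data list is an arbitrary illustration.

```python
def mean_absolute_deviation(x):
    """MAD = sum(|X - mu|) / N: average absolute distance from the mean."""
    mu = sum(x) / len(x)
    return sum(abs(v - mu) for v in x) / len(x)

# Mean is 5; absolute deviations are 3, 1, 1, 3; their average is 2
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```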
Module 2: Machine Learning – Hypothesis

Statistical Tests – 2 types (True Positive and True Negative)

A true positive is an outcome where the model correctly predicts the positive class.
A true negative is an outcome where the model correctly predicts the negative class.
"Wolf" is a positive class. "No wolf" is a negative class.
Module 2: Machine Learning – Hypothesis

Statistical Tests – 2 types of errors involved

A false positive is an outcome where the model incorrectly predicts the positive class.
- Type I error: incorrect rejection of a true null hypothesis.
A false negative is an outcome where the model incorrectly predicts the negative class.
- Type II error: incorrect failure to reject a false null hypothesis.
"Wolf" is a positive class. "No wolf" is a negative class.

[Figure: Type 1 Error – reject H0 when H0 is true; Type 2 Error – do not reject H0 when H1 is true]
Module 2: Machine Learning – Hypothesis

For calculation:
• Size of the data sample
• Degrees of freedom (ν): the number of independent pieces of information used to compute a statistic (such as a mean or a variance)

SE is the standard error of the sample mean. It can be calculated as:
SE = σ/√n
where σ is the standard deviation of the sampling distribution and n is the sample size.
Module 2: Machine Learning – Hypothesis

Comparing Learning Models

Example
Consider sample X = [1, 2, 3, 4, 5]; the population mean (μ) is 12 and the population standard deviation (σ) is 2.
Apply the Z test and show whether the result is significant.

Sample mean X̄ = 15/5 = 3, n = 5
SE = σ/√n = 2/√5 ≈ 0.894
Z = (X̄ - μ)/SE ≈ -10.06

Since |Z| exceeds the critical value at significance 0.05, the null hypothesis H0 is rejected.
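The Z-test computation above can be reproduced step by step (using 1.96 as the two-tailed critical value at the 0.05 level):

```python
import math

# Z test for the example: X = [1, 2, 3, 4, 5], mu = 12, sigma = 2
x = [1, 2, 3, 4, 5]
mu, sigma = 12, 2

x_bar = sum(x) / len(x)         # sample mean = 3.0
se = sigma / math.sqrt(len(x))  # standard error = 2 / sqrt(5)
z = (x_bar - mu) / se

print(round(z, 2))    # -10.06
print(abs(z) > 1.96)  # True: reject H0 at the 0.05 significance level
```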