Lecture 8 - Joint Probability Distributions - Applications in Machine Learning


Advanced Statistics

Dr. Syed Faisal Bukhari


Associate Professor
Department of Data Science
Faculty of Computing and Information Technology
University of the Punjab
Textbooks

Probability & Statistics for Engineers & Scientists, Ninth Edition, Ronald E. Walpole, Raymond H. Myers

Checking Independence
Two random variables X and Y are independent if:
P(X = x, Y = y) = P(X = x) × P(Y = y)
(for discrete variables)
f(x, y) = f_X(x) × f_Y(y)
(for continuous variables)
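For discrete variables, this check can be carried out directly on a joint probability table by comparing it with the outer product of its marginals. Below is a minimal Python/NumPy sketch; the joint table is an assumed illustrative example, built to be independent by construction.

import numpy as np

# Assumed example: a joint PMF constructed as the product of its marginals,
# so X and Y are independent by construction.
p_x = np.array([0.4, 0.6])      # P(X = x)
p_y = np.array([0.3, 0.7])      # P(Y = y)
joint = np.outer(p_x, p_y)      # P(X = x, Y = y)

# Recover the marginals from the joint table and test independence.
marg_x = joint.sum(axis=1)
marg_y = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_x, marg_y)))   # True: X and Y are independent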
Introduction to Machine Learning
• Machine Learning (ML) enables computers to learn from data.

• ML makes predictions or decisions without being explicitly programmed.

• Types of Machine Learning: Supervised, Unsupervised, Semi-supervised, Reinforcement, Self-supervised, Transfer Learning.

Types of Machine Learning
1. Supervised Learning: Learns from labeled data.
2. Unsupervised Learning: Finds patterns in unlabeled
data.
3. Semi-Supervised Learning: Combines labeled and
unlabeled data.
4. Reinforcement Learning: Learns by interacting with the
environment.
5. Self-Supervised Learning: Generates its own labels
from data.
6. Transfer Learning: Applies knowledge from one task to
another.

Supervised Learning
• Goal: Learn a mapping from input to output based on labeled data.

• Common Algorithms: Linear Regression, Decision Trees, SVM, Neural Networks.

• Example: Predicting house prices, email spam detection.

Unsupervised Learning
Goal: Find hidden patterns in unlabeled data.

Common Algorithms: K-Means Clustering, Hierarchical Clustering, PCA.

Example: Customer segmentation, anomaly detection.

Clustering (e.g., K-Means, Hierarchical Clustering): This involves grouping data points into clusters based on similarity. It's often used in market segmentation, social network analysis, or even image compression.
Semi-Supervised Learning
Goal: Use a small amount of labeled data with a large
amount of unlabeled data. This approach helps improve
learning when labeling data is expensive or time-
consuming.

Example: Medical Imaging


Labeled medical images, such as X-rays, MRI scans, or CT
scans, often require expert annotation, which is time-
consuming and expensive.
However, a large amount of unlabeled medical image data
may be available. A semi-supervised learning model can
learn to classify images (e.g., diagnosing diseases) using a
combination of expert-labeled images and many unlabeled
images.
Reinforcement Learning
Goal: Reinforcement learning (RL) involves training an
agent to make a sequence of decisions by interacting
with an environment, receiving rewards or penalties
as feedback, and learning the best strategies (called
policies) over time.

Common Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients.

Example: Game playing (AlphaGo), robotics.


Reinforcement Learning
Recommendation Systems:
Example: Online Recommendations (Netflix, YouTube)

Reinforcement learning is used in recommendation systems (e.g., Netflix or YouTube) to suggest content to users.
The system receives feedback in the form of user
interactions (e.g., clicks, likes, or time spent
watching), which it uses as rewards to learn what
types of content are most engaging to the user,
improving its future recommendations.

Self-Supervised Learning
Goal: Predict parts of data to generate labels.

Self-supervised learning (SSL) is a paradigm of machine learning where the model learns to predict part of the input data using the other parts, effectively creating its own supervisory signal.

Common Applications: Language models, image inpainting.

Example: Predicting missing parts of images.


Transfer Learning
Goal: Use knowledge from one task to improve another task.

Example: Using a pre-trained model for classifying medical images.

• Image Classification
• Example: Using pre-trained CNNs for medical image classification. A model trained on a large dataset like ImageNet (which contains millions of labeled general images) can be fine-tuned to classify medical images such as X-rays or MRI scans, even though medical images differ significantly from the original dataset.

Finite Stochastic Processes and Tree Diagrams
A (finite) sequence of experiments in which each experiment has a finite number of outcomes with given probabilities is called a (finite) stochastic process.

A convenient way of describing such a process and computing the probability of any event is by a tree diagram.

Example:
We are given three boxes as follows:
Box 1 has 10 light bulbs of which 4 are defective.
Box 2 has 6 light bulbs of which 1 is defective.
Box 3 has 8 light bulbs of which 3 are defective.
We select a box at random and then draw a bulb at
random. What is the probability p that the bulb is
defective?

Solution:
Here we perform a sequence of two experiments:
(i) select one of the three boxes;
(ii) select a bulb which is either defective (D) or
nondefective (N).

[Tree diagram] Each box is selected with probability 1/3:
Box 1: D with probability 2/5, N with probability 3/5
Box 2: D with probability 1/6, N with probability 5/6
Box 3: D with probability 3/8, N with probability 5/8
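Reading the branches off the tree and applying the law of total probability gives p = (1/3)(2/5) + (1/3)(1/6) + (1/3)(3/8) = 113/360 ≈ 0.314. A minimal Python sketch (variable names are illustrative) that verifies this exactly:

from fractions import Fraction as F

# Law of total probability over the branches of the tree:
# each box is selected with probability 1/3, P(D | box) comes from the diagram.
p_defective_given_box = {
    "Box 1": F(2, 5),   # 4 defective out of 10
    "Box 2": F(1, 6),   # 1 defective out of 6
    "Box 3": F(3, 8),   # 3 defective out of 8
}

p = sum(F(1, 3) * p_d for p_d in p_defective_given_box.values())
print(p, float(p))   # 113/360 ≈ 0.3139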
Example: A coin, weighted so that P(H) = 2/3 and P(T) = 1/3, is tossed. If heads appears, then a number is selected at random from the numbers 1 through 9; if tails appears, then a number is selected at random from the numbers 1 through 6.

Find the probability p that an even number is selected.

[Tree diagram]
H (probability 2/3): E with probability 4/9, O with probability 5/9
T (probability 1/3): E with probability 1/2, O with probability 1/2
Probability of an even number using 1 through 9:
P(E | H) = 4/9

Probability of an even number using 1 through 6:
P(E | T) = 3/6 = 1/2

p = (2/3)(4/9) + (1/3)(1/2) = 25/54
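The same two-stage computation can be verified exactly with Python's fractions module; a minimal sketch using the slide's probabilities:

from fractions import Fraction as F

# Total probability of an even number over the two branches of the tree.
p_even = F(2, 3) * F(4, 9) + F(1, 3) * F(1, 2)
print(p_even)   # 25/54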

Example: Three machines A, B and C produce
respectively 50%, 30% and 20% of the total number of
items of a factory. The percentages of defective output
of these machines are 3%, 4% and 5%. If an item is
selected at random, find the probability that the item
is defective.

[Tree diagram]
A (probability 0.50): D with probability 0.03, N with probability 0.97
B (probability 0.30): D with probability 0.04, N with probability 0.96
C (probability 0.20): D with probability 0.05, N with probability 0.95
Solution:
Let D be the event that the item is defective.
∵ P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ... + P(An)P(B|An)
∴ P(D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C)
= (0.50)(0.03) + (0.30)(0.04) + (0.20)(0.05)
= 0.037 (or 3.7%)

Example: Consider the factory in the preceding
example. Suppose an item is selected at random and is
found to be defective. Find the probability that the
item was produced by machine A; that is, find P(A|D).

Solution:
By Bayes' theorem,
P(Ai|B) = P(Ai)P(B|Ai) / [P(A1)P(B|A1) + ... + P(An)P(B|An)]

P(A|D) = P(A)P(D|A) / [P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C)]

P(A|D) = (0.50)(0.03) / [(0.50)(0.03) + (0.30)(0.04) + (0.20)(0.05)]

= 15/37 (or ≈ 0.4054)
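Both the total probability P(D) and the posterior P(A|D) can be reproduced with a few lines of Python; a minimal sketch (the dictionary layout is illustrative, the values are from the slides):

# Law of total probability + Bayes' theorem for the three-machine example.
prior = {"A": 0.50, "B": 0.30, "C": 0.20}    # P(machine)
p_def = {"A": 0.03, "B": 0.04, "C": 0.05}    # P(D | machine)

# Total probability of drawing a defective item.
p_d = sum(prior[m] * p_def[m] for m in prior)
print(f"P(D) = {p_d:.3f}")                   # 0.037

# Posterior probability of each machine given a defective item.
for m in prior:
    print(f"P({m}|D) = {prior[m] * p_def[m] / p_d:.4f}")   # A: 0.4054, B: 0.3243, C: 0.2703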

What is Bayes' Law?
Formula: P(A|B) = P(B|A)P(A) / P(B)
Bayes' Law is a formula for determining conditional
probabilities.
Components:
P(A|B): Posterior probability
P(B|A): Likelihood
P(A): Prior probability
P(B): Marginal likelihood
Joint Probability Distribution
Consider the following random variables used in a
machine learning context:
- X: Weather feature (X = 0: Rainy, X = 1: Sunny)
- Y: Target label (Y = 0: No Tennis, Y = 1: Plays Tennis)
You are given the following joint probability table:

                  Y = No Tennis (0)    Y = Plays Tennis (1)
X = Rainy (0)            0.3                   0.2
X = Sunny (1)            0.1                   0.4

Joint Probability Distribution
Questions:
(a) What is the probability that it is "Sunny" and "Plays Tennis"?
(b) What is the marginal probability of "Playing Tennis"?
(c) Given that it is "Rainy," what is the probability of "No Tennis"?
(d) Are the events "Weather" and "Playing Tennis" independent? Justify your answer.

X = 1: Sunny, Y = 1: Plays Tennis

1(a) Solution: The probability that it is "Sunny" and "Plays Tennis" is:
P(X = 1, Y = 1) = 0.4

Marginal Probability
Y = 1: Plays Tennis
1(b) Solution: The marginal probability of "Playing Tennis" is calculated as:
Marginal probability of Y:
P(Y = 0) = P(X = 0, Y = 0) + P(X = 1, Y = 0) = 0.3 + 0.1 = 0.4
P(Y = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 1) = 0.2 + 0.4 = 0.6
Marginal probability of "Plays Tennis":
y        0       1       Total
h(y)    0.40    0.60       1
Y = 0: No Tennis, X = 0: Rainy

1(c) Solution: Conditional probability of "No Tennis" given "Rainy"
The conditional probability is:
P(A|B) = P(A ∩ B) / P(B)
P(Y = 0 | X = 0) = P(X = 0, Y = 0) / P(X = 0)
where P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) = 0.3 + 0.2 = 0.5
P(Y = 0 | X = 0) = 0.3 / 0.5 = 0.6

1(d) Solution: Independence of "Weather" and "Playing Tennis"
Events X (Weather) and Y (Playing Tennis) are independent if:
P(X, Y) = P(X) × P(Y)
For P(X = 0, Y = 0):
P(X = 0, Y = 0) = 0.3
P(X = 0) × P(Y = 0) = 0.5 × 0.4 = 0.2
⇒ P(X, Y) ≠ P(X) × P(Y)

Since 0.3 ≠ 0.2, the events are not independent.
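All four parts can be checked mechanically from the joint table; a minimal NumPy sketch (the array layout is assumed to match the table above, rows = X, columns = Y):

import numpy as np

# Joint probability table: rows X (0 = Rainy, 1 = Sunny),
# columns Y (0 = No Tennis, 1 = Plays Tennis).
joint = np.array([[0.3, 0.2],
                  [0.1, 0.4]])

p_x = joint.sum(axis=1)    # marginals of X: [0.5, 0.5]
p_y = joint.sum(axis=0)    # marginals of Y: [0.4, 0.6]

print("P(X=1, Y=1) =", joint[1, 1])                             # (a) 0.4
print("P(Y=1)      =", p_y[1])                                  # (b) 0.6
print("P(Y=0|X=0)  =", joint[0, 0] / p_x[0])                    # (c) 0.6
print("Independent?", np.allclose(joint, np.outer(p_x, p_y)))   # (d) False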


Classifying Emails: Spam Detection
Given the presence of the words "offer" and "win":
P(spam) = 0.4, P(not spam) = 0.6
P(offer | spam) = 0.7, P(offer | not spam) = 0.2
P(win | spam) = 0.8, P(win | not spam) = 0.1

1. What is the probability of spam given "offer" is present?
2. What is the probability of spam given both "offer" and "win" are present?
Probability of spam given "offer" is present

P(A|B) = P(B|A)P(A) / P(B)

Using Bayes' theorem:
P(spam | offer) = P(offer | spam) × P(spam) / P(offer)

P(offer) = P(offer | spam) × P(spam) + P(offer | not spam) × P(not spam)
P(offer) = 0.7 × 0.4 + 0.2 × 0.6 = 0.28 + 0.12 = 0.40

P(spam | offer) = (0.7 × 0.4) / 0.40 = 0.70
By Bayes' theorem,
P(Ai|B) = P(Ai)P(B|Ai) / [P(A1)P(B|A1) + ... + P(An)P(B|An)]

Using Bayes' theorem:
P(spam | offer and win)
= P(offer and win | spam) × P(spam) / P(offer and win)    ----------(1)

Computation of P(offer and win | spam)

Since "offer" and "win" are assumed conditionally independent given spam, we multiply their individual probabilities:
P(offer and win | spam) = P(offer | spam) × P(win | spam)
= 0.7 × 0.8 = 0.56

Similarly, since they are conditionally independent given not spam:
P(offer and win | not spam) = P(offer | not spam) × P(win | not spam)
= 0.2 × 0.1 = 0.02
Computation of P(offer and win)

P(offer and win) = P(offer and win | spam) × P(spam)
                 + P(offer and win | not spam) × P(not spam)
= 0.56 × 0.40 + 0.02 × 0.60
= 0.224 + 0.012 = 0.236

Putting these values into equation (1), we get
P(spam | offer and win) = (0.56 × 0.4) / 0.236 ≈ 0.949
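The whole spam calculation is a small naive Bayes classifier; a minimal Python sketch (the dictionary layout is illustrative; the probabilities and the conditional-independence assumption are the ones stated above):

# Naive Bayes for the spam example: word likelihoods are assumed
# conditionally independent given the class.
priors = {"spam": 0.4, "not spam": 0.6}
likelihoods = {
    "spam":     {"offer": 0.7, "win": 0.8},
    "not spam": {"offer": 0.2, "win": 0.1},
}

def posterior(cls, words):
    """P(cls | words) via Bayes' theorem."""
    def joint(c):
        p = priors[c]
        for w in words:
            p *= likelihoods[c][w]
        return p
    return joint(cls) / sum(joint(c) for c in priors)   # divide by P(words)

print(posterior("spam", ["offer"]))          # 0.70
print(posterior("spam", ["offer", "win"]))   # ≈ 0.949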

Interpretation of the Result:
P(spam | offer and win) = (0.56 × 0.4) / 0.236 ≈ 0.949
It means that if an email contains both the words
"offer" and "win," there is approximately a 94.9%
chance that the email is spam.

This high probability indicates that these words are strong indicators of spam, as they are commonly used in spam emails to attract attention.
