
1) Hoeffding inequality: Let us draw a sample of size 500 (N = 500) and observe v to be v = 0.42; then if we claim that µ ∈ [0.35, 0.49], we will be correct 97.4% of the time; is this true? Show all your work.
ANS:
To determine if the claim that µ ∈ [0.35, 0.49] is true with a 97.4%
confidence level using the Hoeffding inequality, we can follow these
steps:
1. Hoeffding Inequality:
The Hoeffding Inequality provides an upper bound on the probability
that the sample mean (v) deviates significantly from the true mean (µ)
of a random variable within a specified range (ε). The inequality is
given as follows:
P(|v − µ| ≥ ε) ≤ 2e^(−2Nε^2)
Where:
• P(|v − µ| ≥ ε) is the probability of the deviation being greater than ε.
• N is the sample size.
• ε is the margin of error.
• e is the base of the natural logarithm, approximately equal to 2.71828.
2. Given Information:
In this case, we have the following information:
• Sample size: N = 500
• Observed sample mean: v = 0.42
• Claimed range for µ: [0.35, 0.49]
We want to calculate the probability that the true mean µ falls outside the claimed range.

3. Calculate ε:
ε is the half-width of the claimed range, so ε = (0.49 - 0.35) / 2 = 0.07.

4. Apply the Hoeffding Inequality:
Plug the values into the inequality:
P(|0.42 − µ| ≥ 0.07) ≤ 2e^(−2·500·(0.07)^2)
5. Calculate the Exponent:
2·500·(0.07)^2 = 1000·0.0049 = 4.9
6. Calculate the Probability:
P(|0.42 − µ| ≥ 0.07) ≤ 2e^(−4.9) ≈ 2·0.00745 ≈ 0.0149
The bound is about 1.49%, indicating that the probability of µ falling outside the claimed range [0.35, 0.49] is at most roughly 0.015.

7. Conclusion:
Based on the Hoeffding Inequality, we are correct with probability at least 1 − 0.0149 ≈ 0.985, i.e., about 98.5%, which is above the stated 97.4%. So yes, the claim that µ ∈ [0.35, 0.49] holds at the stated confidence level.
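As a quick numerical sanity check, here is a minimal Python sketch of this bound (the helper name hoeffding_bound is ours, for illustration only):

    import math

    def hoeffding_bound(n: int, eps: float) -> float:
        # Upper bound on P(|v - mu| >= eps) for a sample of size n (Hoeffding).
        return 2.0 * math.exp(-2.0 * n * eps ** 2)

    bound = hoeffding_bound(500, 0.07)            # 2 * e^(-4.9)
    print(f"P(|v - mu| >= 0.07) <= {bound:.4f}")  # ~0.0149
    print(f"confidence >= {1 - bound:.4f}")       # ~0.9851 > 0.974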

2) PLA: Given wt = [2, 3, 2, −2, −5] and xi = [−1, 2, −1, −3, 2] and yi = +1, find wt+1; show step by step.

ANS :

The Perceptron Learning Algorithm (PLA) is an iterative algorithm used for binary classification. Given the weight vector wt, the input vector xi, and the corresponding target label yi, the PLA updates the weight vector to wt+1 based on whether the current weight vector correctly classifies the input xi. Here are the steps to update wt+1, one at a time:

Given:
• wt = [2, 3, 2, −2, −5] (the current weight vector)
• xi = [−1, 2, −1, −3, 2] (the input vector)
• yi = +1 (the target label)
1. Calculate the dot product of wt and xi:
wt · xi = 2·(−1) + 3·2 + 2·(−1) + (−2)·(−3) + (−5)·2
wt · xi = −2 + 6 − 2 + 6 − 10 = −2
2. Check whether the current prediction is correct:
If wt · xi is positive and yi is +1, or if wt · xi is negative and yi is −1, the current prediction is correct and no update is needed.
Here, wt · xi = −2 while yi = +1, so the prediction is incorrect: the sign of wt · xi is the opposite of yi.

3. Update wt+1:
Because the prediction is incorrect, wt+1 must be adjusted toward correctly classifying xi. The PLA update rule is:
wt+1 = wt + yi · xi

Substituting the values:
wt+1 = [2, 3, 2, −2, −5] + (+1)·[−1, 2, −1, −3, 2]
wt+1 = [2, 3, 2, −2, −5] + [−1, 2, −1, −3, 2]
wt+1 = [2−1, 3+2, 2−1, −2−3, −5+2]
wt+1 = [1, 5, 1, −5, −3]
So, wt+1 = [1, 5, 1, −5, −3] is the updated weight vector after processing xi with the target label yi = +1.
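For reference, here is a minimal Python sketch of this single update step (no bias term, matching the question's setup):

    import numpy as np

    w = np.array([2, 3, 2, -2, -5], dtype=float)
    x = np.array([-1, 2, -1, -3, 2], dtype=float)
    y = 1

    # w . x = -2, so sign(w . x) = -1 != y: the point is misclassified.
    if np.sign(w @ x) != y:
        w = w + y * x   # PLA update rule: w_{t+1} = w_t + y_i * x_i

    print(w)   # [ 1.  5.  1. -5. -3.]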

3) We have three coins, each tossed 4 times; what is the probability that one of the 3 coins gets all heads?

ANS:
To find the probability that one of the three coins gets all heads when each
of them is tossed 4 times, we can use the binomial probability formula. In
this case, we want to calculate the probability of success (getting all heads)
for one specific coin in 4 tosses. Then, we multiply this probability by 3
because there are three coins, and any one of them could get all heads.
The binomial probability formula is:
P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)
where:
• P(X = k) is the probability of getting exactly k successes.
• n is the number of trials (in this case, the number of coin tosses for one specific coin, which is 4).
• k is the number of successful outcomes (in this case, getting all heads, which is 4 heads).
• p is the probability of success on a single trial (in this case, the probability of getting heads on one toss of a fair coin, which is 0.5).
• C(n, k) represents the binomial coefficient, which is the number of ways to choose k successes out of n trials.
Now, let's calculate the probability of getting all heads for one specific coin in 4 tosses:
P(one coin gets all heads) = C(4, 4) · (0.5)^4 · (1 − 0.5)^(4−4)
P(one coin gets all heads) = 1 · (0.5)^4 · (0.5)^0
P(one coin gets all heads) = (0.5)^4 = 1/16

Now, since there are three coins, and we want the probability that any one of them gets all heads, we multiply this probability by 3:
P(at least one coin gets all heads) ≈ 3 · (0.5)^4 = 3 · (1/16) = 3/16
So, the probability that one of the three coins gets all heads when each of them is tossed 4 times is 3/16, or approximately 0.1875 (18.75%). Strictly speaking, multiplying by 3 is a union bound: it slightly overcounts the rare outcomes where two or three coins all come up heads. The exact probability, via the complement, is 1 − (15/16)^3 ≈ 0.1760.
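A small Monte Carlo sketch in Python to sanity-check this (the simulation setup is ours, for illustration):

    import random

    trials = 200_000
    hits = 0
    for _ in range(trials):
        # Three coins, each tossed 4 times; success if any coin shows all heads.
        if any(all(random.random() < 0.5 for _ in range(4)) for _ in range(3)):
            hits += 1

    print(hits / trials)        # ~0.176 (close to the exact value)
    print(1 - (15 / 16) ** 3)   # 0.17602... (exact complement formula)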
4) In the two-dimensional plane of (x1, x2) shown, illustrate the function x2 = 2x1 + 3, assuming it is a separating hyperplane and the two dimensions are x1 and x2. (b) Write down two points on this line and show them in the graph. (c) Write and show on the graph two points above the line and one point below the line. (d) Write a data set for 2-class classification where this separating hyperplane is the solution (the data set contains 4 data points: two data points in class +1 and two in class −1; notice here that d = 2); explain briefly.

ANS:
(a) To illustrate the function x2 = 2x1 + 3 as a separating hyperplane in a two-dimensional plane (x1, x2), you can consider it as a linear equation in slope-intercept form, y = mx + b, where y = x2, m = 2, and b = 3. This equation represents a line in the x1x2-plane.
(b) Let's write down two points on this line and show them on the graph:
• Point 1: When x1 = 0, we find x2 by substituting into the equation: x2 = 2(0) + 3 = 3. So, one point on the line is (0, 3).
• Point 2: When x1 = 1: x2 = 2(1) + 3 = 5. So, another point on the line is (1, 5).
Now, let's show these points on the graph.
(c) To place two points above the line and one point below it, pick a value of x1, compute the line height 2x1 + 3, and then choose x2 strictly greater than that height (above) or strictly less (below). Note that a point satisfying x2 = 2x1 + 3 exactly lies on the line, not above or below it. Here are three points:
• Point above the line 1: let x1 = 2; the line height is 2(2) + 3 = 7, so (2, 8) lies above the line.
• Point above the line 2: let x1 = −1; the line height is 2(−1) + 3 = 1, so (−1, 3) lies above the line.
• Point below the line: let x1 = 3; the line height is 2(3) + 3 = 9, so (3, 5) lies below the line.

Now, we can add these points to the graph
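Since the original graph is not reproduced here, a short matplotlib sketch (plotting choices are ours) that draws the line and the points from parts (b) and (c):

    import numpy as np
    import matplotlib.pyplot as plt

    x1 = np.linspace(-3, 4, 100)
    plt.plot(x1, 2 * x1 + 3, label="x2 = 2*x1 + 3")           # the separating line
    plt.scatter([0, 1], [3, 5], marker="o", label="on line")  # part (b)
    plt.scatter([2, -1], [8, 3], marker="^", label="above")   # part (c)
    plt.scatter([3], [5], marker="v", label="below")          # part (c)
    plt.legend()
    plt.show()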


(d) For a 2-class classification problem where the separating hyperplane is x2 = 2x1 + 3, we can create a dataset with two data points above the line in class +1 and two below it in class −1. Here's a simple example (d = 2):
• (2, 8), class +1
• (−1, 3), class +1
• (3, 5), class −1
• (0, 0), class −1
The first two points satisfy x2 > 2x1 + 3 and belong to class +1, and the last two satisfy x2 < 2x1 + 3 and belong to class −1. The separating hyperplane x2 = 2x1 + 3 serves as the boundary between these two classes, correctly classifying them into +1 and −1.
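A minimal Python check of which side of the line each point falls on (labels follow the sign of x2 − (2·x1 + 3)):

    points = [(2, 8), (-1, 3), (3, 5), (0, 0)]
    labels = [+1, +1, -1, -1]

    for (x1, x2), y in zip(points, labels):
        side = 1 if x2 - (2 * x1 + 3) > 0 else -1   # which side of the line?
        print((x1, x2), "side:", side, "label:", y, "OK" if side == y else "WRONG")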

5) Hoeffding inequality: Let us draw a sample of size 700; then fill in the blank:
P[ |V − µ| ≤ 0.05 ] ≥ …………………………… (explain briefly in 2–3 lines)

ANS:

The Hoeffding inequality states that the probability of the absolute difference between the sample mean (V) and the true mean (µ) being within a certain range (in this case, 0.05) is greater than or equal to 1 − 2e^(−2Nε^2), where N is the sample size. In this specific scenario, with a sample size of 700 and ε = 0.05, the blank can be filled as follows:
P[|V − µ| ≤ 0.05] ≥ 1 − 2e^(−2·700·(0.05)^2) = 1 − 2e^(−3.5) ≈ 1 − 0.0604 ≈ 0.94
The Hoeffding inequality guarantees that the probability is at least this value, ensuring roughly 94% confidence that the sample mean is within 0.05 of the true mean.
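A one-line check of the numeric value in Python:

    import math
    print(1 - 2 * math.exp(-2 * 700 * 0.05 ** 2))   # 0.9396...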
6) Quick Questions:
a) Classification is one of the most common tasks in ML, name two
other common ML tasks: (Regression & Clustering)
b) Clustering is an unsupervised ML Task; T/F? (True)
c) In general, machine learning can be divided into two types:
supervised learning and unsupervised learning.
d) Artificial Intelligence is a branch of ML; T/F?(False)
e) A fair coin is tossed six times; what is the probability of not all
heads (at least one tail)? Explain briefly.
Ans: To find the probability of not getting all heads (at least one tail) when a fair coin is tossed six times, you can use the complement rule. The probability of heads on a single toss is 0.5, and since the tosses are independent, the probability of getting heads on all six tosses is (0.5)^6 = 1/64. The complement of "all heads" is "at least one tail", so:
P(at least one tail) = 1 − (0.5)^6
Calculating this gives:
P(at least one tail) = 1 − 0.015625 = 0.984375
So, the probability of not getting all heads (at least one tail) when a fair coin is tossed six times is 0.984375, or about 98.44%.
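As a quick simulation check (a hedged sketch; the trial count is arbitrary):

    import random

    trials = 100_000
    at_least_one_tail = sum(
        any(random.random() < 0.5 for _ in range(6))   # any tail among 6 tosses
        for _ in range(trials)
    )
    print(at_least_one_tail / trials)   # ~0.9844
    print(1 - 0.5 ** 6)                 # 0.984375 (exact)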

7) Given this line y = x^2 + 5 (also can be written as f(x) = y = x^2 + 5). Provide: one point on this line, one point above the line, and one point below the line (optional: draw this line).
Ans:
The equation y = x^2 + 5 actually describes a parabola rather than a straight line. Let's provide one point on the curve, one point above it, and one point below it:
Point on the curve: when x = 0, the corresponding y-coordinate on the curve is y = 0^2 + 5 = 5. So, the point (0, 5) lies on y = x^2 + 5.
Point above the curve: choose a value of x and a y greater than the value given by y = x^2 + 5. For example, at x = 2 the curve value is 2^2 + 5 = 9, so the point (2, 10) lies above y = x^2 + 5.
Point below the curve: choose a value of x and a y less than the value given by y = x^2 + 5. For example, at x = −2 the curve value is (−2)^2 + 5 = 9, so the point (−2, 1) lies below y = x^2 + 5.

[Graph: the parabola y = x^2 + 5 with the points (0, 5), (2, 10), and (−2, 1) marked; x-axis from −5 to 5, y-axis up to 10.]
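A small Python sketch to verify where each point sits relative to the curve (the helper name position is ours):

    def position(x, y):
        # Classify (x, y) relative to the curve y = x**2 + 5.
        curve = x ** 2 + 5
        if y == curve:
            return "on"
        return "above" if y > curve else "below"

    for pt in [(0, 5), (2, 10), (-2, 1)]:
        print(pt, position(*pt))   # on, above, below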

8) Write the PLA algorithm and explain each step in one line (write in
pseudocode sequence of steps).

ANS:
PLA Algorithm:
1. Initialize the weight vector w and bias term b to zeros.
2. Repeat until convergence:
   a. For each training example (xi, yi):
      i. Calculate the predicted output y_hat using the current w and b: y_hat = sign(w · xi + b).
      ii. If y_hat is not equal to the true label yi (misclassified):
         - Update the weight vector: w = w + η · yi · xi (η is the learning rate).
         - Update the bias term: b = b + η · yi.
   b. Check for convergence: if there are no misclassified examples, stop.

Explanation of each step:
1. Initialize the weight vector w and bias term b to zeros: we start with initial values for the parameters.
2. Repeat until convergence: iterate through the training data until there are no more misclassified examples.
   a. For each training example (xi, yi):
      i. Calculate the predicted output y_hat using the current w and b: compute the weighted sum of inputs and apply the sign function to make a prediction.
      ii. If y_hat is not equal to the true label yi (misclassified):
         - Update the weight vector w: adjust the weights based on the misclassification to reduce the error.
         - Update the bias term b: adjust the bias to correct the misclassification.
   b. Check for convergence: determine whether any misclassified examples remain; if not, stop, as the algorithm has converged.
The PLA algorithm iteratively updates the weights and bias to learn a decision boundary that separates the two classes in a binary classification problem. It continues this process until no more misclassifications occur, and it is guaranteed to converge to a solution when the data are linearly separable.
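A runnable Python sketch of this pseudocode, with a hypothetical toy dataset for illustration (the function name pla, the learning-rate default, and the data are ours, not part of the question):

    import numpy as np

    def pla(X, y, lr=1.0, max_epochs=1000):
        # Perceptron Learning Algorithm: returns learned weights w and bias b.
        w = np.zeros(X.shape[1])   # step 1: initialize weights to zeros
        b = 0.0                    # ... and the bias term
        for _ in range(max_epochs):                   # step 2: repeat until convergence
            misclassified = 0
            for xi, yi in zip(X, y):                  # step 2a: scan the training set
                y_hat = 1 if w @ xi + b > 0 else -1   # step 2a-i: sign prediction
                if y_hat != yi:                       # step 2a-ii: misclassified?
                    w += lr * yi * xi                 # update the weight vector
                    b += lr * yi                      # update the bias term
                    misclassified += 1
            if misclassified == 0:                    # step 2b: converged, stop
                break
        return w, b

    # Hypothetical linearly separable toy data (+1 above, -1 below a line).
    X = np.array([[2.0, 8.0], [-1.0, 3.0], [3.0, 5.0], [0.0, 0.0]])
    y = np.array([1, 1, -1, -1])
    w, b = pla(X, y)
    print(w, b)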
