Mid Term Test
Name:
ID:
Date:
For each of the following questions, choose the best answer.
6. Which of the following statements does not describe machine learning?
a. The field of study that gives computers the ability to learn without being
explicitly programmed.
b. A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
c. A computer program is said to learn from some class of tasks T if it uses the
performance measure P to improve its experience E
d. Machine learning can be used to teach a computer to distinguish between
pictures of cats and dogs that it has not previously seen.
7. A statistic is a number calculated from the _____?
a. Parameter
b. Population data
c. Sample data
d. Mean population data
8. The naïve Bayes classifier is naïve because ___
a. It was invented by a naïve Bayes practitioner.
b. It assumes that the users are naïve.
c. It assumes that class labels are separable
d. It assumes the input attributes are independent
9. What is the purpose of having separate training and test data?
a. Training set is used to find the model parameters
b. Test set is used to find the model parameters
c. It is separated so that the machine uses less computing power
d. It is separated so that different models can be generated
10. Which of the following is not true?
a. Supervised learning requires a teacher
b. Unsupervised learning does not require a teacher
c. Supervised learning is a reinforcement learning problem
d. Clustering is a type of unsupervised learning
Section B (30 marks)
a. We collect a set of data on the top 500 firms in the US. For each firm we
record profit, number of employees, industry and the CEO salary. We are
interested in understanding which factors affect CEO salary. (2 marks)
Sepal length, Sepal width, Petal length, Petal width; Class (setosa, versicolor,
virginica)
b. Describe one real-life application in which regression might be useful. Describe
the response, as well as the predictors. Give a detailed example of a training
sample (see sample above). (4 marks)
Section C (50 marks)
To learn P(Y|X) with the Naïve Bayes classifier, use the training data to estimate P(X|Y) and
P(Y). Then use these estimates, together with Bayes rule, to determine P(Y|X=x_k) for any
new instance x_k.
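$$P(Y_c \mid X_1, \ldots, X_n) = \frac{P(Y_c) \prod_i P(X_i \mid Y_c)}{\sum_j P(Y_j) \prod_i P(X_i \mid Y_j)}$$
Given a new instance $X = \langle X_1, \ldots, X_n \rangle$, this equation shows how to calculate the
probability of each class $Y_c$ for the given X, provided that we can estimate $P(Y)$ and $P(X_i \mid Y)$.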
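The rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `naive_bayes_posterior`, the dictionary layout, and the toy probabilities are all assumptions made for this example, and the numbers are deliberately not taken from the exam data set.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, instance):
    """Return P(Y = y | X = instance) for every class y.

    priors:      {class: P(Y = class)}
    likelihoods: {class: {attribute: {value: P(X_attr = value | Y = class)}}}
    instance:    {attribute: value}
    """
    # Numerator of Bayes rule for each class: P(Y) * prod_i P(X_i | Y)
    scores = {
        y: priors[y] * prod(likelihoods[y][a][v] for a, v in instance.items())
        for y in priors
    }
    # Denominator: the sum of the numerators over all classes
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


# Hypothetical toy estimates (NOT the exam data): two classes, two attributes.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {"yes": 0.8, "no": 0.2},
             "length": {"short": 0.7, "long": 0.3}},
    "ham": {"has_link": {"yes": 0.25, "no": 0.75},
            "length": {"short": 0.4, "long": 0.6}},
}

posterior = naive_bayes_posterior(
    priors, likelihoods, {"has_link": "yes", "length": "short"}
)
# Predict the class with the largest posterior probability.
prediction = max(posterior, key=posterior.get)
```

The denominator is the same for every class, so the predicted class can also be found by comparing the numerators alone; dividing by the sum is only needed when the actual posterior probabilities are required.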
The social media ads dataset is used to predict whether users purchased a product (class Yes) after
clicking on the advertisements shown to them. The frequency counts are listed in the tables below.
Class   Gender
        Male    Female  Total
Yes     100     35      135
No      35      65      100
Total   135     100     235
Class   Age
        <22     23-30   >31     Total
Yes     20      30      85      135
No      50      25      25      100
Total   70      55      110     235
Class   Salary
        <15k    15001-60k   >61k    Total
Yes     20      25          90      135
No      25      50          25      100
Total   45      75          115     235
Based on the above data set, answer the following questions.
b. Estimate the class-conditional probabilities of each attribute. You may fill in the tables
below. (16 marks)
Class   Gender
        Male    Female  Total
Yes                     135
No                      100
Total                   235
Class   Age
        <22     23-30   >31     Total
Yes                             135
No                              100
Total                           235
Class   Salary
        <15k    15001-60k   >61k    Total
Yes                                 135
No                                  100
Total                               235
c. Show your calculations and predictions for the following input data. (30 marks)
Yes/No
1. Male Age=27 Salary=18k
2. Female Age=35 Salary=62k
3. Male Age=32 Salary=60k