10-701 Machine Learning: Assignment 1

Due on February 20, 2014 at 12 noon

Barnabas Poczos, Aarti Singh

Instructions: Failure to follow these directions may result in loss of points.

• Your solutions for this assignment must be in PDF format and must be submitted both
to Blackboard and to a webpage (to be specified later) for peer review.
• For the programming question, your code should be well documented so that a TA can
understand what is happening.
• DO NOT include any identification (your name, Andrew ID, or email) in either the content
or the filename of your submission.

MLE, MAP, Concentration (Pengtao)

1. MLE of Uniform Distributions [5 pts]
Given a set of i.i.d. samples $X_1, \dots, X_n \sim \mathrm{Uniform}(0, \theta)$, find the maximum likelihood estimator of $\theta$.

(a) Write down the likelihood function (3 pts)

(b) Find the maximum likelihood estimator (2 pts)
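
One way to sanity-check an analytic answer to this problem is to evaluate the likelihood numerically on a grid of candidate values of θ. The sketch below is illustrative only: the function name `uniform_likelihood`, the sample values, and the grid are assumptions and are not part of the assignment.

```python
def uniform_likelihood(samples, theta):
    """Likelihood of i.i.d. Uniform(0, theta) samples at a candidate theta."""
    if theta <= 0 or any(x < 0 or x > theta for x in samples):
        return 0.0  # some sample falls outside the support [0, theta]
    return theta ** (-len(samples))  # each sample contributes density 1/theta


# Hypothetical samples, chosen only for illustration.
samples = [0.8, 2.1, 1.4, 3.0, 0.3]

# Candidate values theta = 0.1, 0.2, ..., 6.0.
grid = [k / 10 for k in range(1, 61)]
values = [uniform_likelihood(samples, t) for t in grid]
best_theta = grid[values.index(max(values))]
print("grid maximizer:", best_theta)
```

Comparing the grid maximizer with the estimator derived in part (b) for a few different sample sets is a quick consistency check, not a substitute for the derivation.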

2. Concentration [5 pts]
The instructors would like to know what fraction of the students like the Introduction to Machine Learning
course. Let this unknown (but hopefully very close to 1) quantity be denoted by µ. To estimate µ, the
instructors created an anonymous survey which contains this question:

"Do you like the Intro to ML course? Yes or No"

Each student can only answer this question once, and we assume that the distribution of the answers is i.i.d.

(a) What is the MLE of µ? (1 pt)

(b) Let the above estimator be denoted by µ̂. How many students should the instructors ask if they want
the estimated value µ̂ to be so close to the unknown µ that

P(|µ̂ − µ| > 0.1) < 0.05? (4 pts)
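
A bound obtained analytically can be compared against simulation. The sketch below estimates the deviation probability by Monte Carlo, taking the fraction of "Yes" answers as the estimator. The function name `deviation_probability`, the value µ = 0.5, the number of trials, and the seed are all assumptions made for illustration.

```python
import random


def deviation_probability(mu, n, eps=0.1, trials=2000, seed=0):
    """Monte Carlo estimate of P(|mu_hat - mu| > eps) for n Bernoulli(mu) answers."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        yes = sum(1 for _ in range(n) if rng.random() < mu)
        if abs(yes / n - mu) > eps:
            bad += 1
    return bad / trials


# mu = 0.5 is an assumed value (the largest-variance case); the true mu is unknown.
for n in (10, 50, 200):
    print(n, deviation_probability(0.5, n))
```

Because a concentration bound must hold for every µ, a simulation at one assumed µ can only show how conservative the bound is; it cannot replace it.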

3. MAP of Multinomial Distribution [10 pts]
You have just got a loaded 6-sided die from your statistician friend. Unfortunately, he does not remember
its exact probability distribution $p_1, p_2, \dots, p_6$. He remembers, however, that he generated the vector
$(p_1, p_2, \dots, p_6)$ from the following Dirichlet distribution:

$$P(p_1, p_2, \dots, p_6) = \frac{\Gamma\left(\sum_{i=1}^{6} u_i\right)}{\prod_{i=1}^{6} \Gamma(u_i)} \prod_{i=1}^{6} p_i^{u_i - 1} \, \delta\left(\sum_{i=1}^{6} p_i - 1\right),$$

where he chose $u_i = i$ for all $i = 1, \dots, 6$. Here $\Gamma$ denotes the gamma function, and $\delta$ is the Dirac delta. To
estimate the probabilities $p_1, p_2, \dots, p_6$, you roll the die 1000 times and observe that side $i$ occurred
$n_i$ times ($\sum_{i=1}^{6} n_i = 1000$).

(a) Prove that the Dirichlet distribution is a conjugate prior for the multinomial distribution.

(b) What is the posterior distribution of the side probabilities, $P(p_1, p_2, \dots, p_6 \mid n_1, n_2, \dots, n_6)$?
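
To make the setup concrete, the generative process described in this problem can be simulated. The sketch below draws one probability vector from the stated Dirichlet prior (by normalizing independent Gamma draws, a standard construction) and then rolls the resulting die 1000 times. The helper names `sample_dirichlet` and `roll_counts` and the seed are assumptions introduced for illustration.

```python
import random


def sample_dirichlet(u, rng):
    """Draw one probability vector from Dirichlet(u) by normalizing Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in u]
    total = sum(g)
    return [x / total for x in g]


def roll_counts(p, n_rolls, rng):
    """Roll a die with side probabilities p n_rolls times; return counts per side."""
    counts = [0] * len(p)
    for side in rng.choices(range(len(p)), weights=p, k=n_rolls):
        counts[side] += 1
    return counts


rng = random.Random(0)            # fixed seed, chosen arbitrarily
u = [1, 2, 3, 4, 5, 6]            # u_i = i, as in the problem statement
p = sample_dirichlet(u, rng)      # the die's (unknown) side probabilities
n = roll_counts(p, 1000, rng)     # observed counts n_1, ..., n_6
print(p)
print(n)
```

Simulated counts like these can be used to check numerically that a derived posterior concentrates near the probabilities that generated the data.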

Linear Regression (Dani)

1. Optimal MSE rule [10 pts]
Suppose we knew the joint distribution $P_{XY}$. The optimal rule $f^* : X \to Y$ which minimizes the MSE
(Mean Square Error) is given as:

$$f^* = \arg\min_{f} E[(f(X) - Y)^2]$$

Show that $f^*(X) = E[Y \mid X]$.
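
The claim can be illustrated empirically before proving it. The sketch below uses an assumed toy joint distribution in which the conditional mean is known by construction, and compares the empirical MSE of the conditional mean with two other rules. The distribution, the competing rules, and the sample size are hypothetical choices.

```python
import random

rng = random.Random(0)

# Assumed toy joint distribution: X uniform on {0, 1, 2}, Y = X^2 + N(0, 1) noise,
# so that E[Y | X] = X^2 by construction.
data = []
for _ in range(20000):
    x = rng.choice([0, 1, 2])
    data.append((x, x * x + rng.gauss(0.0, 1.0)))


def mse(f):
    """Empirical mean squared error of the rule f on the sampled pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)


rules = {
    "conditional mean x^2": lambda x: x * x,
    "shifted x^2 + 0.5": lambda x: x * x + 0.5,
    "linear 2x": lambda x: 2 * x,
}
for name, f in rules.items():
    print(f"{name}: {mse(f):.3f}")
```

An experiment like this only compares a handful of rules on one distribution; the problem asks for an argument that holds for every rule and every joint distribution.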

2. Ridge Regression [10 pts]

In class, we discussed $\ell_2$ penalized linear regression:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (Y_i - X_i \beta)^2 + \lambda \|\beta\|_2^2$$

where $X_i = [X_i^{(1)} \dots X_i^{(p)}]$.

a) Show that a closed form expression for the ridge estimator is $\hat{\beta} = (A^\top A + \lambda I)^{-1} A^\top Y$ where
$A = [X_1; \dots; X_n]$ and $Y = [Y_1; \dots; Y_n]$.

b) An advantage of ridge regression is that a unique solution always exists since $(A^\top A + \lambda I)$ is invertible.
To be invertible, a matrix needs to be full rank. Argue that $(A^\top A + \lambda I)$ is full rank by characterizing
its $p$ eigenvalues in terms of the singular values of $A$ and $\lambda$.
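
A small numerical experiment can make part (b) tangible. The sketch below, restricted to p = 2 so that it needs only the standard library, builds a design matrix with collinear columns, compares the eigenvalues of the Gram matrix with and without the λ term, and solves the resulting 2x2 linear system. The data, the value of λ, and all function names are assumptions for illustration; it is a numerical check, not a proof.

```python
import math

# Hypothetical data with n = 4 samples and p = 2 features (rows of A are the X_i).
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # columns are collinear on purpose
Y = [1.0, 2.0, 2.5, 4.0]
lam = 0.5


def gram_plus_ridge(A, lam):
    """Return the 2x2 matrix A^T A + lam * I as nested lists."""
    g = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
    g[0][0] += lam
    g[1][1] += lam
    return g


def eigenvalues_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - disc, tr / 2 + disc


def ridge_2d(A, Y, lam):
    """Solve (A^T A + lam I) beta = A^T Y for p = 2 via the explicit 2x2 inverse."""
    m = gram_plus_ridge(A, lam)
    b = [sum(row[i] * y for row, y in zip(A, Y)) for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]


print("eigenvalues without penalty:", eigenvalues_sym2(gram_plus_ridge(A, 0.0)))
print("eigenvalues with penalty:   ", eigenvalues_sym2(gram_plus_ridge(A, lam)))
print("ridge estimate:", ridge_2d(A, Y, lam))
```

With collinear columns the unpenalized Gram matrix has a zero eigenvalue, so ordinary least squares has no unique solution here, while the penalized system can still be solved.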

Logistic Regression (Prashant)

1. Overfitting and Regularized Logistic Regression [10 pts]
a) Plot the sigmoid function $1/(1 + e^{-wX})$ vs. $X \in \mathbb{R}$ for increasing weight $w \in \{1, 5, 100\}$. A
qualitative sketch is enough. Use these plots to argue why a solution with large weights can cause
logistic regression to overfit.

b) To prevent overfitting, we want the weights to be small. To achieve this, instead of maximum
conditional likelihood estimation, M(C)LE, for logistic regression:

$$\max_{w_0, \dots, w_d} \prod_{i=1}^{n} P(Y_i \mid X_i, w_0, \dots, w_d),$$

we can consider maximum conditional a posteriori, M(C)AP, estimation:

$$\max_{w_0, \dots, w_d} \prod_{i=1}^{n} P(Y_i \mid X_i, w_0, \dots, w_d) \, P(w_0, \dots, w_d)$$

where $P(w_0, \dots, w_d)$ is a prior on the weights.

Assuming a standard Gaussian prior $N(0, I)$ for the weight vector, derive the gradient ascent update
rules for the weights.
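
Both parts of this problem can be explored numerically. The sketch below first tabulates the sigmoid at a few inputs for each weight in part (a), then defines the log of the M(C)AP objective from part (b) together with a finite-difference gradient that a hand-derived update rule can be checked against. The 0/1 label coding, the toy data, and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Part (a): the curve sharpens toward a step function as w grows.
for w_scale in (1, 5, 100):
    print(w_scale, [round(sigmoid(w_scale * x), 3) for x in (-1.0, -0.1, 0.0, 0.1, 1.0)])


def log_posterior(w, X, Y):
    """Log conditional likelihood plus log N(0, I) prior, up to an additive constant.

    w = [w0, w1, ..., wd]; labels are assumed to be coded as 0/1.
    """
    total = 0.0
    for x, y in zip(X, Y):
        z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        p = sigmoid(z)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total - 0.5 * sum(wi * wi for wi in w)


def numerical_gradient(f, w, h=1e-6):
    """Central finite-difference gradient of f at w, for checking a derived gradient."""
    grad = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + h] + w[j + 1:]
        down = w[:j] + [w[j] - h] + w[j + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


# Hypothetical one-dimensional data set.
X = [[-2.0], [-1.0], [0.5], [1.5]]
Y = [0, 0, 1, 1]
w = [0.0, 0.0]
print(numerical_gradient(lambda v: log_posterior(v, X, Y), w))
```

If an analytic gradient disagrees with the finite-difference values at several random weight vectors, the derivation most likely has a sign or indexing error.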

2. Multi-class Logistic Regression [10 pts]

One way to extend logistic regression to multi-class (say K class labels) setting is to consider (K-1) sets of
weight vectors and define

$$P(Y = y_k \mid X) \propto \exp\left(w_{k0} + \sum_{i=1}^{d} w_{ki} X_i\right) \quad \text{for } k = 1, \dots, K - 1$$

a) What model does this imply for $P(Y = y_K \mid X)$?

b) What would be the classification rule in this case?

c) Draw a set of training data with three labels and the decision boundary resulting from a multi-class
logistic regression. (The boundary does not need to be quantitatively correct but should qualitatively
depict what a typical boundary from multi-class logistic regression would look like.)

Naive Bayes Classifier (Pulkit)

1. Naive Bayes Classification Implementation [25 pts]
In this question, you will write a Naive Bayes classifier and verify its performance on a news-group data-set.
As you learned in class, Naive Bayes is a simple classification algorithm that makes an assumption about
conditional independence of features, but it works quite well in practice. You will implement the Naive Bayes
algorithm (Multinomial Model) to classify a news corpus into 20 different categories.

Handout - http://www.cs.cmu.edu/~aarti/Class/10701_Spring14/assignments/homework1.tar

You have been provided with the following data files in the handout:

• train.data - Contains bag-of-words data for each training document. Each row of the file represents
the number of occurrences of a particular term in some document. The format of each row is (docId,
termId, Count).

• train.label - Contains a label for each document in the training data.

• test.data - Contains bag-of-words data for each testing document. The format of this file is the same
as that of the train.data file.
• test.label - Contains a label for each document in the testing data.
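
The row format described above lends itself to a sparse, per-document representation. The sketch below parses rows of (docId, termId, Count) into nested dictionaries without building a dense matrix. The delimiter handling, the assumption that row k of a label file labels document k, and the names `read_sparse_counts` and `read_labels` are all hypothetical; the files in the handout should be inspected before relying on any of them.

```python
from collections import defaultdict


def read_sparse_counts(lines):
    """Parse rows of (docId, termId, Count) into {docId: {termId: count}}.

    Assumption: fields are separated by commas or whitespace; the handout's
    actual delimiter should be checked before relying on this.
    """
    docs = defaultdict(dict)
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        doc_id, term_id, count = (int(v) for v in fields[:3])
        docs[doc_id][term_id] = count
    return docs


def read_labels(lines):
    """Parse one integer label per line; row k is assumed to label document k."""
    return [int(line) for line in lines if line.strip()]


# Hypothetical rows standing in for train.data and train.label.
example_data = ["1,3,2", "1,7,1", "2,3,4"]
example_labels = ["5", "11"]
docs = read_sparse_counts(example_data)
labels = read_labels(example_labels)
print(dict(docs), labels)
```

Both functions accept any iterable of strings, so an open file object can be passed directly and the file is streamed one row at a time.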
For this assignment, you need to write code to complete the following functions:
• logPrior(trainLabels) - Computes the log prior of the training data-set. (5 pts)
• logLikelihood(trainData, trainLabels) - Computes the log likelihood of the training data-set. (7 pts)
• naiveBayesClassify(trainData, trainLabels, testData) - Classifies the data using the Naive Bayes
algorithm. (13 pts)
Implementation Notes
1. We compute log probabilities to prevent numerical underflow when multiplying many
probabilities. You may refer to this article on how to perform addition and multiplication in log space.
2. You may encounter words during classification that you did not see during training, either for a
particular class or overall. Your code should deal with that. Hint: Laplace smoothing.
3. Be memory efficient: please do not create a document-term matrix in your code. That would require
upwards of 600MB of memory.
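
The first two notes can be sketched in a few lines. The example below shows the usual log-sum-exp trick for adding probabilities stored as logs, and an add-alpha (Laplace) smoothed term probability for a single class. The function names, the smoothing constant alpha = 1, and the toy counts are assumptions for illustration and are not the interface required by the handout.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) without underflow by factoring out the maximum."""
    m = max(log_values)
    if m == float("-inf"):
        return m  # every term is exp(-inf) = 0
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def smoothed_log_likelihood(term_counts, vocab_size, alpha=1.0):
    """Laplace-smoothed log P(term | class) for one class.

    term_counts maps termId -> total count of that term in the class.
    Returns a function so that unseen terms still get a finite log probability.
    """
    denom = sum(term_counts.values()) + alpha * vocab_size

    def log_prob(term_id):
        return math.log((term_counts.get(term_id, 0) + alpha) / denom)

    return log_prob


# Multiplying probabilities becomes adding logs; adding probabilities uses log_sum_exp.
print(log_sum_exp([-1000.0, -1000.0]))   # direct exp(-1000) would underflow to 0
log_p = smoothed_log_likelihood({3: 6, 7: 2}, vocab_size=10)
print(log_p(3), log_p(99))               # term 99 never appeared in this class
```

Storing only per-class term counts, as in `term_counts` here, also keeps memory proportional to the number of nonzero entries rather than to documents times vocabulary.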
Due to popular demand, we are allowing the solution to be coded in 3 languages: Octave, Julia, and Python.

Octave is an industry standard in numerical computing. It is open source, unlike MATLAB, but has similar
capabilities and syntax.

Julia is a popular new open-source language developed for numerical and scientific computing as well
as being effective for general programming purposes. This is the first time this language is being
supported in a CMU course.

Python is an extremely flexible language and is popular in industry and the data science community.
Powerful Python libraries will not be available to you.

For Octave and Julia, a blank function interface has been provided for you (in the handout). However, for
Python, you will need to perform the I/O for the data files and ensure the results are written to the correct
output files.

Challenge Question
This question is not graded, but it is highly recommended that you try it. In the above question, we are using
all the terms from the vocabulary to make a prediction. This would lead to a lot of noisy features. Although
it seems counter-intuitive, classifiers built from a smaller vocabulary perform better because they generalize
better over unseen data. Noisy features that are not well-represented often skew the perceived distribution
of words, leading to classification errors. Therefore, the classification can be improved by selecting a subset
of extremely effective words.
Write a program to select a subset of the words from the vocabulary provided to you and then use this subset
to run your Naive Bayes classification again. Verify changes in accuracy. TF-IDF and Information Theory
are good places to start looking.
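
As a starting point for the TF-IDF suggestion, the sketch below ranks terms by their summed TF-IDF score over a sparse corpus and keeps the top k. Summing per-document scores, using raw term frequency divided by document length, and the name `top_terms_by_tfidf` are assumptions of this sketch; other weightings and information-theoretic criteria are equally valid choices.

```python
import math
from collections import defaultdict


def top_terms_by_tfidf(docs, k):
    """Rank terms by summed TF-IDF over all documents and keep the top k.

    docs maps docId -> {termId: count}. Summing per-document TF-IDF is only one
    of several reasonable scoring choices and is an assumption of this sketch.
    """
    n_docs = len(docs)
    doc_freq = defaultdict(int)
    for counts in docs.values():
        for term in counts:
            doc_freq[term] += 1

    score = defaultdict(float)
    for counts in docs.values():
        length = sum(counts.values())
        for term, c in counts.items():
            tf = c / length
            idf = math.log(n_docs / doc_freq[term])
            score[term] += tf * idf
    # Sort by descending score, breaking ties by term id for determinism.
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return ranked[:k]


# Hypothetical corpus: term 1 occurs in every document, so its IDF is zero.
docs = {1: {1: 3, 2: 1}, 2: {1: 2, 3: 4}, 3: {1: 1, 2: 2}}
print(top_terms_by_tfidf(docs, 2))
```

The selected term ids can then be used to filter the training and test counts before rerunning the classifier, so that accuracy with and without selection can be compared.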

Support Vector Machines (Jit)
1. SVM Matching [15 points]
Figure 1 (at the end of this problem) plots SVM decision boundaries resulting from using different kernels
and/or different slack penalties. In Figure 1, there are two classes of training data, with labels yi ∈ {−1, 1},
represented by circles and squares respectively. The SOLID circles and squares represent the Support Vectors.
Determine which plot in Figure 1 was generated by each of the following optimization problems. Explain
your reasoning for each choice.

1.
$$\min \; \frac{1}{2} w \cdot w + C \sum_{i=1}^{n} \xi_i$$

s.t. $\forall i = 1, \dots, n$:
$$\xi_i \ge 0$$
$$(w \cdot x_i + b) y_i - (1 - \xi_i) \ge 0$$
and $C = 0.1$.

2.
$$\min \; \frac{1}{2} w \cdot w + C \sum_{i=1}^{n} \xi_i \tag{1}$$

s.t. $\forall i = 1, \dots, n$:
$$\xi_i \ge 0$$
$$(w \cdot x_i + b) y_i - (1 - \xi_i) \ge 0$$
and $C = 1$.

3.
$$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{2}$$

s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$;
$\alpha_i \ge 0, \; \forall i = 1, \dots, n$;
where $K(u, v) = u \cdot v + (u \cdot v)^2$.

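4.
$$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{3}$$

s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$;
$\alpha_i \ge 0, \; \forall i = 1, \dots, n$;
where $K(u, v) = \exp\left(-\frac{\|u - v\|^2}{2}\right)$.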

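5.
$$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{4}$$

s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$;
$\alpha_i \ge 0, \; \forall i = 1, \dots, n$;
where $K(u, v) = \exp\left(-\|u - v\|^2\right)$.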
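
When reasoning about which boundary each kernel can produce, it may help to evaluate the kernels on a few points. The sketch below implements the polynomial kernel from problem 3 and a Gaussian kernel with an adjustable width parameter, and builds the kernel matrix that appears in the dual objective. The parameter name `gamma`, the helper names, and the sample points are assumptions for illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v):
    """K(u, v) = u.v + (u.v)^2, the kernel in problem 3."""
    d = dot(u, v)
    return d + d * d


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma = 1 matches problem 5."""
    diff = [a - b for a, b in zip(u, v)]
    return math.exp(-gamma * dot(diff, diff))


def gram_matrix(points, kernel):
    """Kernel matrix with entries K(x_i, x_j), as used in the dual objective."""
    return [[kernel(a, b) for b in points] for a in points]


# Hypothetical 2-D training points.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
for row in gram_matrix(points, poly_kernel):
    print(row)
for row in gram_matrix(points, rbf_kernel):
    print([round(v, 4) for v in row])
```

A larger `gamma` makes the Gaussian kernel decay faster with distance, so each support vector influences a smaller neighborhood; this is one of the cues for telling the plots apart.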

Figure 1: Induced Decision Boundaries
