Machine Learning Assignment
Total marks: 20
Weight: 20% of final grade
Due date: Monday, April 22, 2013, 23:59

Submission mode
Portable Document Format (PDF); ASCII file for Python code. Please package multiple files into a .zip or .tar archive. Put your name and student ID at the top of each document. E-mail your submission to christfried.webers@nicta.com.au

Formula Explanations
All formulas which you derive need to be explained unless you use very common mathematical facts. Picture yourself explaining your arguments to somebody who is just learning about your assignment. In other words, do not assume that the person marking your assignment knows all the background and can therefore follow formulas written down without any explanation. It is your task to convince the reader that you know what you are doing when you derive an argument.
Code quality
Python code should be well structured, use meaningful identifiers for variables and subroutines, and provide sufficient comments. Please refer to the examples given in the tutorials.
Code efficiency
An efficient implementation of an algorithm uses fast subroutines provided by the language or by additional libraries. For the purpose of implementing Machine Learning algorithms in this course, that means using the appropriate data structures provided by Python and by numpy/scipy (e.g. linear algebra routines and random number generators).
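For example (a minimal sketch with illustrative names), computing all pairwise squared distances with numpy broadcasting avoids slow nested Python loops:

    import numpy as np

    def pairwise_sq_dists(A, B):
        # Squared Euclidean distances between all rows of A and all rows of B,
        # using |a - b|^2 = |a|^2 - 2 a.b + |b|^2 and numpy broadcasting.
        return (np.sum(A**2, axis=1)[:, None]
                - 2.0 * (A @ B.T)
                + np.sum(B**2, axis=1)[None, :])

    rng = np.random.default_rng(0)        # numpy's random number generator
    X = rng.normal(size=(150, 4))
    D = pairwise_sq_dists(X, X)           # one vectorised call, no Python loops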
Late Penalty
20% per day overdue (a day starts at midnight!)
Cooperation
All assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (please see the ANU policies on Academic Honesty and Plagiarism, http://academichonesty.anu.edu.au). Hence, for example, code for programming assignments must not be developed in groups, nor should code be shared. You are encouraged to broadly discuss ideas, approaches and techniques with a few other students, but not at a level of detail where specific solutions or implementation issues are described by anyone. If you choose to consult with other students, you must include the names of your discussion partners for each solution. If you have any questions on this, please ask the lecturer before you act.
Solutions
To be presented in the tutorials.

1 Probabilities

1.1
Prove that the following holds for the variance of the sum of two random variables X and Y:

$$\operatorname{var}[X + Y] = \operatorname{var}[X] + \operatorname{var}[Y] + 2 \operatorname{cov}[X, Y],$$

where $\operatorname{cov}[X, Y]$ is the covariance between X and Y.
For each step in your proof, provide a verbal explanation of why the transformation holds.
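Before writing the proof, it can be reassuring to check the identity numerically; a minimal sketch with correlated samples (all names illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=100_000)
    Y = 0.5 * X + rng.normal(size=100_000)   # correlated with X, so cov[X, Y] != 0

    lhs = np.var(X + Y)
    rhs = np.var(X) + np.var(Y) + 2.0 * np.cov(X, Y, ddof=0)[0, 1]
    print(lhs, rhs)   # the two values agree up to floating-point error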
1.2
My neighbour has two children. Let us assume that the gender of a child is a binary variable (in reality it is not!) following a Bernoulli distribution with parameter 1/2, and that the genders of the children are iid.
1. How many girls/boys will my neighbour most likely have?
2. Suppose I ask him whether he has any girls, and he says yes. What is the probability
that one child is a boy?
3. Suppose instead that I happen to see one of his children in the garden, and it is a
girl. What is the probability that the other child is a boy?
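Your analytical answers to 2. and 3. can be cross-checked by simulation; a rough Monte Carlo sketch (0 = boy, 1 = girl; identifiers illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000
    kids = rng.integers(0, 2, size=(n, 2))   # each row: two children, 0 = boy, 1 = girl

    # Question 2: condition on the family having at least one girl.
    has_girl = kids.max(axis=1) == 1
    p_q2 = (kids[has_girl].min(axis=1) == 0).mean()   # P(a boy | at least one girl)

    # Question 3: a uniformly chosen child is observed and happens to be a girl.
    seen = rng.integers(0, 2, size=n)                 # index of the child we see
    seen_is_girl = kids[np.arange(n), seen] == 1
    other = kids[np.arange(n), 1 - seen]
    p_q3 = (other[seen_is_girl] == 0).mean()          # P(other is a boy | saw a girl)

    print(p_q2, p_q3)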
1.3
1.4

Assume that the lifetime of a machine is modelled by an exponential distribution with density

$$p(x \mid \lambda) = \lambda e^{-\lambda x}, \qquad x \geq 0, \quad \lambda > 0.$$
1. Calculate the maximum likelihood estimator (MLE) $\hat{\lambda}$ for a number of observed iid lifetimes $X_1, \ldots, X_N$.
2. Derive the relation between the mean of $X_1, \ldots, X_N$ and the maximum likelihood estimator $\hat{\lambda}$.
3. Suppose we observe $X_1 = 5$, $X_2 = 4$, $X_3 = 3$ and $X_4 = 4$ as the lifetimes (in years) of four different machines. What is the MLE given this data?
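You can verify your closed-form answer by maximising the log-likelihood numerically rather than analytically; a sketch assuming scipy is available (identifiers illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    data = np.array([5.0, 4.0, 3.0, 4.0])    # the observed lifetimes in years

    def neg_log_likelihood(lam):
        # log p(x | lam) = log(lam) - lam * x, summed over the iid observations
        return -(len(data) * np.log(lam) - lam * data.sum())

    res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method='bounded')
    print(res.x)   # should match the analytical MLE you derived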
2 Decision Theory

2.1 (2/20) Lower Bound for the Correct Classification
Consider two nonnegative numbers a and b. Show that, if $a \leq b$, then $a \leq (ab)^{1/2}$ holds.
Now, consider a two-class classification problem where the decision regions were chosen to minimise the probability of misclassification. Use the above inequality to prove that

$$p(\text{mistake}) \leq \int \sqrt{p(x, C_1)\, p(x, C_2)}\, dx.$$
Please explain each step in your derivation. If possible, provide the intuition behind a
step.
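For intuition, the bound can be checked numerically on a toy 1-D problem: with the optimal decision regions, p(mistake) is the integral of the pointwise minimum of the two joint densities, while the right-hand side integrates their pointwise geometric mean. A sketch with illustrative parameters, assuming scipy:

    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-10.0, 10.0, 20001)
    # Toy joint densities p(x, C_k) = p(x | C_k) p(C_k) for two classes.
    p1 = 0.5 * norm.pdf(x, loc=-1.0, scale=1.0)
    p2 = 0.5 * norm.pdf(x, loc=+1.0, scale=1.0)

    p_mistake = np.trapz(np.minimum(p1, p2), x)   # optimal-decision misclassification
    bound = np.trapz(np.sqrt(p1 * p2), x)         # right-hand side of the inequality
    print(p_mistake, bound)                       # observe p_mistake <= bound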
3 Dimensionality Reduction

3.1 (5/20) Projection with Fisher's Discriminant
Fisher's Linear Discriminant finds a projection of the input data in which two goals are combined: maximising the distance between the centres of the points belonging to each class and minimising the variance of the data in each class.
We will investigate Fisher's Linear Discriminant using the Iris Flower data set in order to project the data from the original 4 dimensions into a lower dimension $D' < 4$.
1. Given the set of input data X and class labels t of K different classes, calculate the within-class and between-class scatter matrices $S_W$ and $S_B$ (see lecture slides). (Note that scatter matrices are not estimates of covariance matrices because they differ by a scaling factor related to the number of data points.) Find the matrix W with columns equal to the $D'$ normalised eigenvectors of $S_W^{-1} S_B$ which are associated with the $D'$ largest eigenvalues.
2. For $D' = 2$:
(a) Report the two eigenvalues and eigenvectors found.
(b) Provide a plot of the projected data using different colours for each class.
(c) Discuss the ratio of the two eigenvalues found with respect to the task of classifying the data in the projected 2-dimensional space.
3. Calculate the criterion

$$J = \operatorname{Tr}\{s_W^{-1} s_B\}$$

(see lecture slides) for the projected data $y_n = W^T x_n$, where

$$s_W = \sum_{k=1}^{K} \sum_{n \in C_k} (y_n - \mu_k)(y_n - \mu_k)^T, \qquad s_B = \sum_{k=1}^{K} N_k\, (\mu_k - \mu)(\mu_k - \mu)^T,$$

$$\mu = \frac{1}{N} \sum_{n=1}^{N} y_n, \qquad \mu_k = \frac{1}{N_k} \sum_{n \in C_k} y_n,$$

and where $C_k$ is the set of indices of the data points belonging to class k, and $N_k$ is the number of data points belonging to class k.
4. Project the Iris data into $D' = 2$ dimensions using the W found in 2. and report the criterion J for this projection. Using the original data of the Iris Flower data set, report the criterion for all 2-dimensional orthogonal projections onto a plane spanned by a pair of axes out of the 4 axes of the data set. Compare all criteria J found and discuss the results.
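A minimal numpy sketch of the eigen-decomposition step, assuming X is an N x 4 data array and t an N-vector of labels in {0, 1, 2} (identifiers illustrative, not a complete solution):

    import numpy as np

    def fisher_projection(X, t, n_dims=2):
        # Scatter matrices in the original input space (see lecture slides).
        mu = X.mean(axis=0)
        SW = np.zeros((X.shape[1], X.shape[1]))
        SB = np.zeros_like(SW)
        for k in np.unique(t):
            Xk = X[t == k]
            mu_k = Xk.mean(axis=0)
            SW += (Xk - mu_k).T @ (Xk - mu_k)
            SB += len(Xk) * np.outer(mu_k - mu, mu_k - mu)
        # Eigen-decomposition of SW^{-1} SB (not symmetric, hence np.linalg.eig).
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(SW, SB))
        order = np.argsort(eigvals.real)[::-1][:n_dims]
        W = eigvecs[:, order].real
        W = W / np.linalg.norm(W, axis=0)   # normalise the eigenvector columns
        return eigvals.real[order], W

    # Projection into n_dims dimensions: Y = X @ W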
In the next problems, we will use the Iris flower data set available via the course web site
at
http://sml.nicta.com.au/isml13/assignments/bezdekIris.data.txt
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis. (The file bezdekIris.data.txt corrects two erroneous entries in the original data set.)
The file consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris
virginica and Iris versicolor).
Sepal Length   Sepal Width   Petal Length   Petal Width   Species
5.1            3.5           1.4            0.2           Iris-setosa
...            ...           ...            ...           ...
7.0            3.2           4.7            1.4           Iris-versicolor
...            ...           ...            ...           ...
6.3            3.3           6.0            2.5           Iris-virginica
...            ...           ...            ...           ...
The first four comma-separated entries in this data set are the input data $x \in \mathbb{R}^4$ as floating point numbers; the fifth entry is a class name from the ordered set {Iris-setosa, Iris-versicolor, Iris-virginica} which you should map to {0, 1, 2}.
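One way to load the file into numpy arrays, assuming the comma-separated format described above (file name as given on the course web site; other identifiers illustrative):

    import numpy as np

    label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

    X_rows, t_rows = [], []
    with open('bezdekIris.data.txt') as f:
        for line in f:
            parts = line.strip().split(',')
            if len(parts) != 5:              # skip blank or malformed lines
                continue
            X_rows.append([float(v) for v in parts[:4]])
            t_rows.append(label_map[parts[4]])

    X = np.array(X_rows)    # shape (150, 4): the four measurements
    t = np.array(t_rows)    # shape (150,): class labels 0, 1, 2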
4.1
The k-Nearest Neighbours Algorithm (k-NN) classifies a new point in the input space by the most frequent class amongst its k nearest neighbours provided as training data. If more than one class is the most frequent, the class is decided randomly. The neighbourhood of a point in input space is given by a metric, usually the Euclidean metric.
1. Implement the k-NN algorithm using a Euclidean distance for the Iris Flower data set.
2. Apply 2-fold, 5-fold and 10-fold Cross Validation to select the best k for the k-NN
algorithm.
In case of several k having the same lowest error rate, we pick the largest one.
Explain why this is a good strategy.
3. Report the results for k = 2, 4, ..., 38, 40.
4. The optimal errors decrease with the fold number. Explain why.
5. The optimal k increases with the fold number. Explain why.
6. Provide the listing of your program together with the solution.
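A compact sketch of the classifier and the cross-validation loop (identifiers illustrative; for simplicity this version breaks ties by the smallest class label rather than randomly):

    import numpy as np

    def knn_predict(X_train, t_train, X_test, k):
        # Majority vote among the k nearest training points (Euclidean metric).
        d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, :k]
        votes = t_train[nearest]
        return np.array([np.bincount(row).argmax() for row in votes])

    def cv_error(X, t, k, n_folds, seed=0):
        # Mean misclassification rate over n_folds cross-validation folds.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), n_folds)
        errs = []
        for fold in folds:
            train = np.ones(len(X), dtype=bool)
            train[fold] = False
            pred = knn_predict(X[train], t[train], X[fold], k)
            errs.append((pred != t[fold]).mean())
        return float(np.mean(errs))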