
Question 1 (Linear Regression)


These ten tools are:

1. Visualization
2. Journey Mapping
3. Value Chain Analysis
4. Mind Mapping
5. Brainstorming
6. Concept Development
7. Assumption Testing
8. Rapid Prototyping
9. Customer Co-creation
10. Learning Launch

Question 1 (Linear Regression): In this question, you will implement
linear basis function regression with polynomial and Gaussian basis functions.
Start by downloading the code and dataset from the website:
http://vda.univie.ac.at/Teaching/ML/15s/assignments/asgn02-data.zip.
The dataset is the Housing dataset from the UCI repository. The task is to predict the
median house value from features describing a town.
Functions are provided for loading the data (see footnote 1) and for normalizing the
features and targets to have zero mean and unit variance:
[t, X] = loadData();
X_n = normalizeData(X, X);
t = normalizeData(t, t);

The provided functions that you can use are:

• [t,X] = loadData(): loads data from the 'housing.data' data file. t is the target
output and X is the input features.
• X_n = normalizeData(X, ref): normalizes the data in X using the mean
and variance of the data in ref. If ref=X, then X_n is a linear transformation of
X with zero mean and unit variance.

Footnote 1: Note that loadData reorders the datapoints using a fixed permutation. Use this fixed
permutation for the questions in this assignment. If you are interested in what happens in "reality",
try using a random permutation afterwards. Results will not always be as clean as those you get with
the fixed permutation.

For the following, use these normalized features X_n and targets t to learn the model.
Have a look at the source code of all provided files. You may be able to use their
structure as a hint.
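The role of the ref argument can be illustrated with a minimal sketch. The anonymous function normalize_sketch below is a hypothetical stand-in written for this example, not the provided normalizeData, whose implementation may differ (for instance in how the variance is estimated). The toy matrices are made up.

```matlab
% Hypothetical stand-in for the provided normalizeData (assumption: it
% subtracts the column means of ref and divides by the column standard
% deviations of ref).
normalize_sketch = @(X, ref) bsxfun(@rdivide, ...
    bsxfun(@minus, X, mean(ref, 1)), std(ref, 0, 1));

X_a = [1 10; 2 20; 3 60];   % toy data used as the reference
X_b = [4 30];               % toy data normalized with the reference statistics

Xn_a = normalize_sketch(X_a, X_a);   % zero mean, unit variance per column
Xn_b = normalize_sketch(X_b, X_a);   % same linear map, applied to other rows
```

Passing the same reference for every subset keeps all rows on one common scale, which is what the ref argument is for.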

Polynomial Basis Function

Implement linear basis function regression with polynomial basis functions. Use
only monomials of a single variable ($x_1, x_1^2, x_1^3, \ldots, x_2, x_2^2, \ldots$), and no cross-terms
($x_1 x_2$).
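One possible way to build such a design matrix is sketched below on a toy input. The column ordering (bias first, then all first powers, then all second powers, and so on) is a choice made for this example and is not prescribed by the assignment.

```matlab
X = [1 2; 3 4; 5 6];     % toy data: N = 3 points, D = 2 features
degree = 3;

Phi = ones(size(X, 1), 1);      % bias column
for d = 1:degree
    % Element-wise power appends x_1^d, x_2^d, ... and never multiplies
    % different features together, so no cross-terms appear.
    Phi = [Phi, X .^ d];
end
% Columns: [1, x1, x2, x1^2, x2^2, x1^3, x2^3]
```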
(a): [+CODE] Create a MATLAB script polynomial_regression.m for the following:
Using the first 100 points as training data and the remainder as testing data,
fit a polynomial basis function regression for polynomials of degree 1 to degree 7.
Do not use any regularization. Plot the training error and test error (as RMS error)
versus polynomial degree.
Put this plot, along with a brief comment on what you see, in your
report.
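The degree sweep could be organized roughly as follows. This is a sketch on synthetic one-feature data, not the Housing data; the data, the split size of 25, and the variable names are all assumptions for illustration. With several features, the design matrix would be built per feature without cross-terms instead of with the single bsxfun call used here.

```matlab
% Synthetic stand-in data (NOT the Housing data): one feature, 40 points.
N = 40;
x = 2 * sin(1.7 * (1:N)');           % deterministic, scattered inputs in [-2, 2]
t = x.^3 - x + 0.1 * sin(50 * x);    % made-up target
n_train = 25;                        % the assignment uses the first 100 points

degrees = 1:7;
train_err = zeros(size(degrees));
test_err  = zeros(size(degrees));
rms = @(r) sqrt(mean(r .^ 2));       % root-mean-square of a residual vector

for k = 1:numel(degrees)
    Phi = bsxfun(@power, x, 0:degrees(k));   % columns: x^0, x^1, ..., x^degree
    Phi_tr = Phi(1:n_train, :);
    Phi_te = Phi(n_train+1:end, :);
    w = Phi_tr \ t(1:n_train);               % unregularized least squares
    train_err(k) = rms(Phi_tr * w - t(1:n_train));
    test_err(k)  = rms(Phi_te * w - t(n_train+1:end));
end
```

A call such as plot(degrees, train_err, degrees, test_err) would then produce the requested figure. Because the models are nested, the training error should not increase with degree, while the test error is free to do so.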
(b): Run your polynomial regression using a degree 1 polynomial. Examine the
learned weights. What value is chosen for $w_5$, the weight on the 5th feature (average
number of rooms per dwelling)? What value is chosen for the weight on the
7th feature (weighted distance to five Boston employment centers)? (Don't forget
the bias weight and the normalizations.) Do these two weights seem reasonable?
Put the values of all weights and your comments on the weights for the 5th
and 7th features in your report. You do not need to submit code for
this part.
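When reading the weights, it may help to recall how a weight learned on normalized data relates to the original units. The short derivation below assumes the normalization subtracts a mean and divides by a standard deviation; the symbols $\mu_t, \sigma_t$ (target statistics) and $\mu_j, \sigma_j$ (statistics of feature $j$) are introduced here for illustration and do not appear in the assignment.

$$t_n = \frac{t - \mu_t}{\sigma_t}, \qquad x_{n,j} = \frac{x_j - \mu_j}{\sigma_j}$$

$$t_n = w_0 + \sum_j w_j\, x_{n,j} \;\Longrightarrow\; t = \mu_t + \sigma_t w_0 + \sum_j \frac{\sigma_t}{\sigma_j}\, w_j\, (x_j - \mu_j)$$

Under these assumptions, the sign of $w_j$ is unchanged by the normalization, and $w_j$ itself measures the change in the target, in target standard deviations, per one standard deviation of feature $j$.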
(c): [+CODE] Create a MATLAB script polynomial_regression_1d_vis.m for
the following:
It is difficult to visualize the results of high-dimensional regression. Instead,
use only one of the features (use X_n(:,2)) and again perform polynomial regression.
Produce plots of the training data points, the learned polynomial, and the test data
points. The code visualize_1d.m may be useful as a template. Do not forget
the normalization.
Put 3 of these plots, for interesting results (low-order and high-order polynomials),
in your report. Include brief comments.
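A one-dimensional fit can be visualized by evaluating the learned polynomial on a dense grid. The sketch below uses made-up data and does not rely on visualize_1d.m, whose contents are not shown here; the do_plot flag is a convenience added for this example so the script also runs without a display.

```matlab
% Synthetic stand-in for one normalized feature and its target.
N = 30;
x = 2 * sin(1.7 * (1:N)');
t = 0.5 * x.^2 - x;                  % exactly quadratic, for a clean example
n_train = 20;
degree = 2;

w = bsxfun(@power, x(1:n_train), 0:degree) \ t(1:n_train);

% Evaluate the fitted polynomial on a dense grid spanning the data range.
x_grid = linspace(min(x), max(x), 200)';
y_grid = bsxfun(@power, x_grid, 0:degree) * w;

do_plot = false;   % set to true to draw the figure
if do_plot
    plot(x(1:n_train), t(1:n_train), 'bo'); hold on;
    plot(x(n_train+1:end), t(n_train+1:end), 'rx');
    plot(x_grid, y_grid, 'k-'); hold off;
    legend('training data', 'test data', 'learned polynomial');
end
```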

(d): [+CODE] Create a MATLAB script polynomial_regression_reg.m for the
following:
Implement L2-regularized regression using the first 100 points and only the 2nd
feature. Fit a degree 8 polynomial using each value in {0, 0.01, 0.1, 1, 10, 100, 1000}
for λ. Use 10-fold cross-validation to decide on the best value of λ. Produce a plot
of average validation set error versus the regularizing constant λ. Use a semilogx
plot, putting the regularizing constant λ on a log scale (see footnote 2).
Put this plot in your report, and note which regularizing constant λ
you would choose from the cross-validation.
Gaussian Basis Function

Implement linear basis function regression with Gaussian basis functions. You
may use the supplied function dist2.m. For the centers $\mu_j$, use randomly chosen
training data points. For example, if A is an N × M matrix, you can use the
following code to generate a random permutation of its rows:
p = randperm(N);
A_p = A(p, :);
Set s = 2 and perform the following experiments:
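A Gaussian design matrix could be assembled as sketched below. The basis function form exp(-||x - mu_j||^2 / (2 s^2)) is assumed here, and the squared distances are computed inline as a stand-in for the supplied dist2.m, whose exact interface should be checked in its source. The toy data are made up.

```matlab
% Toy data standing in for the normalized training features.
N = 20; K = 5; s = 2;
X = sin((1:N)' * [1.3 2.1 3.7]);   % N x 3 matrix of made-up features

p = randperm(N);
centers = X(p(1:K), :);            % K randomly chosen training points

% Squared Euclidean distance between every point and every center (N x K).
D2 = bsxfun(@plus, sum(X.^2, 2), sum(centers.^2, 2)') - 2 * X * centers';
D2 = max(D2, 0);                   % guard against tiny negative round-off

% Bias column plus one Gaussian feature per center.
Phi = [ones(N, 1), exp(-D2 / (2 * s^2))];
```

Each point that serves as a center has a feature value of 1 in its own column, which is a quick sanity check on the construction.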
(e): [+CODE] Create a MATLAB script gaussian_regression.m for the following:
Use the first 100 points as training data and the remainder as testing data. Fit a
Gaussian basis function regression using 5, 15, 25, ..., 95 basis functions (generate
a random permutation of the points and pick the first K points as the centers of the
basis functions). Do not use any regularization. Plot the training error and test error
(as RMS error) versus the number of basis functions.
Put this plot, along with a brief comment on what you see, in your
report.
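The sweep over the number of basis functions can follow the same pattern as the polynomial experiment. In this sketch the data are synthetic, the Gaussian form exp(-d^2 / (2 s^2)) is assumed, and pinv is used for the unregularized fit because the design matrix tends to become ill-conditioned as K approaches the number of training points; that solver choice is an assumption of this example.

```matlab
% Synthetic stand-in data: 120 points, 3 features, first 100 for training.
N = 120; n_train = 100; s = 2;
X = sin((1:N)' * [1.3 2.1 3.7]);
t = X(:, 1) .* X(:, 2) + 0.5 * X(:, 3);   % made-up target

Ks = 5:10:95;
p = randperm(n_train);                    % one permutation, reused for every K
sqdist = @(A, B) max(bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * A * B', 0);
rms = @(r) sqrt(mean(r .^ 2));
train_err = zeros(size(Ks));
test_err  = zeros(size(Ks));

for k = 1:numel(Ks)
    C = X(p(1:Ks(k)), :);                              % first K permuted points
    Phi = [ones(N, 1), exp(-sqdist(X, C) / (2 * s^2))];
    w = pinv(Phi(1:n_train, :)) * t(1:n_train);        % least-squares fit
    train_err(k) = rms(Phi(1:n_train, :) * w - t(1:n_train));
    test_err(k)  = rms(Phi(n_train+1:end, :) * w - t(n_train+1:end));
end
```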
(f): [+CODE] Create a MATLAB script gaussian_regression_reg.m for the
following:
Implement L2-regularized regression. Again, use the first 100 points (this time use
all features, not only the second one). Fit a regression model with 90 basis
functions using each value in {0, 0.01, 0.1, 1, 10, 100, 1000} for λ. Use 10-fold
cross-validation to decide on the best value of λ. Produce a plot of average validation
set error versus the regularizing constant λ. Use a semilogx plot, putting the
regularizing constant λ on the log scale (see the earlier note about λ = 0).
Put this plot in your report, and note which regularizing constant λ
you would choose from the cross-validation.
Footnote 2: The unregularized result (λ = 0) will not appear on this scale. You can either add it as a
separate horizontal line as a baseline, or report this number separately.

Question 2 (Fisher Distance): Imagine X is a random variable generated
from two classes $C_1$ and $C_2$, and the data from both classes are Gaussian. Suppose
a is the random variable $X|C_1$ and b is $X|C_2$:

$$a \sim X|C_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$$
$$b \sim X|C_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$$

We define the random variable c = a − b. Each sample of c is generated by subtracting
a random sample from $C_2$ from a random sample from $C_1$.
(a): What is the distribution of c?
(b): What is the probability that c = 0 ± ?
(c): What is the probability that c < 0?

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt. \qquad (1)$$

(d): Write down the solutions of the last two parts in terms of the Fisher
criterion. Explain why this relation between the Fisher criterion and the distribution of
the random variable c = a − b makes sense.
Some of the tools can be useful in more than one stage. When a tool is used outside
of its commonly considered stage, it remains fairly consistent in how it is
executed, but its objectives align with the intention of the stage in which it
is being used. You will notice that none of the tool sequences shown introduce
repetition at the ends of the process model. Repetition within the "What is" stage
would require a reset or delay of the subsequent stage outcomes in order to be
consistent with a revised outlook of the current state. This implies the original
problem is being abandoned or has undergone major revision. Starting an independent
design-thinking project to explore the new problem may be a better choice than starting
over from within an existing project. The learning launch is a tool that is rarely
repeated within a design-thinking project. Subsequent revision and additional market
delivery would occur either in future design-thinking projects or through business units
that have taken on the task of scaling up marketing, manufacturing, and delivery of the
innovative solution.

Visualization, Mapping, Value Chain Analysis, Mind Mapping,
Brainstorming, Concept Development, Assumption Testing,
Rapid Prototyping, Implementation

Design Thinking tools to help you create real value for your
customers and users.
• Empathize: Typeform, Zoom, Creatlr
• Define: Smaply, Userforge, MakeMyPersona
• Ideate: SessionLab, Stormboard, IdeaFlip
• Prototype: Boords, Mockingbird, POP
• Test: UserTesting, HotJar, PingPong
• For the complete process: Sprintbase, InVision, Mural, Miro
