
Programming Exercises | Submission Deadline 31.03.2021 | Statistics for Data Science 20/21

(1) Introduction
1. Sample a univariate Gaussian using scipy.stats.

2. Evaluate the PDF of a univariate Gaussian using scipy.stats.

3. Visualize the PDF of a univariate Gaussian and a normalized histogram of samples from a univariate
Gaussian with identical parameters on top of each other using Matplotlib (a sketch covering all three items follows this list).
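
A minimal sketch covering all three items, assuming a standard normal (mean 0, standard deviation 1) as the example parameter setting:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu, sigma = 0.0, 1.0                                           # example parameters (assumed)
samples = stats.norm.rvs(loc=mu, scale=sigma, size=10_000)     # (1) sampling
x = np.linspace(-4, 4, 400)
pdf = stats.norm.pdf(x, loc=mu, scale=sigma)                   # (2) PDF evaluation

# (3) overlay: normalized sample histogram and analytical PDF
plt.hist(samples, bins=50, density=True, alpha=0.5, label="sample histogram")
plt.plot(x, pdf, label="N(0, 1) PDF")
plt.legend()
plt.show()
```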

(2) Probability spaces


1. (Dice experiment 1) Consider the probability space model of rolling a fair die. Let A = {2, 4, 6}
and B = {1, 2, 3, 4} be two events. Then P(A) = 1/2, P(B) = 2/3, and P(A ∩ B) = 1/3. Since
P(A ∩ B) = P(A)P(B), the events A and B are independent. Simulate draws from the outcome
space and verify that P̂(A ∩ B) ≈ P̂(A)P̂(B), where P̂(E) denotes the proportion of times an event
E occurs in the simulation (a sketch follows this list).

2. (Dice experiment 2) Consider the probability space model of rolling a fair die. Identify two events
A and B that are not independent. Analytically evaluate P(A), P(B), P(A ∩ B), P(A|B) and
P(B|A), and verify these values by means of simulation.

3. (Coin experiment) Consider the probability space model of tossing a fair coin twice, i.e. a uniform
probability measure on Ω = {HH, HT, TH, TT}, where H indicates heads and T indicates tails.
Simulate draws from this probability space and verify that the events "H appears on the first toss",
"H appears on the second toss", and "both tosses have the same outcome" each have probability
1/2.
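
A minimal sketch for the first dice experiment, assuming 100,000 simulated rolls:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)     # draws from the outcome space {1, ..., 6}

A = np.isin(rolls, [2, 4, 6])
B = np.isin(rolls, [1, 2, 3, 4])

p_A = A.mean()                               # P̂(A), close to 1/2
p_B = B.mean()                               # P̂(B), close to 2/3
p_AB = (A & B).mean()                        # P̂(A ∩ B), close to 1/3

print(p_A, p_B, p_AB, p_A * p_B)             # P̂(A ∩ B) ≈ P̂(A) P̂(B)
```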

(3) Random variables


1. Simulate the probability space model of throwing two dice and the random variable corresponding
to the sum of the pips. Visualize a normalized histogram of simulated outcomes of this random
variable and compare it to the theoretical prediction (a sketch follows this list).

2. Visualize the PMF of a Bernoulli random variable and a normalized histogram of many samples of
a Bernoulli random variable with identical parameter setting on top of each other.

3. Visualize the PDF of a Gaussian random variable and a normalized histogram of many samples of
a Gaussian random variable with identical parameter settings on top of each other.
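
A minimal sketch for the two-dice item, assuming 100,000 simulated throws; the theoretical PMF of the sum s ∈ {2, ..., 12} is (6 − |s − 7|)/36:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sums = rng.integers(1, 7, size=(100_000, 2)).sum(axis=1)   # sum of the pips of two dice

support = np.arange(2, 13)
pmf = (6 - np.abs(support - 7)) / 36                        # theoretical PMF of the sum

plt.hist(sums, bins=np.arange(1.5, 13.5), density=True, alpha=0.5, label="simulation")
plt.plot(support, pmf, "o", label="theory")
plt.legend()
plt.show()
```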

(4) Joint distributions


1. Write a simulation that demonstrates that the marginal distributions of a bivariate Gaussian distribution
with expectation parameter and covariance parameter

   µ = (1, 2)ᵀ   and   Σ = [[0.3, 0.2], [0.2, 0.5]],          (1)

respectively, are given by univariate Gaussian distributions with expectation parameters µ1 = 1, µ2 = 2
and variance parameters σ1² = 0.3 and σ2² = 0.5, respectively (a sketch follows this list).

2. Write a simulation that verifies that obtaining samples from 2 independent univariate Gaussian
distributions with parameters µi, σi² > 0, i = 1, 2, is equivalent to obtaining samples from a two-
dimensional Gaussian distribution with the appropriately specified parameters µ ∈ R² and Σ ∈ R²ˣ².


3. Write a simulation that verifies, by way of example, the analytical results on conditional Gaussian distributions
for the case of a bivariate Gaussian distribution.
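
A minimal sketch for the first item, assuming 100,000 samples and comparing the sample moments of each coordinate with the stated univariate parameters:

```python
import numpy as np
from scipy import stats

mu = np.array([1.0, 2.0])
Sigma = np.array([[0.3, 0.2],
                  [0.2, 0.5]])

samples = stats.multivariate_normal.rvs(mean=mu, cov=Sigma, size=100_000)

# the marginal of each coordinate should match the univariate parameters
# (mu_1, sigma_1^2) = (1, 0.3) and (mu_2, sigma_2^2) = (2, 0.5)
print(samples.mean(axis=0))           # ≈ [1, 2]
print(samples.var(axis=0, ddof=1))    # ≈ [0.3, 0.5]
```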

(5) Transformations
1. Write a program that generates pseudo-random numbers from an exponential distribution using a
uniform pseudo-random number generator and the probability integral transform theorem (a sketch follows this list).

2. Let X ∼ N (0, 1) and let Y = exp(X). Evaluate the PDF of Y analytically and verify your
evaluation using a simulation based on drawing random numbers from N (0, 1).

3. Let X ∼ N (0, 1) and let Y = X². By simulation, validate that Y is distributed according to
a chi-squared distribution with one degree of freedom. Next, let X1, ..., X10 ∼ N (0, 1) and let
Y = X1² + ... + X10². By simulation, validate that Y is distributed according to a chi-squared distribution
with ten degrees of freedom.
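
A minimal sketch for the first transformation item, assuming an example rate parameter λ = 2; by the probability integral transform, if U ∼ Uniform(0, 1) then −ln(1 − U)/λ follows an exponential distribution with rate λ:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
lam = 2.0                                  # example rate parameter (assumed)

u = rng.uniform(size=100_000)              # uniform pseudo-random numbers
x = -np.log(1.0 - u) / lam                 # inverse exponential CDF applied to U

grid = np.linspace(0, 4, 400)
plt.hist(x, bins=100, density=True, alpha=0.5, label="transformed uniforms")
plt.plot(grid, stats.expon.pdf(grid, scale=1 / lam), label="Exp(rate=2) PDF")
plt.legend()
plt.show()
```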

(6) Expectation and covariance


1. Sample n = 10 data points from a univariate Gaussian distribution and evaluate the sample mean,
sample variance, and sample standard deviation (a sketch follows this list).

2. Sample n = 10 data points from a bivariate Gaussian distribution and evaluate the sample covariance
and sample correlation.

3. Validate the theorem on the variances of sums and differences of random variables using a sampling
approach in a bivariate Gaussian scenario.
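
A minimal sketch for the first item, assuming N(1, 4) as the example distribution:

```python
import numpy as np
from scipy import stats

sample = stats.norm.rvs(loc=1.0, scale=2.0, size=10, random_state=3)  # n = 10 data points

sample_mean = sample.mean()
sample_var = sample.var(ddof=1)      # sample variance (divides by n - 1)
sample_std = sample.std(ddof=1)      # sample standard deviation

print(sample_mean, sample_var, sample_std)
```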

(7) Inequalities and limits


1. Write simulations that validate the Markov and Chebyshev inequalities.

2. Write a simulation that validates the Weak Law of Large Numbers (a sketch follows this list).

3. Write a simulation that validates the Lindeberg-Lévy Central Limit Theorem.

4. Write a simulation that validates the Lyapunov Central Limit Theorem.
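
A minimal sketch for the Weak Law of Large Numbers item, assuming an Exp(1) example distribution with expectation 1; the running sample mean should settle near 1 as the number of samples grows:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=100_000)          # i.i.d. draws with E[X] = 1

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

plt.plot(running_mean, label="running sample mean")
plt.axhline(1.0, color="k", linestyle="--", label="E[X] = 1")
plt.xscale("log")
plt.legend()
plt.show()
```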

(8) Maximum likelihood estimation


1. Let X1, ..., Xn ∼ Bern(µ) be n = 20 i.i.d. Bernoulli random variables. Using an optimization
routine of your choice, formulate and implement the numerical maximum likelihood estimation of
µ for true, but unknown, values of µ = 0.7 and µ = 1 based on X1, ..., Xn (a sketch follows this list).

2. Let X1, ..., Xn ∼ Bern(µ). For a large number n, sample X1, ..., Xn and evaluate the maximum
likelihood estimator µ̂^ML. Repeat this m times and create a histogram of the realized estimates
µ̂^ML_1, ..., µ̂^ML_m.
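
A minimal sketch for the first item, assuming scipy.optimize.minimize_scalar as the optimization routine and a numerically bounded parameter range:

```python
import numpy as np
from scipy import optimize, stats

def bernoulli_mle(x):
    """Numerically maximize the Bernoulli log-likelihood over mu in (0, 1)."""
    eps = 1e-9
    def neg_log_lik(mu):
        return -np.sum(x * np.log(mu) + (1 - x) * np.log(1 - mu))
    res = optimize.minimize_scalar(neg_log_lik, bounds=(eps, 1 - eps), method="bounded")
    return res.x

for mu_true in (0.7, 1.0):
    x = stats.bernoulli.rvs(mu_true, size=20, random_state=5)   # n = 20 samples
    print(mu_true, bernoulli_mle(x))                            # numerical MLE ≈ sample mean
```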

(9) Finite estimator properties


1. For X1 , ..., Xn ∼ Bern(µ) implement a simulation which validates the unbiasedness of the sample
mean, the unbiasedness of the sample variance, the biasedness of the sample standard deviation,
and the biasedness of the maximum likelihood variance parameter estimator.


2. For X1, ..., Xn ∼ N (µ, σ²) implement a simulation which validates the unbiasedness of the sample
mean, the unbiasedness of the sample variance, the biasedness of the sample standard deviation,
and the biasedness of the maximum likelihood variance parameter estimator (a sketch follows this list).
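
A minimal sketch for the Gaussian case, assuming N(0, 4), n = 5, and 100,000 replications; the maximum likelihood variance estimator divides by n and should come out biased, while the sample variance (dividing by n − 1) should not:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma2 = 5, 100_000, 4.0

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))

sample_var = x.var(axis=1, ddof=1)     # divides by n - 1
ml_var = x.var(axis=1, ddof=0)         # maximum likelihood estimator, divides by n
sample_std = np.sqrt(sample_var)

print(sample_var.mean())               # ≈ 4.0 (unbiased)
print(ml_var.mean())                   # ≈ 4.0 * (n - 1)/n = 3.2 (biased)
print(sample_std.mean())               # < 2.0 (biased downward)
```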

(10) Asymptotic estimator properties


1. Write a simulation that verifies the asymptotic unbiasedness of the maximum likelihood estimator
for the variance parameter of a univariate Gaussian distribution. Include a verification of the
unbiasedness of the sample variance (a sketch follows this list).

2. Write a simulation that verifies the asymptotic efficiency of the maximum likelihood estimator for
the parameter of a Bernoulli distribution.

3. Write a simulation that verifies the asymptotic efficiency of the maximum likelihood estimator for
the variance parameter of a univariate Gaussian distribution.
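
A minimal sketch for the first item, assuming N(0, 4) and increasing sample sizes; the average maximum likelihood variance estimate should approach 4 as n grows, while the sample variance stays centered on 4 for every n:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, reps = 4.0, 20_000

for n in (5, 20, 100, 1000):
    x = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
    # ML estimator mean -> 4 as n grows; sample variance mean ≈ 4 for every n
    print(n, x.var(axis=1, ddof=0).mean(), x.var(axis=1, ddof=1).mean())
```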

(11) Confidence intervals


1. Write a simulation that verifies that the T statistic is distributed according to a t-distribution with
n − 1 degrees of freedom.

2. Write a simulation that verifies that the 95%-confidence interval for the expectation parameter
of a Gaussian distribution with unknown variance contains the true, but unknown, expectation
parameter in ≈ 95% of its realizations (a sketch follows this list).

3. Write a simulation that verifies that the approximate 95%-confidence interval for the expectation
parameter of a Bernoulli distribution contains the true, but unknown, expectation parameter in
≈ 95% of its realizations.
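
A minimal sketch for the second item, assuming N(1, 4), n = 10, and 10,000 replications; the t-based interval x̄ ± t(0.975, n−1) · s/√n should contain µ = 1 in roughly 95% of the replications:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
mu, sigma, n, reps = 1.0, 2.0, 10, 10_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)

half_width = t_crit * s / np.sqrt(n)
covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
print(covered.mean())                  # ≈ 0.95
```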

(12) Hypothesis testing


1. By means of simulation, show that a two-sided T test with simple null hypothesis Θ0 := {µ0} of
significance level α0 is exact (a sketch follows this list).

2. By means of simulation, demonstrate that the δ-confidence interval-based test for the expectation
parameter of a univariate Gaussian distribution is of significance level α0 = 1 − δ.
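
A minimal sketch for the first item, assuming N(µ0, 1) data with µ0 = 0, n = 12, and α0 = 0.05; under the null hypothesis the two-sided test should reject in roughly a fraction α0 of the replications:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
mu0, n, reps, alpha0 = 0.0, 12, 20_000, 0.05

x = rng.normal(mu0, 1.0, size=(reps, n))
t_stat = (x.mean(axis=1) - mu0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
t_crit = stats.t.ppf(1 - alpha0 / 2, df=n - 1)

print((np.abs(t_stat) > t_crit).mean())    # rejection rate ≈ alpha0 = 0.05 (exactness)
```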

(13) Conjugate inference


1. For n = 10, implement batch and recursive Bayesian estimation for the Beta-Binomial model.
Compare the results based on identical samples (a sketch follows this list).
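
A minimal sketch, assuming a Beta(1, 1) prior, a true parameter µ = 0.6, and n = 10 Bernoulli observations; the batch update and the one-observation-at-a-time recursive update should produce identical posterior parameters:

```python
import numpy as np
from scipy import stats

x = stats.bernoulli.rvs(0.6, size=10, random_state=10)   # n = 10 observations (assumed µ = 0.6)

a0, b0 = 1.0, 1.0                                         # Beta(1, 1) prior (assumed)

# batch update: incorporate all observations at once
a_batch, b_batch = a0 + x.sum(), b0 + (1 - x).sum()

# recursive update: incorporate one observation at a time
a_rec, b_rec = a0, b0
for xi in x:
    a_rec, b_rec = a_rec + xi, b_rec + (1 - xi)

print((a_batch, b_batch), (a_rec, b_rec))                 # identical posterior parameters
```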

(14) Numerical methods


1. Estimate the expected value of a Beta(α, β) distribution for varying values of α and β by means of Monte
Carlo integration, using a Beta distribution random number generator. Compare the results to
the true expected values (a sketch follows this list).

2. Estimate the expected value of a Beta(α, β) distribution for varying values of α and β by means of Monte
Carlo integration using an importance sampling scheme and a uniform random number generator.

3. Use an acceptance-rejection algorithm to sample random numbers from Beta(2, 6).
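
A minimal sketch for the first item, assuming a few example (α, β) pairs; the Monte Carlo estimate is the sample mean of draws from Beta(α, β), and the true expected value is α/(α + β):

```python
from scipy import stats

for alpha, beta in [(2, 6), (1, 1), (5, 2)]:              # example parameter pairs (assumed)
    draws = stats.beta.rvs(alpha, beta, size=100_000, random_state=11)
    mc_estimate = draws.mean()                             # Monte Carlo integration
    true_value = alpha / (alpha + beta)
    print(alpha, beta, mc_estimate, true_value)
```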
