Statistics For Data Science 20 21 Programming Exercises 1
(1) Introduction
1. Sample a univariate Gaussian using scipy.stats.
3. Visualize the PDF of a univariate Gaussian and a normalized histogram of samples from a univariate
Gaussian with identical parameters on top of each other using Matplotlib.
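As a rough starting point for the sampling and visualization tasks above, the following sketch draws samples with scipy.stats.norm and compares a normalized histogram with the analytical PDF at the bin centers. The parameter values, the sample size and the number of bins are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from scipy import stats

# Illustrative parameters (assumed, not prescribed by the exercise sheet).
mu, sigma = 1.0, 2.0
n_samples = 100_000

# Draw samples; random_state makes the run reproducible.
samples = stats.norm.rvs(loc=mu, scale=sigma, size=n_samples, random_state=0)

# Normalized histogram: with density=True the bar areas sum to 1.
heights, edges = np.histogram(samples, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Analytical PDF evaluated at the bin centers.
pdf_values = stats.norm.pdf(centers, loc=mu, scale=sigma)

max_abs_dev = np.max(np.abs(heights - pdf_values))
print(f"largest |histogram - PDF| over all bins: {max_abs_dev:.4f}")
```

For the plot itself, one option is to pass centers, heights and np.diff(edges) to Matplotlib's plt.bar and to draw pdf_values over the same axes with plt.plot.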
2. (Dice experiment 2) Consider the probability space model of tossing a fair die. Identify two events
A and B that are not independent. Analytically evaluate P(A), P(B), P(A ∩ B), P(A|B) and
P(B|A), and verify these values by means of simulation.
3. (Coin experiment) Consider the probability space model of tossing a fair coin twice, i.e. a uniform
probability measure on Ω = {HH, HT, TH, TT}, where H indicates heads and T indicates tails.
Simulate draws from this probability space and verify that the events "H appears on the first toss",
"H appears on the second toss", and "both tosses have the same outcome" each have probability
1/2.
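One possible way to set up both simulations is sketched below. The events A = "the outcome is even" and B = "the outcome is greater than 3" are an assumed example choice; they give P(A) = P(B) = 1/2, P(A ∩ B) = 1/3 ≠ P(A)P(B), and P(A|B) = P(B|A) = 2/3. The coin tosses are encoded as 1 for heads and 0 for tails, which is also an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

# Dice experiment: fair die with outcomes 1..6 (upper bound is exclusive).
rolls = rng.integers(1, 7, size=n_draws)
in_a = rolls % 2 == 0   # assumed event A: outcome in {2, 4, 6}
in_b = rolls > 3        # assumed event B: outcome in {4, 5, 6}

p_a = in_a.mean()
p_b = in_b.mean()
p_ab = (in_a & in_b).mean()
p_a_given_b = p_ab / p_b   # P(A|B) = P(A and B) / P(B)
p_b_given_a = p_ab / p_a   # P(B|A) = P(A and B) / P(A)

# Coin experiment: two fair tosses per draw, 1 = heads, 0 = tails.
tosses = rng.integers(0, 2, size=(n_draws, 2))
p_first_heads = (tosses[:, 0] == 1).mean()
p_second_heads = (tosses[:, 1] == 1).mean()
p_same_outcome = (tosses[:, 0] == tosses[:, 1]).mean()

print(p_a, p_b, p_ab, p_a_given_b, p_b_given_a)
print(p_first_heads, p_second_heads, p_same_outcome)
```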
2. Visualize the PMF of a Bernoulli random variable and a normalized histogram of many samples of
a Bernoulli random variable with identical parameter setting on top of each other.
3. Visualize the PDF of a Gaussian random variable and a normalized histogram of many samples of
a Gaussian random variable with identical parameter settings on top of each other.
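\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} 0.3 & 0.2 \\ 0.2 & 0.5 \end{pmatrix}, \qquad (1)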
2. Write a simulation that verifies that obtaining samples from 2 independent univariate Gaussian
distributions with parameters µ_i, σ_i^2 > 0, i = 1, 2 is equivalent to obtaining samples from a
two-dimensional Gaussian distribution with the appropriately specified parameters µ ∈ R^2 and
Σ ∈ R^{2×2}.
Programming Exercises | Submission Deadline 31.03.2021 Statistics for Data Science 20/21
3. Write a simulation that verifies, by way of example, the analytical results on conditional Gaussian
distributions for the case of a bivariate Gaussian distribution.
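A minimal sketch for the conditional Gaussian task, using the parameters from equation (1): samples whose second component falls in a narrow window around a chosen value are kept, and their mean and variance are compared with the standard closed-form results µ_1 + Σ_12/Σ_22 (x_2 − µ_2) and Σ_11 − Σ_12^2/Σ_22. The conditioning value x2_star and the window half-width eps are assumptions made for illustration, and the window only approximates exact conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters from equation (1).
mu = np.array([1.0, 2.0])
cov = np.array([[0.3, 0.2],
                [0.2, 0.5]])

x = rng.multivariate_normal(mu, cov, size=1_000_000)

# Approximate conditioning on X2 = x2_star by a narrow window (assumed values).
x2_star, eps = 2.5, 0.02
keep = np.abs(x[:, 1] - x2_star) < eps
x1_given_x2 = x[keep, 0]

# Analytical conditional mean and variance of X1 given X2 = x2_star.
cond_mean = mu[0] + cov[0, 1] / cov[1, 1] * (x2_star - mu[1])
cond_var = cov[0, 0] - cov[0, 1] ** 2 / cov[1, 1]

print(f"kept {x1_given_x2.size} samples")
print(f"mean: simulated {x1_given_x2.mean():.3f}, analytical {cond_mean:.3f}")
print(f"var:  simulated {x1_given_x2.var():.3f}, analytical {cond_var:.3f}")
```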
(5) Transformations
1. Write a program that generates pseudo-random numbers from an exponential distribution using a
uniform pseudo-random number generator and the probability integral transform theorem.
2. Let X ∼ N(0, 1) and let Y = exp(X). Evaluate the PDF of Y analytically and verify your
evaluation using a simulation based on drawing random numbers from N(0, 1).
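For the first task in this section, the inverse of the exponential CDF F(x) = 1 − exp(−λx) is F^{-1}(u) = −ln(1 − u)/λ, so applying it to uniform random numbers should yield exponentially distributed ones. The sketch below assumes a rate parameterization with the illustrative value λ = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                      # rate parameter (assumed example value)

u = rng.random(200_000)        # U ~ Uniform[0, 1)
x = -np.log1p(-u) / lam        # F^{-1}(u) = -ln(1 - u) / lam

# Compare simple summaries with the analytical values.
sample_mean = x.mean()                 # analytical: 1 / lam
sample_var = x.var()                   # analytical: 1 / lam**2
ecdf_at_1 = (x <= 1.0).mean()          # analytical: 1 - exp(-lam)

print(sample_mean, 1 / lam)
print(sample_var, 1 / lam**2)
print(ecdf_at_1, 1 - np.exp(-lam))
```

Using np.log1p(-u) instead of np.log(1 - u) is only a numerical nicety for values of u close to 0.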
2. Sample n = 10 data points from a bivariate Gaussian distribution and evaluate the sample covariance
and sample correlation.
3. Validate the theorem on the variances of sums and differences of random variables using a sampling
approach in a bivariate Gaussian scenario.
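The sketch below assumes the theorem in question is V(X ± Y) = V(X) + V(Y) ± 2 C(X, Y) and checks it on samples from a bivariate Gaussian. The parameter values are illustrative assumptions, and a large sample is used instead of n = 10 so that the estimates are also close to the population values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example parameters of the bivariate Gaussian.
mu = np.array([1.0, 2.0])
cov = np.array([[0.3, 0.2],
                [0.2, 0.5]])

xy = rng.multivariate_normal(mu, cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

# Sample covariance (np.cov uses the n - 1 normalization) and correlation.
sample_cov = np.cov(x, y)[0, 1]
sample_corr = np.corrcoef(x, y)[0, 1]

# Left- and right-hand sides of the variance identities.
var_sum = np.var(x + y, ddof=1)
var_diff = np.var(x - y, ddof=1)
rhs_sum = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * sample_cov
rhs_diff = np.var(x, ddof=1) + np.var(y, ddof=1) - 2 * sample_cov

print(f"V(X+Y): {var_sum:.4f} vs {rhs_sum:.4f}")
print(f"V(X-Y): {var_diff:.4f} vs {rhs_diff:.4f}")
print(f"sample covariance {sample_cov:.4f}, sample correlation {sample_corr:.4f}")
```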
2. Let X_1, ..., X_n ∼ Bern(µ). For a large number n, sample the X_1, ..., X_n and evaluate the
maximum likelihood estimator µ̂^ML. Repeat this m times and create a histogram of the realized
µ̂^ML_1, ..., µ̂^ML_m.
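Since the maximum likelihood estimator of the Bernoulli parameter is the sample mean, the repeated experiment can be sketched as follows. The values of µ, n, m and the number of histogram bins are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, n, m = 0.3, 1_000, 5_000   # assumed example values

# m independent data sets of n Bernoulli(mu_true) draws each.
samples = rng.binomial(1, mu_true, size=(m, n))

# The Bernoulli ML estimate is the sample mean of each data set.
mu_hat = samples.mean(axis=1)

# Histogram of the m realized estimates (counts per bin).
counts, edges = np.histogram(mu_hat, bins=30)

print(f"mean of the estimates: {mu_hat.mean():.4f}")
print(f"std of the estimates:  {mu_hat.std():.4f}")
print(f"theoretical std:       {np.sqrt(mu_true * (1 - mu_true) / n):.4f}")
```

The arrays counts and edges can be handed to a plotting routine of choice; the histogram should look approximately bell-shaped around the true parameter.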
2. For X_1, ..., X_n ∼ N(µ, σ^2), implement a simulation which validates the unbiasedness of the sample
mean, the unbiasedness of the sample variance, the biasedness of the sample standard deviation,
and the biasedness of the maximum likelihood variance parameter estimator.
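One way to approach this is to average each estimator over many simulated data sets and compare the averages with the true parameters. In the sketch below the sample size is deliberately small (n = 5) so that the biases are visible; all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, m = 1.0, 2.0, 5, 200_000   # assumed example values

# m data sets with n Gaussian observations each.
x = rng.normal(mu, sigma, size=(m, n))

mean_hat = x.mean(axis=1)                # sample mean
var_unbiased = x.var(axis=1, ddof=1)     # sample variance (divides by n - 1)
var_ml = x.var(axis=1, ddof=0)           # ML variance estimator (divides by n)
std_hat = np.sqrt(var_unbiased)          # sample standard deviation

print(f"E[sample mean]     ~ {mean_hat.mean():.3f} (true mu = {mu})")
print(f"E[sample variance] ~ {var_unbiased.mean():.3f} (true sigma^2 = {sigma**2})")
print(f"E[ML variance]     ~ {var_ml.mean():.3f} (expected (n-1)/n * sigma^2 = {(n - 1) / n * sigma**2})")
print(f"E[sample std]      ~ {std_hat.mean():.3f} (true sigma = {sigma})")
```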
2. Write a simulation that verifies the asymptotic efficiency of the maximum likelihood estimator for
the parameter of a Bernoulli distribution.
3. Write a simulation that verifies the asymptotic efficiency of the maximum likelihood estimator for
the variance parameter of a univariate Gaussian distribution.
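For the Bernoulli case, one reading of asymptotic efficiency is that the variance of √n (µ̂^ML − µ) approaches the inverse Fisher information of a single observation, µ(1 − µ). The sketch below checks this under that assumed reading; the values of µ, m and the grid of sample sizes are illustrative. It draws the number of successes directly from a binomial distribution, which is equivalent to summing n Bernoulli draws.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, m = 0.3, 20_000                 # assumed example values
inv_fisher = mu_true * (1 - mu_true)     # inverse Fisher information, one observation

scaled_vars = {}
for n in (10, 100, 1_000, 10_000):
    # Sum of n Bernoulli draws is Binomial(n, mu); dividing by n gives the ML estimate.
    mu_hat = rng.binomial(n, mu_true, size=m) / n
    scaled_vars[n] = n * mu_hat.var()    # variance of sqrt(n) * (mu_hat - mu)
    print(f"n = {n:6d}: n * Var(mu_hat) = {scaled_vars[n]:.4f}, target {inv_fisher:.4f}")
```

The same pattern could be adapted to the Gaussian variance parameter, where the ML estimator is biased for finite n and the agreement is expected only as n grows.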
2. Write a simulation that verifies that the 95%-confidence interval for the expectation parameter
of a Gaussian distribution with unknown variance comprises the true, but unknown, expectation
parameter in ≈ 95% of its realizations.
3. Write a simulation that verifies that the approximate 95%-confidence interval for the expectation
parameter of a Bernoulli distribution comprises the true, but unknown, expectation parameter in
≈ 95% of its realizations.
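A coverage simulation for the Bernoulli case might look as follows. It assumes that the approximate interval is the normal-approximation interval µ̂ ± z √(µ̂(1 − µ̂)/n) with z ≈ 1.96; if the course defines the interval differently, only the two lines computing the bounds would change. The values of µ, n and m are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, n, m = 0.3, 1_000, 20_000   # assumed example values
z = 1.96                             # approx. 97.5% quantile of the standard normal

# ML estimate for each of the m simulated data sets.
mu_hat = rng.binomial(n, mu_true, size=m) / n

# Normal-approximation interval (an assumption about the intended interval).
half_width = z * np.sqrt(mu_hat * (1 - mu_hat) / n)
lower = mu_hat - half_width
upper = mu_hat + half_width

# Fraction of realized intervals that contain the true parameter.
coverage = np.mean((lower <= mu_true) & (mu_true <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

The Gaussian case with unknown variance follows the same structure, with the half-width built from the sample standard deviation and a t-quantile with n − 1 degrees of freedom.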
2. By means of simulation, demonstrate that the δ-confidence interval-based test for the expectation
parameter of a univariate Gaussian distribution is of significance level α_0 = 1 − δ.
2. Estimate the expected value of a Beta(α, β) distribution for varying values of α and β by means of Monte
Carlo integration using an importance sampling scheme and a uniform random number generator.
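A possible sketch of the importance sampling scheme uses the uniform distribution on [0, 1) as proposal q, so that the importance weights p(u)/q(u) reduce to the Beta density p(u) and the estimate is the average of u · p(u). The helper names beta_pdf and beta_mean_importance_sampling are hypothetical, and the parameter pairs are assumed examples with α, β ≥ 1; for parameters below 1 the density is unbounded at the boundary and this simple scheme may behave poorly.

```python
import math
import numpy as np

def beta_pdf(x, a, b):
    """Beta(a, b) density, written out via the gamma function."""
    norm_const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm_const * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_mean_importance_sampling(a, b, n, rng):
    """Estimate E[X] for X ~ Beta(a, b) with a Uniform[0, 1) proposal."""
    u = rng.random(n)              # draws from the proposal q(u) = 1
    weights = beta_pdf(u, a, b)    # importance weights p(u) / q(u)
    return np.mean(u * weights)

rng = np.random.default_rng(0)
estimates = {}
for a, b in [(1, 1), (2, 2), (2, 5), (3, 2)]:   # assumed example parameters
    estimates[(a, b)] = beta_mean_importance_sampling(a, b, 500_000, rng)
    print(f"Beta({a}, {b}): estimate {estimates[(a, b)]:.4f}, exact {a / (a + b):.4f}")
```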