
Statistics BI

Models of random outcomes. What is a model?

Probabilities

How do we investigate model properties?

Transformations (simulations)

How do we turn data into a model?

Estimation

How do we investigate estimation procedures and model validity?

Distributions (=probabilities)

– p. 1/22
Course Theme

Probabilistic foundation of models.

Probabilistically founded methods.

Probabilistic analysis of methods.

– p. 2/22
Probability Themes

Models (probability measures and sample spaces).
  Discrete sample space and point probabilities.
  Continuous sample space (R, R^d), distribution functions and densities.
  Dependence and independence.

Transformations.
  Transform observations and probability measures.
  Example: Estimators.

Simulations.
  Tool to investigate/compute properties of a probability measure.
  Tool for studying transformed probability measures.
  Examples: Distribution of estimators – bootstrapping.

– p. 3/22
Discrete models

Discrete sample space E and point probabilities p(x) ∈ [0, 1], x ∈ E, with

\[
\sum_{x} p(x) = 1, \qquad P(A) = \sum_{x \in A} p(x).
\]

Independent and identically distributed replications (iid): sample
space E^n discrete, observations x = (x_1, ..., x_n), point probabilities

\[
p(x) = p(x_1) p(x_2) \cdots p(x_n) = \prod_{i=1}^{n} p(x_i).
\]

– p. 4/22
Discrete models

The relative frequency approximates the probability (frequency
interpretation):

\[
\varepsilon_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i = x) \approx p(x).
\]

Empirical estimator:

\[
\hat{p}(x) = \varepsilon_n(x).
\]

Properties:

\[
n \hat{p}(x) \sim \mathrm{Bin}(n, p(x)), \qquad
E\hat{p}(x) = p(x), \qquad
V\hat{p}(x) = \frac{p(x)(1 - p(x))}{n}.
\]
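A minimal R sketch of the empirical estimator; the point probabilities of the loaded die below are illustrative assumptions.

    ## Estimate point probabilities by relative frequencies (table).
    p <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)          # assumed true point probabilities
    n <- 1000
    x <- sample(1:6, n, replace = TRUE, prob = p)  # n iid observations
    p.hat <- table(factor(x, levels = 1:6)) / n    # empirical estimator
    p.hat                                          # compare with p
    p * (1 - p) / n                                # theoretical variance of p.hat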

– p. 5/22
Continuous models

Continuous sample space R^d and densities f : R^d → [0, ∞), with

\[
\int f(x) \, dx = 1, \qquad P(A) = \int_A f(x) \, dx.
\]

Independent and identically distributed replications (iid): sample
space (R^d)^n continuous, observations x = (x_1, ..., x_n), density

\[
f(x) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).
\]

– p. 6/22
Continuous models

The relative frequency approximates the probability (frequency
interpretation):

\[
\varepsilon_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A) \approx P(A) = \int_A f(x) \, dx.
\]

Empirical estimator:

\[
\hat{P}(A) = \varepsilon_n(A).
\]

Properties:

\[
n \hat{P}(A) \sim \mathrm{Bin}(n, P(A)), \qquad
E\hat{P}(A) = P(A), \qquad
V\hat{P}(A) = \frac{P(A)(1 - P(A))}{n}.
\]
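A minimal R sketch, assuming standard normal observations and the arbitrary event A = [1, 2]:

    ## Relative frequency of A versus the exact probability P(A).
    n <- 1000
    x <- rnorm(n)
    P.hat <- mean(x >= 1 & x <= 2)         # empirical estimator of P(A)
    P.A   <- pnorm(2) - pnorm(1)           # exact P(A) under the model
    c(P.hat, P.A, P.A * (1 - P.A) / n)     # estimate, truth, variance of P.hat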

– p. 7/22
Dependence and independence

The conditional probability of A given B is

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]

Two events, A and B, are independent if

\[
P(A \cap B) = P(A) P(B).
\]

Two random variables X_1 and X_2 are independent if

\[
P(X_1 \in A, X_2 \in B) = P(X_1 \in A) P(X_2 \in B).
\]

The distribution of the random variables X_1, ..., X_n taking values in
E is a probability measure on E^n. They are iid if they are independent
and identically distributed (identical marginal distributions).
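A quick simulation check of the product formula, with arbitrary events A = [0, 0.5] and B = [0.3, 1] for independent uniforms:

    ## For independent X1, X2 the joint frequency should match the product.
    n  <- 100000
    x1 <- runif(n); x2 <- runif(n)
    mean(x1 <= 0.5 & x2 >= 0.3)            # joint relative frequency
    mean(x1 <= 0.5) * mean(x2 >= 0.3)      # product of marginal frequencies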
– p. 8/22
Simulations

The computer can simulate almost any random variable.


First it generates (one or more) iid random variables uniformly
distributed on [0, 1].
Then these variables are transformed.
A typical transformation is by the generalised inverse, F^← : [0, 1] → R,
of a distribution function F : R → [0, 1].
This makes it possible to study, empirically, the distribution of almost
any transformation.
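A sketch of the generalised inverse transformation for the exponential distribution, where F^←(u) = −log(1 − u)/λ:

    ## Simulate Exp(lambda) variables from uniforms via the inverse cdf.
    lambda <- 2
    u <- runif(1000)                 # iid uniforms on [0, 1]
    x <- -log(1 - u) / lambda        # transformed to Exp(lambda) variables
    qqplot(x, rexp(1000, lambda))    # visually compare with R's own simulator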

– p. 9/22
R and standard distributions

Some of the standard distributions in R are: unif, norm, exp,
binom, pois, beta, logis.

p for the distribution function, e.g. punif(0.5), punif(2,0,10).
d for the density function or point probabilities, e.g. dunif(0.5).
q for the quantile function, e.g. qunif(0.4).
r for simulation, e.g. runif(100), runif(100,0,10).
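The same naming scheme for the normal distribution, as a short illustration:

    pnorm(1.96)                 # distribution function: P(X <= 1.96)
    dnorm(0)                    # density at 0
    qnorm(0.975)                # 0.975-quantile (approx. 1.96)
    rnorm(5, mean = 0, sd = 1)  # five simulated N(0,1) variables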

– p. 10/22
Statistical themes
Empirical methods
  Summaries of iid observations.
  Simple/intuitive estimators to infer unknown quantities.
Parameterised models and estimators
  Formulation of what is assumed (model assumptions) and what is
  not (parameters).
  Formulation of parametric dependencies and inference of the
  parameters from data (non-iid assumption).
Maximum likelihood estimation
  Choose the parameter that maximises how likely the
  observations are.
  A de facto standard with provably nice asymptotic properties
  (large number of observations and small number of parameters).
Model control. Investigate the model assumptions.
– p. 11/22
Empirical methods/estimators

Tables (table) of relative frequencies (discrete models).

Empirical distribution function (ecdf):

\[
\varepsilon_n((-\infty, x]) \approx F(x) = P((-\infty, x]).
\]

Histograms (hist):

\[
\frac{\varepsilon_n([x, y])}{y - x} \approx f(z), \quad z \in [x, y].
\]

Quantiles (quantile): the q-quantile, x_q, for q ∈ [0, 1] is the value
such that a fraction q of the observations are ≤ x_q. Inverse distribution
function.
QQ-plot (qqplot, qqnorm) compares two distributions visually via
quantiles.
Boxplots (boxplot) summarise a few quantiles graphically. Useful
for comparing two or more distributions.
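A sketch applying these summaries to simulated normal data:

    x <- rnorm(100)
    plot(ecdf(x))                    # empirical distribution function
    hist(x)                          # histogram as a density estimate
    quantile(x, c(0.25, 0.5, 0.75))  # empirical quantiles
    qqnorm(x)                        # QQ-plot against the normal distribution
    boxplot(x, rnorm(100, 1))        # boxplots comparing two samples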
– p. 12/22
Summaries – mean
The expectation of a real valued random variable X (discrete or
continuous):

\[
EX = \sum_{x} x \, p(x), \qquad EX = \int x f(x) \, dx.
\]

The average of iid random variables X_1, ..., X_n approximates the
mean (frequency interpretation):

\[
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i \approx EX.
\]

Properties:

\[
E\hat{\mu} = EX \quad \text{(unbiased estimator)}, \qquad
V\hat{\mu} = \frac{1}{n} VX.
\]
– p. 13/22
Summaries – variance

The variance of a real valued random variable X:

\[
VX = E(X - EX)^2 = EX^2 - (EX)^2.
\]

The empirical variance based on iid random variables X_1, ..., X_n
approximates the variance (frequency interpretation):

\[
\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \approx VX.
\]

Properties:

\[
E\tilde{\sigma}^2 = \frac{n-1}{n} VX \quad \text{(biased estimator)}, \qquad
\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \quad \text{(bias corrected)}.
\]
– p. 14/22
Summaries – covariance etc.

The covariance of X and Y is

\[
V(X, Y) = E(X - EX)(Y - EY) = EXY - EX \, EY.
\]

Important formulas:

\[
E(cX) = c \, EX, \qquad
E(X + Y) = EX + EY, \qquad
V(cX) = c^2 VX,
\]
\[
V(X + Y) = VX + VY + 2 V(X, Y), \qquad
EXY = EX \, EY + V(X, Y).
\]

If X and Y are independent then V(X, Y) = 0.
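A simulation check of the variance-of-a-sum formula; the dependent pair below is an illustrative construction:

    ## Y is built from X so the covariance is nonzero.
    x <- rnorm(10000)
    y <- 0.5 * x + rnorm(10000)
    var(x + y)                       # left-hand side
    var(x) + var(y) + 2 * cov(x, y)  # right-hand side, approximately equal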


– p. 15/22
Models and estimators

A parameterised (statistical) model is a family of probability
measures (P_θ)_{θ∈Θ} on a sample space E.
An estimator is a map, θ̂ : E → Θ.
The properties of an estimator (from a statistical point of view) are
given by its distribution under P_θ.
Often the distribution is summarised by its mean, E_θ θ̂, and its
variance, V_θ θ̂, as a function of θ.
The mean squared error is one combined quality measure:

\[
\mathrm{MSE}_\theta(\hat{\theta}) =
\underbrace{V_\theta \hat{\theta}}_{\text{variance}} +
\underbrace{(\theta - E_\theta \hat{\theta})^2}_{\text{squared bias}}.
\]
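A simulation sketch of this decomposition for the 1/n variance estimator from p. 14, with θ = VX = 1:

    n <- 10
    est <- replicate(10000, {x <- rnorm(n); mean((x - mean(x))^2)})
    mean((est - 1)^2)                # simulated MSE
    var(est) + (1 - mean(est))^2     # variance + squared bias, approximately equal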

– p. 16/22
Construction of estimators

Least squares (lm) – optimisation of the squared differences.

Least squares linear regression (ordinary least squares = OLS):
with EX_i = β_0 + β_1 f(y_i), minimise

\[
\sum_{i=1}^{n} (x_i - \beta_0 - \beta_1 f(y_i))^2
\]

(see the lm sketch below).

More generally (non-linear least squares), with E_θ X_i = f(θ, y_i),
minimise

\[
\sum_{i=1}^{n} (x_i - f(\theta, y_i))^2.
\]

Ad hoc estimators.
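A minimal lm sketch on simulated data; f(y) = log(y) and the true parameters β_0 = 1, β_1 = 2 are illustrative assumptions.

    y <- runif(100, 1, 10)
    x <- 1 + 2 * log(y) + rnorm(100, sd = 0.5)
    fit <- lm(x ~ log(y))       # minimises sum (x_i - b0 - b1 log(y_i))^2
    coef(fit)                   # estimates of beta0 and beta1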

– p. 17/22
MLE

Maximum likelihood estimation (MLE) – optimisation of the likelihood
function.

If X_1, ..., X_n are iid on discrete E, sample space E^n, observation
x = (x_1, ..., x_n), parameterised family of point probabilities p_θ:

\[
L_x(\theta) = \prod_{i=1}^{n} p_\theta(x_i).
\]

If X_1, ..., X_n are iid on a continuous space R^d, sample space
(R^d)^n, observation x = (x_1, ..., x_n), parameterised family of
densities f_θ:

\[
L_x(\theta) = \prod_{i=1}^{n} f_\theta(x_i).
\]

– p. 18/22
Practical MLE

Work with the minus-log-likelihood ℓ_x(θ) = − log L_x(θ).

Analytic solution when parameters are continuous:
  Differentiate w.r.t. θ and find stationary points (derivative = 0).
  Check that the second derivative is > 0 (or by other means make
  sure it is a local minimum).
  Control the behaviour on the "boundary" of the parameter space.
Numerical solutions as implemented in glm (logistic regression etc.),
or as in the sketch below.
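A sketch of numerical minimisation of the minus-log-likelihood for iid Exp(λ) observations, where the analytic MLE 1/mean(x) is available for comparison:

    x <- rexp(50, rate = 2)
    minus.logL <- function(lambda) -sum(dexp(x, rate = lambda, log = TRUE))
    optimize(minus.logL, interval = c(0.01, 10))$minimum  # numerical MLE
    1 / mean(x)                                           # analytic MLE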

– p. 19/22
Example - local alignment

The maximal local alignment score S:

\[
P(S \le x) \approx \exp(-Knm \exp(-\lambda x))
\]

with parameters (λ, K) ∈ (0, ∞)^2.

This is a scale-location transformation of a standard Gumbel distribution
with location and scale parameters

\[
\frac{\log(Knm)}{\lambda}, \qquad \frac{1}{\lambda}.
\]

One can estimate the parameters using least squares linear
regression based upon scores from local alignment of randomly
selected, unrelated proteins or simulated protein sequences (different
underlying model assumptions).
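A sketch of the regression approach; simulated Gumbel scores stand in for real alignment scores, and λ = 0.3, K = 0.1, n = m = 100 are illustrative assumptions. Since −log(−log P(S ≤ x)) = λx − log(Knm), the parameters can be read off a straight-line fit:

    lambda <- 0.3; K <- 0.1; n <- 100; m <- 100
    N <- 500
    s <- log(K * n * m) / lambda - log(-log(runif(N))) / lambda  # Gumbel scores
    s <- sort(s)
    Fhat <- (1:N) / (N + 1)                  # empirical distribution function
    fit <- lm(-log(-log(Fhat)) ~ s)          # slope = lambda, intercept = -log(Knm)
    lambda.hat <- coef(fit)[2]
    K.hat <- exp(-coef(fit)[1]) / (n * m)
    c(lambda.hat, K.hat)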

– p. 20/22
Model control

Does a model fit?

A model does not fit the data if we can find a "point of view" on
the data that violates or contradicts the model assumptions –
even when we take random variation into account.
If the model doesn't fit, conclusions based upon the model
assumptions are questionable. Back to the start – invent a model that
fits.
Regression models are, for instance, investigated through the
residuals.
  Residual plot: plot (possibly standardised) residuals against fitted
  values and check for deviations from a plot of iid variables.
  QQ-plot of residuals to check marginal distributional assumptions.
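A sketch of residual-based model control for a simulated linear regression:

    y <- runif(100, 0, 10)
    x <- 1 + 2 * y + rnorm(100)
    fit <- lm(x ~ y)
    plot(fitted(fit), rstandard(fit))   # standardised residuals vs fitted values
    qqnorm(rstandard(fit))              # QQ-plot to check residual normality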

– p. 21/22
Confidence sets

Level 1 − α sets I(x) ⊂ Θ depending on the observation x such that

\[
P_\theta(\theta \in I(X)) \ge 1 - \alpha.
\]

Typical confidence intervals with θ̂ : E → R, µ(θ) = E_θ θ̂ and
σ(θ)^2 = V_θ θ̂:

\[
I(x) = [\hat{\theta}(x) - \sigma(\hat{\theta}(x)) z, \; \hat{\theta}(x) + \sigma(\hat{\theta}(x)) z]
\]

with z the (1 − α/2)-quantile in the N(0, 1)-distribution (α = 0.05,
z = 1.96).

Bootstrap: a procedure to compute confidence intervals, avoiding the
normal approximation and explicit knowledge about µ(θ) and σ(θ).
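A sketch of both constructions for the mean of Exp(1) variables; the sample size and replication count are arbitrary:

    x <- rexp(50)
    theta.hat <- mean(x)
    se <- sd(x) / sqrt(50)
    theta.hat + c(-1, 1) * 1.96 * se                      # normal approximation
    boot <- replicate(1000, mean(sample(x, replace = TRUE)))
    quantile(boot, c(0.025, 0.975))                       # bootstrap percentile CI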

– p. 22/22
