
Statistics BI

Models of random outcomes. What is a model?

Probabilities

How do we investigate model properties?

Transformations (simulations)

How do we turn data into a model?

Estimation

How do we investigate estimation procedures and model validity?

Distributions (=probabilities)

– p. 1/22
Course Theme

Probabilistic foundation of models.

Probabilistically founded methods.

Probabilistic analysis of methods.

– p. 2/22
Probability Themes

Models (probability measures and sample spaces).
  Discrete sample space and point probabilities.
  Continuous sample space (R, R^d), distribution functions and densities.
  Dependence and independence.

Transformations.
  Transform observations and probability measures.
  Example: Estimators.

Simulations.
  Tool to investigate/compute properties of a probability measure.
  Tool for studying transformed probability measures.
  Examples: Distribution of estimators – bootstrapping.

– p. 3/22
Discrete models

Discrete sample space E and point probabilities p(x) ∈ [0, 1], x ∈ E, with

\[
\sum_{x} p(x) = 1, \qquad P(A) = \sum_{x \in A} p(x).
\]

Independent and identically distributed replications (iid): sample
space E^n discrete, observations x = (x_1, ..., x_n), point probabilities

\[
p(x) = p(x_1) p(x_2) \cdots p(x_n) = \prod_{i=1}^{n} p(x_i).
\]

– p. 4/22
Discrete models

The relative frequency approximates the probability (frequency
interpretation):

\[
\varepsilon_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i = x) \approx p(x).
\]

Empirical estimator:

\[
\hat{p}(x) = \varepsilon_n(x).
\]

Properties:

\[
n \hat{p}(x) \sim \mathrm{Bin}(n, p(x)), \qquad
E\hat{p}(x) = p(x), \qquad
V\hat{p}(x) = \frac{p(x)(1 - p(x))}{n}.
\]
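A minimal R sketch of the empirical estimator; the point probabilities of the loaded die below are illustrative assumptions.

    ## Estimate point probabilities by relative frequencies (table).
    p <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)          # assumed true point probabilities
    n <- 1000
    x <- sample(1:6, n, replace = TRUE, prob = p)  # n iid observations
    p.hat <- table(factor(x, levels = 1:6)) / n    # empirical estimator
    p.hat                                          # compare with p
    p * (1 - p) / n                                # theoretical variance of p.hat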

– p. 5/22
Continuous models

Continuous sample space R^d and densities f : R^d → [0, ∞), with

\[
\int f(x) \, dx = 1, \qquad P(A) = \int_A f(x) \, dx.
\]

Independent and identically distributed replications (iid): sample
space (R^d)^n continuous, observations x = (x_1, ..., x_n), density

\[
f(x) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).
\]

– p. 6/22
Continuous models

The relative frequency approximates the probability (frequency
interpretation):

\[
\varepsilon_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \in A) \approx P(A) = \int_A f(x) \, dx.
\]

Empirical estimator:

\[
\hat{P}(A) = \varepsilon_n(A).
\]

Properties:

\[
n \hat{P}(A) \sim \mathrm{Bin}(n, P(A)), \qquad
E\hat{P}(A) = P(A), \qquad
V\hat{P}(A) = \frac{P(A)(1 - P(A))}{n}.
\]
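A minimal R sketch, assuming standard normal observations and the arbitrary event A = [1, 2]:

    ## Relative frequency of A versus the exact probability P(A).
    n <- 1000
    x <- rnorm(n)
    P.hat <- mean(x >= 1 & x <= 2)         # empirical estimator of P(A)
    P.A   <- pnorm(2) - pnorm(1)           # exact P(A) under the model
    c(P.hat, P.A, P.A * (1 - P.A) / n)     # estimate, truth, variance of P.hat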

– p. 7/22
Dependence and independence

The conditional probability of A given B is

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]

Two events, A and B, are independent if

\[
P(A \cap B) = P(A) P(B).
\]

Two random variables X_1 and X_2 are independent if

\[
P(X_1 \in A, X_2 \in B) = P(X_1 \in A) P(X_2 \in B).
\]

The distribution of the random variables X_1, ..., X_n taking values in
E is a probability measure on E^n. They are iid if they are independent
and identically distributed (identical marginal distributions).
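A quick simulation check of the product formula, with arbitrary events A = [0, 0.5] and B = [0.3, 1] for independent uniforms:

    ## For independent X1, X2 the joint frequency should match the product.
    n  <- 100000
    x1 <- runif(n); x2 <- runif(n)
    mean(x1 <= 0.5 & x2 >= 0.3)            # joint relative frequency
    mean(x1 <= 0.5) * mean(x2 >= 0.3)      # product of marginal frequencies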
– p. 8/22
Simulations

The computer can simulate almost any random variable.


First it generates (one or more) iid random variables uniformly
distributed on [0, 1].
Then these variables are transformed.
A typical transformation is by the generalised inverse, F^← : [0, 1] → R,
of a distribution function F : R → [0, 1].
This makes it possible to study, empirically, the distribution of almost
any transformation.
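A sketch of the generalised inverse transformation for the exponential distribution, where F^←(u) = −log(1 − u)/λ:

    ## Simulate Exp(lambda) variables from uniforms via the inverse cdf.
    lambda <- 2
    u <- runif(1000)                 # iid uniforms on [0, 1]
    x <- -log(1 - u) / lambda        # transformed to Exp(lambda) variables
    qqplot(x, rexp(1000, lambda))    # visually compare with R's own simulator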

– p. 9/22
R and standard distributions

Some of the standard distributions in R are: unif, norm, exp,
binom, pois, beta, logis.

p for the distribution function, e.g. punif(0.5), punif(2,0,10).
d for the density function or point probabilities, e.g. dunif(0.5).
q for the quantile function, e.g. qunif(0.4).
r for simulation, e.g. runif(100), runif(100,0,10).
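The same naming scheme for the normal distribution, as a short illustration:

    pnorm(1.96)                 # distribution function: P(X <= 1.96)
    dnorm(0)                    # density at 0
    qnorm(0.975)                # 0.975-quantile (approx. 1.96)
    rnorm(5, mean = 0, sd = 1)  # five simulated N(0,1) variables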

– p. 10/22
Statistical themes
Empirical methods
  Summaries of iid observations.
  Simple/intuitive estimators to infer unknown quantities.
Parameterised models and estimators
  Formulation of what is assumed (model assumptions) and what is
  not (parameters).
  Formulation of parametric dependencies and inference of the
  parameters from data (non-iid assumption).
Maximum likelihood estimation
  Choose the parameter that maximises how likely the
  observations are.
  A de facto standard with provably nice asymptotic properties
  (large number of observations and small number of parameters).
Model control. Investigate the model assumptions.
– p. 11/22
Empirical methods/estimators

Tables (table) of relative frequencies (discrete models).

Empirical distribution function (ecdf):

\[
\varepsilon_n((-\infty, x]) \approx F(x) = P((-\infty, x]).
\]

Histograms (hist):

\[
\frac{\varepsilon_n([x, y])}{y - x} \approx f(z), \quad z \in [x, y].
\]

Quantiles (quantile): the q-quantile, x_q, for q ∈ [0, 1] is the value
such that a fraction q of the observations are ≤ x_q. Inverse distribution
function.
QQ-plot (qqplot, qqnorm) compares two distributions visually via
quantiles.
Boxplots (boxplot) summarise a few quantiles graphically. Useful
for comparing two or more distributions.
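A sketch applying these summaries to simulated normal data:

    x <- rnorm(100)
    plot(ecdf(x))                    # empirical distribution function
    hist(x)                          # histogram as a density estimate
    quantile(x, c(0.25, 0.5, 0.75))  # empirical quantiles
    qqnorm(x)                        # QQ-plot against the normal distribution
    boxplot(x, rnorm(100, 1))        # boxplots comparing two samples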
– p. 12/22
Summaries – mean
The expectation of a real valued random variable X (discrete or
continuous):

\[
EX = \sum_{x} x \, p(x), \qquad EX = \int x f(x) \, dx.
\]

The average of iid random variables X_1, ..., X_n approximates the
mean (frequency interpretation):

\[
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i \approx EX.
\]

Properties:

\[
E\hat{\mu} = EX \quad \text{(unbiased estimator)}, \qquad
V\hat{\mu} = \frac{1}{n} VX.
\]
– p. 13/22
Summaries – variance

The variance of a real valued random variable X:

\[
VX = E(X - EX)^2 = EX^2 - (EX)^2.
\]

The empirical variance based on iid random variables X_1, ..., X_n
approximates the variance (frequency interpretation):

\[
\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \approx VX.
\]

Properties:

\[
E\tilde{\sigma}^2 = \frac{n-1}{n} VX \quad \text{(biased estimator)}, \qquad
\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \quad \text{(bias corrected)}.
\]
– p. 14/22
Summaries – covariance etc.

The covariance of X and Y is

\[
V(X, Y) = E(X - EX)(Y - EY) = EXY - EX \, EY.
\]

Important formulas:

\[
E(cX) = c \, EX, \qquad
E(X + Y) = EX + EY, \qquad
V(cX) = c^2 VX,
\]
\[
V(X + Y) = VX + VY + 2 V(X, Y), \qquad
EXY = EX \, EY + V(X, Y).
\]

If X and Y are independent then V(X, Y) = 0.
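A simulation check of the variance-of-a-sum formula; the dependent pair below is an illustrative construction:

    ## Y is built from X so the covariance is nonzero.
    x <- rnorm(10000)
    y <- 0.5 * x + rnorm(10000)
    var(x + y)                       # left-hand side
    var(x) + var(y) + 2 * cov(x, y)  # right-hand side, approximately equal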


– p. 15/22
Models and estimators

A parameterised (statistical) model is a family of probability
measures (P_θ)_{θ∈Θ} on a sample space E.
An estimator is a map, θ̂ : E → Θ.
The properties of an estimator (from a statistical point of view) are
given by its distribution under P_θ.
Often the distribution is summarised by its mean, E_θ θ̂, and its
variance, V_θ θ̂, as a function of θ.
The mean squared error is one combined quality measure:

\[
\mathrm{MSE}_\theta(\hat{\theta}) =
\underbrace{V_\theta \hat{\theta}}_{\text{variance}} +
\underbrace{(\theta - E_\theta \hat{\theta})^2}_{\text{squared bias}}.
\]
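A simulation sketch of this decomposition for the 1/n variance estimator from p. 14, with θ = VX = 1:

    n <- 10
    est <- replicate(10000, {x <- rnorm(n); mean((x - mean(x))^2)})
    mean((est - 1)^2)                # simulated MSE
    var(est) + (1 - mean(est))^2     # variance + squared bias, approximately equal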

– p. 16/22
Construction of estimators

Least squares (lm) – optimisation of the squared differences.

Least squares linear regression (ordinary least squares = OLS):
with EX_i = β_0 + β_1 f(y_i), minimise

\[
\sum_{i=1}^{n} (x_i - \beta_0 - \beta_1 f(y_i))^2
\]

(see the lm sketch below).

More generally (non-linear least squares), with E_θ X_i = f(θ, y_i),
minimise

\[
\sum_{i=1}^{n} (x_i - f(\theta, y_i))^2.
\]

Ad hoc estimators.
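A minimal lm sketch on simulated data; f(y) = log(y) and the true parameters β_0 = 1, β_1 = 2 are illustrative assumptions.

    y <- runif(100, 1, 10)
    x <- 1 + 2 * log(y) + rnorm(100, sd = 0.5)
    fit <- lm(x ~ log(y))       # minimises sum (x_i - b0 - b1 log(y_i))^2
    coef(fit)                   # estimates of beta0 and beta1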

– p. 17/22
MLE

Maximum likelihood estimation (MLE) – optimisation of the likelihood
function.

If X_1, ..., X_n are iid on discrete E, sample space E^n, observation
x = (x_1, ..., x_n), parameterised family of point probabilities p_θ:

\[
L_x(\theta) = \prod_{i=1}^{n} p_\theta(x_i).
\]

If X_1, ..., X_n are iid on a continuous space R^d, sample space
(R^d)^n, observation x = (x_1, ..., x_n), parameterised family of
densities f_θ:

\[
L_x(\theta) = \prod_{i=1}^{n} f_\theta(x_i).
\]

– p. 18/22
Practical MLE

Work with the minus-log-likelihood ℓ_x(θ) = − log L_x(θ).

Analytic solution when parameters are continuous:
  Differentiate w.r.t. θ and find stationary points (derivative = 0).
  Check that the second derivative is > 0 (or by other means make
  sure it is a local minimum).
  Control the behaviour on the "boundary" of the parameter space.
Numerical solutions as implemented in glm (logistic regression etc.),
or as in the sketch below.
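A sketch of numerical minimisation of the minus-log-likelihood for iid Exp(λ) observations, where the analytic MLE 1/mean(x) is available for comparison:

    x <- rexp(50, rate = 2)
    minus.logL <- function(lambda) -sum(dexp(x, rate = lambda, log = TRUE))
    optimize(minus.logL, interval = c(0.01, 10))$minimum  # numerical MLE
    1 / mean(x)                                           # analytic MLE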

– p. 19/22
Example - local alignment

The maximal local alignment score S:

\[
P(S \le x) \approx \exp(-Knm \exp(-\lambda x))
\]

with parameters (λ, K) ∈ (0, ∞)^2.

This is a scale-location transformation of a standard Gumbel distribution
with location and scale parameters

\[
\frac{\log(Knm)}{\lambda}, \qquad \frac{1}{\lambda}.
\]

One can estimate the parameters using least squares linear
regression based upon scores from local alignment of randomly
selected, unrelated proteins or simulated protein sequences (different
underlying model assumptions).
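A sketch of the regression approach; simulated Gumbel scores stand in for real alignment scores, and λ = 0.3, K = 0.1, n = m = 100 are illustrative assumptions. Since −log(−log P(S ≤ x)) = λx − log(Knm), the parameters can be read off a straight-line fit:

    lambda <- 0.3; K <- 0.1; n <- 100; m <- 100
    N <- 500
    s <- log(K * n * m) / lambda - log(-log(runif(N))) / lambda  # Gumbel scores
    s <- sort(s)
    Fhat <- (1:N) / (N + 1)                  # empirical distribution function
    fit <- lm(-log(-log(Fhat)) ~ s)          # slope = lambda, intercept = -log(Knm)
    lambda.hat <- coef(fit)[2]
    K.hat <- exp(-coef(fit)[1]) / (n * m)
    c(lambda.hat, K.hat)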

– p. 20/22
Model control

Does a model fit?

A model does not fit the data if we can find a "point of view" on
the data that violates or contradicts the model assumptions –
even when we take random variation into account.
If the model doesn't fit, conclusions based upon the model
assumptions are questionable. Back to the start – invent a model that
fits.
Regression models are, for instance, investigated through the
residuals.
  Residual plot: plot (possibly standardised) residuals against fitted
  values and check for deviations from a plot of iid variables.
  QQ-plot of residuals to check marginal distributional assumptions.
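A sketch of residual-based model control for a simulated linear regression:

    y <- runif(100, 0, 10)
    x <- 1 + 2 * y + rnorm(100)
    fit <- lm(x ~ y)
    plot(fitted(fit), rstandard(fit))   # standardised residuals vs fitted values
    qqnorm(rstandard(fit))              # QQ-plot to check residual normality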

– p. 21/22
Confidence sets

Level 1 − α sets I(x) ⊂ Θ depending on the observation x such that

\[
P_\theta(\theta \in I(X)) \ge 1 - \alpha.
\]

Typical confidence intervals with θ̂ : E → R, µ(θ) = E_θ θ̂ and
σ(θ)^2 = V_θ θ̂:

\[
I(x) = [\hat{\theta}(x) - \sigma(\hat{\theta}(x)) z, \; \hat{\theta}(x) + \sigma(\hat{\theta}(x)) z]
\]

with z the (1 − α/2)-quantile in the N(0, 1)-distribution (α = 0.05,
z = 1.96).

Bootstrap: a procedure to compute confidence intervals, avoiding the
normal approximation and explicit knowledge about µ(θ) and σ(θ).
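A sketch of both constructions for the mean of Exp(1) variables; the sample size and replication count are arbitrary:

    x <- rexp(50)
    theta.hat <- mean(x)
    se <- sd(x) / sqrt(50)
    theta.hat + c(-1, 1) * 1.96 * se                      # normal approximation
    boot <- replicate(1000, mean(sample(x, replace = TRUE)))
    quantile(boot, c(0.025, 0.975))                       # bootstrap percentile CI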

– p. 22/22
