Statistical Model

Uploaded by

hasan jami


Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the
generation of sample data (and similar data from a larger population). A statistical model represents, often in
considerably idealized form, the data-generating process.[1]

A statistical model is usually specified as a mathematical relationship between one or more random
variables and other non-random variables. As such, a statistical model is "a formal representation of a
theory" (Herman Adèr quoting Kenneth Bollen).[2]

All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally,
statistical models are part of the foundation of statistical inference.

Contents
Introduction
Formal definition
An example
General remarks
Dimension of a model
Nested models
Comparing models
See also
Notes
References
Further reading

Introduction
Informally, a statistical model can be thought of as a statistical assumption (or set of statistical assumptions)
with a certain property: that the assumption allows us to calculate the probability of any event. As an
example, consider a pair of ordinary six-sided dice. We will study two different statistical assumptions
about the dice.

The first statistical assumption is this: for each of the dice, the probability of each face (1, 2, 3, 4, 5, and 6)
coming up is 1/6. From that assumption, we can calculate the probability of both dice coming up 5:
1/6 × 1/6 = 1/36. More generally, we can calculate the probability of any event: e.g. (1 and 2) or (3 and 3) or
(5 and 6).

The alternative statistical assumption is this: for each of the dice, the probability of the face 5 coming up is
1/8 (because the dice are weighted). From that assumption, we can calculate the probability of both dice
coming up 5: 1/8 × 1/8 = 1/64. We cannot, however, calculate the probability of any other nontrivial event, as
the probabilities of the other faces are unknown.

The first statistical assumption constitutes a statistical model because, with the assumption alone, we can
calculate the probability of any event. The alternative statistical assumption does not constitute a statistical
model because, with the assumption alone, we cannot calculate the probability of every event.
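The first assumption can be turned into a small calculation. The following is a minimal sketch, assuming the two dice are independent (which the 1/6 × 1/6 product already presumes) and reading the composite event as a set of ordered outcomes; the names `face_prob` and `event_probability` are illustrative, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Model under the first assumption: each face of each die has probability 1/6.
# Independence of the two dice is assumed here.
FACES = range(1, 7)
face_prob = {face: Fraction(1, 6) for face in FACES}

def event_probability(event):
    """Sum the probabilities of all ordered outcomes (d1, d2) that lie in the event."""
    return sum(
        face_prob[d1] * face_prob[d2]
        for d1, d2 in product(FACES, repeat=2)
        if event(d1, d2)
    )

# Both dice come up 5.
both_five = event_probability(lambda d1, d2: d1 == 5 and d2 == 5)

# The composite event from the text, read as ordered outcomes:
# (1 and 2) or (3 and 3) or (5 and 6).
composite = event_probability(
    lambda d1, d2: (d1, d2) in {(1, 2), (3, 3), (5, 6)}
)

print(both_five)  # 1/36
print(composite)  # 1/12
```

Under the alternative assumption the same function cannot be used, because `face_prob` could only be filled in for the face 5; the remaining entries are unknown, which is exactly why that assumption does not constitute a model.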

In the example above, with the first assumption, calculating the probability of an event is easy. With some
other examples, though, the calculation can be difficult, or even impractical (e.g. it might require millions of
years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable:
doing the calculation does not need to be practicable, just theoretically possible.

Formal definition
In mathematical terms, a statistical model is usually thought of as a pair $(S, \mathcal{P})$, where $S$ is the set of
possible observations, i.e. the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$.[3]

The intuition behind this definition is as follows. It is assumed that there is a "true" probability distribution
induced by the process that generates the observed data. We choose $\mathcal{P}$ to represent a set (of distributions)
which contains a distribution that adequately approximates the true distribution.

Note that we do not require that $\mathcal{P}$ contains the true distribution, and in practice that is rarely the case.
Indeed, as Burnham & Anderson state, "A model is a simplification or approximation of reality and hence
will not reflect all of reality"[4]; hence the saying "all models are wrong".

The set $\mathcal{P}$ is almost always parameterized: $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The set $\Theta$ defines the parameters of the
model. A parameterization is generally required to have distinct parameter values give rise to distinct
distributions, i.e. $P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$ must hold (in other words, it must be injective). A
parameterization that meets the requirement is said to be identifiable.[3]
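To make the identifiability requirement concrete, here is a small sketch of a parameterization that fails it. The example is an assumption chosen for illustration, not one from the original text: a Gaussian family indexed by a mean and a scale `s` that is allowed to be any nonzero real number, so that `s` and `-s` give the same distribution. Comparing densities on a handful of points is only a spot check, not a proof.

```python
import math

def gaussian_pdf(x, mu, s):
    """Density of a Gaussian with mean mu and variance s**2 (s is any nonzero real)."""
    var = s * s
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Two distinct parameter values, (0, 1.5) and (0, -1.5), give the same density,
# so the map from parameters to distributions is not injective.
same = all(
    math.isclose(gaussian_pdf(x, 0.0, 1.5), gaussian_pdf(x, 0.0, -1.5))
    for x in xs
)

# With the scale restricted to s > 0, distinct values differ at some point.
differ = any(
    not math.isclose(gaussian_pdf(x, 0.0, 1.5), gaussian_pdf(x, 0.0, 2.0))
    for x in xs
)

print(same, differ)  # True True
```

Restricting the parameter set to positive scales removes the duplication, which is the usual way such a family is made identifiable.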

An example
Suppose that we have a population of children, with the ages of the children distributed uniformly in the
population. The height of a child will be stochastically related to the age: e.g. when we know that a child is
of age 7, this influences the chance of the child being 1.5 meters tall. We could formalize that relationship in
a linear regression model, like this: $\text{height}_i = b_0 + b_1 \text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is a parameter
that age is multiplied by to obtain a prediction of height, $\varepsilon_i$ is the error term, and i identifies the child. This
implies that height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus, a straight line ($\text{height}_i = b_0 + b_1 \text{age}_i$)
cannot be the equation for a model of the data, unless it exactly fits all the data points, i.e. all the data
points lie perfectly on the line. The error term, $\varepsilon_i$, must be included in the equation, so that the model is
consistent with all the data points.

To do statistical inference, we would first need to assume some probability distributions for the $\varepsilon_i$. For
instance, we might assume that the $\varepsilon_i$ distributions are i.i.d. Gaussian, with zero mean. In this instance, the
model would have 3 parameters: $b_0$, $b_1$, and the variance of the Gaussian distribution.

We can formally specify the model in the form $(S, \mathcal{P})$ as follows. The sample space, $S$, of our model
comprises the set of all possible pairs (age, height). Each possible value of $\theta = (b_0, b_1, \sigma^2)$ determines a
distribution on $S$; denote that distribution by $P_\theta$. If $\Theta$ is the set of all possible values of $\theta$, then
$\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. (The parameterization is identifiable, and this is easy to check.)

In this example, the model is determined by (1) specifying $S$ and (2) making some assumptions relevant to
$\mathcal{P}$. There are two assumptions: that height can be approximated by a linear function of age; that errors in
the approximation are distributed as i.i.d. Gaussian. The assumptions are sufficient to specify $\mathcal{P}$, as they
are required to do.
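The height example can be simulated and fitted in a few lines. This is a sketch under stated assumptions: the "true" values of the three parameters, the 2 to 12 year age range and the sample size are all hypothetical numbers picked for illustration, and ordinary least squares is used as one common way to estimate the intercept and slope.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" parameter values, chosen only for illustration.
B0, B1, SIGMA = 0.75, 0.06, 0.05  # metres, metres per year, metres

# Ages drawn uniformly, as in the text; the 2-12 year range is an assumption.
ages = [random.uniform(2, 12) for _ in range(500)]
# height_i = b0 + b1 * age_i + eps_i, with eps_i i.i.d. zero-mean Gaussian.
heights = [B0 + B1 * a + random.gauss(0.0, SIGMA) for a in ages]

def fit_line(x, y):
    """Least-squares estimates of the intercept, slope and error variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss / (n - 2)  # variance estimate with n - 2 degrees of freedom

b0_hat, b1_hat, var_hat = fit_line(ages, heights)
print(b0_hat, b1_hat, var_hat)
```

The three returned numbers are estimates of the three model parameters; with a reasonably large sample they should land close to the values used to generate the data.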

General remarks
A statistical model is a special class of mathematical model. What distinguishes a statistical model from
other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model
specified via mathematical equations, some of the variables do not have specific values, but instead have
probability distributions; i.e. some of the variables are stochastic. In the above example with children's
heights, ε is a stochastic variable; without that stochastic variable, the model would be deterministic.

Statistical models are often used even when the data-generating process being modeled is deterministic. For
instance, coin tossing is, in principle, a deterministic process; yet it is commonly modeled as stochastic (via
a Bernoulli process).
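A Bernoulli process for coin tossing can be sketched as follows. The function name `bernoulli_process`, the success probability 0.5 and the number of tosses are illustrative assumptions. The example also mirrors the point made above: the pseudo-random generator is itself deterministic given its seed, yet its output is treated as stochastic.

```python
import random

def bernoulli_process(p, n, rng):
    """Return n independent tosses: 1 (heads) with probability p, otherwise 0."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(42)  # seeded generator: deterministic, but modeled as random
tosses = bernoulli_process(0.5, 10_000, rng)
heads_fraction = sum(tosses) / len(tosses)
print(heads_fraction)
```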

Choosing an appropriate statistical model to represent a given data-generating process is sometimes
extremely difficult, and may require knowledge of both the process and relevant statistical analyses.
Relatedly, the statistician Sir David Cox has said, "How [the] translation from subject-matter problem to
statistical model is done is often the most critical part of an analysis".[5]

There are three purposes for a statistical model, according to Konishi & Kitagawa.[6]

Predictions
Extraction of information
Description of stochastic structures

Those three purposes are essentially the same as the three purposes indicated by Friendly & Meyer:
prediction, estimation, description.[7] The three purposes correspond with the three kinds of logical
reasoning: deductive reasoning, inductive reasoning, abductive reasoning.

Dimension of a model
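Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The model is said to be
parametric if $\Theta$ has a finite dimension. In notation, we write that $\Theta \subseteq \mathbb{R}^k$ where k is a positive integer ($\mathbb{R}$
denotes the real numbers; other sets can be used, in principle). Here, k is called the dimension of the
model.

As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming
that

$\mathcal{P} = \left\{ P_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}$.

In this example, the dimension, k, equals 2.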

As another example, suppose that the data consists of points (x, y) that we assume are distributed according
to a straight line with i.i.d. Gaussian residuals (with zero mean): this leads to the same statistical model as
was used in the example with children's heights. The dimension of the statistical model is 3: the intercept of
the line, the slope of the line, and the variance of the distribution of the residuals. (Note that in geometry, a
straight line has dimension 1.)

Although formally $\theta \in \Theta$ is a single parameter that has dimension k, it is sometimes regarded as
comprising k separate parameters. For example, with the univariate Gaussian distribution, $\theta$ is formally a
single parameter with dimension 2, but it is sometimes regarded as comprising 2 separate parameters: the
mean and the standard deviation.

A statistical model is nonparametric if the parameter set $\Theta$ is infinite dimensional. A statistical model is
semiparametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if k is the
dimension of $\Theta$ and n is the number of samples, both semiparametric and nonparametric models have
$k \to \infty$ as $n \to \infty$. If $k/n \to 0$ as $n \to \infty$, then the model is semiparametric; otherwise, the model is
nonparametric.

Parametric models are by far the most commonly used statistical models. Regarding semiparametric and
nonparametric models, Sir David Cox has said, "These typically involve fewer assumptions of structure
and distributional form but usually contain strong assumptions about independencies".[8]

Nested models
Two statistical models are nested if the first model can be transformed into the second model by imposing
constraints on the parameters of the first model. As an example, the set of all Gaussian distributions has,
nested within it, the set of zero-mean Gaussian distributions: we constrain the mean in the set of all
Gaussian distributions to get the zero-mean distributions. As a second example, the quadratic model

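$y = b_0 + b_1 x + b_2 x^2 + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$

has, nested within it, the linear model

$y = b_0 + b_1 x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$

which is obtained by constraining the parameter $b_2$ to equal 0.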

In both those examples, the first model has a higher dimension than the second model (for the first example,
the zero-mean model has dimension 1). Such is often, but not always, the case. As a different example, the
set of positive-mean Gaussian distributions, which has dimension 2, is nested within the set of all Gaussian
distributions.

Comparing models
Comparing statistical models is fundamental for much of statistical inference. Indeed, Konishi & Kitagawa
(2008, p. 75) state this: "The majority of the problems in statistical inference can be considered to be
problems related to statistical modeling. They are typically formulated as comparisons of several statistical
models."

Common criteria for comparing models include the following: $R^2$, Bayes factor, Akaike information
criterion, and the likelihood-ratio test together with its generalization, the relative likelihood.
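Three of these criteria can be computed by hand for a pair of nested Gaussian models. The sketch below is an illustration under assumptions: the data are simulated from hypothetical parameter values, and the pair compared is an intercept-only model (slope constrained to 0, dimension 2) against the linear model (dimension 3), chosen because both have closed-form fits. It uses the standard definitions AIC = 2k − 2 ln L and likelihood-ratio statistic = 2(ln L_full − ln L_constrained), with the Gaussian log-likelihood maximised at variance RSS/n.

```python
import math
import random

def max_log_likelihood(rss, n):
    """Maximised Gaussian log-likelihood, given the residual sum of squares."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln(L), where k is the model dimension."""
    return 2 * k - 2 * log_lik

random.seed(1)
x = [i / 10 for i in range(100)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]  # hypothetical data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Constrained model: y = b0 + e (slope fixed at 0); parameters b0 and variance.
rss0 = sum((yi - my) ** 2 for yi in y)

# Full model: y = b0 + b1 x + e; parameters b0, b1 and variance.
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
rss1 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

ll0, ll1 = max_log_likelihood(rss0, n), max_log_likelihood(rss1, n)
lr_statistic = 2 * (ll1 - ll0)  # never negative for nested models
r_squared = 1 - rss1 / rss0     # R^2 of the linear model

print(aic(ll0, 2), aic(ll1, 3), lr_statistic, r_squared)
```

A lower AIC indicates the preferred model. Because the models are nested, the full model's maximised likelihood can never be smaller than the constrained model's, so the AIC penalty term 2k is what keeps the larger model from winning automatically.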

See also
All models are wrong
Blockmodel
Conceptual model
Design of experiments
Deterministic model
Effective theory
Predictive model
Response modeling methodology
Scientific model
Statistical inference
Statistical model specification
Statistical model validation
Statistical theory
Stochastic process

Notes
1. Cox 2006, p. 178
2. Adèr 2008, p. 280 (https://books.google.com/books?id=LCnOj4ZFyjkC&pg=PA280)
3. McCullagh 2002
4. Burnham & Anderson 2002, §1.2.5
5. Cox 2006, p. 197
6. Konishi & Kitagawa 2008, §1.1
7. Friendly & Meyer 2016, §11.6
8. Cox 2006, p. 2

References
Adèr, H. J. (2008), "Modelling", in Adèr, H. J.; Mellenbergh, G. J. (eds.), Advising on Research Methods: A consultant's companion, Huizen, The Netherlands: Johannes van Kessel Publishing, pp. 271–304.
Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference (2nd ed.), Springer-Verlag.
Cox, D. R. (2006), Principles of Statistical Inference, Cambridge University Press.
Friendly, M.; Meyer, D. (2016), Discrete Data Analysis with R, Chapman & Hall.
Konishi, S.; Kitagawa, G. (2008), Information Criteria and Statistical Modeling, Springer.
McCullagh, P. (2002), "What is a statistical model?" (http://www.stat.uchicago.edu/~pmcc/pubs/AOS023.pdf) (PDF), Annals of Statistics, 30 (5): 1225–1310, doi:10.1214/aos/1035844977 (https://doi.org/10.1214%2Faos%2F1035844977).

Further reading
Davison, A. C. (2008), Statistical Models, Cambridge University Press
Drton, M.; Sullivant, S. (2007), "Algebraic statistical models" (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A17n41.pdf) (PDF), Statistica Sinica, 17: 1273–1297
Freedman, D. A. (2009), Statistical Models, Cambridge University Press
Helland, I. S. (2010), Steps Towards a Unified Basis for Scientific Models and Methods, World Scientific
Kroese, D. P.; Chan, J. C. C. (2014), Statistical Modeling and Computation, Springer
Shmueli, G. (2010), "To explain or to predict?", Statistical Science, 25 (3): 289–310, arXiv:1101.0891 (https://arxiv.org/abs/1101.0891), doi:10.1214/10-STS330 (https://doi.org/10.1214%2F10-STS330)
Retrieved from "https://en.wikipedia.org/w/index.php?title=Statistical_model&oldid=1083915276"

This page was last edited on 21 April 2022, at 14:44 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 3.0; additional terms may apply. By
using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the
Wikimedia Foundation, Inc., a non-profit organization.
