
Inferential statistics

Main article: Statistical inference


Statistical inference is the process of using data analysis to deduce properties of
an underlying probability distribution.[52] Inferential statistical analysis infers properties
of a population, for example by testing hypotheses and deriving estimates. It is
assumed that the observed data set is sampled from a larger population. Inferential
statistics can be contrasted with descriptive statistics. Descriptive statistics is solely
concerned with properties of the observed data, and it does not rest on the
assumption that the data come from a larger population.[53]

Terminology and theory of inferential statistics


Statistics, estimators and pivotal quantities
Consider independent identically distributed (IID) random variables with a
given probability distribution: standard statistical inference and estimation
theory defines a random sample as the random vector given by the column vector of
these IID variables.[54] The population being examined is described by a probability
distribution that may have unknown parameters.
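
As an illustrative sketch (assuming NumPy; the normal distribution and the parameter values are stand-ins for a population whose parameters would be unknown in practice), an IID random sample can be simulated like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# IID random sample of size n from a normal population. In a real
# analysis mu and sigma would be unknown; they are fixed here only
# so that the sample can be simulated.
mu, sigma, n = 5.0, 2.0, 100
sample = rng.normal(mu, sigma, size=n)
```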

A statistic is a random variable that is a function of the random sample, but not a
function of unknown parameters. The probability distribution of the statistic, though,
may have unknown parameters. Consider now a function of the unknown parameter:
an estimator is a statistic used to estimate that function. Commonly used estimators
include the sample mean, the unbiased sample variance and the sample covariance.
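
A minimal sketch of these three estimators, assuming NumPy and simulated data (the true parameters would be unknown in practice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=100)               # IID sample
y = 0.5 * x + rng.normal(0.0, 1.0, size=100)     # second variable, correlated with x

x_bar = x.mean()                     # sample mean: estimates the population mean
s2 = x.var(ddof=1)                   # unbiased sample variance (divides by n - 1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]  # sample covariance between x and y
```

Each of these is a function of the data alone; none requires knowing the true parameters, which is exactly what makes them statistics.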

A random variable that is a function of the random sample and of the unknown
parameter, but whose probability distribution does not depend on the unknown
parameter, is called a pivotal quantity or pivot. Widely used pivots include the
z-score, the chi-squared statistic and Student's t-value.
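
The classic example is the t pivot for a normal sample with unknown mean and variance:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},$$

where $\bar{X}$ is the sample mean and $S$ the sample standard deviation. $T$ involves the unknown $\mu$, yet its distribution (Student's t with $n-1$ degrees of freedom) does not depend on $\mu$, which is what makes it a pivot and lets it be inverted into a confidence interval.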

Between two estimators of a given parameter, the one with lower mean squared
error is said to be more efficient. Furthermore, an estimator is said to be unbiased if
its expected value is equal to the true value of the unknown parameter being
estimated, and asymptotically unbiased if its expected value converges in the limit to
the true value of that parameter.
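
The comparison rests on the standard bias–variance decomposition of mean squared error:

$$\operatorname{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \big(\operatorname{Bias}(\hat{\theta})\big)^2.$$

For unbiased estimators the bias term vanishes, so comparing their efficiency reduces to comparing their variances.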

Other desirable properties for estimators include: UMVUE (uniformly minimum-variance
unbiased) estimators, which have the lowest variance for all possible values of the
parameter to be estimated (this is usually an easier property to verify than efficiency),
and consistent estimators, which converge in probability to the true value of that parameter.
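
In symbols, consistency says the estimator concentrates around the true value as the sample grows:

$$\Pr\big(|\hat{\theta}_n - \theta| > \varepsilon\big) \to 0 \quad \text{as } n \to \infty, \text{ for every } \varepsilon > 0.$$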

This still leaves the question of how to obtain estimators in a given situation and
carry out the computation; several methods have been proposed: the method of
moments, the maximum likelihood method, the least squares method and the more
recent method of estimating equations.
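
As a hedged sketch of the first two methods (assuming NumPy and SciPy; the gamma distribution and its parameter values are chosen only for illustration), both can be applied to the same simulated sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.gamma(shape=3.0, scale=2.0, size=1000)  # parameters unknown in practice

# Method of moments: match E[X] = k*theta and Var[X] = k*theta^2
# to the sample mean and variance, then solve for (k, theta).
m, v = data.mean(), data.var(ddof=1)
theta_mom = v / m
k_mom = m / theta_mom

# Maximum likelihood: SciPy maximises the likelihood numerically;
# loc is fixed at 0 so the fit matches the two-parameter gamma above.
k_mle, _, theta_mle = stats.gamma.fit(data, floc=0)
```

The two methods generally give different (though often close) estimates; least squares is illustrated in the regression example further below.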

Null hypothesis and alternative hypothesis


Interpretation of statistical information can often involve the development of a null
hypothesis which is usually (but not necessarily) that no relationship exists among
variables or that no change occurred over time.[55][56]
The best illustration for a novice is the predicament encountered in a criminal trial.
The null hypothesis, H0, asserts that the defendant is innocent, whereas the
alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes
because of suspicion of guilt. The H0 (status quo) stands in opposition to H1 and
is maintained unless H1 is supported by evidence "beyond a reasonable doubt".
However, "failure to reject H0" in this case does not imply innocence, but merely that
the evidence was insufficient to convict. So the jury does not
necessarily accept H0 but fails to reject H0. While one cannot "prove" a null
hypothesis, one can test how close it is to being true with a power test, which tests
for type II errors.
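
A minimal sketch of this logic as a one-sample t-test, assuming SciPy and simulated data (the effect size 0.3 and the 5% significance level are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(0.3, 1.0, size=50)  # data; H0 claims the mean is 0

# H0: population mean = 0; H1: population mean != 0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

# Reject H0 only if the evidence is strong enough; otherwise we
# "fail to reject" H0 rather than accept it, mirroring the jury.
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```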

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.

Error
Working from a null hypothesis, two broad categories of error are recognized (both are illustrated in the simulation sketch after this list):

• Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
• Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative".
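
Both error rates can be checked by simulation. A minimal sketch, assuming NumPy and a two-sided z-test with known variance (the effect size 0.3, sample size 30 and 5% level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, n, trials = 0.05, 30, 10_000
z_crit = 1.96                      # two-sided 5% critical value
se = 1.0 / np.sqrt(n)              # standard error with known sigma = 1

# Type I rate: H0 (mean = 0) is true; count how often it is rejected.
z0 = rng.normal(0.0, 1.0, size=(trials, n)).mean(axis=1) / se
type_i = np.mean(np.abs(z0) > z_crit)    # should be close to alpha

# Type II rate: H0 is false (true mean = 0.3); count failures to reject.
z1 = rng.normal(0.3, 1.0, size=(trials, n)).mean(axis=1) / se
type_ii = np.mean(np.abs(z1) <= z_crit)  # 1 - type_ii is the test's power

print(type_i, type_ii)
```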
Standard deviation refers to the extent to which individual observations in a sample
differ from a central value, such as the sample or population mean, while standard
error refers to an estimate of the difference between the sample mean and the population mean.
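
In symbols, for a sample of size n with sample standard deviation s, the standard error of the mean is

$$\operatorname{SE}(\bar{x}) = \frac{s}{\sqrt{n}},$$

so the standard error shrinks as the sample grows, while the standard deviation estimates a fixed property of the population.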

A statistical error is the amount by which an observation differs from its expected
value. A residual is the amount by which an observation differs from the value the
estimator of the expected value assumes on a given sample (this value is also called the prediction).

Mean squared error is used for obtaining efficient estimators, a widely used class of
estimators. Root mean square error is simply the square root of mean squared error.
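
A short sketch of the residual/RMSE computation, assuming NumPy and using the sample mean as the estimator of the expected value (any other estimator would work the same way):

```python
import numpy as np

rng = np.random.default_rng(5)
obs = rng.normal(10.0, 2.0, size=200)   # observations

pred = np.full_like(obs, obs.mean())    # estimator's prediction on this sample
residuals = obs - pred                  # observation minus prediction

mse = np.mean(residuals ** 2)           # mean squared error
rmse = np.sqrt(mse)                     # root mean squared error
```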

[Figure: A least squares fit; the points to be fitted in red, the fitted line in blue.]
Many statistical methods seek to minimize the residual sum of squares, and these
are called "methods of least squares", in contrast to least absolute deviations. The
latter gives equal weight to small and big errors, while the former gives more weight
to large errors. The residual sum of squares is also differentiable, which provides a
handy property for doing regression. Least squares applied to linear regression is
called the ordinary least squares method, and least squares applied to nonlinear
regression is called non-linear least squares. Also, in a linear regression model the
non-deterministic part of the model is called the error term, disturbance or, more
simply, noise. Both linear regression and non-linear regression are addressed in
polynomial least squares, which also describes the variance in a prediction of the
dependent variable (y axis) as a function of the independent variable (x axis) and
the deviations (errors, noise, disturbances) from the estimated (fitted) curve.
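
A minimal ordinary least squares sketch, assuming NumPy (the line's coefficients and the noise level are illustrative): the fit chooses the intercept and slope that minimise the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=50)  # line plus noise (the error term)

# np.polyfit solves the least squares problem in closed form for a
# degree-1 polynomial, i.e. ordinary least squares for a straight line.
slope, intercept = np.polyfit(x, y, deg=1)

fitted = intercept + slope * x
rss = np.sum((y - fitted) ** 2)  # the quantity least squares minimises
```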

Measurement processes that generate statistical data are also subject to error. Many
of these errors are classified as random (noise) or systematic (bias), but other types
of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be
important. The presence of missing data or censoring may result in biased
estimates, and specific techniques have been developed to address these problems.[57]
