Univariate Statistics

UNIVARIATE STATISTICS
Mode
The mode is the most frequently occurring value in a set of data.
Median
The median is the middle value in an ordered array of numbers. For an array with an odd number
of terms, the median is the middle number. For an array with an even number of terms, the
median is the average of the two middle numbers.
Mean
The arithmetic mean is the average of a group of numbers and is computed by summing all
numbers and dividing by the number of numbers.
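As a minimal sketch, the three measures above can be computed with Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [3, 5, 5, 7, 8, 9, 12]

mode_value = statistics.mode(data)      # most frequently occurring value
median_value = statistics.median(data)  # middle value of the ordered data
mean_value = statistics.mean(data)      # sum of the values divided by their count

# With an even number of terms, the median is the average of the two middle values.
even_data = [3, 5, 7, 9]
even_median = statistics.median(even_data)

print(mode_value, median_value, mean_value, even_median)
```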
Percentiles
Percentiles are measures of location that divide a group of data into 100 parts. There are
99 percentiles because it takes 99 dividers to separate a group of data into 100 parts. The nth
percentile is the value such that at least n percent of the data are below that value and at most
(100 - n) percent are above that value.
Quartiles
Quartiles are measures of location that divide a group of data into four subgroups or parts.
The three quartiles are denoted as Q1, Q2, and Q3.
The first quartile, Q1, separates the first, or lowest, one-fourth of the data from the upper
three-fourths and is equal to the 25th percentile.
The second quartile, Q2, separates the second quarter of the data from the third quarter.
Q2 is located at the 50th percentile and equals the median of the data.
The third quartile, Q3, divides the first three-quarters of the data from the last quarter
and is equal to the value of the 75th percentile.
Range
The range is the difference between the largest and smallest values of a data set.
Interquartile Range
Another measure of variability is the interquartile range. The interquartile range is the range of
values between the first and third quartiles. Essentially, it is the range of the middle 50% of the data
and is determined by computing the value of Q3 - Q1.
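The quartiles, range, and interquartile range can be sketched with the standard `statistics` module. The data are hypothetical, and the "inclusive" method is an assumption: it is only one of several common conventions for locating percentiles, and other conventions can give slightly different cut points.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# The three quartiles (Q1, Q2, Q3) under the "inclusive" convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The 99 percentiles under the same convention; index 24 holds the 25th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")

data_range = max(data) - min(data)  # largest value minus smallest value
iqr = q3 - q1                       # range of the middle 50% of the data

print(q1, q2, q3, data_range, iqr)
```

Under this convention Q1, Q2, and Q3 coincide with the 25th, 50th, and 75th percentiles, and Q2 equals the median.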
Mean Absolute Deviation

Subtracting the mean from each value of data yields the deviation from the mean ($x_i - \mu$).
For a given set of data, the sum of all deviations from the arithmetic mean is always zero.
$$\sum (x_i - \mu) = 0$$
The mean absolute deviation (MAD) is the average of the absolute values of the
deviations around the mean for a set of numbers.
$$\mathrm{MAD} = \frac{\sum |x_i - \mu|}{N}$$
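A short sketch of the deviations and the mean absolute deviation, treating a small hypothetical data set as the whole population:

```python
# Hypothetical population, chosen only for illustration.
data = [5, 9, 16, 17, 18]

N = len(data)
mu = sum(data) / N  # arithmetic mean

# Deviation of each value from the mean; these always sum to zero.
deviations = [x - mu for x in data]
sum_of_deviations = sum(deviations)

# MAD: average of the absolute values of the deviations.
mad = sum(abs(d) for d in deviations) / N

print(mu, deviations, sum_of_deviations, mad)
```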

Variance
The variance is the average of the squared deviations about the arithmetic mean for a set of
numbers. The population variance is denoted by $\sigma^2$.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Because the deviations are squared, the variance is expressed in squared units of the data. Both
the variance and the standard deviation can be computed for a population or for a sample.
Standard Deviation
The standard deviation is the square root of the variance. The population standard deviation is
denoted by $\sigma$.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
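The population variance and standard deviation can be sketched directly from their definitions. The data set is hypothetical; `statistics.pvariance` and `statistics.pstdev` from the standard library are used only as a cross-check.

```python
import math
import statistics

# Hypothetical population, chosen only for illustration.
data = [5, 9, 16, 17, 18]

N = len(data)
mu = sum(data) / N

# Population variance: average of the squared deviations about the mean.
variance = sum((x - mu) ** 2 for x in data) / N

# Population standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population formulas.
library_variance = statistics.pvariance(data)
library_std_dev = statistics.pstdev(data)

print(variance, std_dev)
```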

Population Versus Sample Variance and Standard Deviation

The sample variance is denoted by $s^2$ and the sample standard deviation by $s$. The main use for
sample variances and standard deviations is as estimators of population variances and standard
deviations. Because of this, computation of the sample variance and standard deviation differs
slightly from computation of the population variance and standard deviation. Both the sample
variance and sample standard deviation use $n - 1$ in the denominator instead of $n$ because using
$n$ in the denominator of a sample variance results in a statistic that tends to underestimate the
population variance. Whereas using $n$ in the denominator of the sample variance makes it a biased
estimator, using $n - 1$ allows it to be an unbiased estimator, which is a desirable property in
inferential statistics.
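One way to see the bias is to enumerate every possible sample from a small population and average the two versions of the sample variance. The sketch below assumes samples of size 2 drawn with replacement from a hypothetical five-value population; the helper name `sum_squared_deviations` is made up for this illustration.

```python
import itertools
import statistics

# Hypothetical population, chosen only for illustration.
population = [5, 9, 16, 17, 18]
population_variance = statistics.pvariance(population)

n = 2  # sample size (assumption for this sketch)

# Every ordered sample of size n, drawn with replacement.
samples = list(itertools.product(population, repeat=n))

def sum_squared_deviations(sample):
    """Sum of squared deviations of a sample about its own mean."""
    sample_mean = sum(sample) / len(sample)
    return sum((x - sample_mean) ** 2 for x in sample)

# Average sample variance over all samples, dividing by n (biased).
avg_with_n = sum(sum_squared_deviations(s) / n for s in samples) / len(samples)

# Average sample variance over all samples, dividing by n - 1 (unbiased).
avg_with_n_minus_1 = sum(
    sum_squared_deviations(s) / (n - 1) for s in samples
) / len(samples)

print(population_variance, avg_with_n, avg_with_n_minus_1)
```

In this enumeration the n - 1 version averages to the population variance, while the n version averages to a smaller value.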
Both the MAD and the standard deviation provide measures of spread. Each summarizes how far the
observations typically lie from their mean. If the observations are spread out, they will tend
to be far from the mean, both above and below. Some deviations will be large positive numbers,
and some will be large negative numbers. But the absolute deviations and the squared deviations
are never negative, so they do not cancel out. So both MAD and SD will be large when the data are
spread out and small when the data are close together.
z Scores
A z score represents the number of standard deviations a value ($x$) is above or below the mean of a set of
numbers when the data are normally distributed.
If a z score is negative, the raw value ($x$) is below the mean. If the z score is positive, the raw value ($x$) is
above the mean.
For example, for a data set that is normally distributed with a mean of 50 and a standard deviation of 10,
suppose a statistician wants to determine the z score for a value of 70. This value ($x = 70$) is 20 units
above the mean, so the z value is
$$z = \frac{70 - 50}{10} = +2.00$$
This z score signifies that the raw score of 70 is two standard deviations above the mean.
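The same calculation as a minimal sketch. The function name `z_score` is hypothetical; it subtracts the mean from the raw value and divides by the standard deviation, as in the example above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# The worked example: mean 50, standard deviation 10, raw value 70.
z_above = z_score(70, mean=50, std_dev=10)

# A raw value below the mean gives a negative z score.
z_below = z_score(30, mean=50, std_dev=10)

print(z_above, z_below)
```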
DISTRIBUTIONS AND SAMPLING
Discrete Random Variables

A random variable is a discrete random variable if the set of all its possible values is finite or
countably infinite.
For example, suppose an experiment is to measure the arrivals of automobiles at a turnpike tollbooth
during a 30-second period. The possible outcomes are: 0 cars, 1 car, 2 cars, . . . , n cars. These numbers
(0, 1, 2, . . . , n) are the values of a random variable.
Continuous Random Variables

Continuous random variables take on values at every point over a given interval. Thus, continuous random
variables have no gaps or unassumed values. It could be said that continuous random variables are
generated from experiments in which things are measured, not counted.
Suppose another experiment is to measure the time between the completion of two tasks in a production
line. The values will range from 0 seconds to n seconds.
The outcomes for random variables and their associated probabilities can be organized into distributions.
The two types of distributions are discrete distributions, constructed from discrete random variables, and
continuous distributions, based on continuous random variables.
Discrete distributions:
1. Binomial distribution
2. Poisson distribution
3. Hypergeometric distribution
Continuous distributions:
1. Uniform distribution
2. Normal distribution
3. Exponential distribution
4. t distribution
5. Chi-square distribution
6. F distribution
Uniform Distribution
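The uniform distribution, sometimes referred to as the rectangular distribution, is a relatively simple
continuous distribution in which the same height, or $f(x)$, is obtained over a range of values. It is a probability
distribution that has constant probability density over that range. The following probability density function
defines a uniform distribution, where $a$ and $b$ are the lower and upper endpoints of the range.
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{for all other values} \end{cases}$$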
In a uniform, or rectangular, distribution, the total area under the curve is equal to the product of the length
and the width of the rectangle and equals 1.
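For a uniform distribution on the interval from $a$ to $b$, the mean and standard deviation are
$$\mu = \frac{a + b}{2}$$
$$\sigma = \frac{b - a}{\sqrt{12}}$$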
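A minimal sketch of the uniform distribution's density, area, mean, and standard deviation. The endpoints `a = 2` and `b = 6` and the function name `uniform_pdf` are assumptions made only for illustration.

```python
import math

def uniform_pdf(x, a, b):
    """Height of the uniform density: 1 / (b - a) on [a, b], 0 elsewhere."""
    if a <= x <= b:
        return 1 / (b - a)
    return 0.0

# Hypothetical endpoints of the interval.
a, b = 2, 6

height = uniform_pdf(3, a, b)   # constant height inside the interval
outside = uniform_pdf(7, a, b)  # zero outside the interval

# Area of the rectangle: height times width, which equals 1.
area = height * (b - a)

mean = (a + b) / 2
std_dev = (b - a) / math.sqrt(12)

print(height, outside, area, mean, std_dev)
```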
