
Statistics: Parameter Mean Standard Deviation


EDA01: Definition of Terms

Statistics

Is the branch of mathematics that deals with the collection, organization, analysis, and interpretation of
numerical data. Statistics is especially useful in drawing general conclusions about a set of data from a sample of the
data. Used with a plural verb, the word "statistics" also means the numerical data themselves.

A statistic is used to estimate the value of a population parameter. A parameter is a measurable
characteristic of a population, such as a mean or a standard deviation.

For instance, suppose we selected a random sample of 100 students from a school with 1000 students. The
average height of the sampled students would be an example of a statistic. So would the average grade point
average. In fact, any measurable characteristic of the sample would be an example of a statistic.
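
The school example above can be sketched in a few lines of Python. This is only an illustration: the heights are simulated (a hypothetical population drawn from a normal distribution with invented parameters), and the names `population` and `sample` are assumptions, not part of the original notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of all 1000 students.
population = [random.gauss(165, 8) for _ in range(1000)]

# Parameter: a measurable characteristic of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic computed from a random sample of 100.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is why it serves as an estimate of the parameter.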

Data

Facts and statistics collected together for reference or analysis.

Information

Facts provided or learned about something or someone.

Sample

A sample refers to a set of observations drawn from a population. Often, it is necessary to use samples for
research, because it is impractical to study the whole population.

A sample consists of one or more observations drawn from the population. A measurable characteristic of a
sample is called a statistic.

Population

It refers to the total set of observations that can be made. A population includes all of the elements from a
set of data.

For example, if we are studying the weight of adult women, the population is the set of weights of all the
adult women in the world. If we are studying the grade point average (GPA) of students at Harvard, the population is the
set of GPAs of all the students at Harvard.

A census

Is a study that obtains data from every member of a population. In most studies, a census is not practical,
because of the cost and/or time required.

A measure of central tendency

Is a summary statistic that represents the center point or typical value of a dataset. In statistics, the three
most common measures of central tendency are the mean, median, and mode. Each of these measures calculates
the location of the central point using a different method.

Mean

Is the arithmetic average, and it is probably the measure of central tendency with which you are most familiar.
Calculating the mean is very simple. You just add up all of the values and divide by the number of observations in
your dataset.
The calculation of the mean incorporates all values in the data. If you change any value, the mean changes.
However, the mean doesn’t always locate the center of the data accurately, particularly when the data are skewed
or contain extreme values.
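
As a minimal sketch of the calculation described above, the following Python function adds up the values and divides by the number of observations. The two datasets are invented for illustration; the second shows how a single extreme value pulls the mean away from where most of the values sit.

```python
def mean(values):
    """Add up all of the values and divide by the number of observations."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

scores = [2, 3, 4, 5, 6]         # hypothetical, roughly symmetric data
with_outlier = [2, 3, 4, 5, 60]  # same data with one extreme value

print(mean(scores))        # 4.0
print(mean(with_outlier))  # 14.8
```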

The median

Is the middle value. It is the value that splits the dataset in half. To find the median, order your data from
smallest to largest, and then find the data point that has an equal number of values above it and below it. The method
for locating the median varies slightly depending on whether your dataset has an even or odd number of values.

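When there is an odd number of values, the median is the single middle value. When there is an even number of
values, you count in to the two innermost values and then take their average. For example, if the two innermost
values are 27 and 29, their average is 28, so 28 is the median of that dataset.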
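
The procedure can be sketched as follows. The function and both datasets are illustrative assumptions; the even-length dataset is chosen so that its two innermost values are 27 and 29.

```python
def median(values):
    """Return the middle value of the data after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two innermost values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_count = [29, 23, 31, 27, 25]        # hypothetical data, 5 values
even_count = [23, 25, 27, 29, 31, 33]   # hypothetical data, 6 values

print(median(odd_count))   # 27
print(median(even_count))  # 28.0
```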

The mode

Is the value that occurs the most frequently in your data set. On a bar chart, the mode is the highest bar. If
the data have multiple values that are tied for occurring the most frequently, you have a multimodal distribution. If no
value repeats, the data do not have a mode.
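
A short sketch of the three cases above (one mode, several tied modes, no mode). The helper name `modes` and the sample lists are assumptions made for illustration; following the text, a dataset in which no value repeats is treated as having no mode.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, or [] if none repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```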

Data Levels of Measurement

A variable has one of four different levels of measurement: Nominal, Ordinal, Interval, or Ratio. (Interval and
Ratio levels of measurement are sometimes called Continuous or Scale.) It is important for the researcher to
understand the different levels of measurement, because these levels, together with how the research
question is phrased, dictate what statistical analysis is appropriate.

Nominal Level of Measurement

The nominal level of measurement is the lowest of the four ways to characterize data. Nominal means "in name only",
which should help you remember what this level is all about. Nominal data deal with names, categories, or labels.

Data at the nominal level are qualitative. Colors of eyes, yes or no responses to a survey, and favorite breakfast cereal all deal with
the nominal level of measurement. Even some things with numbers associated with them, such as a number on the back of a
football jersey, are nominal, since the number is used only to "name" an individual player on the field.

Data at this level can't be ordered in a meaningful way, and it makes no sense to calculate things such as means and standard
deviations.

Ordinal Level of Measurement

The next level is called the ordinal level of measurement. Data at this level can be ordered, but
differences between the data values are not meaningful.

Here you should think of things like a list of the top ten cities to live in. The data, here ten cities, are ranked from one to
ten, but differences between the cities don't make much sense. There's no way from looking at just the rankings to
know how much better life is in city number 1 than in city number 2.

Another example is letter grades. You can order things so that an A is higher than a B, but without any other
information, there is no way of knowing how much better an A is than a B.

As with the nominal level, data at the ordinal level should not be used in calculations such as sums and means.

Interval Level of Measurement

The interval level of measurement deals with data that can be ordered, and in which differences between the
data do make sense. Data at this level do not have a true zero as a starting point.

The Fahrenheit and Celsius scales of temperature are both examples of data at the interval level of measurement.
You can talk about 30 degrees being 60 degrees less than 90 degrees, so differences do make sense. However, 0
degrees (on both scales), cold as it may be, does not represent the total absence of temperature.

Data at the interval level can be used in calculations. However, data at this level do not support one type of
comparison: ratios. Even though 3 x 30 = 90, it is not correct to say that 90 degrees Celsius is three times as hot as
30 degrees Celsius.
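
One way to see why ratios are not meaningful on an interval scale is to express the same two temperatures in Fahrenheit. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the function and variable names are illustrative assumptions. The apparent "three times" ratio disappears after the change of scale, while the difference between the two temperatures simply rescales by 9/5.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

low_c, high_c = 30.0, 90.0
low_f = celsius_to_fahrenheit(low_c)    # 86.0
high_f = celsius_to_fahrenheit(high_c)  # 194.0

# The ratio depends on where the scale places its arbitrary zero.
print("ratio in Celsius:   ", high_c / low_c)
print("ratio in Fahrenheit:", high_f / low_f)

# Differences stay meaningful: they only rescale by the factor 9/5.
print("difference in Celsius:   ", high_c - low_c)
print("difference in Fahrenheit:", high_f - low_f)
```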

Ratio Level of Measurement

The fourth and highest level of measurement is the ratio level. Data at the ratio level possess all of the features of the
interval level, in addition to a true zero value. Due to the presence of a true zero, it now makes sense to compare the ratios of
measurements. Phrases such as "four times" and "twice" are meaningful at the ratio level.

Distances, in any system of measurement, give us data at the ratio level. A measurement such as 0 feet does make sense, as it
represents no length. Furthermore, 2 feet is twice as long as 1 foot. So ratios can be formed between the data.

At the ratio level of measurement, not only can sums and differences be calculated, but also ratios. One measurement can be
divided by any nonzero measurement, and a meaningful number will result.
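
As a small illustration of why ratios are meaningful here, the sketch below converts lengths from feet to meters and checks that the ratio is the same in both units. The constant 0.3048 is the defined length of the international foot in meters; the function and variable names are assumptions made for the example.

```python
import math

FEET_TO_METERS = 0.3048  # exact by definition of the international foot

def feet_to_meters(feet):
    """Convert a length in feet to meters; zero stays zero."""
    return feet * FEET_TO_METERS

short_ft, long_ft = 1.0, 2.0

ratio_in_feet = long_ft / short_ft
ratio_in_meters = feet_to_meters(long_ft) / feet_to_meters(short_ft)

# "Twice as long" holds regardless of the unit of measurement.
print(ratio_in_feet, ratio_in_meters)
print(math.isclose(ratio_in_feet, ratio_in_meters))
```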

Standard deviation

Is a number used to tell how measurements for a group are spread out from the average (mean), or
expected value. A low standard deviation means that most of the numbers are close to the average. A high standard
deviation means that the numbers are more spread out. In reported survey results, the margin of error is typically about
twice the standard deviation of the estimate, which corresponds to roughly a 95 percent confidence level.
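
The contrast between a low and a high standard deviation can be sketched with two invented datasets that share the same mean. The example uses `statistics.pstdev` from the Python standard library, which treats the data as a whole population; the dataset names are assumptions.

```python
import statistics

# Two hypothetical datasets with the same mean (10) but different spread.
tight = [9, 10, 10, 10, 11]
spread = [2, 6, 10, 14, 18]

for name, data in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean}, standard deviation={sd:.3f}")
```

Both datasets are centered on the same value, but the second has a much larger standard deviation because its values lie farther from the mean.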

Variance

The variance is a numerical value used to indicate how widely individuals in a group vary. If individual
observations vary greatly from the group mean, the variance is large, and vice versa.

It is important to distinguish between the variance of a population and the variance of a sample. They have
different notation, and they are computed differently. The variance of a population is denoted by $\sigma^2$, and the
variance of a sample by $s^2$.

The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - X)^2}{N}$$

where $\sigma^2$ is the population variance, $X$ is the population mean, $X_i$ is the $i$th element from the
population, and $N$ is the number of elements in the population.

The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - x)^2}{n - 1}$$

where $s^2$ is the sample variance, $x$ is the sample mean, $x_i$ is the $i$th element from the sample, and $n$ is the
number of elements in the sample. Using this formula, the variance of the sample is an unbiased estimate of the
variance of the population.

And finally, the variance is equal to the square of the standard deviation.
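
The two formulas, and the relationship between variance and standard deviation, can be checked with a short Python sketch. The dataset is invented, and the function names `population_variance` and `sample_variance` are assumptions; the results are compared against `statistics.pvariance`, `statistics.variance`, and `statistics.pstdev` from the standard library.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations

print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, slightly larger than 4.0

# The hand-written formulas agree with the standard library.
print(math.isclose(population_variance(data), statistics.pvariance(data)))
print(math.isclose(sample_variance(data), statistics.variance(data)))

# The variance is the square of the standard deviation.
print(math.isclose(statistics.pstdev(data) ** 2, population_variance(data)))
```

Dividing by n - 1 instead of n makes the sample variance slightly larger, which is the correction that makes it an unbiased estimate of the population variance.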
