Topic 2- Descriptive_statistics

The document discusses descriptive statistics, focusing on how to collect, organize, and graph data to gain insights. It covers various methods for pictorial representation, summary measures of central tendency, variation, position, and shape, and emphasizes that results are limited to the specific sample or population analyzed. Additionally, it provides examples of raw data, frequency distributions, and graphical presentations, along with explanations of mean, median, mode, and measures of dispersion.

Uploaded by

lukman shaabaan
Copyright
© All Rights Reserved

Descriptive statistics

In this section we learn how to collect, organize, and graph our data to gain
insights. We will also learn to use summary measures to numerically describe
the main characteristics of a data set.
Descriptive statistics
• A population or sample may be described pictorially or summarily using descriptive
statistics.

• Pictorially, we may use any suitable graph or chart, depending on the nature
or type of data: bar charts, pie charts, dot plots, histograms, etc.

• Summarily, we may use measures of central tendency, variation, position, and shape to
give insights into our population or sample.

• With descriptive statistics, the results are usually limited to the provided
sample or population data and are not used to generalize to the entire population or
to other populations.
Outline
• Raw data

• Organizing and Graphing Data


➢Frequency distribution
➢Graphical presentation of qualitative data
➢Graphical presentation of quantitative data

• Summary measures
➢Measures of central tendency
➢Measures of variation
➢Measures of position
➢Measures of shape
Raw Data
Data recorded in the sequence in which they are collected and before they are
processed or ranked are called raw data.

• Table 1: Ages of 15 respondents in a survey
29 27 35 24 40
33 28 29 32 45
25 47 23 30 38

• Table 2: Marital status of respondents
M S M M S
S W D W M
S M M S M
Organizing and Graphing Data
In this section we learn how to organize and display data using
tables and graphs. We will learn how to prepare frequency
distribution tables for qualitative and quantitative data; how to
construct bar graphs, pie charts, histograms, and polygons for
such data.

Frequency distribution
A frequency distribution exhibits how the frequencies are distributed over various
categories.

• Table 3: Frequency distribution of marital status of respondents
Status         Tally      Frequency   Relative frequency   Percentage
Single (S)     /////      5           0.33                 33.33
Married (M)    ///////    7           0.47                 46.67
Divorced (D)   /          1           0.07                 6.67
Widowed (W)    //         2           0.13                 13.33
Total                     15          1.0                  100.0

• Table 4: Frequency distribution of ages of respondents
Ages           Tally      Frequency   Relative frequency   Percentage
23 - 31        ////////   8           0.53                 53.33
32 - 40        /////      5           0.33                 33.33
41 - 49        //         2           0.13                 13.33
Total                     15          1.0                  100.0
$$\text{Relative frequency of category } i = \frac{\text{frequency of category } (f_i)}{\text{sum of all frequencies } (n)}$$

$$\text{Percentage} = \text{Relative frequency of category} \times 100$$
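As a rough illustration of the two formulas, the sketch below rebuilds the frequency, relative frequency, and percentage columns of Table 3 from the marital status codes in Table 2. The function name `frequency_table` and the rounding to two decimal places are assumptions made for this example.

```python
from collections import Counter

# Marital status codes from Table 2, in the order they were recorded.
statuses = "M S M M S S W D W M S M M S M".split()

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    n = sum(counts.values())  # sum of all frequencies
    return {
        category: (f, round(f / n, 2), round(100 * f / n, 2))
        for category, f in counts.items()
    }

table = frequency_table(statuses)
for category in ("S", "M", "D", "W"):
    print(category, *table[category])
```

The same function works for the grouped ages in Table 4 if each age is first replaced by its class label.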
Graphical Presentation of Data
Data can be displayed graphically or pictorially using any of numerous charts and graphs, such as
line graphs, bar graphs, pie charts, histograms, polygons, and stem-and-leaf displays.
Graphical presentation of qualitative data
For qualitative data the two most commonly used graphical displays are the bar graph and
the pie chart.

• Figure 1: A bar chart for the frequency distribution of marital status of respondents
(Chart title: Frequency distribution of marital status of respondents. x-axis: Status; y-axis: Frequency. Bars: Single 5, Married 7, Divorced 1, Widowed 2.)
Pie Chart
A pie chart for the frequency distribution of marital status of respondents is illustrated
by Figure 2.

• Figure 2: A pie chart for the frequency distribution of marital status of respondents
(Chart title: Percentage distribution of marital status of respondents. Slices: Single (S) 33.33, Married (M) 46.67, Divorced (D) 6.67, Widowed (W) 13.33.)
Graphical presentation of quantitative data
For quantitative data, apart from the pie chart and bar graphs, other charts such as
histograms and frequency polygons are also used.

• Figure 3: Frequency distribution of ages of respondents
Frequency Polygon
• Frequency polygons are a graphical device for understanding the
shapes of distributions. They serve the same purpose as histograms,
but are especially helpful for comparing sets of data. Frequency
polygons are also a good choice for displaying cumulative frequency
distributions.

Summary Measures
In this section we learn how to compute and use numerical summary
measures, usually referred to as “typical values”, such as those
that identify the center and spread of a distribution, to describe its
important features. The summary measures considered
include measures of location (central tendency), spread (variation),
position, and shape.
Measures of central tendency
These are measures that describe the center of a distribution. Thus, they give
the center of a histogram or a frequency distribution curve. This section
discusses the three measures of central tendency (mean, median, and mode) for
both grouped and ungrouped data. Types of mean other than the
arithmetic mean will also be mentioned. We will also identify the situations in
which each measure is most appropriate to use.
Mean
• Also known as the arithmetic mean, it is the most
frequently used measure of central tendency.
• It is also sometimes referred to as the ‘average’ or
‘expectation’.

• It is generally obtained as
$$\text{mean} = \frac{\text{sum of all values}}{\text{number of values}}$$
Mean
• When the mean of a variable, x, is computed for sample data of size n, it is called the sample mean and
denoted by $\bar{x}$.

• When the mean of a variable, x, is computed for a population, it is called the population mean and denoted by
$\mu$.

• For ungrouped data, $\bar{x} = \dfrac{\sum x}{n}$

• For grouped data, $\bar{x} = \dfrac{\sum f x}{\sum f}$

• Note: For any data, the sum of all values is equal to the product of the sample size and the mean; that
is, $\sum x = n\bar{x}$.
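The two mean formulas can be sketched on the survey data as follows. The ungrouped mean uses the raw ages from Table 1; the grouped mean uses the frequencies from Table 4 and assumes that x in the grouped formula is the class midpoint (27, 36, and 45 here), which the slides do not state explicitly. The function names are illustrative.

```python
# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def mean_ungrouped(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def mean_grouped(midpoints, frequencies):
    """Sum of f*x divided by the sum of f."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)

# Assumed midpoints of the classes 23-31, 32-40, 41-49 (Table 4).
midpoints = [27, 36, 45]
frequencies = [8, 5, 2]

print(mean_ungrouped(ages))                  # 485 / 15
print(mean_grouped(midpoints, frequencies))  # 486 / 15
```

The grouped result is close to, but not the same as, the ungrouped one, because grouping replaces each age by the midpoint of its class.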
Combined mean
• This is used to obtain the mean of two or more data sets. Once the means and sample
sizes of the two (or more) data sets are known, the combined mean of the
data sets can be computed as follows:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

• where $n_1, n_2, \ldots, n_k$ are the sample sizes of the data sets and $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are
the corresponding means of the data sets.
Applications
• Suppose a sample of 10 statistics books gave a mean price of ₵140 and a sample of 8 mathematics books gave a
mean price of ₵160. Find the combined mean.

• Twenty business majors and 18 economics majors go bowling. Each student bowls one game. The scorekeeper
announces that the mean score for the 18 economics majors is 144 and the mean score for the entire group of 38
students is 150. Find the mean score for the 20 business majors.

• Suppose the average amount of money spent on shopping by 10 persons during a given week is ₵105.50. Find the
total amount of money spent on shopping by these 10 persons.

• The mean 2009 income for five families was ₵99,520. What was the total 2009 income of these five families?

• The mean age of six persons is 46 years. The ages of five of these six persons are 57, 39, 44, 51, and 37 years,
respectively. Find the age of the sixth person.
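One possible way to check the first two exercises is sketched below. It applies the combined mean formula directly to the book prices, then rearranges the same relationship (sum of values = sample size × mean) to recover the unknown group mean in the bowling exercise. The function and variable names are illustrative.

```python
def combined_mean(sizes, means):
    """Weighted mean of group means, weighted by group size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# First exercise: 10 statistics books (mean 140) and 8 mathematics books (mean 160).
books_mean = combined_mean([10, 8], [140, 160])

# Second exercise: solve the same formula for the unknown group mean.
total_all = 38 * 150          # sum of all 38 scores, since sum = n * mean
total_economics = 18 * 144    # sum of the 18 economics majors' scores
business_mean = (total_all - total_economics) / 20

print(books_mean, business_mean)
```

Feeding the recovered business mean back into `combined_mean` with the economics mean should reproduce the overall mean of 150.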
The SAT scores of 12 students who sat for the exam are as follows:
1113 2009 1374 1137 2110 1086 1166 1039 1673 2300 1139 5490
a. Calculate the mean and median for these data.
b. Identify the outlier in this data set. Drop the outlier and recalculate the mean and median. Which of these two summary
measures changes by a larger amount when you drop the outlier?
c. Which is the better summary measure for these data, the mean or the median? Explain.

The yearly salaries of all employees who work for a company have a mean of $62,350 and a standard deviation
of $6820. The years of experience for the same employees have a mean of 15 years and a standard
deviation of 2 years. Is the relative variation in the salaries larger or smaller than that in years of experience
for these employees?

The following table gives information on the amounts (in Ghana cedis) of electric bills for August 2022 for a sample of 50
families.
Amount of Electric Bill (in Ghana cedis)    Number of Families
0 to less than 40                           5
40 to less than 80                          16
80 to less than 120                         11
120 to less than 160                        10
160 to less than 200                        8
Find the mean, variance, and standard deviation.

The following data give the weights (in pounds) lost by 15 teachers, who are members of a health club, at the end of
2 months after joining the club.
5 10 8 7 25 12 5 14 11 10 21 9 8 11 18
a. Compute the values of the three quartiles and the interquartile range.
b. Calculate the (approximate) value of the 82nd percentile.
Median
• It is the value of the middle term of a ranked dataset.

• For ungrouped data it is given by

$$\text{median} = x_{\frac{n+1}{2}} \quad \text{for an odd sample size}$$

$$\text{median} = \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n+2}{2}}\right) \quad \text{for an even sample size}$$
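A minimal sketch of the ungrouped median rule, applied to the ages in Table 1. The subscripts in the formulas count terms from 1, while Python lists count from 0, so each position is shifted by one. The function name is illustrative.

```python
# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def median_ungrouped(values):
    ranked = sorted(values)              # the median is defined on ranked data
    n = len(ranked)
    if n % 2 == 1:
        return ranked[(n + 1) // 2 - 1]  # term (n+1)/2, counted from 1
    # average of terms n/2 and (n+2)/2, counted from 1
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2

print(median_ungrouped(ages))          # 30, the 8th of the 15 ranked ages
print(median_ungrouped([1, 2, 3, 4]))  # 2.5
```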
Median
• For grouped data,
$$\text{median} = l_m + \frac{w}{f_m}\left(\frac{n}{2} - F\right)$$
Where
• $l_m$ is the lower class boundary of the median class
• $w$ is the class width of the median class
• $f_m$ is the frequency of the median class
• $F$ is the cumulative frequency of the pre-median class
• $n$ is the sample size
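The grouped formula can be sketched on Table 4 as shown below. Two assumptions are made that the slides do not spell out: the class boundaries are taken as 22.5, 31.5, and 40.5 (half a unit below each lower limit, since ages are whole numbers), and the median class is taken to be the first class whose cumulative frequency reaches n/2. The function name is illustrative.

```python
def median_grouped(lower_boundaries, width, frequencies):
    """median = l_m + (w / f_m) * (n / 2 - F) for equal-width classes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    cumulative = 0  # F: cumulative frequency of the classes before this one
    for l_m, f_m in zip(lower_boundaries, frequencies):
        if cumulative + f_m >= n / 2:  # assumed rule for locating the median class
            return l_m + (width / f_m) * (n / 2 - cumulative)
        cumulative += f_m

# Table 4: classes 23-31, 32-40, 41-49 with assumed boundaries and width 9.
grouped_median = median_grouped([22.5, 31.5, 40.5], 9, [8, 5, 2])
print(grouped_median)  # 22.5 + (9 / 8) * 7.5
```

The result differs slightly from the median of the raw ages because the formula assumes observations are spread evenly within the median class.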
Mode
• It is the most frequently occurring number or observation in the dataset.

• For grouped data

$$\text{mode} = l_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$$
Where
• $l_m$ is the lower limit of the modal class
• $w$ is the class width of the modal class
• $\Delta_1$ is the difference between frequencies of the modal class and the class before it
• $\Delta_2$ is the difference between frequencies of the modal class and the class after it
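A short sketch of both cases, using the ages in Table 1 and the classes in Table 4. It assumes that a missing neighbouring class (before the first or after the last class) has frequency 0, and it passes the lower class limits (23, 32, 41) as the slide describes. The function names are illustrative.

```python
from collections import Counter

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def modes_ungrouped(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def mode_grouped(lower_limits, width, frequencies):
    """mode = l_m + d1 / (d1 + d2) * w for equal-width classes."""
    i = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class
    before = frequencies[i - 1] if i > 0 else 0                    # assumed 0 if absent
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # assumed 0 if absent
    d1 = frequencies[i] - before
    d2 = frequencies[i] - after
    return lower_limits[i] + d1 / (d1 + d2) * width

print(modes_ungrouped(ages))  # [29], the only age that appears twice
grouped_mode = mode_grouped([23, 32, 41], 9, [8, 5, 2])
print(grouped_mode)           # 23 + (8 / 11) * 9
```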
Measures of dispersion
• These are measures that give the spread of a distribution.

• The most common measures of dispersion, or variation, are the range, variance, and
standard deviation.

• The range is given by the difference between the highest and the lowest values of a
dataset.
Variance and Standard deviation
• The variance, denoted by $S^2$ and $\sigma^2$ for sample and population data respectively,
is obtained as

$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]$$
and
$$\sigma^2 = \frac{1}{N}\sum (x - \mu)^2 = \frac{1}{N}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{N}\right].$$

• The standard deviation, denoted by $S$ and $\sigma$ for sample and population data
respectively, is obtained by taking the positive square root of the variance.
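As a rough check that the definitional and shortcut forms of the sample variance agree, the sketch below computes both for the ages in Table 1, together with the range and the sample standard deviation. The function names are illustrative.

```python
import math

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def sample_variance(values):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

data_range = max(ages) - min(ages)      # highest value minus lowest value
s2 = sample_variance(ages)
s = math.sqrt(s2)                       # positive square root of the variance

print(data_range, s2, sample_variance_shortcut(ages), s)
```

For population data the same code applies with `n - 1` replaced by the population size N.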
Measures of position

These are measures that determine the position of a single value in relation to other
values in a sample or a population data set. There are many measures of position;
however, we will discuss only quartiles and percentiles.
Quartiles

• Quartiles are three summary measures that divide a ranked data set into
four equal parts.

• The second quartile is the same as the median of a data set.

• The first quartile is the value of the middle term among the observations
that are less than the median.

• The third quartile is the value of the middle term among the observations
that are greater than the median.

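The quartile definitions above can be sketched as follows for the ages in Table 1. The sketch splits the ranked data by position: when the sample size is odd, the median term itself is left out of both halves. The interquartile range is taken as Q3 minus Q1. The function name is illustrative, and `statistics.median` from the Python standard library supplies the middle term.

```python
from statistics import median

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(values)
    n = len(ranked)
    lower = ranked[: n // 2]        # observations ranked below the median
    upper = ranked[(n + 1) // 2 :]  # observations ranked above the median
    return median(lower), median(ranked), median(upper)

q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 27 30 38 11
```

Statistical software often uses interpolation rules that give slightly different quartiles, so results from other tools may not match this method exactly.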
Percentiles
• These are summary measures that divide a ranked data set into 100 equal parts.

• The $k^{\text{th}}$ percentile, $P_k$, can be defined as a value in a data set such that about k% of the
measurements are smaller than the value of $P_k$ and about (100 - k)% of the
measurements are greater than the value of $P_k$.
• $P_k = \text{value of the } \left(\dfrac{kn}{100}\right)^{\text{th}} \text{ term in a ranked data set}$
• The percentile rank of a value, $x_i$, gives the percentage of values in the data set that are
less than $x_i$.
• $PR(x_i) = \dfrac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$
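A minimal sketch of both formulas on the ages in Table 1. The slides do not say what to do when kn/100 is not a whole number; this example assumes the position is rounded to the nearest term (halves rounded up) and kept within 1 to n. Other textbooks round differently, so treat that rule as an assumption. The function names are illustrative.

```python
import math

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def percentile(values, k):
    """Approximate P_k as the value of the (kn/100)th ranked term."""
    ranked = sorted(values)
    n = len(ranked)
    position = k * n / 100                  # 1-based position, may be fractional
    term = math.floor(position + 0.5)       # assumed rule: round to nearest term
    term = min(max(term, 1), n)             # keep the term inside the data set
    return ranked[term - 1]

def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

print(percentile(ages, 80))       # 12th ranked term, which is 38
print(percentile_rank(ages, 30))  # 7 of the 15 ages are below 30
```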
Measures of shape

These describe the manner in which the data are distributed, that is, the pattern
of the data within a dataset. They compare the shape of the distribution to that of
a normal curve. The measures of shape include skewness and kurtosis.

Skewness
• This refers to the presence or lack of symmetry in a distribution.

• There are three types of skewness: symmetric (as in a normal distribution), negative
skewness (left-skewed), and positive skewness (right-skewed).

• A distribution is symmetric if the right side of the distribution mirrors the left side of the
distribution. For a symmetric distribution such as the normal, mean = median = mode and the coefficient of skewness is 0.
• If the coefficient of skewness is greater than 0, then the distribution is right-skewed and the right tail is longer than the left tail. If the
coefficient of skewness is less than 0, then it is left-skewed and the left tail is longer than the right tail.
• Skewness can be measured using the moment coefficient of skewness,
Bowley’s coefficient of skewness (based on quantiles), and Karl Pearson’s measure of
skewness, among others.
Coefficient of skewness

• Moment coefficient of skewness
$$S_M = \frac{\sum (x - \bar{x})^3}{(n - 1)\, s^3}$$
• Karl Pearson’s measure of skewness
$$s_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
• Bowley’s coefficient of skewness
$$s_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
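The three coefficients can be compared on the ages in Table 1 with a sketch like the one below. It assumes the moment coefficient divides by (n - 1) times the cube of the sample standard deviation, so that the result has no units, and it takes the quartiles 27, 30, and 38 obtained for these ages by the median-of-halves method. The function names are illustrative; `mean`, `median`, and `stdev` come from the Python standard library.

```python
from statistics import mean, median, stdev

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def moment_skewness(values):
    """Sum of cubed deviations over (n - 1) * s^3 (s^3 is an assumption here)."""
    n = len(values)
    centre = mean(values)
    s = stdev(values)  # sample standard deviation
    return sum((x - centre) ** 3 for x in values) / ((n - 1) * s ** 3)

def pearson_skewness(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

def bowley_skewness(q1, q2, q3):
    """(Q3 + Q1 - 2 * Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(moment_skewness(ages))
print(pearson_skewness(ages))
print(bowley_skewness(27, 30, 38))  # 5 / 11
```

All three values come out positive for these ages, which agrees with the longer right tail seen in Table 4, where most respondents fall in the youngest class.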
Kurtosis
• This measure draws a distinction between two datasets that have the same mean and standard
deviation.
• Kurtosis is a measure of the combined weight of a distribution’s tails relative to the rest of the
distribution.
• It measures the peakedness or flatness of the distribution of the dataset relative to a normal
distribution.

• There are three forms of kurtosis: mesokurtic (normal, coefficient = 3), leptokurtic (thicker tails, coefficient > 3), and
platykurtic (thinner tails, coefficient < 3).

• It is usually measured using the coefficient of kurtosis.
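The slides do not give a formula for the coefficient of kurtosis. One common choice, assumed here, is the moment coefficient: the fourth central moment divided by the square of the second central moment, which equals 3 for a normal distribution. The sketch below computes it and applies the three labels above; the function names and the tolerance used to decide "equal to 3" are illustrative.

```python
def kurtosis_coefficient(values):
    """Assumed moment form: m4 / m2^2, with moments divided by n."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - centre) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(k, tol=1e-9):
    """Label a coefficient relative to the normal value of 3."""
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

# Raw ages of the 15 respondents (Table 1).
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]
k = kurtosis_coefficient(ages)
print(k, classify_kurtosis(k))
```

Some software reports "excess kurtosis" instead, which subtracts 3 so that the normal distribution scores 0.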

Types of Variables and their associated Descriptive statistics
• Nominal (examples: sex, occupation, marital status)
➢Possible plots: pie chart, bar chart, dot plot
➢Measures of location: mode
➢Measures of variability: IQV (variation ratio)

• Ordinal (examples: educational level, income level, level of awareness, rank of a lecturer)
➢Possible plots: pie chart, bar chart, dot plot
➢Measures of location: mode, median
➢Measures of variability: range, IQR
➢Measures of position: median, quartiles, percentiles

• Interval (examples: time, temperature)
➢Possible plots: pie chart, bar chart, dot plot, histogram, box and whisker plot
➢Measures of location: mode, median, mean
➢Measures of variability: range, standard deviation, variance
➢Measures of position: median, quartiles, percentiles

• Ratio (examples: age, income, expenditure, …)
➢Possible plots: pie chart, bar chart, dot plot, …
➢Measures of location: mode, median, …
➢Measures of variability: range, standard deviation, …
➢Measures of position: median, quartiles, …