Topic 2- Descriptive_statistics

The document discusses descriptive statistics, focusing on how to collect, organize, and graph data to gain insights. It covers various methods for pictorial representation, summary measures of central tendency, variation, position, and shape, and emphasizes that results are limited to the specific sample or population analyzed. Additionally, it provides examples of raw data, frequency distributions, and graphical presentations, along with explanations of mean, median, mode, and measures of dispersion.

Uploaded by

lukman shaabaan

Descriptive statistics

In this section we learn how to collect, organize, and graph our data to get insights. We will also
learn to use summary measures to numerically
describe the main characteristics of a data set.
Descriptive statistics
• A population or sample may be described pictorially or summarily using descriptive
statistics.

• Pictorially, we may use any graph or chart suited to the nature or type of the data:
bar charts, pie charts, dot plots, histograms, etc.

• Summarily, we may use measures of central tendency, variation, position, and shape to
give insights into our population or sample.

• With descriptive statistics, the results are usually limited to the sample or population
data provided and are not used to generalize to the entire population or to
other populations.
Outline
• Raw data

• Organizing and Graphing Data

➢Frequency distribution
➢Graphical presentation of qualitative data
➢Graphical presentation of quantitative data

• Summary measures
➢Measures of central tendency
➢Measures of variation
➢Measures of position
➢Measures of shape
Raw Data
Data recorded in the sequence in which they are collected and before they are
processed or ranked are called raw data.

• Table 1: Ages of 15 respondents in a survey
29 27 35 24 40
33 28 29 32 45
25 47 23 30 38

• Table 2: Marital status of respondents
M S M M S
S W D W M
S M M S M

Organizing and Graphing Data
In this section we learn how to organize and display data using
tables and graphs. We will learn how to prepare frequency
distribution tables for qualitative and quantitative data, and how to
construct bar graphs, pie charts, histograms, and polygons for
such data.

Frequency distribution
A frequency distribution exhibits how the frequencies are distributed over various
categories.

• Table 3: Frequency distribution of marital status of respondents
Status        Tally      Frequency   Relative frequency   Percentage
Single(S)     ////       5           0.33                 33.33
Married(M)    //////     7           0.47                 46.67
Divorced(D)   /          1           0.07                 6.67
Widowed(W)    //         2           0.13                 13.33
Total                    15          1.0                  100.0

• Table 4: Frequency distribution of ages of respondents
Ages          Tally      Frequency   Relative frequency   Percentage
23 - 31       ///////    8           0.53                 53.33
32 - 40       ////       5           0.33                 33.33
41 - 49       //         2           0.13                 13.33
Total                    15          1.0                  100.0
$$\text{Relative frequency of category } i = \frac{\text{frequency of category } (f_i)}{\text{sum of all frequencies } (n)}$$

$$\text{Percentage} = \text{Relative frequency of category} \times 100$$

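As a minimal sketch, the two formulas above can be applied to the marital status codes of Table 2. The function name `frequency_table` is only illustrative and is not part of the original slides.

```python
from collections import Counter

# Marital status codes from Table 2 (S = single, M = married, D = divorced, W = widowed).
statuses = "M S M M S S W D W M S M M S M".split()

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    n = sum(counts.values())  # sum of all frequencies
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}

table = frequency_table(statuses)
for cat in "SMDW":
    f, rel, pct = table[cat]
    print(f"{cat}: frequency={f}, relative frequency={rel:.2f}, percentage={pct:.2f}")
```

The frequencies agree with Table 3, and the relative frequencies add up to 1 because every observation falls in exactly one category.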
Graphical Presentation of Data
Data can be displayed graphically using any of numerous charts and graphs, such as
line graphs, bar graphs, pie charts, histograms, frequency polygons, and
stem-and-leaf displays.

Graphical presentation of qualitative data
For qualitative data the two most commonly used graphical displays are the bar graph and
the pie chart.

[Bar chart: "Frequency distribution of marital status of respondents"; x-axis: Status (Single, Married, Divorced, Widowed); y-axis: Frequency]

• Figure 1: A bar chart for the frequency distribution of marital status of respondents
Pie Chart
A pie chart for the frequency distribution of marital status of respondents is illustrated
by Figure 2.

[Pie chart: "Percentage distribution of marital status of respondents"; Single(S) 33.33, Married(M) 46.67, Divorced(D) 6.67, Widowed(W) 13.33]

• Figure 2: A pie chart for the frequency distribution of marital status of respondents
Graphical presentation of quantitative data
For quantitative data, apart from the pie chart and bar graphs, other charts such as
histograms and frequency polygons are also used.

• Figure 3: Frequency distribution of ages of respondents
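The class counts that a histogram of the ages would display can be recovered from the raw data in Table 1. The sketch below assumes the inclusive class limits of Table 4 (23 - 31, 32 - 40, 41 - 49); the helper name `class_frequencies` and the text-based bars are illustrative only.

```python
# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

# Inclusive (lower limit, upper limit) pairs taken from Table 4.
classes = [(23, 31), (32, 40), (41, 49)]

def class_frequencies(values, class_limits):
    """Count how many values fall inside each inclusive (lower, upper) class."""
    return [sum(lo <= v <= hi for v in values) for lo, hi in class_limits]

freqs = class_frequencies(ages, classes)

# A rough text histogram: one '*' per observation in the class.
for (lo, hi), f in zip(classes, freqs):
    print(f"{lo} - {hi}: {'*' * f} ({f})")
```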
Frequency Polygon
• Frequency polygons are a graphical device for understanding the
shapes of distributions. They serve the same purpose as histograms,
but are especially helpful for comparing sets of data. Frequency
polygons are also a good choice for displaying cumulative frequency
distributions.

Summary Measures
In this section we learn how to compute and use numerical summary
measures, usually referred to as “typical values”, such as those
that identify the center and spread of a distribution, to describe its
important features. The summary measures considered
include measures of location (central tendency), spread (variation),
position, and shape.

Measures of central tendency
These are measures that describe the center of a distribution. Thus, they give
the center of a histogram or a frequency distribution curve. This section
discusses the three measures of central tendency (mean, median, and mode) for
both grouped and ungrouped data. Types of mean other than the
arithmetic mean will also be mentioned. We will also identify the situations for
which each measure is most appropriate.

Mean
• Also known as the arithmetic mean, it is the most
frequently used measure of central tendency.
• It is also sometimes referred to as the ‘average’ or
‘expectation’.

• It is generally obtained as

$$\text{mean} = \frac{\text{sum of all values}}{\text{number of values}}$$
Mean
• When the mean of a variable, x, is computed for sample data of size n, it is called the sample mean and
denoted by $\bar{x}$.

• When the mean of a variable, x, is computed for a population, it is called the population mean and denoted by
$\mu$.

• For ungrouped data, $$\bar{x} = \frac{\sum x}{n}$$

• For grouped data, $$\bar{x} = \frac{\sum f x}{\sum f}$$

• Note: For any data, the sum of all values is equal to the product of the sample size and the mean; that
is, $$\sum x = n\bar{x}.$$
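A short sketch of both formulas, using the ages of Table 1 and the classes of Table 4. One assumption is made that the slides do not state explicitly: for grouped data, x is taken to be the class midpoint. The function names are illustrative.

```python
# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def mean(values):
    """Ungrouped mean: sum of x divided by n."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped mean: sum of f*x divided by sum of f, with x the class midpoint (assumed)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

x_bar = mean(ages)

# Midpoints of the classes 23 - 31, 32 - 40, 41 - 49 and their frequencies (Table 4).
midpoints = [(23 + 31) / 2, (32 + 40) / 2, (41 + 49) / 2]
x_bar_grouped = grouped_mean(midpoints, [8, 5, 2])

print(f"ungrouped mean = {x_bar:.2f}, grouped mean = {x_bar_grouped:.2f}")
```

The grouped mean is only an approximation of the ungrouped mean, because every observation in a class is treated as if it sat at the class midpoint.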
Combined mean
• This is used to obtain the mean of two or more data sets. Once the means and sample
sizes of the two (or more) data sets are known, the combined mean can be computed
as follows:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

• where $n_1, n_2, \ldots, n_k$ are the sample sizes of the data sets and $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are
the corresponding means of the data sets.

Applications
• Suppose a sample of 10 statistics books gave a mean price of ₵140 and a sample of 8 mathematics books gave a
mean price of ₵ 160. Find the combined mean

• Twenty business majors and 18 economics majors go bowling. Each student bowls one game. The scorekeeper
announces that the mean score for the 18 economics majors is 144 and the mean score for the entire group of 38
students is 150. Find the mean score for the 20 business majors.

• Suppose the average amount of money spent on shopping by 10 persons during a given week is ₵ 105.50. Find the
total amount of money spent on shopping by these 10 persons.

• The mean 2009 income for five families was ₵ 99,520. What was the total 2009 income of these five families?

• The mean age of six persons is 46 years. The ages of five of these six persons are 57, 39, 44, 51, and 37 years,
respectively. Find the age of the sixth person.
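
Three of the applications above can be worked through with a few lines of code. This is a sketch rather than a model answer: `combined_mean` is an illustrative helper, and the last two computations simply rearrange the identity that the sum of all values equals n times the mean.

```python
def combined_mean(sizes, means):
    """Combined mean: sum of n_i * mean_i divided by sum of n_i."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# 10 statistics books with mean price 140 and 8 mathematics books with mean price 160.
books_mean = combined_mean([10, 8], [140, 160])

# 38 students with overall mean 150; 18 economics majors with mean 144.
# Solve the combined-mean formula for the mean of the 20 business majors.
business_mean = (38 * 150 - 18 * 144) / 20

# Six persons with mean age 46; five of the ages are known.
sixth_age = 6 * 46 - sum([57, 39, 44, 51, 37])

print(f"combined mean price = {books_mean:.2f}")
print(f"business majors' mean score = {business_mean:.1f}")
print(f"age of the sixth person = {sixth_age}")
```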

The SAT scores of 12 students who sat for the exam are as follows:
1113 2009 1374 1137 2110 1086 1166 1039 1673 2300 1139 5490
a. Calculate the mean and median for these data.
b. Identify the outlier in this data set. Drop the outlier and recalculate the mean and median. Which of these two summary
measures changes by a larger amount when you drop the outlier?
c. Which is the better summary measure for these data, the mean or the median? Explain.

The yearly salaries of all employees who work for a company have a mean of $62,350 and a standard deviation
of $6820. The years of experience for the same employees have a mean of 15 years and a standard
deviation of 2 years. Is the relative variation in the salaries larger or smaller than that in years of experience
for these employees?

The following table gives information on the amounts (in Ghana cedis) of electric bills for August 2022 for a sample of 50
families.
Amount of Electric Bill (in Ghana cedis)    Number of Families
0 to less than 40                           5
40 to less than 80                          16
80 to less than 120                         11
120 to less than 160                        10
160 to less than 200                        8
Find the mean, variance, and standard deviation.

The following data give the weights (in pounds) lost by 15 teachers who are members of a health club, at the end of
2 months after joining the club.
5 10 8 7 25 12 5 14 11 10 21 9 8 11 18
a. Compute the values of the three quartiles and the interquartile range.
b. Calculate the (approximate) value of the 82nd percentile.
Median
• It is the value of the middle term of a ranked dataset.

• For ungrouped data it is given by

$$\text{median} = x_{\frac{n+1}{2}} \quad \text{for an odd sample size}$$

$$\text{median} = \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n+2}{2}}\right) \quad \text{for an even sample size}$$

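The two cases of the ungrouped median can be sketched as follows, using the ages of Table 1. The subscripts in the formulas count from 1, whereas Python lists count from 0, hence the `- 1` adjustments. The function name is illustrative.

```python
# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def median(values):
    """Median of ungrouped data: middle term of the ranked values."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th term.
        return ranked[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)-th and ((n + 2) / 2)-th terms.
    return (ranked[n // 2 - 1] + ranked[(n + 2) // 2 - 1]) / 2

print("median age:", median(ages))
```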
Median
• For grouped data,

$$\text{median} = l_m + \frac{w}{f_m}\left(\frac{n}{2} - F\right)$$

where
• $l_m$ is the lower class boundary of the median class
• $w$ is the class width of the median class
• $f_m$ is the frequency of the median class
• $F$ is the cumulative frequency of the pre-median class
• $n$ is the sample size

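As an illustration, the grouped-median formula can be applied to the electric bill table from the exercises above (classes "0 to less than 40" through "160 to less than 200"). The sketch assumes the median class is the first class whose cumulative frequency reaches n/2; the function name and the tuple layout are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class of the electric bill table.
bills = [(0, 40, 5), (40, 80, 16), (80, 120, 11), (120, 160, 10), (160, 200, 8)]

def grouped_median(classes):
    """Grouped median: l_m + (w / f_m) * (n / 2 - F)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            # This is the median class (assumed rule: first class reaching n / 2).
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f

median_bill = grouped_median(bills)
print(f"grouped median = {median_bill:.2f}")
```

Here n = 50, so n/2 = 25; the cumulative frequencies are 5, 21, 32, which makes "80 to less than 120" the median class with F = 21.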
Mode
• It is the most frequently occurring number or observation in the dataset.

• For grouped data

$$\text{mode} = l_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$$

where
• $l_m$ is the lower limit of the modal class
• $w$ is the class width of the modal class
• $\Delta_1$ is the difference between frequencies of the modal class and the class before it
• $\Delta_2$ is the difference between frequencies of the modal class and the class after it

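The grouped-mode formula can be tried on the same electric bill table from the exercises. Two assumptions are made that the slides do not cover: a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0, and the first class with the highest frequency is taken as the modal class. The names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class of the electric bill table.
bills = [(0, 40, 5), (40, 80, 16), (80, 120, 11), (120, 160, 10), (160, 200, 8)]

def grouped_mode(classes):
    """Grouped mode: l_m + d1 / (d1 + d2) * w."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper, f_m = classes[i]
    before = freqs[i - 1] if i > 0 else 0               # assumed 0 if no class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumed 0 if no class after
    d1 = f_m - before
    d2 = f_m - after
    if d1 + d2 == 0:
        raise ValueError("modal class is not higher than its neighbours")
    return lower + d1 / (d1 + d2) * (upper - lower)

mode_bill = grouped_mode(bills)
print(f"grouped mode = {mode_bill:.2f}")
```

The modal class is "40 to less than 80" with frequency 16, so the two differences are 16 - 5 = 11 and 16 - 11 = 5.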
Measures of dispersion
• These are measures that give the spread of a
distribution.

• The most common measures of dispersion (or variation)
are the range, variance, and standard deviation.

• The range is given by the difference between the
highest and the lowest values of a dataset.
Variance and Standard deviation
• The variance, denoted by $S^2$ and $\sigma^2$ for sample and population data respectively,
is obtained as

$$S^2 = \frac{1}{n-1}\sum (x-\bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]$$

and

$$\sigma^2 = \frac{1}{N}\sum (x-\mu)^2 = \frac{1}{N}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{N}\right].$$

• The standard deviation, denoted by $S$ and $\sigma$ for sample and population data
respectively, is obtained by taking the positive square root of the variance.
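The sketch below computes the range, the sample variance in both of its algebraic forms, and the standard deviation for the ages of Table 1. It is meant to show that the definitional form and the shortcut form give the same number; the function names are illustrative.

```python
import math

# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def sample_variance(values):
    """Definitional form: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

def population_variance(values):
    """Population form: same sum of squared deviations, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

data_range = max(ages) - min(ages)   # highest value minus lowest value
s2 = sample_variance(ages)
s = math.sqrt(s2)                    # positive square root of the variance

print(f"range = {data_range}, S^2 = {s2:.4f}, S = {s:.4f}")
```

Treating the same 15 ages as a whole population instead of a sample only changes the divisor, so `population_variance(ages)` is slightly smaller than `sample_variance(ages)`.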
Measures of position

These are measures that determine the position of a
single value in relation to other values in a sample or a
population data set. There are many measures of position;
however, we will discuss only quartiles and percentiles.

Quartiles

• Quartiles are three summary measures that divide a ranked data set into
four equal parts.

• The second quartile is the same as the median of a data set.

• The first quartile is the value of the middle term among the observations
that are less than the median.

• The third quartile is the value of the middle term among the observations
that are greater than the median.

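A sketch of the quartile definitions above, applied to the ages of Table 1. It assumes that "observations less than (greater than) the median" means the terms ranked below (above) the median position, which matters when values are tied; other textbooks and software use slightly different conventions. The interquartile range is computed as Q3 minus Q1.

```python
# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def middle(ranked):
    """Middle term of an already ranked list (average of the two middle terms if even)."""
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the halves below and above the median position."""
    ranked = sorted(values)
    n = len(ranked)
    lower = ranked[: n // 2]           # terms ranked below the median position
    upper = ranked[n // 2 + n % 2 :]   # terms ranked above the median position
    return middle(lower), middle(ranked), middle(upper)

q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```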
Percentiles
• These are summary measures that divide a ranked data set into 100 equal parts.

• The $k^{\text{th}}$ percentile, $P_k$, can be defined as a value in a data set such that about k% of the
measurements are smaller than the value of $P_k$ and about (100 - k)% of the
measurements are greater than the value of $P_k$.

• $$P_k = \text{value of the } \left(\frac{kn}{100}\right)^{\text{th}} \text{ term in a ranked data set}$$

• The percentile rank of a value, $x_i$, gives the percentage of values in the data set that are
less than $x_i$.

• $$PR(x_i) = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$
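Both percentile formulas can be sketched with the ages of Table 1. The slides do not say what to do when kn/100 is not a whole number, so the code below assumes the position is rounded to the nearest term (halves rounded up) and kept between 1 and n; this rounding rule is an assumption, and the result is therefore only approximate in such cases.

```python
import math

# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def percentile(values, k):
    """Approximate k-th percentile: value of the (k * n / 100)-th ranked term."""
    ranked = sorted(values)
    n = len(ranked)
    position = k * n / 100
    # Assumed rule: round to the nearest term and clamp to the range 1..n.
    index = min(max(math.floor(position + 0.5), 1), n)
    return ranked[index - 1]

def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(v < x for v in values)
    return below / len(values) * 100

p60 = percentile(ages, 60)          # 60 * 15 / 100 = 9, so the 9th ranked term
rank_30 = percentile_rank(ages, 30)
print(f"P60 = {p60}, percentile rank of 30 = {rank_30:.2f}")
```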
Measures of shape

These describe the manner in which the data is
distributed by describing the distribution or pattern
of data within a dataset. They compare the shape of
the distribution to that of a normal curve. The
measures of shape include skewness and kurtosis.
Skewness
• This refers to the presence or lack of symmetry in a distribution.

• There are three types of skewness: symmetric (normal), negative
skewness (left skewed), and positive skewness (right skewed).

• A distribution is symmetric (as with the normal distribution) if the right side of the distribution mirrors the left side of the
distribution. In that case median = mean = mode and the coefficient of skewness is 0.
• If the coefficient of skewness is greater than 0, then the distribution is right-skewed and the right tail is longer than the left tail. If the
coefficient of skewness is less than 0, then it is left-skewed and the left tail is longer than the right tail.
• Skewness can be measured using the moments coefficient of skewness,
Bowley’s coefficient of skewness (based on quartiles), and Karl Pearson’s measure of
skewness, among others.

Coefficient of skewness

• Moments coefficient of skewness

$$S_M = \frac{\sum (x-\bar{x})^3}{(n-1)\,s^3}$$

• Karl Pearson’s measure of skewness

$$s_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

• Bowley’s coefficient of skewness

$$s_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
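As a rough illustration, Karl Pearson’s and Bowley’s measures can be computed for the ages of Table 1 with Python's standard `statistics` module. The quartiles are taken as the middle terms of the halves below and above the median position, which is an assumption about how ties and positions are handled; the function names are illustrative.

```python
import statistics

# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]

def pearson_skewness(values):
    """Karl Pearson's measure: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return 3 * (mean - med) / s

def bowley_skewness(values):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Q2) / (Q3 - Q1)."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = statistics.median(ranked[: n // 2])           # lower half
    q2 = statistics.median(ranked)
    q3 = statistics.median(ranked[n // 2 + n % 2 :])   # upper half
    return (q3 + q1 - 2 * q2) / (q3 - q1)

sk = pearson_skewness(ages)
sb = bowley_skewness(ages)
print(f"Pearson = {sk:.4f}, Bowley = {sb:.4f}")
```

Both values come out positive for these ages, which points to a right-skewed distribution: the mean lies above the median.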
Kurtosis
• This measure draws a distinction between two datasets that have the same mean and standard
deviation.
• Kurtosis is a measure of the combined weight of a distribution’s tails relative to the rest of the
distribution.
• It measures the peakedness or flatness of the distribution of the dataset relative to a normal
distribution.

• There are three forms of kurtosis: mesokurtic (normal, coefficient = 3), leptokurtic (thicker tails, coefficient > 3), and
platykurtic (thinner tails, coefficient < 3).

• It is usually measured using the coefficient of kurtosis.

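The slides do not give a formula for the coefficient of kurtosis. One common choice, assumed here, is the moment ratio $m_4 / m_2^2$, where $m_r$ is the average of $(x - \bar{x})^r$ over the data; it equals 3 for a normal distribution, which matches the thresholds quoted above. The sketch below uses that assumed definition, and the function names and tolerance are illustrative.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis m4 / m2**2 (an assumed definition, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

def classify(coefficient, tol=1e-9):
    """Label a coefficient of kurtosis relative to the normal value of 3."""
    if abs(coefficient - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if coefficient > 3 else "platykurtic"

# Ages of the 15 respondents from Table 1.
ages = [29, 27, 35, 24, 40, 33, 28, 29, 32, 45, 25, 47, 23, 30, 38]
k_ages = kurtosis(ages)
print(f"coefficient of kurtosis = {k_ages:.4f} ({classify(k_ages)})")
```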
Types of Variables and their associated Descriptive statistics
Variable type | Examples | Possible plots | Measures of location | Measures of variability | Measures of position
Nominal | sex, occupation, marital status | Pie chart, bar chart, dot plot | Mode | IQV (variation ratio) |
Ordinal | Educational level, income level, level of awareness, rank of a lecturer | Pie chart, bar chart, dot plot | Mode, median | Range, IQR | Median, quartiles, percentiles
Interval | time, temperature | Pie chart, bar chart, dot plot, histogram, box and whisker plot | Mode, median, mean | Range, standard deviation, variance | Median, quartiles, percentiles
Ratio | age, income, expenditure, … | Pie chart, bar chart, dot plot, … | Mode, median, … | Range, standard deviation, … | Median, quartiles, …