
Chapter 2 Descriptive Statistics


CHAPTER 2: DESCRIPTIVE STATISTICS

CHAPTER DISCUSSIONS

2.1 Organizing Data

Organizing and Graphing qualitative data
- Frequency distribution table
- Contingency table/Two-way table
- Bar Chart
- Pie Chart
Organizing and Graphing quantitative data
- Histogram
- Stem and Leaf

2.2 Numerical Descriptive Measures (ungrouped data)

Measures of Central Tendency
- Mean
- Median
- Mode
Measures of Variation
- Range
- Standard deviation and Variance
- Coefficient of Variation (CV)
Measures of Skewness
Measures of Position
- Quartiles (Q1, Q2 and Q3)
- Box-and-Whisker Plot

2.1 ORGANIZING DATA

ORGANIZING AND GRAPHING QUALITATIVE DATA

Frequency distribution table

A frequency distribution table summarizes qualitative data for a single variable at a time by listing each category together with its frequency.

e.g.

Contingency table/Two-way table

A contingency table (two-way table) summarizes two qualitative variables at the same time: the categories of one variable form the rows and the categories of the other form the columns. An example of a contingency table is shown below.

e.g.
Distribution of students according to sex and programme at College XYZ in 2004

                          Sex
Programme            Male   Female   Total
Business Studies      270      180     450
Banking Studies       125      125     250
Art and Design         10       90     100
Accountancy            80      120     200
Total                 485      515   1,000

Source: College XYZ
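
The row and column totals of the table above can be checked with a short Python sketch. The counts are copied from the table; the dictionary layout and variable names are illustrative only.

```python
# Counts from the College XYZ table: programme -> (male, female).
counts = {
    "Business Studies": (270, 180),
    "Banking Studies": (125, 125),
    "Art and Design": (10, 90),
    "Accountancy": (80, 120),
}

# Row totals: number of students in each programme.
row_totals = {prog: male + female for prog, (male, female) in counts.items()}

# Column totals: number of students of each sex across all programmes.
male_total = sum(male for male, _ in counts.values())
female_total = sum(female for _, female in counts.values())
grand_total = male_total + female_total

print(row_totals)
print(male_total, female_total, grand_total)  # 485 515 1000
```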

Bar charts

i) Simple bar chart

e.g. Illustrate the following information in a vertical bar chart.


Year                  1997   1998   1999   2000
Number of students     250    300    450    500

Answer

ii) Multiple bar chart

e.g. Draw a multiple bar chart to illustrate the following information.


Course   2000   2001   2002
DBS        23     28     36
DIB        16     20     36
DPA        48     50     70

Answer

iii) Component bar chart

e.g. Draw a component bar chart to illustrate the following information.


Course        2001   2002
Statistics      35     45
Mathematics     45     60
Accounting      20     45

Answer

Pie charts

e.g. Use the data in example 5 and draw a pie chart for courses attended in 2001.
Answer
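
A minimal sketch of how the slice sizes for the 2001 pie chart could be worked out, assuming "example 5" refers to the component bar chart data above (Statistics 35, Mathematics 45, Accounting 20). Each slice angle is the category's share of the total multiplied by 360 degrees; the variable names are illustrative.

```python
# 2001 attendance per course, taken from the component bar chart example.
attendance_2001 = {"Statistics": 35, "Mathematics": 45, "Accounting": 20}

total = sum(attendance_2001.values())

# Share of the whole (in percent) and the matching slice angle (in degrees).
percentages = {course: 100 * n / total for course, n in attendance_2001.items()}
angles = {course: n * 360 / total for course, n in attendance_2001.items()}

for course in attendance_2001:
    print(f"{course}: {percentages[course]:.0f}% -> {angles[course]:.0f} degrees")
```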

ORGANIZING AND GRAPHING QUANTITATIVE DATA

Histogram

A histogram is a bar graph of a frequency distribution. It consists of a set of vertical bars drawn side by side with no gaps between them.

The distribution is said to be positively skewed.

Shape of the Distribution

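Bell-shaped/Normally distributed
Has a single peak and is symmetric.

Right-skewed/Positively skewed
The peak of the distribution lies to the left, and the longer tail extends to the right.

Left-skewed/Negatively skewed
The peak of the distribution lies to the right, and the longer tail extends to the left.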

Stem-and-leaf

The stem-and-leaf display is a valuable tool for organizing a set of data and understanding
how the values distribute and cluster over the range of observations in the data set. It shows
the values of the original observations while also revealing the shape of the distribution.
The display separates data entries into stems (leading digits on the left) and leaves
(trailing digits on the right).

e.g. The following are marks obtained by 20 students in a quiz:

12, 15, 16, 20, 25, 25, 26, 27, 30, 31, 33, 38, 42, 42, 43, 45, 46, 49, 50, 50.

Represent the data in a stem-and-leaf display. Comment.

Answer
Unit = 10
1 | 2 represents 12

1 | 256
2 | 05567
3 | 0138
4 | 223569
5 | 00

From the display, we observe that:

1) The observations range from 12 to 50;
2) The stem with the most observations is 4, i.e. marks from 42 to 49;
3) The shape of the distribution is not symmetrical.
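
The display for the quiz marks can be rebuilt with a few lines of Python. This is a minimal sketch that assumes two-digit values with a stem unit of 10, as in the example; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

marks = [12, 15, 16, 20, 25, 25, 26, 27, 30, 31,
         33, 38, 42, 42, 43, 45, 46, 49, 50, 50]

def stem_and_leaf(values):
    """Group values by stem (tens digit); each leaf is the units digit."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    # Join the leaves of each stem into one string, e.g. 1 -> "256".
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in rows.items()}

display = stem_and_leaf(marks)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

For data such as the weekly incomes in the next example, the stem unit would be 100 and the leaf the tens digit, so the integer division and remainder would need adjusting.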

e.g. The weekly incomes of a sample of 8 secretaries are as follows:

RM490, RM550, RM570, RM590, RM620, RM640, RM710, RM830

Represent the data in a stem-and-leaf display.

Answer
Unit = 100
4 | 9 represents 490

4 | 9
5 | 579
6 | 24
7 | 1
8 | 3

2.2 NUMERICAL DESCRIPTIVE MEASURES (UNGROUPED DATA)

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a number (or a character) that is used to represent a data set.
It is also known as an average. There are 3 measures: the arithmetic mean, the median,
and the mode.

The Arithmetic Mean (X̄)

The arithmetic mean, or simply the mean, is the sum of all the observations divided by the
number of observations. It is also known as the simple average. The data used to compute the
mean must be at least at the interval level of measurement.

Ungrouped data

\[ \bar{X} = \frac{\sum X}{n} \]

e.g. X : 110, 112, 98, 100, 115, 95, 100


Find the mean.

Answer

\[ \bar{X} = \frac{\sum X}{n} = \frac{110 + 112 + 98 + 100 + 115 + 95 + 100}{7} = \frac{730}{7} = 104.3 \]

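
The same calculation as a short Python check, using only built-in functions (the variable names are illustrative):

```python
x = [110, 112, 98, 100, 115, 95, 100]

total = sum(x)          # sum of all observations
n = len(x)              # number of observations
mean = total / n

print(total, n, round(mean, 1))  # 730 7 104.3
```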
Median (X̃)

The median is a value located in the center of a distribution. As such, 50% of the
observations are below the median and 50% are above the median. The data must be at
least ordinal level of measurement.

e.g. X : 110, 112, 98, 100, 115, 95, 100


Find the median for the above data.

Answer
Array X : 95, 98, 100, 100, 110, 112, 115

\[ \text{Location of } \tilde{X} = \frac{n+1}{2} = \frac{7+1}{2} = 4 \]

∴ X̃ = 100 (the 4th value in the array)

e.g. X : 40, 50, 80, 90, 100, 120, 112, 115


Find the median.

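Answer
Array X : 40, 50, 80, 90, 100, 112, 115, 120

\[ \text{Location of } \tilde{X} = \frac{n+1}{2} = \frac{8+1}{2} = 4.5 \]

The median lies midway between the 4th and 5th values of the array:

\[ \tilde{X} = \frac{90 + 100}{2} = 95 \]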
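
The location rule used in both median examples can be sketched in Python as follows. The function name `median_by_location` is illustrative; it sorts the data first (the "array" step) and averages the two neighbouring values when the location is not a whole number.

```python
def median_by_location(values):
    """Median via the (n + 1) / 2 location rule on the sorted data."""
    data = sorted(values)            # arrange the data in ascending order
    n = len(data)
    location = (n + 1) / 2           # 1-based position of the median
    if location.is_integer():
        return data[int(location) - 1]
    lower = data[int(location) - 1]  # value just before the location
    upper = data[int(location)]      # value just after the location
    return (lower + upper) / 2

print(median_by_location([110, 112, 98, 100, 115, 95, 100]))      # 100
print(median_by_location([40, 50, 80, 90, 100, 120, 112, 115]))   # 95.0
```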

The Mode (X̂)

The mode is the value (or character) of the observation that appears most frequently. It is
the most popular value (or character). It can be computed for all levels of measurement of
data: nominal, ordinal, interval, and ratio.

e.g. Find mode for the following data set.


X : 17, 17, 18, 19, 19, 18, 20, 23, 19
Answer
∴ X̂ = 19

e.g. What is the value of mode for these data?


X : 47, 48, 49, 47, 49, 47, 49, 50
Answer
∴ X̂ = 47 and 49 (bimodal)

e.g. Find the modal value for these data


X : 90, 91, 92, 93, 94, 95, 96, 97
Answer
X̂ does not exist, since no value appears more than once.
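
The three mode examples can be reproduced with `collections.Counter`. This is a sketch under one stated assumption: a data set is treated as having no mode only when every value appears exactly once, as in the last example. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: every value occurs once, so no mode exists
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([17, 17, 18, 19, 19, 18, 20, 23, 19]))   # [19]
print(modes([47, 48, 49, 47, 49, 47, 49, 50]))       # [47, 49]
print(modes([90, 91, 92, 93, 94, 95, 96, 97]))       # []
```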

Comparisons between the mean, the median, and the mode

(i) X̄ is popular because it is based on all the observations. It can be manipulated
algebraically. However, X̄ is sensitive to outliers (or extreme values).
(ii) X̃ is especially useful for ordinal data. It is not sensitive to outliers, thus it is a
suitable measure for skewed data sets.
(iii) X̂ is useful in describing nominal and ordinal data. It is not affected by extreme values.
There might be more than one mode or none at all.

Relationship between the mean, the median, and the mode

a) If mean = median = mode, the distribution is symmetrical (bell-shaped), with no skewness.
b) If mean > median, the distribution is skewed to the right or positively-skewed.
c) If mean < median, the distribution is skewed to the left or negatively-skewed.

e.g. The distribution of students’ weights has the following statistics:

X̄ = 60 kg, X̃ = 55 kg, X̂ = 52 kg

Determine the shape of the distribution of students’ weights. Interpret your
findings.

Answer
Since X̄ > X̃ > X̂, the distribution is said to be positively skewed. This means
that most students have weights less than 60 kg.

MEASURES OF VARIATION

        Data Set A   Data Set B   Data Set C
           100          100           80
           100          105           70
           100          102          120
           100          103          140
           100           90           90

X̄ :       100          100          100

All three data sets have the same mean (X̄ = 100), yet the values clearly vary by different amounts within each set. The variation can be measured by:
(i) Range
(ii) Standard deviation and Variance
(iii) Coefficient of Variation

Range (R)

\[ R = X_{\text{largest}} - X_{\text{smallest}} \]

e.g.
For the above data sets, compute range for each of them.
Answer
Data set A : R = 100 – 100 = 0
Data set B : R = 105 – 90 = 15
Data set C : R = 140 – 70 = 70
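
A short Python check of the three ranges, which also confirms that the data sets share the same mean. The values are copied from the table above; the dictionary name `data_sets` is illustrative.

```python
data_sets = {
    "A": [100, 100, 100, 100, 100],
    "B": [100, 105, 102, 103, 90],
    "C": [80, 70, 120, 140, 90],
}

# Range = largest value minus smallest value.
ranges = {name: max(values) - min(values) for name, values in data_sets.items()}
means = {name: sum(values) / len(values) for name, values in data_sets.items()}

print(ranges)  # {'A': 0, 'B': 15, 'C': 70}
print(means)   # {'A': 100.0, 'B': 100.0, 'C': 100.0}
```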
Standard deviation (S) and Variance (S²)

Standard deviation is the square root of variance; equivalently, variance is the square of
standard deviation, i.e.

Standard deviation = √Variance ;
Variance = (Standard deviation)²

\[ S = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{\left(\sum X\right)^2}{n}\right]} \]

e.g.
X : 17, 17, 18, 19, 19, 18, 20, 23, 19
Calculate the standard deviation.

Answer

\[ S = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{\left(\sum X\right)^2}{n}\right]} = \sqrt{\frac{1}{9-1}\left[3238 - \frac{(170)^2}{9}\right]} = 1.83 \]
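
The intermediate sums in this answer (ΣX = 170 and ΣX² = 3238) can be verified with a minimal Python sketch of the same computational formula. The cross-check against `statistics.stdev` from the standard library is an addition for illustration, not part of the original working.

```python
import math
import statistics

x = [17, 17, 18, 19, 19, 18, 20, 23, 19]

n = len(x)
sum_x = sum(x)                       # sum of X
sum_x2 = sum(v * v for v in x)       # sum of X squared

# Computational formula: S^2 = [sum(X^2) - (sum X)^2 / n] / (n - 1)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(sum_x, sum_x2, round(s, 2))    # 170 3238 1.83

# The library's sample standard deviation should agree with the formula.
print(math.isclose(s, statistics.stdev(x)))
```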

Coefficient of variation (CV)

Whenever two samples have the same units of measure, the variance and standard
deviation for each can be compared directly. For example, suppose an automobile dealer
wanted to compare the standard deviation of miles driven for the cars she received as trade-
ins on new cars. She found that for a specific year, the standard deviation for Buicks was
422 miles and the standard deviation for Cadillacs was 350 miles. She could say that the
variation in mileage was greater in the Buicks. But what if a manager wanted to compare the
standard deviations of two different variables, such as the number of sales per salesperson
over a 3-month period and the commissions made by these salespeople?

A statistic that allows you to compare standard deviations when the units are different, as in
this example, is called the coefficient of variation. Coefficient of variation is used to compare
dispersion between data sets.

\[ CV = \frac{S}{\bar{X}} \times 100\% \]

Interpretation of CV:-
(i) Data set with larger CV means the distribution is more dispersed compared to data set
with smaller CV.
(ii) Data set with smaller CV means the distribution is more consistent (or more uniform)
compared to data set with larger CV.

e.g. Compare the consistency of number of hours spent watching television among the
3 locations below.

Residential (n = 20):
35 43 36 39 28
28 29 25 38 27
26 32 29 40 35
41 37 31 45 34

Industrial (n = 8):
27 15
49 25
4 41
10 30

Town (n = 12):
8 14 12 15
30 32 21 20
34 7 11 24

Answer

Residential

\[ \bar{X} = \frac{678}{20} = 33.9 \]

\[ S = \sqrt{\frac{1}{20-1}\left[23656 - \frac{(678)^2}{20}\right]} = 5.95 \]

\[ CV = \frac{5.95}{33.9} \times 100\% = 17.6\% \]

Industrial

\[ \bar{X} = \frac{201}{8} = 25.13 \]

\[ S = \sqrt{\frac{1}{8-1}\left[6677 - \frac{(201)^2}{8}\right]} = 15.25 \]

\[ CV = \frac{15.25}{25.13} \times 100\% = 60.7\% \]

Town

\[ \bar{X} = \frac{228}{12} = 19 \]

\[ S = \sqrt{\frac{1}{12-1}\left[5296 - \frac{(228)^2}{12}\right]} = 9.36 \]

\[ CV = \frac{9.36}{19} \times 100\% = 49.3\% \]

The distribution of time spent on watching television in Residential area is the most
consistent. The least consistent is the Industrial area.
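
The comparison can be reproduced with Python's `statistics` module. In this sketch the 40 observations are split into 20 Residential, 8 Industrial and 12 Town values, an allocation that matches the sample sizes and totals (678, 201, 228) used in the answer; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

hours = {
    "Residential": [35, 43, 36, 39, 28, 28, 29, 25, 38, 27,
                    26, 32, 29, 40, 35, 41, 37, 31, 45, 34],
    "Industrial": [27, 15, 49, 25, 4, 41, 10, 30],
    "Town": [8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24],
}

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean x 100%."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv = {location: coefficient_of_variation(v) for location, v in hours.items()}
for location, value in cv.items():
    print(f"{location}: CV = {value:.2f}%")

most_consistent = min(cv, key=cv.get)    # smallest CV
least_consistent = max(cv, key=cv.get)   # largest CV
print(most_consistent, least_consistent)
```

Because the code keeps full precision instead of rounding S and X̄ first, its percentages can differ from the hand calculation in the last decimal place; the ranking of the three locations is unaffected.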

MEASURES OF SKEWNESS

Pearson’s coefficient of skewness is given as

\[ \text{Coefficient of skewness 1} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \]

\[ \text{Coefficient of skewness 2} = \frac{3\,(\text{mean} - \text{median})}{\text{standard deviation}} \]

Interpretation of the coefficient:-

Coefficient   Type of distribution
> 0           Positively skewed (or skewed to the right)
= 0           Symmetrical (e.g. normal distribution)
< 0           Negatively skewed (or skewed to the left)

e.g.

The following information was calculated for the distribution of money received by 50
children.

X̄ = 115.60; X̃ = 115.33; X̂ = 114.55; S = 14.49

Determine the shape of the distribution by computing the coefficient of skewness.

Answer
\[ \text{Coefficient of skewness 1} = \frac{\bar{X} - \hat{X}}{S} = \frac{115.60 - 114.55}{14.49} = 0.07 \approx 0 \]

The distribution is said to be approximately symmetrical.

p.s. The computation can also be done using the second coefficient of skewness.

\[ \text{Coefficient of skewness 2} = \frac{3\,(\bar{X} - \tilde{X})}{S} = \frac{3\,(115.60 - 115.33)}{14.49} = 0.06 \approx 0 \]
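
Both coefficients can be computed with a small Python sketch. The function names are illustrative; the inputs are the statistics given in the example.

```python
def skewness_from_mode(mean, mode, sd):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def skewness_from_median(mean, median, sd):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

sk1 = skewness_from_mode(115.60, 114.55, 14.49)
sk2 = skewness_from_median(115.60, 115.33, 14.49)

print(round(sk1, 2), round(sk2, 2))  # 0.07 0.06
```

Both values are slightly above zero, so strictly the distribution is very mildly positively skewed; treating it as symmetrical is an approximation.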

MEASURES OF POSITION

Quartiles

Quartiles divide an array into 4 equal parts. Thus there are 3 quartiles: the first quartile (Q1),
the second quartile (Q2), and the third quartile (Q3).

25% 25% 25% 25%

Q1 Q2 Q3

e.g.
Find the first, second and third quartiles for the following data.
X: 95, 98, 100, 100, 110, 112, 115
Answer
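\[ \text{Location of } Q_1 = \frac{n+1}{4} = \frac{7+1}{4} = 2 \quad \text{(2nd value)} \]

∴ Q1 = 98

\[ \text{Location of } Q_2 = \frac{n+1}{2} = \frac{7+1}{2} = 4 \quad \text{(4th value)} \]

∴ Q2 = 100

\[ \text{Location of } Q_3 = \frac{3(n+1)}{4} = \frac{3(7+1)}{4} = 6 \quad \text{(6th value)} \]

∴ Q3 = 112
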
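
The location rule for quartiles can be sketched in Python as below. In this example every location is a whole number; for fractional locations the sketch interpolates linearly between the two neighbouring values, which is an assumption (the notes only show averaging for a location ending in .5). The helper names `value_at` and `quartiles` are illustrative, and the sketch assumes at least 3 observations.

```python
def value_at(data, location):
    """Value at a 1-based location in sorted data, interpolating if fractional."""
    whole = int(location)
    fraction = location - whole
    if fraction == 0:
        return data[whole - 1]
    # Assumption: move the fractional distance towards the next value.
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

def quartiles(values):
    """Q1, Q2, Q3 at locations (n+1)/4, 2(n+1)/4 and 3(n+1)/4."""
    data = sorted(values)
    n = len(data)
    return tuple(value_at(data, k * (n + 1) / 4) for k in (1, 2, 3))

print(quartiles([95, 98, 100, 100, 110, 112, 115]))  # (98, 100, 112)
```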
Box-and-Whisker Plot

The Box-and-Whisker Plot is used to display some information on measures of position and
to determine the shape of the distribution. These plots involve five specific values:

1. The lowest value of the data set (minimum)


2. Q1
3. The median
4. Q3
5. The highest value of the data set (maximum)

These values are called a five-number summary of the data set.

A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum
data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing
a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box
passing through the median or Q2.

Procedure for constructing a boxplot

1. Find the five-number summary for the data values, that is, the maximum and minimum
data values, Q1 and Q3, and the median.
2. Draw a horizontal axis with a scale such that it includes the maximum and minimum data
values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through
the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the
maximum data value to the right side of the box.

Information obtained from a Box Plot.

1. a. If the median is near the center of the box, the distribution is approximately symmetric.
   b. If the median falls to the left of the center of the box, the distribution is positively
      skewed.
   c. If the median falls to the right of the center of the box, the distribution is negatively
      skewed.

2. a. If the lines are about the same length, the distribution is approximately symmetric.
   b. If the right line is longer than the left line, the distribution is positively skewed.
   c. If the left line is longer than the right line, the distribution is negatively skewed.
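
The two sets of rules can be expressed as a small Python sketch. It is applied here to the five-number summary of the quartile example (minimum 95, Q1 98, median 100, Q3 112, maximum 115). Exact comparisons stand in for "near the center" and "about the same length", which is a simplification; the function names are illustrative.

```python
def shape_from_median(q1, median, q3):
    """Rule 1: compare the median with the center of the box."""
    center = (q1 + q3) / 2
    if median < center:
        return "positively skewed"
    if median > center:
        return "negatively skewed"
    return "approximately symmetric"

def shape_from_whiskers(minimum, q1, q3, maximum):
    """Rule 2: compare the lengths of the left and right lines."""
    left, right = q1 - minimum, maximum - q3
    if right > left:
        return "positively skewed"
    if left > right:
        return "negatively skewed"
    return "approximately symmetric"

minimum, q1, median, q3, maximum = 95, 98, 100, 112, 115
print(shape_from_median(q1, median, q3))               # positively skewed
print(shape_from_whiskers(minimum, q1, q3, maximum))   # approximately symmetric
```

With a sample this small the two rules need not agree, so both are best read together with the plot itself.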
