  !"" 
   "          
"  # 
#    $ #    #  %    
$   #     # #
 & %#"#  #      
'   #
( # (# (# ($  # % (   
%#(""("
      "      
         "     


 
 
  
 

 
  



      !"

! '  %  # ##)*


 +%
!  ,' '#

) ! ((

) - 



 
 
 #
! 

 
  
 

 
 
  
 

 


 
 
  
 

 


 

 !" #

$  %$ & '( %)*


 
      
      

      
      
   
    
 
     !    
"   
      #         $   !%
 !      &  $ 
 '        
'     ( $         '     # %  %
) 
 %* 
 %'     
$ 
'   
   
  +     "  %


  &
  '   ! # 

  
        $,  
 (   $    
- $$$  
. /"/"#012"   
)*3- +)
*  !4 !& 5!6%55787&  % 

2  9:;567<=8>!<7>%2  ?9:;567<=8>!<7>;
0 @!
 
* 
  
A
   &  B
$!+,-./010/2/20-3420$ &5)!*
     
 C    
   2   

  

  
       
      
         !  
"C  
    
      
   
(    %
    
 
   
  
 
 
2 
 
% 
% 
% 
% 


     $        
 $
 
   $C  

      
  
 C      
 
    
  
   
  
 
CC
-  $$$  

/"/"#012"   
)*3- +)
*  !4 !& 5!6%55787&  %) C
  9:;567<=8>!<7>%D?9:;567<=8>!<7>;
0 @!
 
  E & "
  E + CA

 B
$!+,-./010/2/20-3420-C F8>78C  /"/"#012"   
)*3- +)
 


" 

  &  8>78

About the Author


Dr. Onofre Sumindo Corpuz is a native of Matalam, Cotabato. He
finished his Doctor of Philosophy in Forestry in 2008 at the
University of the Philippines, Los Baños, Laguna; his Master of Science
in Agriculture and Master of Arts in Educational Administration
(Acad. Requirements) in 2002 and his BS Forestry studies in 1993 at
the Cotabato Foundation College of Science and Technology, Doroluman,
Arakan, Cotabato. He also earned
units in Research and Development Management at the University
of the Philippines Open University in 1997.
The author is an Associate Professor at the Cotabato Foundation
College of Science and Technology, Doroluman, Arakan, Cotabato,
PHILIPPINES. At present, he is the designated Deputy Director for
Research and Extension and, at the same time, Chairman of the MS
Degree Program of the same school. He has presented various
scientific papers in national and international conferences and
seminars.
He has published four books, entitled ROOT GROWTH
POTENTIALS AND HERITABILITY OF GMELINA ARBOREA; CARBON
BUDGET AND STEM CUT PROPAGATION TECHNOLOGIES FOR
RUBBER TREES; FORESTRY STATISTICS AND RESEARCH
METHODS; and PROCEDURES IN FORESTRY AND AGRICULTURAL
RESEARCH. He has also published
scientific papers in ISI-indexed journals such as the International
ASIA Life Science Journal and the Philippine Journal of Crop
Science, and two articles in the Book of Abstracts of the 2nd
World Agroforestry Congress held at Nairobi, Kenya in 2008.
The author has various professional and scientific affiliations, including the
Society of Filipino Foresters, the Philippine-ASIAN Japan Association, the
Forests and Natural Resources Research Society of the Philippines, the
Philippine Society for the Study of Nature, and CSDi Community
Development (www. Developmentcommunity.csd-i.org.). He is a member of
the following groups: Forest and Trees, Subsistence Farming,
Feature your Project, Adapting to Climate Change, Environmental
Education, and Conservation and Restoration; and LinkedIn group member
(www.linkedin.com): Society for Conservation Biology, Precision
Forestry, Asia Life Science Academy (ALSA) Alumni and Network,
Climate Eval: Evaluation of Climate Change, Esri Forestry Group,
Freelance Natural Resources Professionals, GIS & Geospatial
Technology, GIS Mapping and Geo Technology Professionals,
GISuser-Mobil Mapping and Field GIS, GRASS GIS, Natural
Resources Management Professionals, Professional in
Environmental Risk Assessment, SEARCAL, Society for
Conservation Biology, Society for Conservation GIS, Arboriculture
and Landscape, Environmental Compliance Consultants,
Environmental Sustainability Professionals, International Society of
Arboriculture, National Spatial Data Infrastructure, Trees,
Sustainable Forestry Initiatives and Supporters, and National
Biodiversity Network.


ACKNOWLEDGEMENT
The authors express their sincere thanks and gratitude to their
brothers and sisters, kids, friends, and relatives, and
to all those who helped and encouraged them to finish this humble
work. They are indebted to them for the ideas, inspiration, moral and
spiritual support, and suggestions which contributed much to the
realization of this book.
Above all, they thank the Almighty ALLAH for His daily blessings and guidance.


FOREWORD
This book is designed for college students and other professionals
who would like to enhance their knowledge of social and educational
statistics. The book deals with the uses and importance of statistics,
the collection and presentation of data, sampling techniques, data
organization, and descriptive statistics such as the frequency distribution,
measures of central tendency, position, and dispersion/variability,
skewness, kurtosis, and the normal distribution of data.
It also provides discussions and concrete illustrations of regression and correlation
analysis, the t-test, F-test, z-test, and chi-square test, alternative non-parametric
analyses of variance, and one-way and two-way
classification analysis of data. The methods of social research
presented will enlighten young researchers on the appropriateness of
the statistical tools they will use in the analysis of data.

The Authors


CONTENTS

Chapter 1. INTRODUCTION
   Meaning and Uses
   Classification of Statistics
      Descriptive Statistics
      Inferential Statistics
      Parametric Statistics
      Non-Parametric Statistics
   Concept of Population and Sample
   Parameters and Estimates
   Subscript and Summation Notations
Chapter 2. COLLECTION AND PRESENTATION OF DATA
   1. Types of Data
      a. Qualitative Data
      b. Quantitative Data
   2. Sources of Data
   3. Types of Error in Data
   4. Methods of Gathering Data
   5. Sampling Technique
      a. Probability/Scientific Sampling
         Random Sampling
         Systematic Sampling
         Stratified Sampling
         Cluster Sampling
         Multi-Stage Sampling
      b. Non-Probability/Non-Scientific Sampling
         Purposive Sampling
         Quota Sampling
         Convenience Sampling
         Incidental Sampling
Chapter 3. DATA ORGANIZATION
   Tabular Presentation
   Graphs and Diagrams
Chapter 4. FREQUENCY DISTRIBUTION
   Relative Frequency Distribution
   Cumulative Frequency Distribution
   Cumulative Percent
Chapter 5. MEASURES OF CENTRAL TENDENCIES
   The Mean
   The Median
   The Mode
Chapter 6. MEASURES OF POSITION
   The Quartiles
   The Deciles
   The Percentiles
Chapter 7. MEASURES OF DISPERSION/VARIABILITY
   The Range
   The Interquartile Range
   The Quartile Deviation
   The Average Deviation
   The Standard Deviation
Chapter 8. MEASURES OF SKEWNESS AND KURTOSIS
   Positive Skewness
   Negative Skewness
   Moment Coefficient of Skewness
   Measures of Kurtosis
      Leptokurtic
      Mesokurtic
      Platykurtic
   Moment Coefficient of Kurtosis
Chapter 9. NORMAL DISTRIBUTION
   The Normal Curve
   Standard Normal Scores
   Areas Under the Normal Curve
Chapter 10. T-TESTS
   T-test for Dependent or Correlated Samples
   T-test for Independent Samples
   Test for Equality of Variance
   Testing for the Difference in Means of the Two Independent Samples
Chapter 11. SIMPLE LINEAR REGRESSION ANALYSIS
   Evaluation of the Simple Linear Regression Equation
Chapter 12. CORRELATION ANALYSIS
   Estimation of the Correlation Coefficient
   Pearson Product-Moment Coefficient of Correlation
   Spearman Rank Correlation
Chapter 13. ALTERNATIVE NON-PARAMETRIC TECHNIQUES
   McNemar Change Test
   Sign Test
   Wilcoxon Signed-Ranks Test for Two Dependent Samples
   Wilcoxon-Mann-Whitney Test for Independent Samples (Two-Sample Case)
   Friedman Two-Way Analysis of Variance by Ranks
   Multiple Comparisons Between Groups or Conditions from the Result of the Friedman Two-Way ANOVA
   Chi-Square Test for r x k Tables
   Kruskal-Wallis Test
Chapter 14. ANALYSIS OF VARIANCE
   One-Way Classification Analysis
   Two-Way Classification Analysis
APPENDICES
REFERENCES

Introduction
Meaning and Uses of Statistics
Statistics is a tool or method used in data analysis. It is a
scientific method for collecting, organizing, analyzing, and
interpreting quantitative data, as well as for drawing valid conclusions
and making reasonable decisions on the basis of such analysis. It is
an essential tool in almost all fields of knowledge.
Collecting data refers to the process of obtaining qualitative
information and quantitative or numerical measurements needed in
the study.
Organizing is the tabulation and/or presentation of data in
tables, graphs, or charts so that logical and statistical
conclusions can be drawn from the collected measurements.
Analysis of data pertains to the process of extracting relevant
information from the given data to formulate numerical descriptions.
Interpretation of data, on the other hand, refers to the task of
drawing conclusions from the analyzed statistical data. It also
involves the formulation of forecasts or predictions about larger
populations based on the data collected from sample populations.
Classification of Statistics
Statistics can be classified into two:
1. Descriptive Statistics describes and analyzes a given
group without drawing any conclusions or inferences
about a larger group. It is also known as Deductive
Statistics. Its concerns include the gathering,
classification, and presentation of data, such as the
frequency distribution, cumulative frequency,
percentage frequency, measures of central tendency,
measures of position, dispersion or variation, skewness,
and kurtosis.
2. Inferential Statistics draws important conclusions
about the population, inferred from the analysis of a
sample. It is also known as Inductive Statistics. It covers the testing
of hypotheses through the t-test, z-test, F-test, chi-square test, and
Analysis of Variance (ANOVA) using parametric and
non-parametric variables, as well as regression and correlation
analysis.
Types:
1. Parametric Statistics. These are inferential techniques
which make the following assumptions regarding
the nature of the population from which the
observations are drawn:
- The observations must be independent. This means
  that choosing any element from the population for
  inclusion in the sample must not affect the
  chances of the other elements being included.
- The observations must be drawn from normally
  distributed populations. A crude way of knowing
  that the distribution is normal is when the measures
  of central tendency are all the same
  (mean = median = mode); a bell-shaped curve is
  produced when such a distribution is graphed.
- The populations must have the same variance
  (homoscedastic populations).
- The variables must be measured on the interval or
  ratio scale, so that the results can be interpreted.
2. Non-Parametric Statistics. These make fewer and
weaker assumptions, such as:
- The observations must be independent and the
  variable must have underlying continuity.
- The observations are measured on either the nominal
  or ordinal scale.
To better understand when to use parametric and non-parametric
statistics, refer to the table below:
INFERENTIAL STATISTICS    DISTRIBUTION            MEASUREMENT
Parametric                Normal                  Interval or ratio
Non-Parametric            Unknown distribution    Nominal or ordinal

Concept of Population and Sample


The central notion in any sampling problem is the existence of
a population. A population is an aggregate of unit values, where the
unit is the object upon which the observation is made and the value
is the property observed on that object. For example, in a class in
which the unit being observed is the individual student and the property is age, the
population is the aggregate of all ages of the students in the class.
The GPAs of the same students would be another population. Often-
times, it is impossible or impractical to observe the entire group or
aggregate, especially if it is large. Hence, one may examine a small
part of the entire group, called a sample.
Parameters and Estimates
To characterize the population as a whole, constants that are
called parameters are often used. Examples are the mean value of the
height of students per class in a population of 3 sections and the
variability among these measured unit values.
Often, but not always, one may estimate the population mean
or total. The value of the parameter as estimated from a sample is
referred to as the sample estimate, or simply the estimate.
Exercise No. 1
Test Yourself: (Follow the instructions carefully)
Get a sheet of yellow paper and answer the following by
writing the letter of your choice after the item number on your paper.

COLUMN A
__1. The process of extracting relevant information from a set of data
__2. Refers to the drawing of conclusions or generalizations from the analyzed data
__3. A science of collection, presentation, analysis and interpretation of data
__4. This refers to the collection of facts from which the researcher gets information
__5. This refers to the organization of data into tables, graphs and charts to get a clear picture of relationships
__6. Each and every element of a large group of data
__7. Concerned with summarizing values to describe group characteristics of data
__8. Statistics which make fewer and weaker assumptions, using nominal and ordinal measurement
__9. An inferential technique which assumes normality of the distributions of the populations from which data are taken
__10. Statistics that need critical thinking and judgment, using complex mathematical procedures

COLUMN B
a. Analysis of data
b. Parametric Statistics
c. Data Interpretation
d. Data Presentation
e. Organization of Data
f. Descriptive Statistics
g. Variable
h. Inferential Statistics
i. Sample
j. Population
k. Non-parametric Statistics
l. Statistics

Subscript and Summation Notations

A subscript is a number or a letter, placed at the lower right side of a
variable, that represents one of several numbers. It is used to
specify the item referred to. For example, if we have five trees of
different diameters and let X represent the tree, we will let X1 stand
for the stem diameter of the first tree, X2 stand for the stem diameter
of the second tree, and X3 for the third tree.
Sometimes we would like to summarize, in just one term, the
idea that there are five trees with their corresponding stem
diameters. Here, instead of a numerical value, we may use a letter
subscript. We would then write the symbol as Xi, where i stands for
the numbers 1, 2, 3, ..., n. In our particular example, i would stand for
the numbers 1 to 5 because there are five trees with different stem
diameters. Hence,
      Xi stands for X1, X2, X3, ..., Xn
The summation symbol Σ is used to denote that the
subscripted variables are to be added.
      n
      Σ Xi
      i=1
is read as "the summation of Xi, where i is 1 to n"; this
indicates that we get X1 + X2 + X3 + ... + Xn.

The Summation Sign:
Σ denotes summation.
Consider these values: X1, X2, X3, X4. Then
      n
      Σ Xi = X1 + X2 + X3 + X4     (here n = 4)
      i=1

Dot Notation:
Another way of representing a sum of observations is to use a
system called the DOT NOTATION system.
Consider this set of data:

Observation   Group 1    Group 2    Group 3    Group 4    Total
    1         X11 = 3    X21 = 6    X31 = 1    X41 = 2    X.1 = 12
    2         X12 = 5    X22 = 3    X32 = 5    X42 = 3    X.2 = 16
    3         X13 = 4    X23 = 1    X33 = 3    X43 = 4    X.3 = 12
    4         X14 = 2    X24 = 5    X34 = 2    X44 = 2    X.4 = 11
    5         X15 = 1    X25 = 2    X35 = 4    X45 = 5    X.5 = 12
  Total       X1. = 15   X2. = 17   X3. = 15   X4. = 16   X.. = 63

Xij stands for the jth observation in the ith group, where i indexes the
groups and j indexes the observations.

      X1. = X11 + X12 + X13 + X14 + X15 = 15
      X2. = Σj X2j = 17
      X3. = Σj X3j = 15
      X4. = Σj X4j = 16
      X.. = X1. + X2. + X3. + X4. = Σi Xi. = 63

Example:

                        Variety
  Obs         I           II          III        Total
   1          20          15          10         X.1 = 45
   2          22          15          13         X.2 = 50
   3          25          17          15         X.3 = 57
  Total    X1. = 67    X2. = 47    X3. = 38     X.. = 152

From the table: X11 = 20, X23 = 17, X32 = 13, and X33 = 15.
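The dot notation can be verified with a short computation. The sketch below is in Python (which this book does not otherwise use; the variable names are purely illustrative) and reproduces the group totals Xi., the observation totals X.j, and the grand total X.. of the table above:

# Groups are columns (i = 1..4), observations are rows (j = 1..5),
# matching the dot-notation table above.
data = {
    1: [3, 5, 4, 2, 1],   # Group 1: X11..X15
    2: [6, 3, 1, 5, 2],   # Group 2: X21..X25
    3: [1, 5, 3, 2, 4],   # Group 3: X31..X35
    4: [2, 3, 4, 2, 5],   # Group 4: X41..X45
}

group_totals = {i: sum(vals) for i, vals in data.items()}          # Xi.
obs_totals = [sum(data[i][j] for i in data) for j in range(5)]     # X.j
grand_total = sum(group_totals.values())                           # X..

print(group_totals)   # {1: 15, 2: 17, 3: 15, 4: 16}
print(obs_totals)     # [12, 16, 12, 11, 12]
print(grand_total)    # 63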


Collection and Presentation of Data


1. Kinds of Data
a. Qualitative data are those that are not amenable to
numerical measurement.
Examples:
   - yes or no responses
   - sex, civil status
   - ratings such as poor, fair, satisfactory, good,
     or excellent
   - in agriculture: climatic region, geographical
     location, soil type, slope, land use, kind of
     insecticides and fertilizers

b. Quantitative data are those that are amenable to
numerical measurement and statistical analysis.
These are either counts or measures, hence real numbers.
Examples:
   - diameter, height, test score, weight,
     distance, number of classes, size of leaves,
     basal area, volume, amount of currency,
     flying height, sea level, yield, leaf area
     index, panicle, number of branches, carbon
     density, biomass density

2. Sources of Data. Data can be obtained from principal
sources such as:
a. Direct or primary data. These are data arising from original
investigations such as observations, interviews,
questionnaires, experiments, measurements, and the
like.
b. Secondary data. These contain vague, general, and at times
abbreviated descriptions of phenomena, and are more likely
to be subject to typographic errors.
3. Types of Error in Data
a. Sampling error is the difference between the estimated
and the true value.
That is,
      SE = Ȳi - μ   (associated with the ith sample)
Where:
      Ȳi = estimated mean of the ith sample
      μ  = true population mean
When Ȳi - μ = 0, this implies that there is no sampling error.
Consider the set of data:
      Y1 = 1
      Y2 = 2
      Y3 = 3
To get the mean, sum all the values and divide by the number
of values:
      μ = (Y1 + Y2 + Y3)/3 = (1 + 2 + 3)/3 = 6/3 = 2, the population mean


Now, consider the sampling experiment of drawing 2 values
from this set of data without replacement. The number of possible
samples is C(3, 2) = 3!/(1! 2!) = 3.
In general, the number of possible samples of size n from a population of
size N is C(N, n) = N!/[(N - n)! n!].

Sample No.    Sample             Sample Mean (Ȳi)
    1         (Y1, Y2) = (1, 2)       1.5
    2         (Y1, Y3) = (1, 3)       2.0
    3         (Y2, Y3) = (2, 3)       2.5

Therefore, the sampling error for each sample is:
    1.   1.5 - 2.0 = -0.5
    2.   2.0 - 2.0 =  0.0
    3.   2.5 - 2.0 = +0.5

Note: it is necessary to know the sampling error so that we will be
able to reduce, assess, and investigate its nature and extent.
Sampling error can be reduced by increasing the size
of the sample.
b. Non-sampling error. This is usually committed in:
- recording data
- vague definition of terms
- inaccurate measuring devices/techniques
- inability of respondents to give accurate responses
- inconsistencies between the experimental method and the
  method of analysis used
- computational mistakes
Non-sampling error is unaffected by sample size.
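The sampling experiment above can also be reproduced by enumeration. A minimal sketch in Python (illustrative only): it lists every sample of size 2 drawn without replacement from the population 1, 2, 3 and the sampling error of each sample mean.

from itertools import combinations

population = [1, 2, 3]
mu = sum(population) / len(population)        # true population mean = 2.0

for sample in combinations(population, 2):    # the C(3, 2) = 3 possible samples
    ybar = sum(sample) / len(sample)          # estimated mean of the sample
    se = ybar - mu                            # sampling error SE = Ybar - mu
    print(sample, ybar, round(se, 2))
# (1, 2) 1.5 -0.5
# (1, 3) 2.0 0.0
# (2, 3) 2.5 0.5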


4. Methods of Gathering Data
a. The direct or interview method. This is a method of person-
to-person exchange between the interviewer and the
interviewee. The interview method provides consistent and
more precise information since clarification may be given
by the interviewee. Questions may be repeated or modified
to suit each interviewee's level of understanding. This
method is time-consuming, expensive, and has limited field
coverage.
Types:
   a. Focus group discussion
   b. Personal interview
   c. House-to-house interview
   d. Phone interview

b. The indirect or questionnaire method. Written responses are
given to prepared questions. Prepared questionnaires are
constructed and submitted to the clients, who answer each
written question. This method is inexpensive and can
cover a wide area in a shorter span of time. Clients may feel
a greater sense of freedom to express views and opinions
because their anonymity is maintained.
c. The registration method. Gathering of information is
enforced by certain laws. Examples are the registration of
births, deaths, motor vehicles, marriages, and licenses. The
advantage of this method is that the information is kept
systematically and made available to the public because of the
requirements of the law.
d.The observation method. The investigator observes the
behavior of the persons or organizations and their


outcomes. It is usually used when the subjects cannot talk


or write. The method makes possible the recording of
behavior at the appropriate time and situation.
e. The experiment method. This method is used when the
objective is to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. Scientific
researchers usually use the experiment method in testing
hypotheses.
5. Sampling Technique
It is not necessary for a researcher to examine every
member of a population to get data or information about the
whole population. Cost and time constraints will prohibit one
from undertaking a study of the entire population. At any
rate, drawing sample units systematically or at random is
appropriate. If sampling is done in this way, we can validly
infer conclusions about the entire population from the
sample taken.
There are two methods of drawing samples from a given
population. These are:
5.1. Probability/Scientific Sampling. Every element of the
population has equal chance of being included in the
sample and the probability that any specified unit of
population is included in the sample is governed by this
known chance. The types of probability sampling are the
following:
a. Random Sampling - is the method of selecting a
sample size (n) from a population (P) such that each
member of the population has an equal chance of being
included in the sample and all possible combinations of


size have an equal chance of being selected as a


sample. A prerequisite for the randomization is a
complete listing of the population.
Ways of drawing sample units at random.
- Lottery/draw lot sampling
- Table of random numbers
- Drawing cards
There are two ways of using the Table of Random
Numbers, to wit:
- Direct selection method. Applicable when there
  are only a few samples to be selected. The sample units
  can be taken directly from the table.
- Remainder method. This method is actually used in
  combination with direct selection. It is
  used when the number obtained from the Table of
  Random Numbers exceeds the digits in the
  sampling frame.
b. Systematic Sampling. This method involves selecting
every nth element of a series representing the population.
A complete listing is also required in this method. Under
this system, the sample units may be picked in the
following manner:
For instance,
      P = 100
      n = 10
The sampling interval may be obtained by dividing the total number
of elements in the population by the desired sample size. Thus,
      P/n = 100/10 = 10, so every 10th element is taken.
The 10 sample units would therefore be the persons holding
the following numbers: 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100.
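A minimal sketch of this systematic selection, written in Python for illustration (the book itself prescribes no software): with P = 100 and n = 10 the interval is P/n = 10, and every 10th element of the listing is taken. In practice a random starting point between 1 and the interval is often used; the sketch simply starts at the interval, as in the example above.

P = 100           # population size
n = 10            # desired sample size
k = P // n        # sampling interval = 10

# Element numbers selected: every kth member of the listed population
sample_units = list(range(k, P + 1, k))
print(sample_units)   # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]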
c. Stratified Sampling. The population is divided into
groups (strata) based on homogeneity in order to avoid the
possibility of drawing samples whose members come
from only one stratum.
Stratified random sampling may be used when it is
known in advance that a special segment of the
population would not have enough persons/objects in
the sample if a simple random sample were drawn.
We allocate n either equally or proportionately to each
stratum.
a. Equal Allocation. This method involves taking the
same number of units from each stratum to make up
the desired sample size n; thus,
      k = n/ns
where:  k  = number of sample elements per stratum
        n  = desired sample size
        ns = number of strata
Example:
If a researcher decides on a sample size of n = 200 taken from 5
strata, what will be the number of sample elements per stratum?
Solution:
      n  = 200
      ns = 5


k = n/ns
k = 200/5 = 40
Hence, 40 units will be taken from each stratum to
constitute the 200 sample elements decided by the researcher
b. Proportional Allocation. Stratified sampling using
proportionate allocation is used to guarantee a more
representative sample from each stratum. It is
expected that the larger the population in a stratum, the
more sample units will be taken from it.
Suppose the researcher decides to take 80
students as a sample, allocated proportionately among the four
departments of the College of Education presented
below:

Department            Population (P)   Proportion wi = P/Tp   k = ns x wi
Biological Science          150              150/565              21
English                     140              140/565              20
Mathematics                 200              200/565              28
Pilipino                     75               75/565              11
Total                       565                                    80

where:
P  = population of the stratum
wi = proportion of the stratum population (P) to the total population (Tp)
k  = number of elements to be taken per stratum
ns = decided sample size
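The proportional allocation in the table above can be computed mechanically. A sketch in Python (illustrative; rounding to whole students is an assumption, since the book does not state how fractional allocations are handled):

populations = {
    "Biological Science": 150,
    "English": 140,
    "Mathematics": 200,
    "Pilipino": 75,
}
ns = 80                                  # decided sample size
Tp = sum(populations.values())           # total population = 565

# k_i = ns * w_i, where w_i = P_i / Tp, rounded to whole students
allocation = {dept: round(ns * P / Tp) for dept, P in populations.items()}
print(allocation)                # {'Biological Science': 21, 'English': 20, 'Mathematics': 28, 'Pilipino': 11}
print(sum(allocation.values()))  # 80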

d. Cluster Sampling. This may be referred to as area sampling
because it is frequently applied on a geographical basis.
Districts or barangays of a municipality or city are selected;
the districts or barangays constitute the clusters. It does not
require homogeneity of the population; instead, it relies on the
heterogeneity within each sampled cluster.


Determination of Sample Size

Determining the sample size is basically anchored on the
decision of the researcher. Some researchers have no idea how to
determine the sample size for a given population scientifically and
simply do so arbitrarily; their major criterion is that selecting 50% + 1
of the population is sufficient for their study. Such ideas are not scientific in nature.
Take note that sampling is advisable only if the population is
equal to or more than 100. Complete enumeration of the population is
advisable when the total population under study is less than 100.
Calmorin (1995) suggested the following equation to determine the sample
size scientifically:

      Ss = [NV + Se²(1 - p)] / [NSe + V² x p(1 - p)]

Where:
      Ss = sample size
      N  = total number of population
      V  = the standard value (2.58) at the 1% level of
           probability with 0.99 reliability
      Se = sampling error (0.01)
      p  = the largest possible proportion (0.50)

Example:
The Forestry students want to determine the sample
size out of 1000 trees on which to measure diameter.
Solution:
      Ss = [NV + Se²(1 - p)] / [NSe + V² x p(1 - p)]
         = [1000(2.58) + (0.01)²(1 - 0.50)] / [1000(0.01) + (2.58)²(0.50)(1 - 0.50)]
         = (2,580 + 0.00005) / (10 + 1.6641)
         = 2,580.00005 / 11.6641
         = 221

Another method of determining the sample size is:

      n = P / (1 + Pe²)

where:
      n = sample size
      P = population
      e = sampling error (usually 1% or 5%)

Example:
      P = 1000
      e = 5%
Solution:
      n = P / (1 + Pe²)
        = 1000 / [1 + 1000(0.05)²]
        = 1000 / [1 + 1000(0.0025)]
        = 1000 / (1 + 2.5)
        = 1000 / 3.5
        = 286
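Both sample-size formulas are easy to check by computation. The sketch below (Python, illustrative) reproduces the two worked examples; the default constants V = 2.58, Se = 0.01, p = 0.50, and e = 0.05 are taken from the text, while the function names are labels chosen here, not the book's.

def calmorin_ss(N, V=2.58, Se=0.01, p=0.50):
    """Ss = [N*V + Se^2*(1 - p)] / [N*Se + V^2 * p*(1 - p)]"""
    return (N * V + Se**2 * (1 - p)) / (N * Se + V**2 * p * (1 - p))

def sample_size_n(P, e=0.05):
    """n = P / (1 + P*e^2), the formula commonly attributed to Slovin."""
    return P / (1 + P * e**2)

print(round(calmorin_ss(1000)))    # 221
print(round(sample_size_n(1000)))  # 286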

e. Multi-stage Sampling. This uses several stages or phases
in getting the sample from the general population, but the
selection of the sample is still done at random. This
technique is useful in conducting nationwide surveys
involving a large population.
Example: A country-wide survey on climate change
mitigation and adaptation practices. The researcher
decides to conduct the survey in 5 regions of the
Philippines, with 3 provinces per region, 4 municipalities
per province, and 4 barangays per municipality. Thus the
total number of sample barangays will be 5 x 3 x 4 x 4 = 240.
The number of households is another stage
to be considered. Say, 10 households per barangay x
240 barangays; the total sample of households will then be 2,400.
5.2. Non-Probability/Non-Scientific Sampling. This is a
technique wherein the manner of selecting units of the
population depends on some inclusion rule specified by the
researcher.
Types:
5.2.1. Purposive Sampling. This technique is based on
criteria or qualifications given by the researcher. It
is used based on the knowledge of the
respondents about the given situation or questionnaire.
Samples are taken if the researcher thinks that the
respondents can supply the information needed by
the researcher.
5.2.2. Quota Sampling. This sampling technique is a
quick and cheap method to employ. The researcher is
given definite instructions and a quota about the subject
population he will be working on, but the final choice of
the actual respondents is left to his own preference and is
not predetermined by some carefully operated
randomization plan. As a result, not all members of the population are
given a chance to be included.
5.2.3. Convenience Sampling. This uses instruments
or equipment that provide convenience, such as the
telephone, e-mail, or handset, to pick the sample units. If
the researcher calls at random or e-mails at random,
people without a telephone or who do not open the e-mail will not
be given a chance to be respondents.
5.2.4. Incidental Sampling. This sampling technique is
applied to those samples which are taken because
they are the most available (Guilford and Fruchter,
1973). The researcher just takes individuals as
samples until the sample size is completed.

Exercise No. 2

Test yourself:
Answer the following questions:
1. If the Education students are stratified according to the
department where they belong (Biological Science,
Mathematics, English, Pilipino, Physical Education), how
many samples shall we get from each department if we want
to get 200 sample students allocated equally among the
departments? Show your solution.
2. Suppose you would like to allocate proportionately a sample
size of 200 among the departments with the populations
given in the table below. How many sampling units would you
allocate per stratum?

DEPARTMENT             Population (P)   Proportion wi = P/Tp   k = ns x wi
Biological Science          180
Mathematics                 200
English                     280
Pilipino                    210
Physical Education          130
TOTAL                      1000


Data Organization
Data that are collected in any form can be organized through
tables, graphs, diagrams, and/or plots.
1. Tabular Presentation
There are basically two types of tables: the general or
reference table and the summary or text table. The general
table is used mainly as a repository of information; it is the
table of the raw data. The summary table, on the other hand,
is usually small in size and designed to guide the reader in
analyzing the data. It usually accompanies a text
discussion.
Example 1: General reference table. Scores of 4 first-year high
school class sections in two subjects

Class Section    Math    Science
     A            60       75
     B            45       27
     C            50       29
     D            30       50

Example 2: Summary table. Mean scores of first-year high school
students in Math and Science

Class Section    Score
     A           67.5
     B           36.0
     C           39.5
     D           40.0
Weighted Mean    45.75

2. Graphs and Diagrams
a. Bar graph. One of the most common and widely used
graphical devices. It consists of bars or heavy lines of
equal width, either all vertical or all horizontal. The length
of the bars represents the magnitude of the quantities being
compared.
[Figure: example of a bar graph]

d. Line graph. A line connecting points, or intersections of
the x and y values, in the graph.
[Figure: example of a line graph]

e. Pie graph. Usually used for the percentage distribution of
data.
[Figure: pie graph of the percentage distribution of educational attainment]

Scatter plot. Used to plot the relationship between two variables.
[Figure: scatter plot of daily income against age]
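The figures for this section could not be reproduced from the source. As a stand-in, the sketch below shows how a comparable bar graph could be drawn with Python's matplotlib library (an assumed tool; any charting package will do), using the Math scores of the four class sections from Example 1:

import matplotlib.pyplot as plt

sections = ["A", "B", "C", "D"]
math_scores = [60, 45, 50, 30]       # from Example 1 above

plt.bar(sections, math_scores)
plt.xlabel("Class Section")
plt.ylabel("Math Score")
plt.title("Math Scores of Four First-Year Sections")
plt.show()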

Exercise No. 3
Test yourself:
Answer the following:
1. Give and define the two kinds of data.
2. What are the types of error in data? How is each
committed?
3. Give the different methods of collecting data. Discuss
at least 3 methods.
4. What are the different methods of organizing collected
or measured data?


Frequency Distribution
Frequency distribution is a common technique for
describing a set of data. It is a listing of the
collected/measured data. To organize the data into a
frequency distribution, we pick convenient intervals
and tabulate the number of data values that fall into each
particular interval.
A frequency distribution is used when there are many statistical
data recorded/collected. Usually, it is utilized when the
number of data values collected equals or exceeds 30 (n ≥ 30).
The following steps are observed in preparing frequency
distribution:
1. Look for the lowest and the highest data recorded
2. Subtract the lowest data from the highest data plus 1.
3. Decide on the number of steps or class intervals. The
maximum number of intervals is 20, minimum number is 5,
and the ideal number is between 10 and 15 inclusively.
4. Determine the interval size by dividing step 2 by the desired
number of intervals. Unless specified, it is advisable to use
the ideal number of intervals.
5. Choose an appropriate lower limit for the first class interval.
This number should be at or just below the lowest data value, or
exactly divisible by the interval size.
6. Write the lowest limit at the bottom and from it, develop the
lower limits of the next higher intervals by adding the
interval size to a preceding lower limit until the highest data
is included. From the lowest limits develop also the
corresponding upper limits.
7. Read each data in a set of collected data and record a tally
opposite the class interval to which it belongs.

8. Count the number of tallies falling within each class to get


the frequency of each class intervals.
9. Add the frequencies to get the total number of data or
samples (n).
Sample Frequency Distribution of the Test Scores of 55
Students in Math 12

  C.I        Tally          Frequency   Class Boundary   Class Mark
65 - 69      /                  1        64.5 - 69.5         67
60 - 64      /                  1        59.5 - 64.5         62
55 - 59      /                  1        54.5 - 59.5         57
50 - 54      //                 2        49.5 - 54.5         52
45 - 49      ///                3        44.5 - 49.5         47
40 - 44      ///                3        39.5 - 44.5         42
35 - 39      ////               4        34.5 - 39.5         37
30 - 34      ///                3        29.5 - 34.5         32
25 - 29      ////               4        24.5 - 29.5         27
20 - 24      ////-              5        19.5 - 24.5         22
15 - 19      ////- ///          8        14.5 - 19.5         17
10 - 14      ////- ////-       10         9.5 - 14.5         12
 5 - 9       ////- //           7         4.5 -  9.5          7
 1 - 4       ///                3         0.5 -  4.5          2

Relative frequency distribution is a tabular arrangement of data
showing the proportion, in percent, of each frequency to the
total frequency. The relative frequency for each class interval
is obtained by dividing the class frequency by the total
frequency and expressing the result in percent.


Example:
In our frequency distribution, if the class frequency is 3,
the relative frequency is 3/55 or 5.45%. The rest of the
answers are shown in the following table.

  C.I       Frequency    rf (%)
65 - 69         1          1.82
60 - 64         1          1.82
55 - 59         1          1.82
50 - 54         2          3.64
45 - 49         3          5.45
40 - 44         3          5.45
35 - 39         4          7.27
30 - 34         3          5.45
25 - 29         4          7.27
20 - 24         5          9.09
15 - 19         8         14.55
10 - 14        10         18.18
 5 - 9          7         12.73
 1 - 4          3          5.45
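Building a frequency distribution like the ones above can be automated. A sketch in Python (illustrative; the short score list at the bottom is a hypothetical stand-in, not the 55 Math 12 scores behind the tables):

def frequency_distribution(scores, lowest_lower_limit, width, n_classes):
    """Return (interval, frequency, relative frequency in %) for each class."""
    n = len(scores)
    rows = []
    for k in range(n_classes):
        lower = lowest_lower_limit + k * width
        upper = lower + width - 1
        f = sum(lower <= s <= upper for s in scores)       # tally the class
        rows.append(((lower, upper), f, round(100 * f / n, 2)))
    return rows

# hypothetical raw scores
scores = [12, 18, 23, 25, 31, 34, 36, 41, 44, 47, 52, 55, 61, 63, 67]
for interval, f, rf in frequency_distribution(scores, 10, 5, 12):
    print(interval, f, rf)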

The relative frequency distribution can be shown graphically


through the use of the histogram or the relative frequency polygon
as in the following representations:
[Figure: Histogram (bar graph) of relative frequency (%) against class mark]

[Figure: Relative frequency polygon (line graph) of relative frequency (%) against class mark]

Cumulative frequency distribution is a tabular arrangement of
data by class interval whose frequencies are cumulated.
There are two kinds of cumulative frequency distributions.
These are:
1. The less than cumulative frequency distribution, in which the
entry for each class interval is the number of data values less than the
upper class boundary of that interval. It
is graphically represented by a rising frequency polygon
which we shall call the less than ogive, or < ogive.
Each number in the < cf column is interpreted as
follows: fifty-five items are less than 69.5; fifty-four items are less than
64.5; fifty-three items are less than 59.5; fifty-two items are less than 54.5;
fifty items are less than 49.5; forty-seven items are less than
44.5; and so forth.


  C.I       Frequency    <cf
65 - 69         1          55
60 - 64         1          54
55 - 59         1          53
50 - 54         2          52
45 - 49         3          50
40 - 44         3          47
35 - 39         4          44
30 - 34         3          40
25 - 29         4          37
20 - 24         5          33
15 - 19         8          28
10 - 14        10          20
 5 - 9          7          10
 1 - 4          3           3

[Figure: Line graph (< ogive) of the less than cumulative frequency against class mark]

2. The greater than cumulative frequency distribution, in which the
entry for each class interval is the number of data values greater than
the lower class boundary of that interval.
It is graphically represented by a falling frequency polygon
which we shall call the greater than ogive, or > ogive.
  C.I       Frequency    >cf
65 - 69         1           1
60 - 64         1           2
55 - 59         1           3
50 - 54         2           5
45 - 49         3           8
40 - 44         3          11
35 - 39         4          15
30 - 34         3          18
25 - 29         4          22
20 - 24         5          27
15 - 19         8          35
10 - 14        10          45
 5 - 9          7          52
 1 - 4          3          55

Each number in the > cf column is interpreted as follows: one item is
greater than 64.5; two items are greater than 59.5; three items are greater
than 54.5; and so on and so forth.
[Figure: Line graph (> ogive) of the greater than cumulative frequency against class mark]

Cumulative percent refers to the tabular presentation of the
percentage of the cumulative frequency by class interval. This
is done by first getting the respective relative frequency
of each class. The first rf (for the lowest class) equals the first cumulative
percent, and each succeeding rf is added to obtain the next cumulative
percent, as follows:
  C.I       Frequency     rf      <cf    Cum. %
65 - 69         1         1.82     55     100
60 - 64         1         1.82     54     98.17
55 - 59         1         1.82     53     96.35
50 - 54         2         3.64     52     94.53
45 - 49         3         5.45     50     90.89
40 - 44         3         5.45     47     85.44
35 - 39         4         7.27     44     79.99
30 - 34         3         5.45     40     72.72
25 - 29         4         7.27     37     67.27
20 - 24         5         9.09     33     60.00
15 - 19         8        14.55     28     50.91
10 - 14        10        18.18     20     36.36
 5 - 9          7        12.73     10     18.18
 1 - 4          3         5.45      3      5.45
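The <cf, >cf, and cumulative percent columns all follow mechanically from the frequency column. A sketch in Python (illustrative), using the book's frequencies listed from the lowest class (1-4) up to the highest (65-69):

freqs_low_to_high = [3, 7, 10, 8, 5, 4, 3, 4, 3, 3, 2, 1, 1, 1]  # classes 1-4 up to 65-69
n = sum(freqs_low_to_high)                                        # 55

less_than_cf = []                     # items below each class's upper boundary
running = 0
for f in freqs_low_to_high:
    running += f
    less_than_cf.append(running)

# items above each class's lower boundary = n minus the items below that class
greater_than_cf = [n - c + f for c, f in zip(less_than_cf, freqs_low_to_high)]
cumulative_percent = [round(100 * c / n, 2) for c in less_than_cf]

print(less_than_cf)        # [3, 10, 20, 28, 33, 37, 40, 44, 47, 50, 52, 53, 54, 55]
print(greater_than_cf)     # [55, 52, 45, 35, 27, 22, 18, 15, 11, 8, 5, 3, 2, 1]
print(cumulative_percent)  # [5.45, 18.18, 36.36, ...]

The printed cumulative percent column differs from this output only by rounding, since the table accumulates already-rounded relative frequencies.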

Exercise No. 4
The following are the scores obtained by a group of 60
college students in a Math 12 examination:

MATH 12 SCORES
88   81   79   90   78   82
84   63   86   59   89   76
42   72   89   78   72   81
96   73   39   88   77   43
83   98   45   88   69   40
54   73   59   91   74   66
44   74   65   68   80   81
72   62   78   78   68   70
82   69   77   86   79   50
86   85   73   89   49   75

Problem Solving and Statistical Analysis:
a. Prepare a frequency distribution of the above scores, using a
class interval of 10.
b. Present the frequency distribution as two frequency polygons
(line and histogram). Plot both on the same
graph.
c. Compute the relative frequency, cumulative frequency, and
cumulative percent.


Measures of Central Tendencies


The measures of central tendency give concise information
about the nature of the distribution of the raw data. They
serve as representatives of the entire distribution of the raw data
and show how the data tend toward the
center. There are three commonly used measures of central
tendency: the mean, the median, and the mode.
The Mean
The mean is the most frequently used measure of central
tendency because it is simple. Since it is based on all of the data, it
summarizes a lot of information. It is also the most reliable measure.
Moreover, the mean is required as a basis for the computation of
other statistical measurements. Thus, it is widely used in statistics.
The mean can be computed in two situations: a. when the number of raw
data values is small (n < 30); and b. when the data set is large (n ≥ 30).
When the data set is small (n < 30), the mean is calculated as
follows:
      X̄ = ΣX / n
where:
      X̄ = mean
      Σ = symbol for summation
      X = individual data value
      n = total number of observations

Example:
Calculate the mean scores of 10 students in the Math and
Science subjects.

 No.     Math    Science
  1       50       66
  2       55       78
  3       60       89
  4       87       88
  5       90       97
  6       59       78
  7       88       59
  8       89       85
  9       92       84
 10       95       79
Total    765      803
Mean     76.5     80.3
When the number of data values is large (n ≥ 30), it is easier to
compute the mean by grouping the data set into a frequency
distribution. The mean from a frequency distribution is
obtained in almost the same way as the mean from raw data.
There are several ways of determining the mean when the
number of data values is large: a. by the midpoint method; b. by the class-deviation
method; and c. by the lower limit and upper limit methods.
The mean from a frequency distribution by the midpoint method
may be computed using the formula below:
      X̄ = ΣfiXi / n
Where:
      fi = frequency of the ith class interval
      Xi = class mark of the ith class interval
      n  = number of observations


Example:
Calculation of the mean from the frequency distribution of the scores of
40 students in Math, using the midpoint method.

  C.I     Frequency (fi)    Xi     fiXi
70-74           2           72     144
65-69           2           67     134
60-64           3           62     186
55-59           2           57     114
50-54           8           52     416
45-49           9           47     423
40-44           2           42      84
35-39           3           37     111
30-34           5           32     160
25-29           3           27      81
20-24           1           22      22
              n = 40              ΣfiXi = 1,875

      X̄ = ΣfiXi / n = 1,875 / 40 = 46.875
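The midpoint-method computation can be packed into a few lines. A sketch in Python (illustrative), using the class intervals and frequencies of the table above:

# Class intervals (lower limit, upper limit, frequency) from the table above
classes = [(70, 74, 2), (65, 69, 2), (60, 64, 3), (55, 59, 2), (50, 54, 8),
           (45, 49, 9), (40, 44, 2), (35, 39, 3), (30, 34, 5), (25, 29, 3), (20, 24, 1)]

n = sum(f for _, _, f in classes)                          # 40
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of fi * Xi (class marks)
mean = sum_fx / n
print(n, sum_fx, mean)   # 40 1875.0 46.875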

Calculation of the mean from the frequency distribution using the
class-deviation method

  C.I     Frequency (fi)    di     fidi
70-74           2           +5      10
65-69           2           +4       8
60-64           3           +3       9
55-59           2           +2       4
50-54           8           +1       8
45-49           9            0       0
40-44           2           -1      -2
35-39           3           -2      -6
30-34           5           -3     -15
25-29           3           -4     -12
20-24           1           -5      -5
              n = 40              Σfidi = -1

      X̄ = Xo + (Σfidi / n) i = 47 + (-1/40)(5) = 46.875

The steps in determining the mean from frequency distribution


using class-deviation method are as follows:
1. Take the middle lower class limit as an assumed mean
2. Assign zero (0) corresponding to the assumed mean and
with positive integers above zero deviation and negative
integers below it. This column is denoted by di.
3. Multiply the deviations by the corresponding frequencies to
get values for column fidi and then sum up algebraically to
get fidi.
4. Add the frequencies to get the total number of data (n).
5. Divide fidi by n and then multiply the quotient by the
interval size to get the correction value.
6. Add the correction value obtained with the assumed mean.
The result is the actual mean.
Calculation of the mean by the Lower Class Limit method from the frequency
distribution of the scores of 40 students in Math

  C.I     Frequency (fi)    f·lc
70-74           2            140
65-69           2            130
60-64           3            180
55-59           2            110
50-54           8            400
45-49           9            405
40-44           2             80
35-39           3            105
30-34           5            150
25-29           3             75
20-24           1             20
              n = 40      Σf·lc = 1,795

      X̄ = Σf·lc / n + i/2 - 0.5 = 1,795/40 + 5/2 - 0.5 = 46.875

Where:
      X̄  = mean
      Σ  = summation notation
      f  = frequency
      lc = lower class limit
      i  = class interval size
      n  = number of observations
      uc = upper class limit

Calculation of the mean using the Upper Class Limit method

  C.I     Frequency (fi)    f·uc
70-74           2            148
65-69           2            138
60-64           3            192
55-59           2            118
50-54           8            432
45-49           9            441
40-44           2             88
35-39           3            117
30-34           5            170
25-29           3             87
20-24           1             24
              n = 40      Σf·uc = 1,955

      X̄ = Σf·uc / n - i/2 + 0.5 = 1,955/40 - 5/2 + 0.5 = 46.875

The Median
Another measure of central tendency is the median. It is a
point measure that divides a distribution of data arranged from
highest to lowest, or vice versa, in half; thus, half of the data fall
below the median and the other half of the data fall above the
median. It is the most stable measure of central tendency. The
value of the median depends on the position of the data, not on their
magnitude. If most of the data are high, the median is high, and if
most of the data are low, the median is also low.
In identifying the median, the data should be arranged in order of
magnitude, either ascending or descending, provided that n
is small (n < 30).

The following are the steps for median determination from raw
scores:
1. Arrange the data from highest to lowest or vice versa.
2. If n/2 is an integer, the median is taken to be the average of
the two middlemost data.
3. If n/2 is not an integer, the median is taken to be the
middlemost data.
Example: Medians from the sample raw scores of 8 students in Stat 22
and 9 students in Math 12 (arranged from lowest to highest):

STAT 22:  17, 17, 26, 28, 30, 30, 31, 37
      Mdn = (28 + 30)/2 = 29

MATH 12:  15, 19, 20, 24, 28, 30, 32, 32, 40
      Mdn = 28
When the data set is large (n ≥ 30), the median is computed from the less than
cumulative frequency distribution by the
formula:
      Mdn = Lm + [(n/2 - lcf) / fm] i
Where:
      Mdn = median
      Lm  = lower class boundary of the median class
      lcf = less than cumulative frequency of the class
            immediately preceding the median class
      fm  = frequency of the median class
      i   = the size of the interval
      n   = the total number of observations

Example:
Median from the frequency distribution of the scores of 40
students in Math, using the less than cumulative frequency:

  C.I       fi     <cf
70 - 74      2      40
65 - 69      2      38
60 - 64      3      36
55 - 59      2      33
50 - 54      8      31
45 - 49      9      23
40 - 44      2      14
35 - 39      4      12
30 - 34      5       8
25 - 29      3       3
           n = 40

      Mdn = Lm + [(n/2 - lcf) / fm] i = 44.5 + [(20 - 14)/9] 5 = 47.83
Using the greater than cumulative frequency, the median is
computed as follows:
      Mdn = Um - [(n/2 - gcf) / fm] i
where:
      Um  = upper class boundary of the median class
      gcf = greater than cumulative frequency of the class
            immediately preceding the median class
      fm  = frequency of the median class
      i   = the size of the interval
      n   = the total number of observations

Median from the frequency distribution of the scores of 40 students in
Math, using the greater than cumulative frequency:

  C.I       fi     >cf
70 - 74      2       2
65 - 69      2       4
60 - 64      3       7
55 - 59      2       9
50 - 54      8      17
45 - 49      9      26
40 - 44      2      28
35 - 39      4      32
30 - 34      5      37
25 - 29      3      40
           n = 40

      Mdn = Um - [(n/2 - gcf) / fm] i = 49.5 - [(20 - 17)/9] 5 = 47.83
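The grouped-data median can be computed directly from the formula Mdn = Lm + [(n/2 - lcf)/fm] i. A sketch in Python (illustrative), using the distribution from the example above:

# Classes listed from lowest to highest: (lower class boundary, frequency)
classes = [(24.5, 3), (29.5, 5), (34.5, 4), (39.5, 2), (44.5, 9),
           (49.5, 8), (54.5, 2), (59.5, 3), (64.5, 2), (69.5, 2)]
i = 5                                    # interval size
n = sum(f for _, f in classes)           # 40

half = n / 2                             # 20
lcf = 0                                  # cumulative frequency below the median class
for lower, f in classes:
    if lcf + f >= half:                  # median class found
        median = lower + (half - lcf) / f * i
        break
    lcf += f

print(round(median, 2))                  # 47.83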
The Mode
The mode is defined as the score that occurs most frequently. The mode for
a set of test scores need not be unique; thus, it is possible to have
two or more modes.
Example:
Calculation of the mode from the scores of 8 Math students and 9 Science
students:

Math:     17, 17, 26, 28, 30, 30, 31, 37        Mo = 17 and 30
Science:  15, 19, 20, 24, 28, 30, 32, 32, 40    Mo = 32

If the set of data is in the form of a frequency distribution, the
mode is calculated using the formula below:
      Mo = Lmo + [(fmo - f1) / (2fmo - f1 - f2)] i
Where:
      Mo  = mode
      Lmo = lower class boundary of the modal class
      fmo = frequency of the modal class
      f1  = frequency of the class preceding the modal class
      f2  = frequency of the class following the modal class
      i   = size of the interval
Example:
Calculation of the mode from the frequency distribution of the sample
test scores of 40 students in Math:

  C.I       freq
70 - 74      2
65 - 69      2
60 - 64      3
55 - 59      2
50 - 54      8   (f2)
45 - 49      9   (fmo)
40 - 44      2   (f1)
35 - 39      3
30 - 34      4
25 - 29      3
20 - 24      2
           n = 40

      Mo = Lmo + [(fmo - f1) / (2fmo - f1 - f2)] i
         = 44.5 + [(9 - 2) / (2(9) - 2 - 8)] 5
         = 44.5 + (7/8) 5
         = 44.5 + 4.375
         = 48.875
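The modal-class formula can be verified the same way. A sketch in Python (illustrative), with f1 the frequency of the class preceding (below) the modal class and f2 that of the class following (above) it, as in the example:

# Frequencies listed from the lowest class (20-24) to the highest (70-74)
freqs = [2, 3, 4, 3, 2, 9, 8, 2, 3, 2, 2]
lower_boundaries = [19.5 + 5 * k for k in range(len(freqs))]
i = 5

m = freqs.index(max(freqs))              # index of the modal class (45-49)
fm, f1, f2 = freqs[m], freqs[m - 1], freqs[m + 1]
mode = lower_boundaries[m] + (fm - f1) / (2 * fm - f1 - f2) * i
print(mode)                              # 48.875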

Problem Set No. 5

The following are the scores obtained by a group of 60
college students in a Math 102 examination:

MATH 102 SCORES
88   81   79   90   78   82
84   63   86   59   89   76
42   72   89   78   72   81
96   73   39   88   77   43
83   98   45   88   69   40
54   73   59   91   74   66
44   74   65   68   80   81
72   62   78   78   68   70
82   69   77   86   79   50
86   85   73   89   49   75

Problem Solving and Statistical Analysis:
a. Prepare a frequency distribution of the above marks, using a
class interval of 15.
b. Solve for the mean, median, and mode using the lower class limit
method and the upper class limit method to check your answer.


Measures of Position
These are measures used to find the specific
location of a point that determines a percentage of the test scores in a
distribution. These measures are values below which specific
fractions of the test scores in a given set fall. The quartiles,
deciles, and percentiles are the measures of position that are
commonly used.

The Quartiles
Quartiles are points which divide the total number of test
scores into four equal parts. Each set of test scores has three
quartiles. 25% falls below the first quartile (Q1), 50% is below the
second quartile (Q2), and 75% is below the 3rd quartile (Q3). The 1st
and the 3rd quartiles are used in the computation of the interquartile
range and quartile deviation. Quartiles are computed in the same
way as the median is computed, since Q2 is the same as the
median.
The steps in finding the quartiles of raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or lowest to
highest.
2. Determine Qk, where Qk is the kth quartile and k = 1, 2, 3.
   - If nk/4 is an integer, Qk is the average of the (nk/4)th and the
     (nk/4 + 1)th scores.
   - If nk/4 is not an integer, Qk = the ith score, where i is the closest integer
     greater than nk/4.

To illustrate the procedure for calculating the quartiles from


raw scores, consider the sample test scores given below:
Calculation of quartiles from the sample raw scores of eight
students in Stat 22 and nine students in Math 102:

Stat 22:   17, 17, 26, 28, 30, 30, 31, 37
Math 102:  15, 19, 20, 24, 28, 30, 32, 32, 40

For Stat 22 (n = 8):
      Q1: nk/4 = (8)(1)/4 = 2, an integer, so Q1 is the average of the 2nd and 3rd scores
          Q1 = (17 + 26)/2 = 21.5
      Q3: (8)(3)/4 = 6, so Q3 is the average of the 6th and 7th scores
          Q3 = (30 + 31)/2 = 30.5

For Math 102 (n = 9):
      Q1: (9)(1)/4 = 2.25, so Q1 = 3rd score = 20
      Q3: (9)(3)/4 = 6.75, so Q3 = 7th score = 32
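The position rule used here for quartiles also applies, with a different divisor, to the deciles and percentiles discussed later. A sketch in Python (illustrative), where parts = 4 gives quartiles, 10 gives deciles, and 100 gives percentiles:

import math

def raw_quantile(scores, k, parts):
    """Position-rule quantile: parts = 4 (quartile), 10 (decile), 100 (percentile)."""
    s = sorted(scores)
    pos = len(s) * k / parts
    if pos == int(pos):                      # integer: average of pos-th and (pos+1)-th scores
        p = int(pos)
        return (s[p - 1] + s[p]) / 2
    return s[math.ceil(pos) - 1]             # otherwise the next higher-ranked score

stat22 = [17, 17, 26, 28, 30, 30, 31, 37]
math102 = [15, 19, 20, 24, 28, 30, 32, 32, 40]
print(raw_quantile(stat22, 1, 4), raw_quantile(stat22, 3, 4))    # 21.5 30.5
print(raw_quantile(math102, 1, 4), raw_quantile(math102, 3, 4))  # 20 32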

Using the less than cumulative frequency distribution, the first
quartile is computed as follows:
      Q1 = LQ1 + [(n/4 - lcf) / fQ1] i
where:
      Q1  = the first quartile
      LQ1 = lower class boundary of the class where Q1 lies
      lcf = less than cumulative frequency approaching or
            equal to, but not exceeding, n/4
      fQ1 = the frequency of the class where Q1 lies
      i   = the size of the interval
      n   = the total number of scores


The third quartile is computed by the formula below:
      Q3 = LQ3 + [(3n/4 - lcf) / fQ3] i
where:
      Q3  = the third quartile
      LQ3 = lower class boundary of the class where Q3 lies
For the test scores in the form of a frequency distribution, the
following are the steps in determining the quartiles:
a). For the 1st Quartile
1. Take one fourth of the total number of scores
2. Get a less than cumulative frequency until one fourth
of it is approached or equaled but not exceeded.
3. Subtract Step 2 from Step 1
4. Divide the difference in Step 3 by the frequency of
the next higher step where Q1 lies.
5. Multiply the quotient in Step 4 by the size of the
interval to get the correction value.
6. Add the correction value to the lower class boundary
where Q1 lies. The result is the 1st quartile.
b). For the 3rd Quartile
1. Take three-fourths (3n/4) of the total number of scores.
2. Get a less than cumulative frequency until 3/4 of it is
approached or equaled but not exceeded.
3. Subtract Step 2 from Step 1
4. Divide the difference in Step 3 by the frequency of the
next higher step where Q3 lies.
5. Multiply the quotient in Step 4 by the size of the
interval to get the correction value.
6. Add the correction value to the lower class boundary
where Q3 lies. The result is the 3rd quartile.

Example:
Calculation of quartiles from the frequency distribution of the test
scores of 40 students in Math 102:

Class Interval    freq    <cf
   70 - 74          2      40
   65 - 69          2      38
   60 - 64          3      36
   55 - 59          2      33
   50 - 54          8      31
   45 - 49          9      23
   40 - 44          2      14
   35 - 39          3      12
   30 - 34          4       9
   25 - 29          3       5
   20 - 24          2       2
                  n = 40

For Q1:  nk/4 = 11(1)/4 = 2.75, or the 3rd class interval
      Q1 = LQ1 + [(n/4 - lcf) / fQ1] i = 29.5 + [(10 - 9)/4] 5 = 30.75

For Q3:  nk/4 = 11(3)/4 = 8.25, or the 9th class interval
      Q3 = LQ3 + [(3n/4 - lcf) / fQ3] i = 59.5 + [(30 - 28)/4] 5 = 62

The Deciles
The deciles are points which divide the total number of test
scores into ten equal parts. Each set of test scores has nine deciles:
10% falls below the 1st decile (D1); 20% falls below the 2nd decile
(D2); 30% falls below the 3rd decile (D3); 40% falls below the 4th
decile (D4); 50% falls below the 5th decile (D5); 60% falls below the
6th decile (D6); 70% falls below the 7th decile (D7); 80% falls below
the 8th decile (D8); and 90% falls below the 9th decile (D9).
The deciles are computed in exactly the same manner as the
median. Hence, the 5th decile (D5) is the same as the
median.
The steps in finding the deciles from raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or vice versa.
2. Determine Dk, where Dk is the kth decile and k = 1, 2, 3, ..., 9.
   - If nk/10 is an integer, Dk is the average of the (nk/10)th and the
     (nk/10 + 1)th scores.
   - If nk/10 is not an integer, Dk = the ith score, where i is the
     closest integer greater than nk/10.
Example:
Calculation of deciles from the raw scores of 8 students in
Statistics and 9 students in Math 12:

Statistics:  17, 17, 26, 28, 30, 30, 31, 37
Math 12:     15, 19, 20, 24, 28, 30, 32, 32, 40

For Statistics (n = 8):
      D1 = (8)(1)/10 = 0.8   ->  D1 = 1st score = 17
      D2 = (8)(2)/10 = 1.6   ->  D2 = 2nd score = 17
      D3 = (8)(3)/10 = 2.4   ->  D3 = 3rd score = 26
      D4 = (8)(4)/10 = 3.2   ->  D4 = 4th score = 28
      D5 = (8)(5)/10 = 4     ->  D5 = (4th + 5th scores)/2 = (28 + 30)/2 = 29

For Math 12 (n = 9):
      D2 = (9)(2)/10 = 1.8   ->  D2 = 2nd score = 19
      D6 = (9)(6)/10 = 5.4   ->  D6 = 6th score = 30
      D7 = (9)(7)/10 = 6.3   ->  D7 = 7th score = 32
      D8 = (9)(8)/10 = 7.2   ->  D8 = 8th score = 32
      D9 = (9)(9)/10 = 8.1   ->  D9 = 9th score = 40

For test scores in the form of a frequency distribution, the
following formula is employed for the kth decile (k = 1, 2, ..., 9):

      Dk = LDk + [(kn/10 - lcf) / fDk] i

Where:
      Dk  = the kth decile
      LDk = lower class boundary of the class where Dk lies
      lcf = less than cumulative frequency approaching or equal
            to, but not exceeding, kn/10
      fDk = the frequency of the class where Dk lies
      i   = the size of the interval
      n   = the total number of scores

Written out for each decile, the determining value kn/10 becomes:
      D1: n/10      D2: n/5 (= 2n/10)      D3: 3n/10
      D4: 2n/5      D5: n/2                D6: 3n/5
      D7: 7n/10     D8: 4n/5               D9: 9n/10

Example:
Calculation of deciles from the frequency distribution of the test
scores of 40 students in Math 102:

Class Interval    freq    <cf
   70 - 74          2      40
   65 - 69          2      38
   60 - 64          3      36
   55 - 59          2      33
   50 - 54          8      31
   45 - 49          9      23
   40 - 44          2      14
   35 - 39          3      12
   30 - 34          4       9
   25 - 29          3       5
   20 - 24          2       2
                  n = 40

For Decile 1:  nk/10 = 11(1)/10 = 1.1, or the 2nd class interval
      D1 = LD1 + [(n/10 - lcf) / fD1] i = 24.5 + [(4 - 2)/3] 5 = 27.83

For Decile 2:  nk/10 = 11(2)/10 = 2.2, or the 3rd class interval
      D2 = LD2 + [(n/5 - lcf) / fD2] i = 29.5 + [(8 - 5)/4] 5 = 33.25

For Decile 3:  nk/10 = 11(3)/10 = 3.3, or the 4th class interval
      D3 = LD3 + [(3n/10 - lcf) / fD3] i = 34.5 + [(12 - 9)/3] 5 = 39.5

For Decile 4:  nk/10 = 11(4)/10 = 4.4, or the 5th class interval
      D4 = LD4 + [(2n/5 - lcf) / fD4] i = 39.5 + [(16 - 14)/2] 5 = 44.5

For Decile 5:  nk/10 = 11(5)/10 = 5.5, or the 6th class interval
      D5 = LD5 + [(n/2 - lcf) / fD5] i = 44.5 + [(20 - 14)/9] 5 = 47.83

For Decile 6:  nk/10 = 11(6)/10 = 6.6, or the 7th class interval
      D6 = LD6 + [(3n/5 - lcf) / fD6] i = 49.5 + [(24 - 23)/8] 5 = 50.125

For Decile 7:  nk/10 = 11(7)/10 = 7.7, or the 8th class interval
      D7 = LD7 + [(7n/10 - lcf) / fD7] i = 54.5 + [(28 - 23)/2] 5 = 67.00

For Decile 8:  nk/10 = 11(8)/10 = 8.8, or the 9th class interval
      D8 = LD8 + [(4n/5 - lcf) / fD8] i = 59.5 + [(32 - 31)/3] 5 = 61.17

For Decile 9:  nk/10 = 11(9)/10 = 9.9, or the 10th class interval
      D9 = LD9 + [(9n/10 - lcf) / fD9] i = 64.5 + [(36 - 33)/2] 5 = 72.00
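Deciles (and percentiles) from grouped data follow the same pattern as the grouped median: L + [(target - lcf)/f] i, where the target is kn/10 or kn/100. A sketch in Python (illustrative), reproducing D1, D2, and D5 from the example above:

def grouped_quantile(classes, i, target):
    """classes: (lower class boundary, frequency) listed from the lowest class upward.
    target: kn/10 for the kth decile, kn/100 for the kth percentile, n/2 for the median."""
    lcf = 0                                   # cumulative frequency below the current class
    for lower, f in classes:
        if lcf + f >= target:                 # class containing the required position
            return lower + (target - lcf) / f * i
        lcf += f

classes = [(19.5, 2), (24.5, 3), (29.5, 4), (34.5, 3), (39.5, 2), (44.5, 9),
           (49.5, 8), (54.5, 2), (59.5, 3), (64.5, 2), (69.5, 2)]
n = sum(f for _, f in classes)                                     # 40
for k in (1, 2, 5):
    print(k, round(grouped_quantile(classes, 5, n * k / 10), 2))   # D1=27.83, D2=33.25, D5=47.83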

The Percentiles
The percentiles are the points that divide the total number of
test scores, or data values, into exactly one hundred equal parts. For each
set of test scores, there are ninety-nine (99)
percentiles, which determine the points below which specific
percentages of the test scores fall. For instance, for the 9th percentile
(P9), 9% of the test scores in the distribution lie at or
below it and 91% lie at or above it. Other percentiles take a
similar meaning and interpretation.
The percentiles are calculated in exactly the same manner as
the median. In effect, the 50th percentile (P50) is
the same as the median.
The steps in finding the percentiles from raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or vice versa.
2. Determine Pk, where Pk is the kth percentile and k = 1, 2, 3,
   ..., 99.
   a). If nk/100 is an integer, Pk is the average of the (nk/100)th and the
       (nk/100 + 1)th scores.
   b). If nk/100 is not an integer, Pk = the ith score, where i is the
       closest integer greater than nk/100.

To illustrate the procedure for calculating some percentiles
from raw scores, consider the sample test scores of 8 students in
Stat 101 and 9 students in Math 102.

Stat 101: 17, 17, 26, 28, 30, 30, 31, 37          (n = 8)
Math 102: 15, 19, 20, 24, 28, 30, 32, 32, 40      (n = 9)

For the Stat 101 scores (n = 8):
P20 = (8)(20)/100 = 1.6  ---  P20 = 2nd score = 17
P25 = (8)(25)/100 = 2    ---  P25 = (2nd + 3rd scores)/2 = (17 + 26)/2 = 21.5
P97 = (8)(97)/100 = 7.76 ---  P97 = 8th score = 37
P99 = (8)(99)/100 = 7.92 ---  P99 = 8th score = 37

For the Math 102 scores (n = 9):
P20 = (9)(20)/100 = 1.8  ---  P20 = 2nd score = 19
P60 = (9)(60)/100 = 5.4  ---  P60 = 6th score = 30
P70 = (9)(70)/100 = 6.3  ---  P70 = 7th score = 32
P80 = (9)(80)/100 = 7.2  ---  P80 = 8th score = 32
P90 = (9)(90)/100 = 8.1  ---  P90 = 9th score = 40
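
The same counting rule can be written as a short Python sketch (an
assumption added here, not part of the original text):

    # Raw-score percentile rule: if nk/100 is an integer, average the
    # (nk/100)th and the next score; otherwise take the next-higher position.
    import math

    def raw_percentile(scores, k):
        data = sorted(scores)
        n = len(data)
        pos = n * k / 100.0
        if pos == int(pos):                      # integer: average two scores
            i = int(pos)
            return (data[i - 1] + data[i]) / 2.0
        return data[math.ceil(pos) - 1]          # otherwise: next whole position

    stat101 = [17, 17, 26, 28, 30, 30, 31, 37]
    math102 = [15, 19, 20, 24, 28, 30, 32, 32, 40]
    print(raw_percentile(stat101, 25))   # 21.5
    print(raw_percentile(math102, 60))   # 30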

For the test scores in the form of frequency distribution, the


determining factor is (k/100)n, where k = 1, 2 ,3,, 99,
corresponding to the ith percentile. The (k/100)n is actually a value
that serves as a reference point when determining a less than
cumulative frequency (Fm-1) immediately preceding the class where
the ith percentile lies. It is a value that should be approached to or
equaled but not exceeded by Fm-1 when solving for certain
percentiles.


The following equations are utilized in determining some
percentiles.

P1 = LP1 + {(n/100 - lcf)/fP1} i
Where: P1  = the 1st percentile
       LP1 = lower class boundary where P1 lies
       fP1 = the frequency where P1 lies
       lcf = less than cumulative frequency approaching or equal to,
             but not exceeding, n/100
       i   = the size of the class interval
       n   = the total number of scores

P2  = LP2  + {(2n/100 - lcf)/fP2} i
P3  = LP3  + {(3n/100 - lcf)/fP3} i
P4  = LP4  + {(4n/100 - lcf)/fP4} i
P50 = LP50 + {(n/2 - lcf)/fP50} i
P60 = LP60 + {(3n/5 - lcf)/fP60} i
P80 = LP80 + {(4n/5 - lcf)/fP80} i
P90 = LP90 + {(9n/10 - lcf)/fP90} i

where, for each percentile Pk, LPk is the lower class boundary of the
class where Pk lies, fPk is the frequency of that class, and lcf is the
less than cumulative frequency approaching or equal to, but not
exceeding, kn/100.
To illustrate the procedure for calculating some percentiles
from a frequency distribution, consider the scores of 40 students in
Statistics.

Class Interval   freq   <cf
   70 - 74         2     40
   65 - 69         2     38
   60 - 64         3     36
   55 - 59         2     33
   50 - 54         8     31
   45 - 49         9     23
   40 - 44         2     14
   35 - 39         3     12
   30 - 34         4      9
   25 - 29         3      5
   20 - 24         2      2
                 n = 40

For percentile 1
nk/100 = 11(1)/100 = 0.11, or the 1st class interval
P1 = LP1 + {(n/100 - lcf)/fP1} i = 19.5 + {(0.4 - 0)/3}5 = 20.167


For percentile 10
nk/100 = 11(10)/100 = 1.1, or the 2nd class interval
P10 = LP10 + {(n/10 - lcf)/fP10} i = 24.5 + {(4 - 2)/3}5 = 27.833

For percentile 20
nk/100 = 11(20)/100 = 2.2, or the 3rd class interval
P20 = LP20 + {(n/5 - lcf)/fP20} i = 29.5 + {(8 - 5)/4}5 = 33.25

For percentile 30
nk/100 = 11(30)/100 = 3.3, or the 4th class interval
P30 = LP30 + {(3n/10 - lcf)/fP30} i = 34.5 + {(12 - 9)/3}5 = 39.5

For percentile 40
nk/100 = 11(40)/100 = 4.4, or the 5th class interval
P40 = LP40 + {(2n/5 - lcf)/fP40} i = 39.5 + {(16 - 14)/2}5 = 44.5

For percentile 50
nk/100 = 11(50)/100 = 5.5, or the 6th class interval
P50 = LP50 + {(n/2 - lcf)/fP50} i = 44.5 + {(20 - 14)/9}5 = 47.83

For percentile 90
nk/100 = 11(90)/100 = 9.9, or the 10th class interval
P90 = LP90 + {(9n/10 - lcf)/fP90} i = 64.5 + {(36 - 33)/2}5 = 72.00

For percentile 99
nk/100 = 11(99)/100 = 10.89, or the 11th class interval
P99 = LP99 + {(99n/100 - lcf)/fP99} i = 69.5 + {(39.6 - 38)/2}5 = 73.5
Exercise No. 6
Calculate the quartiles (1 and 3), deciles (2, 4, 6, 7 and 9), and
percentiles (70, 80, 85, 90, and 97) of the scores of 55 Education
students in their examination in Stat 22.
Prior to the computations, fill up the less than cumulative
frequency (<cf) column of the table.

Class Interval   freq   <cf
   85 - 89         2
   80 - 84         2
   75 - 79         3
   70 - 74         2
   65 - 69         9
   60 - 64        10
   55 - 59         9
   50 - 54         5
   45 - 49         4
   40 - 44         3
   35 - 39         2
   30 - 34         2
   25 - 29         1
   20 - 24         1
                n = 55

Measures of Dispersion/Variability
Measures of dispersion describe how widely the test scores, or a
portion of them, are spread out. They can be used to gauge how far
the test scores deviate from the mean score, and to establish the
actual similarities or differences of distributions. In general, these
measures are employed to further characterize the distribution of
test scores.
The most commonly used measures of dispersion are the range,
the interquartile range, the quartile deviation, the average deviation,
and the standard deviation.
The Range
The range is the simplest and easiest measure of dispersion or
variability. It simply measures how far the highest score is from the
lowest score. It does not tell anything about the scores between
these two extremes; thus, it is considered the least satisfactory
measure of dispersion.
The equations used are:
a. For Raw Scores
   R = H - L
   Where: R = range
          H = highest score
          L = lowest score
b. For Frequency Distributed Scores
   R = Hmpt - Lmpt
   Where: Hmpt = midpoint of the highest step
          Lmpt = midpoint of the lowest step
The Interquartile Range
The interquartile range refers to the range of a specified part of
the total group, usually the middle 50% of the cases lying between
the 1st quartile and the 3rd quartile.
The equation used is:
   I.Q.R = Q3 - Q1
Where: I.Q.R = interquartile range
       Q3 = 3rd quartile
       Q1 = 1st quartile
Example:
a. Calculation of the interquartile range from the sample scores of
   8 Stat 22 students
   Scores: 17, 17, 26, 28, 30, 30, 31, 37
   Recall: Q1 = 21.5 and Q3 = 30.5
   I.Q.R = Q3 - Q1 = 30.5 - 21.5 = 9.0

b. Calculation of the interquartile range from the frequency
   distribution of sample raw scores of 40 students in Stat 22

   C.I       freq
   70 - 74     2
   65 - 69     2
   60 - 64     3
   55 - 59     2
   50 - 54     8
   45 - 49     9
   40 - 44     2
   35 - 39     3
   30 - 34     4
   25 - 29     3
   20 - 24     2
            n = 40

   Recall: Q1 = 30.75 and Q3 = 62
   I.Q.R = Q3 - Q1 = 62 - 30.75 = 31.25

The Quartile Deviation
The quartile deviation is another measure of dispersion, obtained by
dividing the difference between the 3rd and the 1st quartiles into
halves. It is the average distance from the median to the two
quartiles, i.e., it tells how far the quartile points (Q1 and Q3) lie
from the median, on the average. When the quartile deviation (Q.D)
is small, the set of test scores is more or less homogeneous; when
Q.D is large, the set of scores is more or less heterogeneous. This
measure is used when there are big gaps between scores. It is also
used when the main concern is the concentration of the middle 50%
of the scores around the median. Mathematically:
   Q.D = (Q3 - Q1)/2
Example:
a. Calculation of the quartile deviation from the sample raw scores
   of 8 Stat 22 students
   Scores: 17, 17, 26, 28, 30, 30, 31, 37
   Recall: Q1 = 21.5 and Q3 = 30.5
   Q.D = (Q3 - Q1)/2 = (30.5 - 21.5)/2 = 9/2 = 4.5

b. Calculation of the quartile deviation from the frequency
   distribution of raw scores of 40 Stat 22 students

   Class Interval   freq
      70 - 74         2
      65 - 69         2
      60 - 64         3
      55 - 59         2
      50 - 54         8
      45 - 49         9
      40 - 44         2
      35 - 39         3
      30 - 34         4
      25 - 29         3
      20 - 24         2
                   n = 40

   Recall: Q1 = 30.75 and Q3 = 62
   Q.D = (Q3 - Q1)/2 = (62 - 30.75)/2 = 31.25/2 = 15.625
The Average Deviation
Average deviation is a measure of absolute dispersion that is
affected by every individual score. It is the mean of the absolute
deviations of the individual scores from the mean of all the scores.
A large average deviation means that a set of scores is widely
dispersed about the mean, while a small average deviation implies
that a set of scores is close to the mean.
The formulas for calculating the average deviation are as follows:
1. Raw Scores
   A.D = Σ|X - X̄| / (n - 1)
   Where: A.D = average deviation
          X   = individual score
          X̄   = mean of all the scores
          n   = total number of scores
          Σ   = summation symbol
2. Frequency Distributed Data
   A.D = Σfi|Xi - X̄| / (n - 1)
   Where: fi = frequency of the ith class interval
          Xi = midpoint of the ith class interval

Example:
a. Calculation of the average deviation from the sample raw scores
   of 8 Stat 22 students

   X     X - X̄
   17     -10
   17     -10
   26      -1
   28       1
   30       3
   30       3
   31       4
   37      10

   Σ|X - X̄| = 42
   Recall: X̄ = 27
   A.D = Σ|X - X̄| / (n - 1) = 42/7 = 6.0

b. Calculation of the average deviation from the frequency
   distribution of raw scores of 40 Stat 22 students

   Class Interval   fi   Xi    (Xi - X̄)   fi|Xi - X̄|
      70 - 74        2   72      24.75       49.50
      65 - 69        2   67      19.75       39.50
      60 - 64        3   62      14.75       44.25
      55 - 59        2   57       9.75       19.50
      50 - 54        8   52       4.75       38.00
      45 - 49        9   47      -0.25        2.25
      40 - 44        2   42      -5.25       10.50
      35 - 39        4   37     -10.25       41.00
      30 - 34        5   32     -15.25       76.25
      25 - 29        3   27     -20.25       60.75
                  n = 40                     381.50

   Recall: X̄ = 47.25
   A.D = Σfi|Xi - X̄| / (n - 1) = 381.50/39 = 9.78
The Standard Deviation
The standard deviation is the measure of dispersion that involves
all the scores in the distribution rather than only the extreme
scores. It may be referred to as the root-mean-square of the
deviations from the mean, and it is considered the most important
measure of dispersion. Mathematically:
1. Raw Scores
   S.D = √[Σ(X - X̄)² / (n - 1)]
2. Frequency Distribution
   a. Midpoint Method
      S.D = √[Σfi(Xi - X̄)² / (n - 1)]
   b. Class-Deviation Method
      S.D = i√[Σfidi²/(n - 1) - (Σfidi)²/(n(n - 1))]
Where: S.D = standard deviation
       fi  = frequency of the ith class interval
       di  = coded deviation of the ith class interval
       di² = square of the deviation of the ith class interval
       Xi  = midpoint of the ith class interval
       X̄   = mean of the scores
Example:
a. Calculation of the standard deviation from the sample raw scores
   of 8 Stat 22 students

   X     X - X̄    (X - X̄)²
   17     -10        100
   17     -10        100
   26      -1          1
   28       1          1
   30       3          9
   30       3          9
   31       4         16
   37      10        100
                     336

   Recall: X̄ = 27
   S.D = √[Σ(X - X̄)²/(n - 1)] = √(336/7) = 6.93

b. Calculation of the standard deviation from the frequency
   distribution of raw scores of 40 students in Stat 22 using the
   Midpoint Method

   C.I        fi   Xi    (Xi - X̄)   (Xi - X̄)²   fi(Xi - X̄)²
   70 - 74     2   72      24.75      612.56      1225.12
   65 - 69     2   67      19.75      390.06       780.12
   60 - 64     3   62      14.75      217.56       652.68
   55 - 59     2   57       9.75       95.06       190.12
   50 - 54     8   52       4.75       22.56       180.48
   45 - 49     9   47      -0.25        0.063        0.567
   40 - 44     2   42      -5.25       27.56        55.12
   35 - 39     4   37     -10.25      105.06       420.24
   30 - 34     5   32     -15.25      232.56      1162.80
   25 - 29     3   27     -20.25      410.06      1230.18
            n = 40                                5,897.43

   Recall: X̄ = 47.25
   S.D = √[Σfi(Xi - X̄)²/(n - 1)] = √(5,897.43/39) = 12.30

c. Calculation of the standard deviation from the frequency
   distribution of raw scores of 40 students in Stat 22 using the
   Class-Deviation Method

   C.I        fi    di    fidi    fidi²
   70 - 74     2    +5     10       50
   65 - 69     2    +4      8       32
   60 - 64     3    +3      9       27
   55 - 59     2    +2      4        8
   50 - 54     8    +1      8        8
   45 - 49     9     0      0        0
   40 - 44     2    -1     -2        2
   35 - 39     4    -2     -8       16
   30 - 34     5    -3    -15       45
   25 - 29     3    -4    -12       48
            n = 40           2      236

   S.D = i√[Σfidi²/(n - 1) - (Σfidi)²/(n(n - 1))]
       = 5√[236/39 - (2)²/(40)(39)]
       = 5√(6.05128 - 0.0025641)
       = 5√6.048716
       = 5(2.4594)
   S.D = 12.30
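
As a cross-check, the midpoint-method computation above can be reproduced
with a few lines of Python (a sketch added here, not the author's own
program):

    # Grouped standard deviation by the midpoint method with the n - 1 divisor.
    def grouped_sd(midpoints, freqs):
        n = sum(freqs)
        mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
        ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
        return (ss / (n - 1)) ** 0.5

    mid = [72, 67, 62, 57, 52, 47, 42, 37, 32, 27]
    f   = [2, 2, 3, 2, 8, 9, 2, 4, 5, 3]
    print(round(grouped_sd(mid, f), 2))   # about 12.30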


Exercise No. 7
Test yourself:
Given the frequency distributed data on the raw scores of
students in Math, compute for:
1. Quartile deviation
2. Interquartile range
3. Average deviation, and
4. Standard deviation

Class Interval   fi    Xi   (Xi - X̄)   (Xi - X̄)²   fi(Xi - X̄)²
   75 - 79        2
   70 - 74        2
   65 - 69        3
   60 - 64        2
   55 - 59        8
   50 - 54        9
   45 - 49        2
   40 - 44        4
   35 - 39       10
   30 - 34        3
   25 - 29       10
   20 - 24        5
   15 - 19        5
   10 - 14        5
    5 - 9         5
               n = 75

Measures of Skewness and Kurtosis


Skewness is the deviation from a symmetrical distribution. It is the
degree of asymmetry of a distribution, or its departure from
symmetry. It indicates not only the amount of asymmetry but also
the direction of the distribution of the raw data. Thus, a distribution
is said to be skewed in the direction of the extreme values or,
speaking in terms of the curve, in the direction of the excess tail.
The farther the value of skewness departs from 0, the more skewed
the distribution; the nearer the value of skewness is to 0, the nearer
the distribution is to a normal distribution.
Types of Skewness:
1. Positive Skewness or Skewed to the Right refers to the
distribution that tapers more to the right than to the left. In
this kind of distribution, there are more small data than
bigger data. In examination, this type of skewness may be
obtained if the test is very difficult. If skewness has a
positive value, then the distribution is skewed to the right
(see illustration below).

2. Negative Skewness or Skewed to the Left refers to the


distribution that tapers more to the left than to the right.
This means a longer tail is directed to the left. In this
distribution, there are more high scores than low scores;

this type of skewness may happen if the test is very easy. If


skewness has a negative value, then the distribution is
skewed to the left (see graph below).

Moment Coefficient of Skewness


It is an essential method of describing the distribution of raw
data. It helps determine whether the majority of the data in the
distribution are below or above the mean. In examination, it tells
whether the test itself is difficult or easy. In interpreting the
distribution of data, the value of the moment coefficient of skewness
(degree of asymmetry) and the direction of the distribution should
be highly considered. When its value is positive, it indicates that the
distribution is skewed to the right. This could mean that majority of
the data are below than above the mean. In examination, it can be
said that the test is difficult.
On the other hand, if its value is negative, the distribution of
the data can be interpreted as skewed to the left; this further implies
that there are more data above than below the mean. In terms of an
examination, the test is easy. When the value of the moment
coefficient of skewness, whether positive or negative, is approaching
or near zero (0), the distribution of the data is approaching the
normal distribution and the mean, the median, and the mode are
approximately equal.
To get the moment coefficient of skewness, the following
equations are used:
a. Raw Scores
   SK = Σ(X - X̄)³ / [(n - 1)(SD)³]
b. Frequency Distribution
   SK = Σfi(Xi - X̄)³ / [(n - 1)(SD)³]
Where: SK = skewness
       Xi = midpoint of the ith class interval
       SD = standard deviation
       fi = frequency of the ith class interval
       n  = total number of data
       Σ  = symbol of summation
       X̄  = mean of all the data
Example:
a. Calculation of the moment coefficient of skewness from the
   sample raw scores in Stat 22 of the 8 students in the previous
   examples

   X     X - X̄    (X - X̄)³
   17     -10       -1000
   17     -10       -1000
   26      -1          -1
   28       1           1
   30       3          27
   30       3          27
   31       4          64
   37      10        1000
                     -882

   Recall: X̄ = 27 and SD = 6.93
   SK = Σ(X - X̄)³ / [(n - 1)(SD)³]
      = -882 / [7(6.93)³]
      = -882 / [7(332.813)]
      = -882 / 2,329.69
   SK = -0.38

Take note of the result, which is negative. This implies that the
distribution of the scores in Stat 22 of the 8 students is skewed to
the left; hence, the majority of the scores are above the mean and
the test is very easy.
b. Calculation of the moment coefficient of skewness from the
   frequency distribution of raw scores of 40 students in Stat 22
   using the Midpoint Method

   Class Interval   fi   Xi   (Xi - X̄)   (Xi - X̄)³    fi(Xi - X̄)³
      70 - 74        2   72     24.75     15,160.92     30,321.84
      65 - 69        2   67     19.75      7,703.73     15,407.46
      60 - 64        3   62     14.75      3,206.09      9,627.15
      55 - 59        2   57      9.75        926.86      1,853.72
      50 - 54        8   52      4.75        107.17        857.36
      45 - 49        9   47     -0.25         -0.02         -0.18
      40 - 44        2   42     -5.25       -144.70       -289.40
      35 - 39        4   37    -10.25     -1,076.89     -4,307.56
      30 - 34        5   32    -15.25     -3,546.58    -17,732.90
      25 - 29        3   27    -20.25     -8,303.76    -24,911.28
                  n = 40                                10,826.21

   Recall: X̄ = 47.25 and S.D = 12.30
   SK = Σfi(Xi - X̄)³ / [(n - 1)(SD)³]
      = 10,826.21 / [(39)(12.30)³]
      = 10,826.21 / [(39)(1,860.867)]
      = 10,826.21 / 72,573.813
   SK = 0.15

The result of the moment coefficient of skewness is positive
(0.15). This indicates that the distribution of the raw scores is
skewed to the right; hence, most of the scores are below the mean.
It can be said that the examination is difficult.
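
The grouped computation above can be verified with the following Python
sketch (an added illustration, not part of the original text), which
evaluates SK = Σfi(Xi - X̄)³ / [(n - 1)(SD)³] directly from the midpoints
and frequencies:

    # Moment coefficient of skewness for grouped data (sketch).
    def grouped_skewness(midpoints, freqs):
        n = sum(freqs)
        mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
        sd = (sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)) ** 0.5
        m3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
        return m3 / ((n - 1) * sd ** 3)

    mid = [72, 67, 62, 57, 52, 47, 42, 37, 32, 27]
    f   = [2, 2, 3, 2, 8, 9, 2, 4, 5, 3]
    print(round(grouped_skewness(mid, f), 2))   # about 0.15 (skewed to the right)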

Exercise No. 8
Test yourself:
Given the frequency distributed data on the scores of students
in Mathematics, find the moment coefficient of skewness.

Class Interval   fi   (Xi - X̄)   (Xi - X̄)³   fi(Xi - X̄)³
   75 - 79        2
   70 - 74        2
   65 - 69        3
   60 - 64        2
   55 - 59        8
   50 - 54        9
   45 - 49        2
   40 - 44        4
   35 - 39       10
   30 - 34        3
   25 - 29       10
   20 - 24        5
   15 - 19        5
   10 - 14        5
    5 - 9         5
              n = 75


Measures of Kurtosis
Curves of distributions having the same coefficient of
skewness may still differ significantly. Symmetrical curves, for
instance may vary in shape because they may not have the same
peakedness, a property of curves which can be described by
computing for a value called measure of kurtosis.
Kurtosis is a measure of the degree of peakedness or flatness of
a distribution. Hence, the concern is on the height of the curve
along the y-axis.

Types:
1. Leptokurtic - refers to a distribution having a relatively high peak
2. Mesokurtic - refers to a distribution that is neither very peaked
   nor very flat-topped
3. Platykurtic - refers to a distribution having a relatively flat top

[Figure: leptokurtic, mesokurtic, and platykurtic curves]

If KU < 3, the distribution is platykurtic or less peaked than the
normal curve. If KU = 3, the distribution is mesokurtic, and if KU > 3,
the distribution is leptokurtic or more peaked than the normal curve.
Moment Coefficient of Kurtosis
It is utilized when the degree of peakedness or flatness of the raw
data relative to the normal distribution is to be determined. It helps
determine whether the data are far higher or far lower than the
normal curve.
The equations used in determining the value of kurtosis are:
1. If the data are in raw form
   KU = Σ(X - X̄)⁴ / [(n - 1)(SD)⁴]
2. If the data are arranged in a frequency distribution
   KU = Σfi(Xi - X̄)⁴ / [(n - 1)(SD)⁴]
Where: KU = kurtosis
       Others as defined

Example:
a. Calculation of the moment coefficient of kurtosis from the sample
   raw scores of 8 students in Stat 22 in the previous examples

   X     X - X̄    (X - X̄)⁴
   17     -10       10,000
   17     -10       10,000
   26      -1            1
   28       1            1
   30       3           81
   30       3           81
   31       4          256
   37      10       10,000
                    30,420

   Recall: X̄ = 27 and SD = 6.93
   KU = Σ(X - X̄)⁴ / [(n - 1)(SD)⁴]
      = 30,420 / [7(6.93)⁴]
      = 30,420 / [7(2,306.40)]
      = 30,420 / 16,144.8
   KU = 1.88

The result is less than 3 (1.88). This implies that the
distribution of the test scores is platykurtic; hence, the data are far
below the normal curve and the distribution is said to be flat-topped.
b. Calculation of the moment coefficient of kurtosis from the
   frequency distribution of sample test scores of 40 students in
   Stat 22 using the Midpoint Method

   Class Interval   fi   Xi   (Xi - X̄)   (Xi - X̄)⁴      fi(Xi - X̄)⁴
      70 - 74        2   72     24.75     375,232.82     750,465.64
      65 - 69        2   67     19.75     152,148.75     304,297.50
      60 - 64        3   62     14.75      47,333.44     142,000.32
      55 - 59        2   57      9.75       9,036.88      18,073.76
      50 - 54        8   52      4.75         509.07       4,072.56
      45 - 49        9   47     -0.25           0.004          0.04
      40 - 44        2   42     -5.25         759.69       1,519.38
      35 - 39        4   37    -10.25      11,038.13      44,152.52
      30 - 34        5   32    -15.25      54,085.32     270,426.60
      25 - 29        3   27    -20.25     168,151.25     504,453.75
                  n = 40                               2,039,462.07

   Recall: X̄ = 47.25 and S.D = 12.30
   KU = Σfi(Xi - X̄)⁴ / [(n - 1)(SD)⁴]
      = 2,039,462.07 / [(39)(12.30)⁴]
      = 2,039,462.07 / [(39)(22,888.664)]
      = 2,039,462.07 / 892,657.9
   KU = 2.28

The result is less than 3 (2.28). This implies that the
distribution of the test scores is platykurtic; hence, the data are far
below the normal curve and the distribution is said to be flat-topped.
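
A parallel Python sketch (again an added illustration, not the book's)
evaluates KU = Σfi(Xi - X̄)⁴ / [(n - 1)(SD)⁴] for the same grouped data:

    # Moment coefficient of kurtosis for grouped data (sketch).
    def grouped_kurtosis(midpoints, freqs):
        n = sum(freqs)
        mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
        sd = (sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)) ** 0.5
        m4 = sum(f * (x - mean) ** 4 for f, x in zip(freqs, midpoints))
        return m4 / ((n - 1) * sd ** 4)

    mid = [72, 67, 62, 57, 52, 47, 42, 37, 32, 27]
    f   = [2, 2, 3, 2, 8, 9, 2, 4, 5, 3]
    print(round(grouped_kurtosis(mid, f), 2))   # about 2.29, less than 3, hence platykurtic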
Exercise No. 9
Test yourself:
Given the frequency distributed data on the test scores of 75
students in Arts, find the moment coefficient of kurtosis.

Class Interval   fi   (Xi - X̄)   (Xi - X̄)⁴   fi(Xi - X̄)⁴
   75 - 79        2
   70 - 74        2
   65 - 69        3
   60 - 64        2
   55 - 59        8
   50 - 54        9
   45 - 49        2
   40 - 44        4
   35 - 39       10
   30 - 34        3
   25 - 29       10
   20 - 24        5
   15 - 19        5
   10 - 14        5
    5 - 9         5
              n = 75

The Normal Distribution


It is one of the most important continuous distributions. It is
regarded as the most significant probability in the entire scope of
statistical inferences. The probability is expressed in terms of a
number from 0 to 1. Zero probability implies that there is assurance
that the event will not occur; however, when the probability value is
1, there is an assurance that the event will occur or happen. For the
probability value of 0.5, one is half sure that a case or event will
happen.

The Normal Curve
The normal distribution is graphically represented by a symmetrical,
bell-shaped curve (see the graph below). A curve is symmetrical if
one half is the exact mirror image of the other half when the curve
is folded vertically at the middle. The normal curve is also described
as mesokurtic. It is unimodal, or has only one mode, and is centered
at its origin. The curve is asymptotic at the extremities of the
horizontal line. De Moivre derived the mathematical equation of the
normal curve in 1733. The normal distribution is also called the
Gaussian distribution in honor of Gauss, who derived the same
mathematical equation in the 19th century.

[Figure: the normal curve]

The equation of the normal curve is:

   f(x) = [1/(σ√(2π))] e^[-(x - μ)²/(2σ²)]

Where: f(x) = the height of the curve above the x-axis
       x    = raw score laid off along the x-axis
       μ    = mean of the distribution of the test scores
       σ    = standard deviation of the distribution of the test scores
       e    = 2.71828
       π    = 3.1416
Characteristics of a normal curve:
1. The mean, median, and mode have the same value, which is the
   point on the horizontal axis at which the curve is at its maximum.
2. The curve is symmetrical and bell-shaped about a vertical axis
   through the mean. This means that the two sides fall off toward
   opposite directions at exactly equal distances from the center.
3. The normal curve approaches the horizontal axis asymptotically
   as we proceed in either direction away from the mean.
4. The total area under the curve and above the horizontal axis is
   equal to 1.
Standard Normal Scores
These are the converted scores, which are needed and
utilized when constructing areas of the normal probability curve.
They are the scores having definite mean of 0 and standard
deviation of 1. These scores are referred to as z-scores.


A z-score expresses the deviation of given raw scores in


terms of standard deviation units from the mean.
The formula in determining z-scores is:

   z = (X - X̄) / s

Where: z = standard normal score
       X = any given raw score
       X̄ = mean of the distribution of X scores
       s = standard deviation of the distribution of X scores
Areas Under the Normal Curve

[Figure: areas under the normal curve, showing P(X < X1),
P(X1 < X < X2), and P(X > X2)]

Examples:
1. Determine the area under the normal curve:
   a. from z = 0 to z = 2.56
   b. from z = -1.25 to z = 0
   c. from z = 1.19 to z = 2.59
   d. from z = -2.5 to z = 1.45

Solutions
a. First, draw the standard normal curve and indicate the required
   area by shading it, then read the corresponding area from the
   table. To locate the area, look for 2.5 along the leftmost column
   and then locate 0.06 at the topmost row. The area under the
   normal curve from z = 0 to z = 2.56 is the intersection of 2.5 and
   0.06, which gives 0.4948.
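
Such table look-ups can also be reproduced numerically. The Python sketch
below (an assumption added here, not part of the original text) uses the
standard normal cumulative distribution Φ(z) = 0.5[1 + erf(z/√2)]:

    # Areas under the standard normal curve via the cumulative distribution.
    import math

    def phi(z):
        """Cumulative area under the standard normal curve up to z."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    print(round(phi(2.56) - phi(0.0), 4))    # z = 0 to 2.56     -> 0.4948
    print(round(phi(0.0) - phi(-1.25), 4))   # z = -1.25 to 0    -> 0.3944
    print(round(phi(2.59) - phi(1.19), 4))   # z = 1.19 to 2.59  -> 0.1122
    print(round(phi(1.45) - phi(-2.5), 4))   # z = -2.5 to 1.45  -> 0.9203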


The T- Tests
Test for Dependent or Correlated Samples
Note that data consisting of information obtained from matched
pairs or repeated measures are classified as dependent or correlated
samples. An example of correlated data is the weight of the same
subjects measured before and after a treatment, as in the
illustrations below. If the data are taken twice on the same criterion
variable, the design is called a repeated-measures design.

Testing the Significance of the Difference between Means when the
Samples are Dependent or Correlated

Procedure:
1. Ho: μ1 = μ2, that the mean of population 1 is equal to the mean
   of population 2.
   The alternative hypothesis may be any one of the three
   hypotheses presented below:
   a. Ha: μ1 > μ2, that the mean of population 1 is greater than the
      mean of population 2.
   b. Ha: μ1 < μ2, that the mean of population 1 is less than the
      mean of population 2.
   c. Ha: μ1 ≠ μ2, that the mean of population 1 is not equal to the
      mean of population 2.
   Alternative hypotheses a and b are directional, so if one uses
   either of these two hypotheses, a one-tailed test is employed. A
   two-tailed test is used when one adopts alternative hypothesis c,
   since this is a non-directional hypothesis.
2. Test statistic: Use the t-test for dependent samples.
3. Level of significance: Use the α level of significance.
4. Decision criterion:
   Note that the decision criterion depends entirely on the
   formulated alternative hypothesis. As such, use the decision
   criterion that corresponds to the alternative hypothesis, to wit:
   a. Reject Ho if tc ≥ tα with (n - 1) df
   b. Reject Ho if tc ≤ -tα with (n - 1) df
   c. Reject Ho if |tc| ≥ tα/2 with (n - 1) df

5. Computations

   Individual   Experimental Group   Control Treatment    Di     Di²
       1               X11                 X12            D1     D1²
       2               X21                 X22            D2     D2²
       :                :                   :              :      :
       i               Xi1                 Xi2            Di     Di²
       :                :                   :              :      :
       n               Xn1                 Xn2            Dn     Dn²
   Total              ΣXi1                ΣXi2           ΣDi    ΣDi²

   where: Di = difference between the observed values of the ith
               individual under the first and second conditions
          D̄  = mean difference

   Solve for the mean difference using the equation

      D̄ = ΣDi / n

   Solve for the variance of the mean difference using the equation

      s²D̄ = [nΣDi² - (ΣDi)²] / [n²(n - 1)]

   Finally, the test statistic is calculated using the equation

      tc = D̄ / √(s²D̄)

6. State your decision based on the decision criterion and the
   computed t.
7. State your conclusion.
Illustration 1:
In a study of the effect of vitamins on the weight increase of
students, a group of 10 students was weighed before and after two
months of taking the vitamins. The data were as follows:

   Student No.   Wt. before   Wt. after
        1            196          200
        2            171          178
        3            170          169
        4            207          212
        5            177          180
        6            162          165
        7            199          201
        8            173          179
        9            231          243
       10            140          144

Use the 0.05 level of significance to test if there is a significant
difference between the weights before and after two months of
taking the vitamins.
Solution:
1. Ho: That the mean weights before and after taking the vitamins
       do not differ significantly.
   Ha: That the mean weights before and after taking the vitamins
       differ significantly.
2. Test statistic: Use the t-test for dependent samples.
3. Level of significance: Use the α = 0.05 level of significance.
4. Decision criterion: Reject Ho if |tc| ≥ tα/2 with (n - 1) df.

5. Computation:

   Student No.   Wt. before   Wt. after    Di    Di²
        1            196          200       4     16
        2            171          178       7     49
        3            170          169      -1      1
        4            207          212       5     25
        5            177          180       3      9
        6            162          165       3      9
        7            199          201       2      4
        8            173          179       6     36
        9            231          243      12    144
       10            140          144       4     16
   Total            1826         1871      45    309

   Solving for the mean difference:
      D̄ = ΣDi / n = 45/10 = 4.5

   Solving for the variance of the mean difference:
      s²D̄ = [nΣDi² - (ΣDi)²] / [n²(n - 1)]
           = [10(309) - (45)²] / [100(9)]
           = 1,065/900
           = 1.1833

   Solving for the t-statistic:
      tc = D̄ / √(s²D̄) = 4.5/√1.1833 = 4.14

6. Decision: Since |tc| = 4.14 > t0.025 with 9 df = 2.262, reject Ho.
7. Conclusion: Based on the result, it can be concluded that the mean
   weight after two months of taking the vitamins is significantly
   heavier than the mean weight before. The result further reveals
   that there exists a significant increase in weight as indicated by
   the test. Therefore, taking the vitamins for two months is effective
   for weight increase.
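
The whole computation for Illustration 1 can be condensed into a few lines
of Python (a sketch added for checking, not the author's program):

    # Dependent-samples t-test as laid out above:
    # t = Dbar / sqrt(s2_Dbar), s2_Dbar = [n*sum(D^2) - (sum D)^2] / [n^2 (n - 1)].
    def paired_t(before, after):
        d = [a - b for a, b in zip(after, before)]
        n = len(d)
        sum_d, sum_d2 = sum(d), sum(x * x for x in d)
        d_bar = sum_d / n
        var_dbar = (n * sum_d2 - sum_d ** 2) / (n ** 2 * (n - 1))
        return d_bar / var_dbar ** 0.5          # compare with t(alpha, n - 1)

    wt_before = [196, 171, 170, 207, 177, 162, 199, 173, 231, 140]
    wt_after  = [200, 178, 169, 212, 180, 165, 201, 179, 243, 144]
    print(round(paired_t(wt_before, wt_after), 2))   # about 4.14, mean difference 4.5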
Illustration 2:
In a study of the effectiveness of physical exercise in weight
reduction, a group of 10 students who engaged in a prescribed
program of physical exercise showed the following results:

   Individual   Weight before (lb)   Weight after (lb)
        1              210                 196
        2              168                 170
        3              165                 170
        4              202                 200
        5              170                 157
        6              185                 152
        7              211                 189
        8              189                 170
        9              245                 221
       10              149                 130

Use the 0.05 level of significance to test if the prescribed program
of physical exercise is effective in reducing weight.

Solution:
1. Ho: That the mean weights of persons before and after the
       physical exercise program do not differ significantly.
   Ha: That the mean weight of persons before the physical exercise
       program is greater than their mean weight after the exercises.
2. Test statistic: Use the t-test for dependent samples.
3. Level of significance: Use the α = 0.05 level of significance.
4. Decision criterion: Reject Ho if tc ≥ tα with (n - 1) df.
5. Computation:

   Individual   Weight before (lb)   Weight after (lb)    Di     Di²
        1              210                 196             14     196
        2              168                 170             -2       4
        3              165                 170             -5      25
        4              202                 200              2       4
        5              170                 157             13     169
        6              185                 152             33    1089
        7              211                 189             22     484
        8              189                 170             19     361
        9              245                 221             24     576
       10              149                 130             19     361
   Total              1894                1755            139    3269

   Solving for the mean difference:
      D̄ = ΣDi / n = 139/10 = 13.9

   Solving for the variance of the mean difference:
      s²D̄ = [nΣDi² - (ΣDi)²] / [n²(n - 1)]
           = [10(3,269) - (139)²] / [100(9)]
           = 13,369/900
           = 14.85

   Solving for the t-statistic:
      tc = D̄ / √(s²D̄) = 13.9/√14.85 = 3.61

6. Decision: Since tc = 3.61 > t0.05 with 9 df = 1.833, reject Ho.
7. Conclusion: Based on the result, it can be concluded that the mean
   weight before the physical exercises is significantly heavier than
   the mean weight of the students after the exercises. The result
   further reveals that there exists a significant reduction in weight
   as indicated by the test. Therefore, the prescribed program of
   physical exercise is effective in reducing weight.

Exercise No. 10
1. A certain weight-reducing program has produced the following
   weight changes (lb) in ten students:

   Student:    1    2    3    4    5    6    7    8    9   10
   Before:   124  138  113  129  149  149  177  138  139  129
   After:    115  110  110  131  122  155  125  142  122  105

   Use the 0.05 level of significance to test if the diet is effective
   in reducing weight.
2. Two samples of 10 students each have been matched on IQ before
   beginning an experiment in learning to solve simple statistical
   problems. They were then allowed 30 minutes to study the
   problems, after which they were tested on the number of problems
   answered correctly. Group A studied in pairs, one student reading
   the problems to the other, while Group B studied alone. The
   following table shows the results:

   NUMBER OF PROBLEMS ANSWERED CORRECTLY
   Group A:   19   18   11   25   12   16   11   15   20   18
   Group B:   23   10   13   20   10   18   10   11   12   13

   Is there a significant difference at the 5% level in the number of
   statistical problems answered correctly between Group A and
   Group B?
3. A program is designed to enhance readers' speed and
   comprehension. To evaluate the effectiveness of this program, a
   test is given both before and after the program, and sample
   results follow. At the 0.05 significance level, test the claim that
   comprehension is higher after the program.

   Reader:     1    2    3    4    5    6    7    8    9   10
   Before:   102  111  139  169  208  105  102  129  116  125
   After:    109  117  151  185  189  127  133  127  115  128
4. An exercise is designed to lower the systolic blood pressure of 10
   randomly selected faculty members of statistics. The results are
   as follows:

   Sample:     1    2    3    4    5    6    7    8    9   10
   Before:   120  130  160   90  110  110  180  190  130  120
   After:    110  120  140  100   90   90  180  170  130  130

   At the 0.05 level of significance, test if the systolic blood
   pressure of the faculty is not affected by the designed exercise;
   that is, test the claim that the values before and after the
   designed exercise program are equal.

T-test for Independent Samples
Problems on two samples are most commonly associated with small
samples and with equal but unknown population variances. For this
situation the small samples must be used to provide an estimate of
the common but unknown variance σ². The sample variances s1²
and s2² each provide an estimate of σ². However, a better estimate
of σ² can be obtained by combining the two estimates in a weighted
average as follows:

   sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)

For equal sample sizes, this formula reduces to

   sp² = (s1² + s2²) / 2

which is the simple average of the two estimates. Otherwise, each
of the estimates receives a weight proportional to its respective
degrees of freedom.
In the two-sample t-test the estimates s1² and s2² are pooled
together to form sp² as the estimate of the common variance σ².
However, this pooling is usually done under the assumption that the
populations have equal variances; otherwise, the pooling is not
justified and this form of the t-test cannot be used. In addition, the
populations must come from a normal distribution. The t-statistic
is computed using the equation (case 1):

   tc = (X̄1 - X̄2) / √[sp²(1/n1 + 1/n2)]

Where: tc  = computed t-value
       n1  = sample size of group 1
       n2  = sample size of group 2
       sp² = pooled variance

Note that the degrees of freedom (df) equal n1 + n2 - 2.
Under the assumption that the populations have unequal variances
(case 2), the test statistic is given by:

   tc = (X̄1 - X̄2) / √(s1²/n1 + s2²/n2)

Where: tc  = computed t-value
       n1  = sample size of group 1
       n2  = sample size of group 2
       s1² = sample variance of group 1
       s2² = sample variance of group 2

The degrees of freedom (df) are determined using the equation:

   df = (w1 + w2)² / [w1²/(n1 - 1) + w2²/(n2 - 1)]

Where: n1 = sample size of group 1
       n2 = sample size of group 2
       w1 = s1²/n1
       w2 = s2²/n2

Test for Equality of Variances
The test for equality of variances involves the ratio of the two
sample variances. A random variable that consists of the ratio of
the two sample variances has an F distribution if the two samples
are independent and come from normal populations.
The following are the steps to be followed in testing the
variances:
a. State the null and alternative hypotheses.
b. Test statistic: Use the F-test at the α level of significance.
c. Decision criterion: Reject Ho if Fc ≥ Fα(n1 - 1, n2 - 1)
   Where: n1 = sample size of the group with the larger variance
          n2 = sample size of the group with the smaller variance
d. Computation of the test statistic:

   Fc = larger sample variance / smaller sample variance

e. State your decision based on the rejection criterion and the
   computed test statistic.
f. State your conclusion based on your decision.

Testing for the Difference in Means of the Two Independent Samples

Case 1:
1. Ho: μ1 = μ2, that the mean of population 1 is equal to the mean
   of population 2.
   The alternative hypothesis may be any one of the three
   hypotheses, the same as those of the t-test for dependent samples:
   a. Ha: μ1 > μ2, that the mean of population 1 is greater than the
      mean of population 2.
   b. Ha: μ1 < μ2, that the mean of population 1 is less than the
      mean of population 2.
   c. Ha: μ1 ≠ μ2, that the mean of population 1 is not equal to the
      mean of population 2.
   Alternative hypotheses a and b are directional, so if one uses
   either of these two hypotheses, a one-tailed test is employed. A
   two-tailed test is used when one adopts alternative hypothesis c,
   since this is a non-directional hypothesis.
2. Test statistic: Use the t-test for independent samples (case 1).
3. Level of significance: Use the α level of significance.
4. Decision criterion:
   Note that the decision criterion depends entirely on the
   formulated alternative hypothesis. As such, use the decision
   criterion that corresponds to the alternative hypothesis:
   a. Reject Ho if tc ≥ tα with (n1 + n2 - 2) df
   b. Reject Ho if tc ≤ -tα with (n1 + n2 - 2) df
   c. Reject Ho if |tc| ≥ tα/2 with (n1 + n2 - 2) df
5. Computation

   tc = (X̄1 - X̄2) / √[sp²(1/n1 + 1/n2)]

   Where: tc  = computed t-value
          n1  = sample size of group 1
          n2  = sample size of group 2
          sp² = pooled variance, sp² = [(n1 - 1)s1² + (n2 - 1)s2²]/(n1 + n2 - 2)

   For equal sample sizes, the pooled variance reduces to the simple
   average of the two estimates; otherwise, each of the estimates
   receives a weight proportional to its respective degrees of freedom.
6. State your decision based on the rejection criterion and the
   computed test statistic.
7. State your conclusion based on your findings.

Illustration for the t-test for independent samples (case 1):
An experiment was undertaken to compare the diameter increase of
Gmelina and mahogany seedlings after 2 months, measured in
centimeters. Do the following data present sufficient evidence to
conclude that the stem diameters of the two species are significantly
different at the 5% level?

Example 1:

   TEST SCORES
   Math:      45  55  67  78  90  39  58  69  80  89
   English:   58  89  59  59  58  60  94  49  90  91

Solution:
The first step is to test the equality of the variances.
a. Ho: σ1² = σ2²   against   Ha: σ1² ≠ σ2²
b. Test statistic: Use the F-test at the α = 0.05 level of significance.
c. Decision criterion: Reject Ho if Fc ≥ F0.05(n1 - 1, n2 - 1).
d. Computation of the test statistic:

                    Math       English
   Sample size        10            10
   Total             670           707
   Mean               67          70.7
   Variance     311.1111      316.0111

   Fc = 316.0111/311.1111 = 1.01575

e. Decision: Since Fc = 1.01575 < F0.05(9, 9), accept Ho.
f. Conclusion: The two variances may be considered equal.

With the variances of the diameter increase of Benguet pine and
mangium seedlings after 2 months (measured in centimeters)
compared in the same way, the following is undertaken to solve the
main problem.

Example 2:

   SEEDLING DIAMETER (cm)
                 Benguet pine   Mangium
                      3.08        2.38
                      3.10        2.68
                      2.35        2.17
                      3.56        3.86
                      3.73        3.91
                      1.48        2.65
                      1.72        1.85
                      2.30        1.86
                      2.80        2.76
                      3.50        2.68
   Total             29.27       25.15
   Mean               2.93        2.52
   Variance         0.4994      0.5291

1. Ho: μ1 = μ2, that the two species do not differ in stem diameter
       after two months.
   Ha: μ1 ≠ μ2, that the Benguet pine seedlings differ significantly
       from the mangium seedlings in diameter (cm) after two months.
2. Test statistic: Use the t-test for independent samples (case 1).
3. Level of significance: Use the α = 0.05 level of significance.
4. Decision criterion: Reject Ho if |tc| ≥ t0.025 with (n1 + n2 - 2) df.
5. Computation:
   Since the two groups have equal sample sizes,
      sp² = (s1² + s2²)/2 = (0.4994 + 0.5291)/2 = 0.5143
   Therefore,
      tc = (2.93 - 2.52) / √[0.5143(1/10 + 1/10)] = 1.278
6. Decision: Since |tc| = 1.278 < t0.025 with 18 df = 2.101, there is
   no sufficient evidence to reject Ho.
7. Conclusion: The data indicate that the stem diameters of the two
   species of seedlings do not differ at the 5% level of significance.
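
The case-1 computation can be scripted from the summary statistics alone.
The Python sketch below (an added illustration, not part of the original
text) reproduces the pooled-variance t value for Example 2 and also gives
the variance ratio used in the F-test:

    # Pooled-variance (case 1) t-test from summary statistics (n, mean, variance).
    def pooled_t(n1, m1, v1, n2, m2, v2):
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
        t = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        return t, n1 + n2 - 2                                    # t and its df

    # Benguet pine vs mangium seedling diameters (summary values from the example)
    t, df = pooled_t(10, 2.93, 0.4994, 10, 2.52, 0.5291)
    print(round(t, 3), df)             # about 1.278 with 18 df
    print(round(0.5291 / 0.4994, 3))   # F ratio for the equality-of-variances check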
For Case 2:
1. Ho: μ1 = μ2, that the mean of population 1 is equal to the mean
   of population 2.
   The alternative hypothesis may be any one of the three
   hypotheses, the same as those of the t-test for dependent samples:
   a. Ha: μ1 > μ2, that the mean of population 1 is greater than the
      mean of population 2.
   b. Ha: μ1 < μ2, that the mean of population 1 is less than the
      mean of population 2.
   c. Ha: μ1 ≠ μ2, that the mean of population 1 is not equal to the
      mean of population 2.
   Alternative hypotheses a and b are directional, so if one uses
   either of these two alternative hypotheses, a one-tailed test is
   employed. A two-tailed test is used when one adopts alternative
   hypothesis c, since this is a non-directional hypothesis.
2. Test statistic: Use the t-test for independent samples (case 2).
3. Level of significance: Use the α level of significance.
4. Decision criterion:
   Note that the decision criterion depends entirely on the
   formulated alternative hypothesis. As such, use the decision
   criterion that corresponds to the alternative hypothesis:
   a. Reject Ho if tc ≥ tα with the df computed below
   b. Reject Ho if tc ≤ -tα with the df computed below
   c. Reject Ho if |tc| ≥ tα/2 with the df computed below
5. Computation:

   tc = (X̄1 - X̄2) / √(s1²/n1 + s2²/n2)

   Where: tc  = computed t-value
          n1  = sample size of group 1
          n2  = sample size of group 2
          s1² = sample variance of group 1
          s2² = sample variance of group 2

   Note that the degrees of freedom are

      df = (w1 + w2)² / [w1²/(n1 - 1) + w2²/(n2 - 1)],
      where w1 = s1²/n1 and w2 = s2²/n2.

6. State your decision based on the rejection criterion and the
   computed test statistic.
7. State your conclusion based on your decision.

Illustration for the t-test for independent samples (case 2):
A study was conducted to compare the scores of 4 Math students
and 6 Science students. The results are as follows:

   Math:      15   32   40   50
   Science:   15   28   30   17   20   22

The researcher wants to determine if the mean score is significantly
higher for Math than for Science. Use the 1% level of significance.

Solution:

              Scores                    Total    Mean    Variance
   Math       15  32  40  50             137    34.25    218.9167
   Science    15  28  30  17  20  22     132    22.00     35.6000

First, we test the equality of the variances:
a. Ho: σ1² = σ2²   against   Ha: σ1² ≠ σ2²

b. Test statistic: Use the F-test at the α = 0.05 level of significance.
c. Decision criterion: Reject Ho if Fc ≥ Fα(n1 - 1, n2 - 1)
   Where: n1 = sample size of the group with the larger variance
          n2 = sample size of the group with the smaller variance
d. Computation of the test statistic:

   Fc = 218.9167/35.6 = 6.1493

e. Decision: Since Fc = 6.1493 > F0.05(3, 5), reject Ho.
f. Conclusion: The two variances are not equal.

With this result of the test for the equality of variances, we can
proceed to the solution of the main problem. Thus,
1. Ho: μ1 = μ2, that the mean score of the 4 Math students is equal
       to the mean score of the 6 Science students.
   Ha: μ1 > μ2, that the mean score of the 4 Math students is
       significantly higher than the mean score of the 6 Science
       students.
2. Test statistic: Use the t-test for independent samples (case 2).
3. Level of significance: Use the α = 0.01 level of significance.
4. Decision criterion: Reject Ho if tc ≥ t0.01 with the computed df.
5. Computation of the test statistic:
   With n1 = 4, n2 = 6, s1² = 218.9167 and s2² = 35.6,
      tc = (34.25 - 22.00) / √(218.9167/4 + 35.6/6)
         = 12.25/√60.66
         = 1.57
6. Decision: Since tc = 1.57 < t0.01 with the computed df, we fail to
   reject Ho.
7. Conclusion: The mean score of the 4 Math students is equal to the
   mean score of the 6 Science students.

Exercise No. 11
T-TEST FOR INDEPENDENT SAMPLES
1. The weights of 2 groups of children (randomized samples) were
   found to be as follows:

   Weight (kg)
   Group A:   22.5   24.4   26.4   25.5   24.9   23.7   26.5   23.3
   Group B:   14.1   20.6   24.1   22.5   24.0   31.2   21.6

   Test the hypothesis, at the 5% level of significance, that the two
   groups of children are equal against the alternative hypothesis
   that they are unequal.
2. Find out whether poly bags affect the height growth (m) of White
   lauan (Shorea contorta). Group A was planted in poly bags and
   Group B was planted in a bed.

   Plant height (m)
   Group A:   1.9   0.5   1.3   3.1   0.9
   Group B:   2.1   2.8   1.4   0.6

   Test the hypothesis at the 1% level of significance.


3. In a study designed to estimate the volumes of water discharged
   by two major creeks (gal/sec), the following data were obtained:

   Trial:      1    2    3    4    5    6
   Creek 1:   15   20   12   10   25   14
   Creek 2:   20   24   21   18   28

   Determine if Creek 1 has a significantly higher water discharge
   (gal/sec) compared with Creek 2.

4. To determine whether extra-curricular activities have detrimental
   effects on the grades of education students, the following GPAs
   were recorded over a period of 8 years.

   Year:        1    2    3    4    5    6    7    8
   With:       85   86   83   89   78   92   85   84
   Without:    95   87   79   91   89   95   81   89

   Assuming the populations to be normally distributed, test at the
   5% level of significance whether participating actively in
   extra-curricular activities is detrimental to the students' grades
   (GPA).

Exercise No. 12
1. A student wants to buy batteries for her flashlight. She has a
   choice of three brands of rechargeable batteries that vary in cost.
   She obtains the sample data in the following table: she randomly
   selects batteries of each brand and tests them for operating time
   (in hours) before recharging is necessary. Do the three brands
   have the same mean usable time before recharging is required?

   Brand        OPERATING TIME (in hours)
     A        24.7   27.9   20.9   28.2
     B        28.4   29.6   23.7   29.5
     C        27.5   26.7   25.0   24.6

2. The following data show the mid-term grades obtained by five
   students in Statistics, Biology, Math, and FOR 102:

   STUDENT    Statistics   Biology   Math   FOR 102
      1           88          80      79        90
      2           86          91      94        83
      3           81          83      88        85
      4           76          84      80        78
      5           87          85      87        83

   Use a 0.05 level of significance to test the hypotheses that
   a. students have equal ability
   b. courses are just the same
3. A Forester examined the effect of maintaining the water table at
   three different heights on the root length of three hardwood
   seedlings. Evaluate at the 5% level whether the water table and
   the tree species significantly affect root length.

                      WATER TABLE
   Tree Species      Low    Medium    High
   Molave            9.2       9.7    11.5
   Ipil-ipil        20.6       9.2     6.8
   Dao              24.3      20.3     7.9
4. Four varieties of rice have been tested, with yield production
   (kg/ha) recorded as follows:

   Rice Varieties
       A        B        C        D
    4,500    7,000    5,000    5,325
    5,500    4,125    8,000    7,985
    8,000    9,000    5,000    6,689
    4,000    5,235    9,900    7,321
    7,735    3,699    6,342    6,390

   Are the yields the same with respect to varieties?

Simple Linear Regression Analysis
Regression analysis is a statistical technique used to determine the
functional form of the relationship between two or more variables,
where one variable is called the dependent or response variable and
the rest are called independent or explanatory variables. The
ultimate objective is usually to predict or estimate the value of the
response variable given the values of the independent variables.
The relationship between the variables X and Y is represented
by a statistical model of the form Y = f(X) + ε
Where:
Y = response or dependent variable (measures an outcome of the
    study)
X = explanatory or independent variable (attempts to explain the
    observed outcomes)
In simple linear regression the model is given by:

   Yi = β0 + β1Xi + εi

Where:
   Yi = ith observed value of the variable Y
   Xi = ith observed value of the variable X
   β0 = regression constant; it is the true Y-intercept
   β1 = regression coefficient; it measures the true increase in Y
        per unit increase in X
   εi = random error associated with Yi and Xi


The general idea in regression analysis is to fit the model
mentioned above. However, the model is based on population values
or parameters, so the model is estimated from a simple random
sample. The estimated model is denoted as
   Ŷ = b0 + b1X,
where b0 and b1 are estimates of β0 and β1, respectively.
The Ordinary Least Squares (OLS) method estimates the
population parameters by minimizing the error sum of squares, that
is, by minimizing Σ(Yi - Ŷi)². The OLS estimators of β0 and β1 are

   b1 = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²]
   b0 = Ȳ - b1X̄

The computed values of b0 and b1 determine the equation of the
regression line given by Ŷ = b0 + b1X. This equation is used to
predict or estimate the value of Y, denoted by Ŷ, given a value of X;
hence, it is also called the prediction equation. Note further that if
the regression coefficient b1 is positive, the relationship between X
and Y is directly proportional; that is, when X increases, Y also
increases, and when X decreases, Y also decreases. However, if the
regression coefficient b1 is negative, the relationship between X and
Y is inversely proportional. This means that when X increases, Y
decreases, and when X decreases, Y increases.
Evaluation of the Simple Linear Regression Equation
An overall measure of the adequacy of the equation is provided by
the coefficient of determination, R².

   R² = SSRegression / SSTotal

R² gives the proportion of the total variation in Y that is accounted
for by the independent variable X. R² ranges from 0 to 1, or from 0
to 100 if expressed in percent. The nearer its value to 1 (or 100%),
the better the fit of the regression line.
A working table of the following form facilitates the computations:

   Individual     x       y       xy       x²       y²
       1          x1      y1     x1y1     x1²      y1²
       2          x2      y2     x2y2     x2²      y2²
       :          :       :       :        :        :
       n          xn      yn     xnyn     xn²      yn²
   Total         Σx      Σy      Σxy      Σx²      Σy²
The Test of Hypothesis for the Regression Coefficient (β1)
1. Ho: β1 = 0, that there is no significant linear relationship between
       X and Y.
   Ha: β1 ≠ 0, that there is a significant linear relationship between
       X and Y.
2. Test statistic: Use the F-test at the α level of significance.
3. Rejection criterion: Reject Ho if Fc ≥ Fα(1, n - 2).
4. Computation:
   a. SSTotal = Σy² - (Σy)²/n
   b. SSRegression = b1[Σxy - (Σx)(Σy)/n]
   c. SSError = SSTotal - SSRegression
   d. MSRegression = SSRegression/1
   e. MSError = SSError/(n - 2)
   f. Fc = MSRegression/MSError

   ANOVA
   Source of      Degrees of      Sum of        Mean        Computed
   Variation       Freedom        Squares      Squares          F
   Regression         1           SSReg         MSReg           Fc
   Error             n - 2        SSError       MSError
   Total             n - 1        SSTotal

5. State your decision.
6. State your conclusion.

Sample Illustration of Simple Regression Analysis
Problem: Suppose that we are given the following sample data to
study the relationship between the Math scores and Science scores
of 10 students.

   Math Scores:      98   99  118   94  109  116   97  100   99  114
   Science Scores:   75   70   95   72   88   85   94   85   70   93

a. Fit an equation of the form Ŷ = b0 + b1X to the given data.
   Interpret the values obtained in the equation.
b. Determine if there is a significant linear relationship between the
   Math scores and the Science scores using the F-test.

Solution for problem a:

          Math Scores   Science Scores
              (X)            (Y)           X²        Y²        XY
               98             75          9,604     5,625     7,350
               99             70          9,801     4,900     6,930
              118             95         13,924     9,025    11,210
               94             72          8,836     5,184     6,768
              109             88         11,881     7,744     9,592
              116             85         13,456     7,225     9,860
               97             94          9,409     8,836     9,118
              100             85         10,000     7,225     8,500
               99             70          9,801     4,900     6,930
              114             93         12,996     8,649    10,602
   SUM       1,044            827        109,708    69,313    86,860
   MEAN      104.4           82.7

i)  b1 = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²]
       = [10(86,860) - (1,044)(827)] / [10(109,708) - (1,044)²]
       = 5,212/7,144
       = 0.7296
ii) b0 = Ȳ - b1X̄ = 82.7 - 0.7296(104.4) = 6.5336
iii) The resulting regression equation is
        Ŷ = 6.5336 + 0.7296X
     The prediction equation has a y-intercept of 6.534 and a slope
     equal to 0.7296. The y-intercept of 6.534 indicates that even if
     the Math score is 0 (which is impossible), the Science score is
     predicted to be 6.534, or about 7. The slope of 0.7296 indicates
     that the Science score increases by about 0.7296 for every unit
     increase in the Math score.

Solution for problem b:
1. Ho: β1 = 0, that there is no significant linear relationship between
       the Math and Science scores of the 10 students.
2. Ha: β1 ≠ 0, that there is a significant linear relationship between
       the Math and Science scores.
3. Test statistic: Use the F-test at the α = 0.05 level of significance.
4. Rejection criterion: Reject Ho if Fc ≥ F0.05(1, 8).
5. Computation:
   a. SSTotal = Σy² - (Σy)²/n = 69,313 - (827)²/10 = 920.10
   b. SSRegression = b1[Σxy - (Σx)(Σy)/n] = 0.7296(521.2) = 380.2484
   c. SSError = SSTotal - SSRegression
              = 920.10 - 380.2484 = 539.8516
   d. MSRegression = 380.2484
   e. MSError = 539.8516/8 = 67.4815
   f. Fc = 380.2484/67.4815 = 5.6349

   The ANOVA Table
   Source of      Degrees of     Sum of       Mean        Computed
   Variation       Freedom       Squares     Squares          F
   Regression         1          380.2484    380.2484      5.6349*
   Error              8          539.8516     67.4815
   Total              9          920.1000
   * = significant at the 5% level

6. Decision: Since Fc = 5.6349 > F0.05(1, 8), reject Ho.
7. Conclusion: The result indicates that there is a significant linear
   relationship between the Math and Science scores of the 10
   students.
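
All of the quantities used in problems a and b can be generated with a
short Python sketch (an added illustration, not the book's own software):

    # OLS slope, intercept, R-squared and F statistic for simple regression.
    def simple_regression(x, y):
        n = len(x)
        sx, sy = sum(x), sum(y)
        sxx = sum(v * v for v in x) - sx * sx / n     # corrected sum of squares of X
        syy = sum(v * v for v in y) - sy * sy / n     # total SS of Y
        sxy = sum(a * b for a, b in zip(x, y)) - sx * sy / n
        b1 = sxy / sxx
        b0 = sy / n - b1 * sx / n
        ss_reg = b1 * sxy
        ss_err = syy - ss_reg
        f = ss_reg / (ss_err / (n - 2))
        return b0, b1, ss_reg / syy, f

    math_scores    = [98, 99, 118, 94, 109, 116, 97, 100, 99, 114]
    science_scores = [75, 70, 95, 72, 88, 85, 94, 85, 70, 93]
    b0, b1, r2, f = simple_regression(math_scores, science_scores)
    print(round(b0, 4), round(b1, 4))   # about 6.53 and 0.7296
    print(round(r2, 4), round(f, 4))    # R-squared about 0.41, F about 5.63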


Correlation Analysis
Correlation analysis is a statistical technique used to determine the
strength or degree of linear relationship between two variables. A
measure of the degree of linear relationship is called the correlation
coefficient, r. The more pronounced the linear relationship, the
greater the magnitude of the correlation coefficient.
The value of r ranges from -1 to +1. The correlation coefficient
r is interpreted as follows.
Values of r
1
0.81 to 0.99
0.61 to 0.80
0.41 to 0.60
0.21 to 0.40
0.01 to 0.20
0

Qualitative Interpretation
Perfect linear relationship
Very strong linear relationship
Strong linear relationship
Moderate linear relationship
Weak linear relationship
Very weak linear relationship
No linear relationship

Note: Researcher should be careful in interpreting the


correlation coefficient when it is near zero. It is possible
that variable X and Y are strongly correlated but not in
linear form
Estimation of the Correlation Coefficient, ρ
Based on a simple random sample (SRS) of size n, an estimator of ρ
is the sample correlation coefficient, r, defined as

   r = Σxy / √[(Σx²)(Σy²)]

Where: Σxy = sum of the cross products of X and Y,
             Σxy = ΣXY - (ΣX)(ΣY)/n
       Σx² = sum of squares of X, Σx² = ΣX² - (ΣX)²/n
       Σy² = sum of squares of Y, Σy² = ΣY² - (ΣY)²/n

Test of Hypothesis about the Correlation Coefficient, ρ:
1. Ho: ρ = 0, there is no correlation between X and Y.
   The alternative hypothesis may be:
   a. Ha: ρ > 0, there is a significant positive correlation between X
      and Y (one-tailed test).
   b. Ha: ρ < 0, there is a significant negative correlation between X
      and Y (one-tailed test).
   c. Ha: ρ ≠ 0, there is a significant correlation between X and Y
      (two-tailed test).
2. Test statistic: Use the t-test at the α level of significance.
3. Rejection criterion:
   The following rejection rule is used, based on the alternative
   hypothesis:
   a. Reject Ho if tc ≥ tα with (n - 2) df
   b. Reject Ho if tc ≤ -tα with (n - 2) df
   c. Reject Ho if |tc| ≥ tα/2 with (n - 2) df
4. Computation of the test statistic:

   tc = r√(n - 2) / √(1 - r²)

   Where: tc = computed t-statistic
          r  = correlation coefficient
          n  = sample size
5. Decision: State your decision based on the rejection criterion and
   the computed t-statistic.
6. Conclusion: State your conclusion.

Illustration:
Problem: Compute and interpret the correlation coefficient for the
following grades of ten education students selected at random.

   Student   Stat 101 (X)   Math 102 (Y)      X²        Y²        XY
      1           72             76          5,184     5,776     5,472
      2           90             88          8,100     7,744     7,920
      3           79             80          6,241     6,400     6,320
      4           80             78          6,400     6,084     6,240
      5           88             90          7,744     8,100     7,920
      6           90             92          8,100     8,464     8,280
      7           78             80          6,084     6,400     6,240
      8           82             82          6,724     6,724     6,724
      9           90             89          8,100     7,921     8,010
     10           70             68          4,900     4,624     4,760
   Total         819            823         67,577    68,237    67,886
   Mean         81.9           82.3

   Σxy = ΣXY - (ΣX)(ΣY)/n = 67,886 - (819)(823)/10 = 482.3
   Σx² = ΣX² - (ΣX)²/n = 67,577 - (819)²/10 = 500.9
   Σy² = ΣY² - (ΣY)²/n = 68,237 - (823)²/10 = 504.1

   Therefore:
   r = Σxy / √[(Σx²)(Σy²)] = 482.3 / √[(500.9)(504.1)] = 0.960

Based on the qualitative interpretation of r, the result indicates
that there is a very strong direct linear relationship between the
grades of the students in Stat 101 and Math 102. That is, the higher
a student's Stat 101 grade, the higher his Math 102 grade.
Scatter plot presentation:

[Figure: scatter plot of Math 102 grades against Stat 101 grades,
with the fitted regression equation and the coefficient of
determination]

Interpretation:
Regression (Y-intercept)
The prediction equation has a y-intercept of -6.171 and a slope
equal to 1.0724. The y-intercept of -6.171 indicates that if the score
in Stat 101 is 0, the Math 102 score is predicted to be about -6. The
slope of 1.0724 indicates that the Math 102 score increases by about
1.07 for every unit increase in the Stat 101 score.
Coefficient of Determination (R²)
The R² value of 0.9456 implies that the independent variable Stat
101 (as a pre-requisite course) accounts for 94.56% of the variation
in the Math 102 scores. Only about 5.44% is accounted for by other
factors not included in the model.
Testing the Degree of Relationship between Stat 101 and Math 102
Grades:
1. Ho: ρ = 0, there is no correlation between the students' Stat 101
       and Math 102 grades.
   Ha: ρ ≠ 0, there is a significant degree of relationship between
       the students' grades in Stat 101 and Math 102.
2. Test statistic: Use the t-test at the 0.05 level of significance.
3. Rejection criterion: Reject Ho if |tc| ≥ t0.025 with (n - 2) df.
4. Computation of the test statistic:
   tc = r√(n - 2) / √(1 - r²) = 0.960√8 / √(1 - 0.9216) = 9.697
5. Decision: Since |tc| = 9.697 > t0.025 with 8 df, reject Ho.
6. Conclusion: The result reveals that there is a significant degree
   of relationship between the students' grades in Stat 101 and Math
   102.
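
The correlation coefficient and its t-test can likewise be scripted; the
Python sketch below (an assumption added here, not part of the original
text) reproduces r of about 0.96 and t of about 9.7 for the ten pairs of
grades:

    # Sample correlation r and the t statistic t = r*sqrt(n-2)/sqrt(1-r^2).
    def corr_and_t(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        r = sxy / (sxx * syy) ** 0.5
        t = r * (n - 2) ** 0.5 / (1 - r * r) ** 0.5
        return r, t

    stat101 = [72, 90, 79, 80, 88, 90, 78, 82, 90, 70]
    math102 = [76, 88, 80, 78, 90, 92, 80, 82, 89, 68]
    r, t = corr_and_t(stat101, math102)
    print(round(r, 3), round(t, 2))   # about 0.96 and 9.7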

Measures of Correlation

[Figure: scatter diagrams illustrating positive, negative, and no
relationship between two variables]

Pearson Product-Moment Coefficient of Correlation
Equation:

   r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}

Where: r = coefficient of correlation
       X = the 1st set of test scores
       Y = the 2nd set of test scores
       N = total number of pairs
Calculation of the Pearson Product-Moment Coefficient of
Correlation between the heights and diameters of Molave (Vitex
parviflora) seedlings

   Height (X)   Diameter (Y)      XY        X²        Y²
       76           0.22         16.72     5,776     0.0484
       55           0.34         18.70     3,025     0.1156
       78           0.34         26.52     6,084     0.1156
       89           0.21         18.69     7,921     0.0441
       47           0.19          8.93     2,209     0.0361
       57           0.22         12.54     3,249     0.0484
       57           0.35         19.95     3,249     0.1225
       66           0.43         28.38     4,356     0.1849
       79           0.47         37.13     6,241     0.2209
   Total 604        2.77        187.56    42,110     0.9365

   r = [9(187.56) - (604)(2.77)] / √{[9(42,110) - (604)²][9(0.9365) - (2.77)²]}
     = (1,688.04 - 1,673.08) / √[(14,174)(0.7556)]
     = 14.96 / 103.49
     = 0.14

The result simply means that there is only a very weak (negligible)
linear relationship between the diameter and the height of the
seedlings, based on the Pearson Product-Moment Coefficient of
Correlation.

Scatter plot presentation

[Figure: scatter plot of seedling diameter against height, with the
fitted line y = 0.0011x + 0.2369]

The plot shows that seedling height is not appreciably related to
diameter. In the regression equation y = 0.0011x + 0.2369, the
y-intercept of 0.2369 indicates that even if the height were 0, the
diameter would be predicted as about 0.24. The slope of 0.0011
indicates that the diameter increases by only about 0.0011 for every
unit increase in height.
Simple Linear Correlation Analysis
It deals with the estimation and test of significance of the simple
linear correlation coefficient r, which is a measure of the degree of
linear relationship between two variables X and Y. It can be
computed using the equation:

   r = Σxy / √[(Σx²)(Σy²)]

Where: x = deviation of the data X from its mean
       y = deviation of the data Y from its mean

Example:

      X       Y        x        y        x²         y²         xy
     196     200     13.4     12.9     179.56     166.41     172.86
     171     178    -11.6     -9.1     134.56      82.81     105.56
     170     169    -12.6    -18.1     158.76     327.61     228.06
     207     212     24.4     24.9     595.36     620.01     607.56
     177     180     -5.6     -7.1      31.36      50.41      39.76
     162     165    -20.6    -22.1     424.36     488.41     455.26
     199     201     16.4     13.9     268.96     193.21     227.96
     173     179     -9.6     -8.1      92.16      65.61      77.76
     231     243     48.4     55.9   2,342.56   3,124.81   2,705.56
     140     144    -42.6    -43.1   1,814.76   1,857.61   1,836.06
   Total
    1826    1871                     6,042.40   6,976.90   6,456.40
   Mean
   182.6   187.1

   r = Σxy / √[(Σx²)(Σy²)]
     = 6,456.40 / √[(6,042.40)(6,976.90)]
     = 6,456.40 / 6,492.86
     = 0.99

Interpretation:
Based on the qualitative interpretation of r, the result indicates
that there is a very strong direct linear relationship between the X
and Y variables. That is, the higher the X value, the higher the Y
value.

Scatter plot presentation

[Figure: scatter plot of Y against X, with the fitted line
y = 1.0685x - 8.011]

The plot shows that the X variable is linearly related to the Y
variable. In the regression equation y = 1.0685x - 8.011, the
y-intercept of -8.011 indicates that if X were 0, Y would be
predicted as about -8. The slope of 1.0685 indicates that Y increases
by about 1.07 for every unit increase in X.
Spearman Rank Correlation
(for non-parametric data)

   ρ = 1 - 6ΣD² / [n(n² - 1)]

Example:

   Entry No.   1st Judge   2nd Judge     D      D²
       1           9           7         2       4
       2           2           6        -4      16
       3           7           8        -1       1
       4           4           4         0       0
       5           5           9        -4      16
       6           8           1         7      49
       7           6           2         4      16
       8           3           5        -2       4
       9           1           3        -2       4
                                       ΣD² =   110

Computation:
   ρ = 1 - 6ΣD² / [n(n² - 1)]
     = 1 - 6(110) / [9(81 - 1)]
     = 1 - 660/720
     = 1 - 0.9167
   ρ = 0.083

Based on the result, no relationship is detected between the rankings
of judge no. 1 and judge no. 2.
Scatter plot presentation

The plot confirms that there is no linear relation between the rankings of the two judges. The coefficient of determination, ρ² = (0.083)² ≈ 0.007, means that judge no. 1's ranking accounts for only about 0.7% of the variation in judge no. 2's ranking; the remaining 99.3% is due to other factors not included in the analysis.
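The rank correlation can be cross-checked by computer; a minimal sketch assuming Python with scipy.

   # Cross-check of the Spearman rank correlation for the two judges
   from scipy.stats import spearmanr

   judge1 = [9, 2, 7, 4, 5, 8, 6, 3, 1]
   judge2 = [7, 6, 8, 4, 9, 1, 2, 5, 3]

   rho, p_value = spearmanr(judge1, judge2)
   print(round(rho, 3))   # about 0.083, essentially no agreement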

Exercise No. 13
SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
1. The following data shows the score (X) of students during
examination and the number of hours (Y) they studied for the
examination:

   X:   85   78   90   92   87   89   80   82   81
   Y:    8    3   10   15   11   12    6    8    7

a. Estimate the regression equation.


b. Predict the score of a student who studied 13 hours for
the examination.
c. Determine if there is a significant relationship between
the number of hours the students studied for an
examination and their scores.
2. The following data were obtained on a study of the relationship
between the weight and volumes of wood samples:
   WEIGHT (g)     Wood Volume (bd ft)
     2,080               2.5
     2,150               3.5
     4,041               3.4
     3,152               3.4
     3,221               4.5
     4,132               6.5
     2,231               3.0
     4,430               6.0
     3,573               5.5

At 5% level of significance, is there a significant correlation


between the weight and volumes of wood samples?


3. The figures given below pertain to the monthly disposable


income and consumption expenditure of five families:

   FAMILY    Disposable Income (X)    Consumption Expenditure (Y)
      1             800.5                       97.6
      2             100.5                       53.2
      3             104.5                       62.8
      4             200.6                      116.9
      5             522.0                      218.6

a. Compute the correlation coefficient and test for the


significance of correlation between disposable income and
consumption expenditure at the 5 percent level of
significance.
b. Compute for the coefficient of determination and interpret
the derived value.
4. The following are the data on soil pH and height (H) of seedlings
in cm.
   Seedling:   1     2     3     4     5     6     7     8     9    10    11    12
   Soil pH:   6.8   7.0   6.9   7.2   7.3   7.0   7.0   7.5   7.3   7.1   6.5   6.4
   H (cm):    154   167   162   175   190   158   166   195   189   186   148   140

A researcher is interested in learning how strong the association is and how well he can predict the effect of soil pH on the height of seedlings.
   a. Compute a Pearson r and interpret the result.
   b. Test the significance of r at the 5% level of significance.
   c. Compute r² and explain what it means.
   d. Calculate the regression equation for the data.
   e. Test the significance of the regression at the 1% level of significance.
   f. What would be the likely height growth of seedlings under a soil pH of 6.3?


ALTERNATIVE NON-PARAMETRIC TECHNIQUES


Non-Parametric technique is a test of significance
appropriate when the data represent an ordinal or nominal scale or
when assumptions required for parametric test cannot be met.
Related Samples
(two sample Case)
McNemar Change Test
Function:
This test for the significance of changes is particularly applicable to "before and after" designs in which each person or subject is used as his own control and in which measurement is at the strength of either a nominal or an ordinal scale.
Method
To test the significance of any observed changes by this test, one sets up a fourfold table of frequencies (see figure below) to represent the first and second sets of responses from the same individuals. Note that + and - are used to signify different responses.


                      After
                   -         +
   Before   +      A         B
            -      C         D

Note also that the cases which show changes between the 1st and 2nd responses appear in cells A and D. An individual is tallied in cell A if he changes from + to -, tallied in cell D if he changes from - to +, and tallied in cell B or cell C if no change is observed.
The McNemar change test statistic is:

   χ² = (|A - D| - 1)² / (A + D),  with df = 1

(the -1 is a correction for continuity). The significance of any observed χ² as computed from this equation is determined by reference to the table of critical values of χ². If the observed value of χ² is equal to or greater than that shown in the table for a particular significance level at df = 1, the implication is that a significant change was demonstrated in the "before" and "after" responses.
Sample Illustration by Tagaro and Tagaro:
A sociologist is interested in the sources of information of 25 fish cage operators before (prior to engaging in fish cage operation) and after (period in operation). He has data showing that, prior to engaging in fish cage culture, operators got their information from co-fisher farmers. With increasing familiarity and experience in fish cage culture (period in operation), the sources of information become other than co-fisher farmers (e.g., extension workers, pamphlets, others).


The data are cast in the form shown below.

                                    After Operation
                           Co-fisher farmers    Others    Total
   Prior       Co-fisher
   Operation   farmers             4              14        18
               Others              4               3         7
               Total               8              17        25

The researcher wants to determine whether, with increasing familiarity and experience, fish cage operators increasingly change their source of information from co-fisher farmers to other sources.
Solution:
1. H0: PA = PD. For those operators who change, the probability that an operator changes his source of information from co-fisher farmers to other sources (PA) is equal to the probability that he changes his source of information from others to co-fisher farmers (PD).
2. Ha: PA > PD. For those operators who change, the probability that an operator changes his source of information from co-fisher farmers to other sources (PA) is greater than the probability that he changes his source of information from others to co-fisher farmers (PD).
3. Test Statistic: The McNemar change test for the significance of changes is chosen because the study uses two related samples of the before-and-after type and uses nominal (classificatory) measurement.
4. Significance Level: Let α = 0.05, N = 25.
5. Rejection Criterion: Reject H0 if χ² ≥ χ²0.05, df=1 = 3.84.
6. Computation:
   χ² = (|14 - 4| - 1)² / (14 + 4) = 81/18 = 4.50
7. Decision: Since 4.50 > 3.84, reject H0.
8. Conclusion: It can be concluded that, for those operators who change, the probability that an operator changes his source of information from co-fisher farmers to other sources (PA) is greater than the probability that he changes from others to co-fisher farmers (PD). That is, fish cage operators show a significant tendency to change their sources of information away from co-fisher farmers as they gain familiarity and experience in fish culture (Tagaro and Tagaro).
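For readers who want a computer check, the following is a minimal sketch assuming Python with the statsmodels package; the 2 x 2 table is entered with the change cells in the off-diagonal positions.

   # McNemar change test for the fish cage operators (continuity-corrected)
   from statsmodels.stats.contingency_tables import mcnemar

   # rows = prior source, columns = after source:
   # [[co->co, co->others], [others->co, others->others]]
   table = [[4, 14],
            [4, 3]]

   result = mcnemar(table, exact=False, correction=True)
   print(round(result.statistic, 2), round(result.pvalue, 4))   # about 4.50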

Exercise No. 15
McNemar Change Test
1. A scientist is interested in finding out whether the students in
Political Science changed their perception of the government two
years after the election. Immediately after the elections, he asked
the opinion of the community toward the government. Two years
later, he asked the same question to the same people. The result
are as follows:

                           After
   Before            Unfavorable (-)    Favorable (+)
   Favorable (+)            30                19
   Unfavorable (-)          15                24

2. During the deliberation of the reproductive health bill, a group of people was asked about their opinions toward the bill. Three years after it became a law, the same people were asked the same question. The intention is to find out whether the years during which the law was in effect made them change their minds. The data are summarized as:

                     Three Years After
   Before            Against    Favor
   Favor               25         12
   Against             16         28
3. A group of faculty members was asked about their degree of satisfaction with the university before and after a change in administration. Test, at the 0.05 level of significance, whether there was a change in the opinion of the faculty regarding the school administration. The data are as follows:

                           After
   Before            Not Satisfied    Satisfied
   Satisfied               40             56
   Not Satisfied           22             18

Sign Test
Functions
The sign test gets its name from the fact that it uses plus (+) and minus (-) signs rather than quantitative measures as its data. It is particularly useful for research in which quantitative measurement is impossible, but in which it is possible to rank the two members of each pair with respect to each other.
The sign test is applicable to the case of two related samples when the researcher wishes to establish that two conditions are different.


Method
The null hypothesis tested by the sign test is

   H0: p(Xi > Yi) = p(Xi < Yi) = 1/2

where Xi is the judgment or score under one condition (or before the treatment) and Yi is the judgment or score under the other condition (after the treatment); that is, Xi and Yi are the two scores for a matched pair. Another way of stating H0 is that the median difference of X and Y is zero.
In applying the sign test, we focus on the direction of the difference between every Xi and Yi, noting whether the sign of the difference is + or -. When H0 is true, we would expect the number of pairs with Xi > Yi to equal the number of pairs with Xi < Yi. That is, if the null hypothesis is true, we would expect about half of the differences to be negative and half to be positive. H0 is rejected if too few differences of one sign occur.
Hypothesis Testing Procedure of Sign Test:
1. State H0 and Ha.
2. Test Statistic: Use the sign test at the α level of significance.
3. Rejection Criterion: Reject H0 if the test statistic reaches the critical value for the chosen α.
4. Computation:
   a. Determine the sign of the difference between the two numbers of each pair.
   b. By counting, determine the value of n, the number of pairs whose difference shows a sign. Let a be the number of negative signs and b the number of positive signs.
   c. For small samples (n ≤ 35), refer the smaller of the two sign counts to a sign-test (binomial, p = 0.5) table.
   d. For large samples, use the formula:

        z = (2x ± 1 - n) / √n

      where x is the number of the fewer signs; use +1 when x < n/2 and -1 when x > n/2. The significance of an obtained z may be determined by reference to Appendix Table A (probabilities associated with the upper tail of the normal distribution).
5. State your decision.
6. State your conclusion.
Sample Illustration No.1 (for small samples)
Suppose we wish to determine whether increase of wages
would increase the daily output of employees. Let X be the daily
output in units before the increase in wages and Y be the daily
output after the increase in wages. A sample of 15 employees yields the following data:
   EMPLOYEE     X     Y     di
       1        91    88     3
       2        88    87     1
       3        70    67     3
       4        79    69    10
       5        85    83     2
       6        86    81     5
       7        90    93    -3
       8        66    67    -1
       9        72    76    -4
      10        60    55     5
      11        75    74     1
      12        84    86    -2
      13        80    72     8
      14        80    90   -10
      15        70    75    -5
Using the sign test, determine whether the data present sufficient evidence to conclude that the daily output after the wage increase is higher than before the increase.
Solution:
1. H0: The daily output after the wage increase is the same as the daily output before the increase.
   Ha: The daily output after the increase is higher than the daily output before the wage increase.
2. Test Statistic: Use the sign test at the α = 0.05 level of significance.
3. Rejection Criterion: Reject H0 if the smaller sign count is small enough to be significant at α = 0.05.
4. Computation:
   a. Determine the sign of the difference between the two numbers of each pair. For employees 1 to 15, the signs of di = X - Y are:
      +, +, +, +, +, +, -, -, -, +, +, -, +, -, -
   b. By counting, n = 15 pairs show a sign. In the data given, a = 6 (the number of negative signs) and b = 9 (the number of positive signs).
   c. For small samples (n ≤ 35), refer the smaller count to the binomial distribution with p = 0.5. Here the smaller count is 6, and the one-tailed probability of observing 6 or fewer of one sign out of 15 is about 0.30.
5. Decision: Since this probability is larger than α = 0.05, there is no sufficient evidence to reject H0.
6. Conclusion: The daily output after the raise in salary does not significantly differ from the daily output before the raise.
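As a cross-check of step 4(c), the exact binomial probability can be computed; this is a minimal sketch assuming Python with scipy.

   # Exact (binomial) sign test for the wage example: 6 minus signs out of 15
   from scipy.stats import binomtest

   result = binomtest(k=6, n=15, p=0.5, alternative='less')
   print(round(result.pvalue, 3))   # about 0.304, larger than 0.05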
Illustrative Example No. 2 (For large Samples, n > 35)
A company claims that its hiring practices are fair and do not discriminate on the basis of gender, and that the fact that 40 of the last 50 new employees are men is just a fluke (an unexpected random event). The company acknowledges that applicants are about half men and half women. Test the null hypothesis that men and women are equal in their ability to be employed by this company. Use the significance level of 0.05.
Solution:
1. H0: The proportions of men and women employed by the company are equal.
   Ha: The proportions of men and women employed by the company are not equal.
2. Test Statistic and Significance Level: Use the sign test at the 5% level of significance.
3. Rejection Criterion: Reject H0 if |z| ≥ 1.96.
4. Computation:
   If we denote women by + and men by -, we have 10 positive signs and 40 negative signs. The test statistic x is the smaller of 10 and 40, so x = 10. We note that n = 50 is above 35, so x is converted to the test statistic z. Here we use +1 since x < n/2, or x = 10 < 50/2 = 25:

        z = (2x + 1 - n) / √n = (20 + 1 - 50) / √50 = -29 / 7.07 = -4.10

5. Decision: Since |z| = 4.10 > 1.96, reject H0.
6. Conclusion: There is sufficient evidence to warrant rejection of the claim that the hiring practice is fair.
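The normal-approximation arithmetic can be reproduced in a few lines; a minimal sketch assuming Python with scipy.

   # Large-sample sign test for the hiring example (normal approximation)
   from math import sqrt
   from scipy.stats import norm

   n, x = 50, 10                      # 50 hires, 10 of the less frequent sign
   z = (2 * x + 1 - n) / sqrt(n)      # +1 because x < n/2
   p_two_sided = 2 * norm.cdf(z)      # z is negative, so use the lower tail
   print(round(z, 2), round(p_two_sided, 5))   # about -4.10 and 0.00004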


Exercise No. 12
Sign Test
1. Two different firms design their own IQ test and psychologist
administer both test to randomly selected students with the
results given below. At the 0.05 level of significance, test the
hypothesis that there is no significant difference between the two
tests.
   Test I:    99   95  115  102  108  105   92   88  101   99
   Test II:  115  103  113   98  112  106   97   97  107  103

2. A test of running ability is given to a random sample of 8 runners


before and after they completed a formal athletic lesson course
in track and field. The results follow. At the 0.05 significance
level, test the hypothesis that the course does not affect running
scores.
   Runners:   1    2    3    4    5    6    7    8
   Before:   97   99  110  112  106  105   92   98
   After:   115  103  113   98  112  109   97   95


Wilcoxon Signed-Ranks Test


For Two Dependent Samples
The Wilcoxon signed-ranks test is a very useful test for the behavioral scientist. With behavioral data, it is not uncommon that the researcher can (1) tell which member of a pair is "greater than," i.e., tell the sign of the difference between any pair, and (2) rank the differences in order of absolute size. That is, the researcher can make the judgment of "greater than" between the two values of any pair as well as between any two difference scores arising from any two pairs.
Procedures:
1. For each pair of data, find the difference d by subtracting the second score from the first. Retain the signs, but discard any pairs for which d = 0.
2. Ignoring the signs of the differences, rank them from lowest to highest. When differences have the same numerical value, assign to them the mean of the ranks involved in the tie.
3. Assign to each rank the sign of the difference from which it came.
4. Find the sum of the positive ranks and the sum of the absolute values of the negative ranks.
5. Let T be the smaller of the two sums found in step 4.
6. Let n be the number of pairs of data for which the difference d is not zero.
7. If n ≤ 30, use Table A-8 to find the critical value of T. Reject H0 if the sample data yield a value of T less than or equal to the value in Table A-8; otherwise, fail to reject the null hypothesis. If n > 30, compute the test statistic z by using the formula:

        z = [T - n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]


When this z equation is used, the critical z values are found from Appendix Table A. Again, reject the null hypothesis if the test statistic z is greater than or equal to the critical value(s) of z in absolute value; otherwise, fail to reject the null hypothesis.
Consider the example given by Tagaro and Tagaro. Thirteen students were tested for logical thinking. They were then given a tranquilizer and retested. Let us use the Wilcoxon signed-ranks test to test the hypothesis that the tranquilizer has no effect, so that there is no significant difference between the before and after test scores. We will assume a 0.05 level of significance.
a. H0: The tranquilizer has no effect on logical thinking.
   Ha: The tranquilizer has an effect on logical thinking.
b. Test Statistic: Use the Wilcoxon signed-ranks test.
c. Significance Level: Use the 0.05 level of significance.
d. Rejection Criterion: Reject H0 if T is less than or equal to the critical value in Table A-8, since n < 30 in the example.
e. Computation:
   T = min (Σ(+), Σ(-))

   STUDENT   BEFORE   AFTER   DIFFERENCE   RANK OF |d|   SIGNED RANK
      A        67       68        -1            1            -1
      B        78       81        -3            2            -2
      C        81       85        -4            3            -3
      D        72       60       +12           10           +10
      E        75       75         0            -             -
      F        92       81       +11           8.5          +8.5
      G        84       73       +11           8.5          +8.5
      H        83       78        +5            4            +4
      I        77       84        -7            5            -5
      J        65       56        +9            6            +6
      K        71       61       +10            7            +7
      L        79       64       +15           11           +11
      M        80       63       +17           12           +12

   Sum of the absolute values of the negative ranks = 1 + 2 + 3 + 5 = 11
   Sum of the positive ranks = 10 + 8.5 + 8.5 + 4 + 6 + 7 + 11 + 12 = 67
   T = 11 (the smaller of the two sums)
f. Decision: since T = 11 is less than the critical value = 14 in
Table A-8, reject H0.
g. Conclusion: It appears that the drug affects scores.
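The test can also be run by computer; a minimal sketch assuming Python with scipy (the zero difference for student E is dropped automatically).

   # Wilcoxon signed-ranks test for the logical-thinking scores
   from scipy.stats import wilcoxon

   before = [67, 78, 81, 72, 75, 92, 84, 83, 77, 65, 71, 79, 80]
   after  = [68, 81, 85, 60, 75, 81, 73, 78, 84, 56, 61, 64, 63]

   t_stat, p_value = wilcoxon(before, after)
   print(t_stat, round(p_value, 3))   # statistic is 11, p below 0.05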

Independent Sample
(two-sample case)
Chi-square Test for Two Independent Samples
Function
When the data of research consist of frequencies in discrete
categories, x2 test may be used to determine the significance of
difference between two independent groups. The measurement
involved may be as weak as nominal scaling.
The hypothesis under the test is usually that the two groups
differ with respect to some characteristics and therefore with respect
to the relative frequency with which group members fall in several
categories.
Method
The null hypothesis may be tested by

   χ² = Σi Σj (Oij - Eij)² / Eij

Where:
   Oij = the observed number of cases in the ijth category


   Eij = the expected number of cases in the ijth category when the null hypothesis is true
   k   = the number of categories
Note that the most common of all uses of the χ² test is to test whether an observed breakdown of frequencies in a 2 x 2 contingency table could have occurred under H0. When applying the chi-square test where both r and k equal 2, the following (continuity-corrected) formula should be used:

   χ² = N(|AD - BC| - N/2)² / [(A + B)(C + D)(A + C)(B + D)],  with df = 1

2 x 2 Contingency Table

   GROUP       -         +       Total
     I         A         B       A + B
     II        C         D       C + D
   Total     A + C     B + D       N

To test the significance of the computed χ², the researcher refers to the table of critical values of χ² (Appendix Table C). If the computed χ² is greater than or equal to the critical value of χ², then reject H0.
Sample Illustration by Tagaro and Tagaro:
A researcher studied the relation of feeding management to efficiency in tilapia raising. Efficiency is measured in terms of total weight (in kg) at harvest time; the greater the weight, the more efficient the feeding management is taken to be, all other factors of production being constant. A purposive sample of 60 fishpond operators feeding once a day (A) and 65 fishpond operators feeding three times a day (B) were interviewed. The amounts of feed for both groups are equal.


The responses of the respondents are summarized in the 2 x 2 table below:

   FEEDING MGT.    Efficient    Not Efficient    Total
        A             40             20            60
        B             30             35            65
      Total           70             55           125

Solution:
1. H0: There is no difference in efficiency between the two feeding managements.
   Ha: Feeding management has a significant effect on tilapia raising.
2. Test Statistic: The χ² test for two independent samples is chosen because the two groups (A and B) are independent and because the scores under study are frequencies in discrete categories (efficient and not efficient).
3. Level of Significance: Use 5% as the level of significance, N = 125 (number of fishpond operators).
4. Rejection Criterion: Reject H0 if χ² ≥ χ²0.05, df=1 = 3.84.
5. Computation:
   χ² = 125(|40x35 - 20x30| - 125/2)² / [(60)(65)(70)(55)]
      = 125(800 - 62.5)² / 15,015,000
      = 4.53
6. Decision: Since 4.53 > 3.84, reject H0.
7. Conclusion: We conclude that feeding management has a significant effect on tilapia raising.
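The corrected chi-square can be cross-checked by computer; a minimal sketch assuming Python with scipy, which applies the Yates continuity correction to 2 x 2 tables by default.

   # Chi-square test for the 2 x 2 feeding-management table
   from scipy.stats import chi2_contingency

   observed = [[40, 20],
               [30, 35]]

   chi2, p_value, dof, expected = chi2_contingency(observed)   # correction=True by default
   print(round(chi2, 2), dof, round(p_value, 3))   # about 4.53 with 1 df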


Wilcoxon-Mann-Whitney Test
Function
The Wilcoxon-Mann-Whitney test is an examination of
equality of two population distributions. The test is most useful in
testing for equality of two population means. As such, the test is an
alternative to the two-sample t-test and is used when the
assumption of normal population distribution is not met. The test is
slightly weaker than the t-test.
The only assumptions required for the test are that random samples are drawn from the two populations of interest and that they are drawn independently of each other. If we intend to state the hypothesis in terms of population means or medians, we need to add the assumption that any difference between the two distributions is a difference in location (mean or median).
Method
Case 1. When the samples sizes are equal, n1=n2
1. Put all observation in a single array tagging each
observation to differentiate the origin of each observation.
2. Rank the observations in the combined array.
3. Assign the average rank in case of ties.
4. Sum the rank of the first sample (T1) and the rank of the
second sample (T2) and compute T = min (T1,T2).
5. Compare T with tabular value (Table A10)
6. Decision Criterion: Reject H0 if T Ttab.
Case 2. When the sample sizes are unequal, n1 < n2:
1. Do steps 1 to 3 of Case 1.
2. Find the total of the ranks for the sample that has the smaller size, n1 (call it T1).
3. Compute T2 = n1(n1 + n2 + 1) - T1.
4. Determine T = min (T1, T2).

5. Compare T with the tabular value.


6. Decision Criterion: Reject H0 if T Ttab.

Illustration:
Suppose a researcher wants to compare the daily
expenditures of families in rural area with that of the City. Suppose
further that there are 15 sample respondents from the City, while
there are only 10 respondents from the rural area. Below are the
data corresponding to the groups under consideration:
   DAILY EXPENDITURES IN        DAILY EXPENDITURES IN
   RURAL AREA (PhP)             CITY (PhP)
        100                          250
        200                          300
        186                          600
        177                          134
         67                          890
         74                           52
         48                          570
        300                          153
        244                          462
         74                          115
                                     405
                                     117
                                     334
                                     157
                                     224

Do the data indicate that the daily household expenditures in the


urban area are greater than those of the rural area? (Use 5% level
of significance)


Solution:
1. H0: The daily household expenditures in the city are the same
as those of rural areas.
Ha: The daily household expenditures in the city are greater
than those of rural areas.
2. Test Statistics: Use Wilcoxon-Mann-Whitney test at 5% level
of significance. (case 2)
3. Rejection Criterion: Reject H0 if T Ttab.
4. Computation:
   n1 = 10 (rural),  n2 = 15 (city)

   Combined array (A = rural, B = city) with ranks:

   Value:   48   52   67   74   74  100  115  117  134  153  157  177  186
   Origin:   A    B    A    A    A    A    B    B    B    B    B    A    A
   Rank:     1    2    3   4.5  4.5   6    7    8    9   10   11   12   13

   Value:  200  224  244  250  300  300  334  405  462  570  600  890
   Origin:   A    B    A    B    A    B    B    B    B    B    B    B
   Rank:    14   15   16   17  18.5 18.5  20   21   22   23   24   25

   T1 = 1 + 3 + 4.5 + 4.5 + 6 + 12 + 13 + 14 + 16 + 18.5 = 92.5
   T2 = n1(n1 + n2 + 1) - T1 = 10(26) - 92.5 = 167.5
   T  = min (T1, T2) = min (92.5, 167.5) = 92.5

5. Decision: Since T = 92.5 > Ttab = 90, we fail to reject H0.
6. Conclusion: We conclude that the daily household expenditures in the urban area are the same as those in the rural area.
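The rank-sum arithmetic can be reproduced by computer; a minimal sketch assuming Python with scipy (scipy.stats.mannwhitneyu performs the equivalent Mann-Whitney U form of the full test).

   # Reproduce the rank sums for the rural vs. city expenditure data
   from scipy.stats import rankdata

   rural = [100, 200, 186, 177, 67, 74, 48, 300, 244, 74]
   city = [250, 300, 600, 134, 890, 52, 570, 153, 462, 115, 405, 117, 334, 157, 224]

   ranks = rankdata(rural + city)          # ranks of the combined array, ties averaged
   t1 = ranks[:len(rural)].sum()           # rank sum of the smaller (rural) sample
   t2 = len(rural) * (len(rural) + len(city) + 1) - t1
   print(t1, t2, min(t1, t2))              # 92.5, 167.5, 92.5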

Related Samples
(k samples case)
Friedman Two-Way Analysis of Variance by Ranks
The Friedman test (Friedman two-way analysis of variance by ranks) is a nonparametric analogue of the parametric two-way analysis of variance. The objective of this test is to determine whether we may conclude from a sample of results that there is a difference among treatment effects. The first step in calculating the test statistic is to convert the original results to ranks: the treatments are ranked separately for each subject (or problem), the best-performing one receiving rank 1, the second best rank 2, and so on. In case of ties, average ranks are assigned.
Let r(i,j) be the rank of the jth of k treatments on the ith of n subjects. The Friedman test requires the computation of the average rank of each treatment, Rj = (1/n) Σi r(i,j). Under the null hypothesis, which states that all the treatments behave similarly and thus their average ranks Rj should be equal, the Friedman statistic is computed as shown under Method below.
Function
When the data from k matched samples are in at least an
ordinal scale, this test is useful for testing the null hypothesis that k
samples have been drawn from the same population.

Since k samples are matched, the number of cases is the


same in each of the samples. The matching may be achieved by
studying the same group of subject under each k conditions.
Method
The formula for the Friedman statistic, denoted Fr (or X²F), is:

   Fr = [12 / (Nk(k + 1))] Σj Rj²  -  3N(k + 1)

Where:
   Fr or X²F = Friedman value
   N  = number of rows (subjects or matched groups)
   k  = number of columns (variables or conditions)
   Rj = sum of the ranks in the jth column (i.e., the sum of ranks for the jth variable)
   Σj directs one to sum the squares of the rank sums over all conditions.

The steps in the use of the Friedman two-way analysis of variance by ranks are as follows:
1. Cast the scores in a two-way table having k columns (conditions) and N rows (subjects or groups).
2. Rank the scores in each row from 1 to k.
3. Determine the sum of the ranks in each column, Rj.
4. Compute the value of Fr using the formula above.

5. Compare the Fr value with the critical value of the χ² distribution (Appendix Table C) with df = k - 1. If the probability yielded by the test is equal to or less than α (the set level of significance), reject H0. That is, if Fr ≥ χ²α, df = k-1, reject H0.
Sample Illustration:
In a study on the influence of three different teaching
strategies on the extent of learning of statistics, three matched
samples (k=3) of 10 students were trained under three strategies of
teaching. Matching was achieved by the use of 10 sets of students,
three in each set.
Prior to the training, a pre-test was given to the 30 statistics
students. After the training, the extent of learning was measured
giving a post test. Scores ranged from 0 to 50. The table below
shows the scores of each of the 30 students as a result of posttest
conducted.

                                Teaching Strategies
   GROUP     LECTURE     LECTURE W/ POWER POINT     LECTURE W/ FIELD DEMONSTRATION
     A          27                 30                            39
     B          30                 35                            45
     C          31                 31                            41
     D          26                 40                            42
     E          25                 28                            42
     F          24                 42                            45
     G          28                 33                            40
     H          30                 28                            42
     I          30                 35                            39
     J          29                 45                            45

With the given data, the researcher wished to test if the


teaching strategies have significant effect on the extent of learning
statistics among students.

Solution:
1. H0: The different methods of teaching have no differential
effect.
Ha: The different methods of teaching have differential effect.
2. Test Statistics: Use the non-parametric Friedman two-way
analysis of variance because the scores exhibited possible
lack of homogeneity of variance and thus the data suggested
that one of the basic assumptions of the F-test was
unattainable.
3. Level of Significance: Use 5% level of significance with
N=10 the number of students in the three matched groups.
4. Rejection Criterion: Reject H0 if Fr ≥ χ²0.05, df=2 = 5.99.
5. Computation (ranks within each group):

                                Methods of Teaching
   GROUP     LECTURE     LECTURE W/ POWER POINT     LECTURE W/ FIELD DEMONSTRATION
     1          1                  2                              3
     2          1                  2                              3
     3         1.5                1.5                             3
     4          1                  2                              3
     5          1                  2                              3
     6          1                  2                              3
     7          1                  2                              3
     8          1                  2                              3
     9          1                  2                              3
    10          1                  2                              3
    Rj        10.5               19.5                            30

   Fr = [12 / (10)(3)(3 + 1)] (10.5² + 19.5² + 30²) - 3(10)(3 + 1)
      = (0.1)(1,390.5) - 120
      = 19.05

6. Decision: Since 19.05 > 5.99, reject H0.
7. Conclusion: The different teaching strategies have different effects on the extent of learning statistics.
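The arithmetic of the Fr formula can be reproduced in a few lines; a minimal sketch assuming Python (scipy.stats.friedmanchisquare can also compute the test directly from raw scores).

   # Friedman statistic computed from the column rank sums
   def friedman_from_rank_sums(rank_sums, n_subjects):
       k = len(rank_sums)
       return 12.0 / (n_subjects * k * (k + 1)) * sum(r ** 2 for r in rank_sums) \
              - 3 * n_subjects * (k + 1)

   fr = friedman_from_rank_sums([10.5, 19.5, 30], n_subjects=10)
   print(round(fr, 2))   # about 19.05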
Multiple Comparisons Between Groups of Conditions
From the Result of the Friedman Two-Way ANOVA
When the obtained value of Fr is significant, it indicates that at
least one of the conditions differ from at least one other condition. It
does not tell the researcher which one is different, nor does it tell
the researcher how many of the groups varied from each other.
That is, when the obtained value of Fr is significant, we would like to test the hypothesis H0: Ru = Rv against the alternative hypothesis Ha: Ru ≠ Rv for particular pairs of conditions u and v. There is a simple procedure for determining which condition(s) differ. Begin by determining the differences | Ru - Rv | for all pairs of conditions or groups. When several comparisons are made the differences are not independent, so the comparison procedure must be adjusted appropriately. Suppose the hypothesis of no difference between the k conditions or matched groups was tested and rejected at the α significance level. Then we can test the significance of individual pairs of differences using the following inequality: if

   | Ru - Rv | ≥ z √[Nk(k + 1)/6]

(or, if the data are expressed as average ranks within each condition, if the difference in average ranks is at least z √[k(k + 1)/(6N)]), then we may reject the hypothesis H0: Ru = Rv and conclude that the two conditions differ. Thus, if the difference between the rank sums (or average ranks) exceeds the corresponding critical value, we may conclude that the two conditions are different. Here z is the value of the standard normal distribution above which lies α/[k(k - 1)] of the distribution; the value of z can be obtained from Appendix Table A.
Because it is often necessary to obtain values for extremely small probabilities, especially when k is large, Appendix Table A may be used. This is a table of the standard normal distribution arranged so that values used in multiple comparisons may be obtained easily; it is arranged on the basis of the number of comparisons that can be made, and the table values are the upper-tail probabilities associated with values of z. When there are k groups, there are k(k - 1)/2 comparisons possible.

Example:
In the example above regarding teaching strategies, Fr is significant at the 5% level. The following rank sums were obtained: RL = 10.5 (Lecture), RLP = 19.5 (Lecture with power point presentation), RLD = 30 (Lecture with field demonstration). We have the following differences:

   | RL - RLP |  = | 10.5 - 19.5 | = 9
   | RL - RLD |  = | 10.5 - 30.0 | = 19.5
   | RLP - RLD | = | 19.5 - 30.0 | = 10.5

We then find the critical difference. Since α = 0.05 and k = 3, the number of comparisons is k(k - 1)/2 = 3. Referring to Appendix Table A (for an upper-tail probability of α/[k(k - 1)] = 0.05/6), we see that the value of z is 2.394. The critical difference is then

   2.394 √[(10)(3)(3 + 1)/6] = 2.394 √20 = 2.394(4.47) ≈ 10.7

Since only the second difference (19.5) exceeds the critical difference, we conclude that the difference between conditions RL and RLD is significant. Note that the first and third differences, although large, are not of a magnitude great enough at the significance level chosen. Hence, the lecture with field demonstration differs significantly from the lecture-only strategy in influencing learning in statistics.

Independent Samples (k-sample case)


Chi-square Test for r x k Tables
Function and Rationale
Research is often undertaken because one is interested in the number of subjects, objects, or responses that fall into various categories. For example, root growth potentials of certain tree species may be categorized according to the frequency of first order lateral roots (FOLR) in seedlings, the hypothesis being that these FOLR frequencies will differ among seedling groups. Or persons may be categorized according to whether they are "in favor of," "indifferent to," or "opposed to" an opinion, to enable the researcher to test the hypothesis that these responses will differ in frequency.
The chi-square test is suitable for analyzing data like these. The number of categories may be two or more. The technique is a goodness-of-fit type of test, in that it may be used to test whether a significant difference exists between an observed and an expected number of cases based upon the null hypothesis. That is, the chi-square test assesses the degree of correspondence between the observed and expected observations in each category.


Method
To compare an observed with an expected group of frequencies, we must be able to state what frequencies would be expected. The null hypothesis states the proportion of objects falling in each of the categories in the presumed population; that is, from the null hypothesis we may deduce the expected frequencies. The chi-square technique gives the probability that the observed frequencies could have been sampled from a population with the given expected values.
The null hypothesis may be tested by using the following statistic:

   χ² = Σi Σj (Oij - Eij)² / Eij

Where:
   Oij = the observed number of cases in the ijth category
   Eij = the expected number of cases in the ijth category when the null hypothesis is true
   k   = the number of categories

If the agreement between the observed and expected frequencies is close, the differences (Oij - Eij) will be small and the chi-square will be small. However, if the divergence is large, the value of chi-square as computed from the equation will likewise be large. Roughly, the bigger the value of chi-square, the less likely it is that the observed frequencies came from the population on which the null hypothesis and the expected frequencies are based.
Summary Procedure:
These are the steps in the use of the chi-square test for k independent samples:
1. Cast the observed frequencies into an r x k contingency table, using the k columns for the groups and the r rows for the conditions.
2. Calculate the row totals Ri and the column totals Cj.
3. Determine the expected frequency for each cell by finding the product of the marginal totals common to it and dividing this by N (where N represents the total number of independent observations); thus Eij = (Ri)(Cj)/N.
4. Determine the significance of the observed χ² by reference to Appendix Table C with degrees of freedom equal to (r - 1)(k - 1). If the probability given by Table C is equal to or smaller than α (the set level of significance), reject H0. That is, if χ² ≥ χ²α, df = (r-1)(k-1), reject H0 in favor of Ha.
Sample Problem:
1. In an experiment to study the dependence of FOLR on seedling group, the following data were taken on 180 seedlings:

   Group     Root Class 1 (1-10)   Root Class 2 (11-20)   Root Class 3 (21 & above)
     A               21                    36                       30
     B               48                    26                       19

Test the hypothesis that the seedling group is independent of FOLR.
Solution:
a. H0: FOLR is independent of seedling group.
   Ha: FOLR is dependent on seedling group.
b. Test Statistic: Use the chi-square test.
c. Significance Level: Use the 0.05 level of significance.
d. Rejection Criterion: Reject H0 if χ² ≥ χ²0.05, df = (2-1)(3-1) = 2 = 5.99.
e. Computation:
   Group     R. Class 1 (1-10)   R. Class 2 (11-20)   R. Class 3 (21+)   Total
     A          21 (33.35)           36 (29.97)           30 (23.68)        87
     B          48 (35.65)           26 (32.03)           19 (25.32)        93
   Total             69                   62                   49          180

Note: The numbers in parentheses are expected frequencies determined by Eij = (Ri)(Cj)/N; for example, E11 = (87)(69)/180 = 33.35.

Solution:

   χ² = (21 - 33.35)²/33.35 + (36 - 29.97)²/29.97 + (30 - 23.68)²/23.68
        + (48 - 35.65)²/35.65 + (26 - 32.03)²/32.03 + (19 - 25.32)²/25.32
      = 4.57 + 1.21 + 1.68 + 4.28 + 1.14 + 1.58
      = 14.46
f. Decision: Since 14.46 > χ²0.05, df=2 = 5.99, reject H0.

g. Conclusion: The result shows that the FOLR is dependent on


seedling group.
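A computer cross-check of the 2 x 3 table; a minimal sketch assuming Python with scipy (no continuity correction is applied for tables larger than 2 x 2).

   # Chi-square test of independence for the FOLR 2 x 3 table
   from scipy.stats import chi2_contingency

   observed = [[21, 36, 30],
               [48, 26, 19]]

   chi2, p_value, dof, expected = chi2_contingency(observed)
   print(round(chi2, 2), dof)    # about 14.46 with 2 df
   print(expected.round(2))      # expected counts, e.g. 33.35 for cell (A, class 1)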

Kruskal Wallis Test

The Kruskal-Wallis test is a nonparametric test designed to detect differences among populations that does not require any assumptions about the shape of the populations. The test is an alternative nonparametric test for the completely randomized design, or one-way analysis of variance.
The Kruskal-Wallis test is an analysis of variance that uses the ranks of the observations rather than the data themselves, even when the observations are on an interval scale; if the data are already in the form of ranks, they are used as they are. The Kruskal-Wallis test extends the Mann-Whitney test to k populations, where k is greater than 2. The null hypothesis is that the k populations under study have the same distribution, and the alternative hypothesis is that at least two of the population distributions differ from each other.
Although the hypothesis test is stated in terms of the distributions of the populations of interest, the test is most sensitive to differences in the locations of the populations. Therefore, the test is commonly used to test the ANOVA hypothesis of equality of k population means. The only assumptions required for the Kruskal-Wallis test are that the k samples are random and drawn independently, that the variables under study are continuous, and that the measurement scale is at least ordinal.
Rank the data points in the entire set from smallest to largest, without regard to which sample they come from. Then sum the ranks from each separate sample. Let n1 be the sample size from population 1, n2 the sample size from population 2, and so on up to nk, the sample size from population k. Define n as the total sample size, R1 as the sum of the ranks from sample 1, R2 as the sum of the ranks from sample 2, and so on up


to Rk as the sum of the ranks from sample k. The Kruskal-Wallis test statistic is:

   H = [12 / (n(n + 1))] Σj (Rj² / nj)  -  3(n + 1)

If the average of the ranks in the jth sample, Rj/nj, is considered, the Kruskal-Wallis test statistic may equivalently be written with nj(Rj/nj)² in place of Rj²/nj.
The null hypothesis is rejected if H exceeds the critical value of chi-square at the α level of significance with degrees of freedom equal to k - 1.
Example:
The palatability studies are conducted to determine the
acceptability of taste of four menus of cake, and different sample
score follow. At 0.05 level of significance, test, the hypothesis that
the four menus have the same acceptability level.
Menu A
50
50
53
58
59

Menu B
59
60
63
65
67

Menu C
45
48
51
54
55

Menu D
62
64
68
70
72

Solution:
1. H0: The four menus have the same acceptability level.
Ha: The four menus differ in acceptability level.


2. Test-Statistic: Use the Kruskal-Wallis test since there are four


independent populations.
3. Level of significance: Use 0.05 level of significance.
4. Rejection Criterion: Reject H0 if H ≥ χ²0.05, df=3 = 7.815.
5. Computation:
a. Rank the combine samples from the lowest to the
highest.
b. For each individual sample, find the number of
observations and the sum of the ranks.
               Menu A    Menu B    Menu C    Menu D
   Scores        50        59        45        62
                 50        60        48        64
                 53        63        51        68
                 58        65        54        70
                 59        67        55        72
   Rank sum    32.50     69.50     23.00     85.00
   Mean rank    6.50     13.90      4.60     17.00
   nj              5         5         5         5

      c. Compute the value of the test statistic H:

         H = [12 / (20)(21)] (32.5²/5 + 69.5²/5 + 23²/5 + 85²/5) - 3(21)
           = (0.02857)(2,728.1) - 63
           = 14.94

6. Decision: Since 14.94 > 7.815, reject H0.


7. Conclusion: The four menus differ significantly in their


acceptability level.
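A computer cross-check; a minimal sketch assuming Python with scipy (scipy applies a correction for ties, so its H may differ very slightly from the hand computation).

   # Kruskal-Wallis test for the four cake menus
   from scipy.stats import kruskal

   menu_a = [50, 50, 53, 58, 59]
   menu_b = [59, 60, 63, 65, 67]
   menu_c = [45, 48, 51, 54, 55]
   menu_d = [62, 64, 68, 70, 72]

   h_stat, p_value = kruskal(menu_a, menu_b, menu_c, menu_d)
   print(round(h_stat, 2), round(p_value, 4))   # H is about 14.9, p well below 0.05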

Exercise No. 16
Test yourself:
Using the Spearman rank correlation,

   ρ = 1 - [6ΣD²] / [n(n² - 1)],

determine whether there is significant agreement between the two judges in judging the beauty contestants.

   Entry No.   1st Judge   2nd Judge     D     D²
       1           9           7         2      4
       2           2           6        -4     16
       3           7           8        -1      1
       4           4           4         0      0
       5           5           9        -4     16
       6           8           1         7     49
       7           6           2         4     16
       8           3           5        -2      4
       9           1           3        -2      4

ANALYSIS OF VARIANCE
ONE WAY CLASSIFICATION DESIGN: CRD
CRD is a design wherein the allocation of treatments is done by
randomizing the treatments completely over the entire experimental
units (eus) without any restriction imposed on the units and there is
only one criterion for data classification.
CRD is commonly used when:
1. The eus are sufficiently homogenous (like dishes of culture
medium) and
2. Effective local control is assured (as those in laboratories,
greenhouse).
Randomization and layout
Suppose there are t = 3 treatments, T1, T2, and T3, which are replicated r1 = 2, r2 = 3 and r3 = 4 times, respectively; hence, the number of eus required is n = r1 + r2 + r3 = 9.
The randomization (using random no. generator key on calculator)
may be as follows:
1. Label the eus consecutively from 1 to n = 9.
2. Obtain a sequence of n = 9 random numbers. Rank the
numbers in increasing order. Using the sequence of ranks as a
randomization of the eus, assign the first r1 = 2 eus to T1, the next
r2 = 3 eus to T2 and the last r3 = 4 eus to T3.
   Random no.:     0.678  0.124 | 0.543  0.667  0.119 | 0.076  0.923  0.876  0.436
   Rank (eu no.):     7      3  |    5      6      2  |    1      9      8      4
                      T1        |         T2          |              T3


The layout for this randomization procedure is:

   Plot (eu) no.:   1    2    3    4    5    6    7    8    9
   Treatment:      T3   T2   T1   T3   T2   T2   T1   T3   T3

Data Presentation: (One-way Classification)

   Treatment     Observations (Yij)       Reps ri    Total Yi.    Mean
      T1         Y11  Y12  ...  Y1r1         r1         Y1.       Ȳ1.
      T2         Y21  Y22  ...  Y2r2         r2         Y2.       Ȳ2.
      ...
      Tt         Yt1  Yt2  ...  Ytrt         rt         Yt.       Ȳt.
                                                        Y.. (grand total)

Linear Model:

   Yij = μ + τi + εij

   μ   = overall population mean
   τi  = effect of the ith treatment (Σ τi = 0)
   εij = residual or error effect of the jth measurement of the ith treatment
       = deviation of the ijth observation from the ith treatment mean; εij ~ NID(0, σ²)
   (NID = normally, independently distributed)

Advantages
Completely flexible for # of replications or treatments
Missing observations are not a problem
Maximum error degree of freedom of any designs
Analysis of data:
Illustration 1:
Three methods of soil analysis, S1, S2, S3, were tried by a research
institute. Twelve uniform soil samples were taken from a certain
farm and S1 was randomly assigned to 5 of the soil samples, S2 to 3
samples and S3 to 4 samples. The researcher was interested in the
time to complete the soil analysis. The data on time (in hours) of the
soil analysis were summarized as follows:
   Method    Time to complete analysis (hrs.)     Reps    Total    Mean
     S1      2.4   3.8   2.9   4.6   3.1            5      16.8     3.36
     S2      4.8   1.6   0.2                        3       6.6     2.20
     S3      7.2   5.3   2.9   3.5                  4      18.9     4.72
   Total                                           12      42.3     3.53 (grand mean)

At the level of significance α = 5%, test the hypothesis that there is no difference in the time to complete the soil analysis for the three methods.

Check for the satisfaction of the assumptions of ANOVA (to be discussed next).
Testing significance of treatments via the ANOVA F-test
1. State the statistical hypotheses
   Ho: μ1 = μ2 = μ3   (all the method means are equal)
   Ha: μi ≠ μi' for at least one pair i ≠ i'   (at least two method means are different)
2. Formulate the test statistic and get its critical value at the level of significance α.
   Test statistic: Fc = MSTr/MSE
   Critical value: Ftab = Fα[(t-1), (n-t)]  (obtained from the F-table)
3. State the decision rule
   Reject Ho and accept Ha if Fc ≥ Ftab; else, accept Ho.
4. Construct the ANOVA table outlined as follows:

   Source of Variance     df      SS      MS       F computed     F tab
   Due to treatments      t-1     TrSS    MSTr     MSTr/MSE       Fα[(t-1), (n-t)]
   Experimental error     n-t     ESS     MSE
   TOTAL                  n-1     TSS

Computations:
Let Yij = the jth observation on the ith treatment
    Yi. = the total of the observations on the ith treatment
    Y.. = the grand total of all observations.
Then compute
a. the sums of squares

   CF   = (Y..)²/n = (42.3)²/12 = 149.11

   TSS  = Σi Σj Yij² - CF = (2.4² + 3.8² + ... + 3.5²) - 149.11 = 36.50

   TrSS = Σi (Yi.²/ri) - CF = (16.8²/5 + 6.6²/3 + 18.9²/4) - 149.11 = 11.16

   ESS  = TSS - TrSS = 36.50 - 11.16 = 25.34

b. the mean squares

   MSTr = TrSS/(t-1) = 11.16/2 = 5.58
   MSE  = ESS/(n-t)  = 25.34/9 = 2.82

c. the test statistic:

   Fc = MSTr/MSE = 5.58/2.82 = 1.98

   The critical value: Ftab = F0.05(2,9) = 4.26
d. Set up the ANOVA table:

   Source of Variance     df      SS       MS      F computed     F tab
   Treatment               2     11.16    5.58       1.98 ns       4.26
   Error                   9     25.34    2.82
   TOTAL                  11     36.50

e. State the decision and make the conclusion.
   Since Fc = 1.98 < Ftab = 4.26, do not reject Ho. It is concluded that there is no significant difference in the mean time to complete the soil analysis for the three methods.
   Note: The observed differences among the means in the experiment are not large enough to warrant the conclusion that in the population (or in general) there are at least two means which are different.
Compute other summary statistics
1. Coefficient of variation, CV(%) is the experimental error
expressed as percentage of the mean. It measures the degree of
167

O.S. Corpuz 2012

UNDERSTANDING STATISTICS

precision of the experiment or as an index of the reliability of the


experiment. The higher the CV, the lower is the reliability of the
experimental results.
   CV(%) = (√MSE / Ȳ..) x 100 = (√2.82 / 3.53) x 100 = 47.60%
2. Standard error of a treatment mean, s.e.(Ȳi), is a measure of the average error in estimating the true treatment mean. It measures the degree of precision of Ȳi as the estimate of the true treatment mean.

   s.e.(Ȳi) = √(MSE/ri)

3. Standard error of the difference between two treatment means, s.e.(Ȳi - Ȳi'), is a measure of the average error in estimating the difference between two treatment means. It measures the degree of precision of (Ȳi - Ȳi') as the estimate of the difference between the true means of treatment i and treatment i'.

   s.e.(Ȳi - Ȳi') = √[MSE(1/ri + 1/ri')]   ... for unequal replication
   s.e.d          = √(2MSE/r)              ... for equal replication

Estimates of treatment means and effects:

1. Estimate of the true mean of treatment i: Ȳi.
   For treatment 1: Ȳ1. = 3.36
   For treatment 2: Ȳ2. = 2.20
   For treatment 3: Ȳ3. = 4.72
   For the general mean: Ȳ.. = 3.53

2. Estimate of the effect of treatment i: ti = Ȳi. - Ȳ..
   For treatment 1: t1 = 3.36 - 3.53 = -0.17
   For treatment 2: t2 = 2.20 - 3.53 = -1.33
   For treatment 3: t3 = 4.72 - 3.53 =  1.19

Summary of the treatment means and effects:

   Method     Mean     Effect
     S1       3.36     -0.17
     S2       2.20     -1.33
     S3       4.72      1.19
   Mean       3.53      0.00 (approximately, by weighted average)

Note: Since the ANOVA F-test showed no significant treatment effect, the observed effects cannot be generalized to be significant.
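The CRD F-test can be cross-checked by computer before turning to SAS; a minimal sketch assuming Python with scipy.

   # One-way (CRD) ANOVA for the three soil-analysis methods
   from scipy.stats import f_oneway

   s1 = [2.4, 3.8, 2.9, 4.6, 3.1]
   s2 = [4.8, 1.6, 0.2]
   s3 = [7.2, 5.3, 2.9, 3.5]

   f_stat, p_value = f_oneway(s1, s2, s3)
   print(round(f_stat, 2), round(p_value, 3))   # F is about 1.98, p above 0.05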

Let us illustrate the analysis of a CRD experiment using the


Statistical Analysis SystemTM (SAS). Data entry is best done using
Microsoft excel. You should have a mother file in Excel, then save
it as a prn file (formatted text [space delimited]) before going to
SAS. Although the cards statement in SAS can be used for data
entry, it is not practical to use for large data sets.

The format of the data entry in Excel should be one row per experimental unit, with the variable names in row 1 and the data starting in row 2, for example columns for the treatment (Trt), the replication (Rep), and the response variable (Y).
The column for the replication (Rep) is not crucial in CRD; take note, though, that the columns preceding the response variable are the group or independent variables.
The Excel data should then be saved as a prn file (formatted text, space delimited). Close the prn file and then go to SAS. Save the file under the file name D:\CRD.prn.
In the program editor of SAS, write the following program statements (the variable names trt, rep and y follow the Excel layout described above):

   data crd;
     infile 'D:\CRD.prn' firstobs=2;
     input trt $ rep y;     /* same order as the columns in Excel */
   run;
   proc anova data=crd;
     class trt;
     model y = trt;
   run;
The option firstobs=2 in the infile statement implies that the data start at row 2. Make sure that the order of the variables in the input statement is exactly the same as the order entered in Excel. Click the run icon and, if everything is all right, you will get the analysis in the output window of SAS (if there is a mistake in the program statements, the program may not run; check the log window for mistakes and comments on the program statements).
   (SAS output window: the analysis of variance table for the CRD data.)

The TRT and ERROR lines are of particular interest; these are the sources of variation in the CRD. The mean squares (MS) for treatment and error are 931195.8214 and 94773.2143, respectively. The F value of the treatment (TRT) is 9.83, which is highly significant. If Pr > F is less than 0.01, the difference among the treatments is highly significant; if it is greater than 0.01 but less than 0.05, it is significant. If Pr > F is greater than 0.05, there are no significant differences among treatments.
The result of the analysis is consistent with Table 2.2 of
Gomez and Gomez (1984).
Missing Data in CRD
The previous analysis had equal replication of all treatments. This is not necessary, and sometimes you may set up an experiment planning for equal replication but find that you cannot collect data from all experimental units; some data may be lost for a variety of reasons.
Missing data should be entered as a period (.) and not as 0 in Excel. At the end of the infile statement of SAS, append the word missover to indicate that there are missing data. It is also better to use proc glm instead of proc anova in the analysis of variance for experiments with missing data. The modified CRD SAS program would then be:

   data crd;
     infile 'D:\CRD.prn' firstobs=2 missover;
     input trt $ rep y;
   run;
   proc glm data=crd;
     class trt;
     model y = trt;
   run;
The ANOVA output is:

   (SAS output window: the analysis of variance table from proc glm for the CRD data with missing values.)
Example: DBH measurements are taken on 5 trunks of trees per plot in a tree species trial. The variation among trunks within a plot is not an estimate of experimental error; it estimates within-plot variation. The variation among the mean DBH values of replicated plots of a tree species is the estimate of experimental error. Thus, you cannot plant one replication of the trial, measure multiple trunks within plots, and call this a replicated experiment. You must always replicate the experimental units.
An appropriate ANOVA table for subsampling in a replicated experiment (with measurements made per plot) looks like this:
173

O.S. Corpuz 2012

UNDERSTANDING STATISTICS

   Source            df     MS
   Hybrids            9     Variation among hybrids
   Within hybrids    20     Variation among reps within hybrids = experimental error
   Within plots     120     Variation within plots = sampling variation
   Total            149     (150 observations)

Thus, the appropriate F-test to see if there is variation among


hybrids is
MS Hybrids/MS within Hybrids
MS hybrids/MS within plots is not a valid test of the hypothesis.
If you get confused as to which term is experimental error, go back
and think what are the experimental units. It is the variation among
experimental units which are treated alike that is experimental error.
In the case discussed above, it is not the trees that are
experimental units but the plots. The treatments were randomly
assigned to plots, but not to the trees within plots. All of the trees
within a plot were the same treatment.
TWO-WAY CLASSIFICATION DESIGN

Randomized Complete Block Design (RCBD)


Complete Block

A unit that contains every treatment

To use RCBD:
(1) Must be able to group the experimental units into blocks.
(2) Blocks must be large enough to hold all the treatments.


RCBD is a two-way classification: each observation is classified according to two criteria, a treatment effect and a block effect.
Model

   Yij = μ + τi + βj + εij

This is the same as the CRD model except that βj, the effect of the jth block, is added. Now τi, βj and εij are each assumed NID, with εij ~ NID(0, σ²).

Randomization Procedure
(1)

Randomize treatments to experimental units within blocking


groups
(2) Randomly assign the groups of experimental units to blocks.
Blocking Methods
(1)
(2)

(3)

Time; e.g., soaking hours, years of conducting the


experiment.
Space; e.g., different planting distance or areas within fields,
different laboratories.
People; e.g., each block cared for or measured by a single
person.

Advantages
(1)
(2)
(3)

More precise than CRD if blocking is effective.


It is simple to understand (vs. more complex designs).
Any number of treatments and replication is allowed.
175

O.S. Corpuz 2012

UNDERSTANDING STATISTICS

Disadvantages
(1) If blocking is ineffective, precision is lost compared to CRD.
(2) Homogeneity within blocks is often reduced as the number of treatments increases (as block size increases). This is not a disadvantage compared to CRD, but compared to more complex designs.

By changing from CRD to RCBD, we gain an increase in precision if the heterogeneity of the experimental units can be effectively partitioned into blocks. Remember, in CRD the total variation is partitioned into treatment and error components only; by blocking in RCBD, you are trying to remove the block-to-block variation from the error, reduce MSE, and increase precision.
But we lose degrees of freedom for estimating error: the error df are t(r-1) for CRD versus (t-1)(r-1) for RCBD, a loss of r-1 df. Note: if you have many treatments relative to blocks, this loss of df is unimportant. If you don't gain precision by blocking in RCBD, stay with a CRD.
ANOVA Layout

   Source          df
   Blocks          r-1
   Treatments      t-1
   B x T (error)   (r-1)(t-1)     SS(error) = SS(Total) - SS(Blocks) - SS(Treatments)
   Total           rt-1

H0: all Ti = 0 (equivalent to: all treatments are equal, or σ²T = 0); hopefully, MSE is reduced by blocking.

Relative Efficiency of RCBD vs. CRD


We use the relative efficiency (RE) to estimate the gain in precision in addition to the loss of df. We estimate what MSerror would have been in a CRD, using a weighted average of the block variance and the error variance; the error variance is given additional weight by including the treatment df in the weighting factor.
The relative efficiency takes into account the difference in error df as well as the error term itself. As MSe(RCB) is reduced compared to MSe(CR), RE is increased.


How to set up blocks if we know a gradient exists
Example: a fertility gradient runs across a field.

   (Diagram: blocks I, II and III oriented perpendicular to the fertility gradient, the right way, versus parallel to the gradient, the wrong way.)

Right way: arrange blocks perpendicular to the gradient. This maximizes variation among blocks and minimizes variation within blocks.
Missing Values in the RCBD
What do you do when you can't collect data from a plot? This happens every year; it is expected!
One Missing Plot
(1) We use a least-squares estimate of what we expect the plot would have yielded, based on the rest of the data from the experiment. Note: this estimate is never as good as actually having the data!

   Ŷ = (rB + tT - G) / [(r - 1)(t - 1)]

   Where: B = total of the observed values in the block containing the missing plot;
          T = total of the observed values of the treatment with the missing plot;
          G = grand total of all observed values; r = number of blocks; t = number of treatments.

So this formula assumes that we have good estimates of the block effect and the treatment effect, and that the effects are additive.
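As an arithmetic illustration of the estimator in step (1), the following is a minimal sketch in Python; the totals used are placeholders, not data from the text.

   # Least-squares estimate of a single missing plot in an RCBD
   def missing_plot_estimate(r, t, block_total, trt_total, grand_total):
       """r = number of blocks, t = number of treatments; totals exclude the missing plot."""
       return (r * block_total + t * trt_total - grand_total) / ((r - 1) * (t - 1))

   # hypothetical example: 4 blocks, 5 treatments
   print(round(missing_plot_estimate(4, 5, 120.0, 95.0, 480.0), 2))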
(2) Enter the estimated value Ŷ into the data table and then perform the ANOVA.
But there's no free lunch! You must pay a penalty:


   Reduce the total df by 1.
   Reduce the error df by 1.
What about more than one missing plot?
The more missing plots you have, the more difficult it becomes to estimate them, and the poorer your estimates become. You can do them by hand, but it is best to use matrix algebra to get simultaneous estimates of the missing plots. In this topic, we'll let SAS PROC GLM do the matrix calculations.
To illustrate the ANOVA of an RCBD using SAS, we will work on the following problem:
   A forester studied the growth of white lauan under various weed defoliage schemes. The experiment was conducted in an RCB design with four locations taken as blocks. [The data are in the file D:RCB.xls. Again, take note of the way the data were entered.]
The SAS program for this problem (with the data saved as a prn file as before, and with blk, trt and y as the assumed variable names) is:

   data rcb;
     infile 'D:\RCB.prn' firstobs=2;
     input blk trt $ y;
   run;
   proc glm data=rcb;
     class blk trt;
     model y = blk trt;
   run;
Compared to the SAS program of CRD, the variable blk (i.e.


block) was included in both the class and model statements.
The SAS output is:




 









179

O.S. Corpuz 2012

UNDERSTANDING STATISTICS







 

 











  

   




 


    




    









 
 


   




 




    


    



Based on the analysis, blocking was not significant (P > 0.05), implying that it was not effective in reducing experimental error. Treatment was also not significant, implying that the defoliation treatments had much the same influence on tree growth.
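The same block-plus-treatment model can also be fitted in Python with statsmodels; the following is a minimal sketch using a small hypothetical data set (the white lauan data themselves are not reproduced in the text).

   # RCBD analysis of variance: response modeled by block and treatment effects
   import pandas as pd
   import statsmodels.api as sm
   import statsmodels.formula.api as smf

   # hypothetical data: 2 treatments x 3 blocks
   df = pd.DataFrame({
       'blk': [1, 1, 2, 2, 3, 3],
       'trt': ['A', 'B', 'A', 'B', 'A', 'B'],
       'y':   [10.2, 12.1, 9.8, 11.5, 10.9, 12.8],
   })

   model = smf.ols('y ~ C(blk) + C(trt)', data=df).fit()
   print(sm.stats.anova_lm(model, typ=2))   # ANOVA table with block and treatment lines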


Appendix 1
OUTLINE OF RESEARCH METHODS IN EDUCATION
Research is a scientific investigation of phenomena which includes the collection, presentation, analysis, and interpretation of facts that link man's speculation with reality.
Characteristics of Researcher
1. Intellectual curiosity
2. Prudence
3. Healthy criticism
4. Intellectual honesty
Qualities of a Good Researcher
R research-oriented
E efficient
S scientific
E Effective
A Active
R resourceful
C creative
H honest
E economical
R religious
Characteristics of Research
1.
2.
3.
4.
5.
6.

Empirical
Logical
Cyclical
Analytical
Replicability
Critical

Types of Research
1. Pure research
2. Applied research
3. Action research
Classification of Research
1. Library research
2. Field research
3. Laboratory research


The Variable
Types:
1. Independent variable - stimulus variable
2. Dependent variable - response variable
3. Moderator variable - a special type of independent variable that may alter or modify the relationship of the independent and dependent variables
4. Control variable - a variable controlled by the researcher, whose effect can be neutralized by eliminating or removing the variable
5. Intervening variable - a variable that interferes with the independent and dependent variables and may either strengthen or weaken their relationship
Schematic Diagram of the Research Process
Problem/Objectives

Theoretical/Conceptual Framework

Assumptions

Hypothesis

Review of Related Literature

Research Design

Data Collection

Data Processing/Statistical Treatment

Analysis and Interpretations

Summary, Conclusions and Recommendations


Research Designs
1. Historical Designs
Steps:
a. Collection of data
b. Criticism of the data collected
c. Presentation of the facts
2. Descriptive Designs
Types:
a. Descriptive-survey
b. Descriptive-normative survey
c. Descriptive-status
d. Descriptive-analysis
e. Descriptive-classification
f. Descriptive-evaluative
g. Descriptive-comparative
h. Correlational survey
i. Longitudinal survey
3. Case Study Design
Steps:
a. Recognition and determination of the status of the problem to be
investigated
b. Collection of data
c. Diagnosis or identification of the causal factors
d. Application of remedial or adjustment measures
e. Subsequent follow-up to determine the effectiveness of the corrective
or developmental measures applied
Qualities of a Good Research Instrument
1. Validity
a. Content validity
b. Concurrent validity
2. Reliability
3. Usability
Sampling Designs
1. Scientific sampling
a. Random sampling
b. Systematic sampling
- Stratified sampling design
- Multi-stage sampling design
- Clustering
2. Non-scientific sampling
a. Purposive sampling
b. Incidental sampling
c. Quota sampling

Data Processing
Parts of a Thesis


1. Chapter 1 The Problem and Its Background
a. Statement of the Problem
b. Theoretical/Conceptual Framework
c. Assumptions
d. Hypothesis
e. Significance of the Study
f. Scope and Limitations of the Study
g. Operational Definition of Terms
2. Chapter 2 Review of Related Literature
a. Related readings
b. Related literature
c. Related studies
d. Justification of the present study
3. Chapter 3 Methodology
a. Research Design
b. Determination of Sample Size
c. Sampling Design and Technique
d. The Subject
e. The Research Instrument
f. Validation of the Research Instrument
g. Data Gathering Procedure
h. Data Processing Method
i. Statistical Treatment
4.
5.
6.
7.
8.

Chapter 4 Results, Analysis and Interpretations


Chapter 5 Summary, Conclusions and Recommendations
Bibliography/Literature Cited/References
Appendices
Curriculum Vitae

Appendix 2

Multiple Comparisons Between Groups of Conditions From the Results of the Kruskal-Wallis One-Way ANOVA

Table A. Areas Under the Normal Curve
   (Body of the table: cumulative areas under the standard normal curve, Φ(z), for z from -3.4 to +3.4 in increments of 0.01.)
0.9049
0.9207

0.8461
0.8686
0.8888
0.9066
0.9222

0.8485
0.8708
0.8907
0.9082
0.9236

0.8508
0.8729
0.8925
0.9099
0.9251

0.8531
0.8749
0.8944
0.9115
0.9265

0.8554
0.8770
0.8962
0.9131
0.9278

0.8577
0.8790
0.8980
0.9147
0.9292

0.8599
0.8810
0.8997
0.9162
0.9306

0.8621
0.8830
0.9015
0.9177
0.9319


Table A Continued

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
  1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
  1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
  1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
  1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
  2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
  2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
  2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
  2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
  2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
  2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
  2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
  2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
  2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
  2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
  3.0  .9987  .9987  .9987  .9988  .9989  .9989  .9989  .9989  .9990  .9990
  3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
  3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9994  .9995  .9995
  3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
  3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

Appendix A is taken from Table A.4, Introduction to Statistics, R.E. Walpole, 3rd Ed., Macmillan Publishing Company, Inc.
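
Table A gives the cumulative probability P(Z ≤ z), i.e., the area under the standard normal curve to the left of z. When software is available, any entry of the table can be reproduced directly; the sketch below assumes Python with SciPy and uses a few illustrative z values.

    from scipy.stats import norm

    # Area to the left of z, as tabulated in Table A
    print(round(norm.cdf(-1.96), 4))   # 0.0250
    print(round(norm.cdf(0.00), 4))    # 0.5000
    print(round(norm.cdf(1.28), 4))    # 0.8997

    # Area between two z values, e.g., P(-1.0 < Z < 1.0)
    print(round(norm.cdf(1.0) - norm.cdf(-1.0), 4))   # 0.6827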

Selected Significance Levels for the Normal Distribution

  Two-tailed    One-tailed      z
    0.20          0.10        1.282
    0.10          0.05        1.645
    0.05          0.025       1.960
    0.02          0.01        2.326
    0.01          0.005       2.576
    0.002         0.001       3.090
    0.001         0.0005      3.291
    0.0001        0.00005     3.891
    0.00001       0.000005    4.417

Taken from Nonparametric Statistics for the Behavioral Sciences, S. Siegel and N.J. Castellan, McGraw-Hill Book Company.
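
The critical z values above are simply quantiles of the standard normal distribution, so they can be regenerated rather than looked up. A minimal sketch, assuming SciPy is available (norm.ppf is the inverse of norm.cdf):

    from scipy.stats import norm

    for two_tailed in [0.20, 0.10, 0.05, 0.02, 0.01, 0.002, 0.001]:
        one_tailed = two_tailed / 2
        z = norm.ppf(1 - one_tailed)          # upper-tail critical value
        print(f"two-tailed {two_tailed:<6}  one-tailed {one_tailed:<7}  z = {z:.3f}")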

Appendix Table AII
Critical z Values for #c Multiple Comparisons*

       Two-tailed:  0.30    0.25    0.20    0.15    0.10    0.05
       One-tailed:  0.15    0.125   0.10    0.075   0.05    0.025
  #c
   1                1.036   1.150   1.282   1.440   1.645   1.960
   2                1.440   1.534   1.645   1.780   1.960   2.241
   3                1.645   1.732   1.834   1.960   2.128   2.394
   4                1.780   1.863   1.960   2.080   2.241   2.498
   5                1.881   1.960   2.054   2.170   2.326   2.576
   6                1.960   2.037   2.128   2.241   2.394   2.638
   7                2.026   2.100   2.189   2.300   2.450   2.690
   8                2.080   2.154   2.241   2.350   2.498   2.734
   9                2.128   2.200   2.287   2.394   2.539   2.773
  10                2.170   2.241   2.326   2.432   2.576   2.807
  11                2.208   2.278   2.362   2.467   2.608   2.838
  12                2.241   2.301   2.394   2.498   2.638   2.886
  15                2.326   2.394   2.475   2.576   2.713   2.935
  21                2.450   2.515   2.593   2.690   2.823   3.038
  28                2.552   2.615   2.690   2.785   2.913   3.125

* #c is the number of comparisons.

Taken from Nonparametric Statistics for the Behavioral Sciences, S. Siegel and N.J. Castellan, McGraw-Hill Book Company.
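
One common way to compare pairs of groups after a significant Kruskal-Wallis test is a Dunn-type procedure: the difference in mean ranks of two groups is divided by its standard error, and the resulting z is compared with the critical value in Table AII for #c comparisons. The sketch below assumes Python with SciPy; the three samples are invented for illustration only.

    import itertools
    import math
    from scipy.stats import kruskal, rankdata

    # Illustrative observations for three treatment groups
    groups = {
        "T1": [12, 15, 14, 11, 13],
        "T2": [18, 21, 17, 20, 19],
        "T3": [13, 16, 15, 14, 12],
    }

    H, p = kruskal(*groups.values())
    print(f"Kruskal-Wallis H = {H:.3f}, p = {p:.4f}")

    # Dunn-type pairwise comparisons on the joint ranks (no tie correction)
    data = [x for sample in groups.values() for x in sample]
    ranks = rankdata(data)
    N = len(data)

    mean_ranks, start = {}, 0
    for name, sample in groups.items():
        mean_ranks[name] = ranks[start:start + len(sample)].mean()
        start += len(sample)

    for a, b in itertools.combinations(groups, 2):
        se = math.sqrt(N * (N + 1) / 12 * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = abs(mean_ranks[a] - mean_ranks[b]) / se
        # Compare z with the critical value in Table AII for #c = 3 comparisons
        print(f"{a} vs {b}: z = {z:.3f}")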


Table AB
Wilcoxon's Paired Signed-Ranks Test Critical Values

        One-tailed:   0.05    0.025    0.01    0.005
        Two-tailed:   0.10    0.05     0.02    0.01
   n
   5                    1       -        -       -
   6                    2       1        -       -
   7                    4       2        0       -
   8                    6       4        2       0
   9                    8       6        3       2
  10                   11       8        5       3
  11                   14      11        7       5
  12                   17      14       10       7
  13                   21      17       13      10
  14                   26      21       16      13
  15                   30      25       20      16
  16                   36      30       24      19
  17                   41      35       28      23
  18                   47      40       33      28
  19                   54      46       38      32
  20                   60      52       43      37
  25                  101      90       77      68
  30                  152     137      120     109
  40                  287     264      238     221
  50                  466     434      398     373

Taken from CRC Handbook of Tables for Probability and Statistics, 2nd ed., 1968, CRC Press, Boca Raton, Florida.
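
Table AB supports carrying out the Wilcoxon paired signed-ranks test by hand; the same test is available in statistical software. A minimal sketch, assuming SciPy is available, with invented paired scores:

    from scipy.stats import wilcoxon

    # Illustrative paired observations, e.g., scores before and after a treatment
    before = [72, 65, 80, 75, 68, 82, 77, 70, 74, 69]
    after  = [78, 70, 79, 83, 72, 85, 80, 76, 77, 73]

    stat, p = wilcoxon(before, after)
    print(f"W = {stat}, p = {p:.4f}")
    # By hand, the smaller rank sum W is compared with the Table AB entry for
    # n = 10 pairs; the null hypothesis is rejected when W is small enough.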


Table AC
Wilcoxon's Two-Sample Rank Test (the Mann-Whitney Test)
(These values or smaller cause rejection; two-tailed test. Take n1 ≤ n2.)
0.05 Level of Significance
n2
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28


3
3
3
4
4
4
4
4
4
5
5
5
5
6
6
6
6
6
7
7
7

6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17

10
11
12
13
14
15
15
16
17
18
19
20
21
21
22
23
24
25
26
27
28
28
29

17
18
20
21
22
23
24
26
27
28
29
31
32
33
34
25
27
28
29
40
42

26
27
29
31
32
34
35
37
38
40
42
43
45
46
48
50
51
53
55

36
38
40
42
44
46
48
50
52
54
56
58
60
62
64
66
68

49
51
53
55
58
60
63
65
67
70
72
74
77
79
82

n1
9

63
65
68
71
73
76
79
82
84
87
90
93
95

10

11

12

13

14

15

78
81
85
88
91
94
97
100
103
107
110

96
99
103
106
110
114
117
121
124

115
119
123
127
131
135
139

137
141 160
145 164 185
150 169
154

Table AC Continued
Wilcoxon's Two-Sample Rank Test (the Mann-Whitney Test)
(These values or smaller cause rejection; two-tailed test. Take n1 ≤ n2.)

n2
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

3
3
3
3
3
3
3
3
4
4

6
6
6
7
7
7
8
8
8
8
9
9
9
10
10
10
11
11
11

10
10
11
11
12
12
13
14
14
15
15
16
16
17
18
18
19
19
20
20
21

5
15
16
17
17
18
19
20
21
22
22
23
24
25
26
27
28
29
29
30
31
32

0.01 Level of Significance


n1
6
7
8
9
10 11

23
24
25
26
27
28
30
31
32
33
34
36
37
38
39
40
42
43
44

32
34
35
37
38
40
41
43
44
46
47
49
50
52
53
55
57

43
45
47
49
51
53
54
56
58
60
62
64
66
68
70

56
58
61
63
65
67
70
72
74
76
78
81
83

71
74
76
79
81
84
86
89
92
94
97

87
90
93
96
99
102
105
108
111

12

13

14

15

106
109
112
115
119
122
125

125
129 147
133 151 171
137 155
140
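
Table AC gives critical values for the Wilcoxon two-sample rank (Mann-Whitney) test. In practice the statistic and a p-value are usually computed with software; a minimal sketch, assuming SciPy is available, with two invented independent samples (n1 = 6, n2 = 8):

    from scipy.stats import mannwhitneyu

    group1 = [14, 18, 12, 16, 15, 13]              # n1 = 6
    group2 = [20, 22, 19, 17, 24, 21, 18, 23]      # n2 = 8

    U, p = mannwhitneyu(group1, group2, alternative="two-sided")
    print(f"U = {U}, p = {p:.4f}")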

Appendix Table B
Critical Values of the t Distribution (one-tailed significance level α)

  df      0.10    0.05    0.025    0.01    0.005
   1     3.078   6.314   12.706  31.821   63.657
   2     1.886   2.920    4.303   6.965    9.925
   3     1.638   2.353    3.182   4.541    5.841
   4     1.533   2.132    2.776   3.747    4.604
   5     1.476   2.015    2.571   3.365    4.032
   6     1.440   1.943    2.447   3.143    3.707
   7     1.415   1.895    2.365   2.998    3.499
   8     1.397   1.860    2.306   2.896    3.355
   9     1.383   1.833    2.262   2.821    3.250
  10     1.372   1.812    2.228   2.764    3.169
  11     1.363   1.796    2.201   2.718    3.106
  12     1.356   1.782    2.179   2.681    3.055
  13     1.350   1.771    2.160   2.650    3.012
  14     1.345   1.761    2.145   2.624    2.977
  15     1.341   1.753    2.131   2.602    2.947
  16     1.337   1.746    2.120   2.583    2.921
  17     1.333   1.740    2.110   2.567    2.898
  18     1.330   1.734    2.101   2.552    2.878
  19     1.328   1.729    2.093   2.539    2.861
  20     1.325   1.725    2.086   2.528    2.845
  21     1.323   1.721    2.080   2.518    2.831
  22     1.321   1.717    2.074   2.508    2.819
  23     1.319   1.714    2.069   2.500    2.807
  24     1.318   1.711    2.064   2.492    2.797
  25     1.316   1.708    2.060   2.485    2.787
  26     1.315   1.706    2.056   2.479    2.779
  27     1.314   1.703    2.052   2.473    2.771
  28     1.313   1.701    2.048   2.467    2.763
  29     1.312   1.699    2.045   2.462    2.756
 Inf.    1.282   1.645    1.960   2.326    2.576

Ref: Basilio et al. 2003. Fundamental Statistics. Trintas Publishing, Inc., Philippines.
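
Any entry of Appendix Table B is an upper-tail quantile of Student's t distribution, so the table can be regenerated when software is at hand. A minimal sketch, assuming SciPy is available:

    from scipy.stats import t

    # One-tailed critical values t(alpha, df), as tabulated in Appendix Table B
    for df in [5, 10, 20]:
        row = [round(t.ppf(1 - alpha, df), 3) for alpha in (0.10, 0.05, 0.025, 0.01, 0.005)]
        print(df, row)
    # df = 10 gives [1.372, 1.812, 2.228, 2.764, 3.169], matching the table row for df = 10.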


References
Aczel, A. 1989. Complete Business Statistics. Richard D. Irwin, Inc.
Anderson, R.L. and Bancroft, T.A. 1952. Statistical Theory in Research. McGraw-Hill Book Co., Inc.
Calmorin, L.P. and Calmorin, M.A. 1999. Methods of Research and Thesis Writing. 1st Ed. Rex Book Store, Inc., Manila, Philippines.
Daleon, S., Sanches, L. and Marquez, T. 1996. Fundamentals of Statistics. National Book Store, Inc.
Draper, N. and Smith, H. 1966. Applied Regression Analysis. John Wiley and Sons, Inc.
Gomez, K.A. and Gomez, A.A. 1984. Statistical Procedures for Agricultural Research. 2nd Ed. An International Rice Research Institute Book. John Wiley and Sons, Inc.
Iman, R.L. and W.J. Conover. 1983. A Modern Approach to Statistics. John Wiley and Sons, Inc.
Johnston, J. 1972. Econometric Methods. 2nd Ed. McGraw-Hill Book Co., Inc.
Mendenhall, W. and Sincich, T. 1989. A Second Course in Business Statistics: Regression Analysis. Dellen Publishing Company.
Ostle, B. 1966. Statistics in Research. Iowa State University Press, Ames, Iowa.
Pacificador, A. Jr. 1997. Outreach Seminar on Statistics for Researchers. Urios College, Butuan City.
Parel, P.C. 1996. Introduction to Statistical Methods with Application. Macaraig Publishing Co., Inc., Manila, Philippines.
Searle, S.R. 1971. Linear Models. New York: Wiley and Sons, Inc.
Siegel, S. and N.J. Castellan. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Co.
Snedecor, G.W. and Cochran, W.G. 1957. Statistical Methods. 5th Ed. Iowa State University Press.
Snedecor, G.W. and Cochran, W.G. 1956. Statistical Methods. Iowa State College Press, Ames, Iowa.
Spiegel, M. 1978. Statistics. McGraw-Hill Book Inc.


Steel, R.G.D. and J.H. Torrie. 1960. Principles and Procedures of Statistics. McGraw-Hill Book Co., Inc., New York.
Tagaro, C.A. and Tagaro, A.T. Statistics Made Easy. 11th Edition. University of Southern Mindanao, Kabacan, Cotabato, Philippines.
Walpole, R.E. 1982. Introduction to Statistics. 3rd Ed. Macmillan Publishing Co., Inc.

