Stat - Sir Corpuz
UNDERSTANDING STATISTICS
ACKNOWLEDGEMENT
The authors express their sincere thanks and gratitude to their
brothers and sisters, children, friends, and relatives, and to all those
who helped and encouraged them to finish this humble work. The
authors are indebted to them for the ideas, inspiration, moral and
spiritual support, and suggestions that contributed much to the
realization of this book.
Above all, they thank the Almighty ALLAH for His daily blessings and guidance.
FOREWORD
This book is designed for college students and other professionals
who would like to enhance their knowledge of social and educational
statistics. The book deals with the uses and importance of statistics,
collection and presentation of data, sampling techniques, data
organization, and descriptive statistics such as frequency distribution,
measures of central tendency, position, dispersion/variability,
skewness, kurtosis, and the normal distribution of data.
It also provides discussion and concrete illustrations of regression and
correlation analysis, the t-test, F-test, z-test, and chi-square test,
alternative nonparametric analysis of variance, and one-way and two-way
classification analysis of data. The methods of social research
presented will enlighten young researchers on the appropriateness of
the statistical tools they will use in the analysis of data.
The Authors
Title
Chapter 1. INTRODUCTION
Meaning and Uses
Classification of Statistics
Descriptive Statistics
Inferential Statistics
Parametric Statistics
Non-Parametric Statistics
Concept of Population and Sample
Parameters and Estimates
Subscript and Summation Notations
Chapter 2. COLLECTION AND PRESENTATION OF
DATA
1. Types of Data
a. Qualitative Data
b. Quantitative Data
2. Sources of data
3. Types of Error in Data
4. Methods of Gathering Data
5. Sampling Technique
a. Probability/Scientific Sampling
Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Multi Stage
b. Non-Probability/non-scientific Sampling
Purposive Sampling
Quota Sampling
Convenience Sampling
Incidental sampling
Chapter 3. DATA ORGANIZATION
Tabular Presentation
Graphs and Diagram
Chapter 4. FREQUENCY DISTRIBUTION
Relative frequency distribution
Cumulative frequency distribution
Cumulative percent
Chapter 5. MEASURES OF CENTRAL TENDENCIES
The Mean
The Median
The Mode
Chapter 6. MEASURES OF POSITION
The Quartiles
The Deciles
The percentiles
Chapter 7. MEASURES OF DISPERSION/VARIABILITY
The Range
The Inter quartile Range
The Quartile Deviation
The Average Deviation
The Standard Deviation
Chapter 8. MEASURES OF SKEWNESS AND
KURTOSIS
Positive Skewness
Negative Skewness
Moment Coefficient of Skewness
Measures of Kurtosis
Leptokurtic
Mesokurtic
Platykurtic
Moment Coefficient of Kurtosis
Chapter 9. NORMAL DISTRIBUTION
The Normal Curve
Standard Normal Scores
Areas Under the Normal Curve
Chapter 10. T-TESTS
Test for Dependent or Correlated Samples
T-test for Independent Samples
Test for Equality of Variance
Testing for the Difference in Means of
the Two Independent Samples
APPENDICES
REFERENCES
Introduction
Meaning and Uses of Statistics
Statistics is a set of tools or methods used in data analysis. It is a
scientific method for collecting, organizing, analyzing, and
interpreting quantitative data, as well as for drawing valid conclusions
and making reasonable decisions on the basis of such analysis. It is
an essential tool in almost all fields of knowledge.
Collecting data refers to the process of obtaining the qualitative
information and the quantitative or numerical measurements needed in
the study.
Organizing is the tabulation or presentation of data in
tables, graphs, or charts so that logical and statistical
conclusions can be formulated from the collected measurements.
Analysis of data pertains to the process of extracting relevant
information from the given data to formulate numerical descriptions.
Interpretation of data, on the other hand, refers to the task of
drawing conclusions from the analyzed statistical data. It also
involves the formulation of forecasts or predictions about larger
populations based on the data collected from samples.
Classification of Statistics
Statistics can be classified into two:
1.
STATISTICS        DISTRIBUTION            MEASUREMENT
Parametric        Normal                  Interval or ratio
Non-Parametric    Unknown distribution    Nominal or ordinal
COLUMN B
a. Analysis of data
b. Parametric Statistics
c. Data Interpretation
d. Data Presentation
e. Organization of Data
f. Descriptive Statistics
g. Variable
h. Inferential Statistics
i. Sample
j. Population
k. Non-parametric Statistics
l. Statistics
$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$
Dot Notation:
Another way of representing a sum of observations is to use a
system called the DOT NOTATION system.
Consider this set of data:
Observation   Group 1     Group 2     Group 3     Group 4     Total
1             X11 = 3     X21 = 6     X31 = 1     X41 = 2     X.1 = 12
2             X12 = 5     X22 = 3     X32 = 5     X42 = 3     X.2 = 16
3             X13 = 4     X23 = 1     X33 = 3     X43 = 4     X.3 = 12
4             X14 = 2     X24 = 5     X34 = 2     X44 = 2     X.4 = 11
5             X15 = 1     X25 = 2     X35 = 4     X45 = 5     X.5 = 12
Total         X1. = 15    X2. = 17    X3. = 15    X4. = 16    X.. = 63
$X_{1.} = \sum_{j=1}^{5} X_{1j}$
$X_{2.} = \sum_{j=1}^{5} X_{2j}$
$X_{3.} = \sum_{j=1}^{5} X_{3j}$
$X_{4.} = \sum_{j=1}^{5} X_{4j}$
$X_{..} = \sum_{i=1}^{4} X_{i.} = 63$
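The dot notation totals can be checked with a short Python sketch. The layout (one list per group, five observations each) is an assumption made for illustration, and X21 is taken as 6, the value implied by the totals X.1 = 12 and X2. = 17.

```python
# Rows are groups (i = 1..4); columns are observations (j = 1..5).
# X21 = 6 is the value implied by the totals X.1 = 12 and X2. = 17.
X = [
    [3, 5, 4, 2, 1],   # group 1
    [6, 3, 1, 5, 2],   # group 2
    [1, 5, 3, 2, 4],   # group 3
    [2, 3, 4, 2, 5],   # group 4
]

group_totals = [sum(row) for row in X]              # X1., X2., X3., X4.
observation_totals = [sum(col) for col in zip(*X)]  # X.1, X.2, ..., X.5
grand_total = sum(group_totals)                     # X..

print(group_totals, observation_totals, grand_total)
```

The dot simply marks the subscript that has been summed over, so summing either set of totals gives the same grand total X...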
Example:
Observation   Variety I   Variety II   Variety III   Total
1             20          15           10            X.1 = 45
2             22          15           13            X.2 = 50
3             25          17           15            X.3 = 57
Total         X1. = 67    X2. = 47                   X.. = 152
X11 = 20
X23 = 17
X32 = 13
X33 = 15
b.
yes or no responses
sex, civil status
ratings such as poor, fair, satisfactory, good
or excellent
In agriculture: climatic region, geographical
locations , soil type, slope, land-used, kind of
insecticides and fertilizers
Examples:
Sample      Values    Sample Mean
(Y1, Y2)    (1, 2)    Y1 = 3/3
(Y1, Y3)    (1, 3)    Y2 = 4/3
(Y2, Y3)    (2, 3)    Y3 = 5/3
1. 3/3 - 6/3 = -1.00
2. 4/3 - 6/3 = -0.67
3. 5/3 - 6/3 = -0.33
k = n/ns
k = 200/5 = 40
Hence, 40 units will be taken from each stratum to
constitute the 200 sample elements decided by the researcher.
b. Proportional Allocation. Stratified sampling using
proportionate allocation is used to guarantee a more
representative sample from each stratum. The larger the
population in a stratum, the more sample units are
taken from it.
Suppose the researcher decided to take 80 students as a
sample, allocated proportionately among the four
departments of the College of Education presented
below:
Department           Population (P)   Proportion (wi = P/Tp)   k = ns(wi)
Biological Science   150              150/565                  21
English              140              140/565                  20
Mathematics          200              200/565                  28
Pilipino             75               75/565                   11
Total                565                                       80
P = population
wi = proportion of the population (P) with total population (Tp)
k = number of elements to be taken per stratum
ns = decided sample size
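The proportional allocation above can be reproduced with a minimal Python sketch. The function name and the dictionary layout are assumptions made for illustration; the rule itself is k = ns(wi) with wi = P/Tp, rounded to the nearest whole unit.

```python
def proportional_allocation(populations, sample_size):
    """Return k = ns * (P / Tp) for every stratum, rounded to whole units."""
    total = sum(populations.values())          # Tp, the total population
    return {name: round(sample_size * p / total)
            for name, p in populations.items()}

departments = {
    "Biological Science": 150,
    "English": 140,
    "Mathematics": 200,
    "Pilipino": 75,
}
allocation = proportional_allocation(departments, 80)
print(allocation)
```

Rounding each stratum separately happens to add up to 80 here; with other populations the rounded values may miss the decided sample size by a unit and would need a manual adjustment.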
$Ss = \frac{NV + [Se^2 (1 - p)]}{NSe + [V^2 \times p(1 - p)]}$
Where:
Ss = sample size
N = total number of population
V = the standard value (2.58) at the 1% level of probability with 0.99 reliability
Se = sampling error (0.01)
p = the largest possible proportion (50%)
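Example:
Forestry students want to determine the sample size needed
to measure the diameter of a population of 1,000 trees.
Solution:
$Ss = \frac{NV + [Se^2 (1 - p)]}{NSe + [V^2 \times p(1 - p)]}$
$= \frac{1000(2.58) + [(0.01)^2 (1 - 0.50)]}{1000(0.01) + [(2.58)^2 \times 0.50(1 - 0.50)]}$
$= \frac{2580.00005}{11.6641}$
Ss = 221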
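The computation above can be sketched in Python as follows. The function name and the default arguments are assumptions chosen to match the values quoted in the text (V = 2.58, Se = 0.01, p = 0.50).

```python
def sample_size(N, V=2.58, Se=0.01, p=0.50):
    """Ss = (N*V + Se^2 * (1 - p)) / (N*Se + V^2 * p * (1 - p))."""
    numerator = N * V + Se ** 2 * (1 - p)
    denominator = N * Se + V ** 2 * p * (1 - p)
    return numerator / denominator

ss = sample_size(1000)
print(round(ss))  # sample size for a population of 1,000 trees
```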
$n = \frac{P}{1 + Pe^2}$
where:
n = sample size
P = population
e = sampling error (usually 1% or 5%)
Example:
P = 1000
e = 5%
Solution:
$n = \frac{P}{1 + Pe^2}$
$= \frac{1000}{1 + 1000(0.05)^2}$
$= \frac{1000}{1 + 1000(0.0025)}$
$= \frac{1000}{1 + 2.5}$
$= \frac{1000}{3.5}$
n = 286
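A small Python sketch of the same formula, n = P/(1 + Pe^2), is given below. The function name is hypothetical, and the sampling error is passed as a decimal fraction (0.05 for 5%).

```python
def sample_size_from_error(P, e):
    """n = P / (1 + P * e^2), with e given as a decimal fraction."""
    return P / (1 + P * e ** 2)

n = sample_size_from_error(1000, 0.05)
print(round(n))  # 1000 / 3.5 rounded to the nearest whole unit
```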
Exercise No. 2
Test yourself:
Answer the following questions:
1. If the Education students are stratified according to the
department where they belong (Biological Science,
Mathematics, English, Pilipino, Physical Education), how
many samples shall we get from each department if we want
to get 200 sample students allocated equally among
the departments? Show your solution.
2. Suppose you would like to allocate the sample size of 200
proportionately among the departments with the populations
given in the table below. How many sampling units would you
allocate per stratum?
DEPARTMENT           Population (P)   Proportion (wi = P/Tp)   k = ns(wi)
Biological Science   180
Mathematics          200
English              280
Pilipino             210
Physical Education   130
TOTAL                1000
Data Organization
Data collected in any form can be organized through
tables, graphs, diagrams, or plots.
1. Tabular Presentation
There are basically two types of tables: the general or
reference table and the summary or text table. The general
table is used mainly as a repository of information. It is the
table of the raw data. The summary table, on the other hand,
is usually small in size and designed to guide the reader in
analyzing the data. It usually accompanies a text
discussion.
Example 1: General reference table. Scores of 4 First Year High
School Students in two subjects
Class Section   Math   Science
A               60     75
B               45     27
C               50     29
D               30     50

Class Section   Score
A               67.5
B               36.0
C               39.5
D               40.0
Weighted Mean   45.75
e.
Daily Income
Example:
AGE
22
Exercise No. 3
Test yourself:
Answer the following
1. Give and define the two kinds of data.
2. What are the types of error in data? How is each
committed?
3. Give the different methods of collecting data. Discuss
at least 3 methods.
4. What are the different methods of organizing collected
or measured data?
Frequency Distribution
Frequency distribution is a common technique for
describing a set of data. It is a listing of the
collected or measured data by interval. To organize the data into a
frequency distribution, we pick convenient intervals
and tabulate the number of data values that fall into
each interval.
Frequency distribution is used when many statistical
data are recorded or collected. Usually, it is utilized when the
number of data collected is greater than or equal to 30 (n ≥ 30).
The following steps are observed in preparing a frequency
distribution:
1. Look for the lowest and the highest values recorded.
2. Subtract the lowest value from the highest value and add 1.
3. Decide on the number of steps or class intervals. The
maximum number of intervals is 20, the minimum is 5,
and the ideal number is between 10 and 15, inclusive.
4. Determine the interval size by dividing the result of step 2 by
the desired number of intervals. Unless specified, it is advisable
to use the ideal number of intervals.
5. Choose an appropriate lower limit for the first class interval.
This number should be one less than the lowest value or
exactly divisible by the interval size.
6. Write the lowest limit at the bottom and, from it, develop the
lower limits of the next higher intervals by adding the
interval size to the preceding lower limit until the highest value
is included. From the lower limits, develop the
corresponding upper limits.
7. Read each value in the set of collected data and record a tally
opposite the class interval to which it belongs.
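The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are non-negative whole numbers, the interval size is rounded up to a whole number, the first lower limit is chosen as a multiple of the interval size (one of the two options in step 5), and the ten scores are made up for the demonstration.

```python
from collections import Counter

def frequency_distribution(data, n_classes=10):
    """Return (lower limit, upper limit, frequency) rows, highest class first."""
    lowest, highest = min(data), max(data)        # step 1
    spread = highest - lowest + 1                 # step 2
    size = -(-spread // n_classes)                # step 4, rounded up
    start = lowest - lowest % size                # step 5: divisible by size
    tally = Counter((x - start) // size for x in data)   # step 7
    rows = []
    lower = start
    while lower <= highest:                       # step 6
        rows.append((lower, lower + size - 1, tally[(lower - start) // size]))
        lower += size
    return rows[::-1]

scores = [12, 15, 21, 22, 29, 30, 34, 41, 47, 50]   # illustrative data only
table = frequency_distribution(scores, n_classes=5)
for lower, upper, freq in table:
    print(f"{lower} - {upper}: {freq}")
```

Because the first lower limit is moved down to a multiple of the interval size, the table can end up with one class more than requested, as it does in this run.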
C.I        Tally          Frequency   Class Boundary   Class Mark
65 - 69    /              1           64.5 - 69.5      67
60 - 64    /              1           59.5 - 64.5      62
55 - 59    /              1           54.5 - 59.5      57
50 - 54    //             2           49.5 - 54.5      52
45 - 49    ///            3           44.5 - 49.5      47
40 - 44    ///            3           39.5 - 44.5      42
35 - 39    ////           4           34.5 - 39.5      37
30 - 34    ///            3           29.5 - 34.5      32
25 - 29    ////           4           24.5 - 29.5      27
20 - 24    /////          5           19.5 - 24.5      22
15 - 19    ///// ///      8           14.5 - 19.5      17
10 - 14    ///// /////    10          9.5 - 14.5       12
5 - 9      ///// //       7           4.5 - 9.5        7
1 - 4      ///            3           0 - 4.5          2
Example:
In our frequency distribution, if the class frequency is 3,
the relative frequency is 3/55 or 5.45%. The rest of the
answers are shown in the following table.
C.I        Frequency   rf (%)
65 - 69    1           1.82
60 - 64    1           1.82
55 - 59    1           1.82
50 - 54    2           3.64
45 - 49    3           5.45
40 - 44    3           5.45
35 - 39    4           7.27
30 - 34    3           5.45
25 - 29    4           7.27
20 - 24    5           9.09
15 - 19    8           14.55
10 - 14    10          18.18
5 - 9      7           12.73
1 - 4      3           5.45
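The relative frequencies can be recomputed with a few lines of Python. The list below is the frequency column of the distribution, highest class first; the variable names are chosen only for illustration.

```python
frequencies = [1, 1, 1, 2, 3, 3, 4, 3, 4, 5, 8, 10, 7, 3]   # highest class first
n = sum(frequencies)                                         # total number of data

# Relative frequency in percent: class frequency divided by n, times 100.
rf = [round(100 * f / n, 2) for f in frequencies]
print(n, rf)
```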
[Figure: graph of the frequency distribution; horizontal axis: Class Mark]
[Figure: Line Graph of the frequency distribution; horizontal axis: Class Mark]
C.I        Frequency   <cf
65 - 69    1           55
60 - 64    1           54
55 - 59    1           53
50 - 54    2           52
45 - 49    3           50
40 - 44    3           47
35 - 39    4           44
30 - 34    3           40
25 - 29    4           37
20 - 24    5           33
15 - 19    8           28
10 - 14    10          20
5 - 9      7           10
1 - 4      3           3
[Figure: graph of the cumulative frequency; horizontal axis: Class Mark]
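C.I        Frequency   >cf
65 - 69    1           1
60 - 64    1           2
55 - 59    1           3
50 - 54    2           5
45 - 49    3           8
40 - 44    3           11
35 - 39    4           15
30 - 34    3           18
25 - 29    4           22
20 - 24    5           27
15 - 19    8           35
10 - 14    10          45
5 - 9      7           52
1 - 4      3           55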
Each number in the > cf column counts the items that lie above the lower
boundary of its class. One item is greater than 64.5; two items are greater
than 59.5; and so on, until all fifty-five items are counted in the lowest class.
[Figure: cumulative frequency graph; vertical axis: Cumulative Freq.; horizontal axis: Class Mark]
C.I        Frequency   rf      <cf   Cum. %
65 - 69    1           1.82    55    100
60 - 64    1           1.82    54    98.17
55 - 59    1           1.82    53    96.35
50 - 54    2           3.64    52    94.53
45 - 49    3           5.45    50    90.89
40 - 44    3           5.45    47    85.44
35 - 39    4           7.27    44    79.99
30 - 34    3           5.45    40    72.72
25 - 29    4           7.27    37    67.27
20 - 24    5           9.09    33    60.00
15 - 19    8           14.55   28    50.91
10 - 14    10          18.18   20    36.36
5 - 9      7           12.73   10    18.18
1 - 4      3           5.45    3     5.45
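The cumulative columns can be generated with a short Python sketch using `itertools.accumulate`. The variable names are illustrative. Here the cumulative percent is computed as <cf divided by n, so a few values differ in the second decimal place from a column obtained by adding the rounded rf values (for example 98.18 instead of 98.17).

```python
from itertools import accumulate

frequencies = [1, 1, 1, 2, 3, 3, 4, 3, 4, 5, 8, 10, 7, 3]   # highest class first
n = sum(frequencies)

greater_cf = list(accumulate(frequencies))             # >cf, added from the top down
less_cf = list(accumulate(frequencies[::-1]))[::-1]    # <cf, added from the bottom up
cum_percent = [round(100 * cf / n, 2) for cf in less_cf]

for f, lcf, gcf, cp in zip(frequencies, less_cf, greater_cf, cum_percent):
    print(f, lcf, gcf, cp)
```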
Exercise No. 4
The following are the scores obtained by a group of 60
College students in Math 12 examination:
MATH 12 SCORES
88 84 42 96 83 54 44 72 82 86
81 63 72 73 98 73 74 62 69 85
79 86 89 39 45 59 65 78 77 73
90 59 78 88 88 91 68 78 86 89
78 89 72 77 69 74 80 68 79 49
82 76 81 43 40 66 81 70 50 75
$\bar{X} = \frac{\sum X}{n}$
Where:
$\bar{X}$ = mean
$\sum$ = symbol for summation
X = individual data
n = total number of data
Example:
Calculate the mean scores of 10 students in the Math and
Science subjects.
No.     Math   Science
1       50     66
2       55     78
3       60     89
4       87     88
5       90     97
6       59     78
7       88     59
8       89     85
9       92     84
10      95     79
Total   765    803
Mean    76.5   80.3
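The two means can be verified with a minimal Python sketch. The assignment of each column of scores to a subject is inferred from the printed totals (765 for Math and 803 for Science); the function name is illustrative.

```python
def mean(values):
    """Arithmetic mean: the sum of the data divided by the number of data."""
    return sum(values) / len(values)

math_scores = [50, 55, 60, 87, 90, 59, 88, 89, 92, 95]      # total 765
science_scores = [66, 78, 89, 88, 97, 78, 59, 85, 84, 79]   # total 803

print(mean(math_scores), mean(science_scores))
```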
When the number of data is large (n ≥ 30), it is easier to
compute the mean by grouping the data into a frequency
distribution. The mean from a frequency distribution is
obtained in almost the same way as the mean from raw data.
There are several ways of determining the mean when the
number of data is large: (a) the midpoint method; and (b) the
class-deviation, lower limit, and upper limit methods.
The mean from a frequency distribution by the midpoint method
may be computed using the formula below:
$\bar{X} = \frac{\sum f_i X_i}{n}$
Where:
fi = frequency of the ith class interval
Xi = class mark of the ith class interval
n = total number of data
Example:
Calculation of the mean from the frequency distribution of scores of
40 students in Math using the midpoint method.
C.I        Frequency (fi)   Xi    fiXi
70 - 74    2                72    144
65 - 69    2                67    134
60 - 64    3                62    186
55 - 59    2                57    114
50 - 54    8                52    416
45 - 49    9                47    423
40 - 44    2                42    84
35 - 39    3                37    111
30 - 34    5                32    160
25 - 29    3                27    81
20 - 24    1                22    22
           n = 40                 Σ fiXi = 1,875
$\bar{X} = \frac{\sum f_i X_i}{n} = \frac{1875}{40} = 46.875$
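The midpoint method can be sketched in Python as follows. The tuple layout (lower limit, upper limit, frequency) and the function name are assumptions made for illustration; the class mark is taken as the average of the two class limits.

```python
def grouped_mean_midpoint(classes):
    """Mean of grouped data: sum of f * class mark, divided by n."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

# (lower limit, upper limit, frequency) for the 40 Math scores
math_classes = [
    (70, 74, 2), (65, 69, 2), (60, 64, 3), (55, 59, 2), (50, 54, 8), (45, 49, 9),
    (40, 44, 2), (35, 39, 3), (30, 34, 5), (25, 29, 3), (20, 24, 1),
]
print(grouped_mean_midpoint(math_classes))
```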
Frequency (fi)   di    fidi
2                +5    10
2                +4    8
3                +3    9
2                +2    4
8                +1    8
9                0     0
2                -1    -2
3                -2    -6
5                -3    -15
3                -4    -12
1                -5    -5
n = 40                 Σ fidi = -1
Frequency (fi)   f(lc)
2                140
2                130
3                180
2                110
8                400
9                405
2                80
3                105
5                150
3                75
1                20
n = 40           Σ f(lc) = 1,795
$\bar{X} = \frac{\sum f \cdot lc}{n} + \frac{i}{2} - 0.5$
$= \frac{1795}{40} + \frac{5}{2} - 0.5 = 46.875$
Where:
$\bar{X}$ = mean
$\sum$ = summation notation
f = frequency
lc = lower class limit
i = class interval size
n = number of observations
uc = upper class limit
Frequency (fi)   f(uc)
2                148
2                138
3                192
2                118
8                432
9                441
2                88
3                117
5                170
3                87
1                24
n = 40           Σ f(uc) = 1,955
$\bar{X} = \frac{\sum f \cdot uc}{n} - \frac{i}{2} + 0.5$
$= \frac{1955}{40} - \frac{5}{2} + 0.5 = 46.875$
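The lower limit and upper limit methods can be compared in a short Python sketch. The function names and the tuple layout (lower limit, upper limit, frequency) are illustrative assumptions; both functions should return the same mean as the midpoint method.

```python
def mean_lower_limit(classes, size):
    """Mean = (sum of f * lc) / n + i/2 - 0.5."""
    n = sum(f for _, _, f in classes)
    return sum(f * lower for lower, _, f in classes) / n + size / 2 - 0.5

def mean_upper_limit(classes, size):
    """Mean = (sum of f * uc) / n - i/2 + 0.5."""
    n = sum(f for _, _, f in classes)
    return sum(f * upper for _, upper, f in classes) / n - size / 2 + 0.5

# (lower limit, upper limit, frequency) for the 40 Math scores
limit_classes = [
    (70, 74, 2), (65, 69, 2), (60, 64, 3), (55, 59, 2), (50, 54, 8), (45, 49, 9),
    (40, 44, 2), (35, 39, 3), (30, 34, 5), (25, 29, 3), (20, 24, 1),
]
print(mean_lower_limit(limit_classes, 5), mean_upper_limit(limit_classes, 5))
```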
The Median
Another measure of central tendency is the median. It is a
point measure that divides the distribution of data, arranged from
highest to lowest or vice versa, in half; thus, half of the data fall
below the median and the other half fall above the
median. It is the most stable measure of central tendency. The
value of the median depends on the number of data and not on their
magnitude. If most of the data are high, the median is high, and if
most of the data are low, the median is also low.
In identifying the median, the data should be arranged in order of
magnitude, either ascending or descending, provided that n
is small (n < 30).
The following are the steps for determining the median from raw
scores:
1. Arrange the data from highest to lowest or vice versa.
2. If n/2 is an integer, the median is taken to be the average of
the two middlemost values.
3. If n/2 is not an integer, the median is taken to be the
middlemost value.
Example: Median from sample raw scores of 8 students in Stat 22
and 9 students in Math 12
STAT 22   MATH 12
17        15
17        19
26        20
28        24
30        28
30        30
31        32
37        32
          40
STAT 22: Mdn = (28 + 30)/2 = 29
MATH 12: Mdn = 28
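The steps for finding the median of raw scores can be sketched in Python as follows. The function name is illustrative, and the two lists are the Stat 22 and Math 12 scores of the example.

```python
def median_raw(scores):
    """Median of raw scores, following the n/2 rule."""
    ordered = sorted(scores)                     # step 1
    n = len(ordered)
    if n % 2 == 0:                               # n/2 is an integer
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return ordered[n // 2]                       # n/2 is not an integer

stat22 = [17, 17, 26, 28, 30, 30, 31, 37]
math12 = [15, 19, 20, 24, 28, 30, 32, 32, 40]
print(median_raw(stat22), median_raw(math12))
```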
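When the data set is large (n ≥ 30), the median is computed from the
less than cumulative frequency distribution by the
formula:
$Mdn = L_m + \left\{\frac{n/2 - lcf}{f_m}\right\} i$
Where:
Mdn = median
Lm = lower class boundary of the median class
lcf = less than cumulative frequency approaching or equal to but not exceeding n/2
fm = frequency of the median class
i = class interval size
n = number of observations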
Example:
Median from the frequency distribution of scores of 40
students in Math using the Less Than Cumulative Frequency
C.I        fi   <cf
70 - 74    2    40
65 - 69    2    38
60 - 64    3    36
55 - 59    2    33
50 - 54    8    31
45 - 49    9    23
40 - 44    2    14
35 - 39    4    12
30 - 34    5    8
25 - 29    3    3
n = 40
$Mdn = L_m + \left\{\frac{n/2 - lcf}{f_m}\right\} i = 44.5 + \left\{\frac{20 - 14}{9}\right\} 5 = 47.83$
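A minimal Python sketch of the grouped median follows. It assumes whole-number class limits (so the lower class boundary is the lower limit minus 0.5) and classes listed from lowest to highest; the function name and the data layout are illustrative.

```python
def grouped_median(classes, size):
    """classes: (lower limit, frequency) pairs listed from lowest to highest."""
    n = sum(f for _, f in classes)
    lcf = 0                                   # cumulative frequency below the class
    for lower, f in classes:
        if lcf + f >= n / 2:                  # this is the median class
            return (lower - 0.5) + (n / 2 - lcf) / f * size
        lcf += f

median_classes = [(25, 3), (30, 5), (35, 4), (40, 2), (45, 9),
                  (50, 8), (55, 2), (60, 3), (65, 2), (70, 2)]
mdn = grouped_median(median_classes, 5)
print(round(mdn, 2))
```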
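Using the greater than cumulative frequency, the median is
computed as follows:
$Mdn = U_m - \left\{\frac{n/2 - gcf}{f_m}\right\} i$
where:
Um = upper class boundary of the median class
gcf = greater than cumulative frequency approaching or equal to but not exceeding n/2
fm = frequency of the median class
i = class interval size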
freq
2
2
3
2
8   (f2)
9   (fmo)
2   (f1)
3
4
3
2
n = 40
$Mo = L_m + \left\{\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right\} i$
$= 44.5 + \left\{\frac{9 - 2}{2(9) - 2 - 8}\right\} 5$
$= 44.5 + \left(\frac{7}{8}\right) 5$
= 44.5 + 35/8
= 44.5 + 4.375
Mo = 48.875
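The mode computation above can be written as a small Python function. The function and parameter names are illustrative; f_mo is the frequency of the modal class, and f_1 and f_2 are the frequencies of the classes just below and just above it.

```python
def grouped_mode(lower_boundary, f_mo, f_1, f_2, size):
    """Mo = Lm + ((fmo - f1) / (2*fmo - f1 - f2)) * i."""
    return lower_boundary + (f_mo - f_1) / (2 * f_mo - f_1 - f_2) * size

mo = grouped_mode(44.5, f_mo=9, f_1=2, f_2=8, size=5)
print(mo)
```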
81 63 72 73 98 73 74 62 69 85
79 86 89 39 45 59 65 78 77 73
90 59 78 88 88 91 68 78 86 89
78 89 72 77 69 74 80 68 79 49
82 76 81 43 40 66 81 70 50 75
Measures of Position
These are measures used to find the specific
location of a point that determines the percentage of test scores in the
distribution. They are values below which specific
fractions of the test scores in a given set fall. The quartiles,
deciles, and percentiles are the measures of position that are
commonly used.
The Quartiles
Quartiles are points which divide the total number of test
scores into four equal parts. Each set of test scores has three
quartiles. 25% of the scores fall below the first quartile (Q1), 50% fall
below the second quartile (Q2), and 75% fall below the 3rd quartile (Q3).
The 1st and the 3rd quartiles are used in the computation of the
interquartile range and the quartile deviation. Quartiles are computed
in the same way as the median, since Q2 is the same as the
median.
The steps in finding the quartiles of raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or lowest to
highest.
2. Determine Qk, where Qk is the kth quartile and k = 1, 2, 3.
- If nk/4 is an integer, Qk = [(nk/4)th score + (nk/4 + 1)th score] / 2.
- If nk/4 is not an integer, Qk = ith score, where i is the closest integer
greater than nk/4.
Math 102: 15, 19, 20, 24, 28, 30, 32, 32, 40
Q1 = (17 + 26)/2 = 21.5
Q3: nk/4 = (8)(3)/4 = 6, so Q3 = [(6th + 7th) scores]/2
= (30 + 31)/2 = 30.5
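The rule for raw scores can be sketched as one Python function that also serves for deciles and percentiles by changing the number of parts. The function name and signature are assumptions made for illustration; the eight scores are the Stat 22 scores used in the example, sorted from lowest to highest.

```python
def position_measure(scores, k, parts):
    """k-th quartile (parts=4), decile (parts=10) or percentile (parts=100)."""
    ordered = sorted(scores)
    n = len(ordered)
    if (n * k) % parts == 0:                  # nk/parts is an integer
        pos = n * k // parts
        return (ordered[pos - 1] + ordered[pos]) / 2
    pos = n * k // parts + 1                  # closest integer greater than nk/parts
    return ordered[pos - 1]

stat_scores = [17, 17, 26, 28, 30, 30, 31, 37]
math_scores_102 = [15, 19, 20, 24, 28, 30, 32, 32, 40]
print(position_measure(stat_scores, 1, 4), position_measure(stat_scores, 3, 4))
```

For the nine Math scores, nk/4 = 2.25 for Q1, so the function returns the 3rd score.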
Example:
Calculation of Quartiles from the frequency distribution of test
scores of 40 students in Math 102
Class Interval   freq   <cf
70 - 74          2      40
65 - 69          2      38
60 - 64          3      36
55 - 59          2      33
50 - 54          8      31
45 - 49          9      23
40 - 44          2      14
35 - 39          3      12
30 - 34          4      9
25 - 29          3      5
20 - 24          2      2
n = 40
Q1 = nk/4 = 11(1)/4 = 2.75 or 3rd class interval
Q3 = nk/4 = 11(3)/4 = 8.25 or 9th class interval
The Deciles
The deciles are points which divide the total number of test
scores into ten equal parts. Each set of test scores has nine deciles.
10% falls below the 1st decile (D1); 20% falls below the 2nd decile
(D2); 30% falls below the 3rd decile (D3); 40% falls below the 4th
decile (D4); 50% falls below the 5th decile (D5); 60% falls below the
6th decile (D6); 70% falls below the 7th decile (D7); 80% falls below
the 8th decile (D8); and 90% falls below the 9th decile (D9).
The deciles are computed in exactly the same manner as the
median. Hence, the 5th decile (D5) is the same as the
median.
The steps in finding the deciles from raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or vice versa.
2. Determine Dk, where Dk is the kth decile and k = 1, 2, 3, ..., 9.
- If nk/10 is an integer, Dk = [(nk/10)th score + (nk/10 + 1)th score] / 2.
- If nk/10 is not an integer, Dk = ith score, where i is the
closest integer greater than nk/10.
Example:
Calculation of Deciles from raw scores of 8 students in
Statistics and 9 students in Math 12
Statistics   Math 12
17           15
17           19
26           20
28           24
30           28
30           30
31           32
37           32
             40
lcf
D4
LD4
fD4
lcf
Where:
D5 = the 5th decile
LD5 = lower class boundary where D5 lies
fD5
lcf
lcf
D9
Where: D9
LD9
fD9
lcf
Example:
Calculation of deciles from the frequency distribution of test
scores of 40 students in Math 102
Class Interval   freq   <cf
70 - 74          2      40   D9
65 - 69          2      38   D7
60 - 64          3      36   D8
55 - 59          2      33
50 - 54          8      31   D6
45 - 49          9      23   D5
40 - 44          2      14   D3; D4
35 - 39          3      12
30 - 34          4      9    D2
25 - 29          3      5    D1
20 - 24          2      2
n = 40
For Decile 1
D1 = nk/10 = 11(1)/10 = 1.1 or 2nd class interval
$D_1 = LD_1 + \left\{\frac{n/10 - lcf}{fD_1}\right\} i = 24.5 + \left\{\frac{4 - 2}{3}\right\} 5 = 27.83$
For Decile 2
D2 = nk/10 = 11(2)/10 = 2.2 or 3rd class interval
For Decile 3
D3 = nk/10 = 11(3)/10 = 3.3 or 4th class interval
$D_3 = LD_3 + \left\{\frac{3n/10 - lcf}{fD_3}\right\} i = 34.5 + \left\{\frac{12 - 9}{3}\right\} 5 = 39.5$
For Decile 4
D4 = nk/10 = 11(4)/10 = 4.4 or 5th class interval
$D_4 = LD_4 + \left\{\frac{2n/5 - lcf}{fD_4}\right\} i$
For Decile 5
D5 = nk/10 = 11(5)/10 = 5.5 or 6th class interval
$D_5 = LD_5 + \left\{\frac{n/2 - lcf}{fD_5}\right\} i$
For Decile 6
D6 = nk/10 = 11(6)/10 = 6.6 or 7th class interval
$D_6 = LD_6 + \left\{\frac{3n/5 - lcf}{fD_6}\right\} i$
For Decile 7
D7 = nk/10 = 11(7)/10 = 7.7 or 8th class interval
$D_7 = LD_7 + \left\{\frac{7n/10 - lcf}{fD_7}\right\} i = 54.5 + \left\{\frac{28 - 23}{2}\right\} 5 = 67.00$
For Decile 8
D8 = nk/10 = 11(8)/10 = 8.8 or 9th class interval
$D_8 = LD_8 + \left\{\frac{4n/5 - lcf}{fD_8}\right\} i = 59.5 + \left\{\frac{32 - 31}{3}\right\} 5 = 61.17$
For Decile 9
D9 = nk/10 = 11(9)/10
The Percentiles
The percentiles are the points that divide the total number of
test scores or data into exactly one hundred equal parts. For each
set of test scores, there are ninety-nine (99)
percentiles, which determine the points below which a specific
percentage of the test scores fall. For instance, the 9th percentile
(P9) is the point such that 9% of the test scores in the distribution lie
below it and 91% lie at or above it. Other percentiles take
a similar meaning and interpretation.
The percentiles are calculated in exactly the same manner as
the median. In effect, the 50th percentile (P50) is
the same as the median.
The steps in finding the percentiles from raw scores can be
summarized as follows:
1. Arrange the scores from highest to lowest or vice versa.
2. Determine Pk, where Pk is the kth percentile and k = 1, 2, 3,
..., 99.
a). If nk/100 is an integer, Pk = [(nk/100)th score + (nk/100 + 1)th score] / 2.
b). If nk/100 is not an integer, Pk = ith score, where i is the
closest integer greater than nk/100.
Math 102
15
19
20
24
28
30
32
32
40
fP3
lcf
Where: P4
LP4
fP4
lcf
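Where:
P50 = the 50th percentile
LP50 = lower class boundary where P50 lies
fP50 = the frequency where P50 lies
lcf = less than cumulative frequency approaching or equal
to but not exceeding n/2.
$P_{60} = LP_{60} + \left\{\frac{3n/5 - lcf}{fP_{60}}\right\} i$
Where:
P60 = the 60th percentile
LP60 = lower class boundary where P60 lies
fP60 = the frequency where P60 lies
lcf = less than cumulative frequency approaching or equal
to but not exceeding 3n/5.
$P_{80} = LP_{80} + \left\{\frac{4n/5 - lcf}{fP_{80}}\right\} i$
Where:
P80 = the 80th percentile
LP80 = lower class boundary where P80 lies
fP80 = the frequency where P80 lies
lcf = less than cumulative frequency approaching or equal
to but not exceeding 4n/5.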
freq   <cf
2      40
2      38
3      36
2      33
8      31
9      23
2      14
3      12
4      9
3      5
2      2
n = 40
For percentile 1
P1 = nk/100 = 11(1)/100 = 0.11 or 1st score
$P_1 = LP_1 + \left\{\frac{n/100 - lcf}{fP_1}\right\} i = 19.5 + \left\{\frac{0.4 - 0}{3}\right\} 5 = 20.167$
For percentile 10
P10 = nk/100 = 11(10)/100 = 1.1 or 2nd score
$P_{10} = LP_{10} + \left\{\frac{n/10 - lcf}{fP_{10}}\right\} i = 24.5 + \left\{\frac{4 - 2}{3}\right\} 5 = 27.833$
For percentile 20
P20 = nk/100 = 11(20)/100 = 2.2 or 3rd score
$P_{20} = LP_{20} + \left\{\frac{n/5 - lcf}{fP_{20}}\right\} i$
For percentile 30
P30 = nk/100 = 11(30)/100 = 3.3 or 4th score
$P_{30} = LP_{30} + \left\{\frac{3n/10 - lcf}{fP_{30}}\right\} i = 34.5 + \left\{\frac{12 - 9}{3}\right\} 5 = 39.5$
For percentile 40
P40 = nk/100 = 11(40)/100 = 4.4 or 5th score
$P_{40} = LP_{40} + \left\{\frac{2n/5 - lcf}{fP_{40}}\right\} i$
For percentile 50
P50 = nk/100 = 11(50)/100 = 5.5 or 6th score
$P_{50} = LP_{50} + \left\{\frac{n/2 - lcf}{fP_{50}}\right\} i$
For percentile 90
P90 = nk/100 = 11(90)/100 = 9.9 or 10th score
$P_{90} = LP_{90} + \left\{\frac{9n/10 - lcf}{fP_{90}}\right\} i = 64.5 + \left\{\frac{36 - 33}{2}\right\} 5 = 72.00$
For percentile 99
P99 = nk/100 = 11(99)/100 = 10.89 or 11th score
freq
2
2
3
2
9
10
9
5
4
3
2
2
1
1
n = 55
<cf
Measures of Dispersion/Variability
Measures of dispersion can be utilized in determining the spread of the
distribution of test scores or of a portion of it. They can be used to find
the deviation of test scores from the mean score. Measures of dispersion can
also be used to establish the actual similarities or differences of
distributions. In general, these measures are employed to further
characterize the distribution of test scores.
The most commonly used measures of dispersion are the
range, interquartile range, quartile deviation, average deviation, and
standard deviation.
The Range
It is the simplest and easiest measure of dispersion or
variability. It simply measures how far the highest score is from the
lowest score. It does not tell anything about the scores between
these two extreme scores. Thus, it is considered the least
satisfactory measure of dispersion.
The equations used are:
a. For Raw Scores
$R = H - L$
Where:
R = range
H = highest score
L = lowest score
b. For Frequency Distributed Scores
$R = H_{mpt} - L_{mpt}$
Hmpt = midpoint of the highest step
Lmpt = midpoint of the lowest step
The Interquartile Range
It refers to the range of scores of a specified part of the total
group, usually the middle 50% of the cases lying between the 1st
quartile and the 3rd quartile.
The equation used is:
$I.Q.R = Q_3 - Q_1$
Where:
I.Q.R = interquartile range
Q3 = 3rd Quartile
Q1 = 1st Quartile
Example:
a. Calculation of Interquartile Range from sample scores of 8
Stat 22 students
Scores: 17 17 26 28 30 30 31 37
Recall: Q1 = 21.5
Q3 = 30.5
I.Q.R = Q3 - Q1
= 30.5 - 21.5
= 9.0
C.I        freq
70 - 74    2
65 - 69    2
60 - 64    3
55 - 59    2
50 - 54    8
45 - 49    9
40 - 44    2
35 - 39    3
30 - 34    4
25 - 29    3
20 - 24    2
n = 40
Recall: Q1 = 30.75
Q3 = 62
I.Q.R = Q3 - Q1
= 62 - 30.75
= 31.25
$Q.D = \frac{Q_3 - Q_1}{2}$
Example:
a. Calculation of Quartile Deviation from sample raw
scores of 8 Stat 22 students
Scores: 17 17 26 28 30 30 31 37
Recall: Q1 = 21.5
Q3 = 30.5
Q.D = (Q3 - Q1)/2
= (30.5 - 21.5)/2
= 9/2
Q.D = 4.5
freq: 2 2 3 2 8 9 2 3 4 3 2
n = 40
Recall: Q1 = 30.75
Q3 = 62
Q.D = (Q3 - Q1)/2
= (62 - 30.75)/2
= 31.25/2
Q.D = 15.625
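The range, interquartile range and quartile deviation are simple differences, as the following Python sketch shows. The function names are illustrative, and the quartile values are the ones recalled in the examples rather than recomputed here.

```python
def score_range(scores):
    """R = H - L for raw scores."""
    return max(scores) - min(scores)

def interquartile_range(q1, q3):
    """I.Q.R = Q3 - Q1."""
    return q3 - q1

def quartile_deviation(q1, q3):
    """Q.D = (Q3 - Q1) / 2, half of the interquartile range."""
    return (q3 - q1) / 2

stat22_scores = [17, 17, 26, 28, 30, 30, 31, 37]
print(score_range(stat22_scores))
print(interquartile_range(21.5, 30.5), quartile_deviation(21.5, 30.5))
print(interquartile_range(30.75, 62), quartile_deviation(30.75, 62))
```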
The Average Deviation
Average deviation is a measure of absolute dispersion that is
affected by every individual score. It is the mean of the absolute
deviations of the individual scores from the mean of all the scores.
A large average deviation means that a set of scores is
widely dispersed about the mean, while a small average deviation
implies that the scores lie closer to the mean.
The formula for calculating the average deviation is as follows:
1. Raw Score
$A.D = \frac{\sum |X - \bar{X}|}{n - 1}$
Example:
a. Calculation of Average Deviation from sample raw scores of
8 Stat 22 students
X       X - X̄
17      -10
17      -10
26      -1
28      1
30      3
30      3
31      4
37      10
        Σ|X - X̄| = 42
Recall: X̄ = 27
$A.D = \frac{\sum |X - \bar{X}|}{n - 1} = \frac{42}{7}$
A.D = 6.0
Recall: X̄ = 47.25
fi    Xi    (Xi - X̄)   fi|Xi - X̄|
2     72    24.75       49.50
2     67    19.75       39.50
3     62    14.75       44.25
2     57    9.75        19.50
8     52    4.75        38.00
9     47    -0.25       2.25
2     42    -5.25       10.50
4     37    -10.25      41.00
5     32    -15.25      76.25
3     27    -20.25      60.75
n = 40                  Σ fi|Xi - X̄| = 381.50
$A.D = \frac{\sum f_i |X_i - \bar{X}|}{n - 1} = \frac{381.50}{39}$
A.D = 9.78
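Both versions of the average deviation can be checked with a short Python sketch. It follows the divisor n - 1 used in the formulas of this section; the function names and the (class mark, frequency) layout are illustrative assumptions.

```python
def average_deviation(scores):
    """A.D = sum of |X - mean| divided by (n - 1), for raw scores."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / (n - 1)

def grouped_average_deviation(classes):
    """classes: (class mark, frequency) pairs; same n - 1 divisor."""
    n = sum(f for _, f in classes)
    mean = sum(x * f for x, f in classes) / n
    return sum(f * abs(x - mean) for x, f in classes) / (n - 1)

raw_scores = [17, 17, 26, 28, 30, 30, 31, 37]
ad_classes = [(72, 2), (67, 2), (62, 3), (57, 2), (52, 8),
              (47, 9), (42, 2), (37, 4), (32, 5), (27, 3)]
print(average_deviation(raw_scores), round(grouped_average_deviation(ad_classes), 2))
```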
The Standard Deviation
The standard deviation is the measure of dispersion that involves all
scores in the distribution rather than only the extreme scores. It may
be referred to as the root-mean-square of the deviations from
the mean. It is considered the most important measure of
dispersion. Mathematically, it is expressed as:
1. Raw Score
$S.D = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
2. Frequency Distribution
a. Midpoint Method
$S.D = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{n - 1}}$
b. Class-deviation Method
$S.D = i \sqrt{\frac{\sum f_i d_i^2}{n - 1} - \frac{(\sum f_i d_i)^2}{n(n - 1)}}$
Where: S.D = standard deviation
fi = frequency of the ith class interval
di = deviation of the ith class interval
di2 = square of the deviation of the ith class interval
Xi = midpoint of the ith class interval
X̄ = mean of the scores
Example:
a. Calculation of Standard Deviation from sample raw scores
of 8 Stat 22 students
X       X - X̄    (X - X̄)2
17      -10       100
17      -10       100
26      -1        1
28      1         1
30      3         9
30      3         9
31      4         16
37      10        100
                  Σ(X - X̄)2 = 336
Recall: X̄ = 27
$S.D = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{336}{7}}$
S.D = 6.93
$S.D = \sqrt{\frac{5897.43}{39}}$
S.D = 12.30
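fi    di    fidi   fidi2
2     +5    10     50
2     +4    8      32
3     +3    9      27
2     +2    4      8
8     +1    8      8
9     0     0      0
2     -1    -2     2
4     -2    -8     16
5     -3    -15    45
3     -4    -12    48
n = 40      Σ fidi = 2   Σ fidi2 = 236
$SD = i \sqrt{\frac{\sum f_i d_i^2}{n - 1} - \frac{(\sum f_i d_i)^2}{n(n - 1)}}$
$SD = 5 \sqrt{\frac{236}{39} - \frac{(2)^2}{40(39)}}$
$= 5 \sqrt{6.05128 - 4/1560}$
$= 5 \sqrt{6.05128 - 0.0025641}$
$= 5 \sqrt{6.048716}$
= 5(2.4594)
SD = 12.30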
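The raw-score formula and the class-deviation method can be checked with the following Python sketch. The function names are illustrative; the frequencies and class deviations are those of the table above, with an interval size of 5.

```python
import math

def standard_deviation(scores):
    """S.D = sqrt( sum (X - mean)^2 / (n - 1) ) for raw scores."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

def sd_class_deviation(freqs, devs, size):
    """S.D = i * sqrt( sum f*d^2 / (n-1) - (sum f*d)^2 / (n*(n-1)) )."""
    n = sum(freqs)
    sum_fd = sum(f * d for f, d in zip(freqs, devs))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs))
    return size * math.sqrt(sum_fd2 / (n - 1) - sum_fd ** 2 / (n * (n - 1)))

sd_scores = [17, 17, 26, 28, 30, 30, 31, 37]
sd_freqs = [2, 2, 3, 2, 8, 9, 2, 4, 5, 3]
sd_devs = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
print(round(standard_deviation(sd_scores), 2),
      round(sd_class_deviation(sd_freqs, sd_devs, 5), 2))
```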
Exercise No. 7
Test yourself:
Given the frequency distribution of the raw scores of
students in Math
Class Interval   fi
75 - 79          2
70 - 74          2
65 - 69          3
60 - 64          2
55 - 59          8
50 - 54          9
45 - 49          2
40 - 44          4
35 - 39          10
30 - 34          3
25 - 29          10
20 - 24          5
15 - 19          5
10 - 14          5
5 - 9            5
n = 75
Compute for:
1. Quartile Deviation
2. Interquartile Range
3. Average Deviation and
4. Standard Deviation
Xi
(X X)
(X X)
fi(Xi X)
$SK = \frac{\sum (X - \bar{X})^3}{(n - 1)(SD)^3}$
Where: SK = skewness
Xi = midpoint of the ith class interval
SD = standard deviation
b. Frequency Distribution
$SK = \frac{\sum f_i (X_i - \bar{X})^3}{(n - 1)(SD)^3}$
Example:
a. Calculation of Moment Coefficient of Skewness from
sample raw scores in Stat 22 of 8 students in the previous
examples.
X       X - X̄    (X - X̄)3
17      -10       -1000
17      -10       -1000
26      -1        -1
28      1         1
30      3         27
30      3         27
31      4         64
37      10        1000
                  Σ(X - X̄)3 = -882
Recall: X̄ = 27
SD = 6.93
$SK = \frac{\sum (X - \bar{X})^3}{(n - 1)(SD)^3} = \frac{-882}{7(6.93)^3}$
$SK = \frac{-882}{7(332.813)}$
$SK = \frac{-882}{2329.69}$
SK = -0.38
Take note that the result is negative. This implies that the
distribution of the scores in Stat 22 of the 8 students is skewed to
the left; hence, the majority of the scores are above the mean, which
suggests that the test was very easy.
b. Calculation of Moment Coefficient of Skewness from the
frequency distribution of raw scores of 40 students in Stat
22 using the Midpoint Method.
Class Interval   fi   Xi   (Xi - X̄)   (Xi - X̄)3   fi(Xi - X̄)3
70 - 74          2    72   24.75       15160.92     30321.84
65 - 69          2    67   19.75       7703.73      15407.46
60 - 64          3    62   14.75       3209.05      9627.15
55 - 59          2    57   9.75        926.86       1853.72
50 - 54          8    52   4.75        107.17       857.36
45 - 49          9    47   -0.25       -0.02        -0.18
40 - 44          2    42   -5.25       -144.70      -289.40
35 - 39          4    37   -10.25      -1076.89     -4307.56
30 - 34          5    32   -15.25      -3546.58     -17732.90
25 - 29          3    27   -20.25      -8303.76     -24911.28
n = 40                                              Σ = 10,826.21
Recall: X̄ = 47.25
S.D = 12.30
$SK = \frac{\sum f_i (X_i - \bar{X})^3}{(n - 1)(SD)^3}$
$SK = \frac{10826.21}{(39)(12.30)^3}$
$= \frac{10826.21}{(39)(1860.867)}$
$= \frac{10826.21}{72573.813}$
SK = 0.15
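The moment coefficient of skewness for raw scores can be sketched in Python as below. The function name is illustrative. The standard deviation is computed inside the function without intermediate rounding, so the result can differ slightly in later decimal places from a hand computation that uses SD = 6.93.

```python
import math

def skewness(scores):
    """SK = sum (X - mean)^3 / ((n - 1) * SD^3), with SD using n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return sum((x - mean) ** 3 for x in scores) / ((n - 1) * sd ** 3)

sk_scores = [17, 17, 26, 28, 30, 30, 31, 37]
sk = skewness(sk_scores)
print(round(sk, 2))   # a negative value indicates skewness to the left
```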
Exercise No. 8
Test yourself:
Given the frequency distribution of the scores of students
in Mathematics, find the moment coefficient of skewness.
Class Interval   fi   (X - X̄)   (X - X̄)3   fi(Xi - X̄)3
75 - 79          2
70 - 74          2
65 - 69          3
60 - 64          2
55 - 59          8
50 - 54          9
45 - 49          2
40 - 44          4
35 - 39          10
30 - 34          3
25 - 29          10
20 - 24          5
15 - 19          5
10 - 14          5
5 - 9            5
n = 75
Measures of Kurtosis
Curves of distributions having the same coefficient of
skewness may still differ significantly. Symmetrical curves, for
instance may vary in shape because they may not have the same
peakedness, a property of curves which can be described by
computing for a value called measure of kurtosis.
Kurtosis is a measure of the degree of peakedness or flatness of
a distribution. Hence, the concern is on the height of the curve
along the y-axis.
Types:
1. Leptokurtic refers to a distribution having a relatively
high peak.
2. Mesokurtic refers to a distribution that is neither very peaked
nor very flat-topped.
3. Platykurtic refers to a distribution having a relatively flat top.
Leptokurtic
Mesokurtic
Platykurtic
$KU = \frac{\sum f_i (X_i - \bar{X})^4}{(n - 1)(SD)^4}$
Where: KU = kurtosis
Others as defined
Example:
a. Calculation of Moment Coefficient of Kurtosis from sample raw
scores of 8 students in Stat 22 in the previous examples
X      X - X̄    (X - X̄)⁴
17     -10      10,000
17     -10      10,000
26      -1           1
28       1           1
30       3          81
30       3          81
31       4         256
37      10      10,000
Total           30,420
Recall: X̄ = 27, SD = 6.93
$KU = \frac{\sum (X - \bar{X})^4}{(n-1)(SD)^4}$
$KU = \frac{30{,}420}{7(6.93)^4} = \frac{30{,}420}{7(2{,}306.40)} = \frac{30{,}420}{16{,}144.8}$
KU = 1.88
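As a rough check of the raw-score computation, the following Python sketch applies the same formula, KU = Σ(X - X̄)⁴ / ((n - 1)(SD)⁴). The function name `moment_kurtosis` is an assumption made for illustration.

```python
import math

# Raw scores of the 8 students in Stat 22.
scores = [17, 17, 26, 28, 30, 30, 31, 37]

def moment_kurtosis(values):
    """Moment coefficient of kurtosis with the (n - 1) divisor used in the text."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum((x - mean) ** 4 for x in values)
    return fourth / ((n - 1) * sd ** 4)

ku = moment_kurtosis(scores)
print(round(ku, 3))
```

Because the sketch keeps the unrounded standard deviation, its second decimal can differ slightly from a hand computation that first rounds SD to 6.93.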
fi    Xi    (Xi - X̄)    (Xi - X̄)⁴      fi(Xi - X̄)⁴
2     72     24.75     375,232.82      750,465.64
2     67     19.75     152,148.75      304,297.50
3     62     14.75      47,333.44      142,000.32
2     57      9.75       9,036.88       18,073.76
8     52      4.75         509.07        4,072.56
9     47     -0.25          0.004            0.04
2     42     -5.25         759.69        1,519.38
4     37    -10.25      11,038.13       44,152.52
5     32    -15.25      54,085.32      270,426.60
3     27    -20.25     168,151.25      504,453.75
n = 40                               2,039,462.07
Recall: X̄ = 47.25, S.D = 12.30
$KU = \frac{\sum f_i (X_i - \bar{X})^4}{(n-1)(SD)^4}$
$KU = \frac{2{,}039{,}462.07}{(39)(12.30)^4} = \frac{2{,}039{,}462.07}{(39)(22{,}888.664)} = \frac{2{,}039{,}462.07}{892{,}657.9}$
KU = 2.28
fi    (X - X̄)    (X - X̄)⁴    fi(Xi - X̄)⁴
2
2
3
2
8
9
2
4
10
3
10
5
5
5
5
n = 75
Normal Curve
The equation of the normal curve is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Where: f(x) = the height of the curve above the x-axis
x = raw score laid off along the x-axis
μ = mean of the distribution of the test score
σ = standard deviation of the distribution of the test score
e = 2.71828
π = 3.1416
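A minimal Python sketch of the normal curve equation may help in seeing how the height of the curve depends on the mean and the standard deviation. The function name `normal_pdf` is an illustrative assumption, and the sketch uses the standard density f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)).

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve above the x-axis at the raw score x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Height at the mean of the standard normal curve (mu = 0, sigma = 1).
peak = normal_pdf(0.0, 0.0, 1.0)
# The curve is symmetric about the mean.
left, right = normal_pdf(-1.5, 0.0, 1.0), normal_pdf(1.5, 0.0, 1.0)
print(peak, left, right)
```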
Characteristics of a normal curve:
1. The mean, median, and mode have the same value, which is the point
on the horizontal axis at which the curve reaches its maximum.
2. The curve is symmetrical and bell-shaped about a vertical axis
through the mean. This means that the curve falls off in opposite
directions on both sides at exactly equal distances from the center.
3. The normal curve approaches the horizontal axis asymptotically as
we proceed in either direction away from the mean.
4. The total area under the curve and above the horizontal axis is
equal to 1.
Standard Normal Scores
These are the converted scores, which are needed and
utilized when constructing areas of the normal probability curve.
They are the scores having definite mean of 0 and standard
deviation of 1. These scores are referred to as z-scores.
[Figure: areas under the normal curve corresponding to P(X < X1),
P(X1 < X < X2), and P(X > X2)]
Examples:
Solutions
a. First, draw the standard normal curve and shade the required
area, then read the corresponding area from the table. To locate the
area, look for 2.5 along the leftmost column and then locate 0.06
along the topmost row. The area under the normal curve is found at
the intersection of 2.5 and 0.06, which gives 0.4948.
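A table lookup of this kind can be cross-checked numerically. The sketch below assumes the table lists the area between 0 and z under the standard normal curve, and computes that area from the error function in Python's standard library. The function name `area_zero_to_z` is hypothetical.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

# Row 2.5 and column 0.06 of the table correspond to z = 2.56.
area = area_zero_to_z(2.56)
print(round(area, 4))
```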
[Figure: standard normal curve with the required area shaded; z =
2.56 marked on the axis]
The T- Tests
Test for Dependent or Correlated Samples
Note that data consisting of information obtained from matched pairs
or repeated measures are classified as dependent or correlated
samples. An example of correlated data is the effect of
socio-economic characteristics on academic performance. Data on
socio-economic characteristics serve as independent variables and
academic performance as the dependent variable. If the data were
taken twice on the same criterion variable, this can be called a
repeated measures design.
Procedure:
1. Ho: μ1 = μ2, that the mean of population 1 is equal to the mean
of population 2.
The alternative hypothesis may be stated in one of the three forms
presented below:
a. Ha: μ1 > μ2, that the mean of population 1 is greater than the
mean of population 2.
b. Ha: μ1 < μ2, that the mean of population 1 is less than the mean
of population 2.
c. Ha: μ1 ≠ μ2, that the mean of population 1 is not equal to the
mean of population 2.
5. Computations
Individual   Experimental Group   Control Treatment   Di     Di²
1            x11                  x12                 D1     D1²
2            x21                  x22                 D2     D2²
:            :                    :                   :      :
i            xi1                  xi2                 Di     Di²
:            :                    :                   :      :
n            xn1                  xn2                 Dn     Dn²
Total        ΣXi1                 ΣXi2                ΣDi    ΣDi²
where:
6. State your decision based on the decision criterion and the
computed t.
7. State your conclusion.
Illustration 1:
In a study of the effect of vitamins on the weight increase of
students, a group of 10 students were weighed before and after two
months of taking the vitamins.
Wt. before   Wt. after
196          200
171          178
170          169
207          212
177          180
162          165
199          201
173          179
231          243
140          144
5. Computation:
Student No.   Wt. before   Wt. after 2 Months    D     D²
1             196          200                    4     16
2             171          178                    7     49
3             170          169                   -1      1
4             207          212                    5     25
5             177          180                    3      9
6             162          165                    3      9
7             199          201                    2      4
8             173          179                    6     36
9             231          243                   12    144
10            140          144                    4     16
Total         1826         1871                  45    309
= 1.1833
6. Decision: Since
, reject Ho.
7. Conclusion: Based on the result, it can be concluded that the
mean weight after two months of taking the vitamins is significantly
higher than the mean weight before. The result further reveals that
there is a significant increase in weight, as indicated by the test.
Therefore, taking the vitamins for two months is effective for
weight increase.
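The computation for Illustration 1 can be sketched in Python. The sketch assumes the usual paired t statistic, t = D̄ / sqrt(S_D² / n) with S_D² = (ΣD² - (ΣD)²/n) / (n - 1), since the formula itself did not survive in the text. The function name `paired_t` is hypothetical.

```python
import math

before = [196, 171, 170, 207, 177, 162, 199, 173, 231, 140]
after = [200, 178, 169, 212, 180, 165, 201, 179, 243, 144]

def paired_t(x_before, x_after):
    """Return (sum D, sum D^2, S_D^2 / n, t) for paired observations."""
    d = [a - b for a, b in zip(x_after, x_before)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    mean_d = sum_d / n
    var_d = (sum_d2 - sum_d ** 2 / n) / (n - 1)  # variance of the differences
    t = mean_d / math.sqrt(var_d / n)
    return sum_d, sum_d2, var_d / n, t

sum_d, sum_d2, var_mean_d, t_c = paired_t(before, after)
print(sum_d, sum_d2, round(var_mean_d, 4), round(t_c, 3))
```

The third value returned is the variance of the mean difference, which appears to be the quantity reported as 1.1833 in the computation.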
Illustration 2:
In a study of effectiveness of physical exercise in weight
reduction, a group of 10 students engaged in a prescribed program
of physical exercise for one showed the following results:
INDIVIDUAL   WEIGHT BEFORE (pounds)   WEIGHT AFTER (pounds)
1            210                      196
2            168                      170
3            165                      170
4            202                      200
5            170                      157
6            185                      152
7            211                      189
8            189                      170
9            245                      221
10           149                      130
Use the 0.05 level of significance to test if the prescribed program of
physical exercise is effective in reducing weight.
89
UNDERSTANDING STATISTICS
Solution:
1.
5. Computation:
INDIVIDUAL   WEIGHT BEFORE (pounds)   WEIGHT AFTER (pounds)    D      D²
1            210                      196                     14     196
2            168                      170                     -2       4
3            165                      170                     -5      25
4            202                      200                      2       4
5            170                      157                     13     169
6            185                      152                     33    1089
7            211                      189                     22     484
8            189                      170                     19     361
9            245                      221                     24     576
10           149                      130                     19     361
Total        1894                     1755                   139    3269
SD2
= 14.85
tc
6. Decision: Since
= 13.65
, reject Ho.
91
UNDERSTANDING STATISTICS
Exercise No. 10
1. A certain weight-reduction program has produced the following
weight changes (lb) in ten students:
STUDENTS   1     2     3     4     5     6     7     8     9     10
Before     124   138   113   129   149   149   177   138   139   129
After      115   110   110   131   122   155   125   142   122   105
CORRECTLY
Group A
Group B
19
23
18
10
11
13
25
20
12
10
16
18
11
10
15
11
20
12
18
13
Is there a significant difference at 5% level in the number of
statistical problems answered correctly between Group A and
Group B?
3. A program is designed to enhance readers speed and
comprehension. To evaluate the effectiveness of this program, a
test given both before and after the program, and sample result
follow. At the 0.05 significance level, test the claim that
comprehension is higher after the program.
1
Before 102
After
109
111
117
139
151
169
185
208
189
105
127
102
133
129
127
116
115
10
125
128
93
UNDERSTANDING STATISTICS
that the before and after the design exercise program values are
equal.
94
Where:
tc = computed t-value
n1 = sample size of group 1
n2 = sample size of group 2
Sp² = pooled variance
Note that the degrees of freedom (df) is equal to n1 + n2 - 2.
With the assumption that the populations have unequal variances
(case 2), the test statistic is given by:
Where:
tc = computed t-value
n1 = sample size of group 1
n2 = sample size of group 2
S1² = sample variance of group 1
S2² = sample variance of group 2
The degree of freedom (df) will be determined using the
equation:
Where:
= sample size of group 1
= sample size of group 2
=
=
= sample variance of group 1
= sample variance of group 2
95
UNDERSTANDING STATISTICS
96
Case 1:
1.
if
if
if
,
,
97
UNDERSTANDING STATISTICS
5. Computation
Where:
= computed t-value
= sample size of group 1
= sample size of group 2
= pooled variance
98
Example 1:
TEST SCORES
Math   English
45     58
55     89
67     59
78     59
90     58
39     60
58     94
69     49
80     90
89     91
Solution:
The first step to take is to test the equality of variances.
a.
99
UNDERSTANDING STATISTICS
TEST SCORES
              Math        English
              45          58
              55          89
              67          59
              78          59
              90          58
              39          60
              58          94
              69          49
              80          90
              89          91
Sample size   10          10
Total         670         707
Mean          67          70.7
Variance      311.1111    316.0111
$F_c = \frac{316.0111}{311.1111}$
= 1.01575
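The variance ratio above can be reproduced with Python's standard library. This sketch assumes that Fc is the larger sample variance divided by the smaller one, which matches the figures shown; the variable names are illustrative.

```python
from statistics import variance

math_scores = [45, 55, 67, 78, 90, 39, 58, 69, 80, 89]
english_scores = [58, 89, 59, 59, 58, 60, 94, 49, 90, 91]

# Sample variances (divisor n - 1).
var_math = variance(math_scores)
var_english = variance(english_scores)

# Larger variance over smaller variance.
f_c = max(var_math, var_english) / min(var_math, var_english)
print(round(var_math, 4), round(var_english, 4), round(f_c, 5))
```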
100
Example 2:
SEEDLINGS DIAMETER (CM)
Benguet pine
Mangium
3.08
2.38
3.10
2.68
2.35
2.17
3.56
3.86
3.73
3.91
1.48
2.65
1.72
1.85
2.30
1.86
2.80
2.76
3.50
2.68
Total
29.27
25.15
Mean
2.93
2.52
Variance 0.4994
0.5291
1.
101
UNDERSTANDING STATISTICS
Therefore,
= 1.278
6. Decision: Since
, there is no
if
if
if
,
,
5. Computation:
Where:
= computed t-value
= sample size of group 1
= sample size of group 2
= sample variance of group 1
= sample variance of group 2
Note that
103
UNDERSTANDING STATISTICS
Where:
= sample size of group 1
= sample size of group 2
=
=
= sample variance of group 1
= sample variance of group 2
6. State your decision based on the rejection criterion and the
computed test statistic.
7. State your conclusion based on your decision.
15
15
32
28
40
30
50
17
20
22
Solution:
Math
Science
104
Note that
Where: = sample size of the group with the larger
variance
= sample size of the group with smaller
variance
d. Computation of the Test-Statistic
= 6.1493
e. Decision: Since
f. Conclusion:
, reject
With the result of the test for the equality of variances, we can
proceed to the solution of the main problem. Thus,
1.
=4
=6
105
UNDERSTANDING STATISTICS
=
=
6. Decision: Since
, we fail to reject
.
7. Conclusion: The mean score of the 4 Math students does not differ
significantly from the mean score of the 6 Science students.
106
Exercise No. 11
T-TEST FOR INDEPENDENT SAMPLES
1. The weight of 2 groups of children (randomized samples)
taken were found to be as follows:
Group
A
B
22.5
14.1
24.4
20.6
26.4
24.1
Weight (kg)
25.5 24.9
22.5 24
23.7
31.2
26.5
21.6
23.3
Group A
Group B
1.9
2.1
3.1
0.6
0.9
Creek 1
Creek 2
1
15
20
2
20
24
No of Trials
3
4
12
10
21
18
5
25
28
6
14
107
UNDERSTANDING STATISTICS
YEAR
1
85
95
86
87
83
79
89
91
78
89
92
95
85
81
84
89
Exercise No. 12
1. A student wants to buy a battery for her flashlight. She has a
choice of three brands of rechargeable batteries that vary in cost.
She obtains the sample data in the following table. She randomly
selects three batteries of each brand and tests them for operating
time (in hours) before recharging is necessary. Do the three brands
have the same mean usable time before recharging is required?
Brand
A
B
C
108
24.7
28.4
27.5
28.2
29.5
24.6
Low
9.2
20.6
24.3
WATER TABLE
Medium
High
9.7
11.5
9.2
6.8
20.3
7.9
Yield
(kgs/ha)
A
4,500
5,500
8,000
4,000
7,735
Rice Varieties
B
C
7,000
5,000
4,125
8,000
9,000
5,000
5,235
9,900
3,699
6,342
D
5,325
7,985
6,689
7,321
6,390
UNDERSTANDING STATISTICS
Where:
= ith observed value of the variable Y
= ith observed value of the variable X
= regression constant. It is the true Y-intercept
= regression coefficient. It measures the true increases in Y
per unit increase in X
110
Note that
UNDERSTANDING STATISTICS
         x      y      xy       x²      y²
1        x1     y1     x1y1     x1²     y1²
2        x2     y2     x2y2     x2²     y2²
:        :      :      :        :       :
i        xi     yi     xiyi     xi²     yi²
:        :      :      :        :       :
n        xn     yn     xnyn     xn²     yn²
Total    Σx     Σy     Σxy      Σx²     Σy²
,
That there is no significant linear relationship that
exists between X and Y.
,
That there is a significant linear relationship
between X and Y.
a.
b.
c.
d.
e.
f.
ANOVA
Source of     Degrees of    Sum of      Mean        Computed
Variation     Freedom       Squares     Squares     F
Regression    1             SSReg       MSReg       Fc
Error         n - 2         SSError     MSError
Total         n - 1         SSTotal
Science Scores
75
70
95
72
88
85
94
113
UNDERSTANDING STATISTICS
85
70
83
100
99
114
        Math Scores (X)   Science Scores (Y)   x²        y²       xy
        98                75                   9604      5625     7350
        99                70                   9801      4900     6930
        118               95                   13924     9025     11210
        94                72                   8836      5184     6768
        109               88                   11881     7744     9592
        116               85                   13456     7225     9860
        97                94                   9409      8836     9118
        100               85                   10000     7225     8500
        99                70                   9801      4900     6930
        114               93                   12996     8649     10602
SUM     1044              827                  109708    69313    86860
MEAN    104.4             82.7
i)
Where:
Therefore:
ii)
Where:
Therefore:
iii) The resulting regression equation is
ŷ = 6.5336 + 0.7296x
The prediction equation has a y-intercept of 6.534 and a slope equal
to 0.7296. The y-intercept of 6.534 indicates that even if the Math
score is 0 (which is impossible), the Science score is predicted to
be 6.534 or 7. The slope of 0.7296 indicates that the Science score
increases by about 0.7296 for every unit increase in the Math score.
,
That there is no significant linear relationship between the Math
and Science scores of the 10 students
.
2.
, That there is a significant linear relationship
between the Math and Science scores
.
3. Test-Statistics: Use F-test at α = 0.05 level of significance
if
,
4. Rejection Criterion: Reject
115
UNDERSTANDING STATISTICS
5. Computation:
a.
b.
c. SSError = SSTotal - SSReg
= 920.10 - 380.2484
= 539.8516
d.
e.
f.
Source of     Degrees of    Sum of       Mean        Computed
Variation     Freedom       Squares      Squares     F
Regression    1             380.2484     380.2484    5.6349*
Error         8             539.8516     67.4815
Total         9             920.1000
* = significant at 5% level
6. Decision: Since
7. Conclusion: The result indicates that there is a significant
linear relationship between the Math and Science scores of the 10
students.
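The regression coefficients and the ANOVA quantities in this example can be reproduced from the column totals alone. The Python sketch below assumes the usual least-squares formulas, b1 = SPxy / SSx and b0 = ȳ - b1·x̄, with SSReg = b1·SPxy; the variable names are illustrative, not taken from the text.

```python
# Column totals from the Math (X) and Science (Y) scores table.
n = 10
sum_x, sum_y = 1044, 827
sum_xy, sum_x2, sum_y2 = 86860, 109708, 69313

# Corrected sums of cross products and squares.
sp_xy = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x ** 2 / n

# Least-squares slope and intercept.
b1 = sp_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n

# ANOVA decomposition for the regression.
ss_total = sum_y2 - sum_y ** 2 / n
ss_reg = b1 * sp_xy
ss_error = ss_total - ss_reg
ms_error = ss_error / (n - 2)
f_c = ss_reg / ms_error

print(round(b0, 4), round(b1, 4), round(f_c, 4))
```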
116
Correlation Analysis
Correlation Analysis is a statistical technique used to determine
the strength or degree of linear relationship between two variables.
A measure of the degree of linear relationship is called the
correlation coefficient, r. The more pronounced the linear
relationship, the greater the magnitude of the correlation
coefficient.
The value of r ranges from -1 to +1. The correlation coefficient, r,
is interpreted as follows.
Values of r       Qualitative Interpretation
1                 Perfect linear relationship
0.81 to 0.99      Very strong linear relationship
0.61 to 0.80      Strong linear relationship
0.41 to 0.60      Moderate linear relationship
0.21 to 0.40      Weak linear relationship
0.01 to 0.20      Very weak linear relationship
0                 No linear relationship
Where:
sum of the cross products of X and Y
sum of squares of X
sum of squares of Y
a.
if
if
if
,
,
118
Where:
Tc = computed t-statistic
r = correlation coefficient
n= sample size
5. Decision: State your decision based on the rejection criterion
and the computed t-statistic.
6. Conclusion: State your conclusion.
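The test statistic for the correlation coefficient did not survive in the text above. A common form, assumed here, is t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The Python sketch below uses that form; the function name `t_for_correlation` is hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t-statistic for testing whether a correlation coefficient is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Example call with r = 0.960 computed from n = 10 pairs.
t_c = t_for_correlation(0.960, 10)
print(round(t_c, 3))
```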
Illustration:
Problem: Compute and interpret the correlation coefficient for the
following grades of ten education students selected at random
Student   Stat 101 (X)   Math 102 (Y)   x²       y²       xy
1         72             76             5184     5776     5472
2         90             88             8100     7744     7920
3         79             80             6241     6400     6320
4         80             78             6400     6084     6240
5         88             90             7744     8100     7920
6         90             92             8100     8464     8280
7         78             80             6084     6400     6240
8         82             82             6724     6724     6724
9         90             89             8100     7921     8010
10        70             68             4900     4624     4760
Total     819            823            67577    68237    67886
Mean      81.9           82.3
where:
sum of the cross products of X and Y = 482.3
sum of squares of X = 500.9
sum of squares of Y = 504.10
Therefore:
r = 0.960
Based on the qualitative interpretation of r, the result indicates
that there is a direct, very strong linear relationship between the
grades of students in Stat 101 and Math 102. That is, the higher a
student's Stat 101 grade, the higher his or her Math 102 grade.
Scatter plot presentation:
Math
Regression Equation
Coefficient of Determination
Regression Equation
Coefficient of Determination
120
Interpretation:
Regression (Y- intercept)
The prediction equation has a y-intercept of -6.171 and a slope
equal to 1.0724. The y-intercept of -6.171 indicates that if the
score in STAT 101 is 0, the Math 102 score is predicted to be -6.
The slope of 1.0724 indicates that the Math 102 score increases by
about 1.07 for every unit increase in the STAT 101 score.
Coefficient of Determination (R2)
The R2 value of 0.9456 implies that the independent variable STAT
101 (as pre-requisite course) accounts for 94.56% of the variation
in the scores in Math 102. Only about 5.44% is accounted for by
other factors not included in the model.
Testing the Degree of Relationship Between Stat 101 and
MATH 102 Grades:
1.
= 9.697
5. Decision: Since
reject
UNDERSTANDING STATISTICS
Measures of Correlation
+ Relationship
No Relationship
- Relationship
No Relationship
Equation:
$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$
         XY        X²        Y²
         16.72     5,776     0.0484
         18.70     3,025     0.1156
         26.52     6,084     0.1156
         18.69     7,921     0.0441
         8.93      2,209     0.0361
         12.54     3,249     0.0484
         19.95     3,249     0.1225
         28.38     4,356     0.1849
         37.13     6,241     0.2209
Total    187.56    42,110    0.9365
$r = \frac{9(187.56) - (604)(2.77)}{\sqrt{[9(42{,}110) - (604)^2][9(0.9365) - (2.77)^2]}}$
$= \frac{1{,}688.04 - 1{,}673.08}{\sqrt{(378{,}990 - 364{,}816)(8.4285 - 7.6729)}}$
$= \frac{14.96}{\sqrt{(14{,}174)(0.7556)}}$
$= \frac{14.96}{\sqrt{10{,}709.87}}$
$= \frac{14.96}{103.49}$
= 0.145
The result means that there is only a very weak linear relationship
between DBH and height of the seedlings using the Pearson
Product-Moment Coefficient of Correlation.
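The raw-score form of the Pearson coefficient can be checked directly from the column totals. The Python sketch below assumes the standard formula r = (nΣXY - ΣXΣY) / sqrt([nΣX² - (ΣX)²][nΣY² - (ΣY)²]); the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson product-moment correlation from raw-score totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Totals for the 9 seedlings: sum X = 604, sum Y = 2.77.
r = pearson_r(9, 604, 2.77, 187.56, 42110, 0.9365)
print(round(r, 4))
```

Note that both bracketed terms in the denominator subtract the squared total, and the square root is taken over their product.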
123
UNDERSTANDING STATISTICS
The plot shows little relationship between tree height and dbh. In
the regression equation y = 0.0011x + 0.2369, the y-intercept of
0.2369 indicates that when tree height is 0, the predicted dbh is
0.24. The slope of 0.0011 indicates that dbh increases by about
0.0011 for every unit increase in tree height.
Simple Linear Correlation Analysis
It deals with the estimation and test of significance of the simple
linear correlation coefficient r, which is a measure of the degree
of linear relationship between two variables X and Y. This can be
computed using the equation:
$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$
Where:
x = deviate of data X
y = deviate of data Y
Example:
124
         X        Y        x        y        x²         y²         xy
         196      200      13.4     12.9     179.56     166.41     172.86
         171      178      -11.6    -9.1     134.56     82.81      105.56
         170      169      -12.6    -18.1    158.76     327.61     228.06
         207      212      24.4     24.9     595.36     620.01     607.56
         177      180      -5.6     -7.1     31.36      50.41      39.76
         162      165      -20.6    -22.1    424.36     488.41     455.26
         199      201      16.4     13.9     268.96     193.21     227.96
         173      179      -9.6     -8.1     92.16      65.61      77.76
         231      243      48.4     55.9     2342.56    3124.81    2705.56
         140      144      -42.6    -43.1    1814.76    1857.61    1836.06
Total    1826     1871                       6042.4     6976.9     6456.4
Mean     182.6    187.1
$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} = \frac{6456.4}{\sqrt{(6042.4)(6976.9)}} = \frac{6456.4}{6492.859}$
r = 0.994
Interpretation:
Based on the qualitative interpretation of r, the result
indicates that there is a direct very strong linear relationship
between the X and Y variables. That is, the higher is the X
variable, the higher is the Y value.
125
UNDERSTANDING STATISTICS
The plot shows that the X variable is linearly related to the Y
variable. In the regression equation y = 1.0685x - 8.011, the
y-intercept of -8.011 indicates that when X is 0, the predicted Y is
about -8. The slope of 1.0685 indicates that Y increases by about
1.07 for every unit increase in X.
Spearman Rank Correlation
(for Non-parametric data)
$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$
Example:
Entry No.   1st Judge   2nd Judge   D     D²
1           9           7           2     4
2           2           6           -4    16
3           7           8           -1    1
4           4           4           0     0
5           5           9           -4    16
6           8           1           7     49
7           6           2           4     16
8           3           5           -2    4
9           1           3           -2    4
Total                                     110
Computation:
$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(110)}{9(81 - 1)} = 1 - \frac{660}{720} = 1 - 0.9167$
= 0.083
Based on the result, practically no relationship is detected between
the rankings of judge no. 1 and judge no. 2.
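The rank correlation for the two judges can be sketched in Python with the formula 1 - 6ΣD² / (n(n² - 1)). The function name `spearman_rho` is hypothetical, and the sketch assumes the inputs are already ranks without ties, as in this example.

```python
judge_1 = [9, 2, 7, 4, 5, 8, 6, 3, 1]
judge_2 = [7, 6, 8, 4, 9, 1, 2, 5, 3]

def spearman_rho(ranks_a, ranks_b):
    """Return (sum of squared rank differences, Spearman rank correlation)."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho

sum_d2, rho = spearman_rho(judge_1, judge_2)
print(sum_d2, round(rho, 3))
```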
Scatter plot presentation
UNDERSTANDING STATISTICS
Exercise No. 13
SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
1. The following data show the scores (X) of students during an
examination and the number of hours (Y) they studied for the
examination:
X   85   78   90   92   87   89   80   82   81
Y   8    3    10   15   11   12   6    8    7
128
FAMILY
2
800.5
100.5
104.5
200.6
522.0
Consumption
Expenditure(Y)
97.6
53.2
62.8
116.9
218.6
10
11
12
6.8
7.0
6.9
7.2
7.3
7.0
7.0
7.5
7.3
7.1
6.5
6.4
154 167 162 175 190 158 166 195 189 186 148 140
129
UNDERSTANDING STATISTICS
130
After
Before
Note also that those cases which show changes between the 1st and
2nd response appear in cells A and D. An individual is tallied in
cell A if the response changed from + to -, tallied in cell D if it
changed from - to +, and tallied in cell B or cell C if no change is
observed.
The formula for a McNemar Change test is equated as:
131
UNDERSTANDING STATISTICS
Prior
Operation
Total
14
18
3
17
4
8
7
25
2.
3.
4.
5.
6.
132
= 4.50
7. Decision: Since
reject
8. Conclusion: It can be concluded that for those operators who
change, the probability that any operators will change his
source of information from co-fisher farmers to other sources
(PA) is greater than the probability that he will change his
source of information from others to co-fisher farmers
(PD).That is, fish cage operators show a significant tendency
to change their sources of information from co-fisher farmers
to other source when they get familiarity and experience in
fish culture (Tagaro and Tagaro).
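The McNemar formula itself is missing from the text above. A commonly used form with correction for continuity, assumed here, is χ² = (|A - D| - 1)² / (A + D) with df = 1. The cell counts A = 14 and D = 4 used below are read from the garbled table and are an assumption; they are consistent with the reported value of 4.50. The function name is hypothetical.

```python
def mcnemar_chi_square(a, d):
    """McNemar chi-square with continuity correction.

    a and d are the counts in the two cells that record a change.
    """
    return (abs(a - d) - 1) ** 2 / (a + d)

# Assumed counts for the two change cells of the example.
chi_sq = mcnemar_chi_square(14, 4)
print(chi_sq)
```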
Exercise No. 15
McNemar Change Test
1. A scientist is interested in finding out whether the students in
Political Science changed their perception of the government two
years after the election. Immediately after the elections, he asked
the opinion of the community toward the government. Two years
later, he asked the same question to the same people. The result
are as follows:
Before
Favorable(+)
Unfavorable(-)
After Operation
Unfavorable(-)
Favorable(+)
30
19
15
24
UNDERSTANDING STATISTICS
the first year during the law was in effect made them change
their minds, The data are summarized as:
One Year After
Against
Favor
Favor
25
12
3 Years
Before
Against
16
28
3. A group of faculty were asked about their degree of satisfaction
with the university before and after a change in administration.
The test uses significance at 0.05 level of the change in opinion
of the faculty regarding the school administration. The data are
as follows:
Before
Satisfied
Not Satisfied
After
Not Satisfied
Satisfied
40
56
22
18
Sign Test
Functions
The sign test gets its name from the fact that it uses + and - signs
rather than quantitative measures in its data. It is particularly
useful for research in which quantitative measurement is impossible,
but in which it is possible to rank the two members of each pair
with respect to each other.
The sign test is applicable to the case of two related samples when
the researcher wishes to establish that two conditions are
different.
134
Method
The null hypothesis tested by the sign test is that
$P(X_i > Y_i) = P(X_i < Y_i) = \frac{1}{2}$
where Xi is the judgment or score under one condition (or before the
treatment) and Yi is the judgment or score under the other condition
(after the treatment). That is, Xi and Yi are the two scores for a
matched pair. Another way of stating H0 is that the median
difference of X and Y is zero.
In applying the sign test, we focus on the direction of the
difference between every Xi and Yi, noting whether the sign of the
difference is + or -. When H0 is true, we would expect the number of
pairs which have Xi > Yi to be equal to the number of pairs which
have Xi < Yi. That is, if the null hypothesis is true, we would
expect about half of the differences to be negative and half to be
positive. H0 is rejected if too few differences of one sign occur.
Hypothesis Testing Procedure of Sign Test:
1.
UNDERSTANDING STATISTICS
No.   X     Y     di
1     91    88    3
2     88    87    1
3     70    67    3
4     79    69    10
5     85    83    2
6     86    81    5
7     90    93    -3
8     66    67    -1
9     72    76    -4
10    60    55    5
11    75    74    1
12    84    86    -2
13    80    72    8
14    80    90    -10
15    70    75    -5
Using the sign test, is it reasonable to say that the data present
sufficient evidence to conclude that the daily output after the wage
increase is higher than before the increase?
Solution:
1.
X     Y     Sign of di
91    88    +
88    87    +
70    67    +
79    69    +
85    83    +
86    81    +
90    93    -
66    67    -
72    76    -
60    55    +
75    74    +
84    86    -
80    72    +
80    90    -
70    75    -
UNDERSTANDING STATISTICS
5. Decision: Since
, there is no sufficient evidence
to reject .
6. Conclusion: The hourly output after the raise of salary does not
significantly differ from the hourly output before the raise.
Illustrative Example No. 2 (For large Samples, n > 35)
A company claims that its hiring practices are fair, that it does
not discriminate on the basis of gender, and that the fact that 40
of the last 50 new employees are men is just a fluke (an unexpected
random event). The company acknowledges that applicants are about
half men and half women. Test the null hypothesis that men and women
are equal in their ability to be employed by this company. Use the
significance level of 0.05.
Solution:
1.
4. Computation:
If we denote women by + and men by -, we have 10 positive signs and
40 negative signs. The test statistic X is the smaller of 10 and 40,
so X = 10. We note that the value of n = 50 is above 35, so the test
statistic X is converted to the test statistic Z. Here, we use +1
since X < n/2, or X = 10 < 50/2 = 25.
5. Decision: Since
reject
6. Conclusion: There is sufficient evidence to warrant rejection of
the claim that the hiring practices are fair.
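The conversion of X to Z for large samples can be sketched in Python. The formula is not shown in the surviving text; the sketch assumes the common normal approximation z = (2X + 1 - n) / sqrt(n), which is the same as ((X + 0.5) - n/2) / (sqrt(n)/2) and matches the remark about using +1 when X < n/2. The function name `sign_test_z` is hypothetical.

```python
import math

def sign_test_z(x, n):
    """Normal approximation for the sign test when x is below n / 2.

    x is the smaller count of signs and n is the number of signed pairs.
    """
    return (2 * x + 1 - n) / math.sqrt(n)

# 10 positive signs out of 50 new employees.
z = sign_test_z(10, 50)
print(round(z, 2))
```

In a two-tailed test at the 0.05 level the critical values are ±1.96, so a value of z this far below -1.96 falls in the rejection region.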
139
UNDERSTANDING STATISTICS
Exercise No. 12
Sign Test
1. Two different firms design their own IQ tests, and a psychologist
administers both tests to randomly selected students with the
results given below. At the 0.05 level of significance, test the
hypothesis that there is no significant difference between the two
tests.
Test
I
II
99
115
95
103
115
113
102
98
108
112
105
106
92
97
88
97
101
107
99
103
Before
After
97
115
99
103
110
113
112
98
106
112
105
109
92
97
98
95
140
141
UNDERSTANDING STATISTICS
      BEFORE   AFTER   DIFFERENCE   RANK OF DIFFERENCE   SIGNED RANKS
A     67       68      -1           1                    -1
B     78       81      -3           2                    -2
C     81       85      -4           3                    -3
D     72       60      +12          10                   +10
E     75       75      0
F     92       81      +11          8.5                  +8.5
G     84       73      +11          8.5                  +8.5
H     83       78      +5           4                    +4
I     77       84      -7           5                    -5
J     65       56      +9           6                    +6
K     71       61      +10          7                    +7
L     79       64      +15          11                   +11
M     80       63      +17          12                   +12
Independent Sample
(two-sample case)
Chi-square Test for Two Independent Samples
Function
When the data of research consist of frequencies in discrete
categories, the χ² test may be used to determine the significance of
the difference between two independent groups. The measurement
involved may be as weak as nominal scaling.
The hypothesis under test is usually that the two groups differ with
respect to some characteristic and therefore with respect to the
relative frequency with which group members fall in several
categories.
Method
The null hypothesis may be tested by
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Where:
Oij = the observed number of cases in the ijth category
Eij = the expected number of cases in the ijth category when the
null hypothesis is true
k = the number of categories
Note that the most common of all uses of the χ² test is the test of
whether an observed breakdown of frequencies in a 2 x 2 contingency
table could have occurred under H0. When applying a Chi-square test
where both r and k equal 2, the following formula should be used:
$\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)}$ with df = 1
2 x 2 Contingency Table
GROUP    -        +        Total
I        A        B        A + B
II       C        D        C + D
Total    A + C    B + D    N
To test the significance of the computed χ², the researcher has to
refer to the table of critical values of χ² (Appendix Table C). If
the computed χ² is greater than or equal to the critical value of
χ², then reject H0.
Sample Illustration by Tagaro and Tagaro:
A researcher studied the relation of feeding management in tilapia
raising with efficiency. Efficiency is measured in terms of total
weight (in kg) at harvest time. A greater weight means that the
feeding management is more efficient than the other, assuming all
other factors of production are constant. A purposive sample of 60
fishpond operators using feeding a day (A) and 65 fishpond operators
feeding 3 times a day (B) were interviewed. The amounts of feed for
both groups are equal.
                 A     B     Total
Efficient        40    30    70
Not Efficient    20    35    55
Total            60    65    125
Solution:
1. H0: There is no difference in efficiency between the two
feeding management.
Ha: Management has a significant effect on tilapia raising.
2. Test Statistic: The χ² test for two independent samples is chosen
because the two groups (A and B) are independent and because the
scores under study are frequencies in discrete categories (efficient
and not efficient).
3. Level of Significance: Use 5% as the level of significance, N =
125 (number of fishpond operators).
4. Rejection Criterion: Reject H0 if the computed χ² is greater than
or equal to the critical value of χ² at the 5% level with df = 1,
which is 3.84.
5. Computation:
χ² = 4.53
6. Decision: Since the computed χ² of 4.53 is greater than the
critical value of 3.84, reject H0.
7. Conclusion: We conclude that the management has a
significant effect on the tilapia raising.
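The computed value in this illustration can be checked with a short Python sketch. It assumes the 2 x 2 formula with correction for continuity, χ² = N(|AD - BC| - N/2)² / ((A+B)(C+D)(A+C)(B+D)), which reproduces the reported 4.53. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table with continuity correction (df = 1).

    The table is laid out as rows (a, b) and (c, d).
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Efficient: 40 (A), 30 (B); Not efficient: 20 (A), 35 (B).
chi_sq = chi_square_2x2(40, 30, 20, 35)
print(round(chi_sq, 2))
```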
145
UNDERSTANDING STATISTICS
Wilcoxon-Mann-Whitney Test
Function
The Wilcoxon-Mann-Whitney test is a test of the equality of two
population distributions. The test is most useful in testing for
equality of two population means. As such, the test is an
alternative to the two-sample t-test and is used when the assumption
of a normal population distribution is not met. The test is slightly
weaker than the t-test.
The only assumptions required for the test are that random samples
are taken from the two populations of interest and that they are
drawn independently of each other. If we intend to state the
hypothesis in terms of population means or medians, we need to add
the assumption that the difference is in location (mean or median).
Method
Case 1. When the sample sizes are equal, n1 = n2
1. Put all observations in a single array, tagging each observation
to identify its origin.
2. Rank the observations in the combined array.
3. Assign the average rank in case of ties.
4. Sum the ranks of the first sample (T1) and the ranks of the
second sample (T2) and compute T = min (T1, T2).
5. Compare T with the tabular value (Table A10).
6. Decision Criterion: Reject H0 if T ≤ Ttab.
Case 2. When the sample sizes are unequal, n1 < n2.
1. Do steps 1 to 3 in case 1.
2. Find the total of the ranks for the sample that has the smaller
size, n1 (T1).
3. Compute T2 = n1 (n1 + n2 + 1) - T1.
4. Determine T = min (T1, T2).
146
Illustration:
Suppose a researcher wants to compare the daily
expenditures of families in rural area with that of the City. Suppose
further that there are 15 sample respondents from the City, while
there are only 10 respondents from the rural area. Below are the
data corresponding to the groups under consideration:
DAILY EXPENDITURES IN RURAL AREA (PhP):
100, 200, 186, 177, 67, 74, 48, 300, 244, 74
DAILY EXPENDITURES IN CITY (PhP):
250, 300, 600, 134, 890, 52, 570, 153, 462, 115, 405, 117, 334,
157, 224
UNDERSTANDING STATISTICS
Solution:
1. H0: The daily household expenditures in the city are the same
as those of rural areas.
Ha: The daily household expenditures in the city are greater
than those of rural areas.
2. Test Statistics: Use Wilcoxon-Mann-Whitney test at 5% level
of significance. (case 2)
3. Rejection Criterion: Reject H0 if T ≤ Ttab.
4. Computation:
n1 = 10
n2 = 15
Group   A     B     A     A     A     A     B     B     B     B     B     A     A
Array   48    52    67    74    74    100   115   117   134   153   157   177   186
Rank    1     2     3     4.5   4.5   6     7     8     9     10    11    12    13
Group   A     A     B     B     B     A     B     B     B     B     B     B
Array   200   244   244   250   300   300   334   405   462   570   600   890
Rank    14    15.5  15.5  17    18.5  18.5  20    21    22    23    24    25
T1 = 1 + 3 + 4.5 + 4.5 + 6 + 12 + 13 + 14 + 15.5 + 18.5
= 92
T2 = n1 (n1 + n2 + 1) - T1
= 10 (10 + 15 + 1) - 92
= 168
T = min (T1, T2)
= min (92, 168)
= 92
5. Decision: Since T = 92 > T tab = 90, we fail to reject H0.
6. Conclusion: We conclude that the daily household expenditures in
the urban area are the same as those in the rural area.
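The ranking steps of Case 2 can be sketched in Python. The helper `average_ranks` is hypothetical. The city values below are taken as they appear in the ranked array of the solution, where one city observation is listed as 244; that choice is an assumption made so that the sketch follows the printed ranks.

```python
rural = [100, 200, 186, 177, 67, 74, 48, 300, 244, 74]
# City values as listed in the ranked array of the solution (assumption).
city = [250, 300, 600, 134, 890, 52, 570, 153, 462, 115,
        405, 117, 334, 157, 244]

def average_ranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}

ranks = average_ranks(rural + city)
n1, n2 = len(rural), len(city)

t1 = sum(ranks[v] for v in rural)   # rank total of the smaller sample
t2 = n1 * (n1 + n2 + 1) - t1        # conjugate total, as in step 3 of Case 2
t = min(t1, t2)
print(t1, t2, t)
```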
Related Samples
(k samples case)
Friedman Two-Way Analysis of Variance by Ranks
The Friedman test (Friedman two-way analysis of variances
by ranks) is a nonparametric analogue of the parametric two-way
analysis of variance. The objective of this test is to determine if we
may conclude from a sample of results that there is difference
among treatment effects. The first step in calculating the test
statistic is to convert the original results to ranks. Thus, it ranks the
algorithms for each problem separately, the best performing
algorithm should have the rank of 1, the second best rank 2, etc. In
case of ties, average ranks are computed.
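Let $r_{ji}$ be the rank of the jth of k algorithms on the ith of n
data sets. The Friedman test needs the computation of the average
ranks of algorithms, $R_j = \frac{1}{n}\sum_i r_{ji}$. Under the null
hypothesis, which states that all the algorithms behave similarly and
thus their ranks $R_j$ should be equal, the Friedman statistic
$\chi_F^2 = \frac{12n}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$
is distributed approximately as chi-square with k - 1 degrees of
freedom.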
Function
When the data from k matched samples are in at least an
ordinal scale, this test is useful for testing the null hypothesis that k
samples have been drawn from the same population.
149
UNDERSTANDING STATISTICS
150
Teaching Strategies
GROUP   LECTURE   LECTURE W/ POWER POINT   LECTURE W/ FIELD DEMONSTRATION
A       27        30                       39
B       30        35                       45
C       31        31                       41
D       26        40                       42
E       25        28                       42
F       24        42                       45
G       28        33                       40
H       30        28                       42
I       30        35                       39
J       29        45                       45
Solution:
1. H0: The different methods of teaching have no differential
effect.
Ha: The different methods of teaching have differential effect.
2. Test Statistics: Use the non-parametric Friedman two-way analysis
of variance because the scores exhibited a possible lack of
homogeneity of variance, and thus the data suggested that one of the
basic assumptions of the F-test was not met.
3. Level of Significance: Use 5% level of significance with
N=10 the number of students in the three matched groups.
4. Rejection Criterion: Reject H0 if
5. Computation:
Methods of Teaching
GROUP   LECTURE   LECTURE W/ POWER POINT   LECTURE W/ FIELD DEMONSTRATION
1       1         2                        3
2       1         2                        3
3       1.5       1.5                      3
4       1         2                        3
5       1         2                        3
6       1         2                        3
7       1         2                        3
8       1         2                        3
9       1         2                        3
10      1         2                        3
Rj      10.5      19.5                     30
Fr = 19.05
, reject H0.
6. Decision: Since
7. Conclusion: The different methods of teaching have different
effects.
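The Friedman statistic for this example can be reproduced from the rank totals. The formula is not shown in the surviving text; the Python sketch assumes the usual rank-sum form, Fr = 12 / (nk(k + 1)) · ΣRj² - 3n(k + 1), where n is the number of matched groups and k is the number of conditions. The function name `friedman_statistic` is hypothetical.

```python
def friedman_statistic(rank_sums, n):
    """Friedman Fr from the column rank totals Rj of n matched groups."""
    k = len(rank_sums)
    sum_sq = sum(r ** 2 for r in rank_sums)
    return 12 / (n * k * (k + 1)) * sum_sq - 3 * n * (k + 1)

# Rank totals for Lecture, Lecture w/ PowerPoint, Lecture w/ Field Demonstration.
f_r = friedman_statistic([10.5, 19.5, 30], n=10)
print(round(f_r, 2))
```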
Multiple Comparisons Between Groups or Conditions
From the Result of the Friedman Two-Way ANOVA
When the obtained value of Fr is significant, it indicates that at
least one of the conditions differs from at least one other condition. It
does not tell the researcher which one is different, nor does it tell
the researcher how many of the groups differ from each other.
That is, when the obtained value of Fr is significant, we would like to
test the hypothesis H0: θu = θv against the alternative hypothesis Ha:
θu ≠ θv for some conditions u and v. There is a simple procedure
for determining which conditions differ. Begin by determining the
differences |Ru - Rv| for all pairs of conditions or groups. When the
sample size is large, these differences are approximately normally
distributed; however, because they are not independent, the
comparison procedure must be adjusted appropriately. Suppose the
hypothesis of no difference between k conditions or matched
groups was tested and rejected at the α significance level. Then we can
test the significance of individual pairs of differences by using the
following inequality. That is, if
$$|R_u - R_v| \ge z_{\alpha/k(k-1)}\sqrt{\frac{Nk(k+1)}{6}}$$
where $z_{\alpha/k(k-1)}$ is the standard normal value with upper-tail area
α/k(k-1) (Appendix Table AII), then we may reject the hypothesis
H0: θu = θv and conclude that θu ≠ θv. Thus, if the difference between
the rank sums of two conditions exceeds this critical value, we may
conclude that the two conditions differ.
Example:
In the example above regarding teaching strategies, Fr is
significant at the 5% level. The following rank sums were obtained: RL =
10.5 (Lecture Method), RLH = 19.5 (Lecture with power point
presentation), and RLD = 30 (Lecture with Field Demonstration). We
have the following differences:
|RL - RLH| = |10.5 - 19.5| = 9
|RL - RLD| = |10.5 - 30.0| = 19.5
|RLH - RLD| = |19.5 - 30.0| = 10.5
We then find the critical difference $z_{\alpha/k(k-1)}\sqrt{Nk(k+1)/6}$. Since α =
0.05 and k = 3, the number of comparisons, #c, is equal to
k(k - 1)/2 = 3(2)/2 = 3. Referring to Appendix Table AII, we see that the
value of z is 2.394. The critical difference is then
$$2.394\sqrt{\frac{(10)(3)(4)}{6}} = 2.394\sqrt{20} = 10.71$$
Only the difference |RL - RLD| = 19.5 exceeds 10.71, so only the Lecture
and Lecture with Field Demonstration methods differ significantly.
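The pairwise comparison can be sketched in a few lines. This sketch assumes the critical difference $z_{\alpha/k(k-1)}\sqrt{Nk(k+1)/6}$ for rank sums and obtains z from Python's `statistics.NormalDist` instead of Appendix Table AII; the short labels L, LH and LD follow the example.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

rank_sums = {"L": 10.5, "LH": 19.5, "LD": 30.0}  # rank sums from the example
n_groups, k, alpha = 10, 3, 0.05

# z with upper-tail area alpha / (k(k-1)); Table AII lists 2.394 for #c = 3.
z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
critical_difference = z * sqrt(n_groups * k * (k + 1) / 6)

significant = {}
for u, v in combinations(rank_sums, 2):
    difference = abs(rank_sums[u] - rank_sums[v])
    significant[(u, v)] = difference >= critical_difference
    print(f"|R_{u} - R_{v}| = {difference:.1f}, significant: {significant[(u, v)]}")

print(f"critical difference = {critical_difference:.2f}")
```

Under these assumptions only the Lecture versus Lecture with Field Demonstration pair exceeds the critical difference; the pair LH versus LD (10.5) falls just short of it.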
Method
To compare an observed with an expected group of
frequencies, we must be able to state what frequencies would be
expected. The null hypothesis states the proportion of objects falling
in each of the categories in the presumed population. That is, from the
null hypothesis we may deduce what the expected frequencies are.
The chi-square technique gives the probability that the observed
frequencies could have been sampled from a population with the
given expected values.
The null hypothesis may be tested by using the following statistic:
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Where:
$O_{ij}$ = the observed number of cases in the ijth category
$E_{ij}$ = the expected number of cases in the ijth category when the null
hypothesis is true
k = the number of categories
and the sum is taken over all categories.
If the agreement between the observed and expected
frequencies is close, the differences ($O_{ij} - E_{ij}$) will be small and the
chi-square will be small. However, if the divergence is large, the
value of chi-square as computed in the equation will likewise be
large. Roughly, the bigger the value of chi-square, the less likely it is
that the observed frequencies came from the population on which
the null hypothesis and the expected frequencies are based.
Summary Procedure:
These are the steps in the use of the chi-square test for k
independent samples:
Group      Root Class 1    Root Class 2    Root Class 3
           (1-10)          (11-20)         (21 & Above)
1          21              36              30
2          48              26              19
Group      R. Class 1    R. Class 2    R. Class 3    Total
           (1-10)        (11-20)       (21 +)
1          21            36            30            87
2          48            26            19            93
Total      69            62            49            180
Solution:
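As a rough check on the computation for this table, the expected frequencies and the chi-square value can be obtained as follows. This is a sketch under the usual assumption that each expected frequency is (row total × column total) / grand total; the variable names are illustrative and the printed solution of the book is not reproduced here.

```python
# Observed frequencies from the 2 x 3 table (rows: groups, columns: root classes).
observed = [
    [21, 36, 30],
    [48, 26, 19],
]

row_totals = [sum(row) for row in observed]        # 87, 93
col_totals = [sum(col) for col in zip(*observed)]  # 69, 62, 49
grand_total = sum(row_totals)                      # 180

# Expected frequency of each cell under the null hypothesis of independence.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.2f} with {df} degrees of freedom")
```

The result is then compared with the tabular chi-square value for 2 degrees of freedom at the chosen level of significance.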
Menu B    Menu C    Menu D
59        45        62
60        48        64
63        51        68
65        54        70
67        55        72
Solution:
1. H0: The four menus have the same acceptability level.
Ha: The four menus differ in acceptability level.
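             Menu B    Menu C    Menu D
             59        45        62
             60        48        64
             63        51        68
             65        54        70
             67        55        72
Rank sum     69.50     23.00     85.00
Mean rank    13.90     4.60      17.00
n            5         5         5

$$H = \frac{12}{N(N+1)}\sum_{j=1}^{k}\frac{R_j^2}{n_j} - 3(N+1) = (0.02857)(2728.1) - 63 = 14.942$$
6. Decision: Since the computed H = 14.942 exceeds the tabular value,
reject H0.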
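The value of H can be retraced from the rank sums. In the sketch below the rank sum of Menu A is not taken from the text: it is inferred from the fact that the ranks 1 to N = 20 must add up to N(N+1)/2 = 210, which is an assumption of this example. No correction for ties is applied, matching the arithmetic shown above.

```python
# Rank sums shown for Menus B, C and D; each menu has 5 observations.
rank_sums = {"Menu B": 69.5, "Menu C": 23.0, "Menu D": 85.0}
n_per_menu = 5
k = 4                      # four menus in the hypothesis
n_total = n_per_menu * k   # N = 20

# Inferred (not stated in the text): the remaining ranks belong to Menu A.
total_of_ranks = n_total * (n_total + 1) / 2
rank_sums["Menu A"] = total_of_ranks - sum(rank_sums.values())

sum_r2_over_n = sum(r ** 2 / n_per_menu for r in rank_sums.values())
h = 12 / (n_total * (n_total + 1)) * sum_r2_over_n - 3 * (n_total + 1)

print(f"sum of Rj^2/nj = {sum_r2_over_n:.1f}")
print(f"H = {h:.3f}")
```

With unrounded arithmetic H comes out slightly above the 14.942 in the text, which uses the rounded factor 0.02857.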
Exercise No. 16
Test yourself:
Find the significance of the difference in the judging of the beauty contestants, using
$$\rho = 1 - \frac{6\sum D^2}{n(n^2-1)}$$

Entry No.    1st Judge    2nd Judge    D     D2
1            9            7            2     4
2            2            6            -4    16
3            7            8            -1    1
4            4            4            0     0
5            5            9            -4    16
6            8            1            7     49
7            6            2            4     16
8            3            5            -2    4
9            1            3            -2    4
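The D and D2 columns and the resulting coefficient can be verified with a short script. This is a minimal sketch of the rank-correlation formula above applied to the two judges' rankings; the variable names are illustrative.

```python
# Rankings given by the two judges to entries 1 to 9.
judge_1 = [9, 2, 7, 4, 5, 8, 6, 3, 1]
judge_2 = [7, 6, 8, 4, 9, 1, 2, 5, 3]

d = [a - b for a, b in zip(judge_1, judge_2)]   # column D
sum_d2 = sum(x * x for x in d)                  # sum of column D2
n = len(d)

rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of D2 = {sum_d2}, rho = {rho:.4f}")
```

The coefficient is then tested for significance against the tabular value for n = 9.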
ANALYSIS OF VARIANCE
ONE WAY CLASSIFICATION DESIGN: CRD
CRD is a design wherein the allocation of treatments is done by
randomizing the treatments completely over the entire experimental
units (eus) without any restriction imposed on the units and there is
only one criterion for data classification.
CRD is commonly used when:
1. The eus are sufficiently homogeneous (like dishes of culture
medium) and
2. Effective local control is assured (as in laboratories and
greenhouses).
Randomization and layout
Suppose there are t = 3 treatments, T1, T2, and T3, which are
replicated r1 = 2, r2 = 3 and r3 = 4 times, respectively; hence, the number
of eus required is n = r1 + r2 + r3 = 9.
The randomization (using the random number generator key on a calculator)
may be as follows:
1. Label the eus consecutively from 1 to n = 9.
2. Obtain a sequence of n = 9 random numbers. Rank the
numbers in increasing order. Using the sequence of ranks as a
randomization of the eus, assign the first r1 = 2 eus to T1, the next
r2 = 3 eus to T2 and the last r3 = 4 eus to T3.
Random no.:      0.678   0.124   ...
Rank (eu no.):   7       3       ...
Treatment:       T1 (first 2 ranks), T2 (next 3 ranks), T3 (last 4 ranks)

Resulting layout, eus 1 to 9:
T3   T2   T1   T3   T2   T2   T1   T3   T3
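The randomization steps can be sketched in code. This is an illustration only, assuming Python's `random` module in place of the calculator's random number key; the function name `randomize_crd` and the seed are hypothetical choices.

```python
import random


def randomize_crd(replications, seed=None):
    """Return the treatment assigned to eu 1..n for a completely randomized design.

    replications maps each treatment label to its number of replications.
    """
    rng = random.Random(seed)
    n = sum(replications.values())

    # Step 2: draw n random numbers and rank them in increasing order.
    numbers = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=lambda i: numbers[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank

    # The sequence of ranks gives the eu numbers: the first r1 go to T1, and so on.
    assignment = {}
    position = 0
    for treatment, r in replications.items():
        for eu in ranks[position:position + r]:
            assignment[eu] = treatment
        position += r

    return [assignment[eu] for eu in range(1, n + 1)]


layout = randomize_crd({"T1": 2, "T2": 3, "T3": 4}, seed=1)
print(layout)
```

Every run with a different seed gives a different layout, but each treatment always receives exactly its required number of eus.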
Treatment    Observations (Yij)           Reps (ri)    Total (Yi.)    Mean
T1           Y11   Y12   ...   Y1r1       r1           Y1.            $\bar{Y}_{1.}$
T2           Y21   Y22   ...   Y2r2       r2           Y2.            $\bar{Y}_{2.}$
...
Tt           Yt1   Yt2   ...   Ytrt       rt           Yt.            $\bar{Y}_{t.}$
Grand total                                            Y..
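Linear Model:
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, t; \; j = 1, \dots, r_i$$
where
$\mu$ = overall population mean
$\tau_i$ = effect of the ith treatment
$\varepsilon_{ij}$ = residual or error effect of the jth measurement of the ith treatment
= deviation of the ijth observation from the ith treatment mean,
with $\varepsilon_{ij} \sim NID(0, \sigma^2)$
(NID = normally, independently distributed)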
Advantages
Completely flexible in the number of replications or treatments
Missing observations are not a problem
Maximum error degrees of freedom of any design
Analysis of data:
Illustration 1:
Three methods of soil analysis, S1, S2, S3, were tried by a research
institute. Twelve uniform soil samples were taken from a certain
farm and S1 was randomly assigned to 5 of the soil samples, S2 to 3
samples and S3 to 4 samples. The researcher was interested in the
time to complete the soil analysis. The data on time (in hours) of the
soil analysis were summarized as follows:
Method    Time to complete analysis (hrs.)       ri    Total    Mean
S1        2.4    3.8    2.9    4.6    3.1        5     16.8     3.36
S2        4.8    1.6    0.2                      3     6.6      2.20
S3        7.2    5.3    2.9    3.5               4     18.9     4.72
Total     14.4   10.7   6.0    8.1    3.1        12    42.3     3.52 (grand mean)

At the α = 5% level of significance, test the hypothesis that there is no difference in the
time to complete the soil analysis for the three methods.
Source of Variance    df     SS      MS      F Computed    F tab
Treatment             t-1    TrSS    MSTr    MSTr/MSE      F[(t-1), (n-t)]
Error                 n-t    ESS     MSE
Total                 n-1    TSS
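Computations:
Let Yij = jth observation on the ith treatment
Yi. = total of observations on the ith treatment
Y.. = grand total of all observations.
Then compute
a. the sums of squares
$$CF = \frac{(Y_{..})^2}{n} = \frac{(42.3)^2}{12} = 149.11$$
$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{r_i} Y_{ij}^2 - CF = 185.61 - 149.11 = 36.50$$
$$TrSS = \sum_{i=1}^{t} \frac{Y_{i.}^2}{r_i} - CF = 160.27 - 149.11 = 11.16$$
$$ESS = TSS - TrSS = 36.50 - 11.16 = 25.34$$
b. the mean squares
$$MSTr = \frac{TrSS}{t-1} = \frac{11.16}{2} = 5.58 \qquad MSE = \frac{ESS}{n-t} = \frac{25.34}{9} = 2.82$$
c. the F ratio
$$F = \frac{MSTr}{MSE} = \frac{5.58}{2.82} = 1.98$$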
Source of Variance    df    SS       MS       F Computed    F tab
Treatment             2     11.16    5.581    1.98ns        4.26
Error                 9     25.34    2.82
TOTAL                 11    36.50
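The entries of the ANOVA table can be reproduced from the raw data. The sketch below follows the CF, TSS, TrSS and ESS formulas of the computations; the variable names are illustrative, and the tabular F of 4.26 is taken from the text, not computed.

```python
# Time to complete the soil analysis (hrs.) for the three methods.
data = {
    "S1": [2.4, 3.8, 2.9, 4.6, 3.1],
    "S2": [4.8, 1.6, 0.2],
    "S3": [7.2, 5.3, 2.9, 3.5],
}

n = sum(len(values) for values in data.values())  # 12 observations
t = len(data)                                     # 3 treatments
grand_total = sum(sum(values) for values in data.values())

cf = grand_total ** 2 / n                                        # correction factor
tss = sum(y ** 2 for values in data.values() for y in values) - cf
trss = sum(sum(values) ** 2 / len(values) for values in data.values()) - cf
ess = tss - trss

mstr = trss / (t - 1)
mse = ess / (n - t)
f_computed = mstr / mse
f_tab = 4.26  # F[2, 9] at the 5% level, as given in the ANOVA table

print(f"TrSS = {trss:.2f}, ESS = {ess:.2f}, TSS = {tss:.2f}")
print(f"F = {f_computed:.2f}, significant: {f_computed > f_tab}")
```

Since the computed F is below the tabular value, the treatment effect is not significant, which is what the "ns" in the table denotes.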
1. Coefficient of variation (CV):
$$CV = \frac{\sqrt{MSE}}{\bar{Y}_{..}} \times 100 = \frac{\sqrt{2.82}}{3.52} \times 100 = 47.60\%$$
2. Standard error of a treatment mean, s.e.($\bar{Y}_{i.}$), is the measure of the
average error in estimating the true treatment mean. It measures
the degree of precision of $\bar{Y}_{i.}$ as the estimate of the true treatment
mean.
$$s.e.(\bar{Y}_{i.}) = \sqrt{MSE / r_i}$$
3. Standard error of the difference between two treatment means,
s.e.($\bar{Y}_{i.} - \bar{Y}_{i'.}$), is a measure of the average error in estimating the
difference between two treatment means. It measures the degree of
precision of ($\bar{Y}_{i.} - \bar{Y}_{i'.}$) as the estimate of the difference between the
true means of treatment i and treatment i'.
$$s.e.(\bar{Y}_{i.} - \bar{Y}_{i'.}) = \sqrt{MSE\,(1/r_i + 1/r_{i'})} \quad \text{for unequal replication}$$
$$s.e.d = \sqrt{2\,MSE / r} \quad \text{for equal replication}$$
$\bar{Y}_{1.}$ = 3.36, $\bar{Y}_{2.}$ = 2.20, $\bar{Y}_{3.}$ = 4.72, $\bar{Y}_{..}$ = 3.52

Method    Mean    Effect
S1        3.36    -0.16
S2        2.20    -1.32
S3        4.72    1.20
Mean      3.52    0.00
Note: Since the ANOVA test showed no significant treatment effect, the observed effects
cannot be regarded as significant.
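The coefficient of variation, the standard errors and the treatment effects for the soil analysis illustration can be computed together. This is a sketch assuming MSE = 25.34/9 from the ANOVA table and the treatment totals 16.8, 6.6 and 18.9; the effect of a treatment is taken as its mean minus the grand mean, and all names are illustrative.

```python
from math import sqrt

mse = 25.34 / 9                              # error mean square from the ANOVA table
reps = {"S1": 5, "S2": 3, "S3": 4}           # replications r_i
totals = {"S1": 16.8, "S2": 6.6, "S3": 18.9}  # treatment totals Y_i.

n = sum(reps.values())
grand_mean = sum(totals.values()) / n        # 42.3 / 12
means = {m: totals[m] / reps[m] for m in reps}

cv = sqrt(mse) / grand_mean * 100
se_mean = {m: sqrt(mse / reps[m]) for m in reps}
se_diff_s1_s2 = sqrt(mse * (1 / reps["S1"] + 1 / reps["S2"]))  # unequal replication
effects = {m: means[m] - grand_mean for m in reps}

print(f"CV = {cv:.2f}%")
for m in reps:
    print(f"{m}: mean = {means[m]:.2f}, s.e. = {se_mean[m]:.2f}, effect = {effects[m]:.2f}")
print(f"s.e. of (S1 - S2) = {se_diff_s1_s2:.2f}")
```

With unequal replication the effects, weighted by their replications, add up to zero, which serves as a check on the arithmetic.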
imply that the data starts at row 2. Make sure that the order of
the variables in the input statement is exactly the same as the order
entered in Excel. Click the run icon. If everything is all right, you
will get the following in the output window of SAS. If there is a mistake in
the program statements, the program may not run; check the log window
for mistakes and comments on the program statements.
Source            df     MS
Hybrids           9      Variation among hybrids
Within hybrids    20     Variation among reps within hybrids = experimental error
Within plots      120    Variation within plots = sampling variation
Total             149    (all observations)
To use RCBD
Now:
Ti
NID
NID
ij
NID
Randomization Procedure
(1)
(3)
Advantages
(1)
(2)
(3)
Disadvantages
(1)
(2)
to
Source           df            SS                                   MS
                               Definition    Calculation
Blocks           (r-1)
Treatments       (t-1)
B x T (error)    (r-1)(t-1)                  SSTotal - SSB - SST
Total            rt-1
[Figure: arrangement of blocks I, II and III relative to the fertility gradient; one arrangement is marked WRONG WAY]
Appendix 1
OUTLINE OF RESEARCH METHODS IN EDUCATION
Research is a scientific investigation of phenomena which includes the
collection, presentation, analysis, and interpretation of facts that link
man's speculation with reality.
Characteristics of Researcher
1. Intellectual curiosity
2. Prudence
3. Healthy criticism
4. Intellectual honesty
Qualities of a Good Researcher
R - research-oriented
E - efficient
S - scientific
E - effective
A - active
R - resourceful
C - creative
H - honest
E - economical
R - religious
Characteristics of Research
1. Empirical
2. Logical
3. Cyclical
4. Analytical
5. Replicability
6. Critical
Types of Research
1. Pure research
2. Applied research
3. Action research
Classification of Research
1. Library research
2. Field research
3. Laboratory research
The Variable
Types:
1. Independent variable - the stimulus variable
2. Dependent variable - the response variable
3. Moderator variable - a special type of independent variable that may alter
or modify the relationship between the independent and dependent variables
4. Control variable - a variable controlled by the researcher, whose
effect can be neutralized by eliminating or removing the variable
5. Intervening variable - a variable that interferes with the independent and
dependent variables and may either strengthen or weaken the relationship
between the two
Schematic Diagram of the Research Process
Problem/Objectives
Theoretical/Conceptual Framework
Assumptions
Hypothesis
Research Design
Data Collection
Research Designs
1. Historical Designs
Steps:
a. Collection of data
b. Criticism of the data collected
c. Presentation of the facts
2. Descriptive Designs
Types:
a. Descriptive-survey
b. Descriptive-normative survey
c. Descriptive-status
d. Descriptive-analysis
e. Descriptive-classification
f. Descriptive-evaluative
g. Descriptive-comparative
h. Correlational survey
i. Longitudinal survey
3. Case Study Design
Steps:
a. Recognition and determination of the status of the problem to be
investigated
b. Collection of data
c. Diagnosis or identification of the causal factors
d. Application of remedial or adjustment measures
e. Subsequent follow-up to determine the effectiveness of the corrective
or developmental measures applied
Qualities of a Good Research Instrument
1. Validity
a. Content validity
b. Concurrent validity
2. Reliability
3. Usability
Sampling Designs
1. Scientific sampling
a. Random sampling
b. Systematic sampling
- Stratified sampling design
- Multi-stage sampling design
- Clustering
2. Non-scientific sampling
a. Purposive sampling
b. Incidental sampling
c. Quota sampling
Appendix 2
Table A
Z       0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.4    0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002
-3.3    0.0005  0.0005  0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003
-3.2    0.0007  0.0007  0.0006  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005
-3.1    0.0010  0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007
-3.0    0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
-2.9    0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8    0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7    0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6    0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5    0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4    0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3    0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2    0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1    0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0    0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9    0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
-1.8    0.0359  0.0352  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
-1.7    0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
-1.6    0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
-1.5    0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
-1.4    0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0722  0.0708  0.0694  0.0681
-1.3    0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
-1.2    0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
-1.1    0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
-1.0    0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
-0.9    0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611
-0.8    0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
-0.7    0.2420  0.2389  0.2358  0.2327  0.2296  0.2266  0.2236  0.2206  0.2177  0.2148
-0.6    0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
-0.5    0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
-0.4    0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
-0.3    0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
-0.2    0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
-0.1    0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
-0.0    0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
0.0     0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1     0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2     0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3     0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4     0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5     0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6     0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7     0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8     0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9     0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0     0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1     0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2     0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3     0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4     0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9278  0.9292  0.9306  0.9319
1.5     0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6     0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7     0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8     0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9     0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0     0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1     0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2     0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3     0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4     0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5     0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6     0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7     0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8     0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9     0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0     0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1     0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2     0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3     0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4     0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
Appendix A is taken from Table A.4, Introduction to Statistics, R.E. Walpole, 3rd Ed., Macmillan
Publishing Company, Inc.
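Individual entries of the normal table can be cross-checked by computing the cumulative area directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `table_row` is illustrative, and for negative rows the column value is added in the negative direction, as in the printed table.

```python
from math import copysign
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def table_row(z_row):
    """Areas under the normal curve for z_row with second decimals 0.00 to 0.09."""
    sign = copysign(1.0, z_row)  # keeps the sign of -0.0 for the row labelled -0.0
    return [round(standard_normal.cdf(z_row + sign * j / 100), 4) for j in range(10)]


print(table_row(1.9))   # row Z = 1.9; the entry under 0.06 is the area below z = 1.96
print(table_row(-1.9))  # row Z = -1.9; the entry under 0.06 is the area below z = -1.96
```

A printed table may differ from the computed value by one unit in the last decimal place for a few entries because of rounding.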
Two-tailed α    0.20    0.10    0.05    0.02    0.01    0.002    0.001     0.0001     0.00001
One-tailed α    0.10    0.05    0.025   0.01    0.005   0.001    0.0005    0.00005    0.000005
z               1.282   1.645   1.960   2.326   2.576   3.090    3.291     3.891      4.417
Taken from Nonparametric Statistics for the Behavioral Sciences, S. Siegel and N.J. Castellan,
McGraw-Hill Book Company.
Appendix Table AII
Two-tailed α    0.30     0.25     0.20     0.15     0.10     0.05
One-tailed α    0.15     0.125    0.10     0.075    0.05     0.025
#c
1               1.036    1.150    1.282    1.440    1.645    1.960
2               1.440    1.534    1.645    1.780    1.960    2.241
3               1.645    1.732    1.834    1.960    2.128    2.394
4               1.780    1.863    1.960    2.080    2.241    2.498
5               1.881    1.960    2.054    2.170    2.326    2.576
6               1.960    2.037    2.128    2.241    2.394    2.638
7               2.026    2.100    2.189    2.300    2.450    2.690
8               2.080    2.154    2.241    2.350    2.498    2.734
9               2.128    2.200    2.287    2.394    2.539    2.773
10              2.170    2.241    2.326    2.432    2.576    2.807
11              2.208    2.278    2.362    2.467    2.608    2.838
12              2.241    2.301    2.394    2.498    2.638    2.886
15              2.326    2.394    2.475    2.576    2.713    2.935
21              2.450    2.515    2.593    2.690    2.823    3.038
28              2.552    2.615    2.690    2.785    2.913    3.125
Taken from Nonparametric Statistics for the Behavioral Sciences, S. Siegel and N.J. Castellan,
McGraw-Hill Book Company.
Table AB
Wilcoxon's Paired Signed Ranks Test Critical Values
One-Tailed    0.05    0.025   0.01    0.005
Two-Tailed    0.10    0.05    0.02    0.01
n
5             1
6             2       1
7             4       2       0
8             6       4       2       0
9             8       6       3       2
10            11      8       5       3
11            14      11      7       5
12            17      14      10      7
13            21      17      13      10
14            26      21      16      13
15            30      25      20      16
16            36      30      24      19
17            41      35      28      23
18            47      40      33      28
19            54      46      38      32
20            60      52      43      37
25            101     90      77      68
30            152     137     120     109
40            287     264     238     221
50            466     434     398     373
Table A C
Wilcoxon's Two-Sample Rank Test (The Mann-Whitney Test)
(These values or smaller cause rejection. Two-tailed test. Take n1 ≤ n2.)
0.05 Level of Significance
n2
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
3
3
3
4
4
4
4
4
4
5
5
5
5
6
6
6
6
6
7
7
7
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
10
11
12
13
14
15
15
16
17
18
19
20
21
21
22
23
24
25
26
27
28
28
29
17
18
20
21
22
23
24
26
27
28
29
31
32
33
34
25
27
28
29
40
42
26
27
29
31
32
34
35
37
38
40
42
43
45
46
48
50
51
53
55
36
38
40
42
44
46
48
50
52
54
56
58
60
62
64
66
68
49
51
53
55
58
60
63
65
67
70
72
74
77
79
82
n1
9
63
65
68
71
73
76
79
82
84
87
90
93
95
10
11
12
13
14
15
78
81
85
88
91
94
97
100
103
107
110
96
99
103
106
110
114
117
121
124
115
119
123
127
131
135
139
137
141 160
145 164 185
150 169
154
Table A C Continued
Wilcoxon's Two-Sample Rank Test (The Mann-Whitney Test)
(These values or smaller cause rejection. Two-tailed test. Take n1 ≤ n2.)
n2
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
3
3
3
3
3
3
3
3
4
4
6
6
6
7
7
7
8
8
8
8
9
9
9
10
10
10
11
11
11
10
10
11
11
12
12
13
14
14
15
15
16
16
17
18
18
19
19
20
20
21
5
15
16
17
17
18
19
20
21
22
22
23
24
25
26
27
28
29
29
30
31
32
23
24
25
26
27
28
30
31
32
33
34
36
37
38
39
40
42
43
44
32
34
35
37
38
40
41
43
44
46
47
49
50
52
53
55
57
43
45
47
49
51
53
54
56
58
60
62
64
66
68
70
56
58
61
63
65
67
70
72
74
76
78
81
83
71
74
76
79
81
84
86
89
92
94
97
87
90
93
96
99
102
105
108
111
12
13
14
15
106
109
112
115
119
122
125
125
129 147
133 151 171
137 155
140
Appendix Table B
Critical Values of the t Distribution
df      0.10     0.05     0.025     0.01      0.005
1       3.078    6.314    12.706    31.821    63.657
2       1.886    2.920    4.303     6.965     9.925
3       1.638    2.353    3.182     4.541     5.841
4       1.533    2.132    2.776     3.747     4.604
5       1.476    2.015    2.571     3.365     4.032
6       1.440    1.943    2.447     3.143     3.707
7       1.415    1.895    2.365     2.998     3.499
8       1.397    1.860    2.306     2.896     3.355
9       1.383    1.833    2.262     2.821     3.250
10      1.372    1.812    2.228     2.764     3.169
11      1.363    1.796    2.201     2.718     3.106
12      1.356    1.782    2.179     2.681     3.055
13      1.350    1.771    2.160     2.650     3.012
14      1.345    1.761    2.145     2.624     2.977
15      1.341    1.753    2.131     2.602     2.947
16      1.337    1.746    2.120     2.583     2.921
17      1.333    1.740    2.110     2.567     2.898
18      1.330    1.734    2.101     2.552     2.878
19      1.328    1.729    2.093     2.539     2.861
20      1.325    1.725    2.086     2.528     2.845
21      1.323    1.721    2.080     2.518     2.831
22      1.321    1.717    2.074     2.508     2.819
23      1.319    1.714    2.069     2.500     2.807
24      1.318    1.711    2.064     2.492     2.797
25      1.316    1.708    2.060     2.485     2.787
26      1.315    1.706    2.056     2.479     2.779
27      1.314    1.703    2.052     2.473     2.771
28      1.313    1.701    2.048     2.467     2.763
29      1.311    1.699    2.045     2.462     2.756
Inf.    1.282    1.645    1.960     2.326     2.576
References
Aczel, A. 1989. Complete Business Statistics. Richard D. Irwin, Inc.
Anderson, R.L. and Bancroft, T.A. 1952. Statistical Theory in Research. McGraw-Hill
Book Co., Inc.
Calmorin, L.P. and Calmorin, M.A. 1999. Methods of Research and Thesis Writing. 1st
Ed. Rex Book Store, Inc., Manila, Philippines.
Daleon, S., Sanches, L. and Marquez, T. 1996. Fundamentals of Statistics. National
Book Store, Inc.
Draper, N. and Smith, H. 1966. Applied Regression Analysis. John Wiley and Sons,
Inc.
Gomez, K.A. and Gomez, A.A. 1984. Statistical Procedures for Agricultural Research.
2nd Ed. An International Rice Research Institute Book. John Wiley and
Sons, Inc.
Iman, R.L. and Conover, W.J. 1983. A Modern Approach to Statistics. John Wiley and
Sons, Inc.
Johnston, J. 1972. Econometric Methods. 2nd Ed. McGraw-Hill Book Co., Inc.
Mendenhall, W. and Sincich, T. 1989. A Second Course in Business Statistics: Regression
Analysis. Dellen Publishing Company.
Ostle, B. 1966. Statistics in Research. Iowa State University Press, Ames, Iowa.
Pacificador, A. Jr. 1997. Outreach Seminar on Statistics for Researchers. Urios
College, Butuan City.
Parel, P.C. 1996. Introduction to Statistical Methods with Application. Macaraig
Publishing Co., Inc., Manila, Philippines.
Searle, S.R. 1971. Linear Models. John Wiley and Sons, Inc., New York.
Siegel, S. and Castellan, N.J. 1988. Nonparametric Statistics for the Behavioral Sciences.
McGraw-Hill Book Co.
Snedecor, G.W. and Cochran, W.G. 1957. Statistical Methods. 5th Ed. Iowa State
University Press.
Snedecor, G.W. and Cochran, W.G. 1956. Statistical Methods. Iowa State College
Press, Ames, Iowa.
Spiegel, M. 1978. Statistics. McGraw-Hill Book Inc.
Steel, R.G.D. and Torrie, J.H. 1960. Principles and Procedures of Statistics. McGraw-Hill
Book Co., Inc., New York.
Tagaro, C.A. and Tagaro, A.T. Statistics Made Easy. 11th Edition. University of
Southern Mindanao, Kabacan, Cotabato, Philippines.
Walpole, R.E. 1982. Introduction to Statistics. 3rd Ed. Macmillan Publishing Co., Inc.