Unit 14 Analysis of Quantitative Data (Descriptive Statistical Measures Selection AND Application)
Structure
14.1 Introduction
14.2 Objectives
14.3 Types of Data
    14.3.1 Qualitative Data
    14.3.2 Quantitative Data
14.4 Graphic Representation of Quantitative Data
    14.4.1 The Histogram or Column Diagram
    14.4.2 Frequency Polygon
    14.4.3 Cumulative Frequency Curve
    14.4.4 Cumulative Percentage Curve or Ogive
14.5 Descriptive Statistical Measures
    14.5.1 Measures of Central Tendency or Averages
    14.5.2 Measures of Variability or Dispersion
    14.5.3 Normal Probability Curve
    14.5.4 Measures of Relative Positions
    14.5.5 Measures of Relationship
14.6 Let Us Sum Up
14.7 Unit-end Activities
14.8 Points for Discussion
14.9 Suggested Readings
14.10 Answers to Check Your Progress
14.1 INTRODUCTION
In Block 2, you were introduced to various approaches to research: historical, philosophical, descriptive, experimental, etc. The techniques of sampling, along with the development and use of tools for the collection of data, were also explained. Data are of two types: qualitative and quantitative. Descriptive narrations, observation notes, responses to questions, etc. are qualitative data, whereas quantitative data comprise numerical figures. Quantitative data are either parametric or non-parametric in nature. Various descriptive statistical measures are used in the analysis of parametric and non-parametric data. These measures include: (i) measures of central tendency; (ii) measures of variability; (iii) measures of relative position; and (iv) measures of relationship. Descriptive statistical analysis using these measures limits generalisation to the particular group of individuals observed. No conclusions are extended beyond this group, and any similarity to those outside the group cannot be assumed.
In this unit you will study the nature of qualitative and quantitative data. The techniques which produce data of a quantitative nature will also be explained. You will be introduced to various descriptive statistical measures which are used in the analysis of quantitative data. The computation, selection and application of these measures will be explained with the help of illustrations. You will also understand the nature and characteristics of a normal probability curve, and its applications in educational research.
14.2 OBJECTIVES
After going through this unit, you should be able to:
• name and describe the nature of various types of data;
• describe the techniques/tools which generate data of quantitative type;
• describe the procedures of classifying data and their graphical representation;
• name and describe various statistical measures which are used in the analysis of quantitative data;
• define Mean, Median and Mode as the measures of central tendency, and explain their computation, selection and application;
• define Range, Average Deviation, Quartile Deviation and Standard Deviation as the measures of variability, and explain their computation, selection and application;
• discuss the nature, characteristics and applications of the Normal Probability Curve;
• define Sigma Scores, T-Scores and Percentile Rank as the measures for comparing individuals, and explain their computation, selection and application; and
• define Product moment correlation and Rank difference correlation as the measures of relationship, and explain their computation, selection and application.
14.3 TYPES OF DATA
The data collected from various sources through the use of different tools and techniques generally consist of numerical figures, ratings, narrations, responses to open-ended questions in a questionnaire or an interview schedule, quotations, field notes, etc. In educational research, two types of data are usually recognised: qualitative data and quantitative data.
nor standardized. However, such responses are detailed and comprehensive, and they provide qualitative data in the form of the respondents' emotions, thoughts and experiences about the phenomena under study. Further details about the nature of qualitative data, their analysis and interpretation will be explained in Units 17 and 18.
and the only arithmetical operation applicable to such scales is counting, the mere enumeration of individuals in a particular category. Statistical analysis based on counting is permissible in this type of measurement. Measurement based on ranks is ordinal scale measurement. In this scale, objects or individuals are ordered (ranked) on some continuum in a series from lowest to highest according to the measured characteristic. The ranking of students in a class for weight, height or academic achievement is an example of an ordinal scale. Suppose we rank three students X, Y and Z in order of their height, X being the tallest, and assign the numbers 1, 2 and 3 respectively as their ranks. In this way we have information about the serial arrangement. We cannot infer how much taller X is than Y or Z, even though the three numbers assigned to them are equally spaced on the scale of measurement. Statistical analysis based on ranks is permissible in this type of measurement.
Histogram or Column Diagram
Frequency Polygon
Cumulative Frequency Curve
Cumulative Percentage Curve or Ogive
Scores       f (Frequency)
60 - 69           1
50 - 59           4
40 - 49          10
30 - 39          15
20 - 29           8
10 - 19           2
             N = 40
Let us construct a histogram or column diagram representing the above distribution. A histogram is a graph in which the exact class intervals are represented along the horizontal axis, called the x-axis, and their corresponding frequencies are represented by the areas of rectangular bars drawn on the intervals, as shown in the figure.
[Figure: histogram of the above distribution; horizontal axis marked at the exact class limits 9.5, 19.5, 29.5, 39.5, 49.5, 59.5 and 69.5; vertical axis: frequency]
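Exact Units of Class Intervals    f (Frequency)    F (Cumulative Frequency)
59.5 - 69.5                             1           40 = (39 + 1)
49.5 - 59.5                             4           39 = (35 + 4)
39.5 - 49.5                            10           35 = (25 + 10)
29.5 - 39.5                            15           25 = (10 + 15)
19.5 - 29.5                             8           10 = (2 + 8)
 9.5 - 19.5                             2            2
                                   N = 40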
To draw the cumulative frequency curve of this distribution, each cumulative frequency is plotted at the exact upper limit of the class interval upon which it falls. For the given distribution, the first value of cumulative frequency, corresponding to 19.5, is 2; the second, corresponding to 29.5, is 10; and so on to the last point, where the cumulative frequency corresponding to 69.5 is 40. By joining the plotted points by lines, we get the cumulative frequency curve as illustrated in the figure (Fig. 14.3). It may be noted that in order to have the curve begin on the horizontal axis, we start at the exact lower limit of the lowest class (i.e. 9.5 in the present case), the cumulative frequency of which is taken to be 0.
14.4.4 Cumulative Percentage Curve or Ogive
Cumulative percentage curve or Ogive differs from cumulative frequency curve in that frequencies are expressed as cumulative percents of N on the vertical axis instead of as cumulative frequencies, as illustrated in the given example.
Exact Units of Class Intervals    f     F (Cumulative Frequency)    Cumulative Percentage
59.5 - 69.5                        1            40                        100.0
49.5 - 59.5                        4            39                         97.5
39.5 - 49.5                       10            35                         87.5
29.5 - 39.5                       15            25                         62.5
19.5 - 29.5                        8            10                         25.0
 9.5 - 19.5                        2             2                          5.0
                              N = 40
In the above distribution, the cumulative frequencies are expressed as percentages of N in the last column. After finding the cumulative percentage frequencies, we plot them against the exact upper limits of the class intervals, as shown in the figure (Fig. 14.4). The curve joining the points thus plotted is called the cumulative percentage curve or Ogive.
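As a small illustration of how the cumulative frequency and cumulative percentage columns are obtained, the following Python sketch recomputes them for the distribution used above. The variable and function names are chosen here for illustration only; the classes are listed from the lowest interval upwards.

```python
# Exact upper class limits and frequencies, lowest class first
# (the distribution used in 14.4.1).
upper_limits = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [2, 8, 15, 10, 4, 1]


def cumulative(frequencies):
    """Return the running totals (cumulative frequencies F)."""
    total, running = 0, []
    for f in frequencies:
        total += f
        running.append(total)
    return running


cum_f = cumulative(freqs)
n = cum_f[-1]                                  # N = 40
cum_pct = [100.0 * F / n for F in cum_f]       # cumulative percentages of N

# Each point (upper limit, cumulative %) is one point of the Ogive.
for limit, F, pct in zip(upper_limits, cum_f, cum_pct):
    print(f"{limit:5.1f}   F = {F:2d}   cumulative % = {pct:5.1f}")
```

Plotting `cum_f` against `upper_limits` gives the cumulative frequency curve; plotting `cum_pct` against the same limits gives the Ogive.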
[Fig. 14.4: cumulative percentage curve (Ogive); horizontal axis marked at the exact class limits 9.5 to 69.5; vertical axis: cumulative percentage, 0 to 100]
Check Your Progress
Notes: a) Space is given below for your answer.
b) Compare your answer with the one given at the end of this unit.
1. i)
................................................................................................................
14.5 DESCRIPTIVE STATISTICAL MEASURES
Statistical methods are extensively used in analyzing quantitative data in educational research. Generally, these methods are classified as 'descriptive statistics' and 'inferential statistics'. Descriptive statistics are computed to describe the characteristics (attributes) of a sample or population in totality, and thus limit generalization to the particular group (sample). No conclusions are extended beyond this group, whereas inferential statistical methods are used to draw generalizations beyond the sample with a known degree of accuracy.
In this unit we shall confine our discussion to four types of descriptive statistical measures commonly used in educational research. These are measures of (i) central tendency or averages; (ii) variability or dispersion; (iii) relative position; and (iv) relationship.
The Mean
The Mean of a distribution is the arithmetic average. It is perhaps the most familiar, most frequently used and best understood average. It is computed by dividing the sum of all the observations by the total number of observations. If $X_1, X_2, X_3, \ldots, X_N$ are the N observations, the formula for computing the Mean (M) is given by:
$$\text{Mean} = M = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
Suppose 10, 12, 7, 6, 4, 10, 15 are the marks obtained by 7 students in a test of Hindi. The Mean is:
$$\text{Mean} = M = \frac{10 + 12 + 7 + 6 + 4 + 10 + 15}{7} = \frac{64}{7} = 9.14$$
In case of large data, the observations are arranged in a frequency distribution. Consider the following distribution which was used in 14.4.1.
Scores      f     Mid Point (x)     x'     fx'
60 - 69      1        64.5           3       3
50 - 59      4        54.5           2       8
40 - 49     10        44.5           1      10
30 - 39     15        34.5           0       0
20 - 29      8        24.5          -1      -8
10 - 19      2        14.5          -2      -4
         N = 40                          Σfx' = 9
These data are grouped in a frequency distribution. The following formula is used for computing the Mean in such cases:
$$\text{Mean} = M = A.M. + \frac{\sum fx'}{N} \times i$$
in which
A.M. = Assumed Mean
f = Frequency of the class interval
x' = Deviation of the score from the Assumed Mean divided by the length of the class interval
i = Width of the class interval
N = Total number of observations
In the column Mid Point (x) of the example, we have computed the mid-point of each class interval and taken the assumed mean (A.M.) at the interval which has the maximum frequency, i.e. 34.5. The values in the column x' have been calculated by taking the deviation of each mid-point (x) from the assumed mean and dividing it by the length of the class interval, i.e. 10 in the present case. The values fx' are obtained by multiplying each f with the corresponding x'. Using the formula:
$$\text{Mean} = A.M. + \frac{\sum fx'}{N} \times i = 34.5 + \frac{9}{40} \times 10 = 34.5 + 2.25 = 36.75$$
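The two ways of computing the Mean described in this section can be checked with a short Python sketch. It is only an illustration: the function names are invented here, and the grouped version assumes that class mid-points and frequencies are supplied in matching order.

```python
def mean(scores):
    """Arithmetic mean of ungrouped scores: sum of X divided by N."""
    return sum(scores) / len(scores)


def grouped_mean(midpoints, freqs, assumed_mean, width):
    """Mean of grouped data by the assumed-mean method:
    A.M. + (sum of f*x' / N) * i, where x' = (mid-point - A.M.) / i."""
    n = sum(freqs)
    sum_fx = sum(f * (x - assumed_mean) / width
                 for x, f in zip(midpoints, freqs))
    return assumed_mean + (sum_fx / n) * width


# Marks of 7 students in the Hindi test.
marks = [10, 12, 7, 6, 4, 10, 15]
raw_mean = mean(marks)                       # 64 / 7, about 9.14

# Distribution used in 14.4.1 (highest class first).
midpoints = [64.5, 54.5, 44.5, 34.5, 24.5, 14.5]
freqs = [1, 4, 10, 15, 8, 2]
g_mean = grouped_mean(midpoints, freqs, assumed_mean=34.5, width=10)

print(round(raw_mean, 2), round(g_mean, 2))
```

Any class mid-point may be used as the assumed mean; the choice changes the intermediate values of x' but not the resulting Mean.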
Generally, the Mean is a more useful measure of central tendency than the other measures. The Mean is an appropriate statistic when:
i) the observations or measures are distributed symmetrically about the centre of the distribution;
ii) the most stable measure of central tendency is desired;
The Median
The Median is a point in an ordered arrangement of observations, above and below which one-half of the observations fall. It is a measure of position rather than of magnitude. If the observations are ungrouped and few in number, we arrange them in order of magnitude and identify the middle observation by counting up half the value of N (the total number of observations). When the number of observations (N) is odd, the mid-observation is the Median. For example, 45 is the median of the marks 30, 31, 45, 48, 52 obtained by 5 students in a test of Mathematics. When the number of observations is even, the Median is the mid-point between the two middle observations. For example,
In case of data grouped in a frequency distribution, we use the following formula:
$$\text{Median} = Mdn = l + \frac{N/2 - F}{f} \times i$$
in which
l = Exact lower limit of the class interval upon which the median lies
i = Width of the class interval in which the median falls
f = Frequency within the class interval upon which the median lies
F = Sum of all the frequencies below l
N/2 = One-half of the total number of observations
To illustrate the use of this formula, consider the data which was used in 14.4.1 for computing the Mean.
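Here $\frac{N}{2} = \frac{40}{2} = 20$; the 20th observation falls in the class interval 29.5 - 39.5, so l = 29.5, F = 10, f = 15 and i = 10. Therefore
$$Mdn = 29.5 + \frac{20 - 10}{15} \times 10 = 29.5 + 6.67 = 36.17$$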
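The interpolation in the grouped-data median formula can be sketched in Python as follows. This is an illustrative sketch only: the function name and the convention of listing classes from the lowest interval upwards are assumptions made here.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: l + ((N/2 - F) / f) * i.

    lower_limits: exact lower limits of the classes, lowest class first.
    freqs:        frequency of each class, in the same order.
    width:        common width i of the class intervals.
    """
    half = sum(freqs) / 2          # N/2
    below = 0                      # F: frequencies below the current class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= half:
            # The median lies in this class; interpolate within it.
            return l + (half - below) / f * width
        below += f
    raise ValueError("the distribution has no observations")


# Distribution used in 14.4.1.
lower_limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [2, 8, 15, 10, 4, 1]
mdn = grouped_median(lower_limits, freqs, width=10)
print(round(mdn, 2))
```

For this distribution the median class is 29.5 - 39.5, with F = 10 and f = 15, so the sketch returns about 36.17.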
The Mode
The mode is defined as the most frequently occurring observation in a distribution. It is located by inspection rather than by computation. If there is only one value which occurs a maximum number of times, then the distribution is said to have one mode or to be unimodal. In some distributions there may be more than one mode. A distribution with two modes is bimodal; one with more than two, multimodal. In a simple ungrouped series of measures or observations, the crude or empirical mode is that single measure which occurs most frequently. For example, in the series 19, 20, 21, 26, 28, 28, 29 and 31, the most often recurring measure, namely 28, is the crude or empirical mode. In grouped data distributions, the mode is assumed to be the mid score of the interval in which the greatest frequency occurs. For example, in the grouped distribution under 14.4.1 which was used for computing the Mean, 34.5 is the Mode. It is the mid score of the interval 29.5 - 39.5, which has the maximum frequency of 15.
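Locating the mode "by inspection" amounts to counting how often each value occurs. A minimal Python sketch, with function names chosen here purely for illustration, is:

```python
from collections import Counter


def crude_modes(scores):
    """Return every value that occurs the maximum number of times.
    One value means unimodal, two bimodal, more multimodal."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, c in counts.items() if c == top)


def grouped_mode(midpoints, freqs):
    """Mid-point of the class interval with the greatest frequency."""
    return midpoints[freqs.index(max(freqs))]


series = [19, 20, 21, 26, 28, 28, 29, 31]
print(crude_modes(series))                       # the crude mode of the series

midpoints = [64.5, 54.5, 44.5, 34.5, 24.5, 14.5]
freqs = [1, 4, 10, 15, 8, 2]
print(grouped_mode(midpoints, freqs))            # mode of the 14.4.1 distribution
```

Note that `grouped_mode` returns only the first modal class if two classes share the greatest frequency.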
Check Your Progress
Notes: a) Space is given below for your answer.
b) Compare your answer with the one given at the end of this unit.
2. i)
................................................................................................................
ii) Compute mean and mode for the following distribution:
10, 12, 5, 7, 8, 9, 15, 5
................................................................................................................
iii) Compute median for the following distribution.
The range for this distribution will be (90 - 50) + 1 = 41.
Selection and Application of Range
Although the range has the advantage of being easily calculated, it has some limitations. First, the value of the range is based on only the two extreme scores in the distribution, and thus it gives no information about the variation of the many other scores. Second, it is not a stable statistic, as its value can differ from sample to sample drawn from the same population.
The range is used when:
i) the data are too scant or too scattered;
ii) only an idea of extreme scores or of total spread is wanted.
Average Deviation (AD)
The average deviation or AD is the mean of the deviations of all of the separate observations or scores in a series taken from their mean. However, in averaging deviations to find the average deviation (AD), the signs (+ and -) are not taken into consideration, i.e. all the deviations, whether plus or minus, are treated as positive. The formula for computing the average deviation of ungrouped data is:
$$\text{Average Deviation} = AD = \frac{\sum |x|}{N}$$
in which
Σ|x| = Sum of deviations of scores from their mean, disregarding plus and minus signs
N = Total number of scores
To illustrate the computation of average deviation, consider the following ungrouped distribution of marks obtained by a group of ten students in a test of spellings.
The deviations of the scores from the mean, neglecting the negative signs, are given in the column |x|. Using the formula:
$$\text{Average Deviation} = \frac{\sum |x|}{N} = \frac{23}{10} = 2.3$$
It may be noted that the sum of the deviations of the observations (scores) from their actual mean, when the signs are taken into account, is zero. In the present case, Σx = 0.
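The computation of the average deviation for ungrouped scores can be sketched as below. The scores in this sketch are hypothetical values chosen for illustration, not the spelling-test marks referred to in the text, and the function name is likewise an assumption.

```python
def average_deviation(scores):
    """AD = sum of |x| / N, where x is the deviation of a score from the mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


# Hypothetical scores, for illustration only.
scores = [2, 4, 6, 8, 10]
m = sum(scores) / len(scores)

ad = average_deviation(scores)
signed_sum = sum(x - m for x in scores)   # deviations with their signs kept

print(ad)           # average deviation
print(signed_sum)   # signed deviations from the mean cancel out
```

The second printed value illustrates why the signs must be disregarded: with signs kept, the deviations from the actual mean always sum to zero.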
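For a grouped data distribution, the formula for computing the average deviation is:
$$\text{Average Deviation (AD)} = \frac{\sum |fx|}{N}$$
in which Σ|fx| = sum of the deviations, each multiplied by the frequency of its class interval, disregarding plus and minus signs.
Using the data of 14.4.1, the Average Deviation of the distribution will be:
$$\text{Average Deviation} = \frac{\sum |fx|}{N} = \frac{33}{40} = 0.825$$
Here the deviations are the x' values of the table used for computing the Mean, so the result is expressed in units of the class interval.
Average deviation is used when:
i) it is desired to consider all deviations from the mean according to their size;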
The quartile deviation or Q is one-half the scale distance between the 75th and 25th percentiles in a frequency distribution. The 25th percentile or Q1 is the first quartile on the score scale, the point below which lie 25 per cent of the scores. The 75th percentile or Q3 is the third quartile on the score scale, the point below which lie 75 per cent of the scores. Mathematically,
$$Q = \frac{Q_3 - Q_1}{2}$$
in which
$$Q_1 = l + \frac{N/4 - F}{f} \times i \qquad \text{and} \qquad Q_3 = l + \frac{3N/4 - F}{f} \times i$$
where
l = The exact lower limit of the interval in which the quartile falls
i = The length of the interval
F = Cumulative frequency up to the interval which contains the quartile
f = The frequency on the interval containing the quartile
Scores      Exact Units      f      F
52 - 55     51.5 - 55.5       1     65
48 - 51     47.5 - 51.5       0     64
44 - 47     43.5 - 47.5       5     64
40 - 43     39.5 - 43.5      10     59
36 - 39     35.5 - 39.5      20     49
32 - 35     31.5 - 35.5      12     29
28 - 31     27.5 - 31.5       8     17
24 - 27     23.5 - 27.5       2      9
20 - 23     19.5 - 23.5       3      7
16 - 19     15.5 - 19.5       4      4
                          N = 65
Here $\frac{N}{4} = \frac{65}{4} = 16.25$ and $\frac{3N}{4} = 48.75$
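To show how Q1, Q3 and Q follow from this table, here is a hedged Python sketch. It locates each quartile by interpolation inside its class interval and then takes half the distance between them; the function name and the lowest-class-first ordering are assumptions made for this illustration.

```python
def percentile_point(lower_limits, freqs, width, fraction):
    """Point on the score scale below which `fraction` of the cases lie:
    l + ((fraction * N - F) / f) * i, classes listed lowest first."""
    target = fraction * sum(freqs)
    below = 0                          # F: cumulative frequency below the class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return l + (target - below) / f * width
        below += f
    raise ValueError("the distribution has no observations")


# Distribution with N = 65 from the table above, lowest class first.
lower_limits = [15.5, 19.5, 23.5, 27.5, 31.5, 35.5, 39.5, 43.5, 47.5, 51.5]
freqs = [4, 3, 2, 8, 12, 20, 10, 5, 0, 1]

q1 = percentile_point(lower_limits, freqs, width=4, fraction=0.25)
q3 = percentile_point(lower_limits, freqs, width=4, fraction=0.75)
q = (q3 - q1) / 2                      # quartile deviation

print(round(q1, 3), round(q3, 3), round(q, 3))
```

With N/4 = 16.25 falling in the interval 27.5 - 31.5 and 3N/4 = 48.75 falling in 35.5 - 39.5, the sketch gives Q1 of about 31.13, Q3 of about 39.45 and Q of about 4.16.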
iii) the concentration around the median, the middle 50 per cent of cases, is of primary interest.
The standard deviation (SD) is the most general and stable measure of variability. It is the positive square root of the variance. The average of the squared deviations of the measures or scores from their mean is known as the variance, which is generally denoted as σ². The standard deviation is accordingly denoted as σ. The standard deviation for ungrouped data is computed by using the formula:
$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum x^2}{N}}$$
in which
x = Deviation of the raw score or measure from the Mean
N = Number of scores or measures
To illustrate the use of this formula, let us consider the data: 15, 10, 15, 20, 8, 10, 25, 9
The mean of these eight scores is 112/8 = 14. Now we subtract the mean from each of the raw scores in the distribution and proceed as under:

Score (X)     X - M or x     x²
   15              1           1
   10             -4          16
   15              1           1
   20              6          36
    8             -6          36
   10             -4          16
   25             11         121
    9             -5          25
                          Σx² = 252

Using the formula:
$$\text{Standard Deviation} = \sigma = \sqrt{\frac{252}{8}} = \sqrt{31.5} = 5.61$$
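We can also compute the standard deviation directly from the raw scores, without using the deviations, with the help of the formula:
$$\text{Standard Deviation} = \sigma = \frac{1}{N}\sqrt{N\sum X^2 - \left(\sum X\right)^2}$$
in which
X = Raw score
N = Number of scores

    X         X²
   15        225
   10        100
   15        225
   20        400
    8         64
   10        100
   25        625
    9         81
ΣX = 112   ΣX² = 1820

$$\text{Standard Deviation} = \sigma = \frac{1}{8}\sqrt{8 \times 1820 - (112)^2} = \frac{1}{8}\sqrt{2016} = 5.61$$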
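The deviation formula and the raw-score formula for the standard deviation are algebraically equivalent. The following Python sketch, with function names chosen here for illustration, computes both for the eight scores of the worked example so that the agreement can be verified.

```python
import math


def sd_from_deviations(scores):
    """sigma = sqrt(sum of x^2 / N), x being the deviation from the mean."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)


def sd_from_raw_scores(scores):
    """sigma = (1/N) * sqrt(N * sum of X^2 - (sum of X)^2)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n


scores = [15, 10, 15, 20, 8, 10, 25, 9]
sd_dev = sd_from_deviations(scores)
sd_raw = sd_from_raw_scores(scores)
print(round(sd_dev, 2), round(sd_raw, 2))
```

Both functions divide by N, as in the formulas of this unit, so they describe the group actually observed rather than estimating a population value.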
The formula for computing the standard deviation for grouped data is:
$$\sigma = i \times \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2}$$
in which
i = length of the class interval
N = total number of measures or scores
x' = deviation of the raw score from the assumed mean divided by the length of the class interval
To illustrate the use of this formula, let us consider the data used for calculating the quartile deviation.
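Scores      f     Mid Point     x'     fx'     fx'²
52 - 55      1      53.5         4       4       16
48 - 51      0      49.5         3       0        0
44 - 47      5      45.5         2      10       20
40 - 43     10      41.5         1      10       10
36 - 39     20      37.5 (A.M.)  0       0        0
32 - 35     12      33.5        -1     -12       12
28 - 31      8      29.5        -2     -16       32
24 - 27      2      25.5        -3      -6       18
20 - 23      3      21.5        -4     -12       48
16 - 19      4      17.5        -5     -20      100
         N = 65                 Σfx' = -42   Σfx'² = 256

$$S.D. = \sigma = 4 \times \sqrt{\frac{256}{65} - \left(\frac{-42}{65}\right)^2} = 7.51$$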
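The grouped-data computation above can be reproduced with a brief Python sketch. It is an illustration only; the function name is an assumption, and the mid-points are the centres of the class intervals of the N = 65 distribution with the assumed mean taken at 37.5.

```python
import math


def grouped_sd(midpoints, freqs, assumed_mean, width):
    """sigma = i * sqrt(sum(f*x'^2)/N - (sum(f*x')/N)^2),
    where x' = (mid-point - assumed mean) / i."""
    n = sum(freqs)
    x_prime = [(x - assumed_mean) / width for x in midpoints]
    sum_fx = sum(f * xp for f, xp in zip(freqs, x_prime))
    sum_fx2 = sum(f * xp * xp for f, xp in zip(freqs, x_prime))
    return width * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


midpoints = [53.5, 49.5, 45.5, 41.5, 37.5, 33.5, 29.5, 25.5, 21.5, 17.5]
freqs = [1, 0, 5, 10, 20, 12, 8, 2, 3, 4]

sd = grouped_sd(midpoints, freqs, assumed_mean=37.5, width=4)
print(round(sd, 2))
```

The correction term (Σfx'/N)² compensates for taking deviations from an assumed mean instead of the actual mean, so a different assumed mean leads to the same standard deviation.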
Selection and Application of Standard Deviation (σ)
Standard deviation is used when:
i) the statistic having the greatest stability is sought;
ii) extreme deviations should exercise a proportionally greater effect upon the variability;
1. The curve is symmetrical around its vertical axis. It implies that the size, shape and slope of the curve on one side of the axis are identical to those on the other side.
2. The values of mean, mode and median computed for a distribution following this curve always coincide and have the same value.
3. The height of the vertical line called the ordinate is maximum at the mean, and in the unit normal curve it is equal to 0.3989.
4. The curve has no boundaries in either direction, for the curve never touches the base line, no matter how far it is extended, i.e. the curve is asymptotic and extends from −∞ (minus infinity) to +∞ (plus infinity).
5. The data 'cluster' around the mean. The percentages in a given standard deviation are greatest around the Mean and decrease as one moves away from the Mean.
[Fig. 14.6: normal probability curve, with the horizontal axis marked in σ units from the Mean]
This implies that 68.26 per cent of the total area of the curve falls between the limits Mean + 1σ and Mean − 1σ; 95.44 per cent of the total area falls between the limits Mean + 2σ and Mean − 2σ; and 99.73 per cent of the total area falls between Mean + 3σ and Mean − 3σ. The equation of the normal curve is:
$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-x^2 / 2\sigma^2}$$
in which
x = scores, expressed as deviations from the mean
y = height of the curve above the horizontal axis, i.e. the frequency of a given x value
N = number of cases in the sample
σ = standard deviation of the distribution
π = 3.1416
e = 2.7183
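The areas and ordinates that are normally read from a Normal Probability Table can also be computed directly. The sketch below is an illustration using only Python's standard `math` module; the function names are assumptions made here, and the area is obtained from the error function rather than from a printed table.

```python
import math


def ordinate(x, n=1, sigma=1.0):
    """Height y of the normal curve at deviation x from the mean:
    y = N / (sigma * sqrt(2*pi)) * e^(-x^2 / (2*sigma^2))."""
    return n / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x * x / (2 * sigma * sigma))


def area_within(k):
    """Proportion of the total area between Mean - k*sigma and Mean + k*sigma."""
    return math.erf(k / math.sqrt(2))


print(round(ordinate(0), 4))                 # maximum ordinate of the unit normal curve
for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2)) # percentage of area within k sigma
```

The computed percentages are 68.27, 95.45 and 99.73; the figures 68.26 and 95.44 commonly quoted come from doubling table entries that were already rounded to four decimal places.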
When N and σ are known, it is possible from the equation to compute: (i) the frequency (y) of a given value x; and (ii) the number or percentage of cases between two points, or above or below a given point in the distribution. However, these calculations are not needed, as Normal Probability Tables are available from which these values can be readily obtained.
Applications of the Normal Curve
There are a number of applications of the normal curve in the field of educational research:
1. When the measures or scores are normally or nearly normally distributed, the Normal Probability Table is useful to find: (i) the percentage of cases, or the number when N is known, that fall between the mean and a given σ distance from the mean; (ii) the percentage of the total area included between the mean and a given σ distance from the mean.
2. The normal curve is used to convert a raw score into a standard score. Suppose student A gets a score of 50 in an achievement test in mathematics which is administered to a random sample of 300 students. The mean and standard deviation of the test scores for the sample, assumed to be normally distributed, are 60 and 6 respectively. The standard score or sigma score of student A is
$$Z = \frac{X - M}{\sigma} = \frac{50 - 60}{6} = \frac{-10}{6} = -1.67$$
indicating that the score of 50 is 1.67 standard deviations below the Mean.
3. The normal curve is useful in calculating the percentile rank of scores in a normal distribution.
4. The normal curve is used for normalising a given frequency distribution.
Check Your Progress
Notes: a) Space is given below for your answer.
b) Compare your answer with the one given at the end of this unit.
4. i)
................................................................................................................
ii) State two characteristics of a normal probability curve.
................................................................................................................
T Score
In describing a score in a distribution, its deviation from the mean is more meaningful than the score itself. The unit of measurement is the standard deviation. Mathematically, the sigma score (Z) is expressed as:
$$Z = \frac{X - \bar{X}}{\sigma} \quad \text{or} \quad Z = \frac{x}{\sigma}$$
in which
X = raw score
$\bar{X}$ = mean
σ = standard deviation
x = $(X - \bar{X})$, the score deviation from the mean
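Test 1: X = 77, $\bar{X}$ = 84, σ = 4
Test 2: X = 64, $\bar{X}$ = 58, σ = 5
$$Z_1 = \frac{77 - 84}{4} = \frac{-7}{4} = -1.75 \qquad Z_2 = \frac{64 - 58}{5} = \frac{6}{5} = +1.20$$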
The raw score of 77 in Test 1 may be expressed as a sigma score of −1.75, indicating that 77 is 1.75 standard deviations below the mean. The raw score of 64 in Test 2 may be expressed as a sigma score of +1.20, indicating that 64 is 1.20 standard deviations above the mean. On the sigma scale, the mean of any distribution is converted to zero and the standard deviation is equal to 1. Thus a sigma score is useful in making realistic comparisons of scores and provides a basis for equal weighting of the scores. Suppose a teacher wants to determine a student's equally weighted average (mean) achievement on a physics test and on a chemistry test on the basis of the following data:
Subject       Test Score     Mean               Standard Deviation
Physics           50          60        80              6
Chemistry         65          92        90             15
It is evident that the mean of the two raw scores obtained by the student in the Physics and Chemistry tests would not provide a valid picture of the student's performance, because the mean would be weighted more in favour of the chemistry test score. The conversion of each test score to a sigma score makes them equally weighted and comparable, as both raw test scores are then expressed on a scale with a mean of zero and a standard deviation of one.
$$\text{Physics Z score} = \frac{50 - 60}{6} = \frac{-10}{6} = -1.67$$
$$\text{Chemistry Z score} = \frac{65 - 92}{15} = \frac{-27}{15} = -1.80$$
The teacher, on an equally weighted basis, may conclude that the performance of the student was fairly consistent, i.e. 1.67 standard deviations below the mean in physics and 1.80 standard deviations below the mean in chemistry.
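The conversion of raw scores to sigma scores is a one-line calculation. As a minimal sketch (the function name and the dictionary layout are chosen here for illustration), the physics and chemistry scores can be converted as follows:

```python
def sigma_score(raw, mean, sd):
    """Z = (X - mean) / sigma: the score's distance from the mean in SD units."""
    return (raw - mean) / sd


# (test score, mean, standard deviation) for each subject.
tests = {
    "Physics": (50, 60, 6),
    "Chemistry": (65, 92, 15),
}

z_scores = {subject: sigma_score(*values) for subject, values in tests.items()}
for subject, z in z_scores.items():
    print(f"{subject}: Z = {z:+.2f}")

# Equally weighted average of the two performances, in sigma units.
average_z = sum(z_scores.values()) / len(z_scores)
print(f"Average Z = {average_z:+.2f}")
```

Averaging the Z scores, rather than the raw scores, gives each test the same weight regardless of its mean and spread.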
In the application of the normal probability curve, it was explained that the curve describes the percentage of area lying between the mean and successive deviation units. The sigma or Z score can be used in hypothesis testing and in the determination of percentile rank.
The sigma (Z) score is in most cases expressed with decimals or negative values. To avoid this, another version of the standard score, the T score, has been devised. It is expressed as:
$$T = 50 + 10\,\frac{X - \bar{X}}{\sigma} \quad \text{or} \quad T = 50 + 10Z$$
Using the scores in the previous example, we can compute the physics and chemistry T scores as:
Physics T = 50 + 10 (−1.67) = 33.30
Chemistry T = 50 + 10 (−1.80) = 32.00
T scores are always rounded to the nearest whole number. Thus the T scores of the student in Physics and Chemistry are 33 and 32 respectively.
Percentile Rank
The percentile rank is the point below which a given percentage of scores fall. For example, if the 80th percentile is a score of 60, 80 per cent of the scores fall below 60. The median is the 50th percentile rank, for 50 per cent of the scores fall below it. The formula for converting a given rank into a percentile rank is:
$$\text{Percentile rank} = 100 - \frac{100\,RK - 50}{N}$$
in which
RK = rank position of the individual
N = total number of individuals in the group
For example, a student A ranks 6th in a class of 150 students. This means 5 students rank above student A and 144 below him. The percentile rank of student A is:
$$100 - \frac{100 \times 6 - 50}{150} = 100 - 3.67 = 96.33$$
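Both conversions in this part of the unit, from a sigma score to a T score and from a rank to a percentile rank, can be sketched in a few lines of Python. The function names are assumptions made for this illustration.

```python
def t_score(z):
    """T = 50 + 10Z: a standard score with mean 50 and SD 10."""
    return 50 + 10 * z


def percentile_rank(rank, n):
    """Percentile rank = 100 - (100 * RK - 50) / N,
    where RK is the rank position (1 = highest) in a group of N."""
    return 100 - (100 * rank - 50) / n


physics_t = t_score((50 - 60) / 6)       # Z is about -1.67
chemistry_t = t_score((65 - 92) / 15)    # Z = -1.80
print(round(physics_t), round(chemistry_t))

pr_a = percentile_rank(rank=6, n=150)    # student A, 6th of 150
print(round(pr_a, 2))
```

Because the sketch uses the unrounded Z score, the physics T score is 33.33 before rounding instead of the 33.30 obtained from Z = −1.67; both round to 33.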
Check Your Progress
Notes: a) Space is given below for your answer.
b) Compare your answer with the one given at the end of this unit.
5. i)
................................................................................................................
................................................................................................................
ii)
Shyam ranks twenty seventh in his class of 139 students. Find his percentile rank.
................................................................................................................
iii) Find the sigma score of a student whose raw score in a test of Hindi is 76. The mean of the Hindi scores of the students in his class is 82 and the standard deviation is 4. What will be his T score?
................................................................................................................ ................................................................................................................
14.5.5 Measures of Relationship
We have so far discussed the statistical description of a single variable. Now we shall study the problem of describing the degree of simultaneous variation of two variables. Data in which we obtain measures of two variables for each individual are called bivariate data. For example, we get bivariate data if we have scores on tests of Mathematics and Hindi for a group of school children. The essential feature of bivariate data is that one measure can be paired with another measure for each member of the group.
When we study bivariate data we may like to know the degree of relationship between the variables. This degree of relationship is known as 'correlation', which is represented by the co-efficient of correlation. This co-efficient may be identified by the letter r, the Greek symbol rho (ρ), or other symbols, depending upon the data distributions and the way the co-efficient has been calculated. Students who obtain high scores on an intelligence test tend to get high scores in an achievement test in mathematics, whereas those with low scores on an intelligence test tend to score low in mathematics. When this type of relationship is obtained, the two variables are said to be positively related. When an increase in one variable is associated with a decrease in the other variable, the variables are said to be negatively related. When the relationship between two sets of variables is a pure chance relationship, we say there is no correlation.
The intensity or degree of linear relationship is represented quantitatively by the co-efficient of correlation. Its value ranges from −1.00 to +1.00. A value of −1.00 indicates a perfect negative relationship between the two variables and +1.00 describes a perfect positive relationship. A zero value indicates a complete lack of correlation between the two variables. The sign (− or +) indicates the direction of the relationship and the numerical value its magnitude or strength.
Methods of Correlation
There are various methods of correlation. Their use is relative to the situation and type of data. We may have data in scores. There are many situations in which the researcher does not have scores and has to deal with data in which differences in a given attribute or characteristic can be expressed only by ranks or classifying an individual into one of several descriptive categories. In this unit we will discuss two methods of correlation, namely, product moment correlation and rank order correlation.
Product Moment Correlation (r)
In some situations the data for two variables x and y are expressed in interval or ratio level of measurement and the distribution of these variables have a linear relationship. Moreover, the distributions of the variables are uni-modal and their variances are approximately equal. In such situations we make use of product moment method of correlation. It is also called Pearson's r.
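The co-efficient is computed by using the formula:
$$r_{xy} = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$
in which
x = deviation of X measures from the assumed mean
y = deviation of Y measures from the assumed mean
N = number of pairs of measures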
To illustrate the use of this formula, let us compute the product-moment r from the following data in the two variables X and Y for 8 students.
X: 15, 18, 22, 17, 19, 20, 16, 21
Y: 40, 42, 50, 45, 43, 46, 41, 41
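A product-moment coefficient for these eight pairs of scores can be obtained with the short Python sketch below. It is an illustration, not the worked solution of this unit: the function name is an assumption, and the deviations are taken from an assumed mean of zero (that is, the raw scores are used directly), which is permissible because the value of r does not depend on the assumed means chosen.

```python
import math


def pearson_r(xs, ys):
    """r = (N*Sxy - Sx*Sy) / sqrt((N*Sx2 - Sx^2) * (N*Sy2 - Sy^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x_scores = [15, 18, 22, 17, 19, 20, 16, 21]
y_scores = [40, 42, 50, 45, 43, 46, 41, 41]

r = pearson_r(x_scores, y_scores)
print(round(r, 2))

# Shifting every score by an assumed mean leaves r unchanged.
r_shifted = pearson_r([x - 18 for x in x_scores], [y - 43 for y in y_scores])
```

For the scores as listed, the sketch gives a moderate positive correlation of about 0.65.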
To illustrate the use of this formula, let us consider the data in the following scattergram:
r = 0.54
The computation of the values of Σfx, Σfx², Σfy, Σfy² and Σfxy is done using the following steps:
The distribution of language test scores for 85 students is found in the f column at the right of the scattergram. Assume a mean for the distribution of scores for a language test (the mid point of that interval which contains the largest frequency), and draw bold lines to mark off the row in which the assumed mean falls. Here the mean for the language test scores has been taken at 27 (mid-point of interval 26-28) and y's (deviations from the assumed mean) have been taken from this point. Fill in the values of fy and fy2 columns.
The distribution of the mathematics test scores of 85 students is found in the f row at the bottom of the scattergram. Assume a mean for this distribution, and
The fxy for a cell is computed by multiplying the frequency given in the particular cell with the corresponding x and y.
Rank Order Correlation (ρ)
Rank order correlation is also known as the Spearman rank order co-efficient of correlation and is denoted by rho (ρ). This method is used when the data are available only in ordinal (ranked) form rather than in interval or ratio form, or when the number of observations or measures is small. For computing the rank order correlation (ρ), the following formula is used:
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
in which
D = difference between the paired ranks
N = number of pairs of ranks
To illustrate the use of this formula, let us consider the following data:
X: 15, 18, 22, 17, 19, 20, 16, 14, 17, 22
Y: 41, 40, 42, 50, 45, 38, 46, 41, 40, 39
First, let us convert the data to ranks by assigning rank 1 to the highest measure and so on. If a measure is repeated, the ranks are averaged. For example, in the variable X, 22 is the highest measure and is repeated. The ranks assigned are 1 and 2. The average is $\frac{1 + 2}{2} = \frac{3}{2} = 1.5$. The next measure is given the rank 3 and so on.
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{(6)(220.50)}{10(100 - 1)} = 1 - \frac{1323}{10 \times 99} = 1 - 1.34 = -0.34$$
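The ranking procedure, including the averaging of tied ranks, can be sketched in Python as follows. The function names are assumptions made for this illustration, and rank 1 is given to the highest measure as described above.

```python
def average_ranks(values):
    """Rank 1 = highest value; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1        # first rank position held by v
        count = ordered.count(v)            # how many times v occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference in paired ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


x_scores = [15, 18, 22, 17, 19, 20, 16, 14, 17, 22]
y_scores = [41, 40, 42, 50, 45, 38, 46, 41, 40, 39]

x_ranks = average_ranks(x_scores)
rho, sum_d2 = spearman_rho(x_scores, y_scores)
print(x_ranks)
print(sum_d2, round(rho, 2))
```

Applied to the X and Y values exactly as listed here, the sketch gives ΣD² = 218.5 and ρ of about −0.32, slightly different from the ΣD² = 220.50 used in the worked computation, so the listed scores may not match the original table in every entry.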
Check Your Progress
Notes: a) Space is given below for your answer.
b) Compare your answer with the one given at the end of this unit.
6. i)
................................................................................................................ ................................................................................................................
................................................................................................................
ii) Compute product moment correlation for the following data:
X: 50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74
Y: 22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40
................................................................................................................
iii) Compute rank difference correlation for the data given under (ii), Compare the result with the one obtained using product moment correlation.
................................................................................................................
14.6 LET US SUM UP
1. In this unit, we listed the tools and techniques which generate data of a quantitative nature. We also discussed that quantitative data are either parametric or non-parametric.
2. The methods of presenting the data graphically using histogram, frequency polygon, cumulative frequency curve and Ogive were also explained.
3. The four measures of descriptive statistics, viz., measures of central tendency, variability, relative position and relationship, were discussed.
4. The use, application and computation of mean, median and mode as the measures of central tendency were explained with the help of examples.
5. The measures of variability, namely, range, average deviation, quartile deviation and standard deviation, were discussed. Their application, use and computation were explained with the help of examples.
6. The characteristics of the normal probability curve, along with its use in educational research, were discussed.
7. The raw test score, taken by itself, has no meaning. It is meaningful only by comparison with some reference group or groups, for which we make use of the sigma score (Z), T score and percentile rank. The methods for computing sigma scores, T scores and percentile ranks have been explained in this unit.
8. The direction and magnitude of a relationship between the two variables is measured with the help of co-efficient of correlation. The product moment correlation co-efficient is computed when the variables (X and Y) are expressed in interval or ratio scales of measurement and the distribution of the variables have a linear relationship. In case of ordinal (ranked) data we compute rank difference correlation. The procedure for computing these co-efficients has been discussed in this unit.
1. List the situations in which we obtain data of a quantitative nature.
2. What are the types of quantitative data? Illustrate with the help of examples.
3. Compute Median and Standard Deviation for the following data:
Scores
20-21 2 18-19 2 16-17 4 14-15 0 12-13 4 10-11 8-9 4
f
4.
f
1 2 4 5 8 10 6 4 4 2 3 1
5.
Compute Mode, Median and Standard Deviation for the following scores:
15, 20, 17, 18, 10, 11, 12, 17, 15, 9
6. Compute Product Moment Correlation (r) and Rank Difference Correlation (ρ) between the following sets of scores:
Discuss the uses of various measures of (i) central tendency and (ii) variability. What are the characteristics and uses of a normal distribution curve? What are the uses of sigma score, T score and percentile rank?
ii) The data can be represented graphically using Histogram, Frequency Polygon, Cumulative Frequency Curve, and Ogive.
2.
i)
i)
Range is used: (a) when the data are too scant or too scattered; and (b) a knowledge of extreme scores or of total spread is wanted. We use average deviation when: a) it is desired to consider all deviations from the mean according to their size; and (b) extreme deviations would influence the standard deviation unduly.
ii) Quartile deviation is used: (a) when the median is the measure of central tendency; and (b) when the data are scattered or there are extreme scores which would influence the standard deviation.
ii) a) The normal curve is bell shaped and symmetrical around its vertical axis, called the ordinate; and b) the curve is highest at the mean, and the mean, median and mode have the same value.
$$Z = \frac{X - \bar{X}}{\sigma} \quad \text{or} \quad Z = \frac{x}{\sigma}$$
in which $\bar{X}$ is the mean and σ is the standard deviation.