
NOMINALLY SCALED DATA and KAPPA STATISTIC K


Reporter: HASSAN S. GANDAMRA
SE 413-Advanced Educational Statistics
CHAPTER 9: MEASURES OF ASSOCIATION AND THEIR TESTS OF SIGNIFICANCE

9.8 NOMINALLY SCALED DATA AND THE KAPPA STATISTIC K


The kappa statistic is one of a number of measures of agreement which have
been proposed for categorical variables.

                      Category
Object    1      2      …    j      …    m
1         n_11   n_12   …    n_1j   …    n_1m    S_1
2         n_21   n_22   …    n_2j   …    n_2m    S_2
:         :      :           :           :       :
i         n_i1   n_i2   …    n_ij   …    n_im    S_i
:         :      :           :           :       :
N         n_N1   n_N2   …    n_Nj   …    n_Nm    S_N
          C_1    C_2    …    C_j    …    C_m

where $n_{ij}$ is the number of raters assigning the $i$th object to the $j$th category, and $C_j$ is the number of times that an object is assigned to the $j$th category:

$$C_j = \sum_{i=1}^{N} n_{ij}$$

The kappa coefficient of agreement is the ratio of the proportion of times that the raters
agree (corrected for chance agreement) to the maximum proportion of times that the raters
could agree (corrected for chance agreement):
$$K = \frac{P(A) - P(E)}{1 - P(E)}$$
(Equation 9.27, p. 285)

where P(A) is the proportion of times that the k raters agree, and P(E) is the proportion of times that we would expect the k raters to agree by chance. If the raters agree completely, K = 1; if they agree no more often than would be expected by chance, K = 0.
To find P(E), we note that the proportion of objects assigned to the $j$th category is

$$p_j = \frac{C_j}{Nk}.$$

Total expected agreement across all categories:

$$P(E) = \sum_{j=1}^{m} p_j^2$$
(Equation 9.28, p. 286)

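Total observed agreement across all objects, where $S_i$ is the proportion of rater pairs that agree on the $i$th object:

$$S_i = \frac{1}{k(k-1)} \sum_{j=1}^{m} n_{ij}(n_{ij} - 1)$$

$$P(A) = \frac{1}{N} \sum_{i=1}^{N} S_i = \left[ \frac{1}{Nk(k-1)} \sum_{i=1}^{N} \sum_{j=1}^{m} n_{ij}^2 \right] - \frac{1}{k-1}$$
(Equation 9.29, p. 286)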
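The definitions of p_j, P(E), P(A), and K above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `kappa_from_counts` and the small `ratings` matrix are hypothetical, and the per-object agreement is assumed to be S_i = Σ_j n_ij(n_ij − 1) / (k(k − 1)), which matches the S_i values quoted in Example 9.8a.

```python
from typing import Sequence


def kappa_from_counts(counts: Sequence[Sequence[int]]) -> float:
    """Kappa for an N x m table where counts[i][j] = n_ij, the number of
    raters assigning object i to category j."""
    n_objects = len(counts)
    m = len(counts[0])
    k = sum(counts[0])  # raters per object, assumed equal for every row
    if any(sum(row) != k for row in counts):
        raise ValueError("every object must be rated by the same number of raters")

    # p_j = C_j / (N k): proportion of all assignments that fall in category j
    p = [sum(row[j] for row in counts) / (n_objects * k) for j in range(m)]
    p_e = sum(pj ** 2 for pj in p)  # Equation 9.28

    # S_i: proportion of agreeing rater pairs for object i (assumed definition)
    s = [sum(n * (n - 1) for n in row) / (k * (k - 1)) for row in counts]
    p_a = sum(s) / n_objects  # Equation 9.29

    return (p_a - p_e) / (1 - p_e)  # Equation 9.27


# Hypothetical data: 3 objects, 4 raters, 2 categories
ratings = [[4, 0], [2, 2], [0, 4]]
kappa = kappa_from_counts(ratings)
```

For this toy table, P(E) = .5 and P(A) = (1 + 1/3 + 1)/3, so K = 5/9. The sketch does not guard against the degenerate case P(E) = 1, where every rating falls in a single category and K is undefined.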

Example 9.8a
It has been observed by researchers of animal behavior that the male stickleback fish changes
color during the nesting and courtship cycle. When placed in a suitable environment, the male
sticklebacks establish territories, build nests, and engage in courtship and aggression when
stimulus fish are introduced into the environment. To analyze the relation between color and
other behaviors during experimental study, it was necessary to code the fish in terms of their
coloration. Since the fish must be observed from outside their environment, and because of
variation in observational conditions, k = 4 trained raters evaluated the coloration of each fish.
The colorations were divided into m = 5 categories. The first category was for those fish with
minimal color development and the last category represented maximal color development and
coloration; the other three categories involved varying degrees of coloration. In this study, a
group of N = 29 fish was observed. The data are summarized in Table 9.15. Note that the raters
were in complete agreement about the coloration of fish 1 and that they were divided in their
ratings of fish 2. Examination of the rows of the table shows that there was complete
agreement for some fish but low agreement about others.
Table 9.15
Estimates of Nuptial Coloration of Male Sticklebacks
(Table body not reproduced: rows are fish, columns are coloration categories.)

Compute the value of P(E), the proportion of agreement which we could expect by chance, using
equation 9.28:

$$P(E) = \sum_{j=1}^{5} p_j^2 = .362^2 + .026^2 + .319^2 + .069^2 + .224^2 = .2884$$

Next we must find P( A), the proportion of times that the raters agreed.

$$P(A) = \frac{1}{N} \sum_{i=1}^{N} S_i = \frac{1 + .333 + 1 + .333 + .50 + \cdots + .333 + .167}{29} = .5804$$

Now, to find K , we use equation 9.27:

$$K = \frac{P(A) - P(E)}{1 - P(E)} = \frac{.580 - .288}{1 - .288} = .41$$
Thus, we conclude that there is moderate agreement among the raters.
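As a quick arithmetic check of Example 9.8a, the snippet below recomputes P(E) and K from the category proportions and the value of P(A) reported in the example. Because Table 9.15 itself is not available here, P(A) = .5804 is taken as given rather than recomputed; the variable names are illustrative only.

```python
# Category proportions p_j as used in the P(E) computation of Example 9.8a
p = [0.362, 0.026, 0.319, 0.069, 0.224]

# P(E): expected chance agreement (Equation 9.28)
p_e = sum(pj ** 2 for pj in p)

# P(A): observed agreement, taken as reported in the example
p_a = 0.5804

# K: kappa coefficient of agreement (Equation 9.27)
kappa = (p_a - p_e) / (1 - p_e)

print(round(p_e, 4), round(kappa, 2))  # prints: 0.2884 0.41
```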

Testing the Significance of K


For large N , K is approximately normally distributed with mean 0 and variance

$$\operatorname{var}(K) \approx \frac{2}{Nk(k-1)} \cdot \frac{P(E) - (2k-3)[P(E)]^2 + 2(k-2)\sum p_j^3}{[1 - P(E)]^2}$$
(Equation 9.30, p. 289)

Therefore, we could use the statistic


$$z = \frac{K}{\sqrt{\operatorname{var}(K)}}$$
(Equation 9.31, p. 289)

to test the hypothesis $H_0: K = 0$ against the hypothesis $H_1: K > 0$.
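Equations 9.30 and 9.31 translate directly into code. The following is a minimal sketch under the assumption that the category proportions p_j, the number of objects N, and the number of raters k are already known; the function names `kappa_variance` and `kappa_z` are hypothetical.

```python
import math
from typing import Sequence


def kappa_variance(p: Sequence[float], n_objects: int, k: int) -> float:
    """Large-sample variance of K under H0 (Equation 9.30)."""
    p_e = sum(pj ** 2 for pj in p)      # P(E), Equation 9.28
    sum_p3 = sum(pj ** 3 for pj in p)   # sum of cubed category proportions
    numerator = p_e - (2 * k - 3) * p_e ** 2 + 2 * (k - 2) * sum_p3
    return 2 / (n_objects * k * (k - 1)) * numerator / (1 - p_e) ** 2


def kappa_z(kappa: float, p: Sequence[float], n_objects: int, k: int) -> float:
    """Test statistic z = K / sqrt(var(K)) (Equation 9.31)."""
    return kappa / math.sqrt(kappa_variance(p, n_objects, k))


# Hypothetical check: two equally likely categories, N = 10 objects, k = 2 raters
var_k = kappa_variance([0.5, 0.5], n_objects=10, k=2)
```

In this toy case the formula reduces to var(K) = 1/N = .1, which is an easy value to verify by hand.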

Example 9.8b

$H_0: K = 0$. The raters exhibit no significant agreement in their ratings.

$H_1: K > 0$. The raters exhibit significant agreement in their ratings.
Statistical Test
Recall that N = 29 (objects rated), m = 5 (rating categories), k = 4 (raters), and P(E) = .288. Since
the ratings are categorical (nominal) and K = .41, we test the significance of K by first finding its variance.

Significance Level
Set α = .01 and N = 29.

$$\sum p_j^3 = .362^3 + .026^3 + .319^3 + .069^3 + .224^3 = .092$$


Computation of var(K):

$$\operatorname{var}(K) \approx \frac{2}{Nk(k-1)} \cdot \frac{P(E) - (2k-3)[P(E)]^2 + 2(k-2)\sum p_j^3}{[1 - P(E)]^2}$$

$$= \frac{2}{(29)(4)(4-1)} \cdot \frac{.288 - (2(4) - 3)[.288]^2 + 2(4-2)(.092)}{[1 - .288]^2}$$

$$= \frac{2}{348} \cdot \frac{.2413}{.5069}$$

$$\operatorname{var}(K) = .002736$$
Using this value for var(K), we may find z:

$$z = \frac{K}{\sqrt{\operatorname{var}(K)}} = \frac{.41}{\sqrt{.002736}} = 7.84$$
$z_{\text{crit}}$ (at α = .01) = 2.32 (Appendix Table A, p. 319)

Compare: $z_{\text{comp}} = 7.84 > z_{\text{crit}} = 2.32$

Decision: Reject $H_0$ in favor of $H_1$.

Therefore, the researcher concludes that the raters exhibit significant agreement in their ratings.
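The arithmetic of Example 9.8b can be reproduced with the standard library. This sketch plugs in the rounded values used in the example (P(E) = .288, Σp_j³ = .092, K = .41) and, as an assumption not found in the source, obtains the one-tailed critical value from `statistics.NormalDist` rather than from Appendix Table A.

```python
import math
from statistics import NormalDist

N, k = 29, 4                                # objects rated, raters
p_e, sum_p3, kappa = 0.288, 0.092, 0.41     # rounded values from the example
alpha = 0.01

# Equation 9.30: large-sample variance of K under H0
numerator = p_e - (2 * k - 3) * p_e ** 2 + 2 * (k - 2) * sum_p3
var_k = 2 / (N * k * (k - 1)) * numerator / (1 - p_e) ** 2

# Equation 9.31: test statistic
z = kappa / math.sqrt(var_k)

# One-tailed critical value for H1: K > 0
z_crit = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```

The computed critical value is about 2.33, slightly different from the tabled 2.32 because of rounding in the table; the decision to reject H0 is unaffected.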
