
Measures of Association for Tables (8.4): Difference of Proportions; The Odds Ratio


Sociology 601 (Martin) Lecture 15: November 11-13, 2008

Measures of association for tables (8.4)
- difference of proportions
- the odds ratio

Measures of association for ordinal data (8.5-8.6)
- gamma
- Kendall's tau-b
- statistical inference for ordinal associations

8.4: Measures of Association: Difference of Proportions


The difference of proportions is the proportion scoring "yes" on variable Y in one category of variable X, minus the proportion scoring "yes" on variable Y in another category of variable X.

Formal definition: for two variables X and Y, each with 1 and 2 as possible values:

d.p. = P(Y = 1 | X = 1) - P(Y = 1 | X = 2)

alternately, d.p. = pi(Y=1 | X=1) - pi(Y=1 | X=2)

Example for difference of proportions


Support for legalized abortion:

             Yes    No    Total
  Age < 40   490    510   1000
  Age 40+    210    390    600
  Total      700    900   1600

difference of proportions = p(yes | 40+) - p(yes | <40) = .35 - .49 = -.14


The sample proportion of people 40+ who support abortion is .14 lower than the proportion of people under 40 who support abortion.
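The arithmetic above can be reproduced in a few lines of Python (a sketch; the variable names are mine, not the lecture's):

```python
# Difference of proportions for the abortion-support table.
support_under_40 = 490 / 1000   # P(yes | age < 40) = .49
support_40_plus  = 210 / 600    # P(yes | age 40+)  = .35

diff = support_40_plus - support_under_40
print(round(diff, 2))  # -0.14
```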

A conceptual weakness of a difference of proportions


The difference of proportions gives the same value near .5 as it does near 0.0 or 1.0, but the social implications of that difference may be very different.

Fictitious example: women as a proportion of all veterinary school students.
1960: p = .01; 1965: p = .06; d. of p. = .05
1990: p = .51; 1995: p = .56; d. of p. = .05
2020: p = .94; 2025: p = .99; d. of p. = .05

Which 5-year span reflects the largest underlying social change?

Second type of measures of association: Odds and odds ratios


Two possible categories of variable Y are designated "success" and "failure."
Odds = (proportion of one response) / (proportion of the other) = p / (1 - p)
The odds of an event are often misread as the probability of the event, but that is not the statistical definition of odds. The odds take extreme values when the proportion under consideration is near 1.

Example: what are the odds that a veterinary student would be a woman in 1990? 1995? 1960? 1965? 2020? 2025?
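A sketch of the exercise in Python, using the fictitious veterinary-school proportions from the earlier slide:

```python
# Odds from a proportion: odds = p / (1 - p).
def odds(p):
    return p / (1 - p)

for year, p in [(1960, .01), (1965, .06), (1990, .51),
                (1995, .56), (2020, .94), (2025, .99)]:
    print(year, round(odds(p), 2))
# Near p = .5 the odds are close to 1 (1990: 1.04);
# near p = 1 they explode (2025: 99.0).
```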

Odds ratios
Imagine a contingency table with two categories of X and two categories of Y. Compare the odds for each category of X using the odds ratio:

theta = (odds that X = 1, given Y = 1) / (odds that X = 1, given Y = 2)

Example: veterinary school enrollment, by sex and year

          women    men    total
  1990      51      49      100
  1995     112      88      200
  total    163     137      300

Another example for an odds ratio


HIV+ prevalence in health surveys of a developing nation:
1994: 28 of 997 respondents are HIV+
1998: 59 of 1015 respondents are HIV+

Odds of HIV+ in 1994 = 28/(997 - 28) = .02890
Odds of HIV+ in 1998 = 59/(1015 - 59) = .06172

The odds of being HIV+ as opposed to HIV- were .02890 in 1994 and .06172 in 1998.

odds ratio: theta = .06172/.02890 = 2.136

The odds of being HIV+ as opposed to HIV- were 2.14 times as high in 1998 as in 1994.
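The slide's arithmetic, sketched in Python:

```python
# Odds ratio for the HIV example.
odds_1994 = 28 / (997 - 28)    # odds of HIV+ vs HIV-, 1994
odds_1998 = 59 / (1015 - 59)   # odds of HIV+ vs HIV-, 1998

odds_ratio = odds_1998 / odds_1994
print(round(odds_1994, 5), round(odds_1998, 5), round(odds_ratio, 3))
# 0.0289 0.06172 2.136
```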

Additional notes on odds ratios


You can calculate odds ratios using r x c tables larger than 2 x 2: pick any two pairs of categories and calculate the odds ratio for those pairs.

You must carefully explain what you are comparing: every odds ratio is a comparison of four numbers! Social researchers often report the natural log (ln) of the odds ratio instead of the odds ratio itself. (Why?)

Why bother with odds ratios?


Why not just do a ratio of proportions?

When you work with categories of outcomes, you get to choose which category goes in the numerator and which goes in the denominator.
Odds ratios are not affected by your choice of numerator; ratios of proportions are. Example: calculate odds ratios and ratios of proportions for trends in the HIV- population.
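A sketch of the suggested exercise, using the HIV survey counts from the previous slide. Switching from tracking HIV+ to tracking HIV- simply inverts the odds ratio, but it does not simply invert the ratio of proportions:

```python
# Odds ratio vs ratio of proportions when the tracked category is switched.
pos_94, n_94 = 28, 997
pos_98, n_98 = 59, 1015

# Tracking HIV+:
rp_pos = (pos_98 / n_98) / (pos_94 / n_94)                        # ratio of proportions
or_pos = (pos_98 / (n_98 - pos_98)) / (pos_94 / (n_94 - pos_94))  # odds ratio

# Tracking HIV- instead:
neg_94, neg_98 = n_94 - pos_94, n_98 - pos_98
rp_neg = (neg_98 / n_98) / (neg_94 / n_94)
or_neg = (neg_98 / pos_98) / (neg_94 / pos_94)

print(round(rp_pos, 3), round(1 / rp_neg, 3))  # 2.07 vs 1.032 -- not reciprocal
print(round(or_pos, 3), round(1 / or_neg, 3))  # 2.136 vs 2.136 -- reciprocal
```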

8.5. Stepping up to ordinal and interval data


The chi-squared test is an extremely simple test of relationships between categories.
In chi-squared tests, we ask: does the distribution of one variable depend on the categories of the other variable? This sort of question requires only nominal-scaled data.

We are often interested in more informative tests of relationships between categories.


In such tests, we ask: as we increase the level of one variable, how does the level of the other change? As we shall see today, we can ask this question for both ordinal- and interval-scaled data.

A weakness of a chi-squared test.


The problem: chi-squared tests are for nominal associations. If we use a chi-squared test when there is an ordinal association, we waste some information. Assign + to cells where fo > fe and - to cells where fo < fe. Chi-squared tests cannot distinguish the following patterns:

[Two wages (low/med/high) by job satisfaction (no/maybe/yes) tables of + and - cells: one with the + cells running down the diagonal, an ordinal trend, and one with the same + and - cells arranged non-monotonically.]

Alternative to a chi-squared test for ordinal data


A solution: count concordant and discordant pairs.
1. Identify every possible pair of observations. (The number of possible pairs far exceeds the number of observations.)
2. A pair of observations is concordant if the subject who is higher on one variable is also higher on the other variable.
3. A pair of observations is discordant if the subject who is higher on one variable is lower on the other variable.
4. Many pairs of observations are neither concordant nor discordant (they are tied on one or both variables). We ignore those pairs.

Finding concordant and discordant patterns.


For all but the smallest samples, the number of concordant and discordant patterns can be very difficult to count, so we usually leave that exercise to a computer program. It is, however, important to understand what the computer is doing. For that reason, we will try an example.

The concordant and discordant pairs will be counted from this table:

  wages    like job?
            no   maybe   yes
  low       10     1      1
  med        3     4      5
  high       3     7      2

Counting concordant pairs


(no like, low wages) x (maybe like, med wages) = 10 x 4 = 40
(no, low) x (maybe, high)  = 10 x 7 = 70
(no, low) x (yes, med)     = 10 x 5 = 50
(no, low) x (yes, high)    = 10 x 2 = 20
(maybe, low) x (yes, med)  =  1 x 5 =  5
(maybe, low) x (yes, high) =  1 x 2 =  2
(no, med) x (maybe, high)  =  3 x 7 = 21
(no, med) x (yes, high)    =  3 x 2 =  6
(maybe, med) x (yes, high) =  4 x 2 =  8
Total concordant pairs = 222

Counting discordant pairs


(no like, med wages) x (maybe like, low wages) = 3 x 1 = 3
(no, med) x (yes, low)     = 3 x 1 =  3
(no, high) x (maybe, med)  = 3 x 4 = 12
(no, high) x (yes, med)    = 3 x 5 = 15
(no, high) x (maybe, low)  = 3 x 1 =  3
(no, high) x (yes, low)    = 3 x 1 =  3
(maybe, high) x (yes, low) = 7 x 1 =  7
(maybe, high) x (yes, med) = 7 x 5 = 35
(maybe, med) x (yes, low)  = 4 x 1 =  4
Total discordant pairs = 85
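The counting we usually leave to the computer can be sketched in Python (cell counts from the wages-by-job-satisfaction table above; the loop structure is my own illustration):

```python
# Count concordant and discordant pairs in a 3x3 contingency table.
table = [
    [10, 1, 1],   # low wages:  no, maybe, yes
    [3, 4, 5],    # med wages
    [3, 7, 2],    # high wages
]

C = D = 0
rows, cols = len(table), len(table[0])
for i in range(rows):
    for j in range(cols):
        for k in range(rows):
            for l in range(cols):
                if k > i and l > j:    # higher on both variables: concordant
                    C += table[i][j] * table[k][l]
                elif k > i and l < j:  # higher on one, lower on the other: discordant
                    D += table[i][j] * table[k][l]

gamma = (C - D) / (C + D)
print(C, D, round(gamma, 2))  # 222 85 0.45
```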

Measuring ordinal associations with gamma


Gamma: a measure based on concordant and discordant pairs.

gamma = (C - D) / (C + D), where
C = number of concordant pairs
D = number of discordant pairs

For the previous example: gamma = (222 - 85) / (222 + 85) = 137 / 307 = +.45

Measuring ordinal associations with gamma


Interpreting gamma:
If gamma is between 0 and +1, the ordinal variables are positively associated. If gamma is between 0 and -1, the ordinal variables are negatively associated. The magnitude of gamma indicates the strength of the association. If gamma = 0, the variables may still be statistically dependent (chi-squared could still be large), but the categories are not dependent in an ordinal sequence.

The trouble with gamma


Because gamma varies from -1 to +1 and is a measure of association between two variables, naive statisticians tend to interpret gamma as a correlation coefficient.
(more on correlation coefficients in the next chapter)

The problem is that gamma takes more extreme values than a correlation coefficient, especially when the number of categories is small.
Unscrupulous researchers can inflate gamma by collapsing categories together!

Kendall's Tau-b
Kendall's tau-b is an alternative measure to gamma.
Like gamma, Kendall's tau-b takes values from -1 to +1, and the farther from 0, the stronger the association.

STATA calculates a sort-of standard error (the Asymptotic Standard Error, or ASE) for tau-b, which you can use for statistical significance tests: z = tau-b / (ASE of tau-b)

Using gamma and tau-b:


Use the STATA commands for chi-squared tests, which also give you significance tests for ordinal-level data. If the gamma or tau-b test is statistically significant and the chi-squared test is not, you have added power to the test by assuming an ordinal relationship. If the chi-squared test is statistically significant and the gamma and tau-b tests are not, you should see a clear departure from an ordinal relationship in the data. (To check, calculate the conditional distributions of one variable across categories of the other.)

Gamma and tau-b: an example


Party identification and gender example:
  sex      Party identification
           Democrat       Indep.       Republican    Total
  female   279 (261.4)    73 (70.65)   225 (244.9)    577
  male     165 (182.6)    47 (49.35)   191 (171.1)    403
  total    444            120          416            980

  (expected frequencies in parentheses)

We can calculate X2 = 7.010 (df = 2, p = .030)

STATA example of gamma and tau-b


Next, use the TABULATE command with options:
. tabulate gender party [freq=number], gamma taub

           |              party
    gender |  democrat  independe  republica |     Total
-----------+---------------------------------+----------
    female |       279         73        225 |       577
      male |       165         47        191 |       403
-----------+---------------------------------+----------
     Total |       444        120        416 |       980

           gamma = 0.1470    ASE = 0.056
 Kendall's tau-b = 0.0796    ASE = 0.031
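The figures STATA reports can be reproduced by hand with the standard formulas for gamma and tau-b (a Python sketch, not Stata itself; tau-b uses the usual correction for ties on each variable):

```python
# Gamma and Kendall's tau-b for the gender-by-party table.
from math import sqrt

table = [
    [279, 73, 225],   # female: democrat, independent, republican
    [165, 47, 191],   # male
]

# Concordant (C) and discordant (D) pairs.
C = D = 0
for i, row in enumerate(table):
    for j, n_ij in enumerate(row):
        for k, row2 in enumerate(table):
            for l, n_kl in enumerate(row2):
                if k > i and l > j:
                    C += n_ij * n_kl
                elif k > i and l < j:
                    D += n_ij * n_kl

n = sum(sum(row) for row in table)
pairs = n * (n - 1) / 2
ties_x = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))        # ties on gender
ties_y = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))  # ties on party

gamma = (C - D) / (C + D)
tau_b = (C - D) / sqrt((pairs - ties_x) * (pairs - ties_y))
print(f"{gamma:.4f} {tau_b:.4f}")  # 0.1470 0.0796
```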

Statistical inference with gamma and tau-b


A test for ordinal comparisons is similar to an independent-samples test for population proportions.
Assumptions: a random sample; ordinal (or interval) categories; the sampling distribution of differences between groups is normal because the sample size is large (n >= 5 for every cell).
Null hypothesis: there is no ordered relationship between the ordered distributions of categories.

Statistical inference with gamma and tau-b


Test statistic: z = gamma / (ASE of gamma) = .1470 / .056 = 2.625
(note: ASE stands for Asymptotic Standard Error)

P-value: look up in Table A: p = .0044 for a one-tailed test, so p = .0088 for a two-tailed test.

Conclusion: p < .01, so reject the null hypothesis. Instead, conclude that there is an ordered relationship between sex and political identification.
(If you checked, you would find that p for a gamma test is smaller than p for a Chi-squared test in this case.)
