
Chapter Four

Introduction

• The chi-square distribution can be used to test the independence of two
variables.
• For example, “Are senators’ opinions on gun control independent of party
affiliation?” That is, do the Republicans feel one way and the Democrats
another, or do they hold the same opinion?
• We will show how to test the hypothesis that two categorical variables are
independent.
• The test statistics discussed have sampling distributions that are approximated
by chi-square distributions.
• The methods involve the comparison of a set of observed frequencies with
the frequencies expected under the hypothesis to be tested.
Test of association
Tests Using Contingency Tables

• When data can be tabulated in terms of frequencies, several types of
hypotheses can be tested by using the chi-square test.
• Two such tests are the test of independence of variables and the test of
homogeneity of proportions.
• The test of independence of variables is used to determine whether two
variables are independent of or related to each other when a single sample
is selected.
• The test of homogeneity of proportions is used to determine whether the
proportions for a variable are equal when several samples are selected from
different populations.
• Both tests use the chi-square distribution and a contingency table, and the
test value is found in the same way.
• The statistical hypotheses for the test of independence are
H0: the row variable and the column variable are independent
Ha: the row variable and the column variable are dependent (related)

• If one variable has r categories and the other has c categories, then the
table of frequencies has r rows and c columns and is called an r × c
contingency table.
• In the body of the table, n_ij is the number of observations having the
characteristics described by the ith category of the row variable and the
jth category of the column variable. This frequency is referred to as the
ijth “cell” frequency.
• R_i and C_j are the total (marginal) frequencies of the row and column
categories, respectively.
• The total number of observations, n, is the last entry.

Representation of a Contingency Table

                    Column variable
Row variable    1       2       ...     c       Total
1               n_11    n_12    ...     n_1c    R_1
2               n_21    n_22    ...     n_2c    R_2
...             ...     ...     ...     ...     ...
r               n_r1    n_r2    ...     n_rc    R_r
Total           C_1     C_2     ...     C_c     n
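
As a minimal sketch of this layout, the snippet below stores an r × c table as a list of rows and derives the marginal totals R_i and C_j and the grand total n. The counts are hypothetical and are not taken from any example in this chapter, and the function name `margins` is only illustrative.

```python
def margins(table):
    """Return (row_totals, col_totals, n) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]          # R_i, one per row
    col_totals = [sum(col) for col in zip(*table)]    # C_j, one per column
    n = sum(row_totals)                               # total number of observations
    return row_totals, col_totals, n


# Hypothetical 2 x 3 table of observed cell frequencies n_ij (illustration only).
observed = [
    [12, 18, 20],
    [8, 22, 20],
]

R, C, n = margins(observed)
print("Row totals:", R)
print("Column totals:", C)
print("n:", n)
```

Storing the table row by row keeps the index order the same as in n_ij: `observed[i][j]` is the cell in row i and column j (counting from zero).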
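The formula for the test value for the test of independence is

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n},$$

where E_ij is the expected frequency of cell (i, j) when H0 is true. The test
has (r - 1)(c - 1) degrees of freedom.

If the null hypothesis of independence is true, then each cell
probability is the product of its marginal probabilities. That is,

$$\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j} \quad \text{for all } i \text{ and } j,$$

where $\pi_{i\cdot}$ and $\pi_{\cdot j}$ are the marginal probabilities of row
category i and column category j.

• We reject H0 if $\chi^2 > \chi^2_{\alpha}$, the critical value with
(r - 1)(c - 1) degrees of freedom at a specified level of significance α.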
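
The calculation of the test value can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the chapter's own worked example: the 2 × 2 counts are hypothetical, the function name is made up for this sketch, and the expected frequencies use the usual estimate E_ij = R_i C_j / n.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected frequency under independence: E_ij = R_i * C_j / n
    expected = [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]

    # Sum (observed - expected)^2 / expected over every cell of the table
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 counts (illustration only, not data from the text).
observed = [[10, 20], [20, 10]]
stat, df, expected = chi_square_independence(observed)
print(f"chi-square = {stat:.4f}, df = {df}")
```

For these hypothetical counts every expected frequency is 15, so the test value is 4 × 25/15 ≈ 6.667 with 1 degree of freedom. That exceeds 3.841, the α = 0.05 critical value for 1 degree of freedom, so H0 would be rejected for this made-up table.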
Example
A researcher wishes to see if there is a relationship
between the hospital and the number of patient infections.
A sample of 3 hospitals was selected, and the number of
infections for a specific year has been reported. The data
are shown next.
Step 4 Make the decision. The decision is to reject the null hypothesis, since
the test value 30.518 exceeds the critical value 9.488.
Step 5 Summarize the results. There is enough evidence to support the claim
that the number of infections is related to the hospital where they occurred.
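
The decision in Step 4 can be cross-checked numerically. The sketch below assumes the test has 4 degrees of freedom; this is inferred from the critical value 9.488, which is the α = 0.05 chi-square critical value for 4 degrees of freedom, and is not stated in the text. For an even number of degrees of freedom the upper-tail probability has a closed form, so only the standard library is needed. The helper name is illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = 4                    # assumption: inferred from the critical value 9.488
test_value = 30.518       # test value reported in Step 4
critical_value = 9.488    # critical value reported in Step 4

tail_at_critical = chi2_sf_even_df(critical_value, df)   # should be close to 0.05
p_value = chi2_sf_even_df(test_value, df)

print(f"tail area beyond the critical value: {tail_at_critical:.4f}")
print(f"p-value of the test value: {p_value:.2e}")
print("reject H0" if test_value > critical_value else "do not reject H0")
```

Under that assumption the tail area beyond 9.488 is approximately 0.05, and the p-value of 30.518 is far smaller, which agrees with the decision to reject H0.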
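Chi-Square Goodness-of-Fit Test
• We can extend the binomial sampling scheme of Chapter 4 to situations in
which each trial results in one of k possible outcomes (k > 2).
• For example, a random sample of registered voters is classified according
to political party (Republican, Democrat, Socialist, Green, Independent,
etc.), or
• patients in a clinical trial are evaluated with respect to the degree of
improvement in their medical condition (substantially improved, improved,
no change, worse).
• This type of experiment or study is called a multinomial experiment, with
the following characteristics:
1. The experiment consists of n identical trials.
2. Each trial results in one of k outcomes.
3. The probability that a single trial results in outcome i is π_i
(i = 1, 2, ..., k) and remains constant from trial to trial, with
π_1 + π_2 + ... + π_k = 1.
4. The trials are independent.
5. We are interested in n_i, the number of trials resulting in outcome i,
with n_1 + n_2 + ... + n_k = n.

EXAMPLE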
A laboratory is comparing a test drug to a standard drug
preparation that is useful in the maintenance of patients
suffering from high blood pressure. Over many clinical trials at
many different locations, the standard therapy was administered
to patients with comparable hypertension (as measured by the
New York Heart Association (NYHA) Classification). The lab
then classified the responses to therapy for this large patient
group into one of four response categories. The first table below
lists the categories and the percentages of patients treated with
the standard preparation who were classified in each category.
The lab then conducted a clinical trial with a random sample of 200
patients with high blood pressure. All patients were required to
be listed according to the same hypertensive categories of the
NYHA Classification as those studied under the standard
preparation. Use the sample data in the second table to test the
hypothesis that the cell probabilities associated with the test
preparation are identical to those for the standard. Use α = .05.

Category                                           Percentage
Marked decrease in blood pressure                  50
Moderate decrease in blood pressure                25
Slight decrease in blood pressure                  10
Stationary or slight increase in blood pressure    15

Category    Observed Cell Counts
1           120
2           60
3           10
4           10
Solution
This experiment possesses the characteristics of a multinomial
experiment, with n = 200 and k = 4 outcomes.
Outcome 1: A person’s blood pressure will decrease markedly after
treatment with the test drug.
Outcome 2: A person’s blood pressure will decrease moderately
after treatment with the test drug.
Outcome 3: A person’s blood pressure will decrease slightly after
treatment with the test drug.
Outcome 4: A person’s blood pressure will remain stationary or
increase slightly after treatment with the test drug.
• The null and alternative hypotheses are then
H0: π1 = .50, π2 = .25, π3 = .10, π4 = .15
Ha: At least one of the cell probabilities is different from the
hypothesized value.


Before computing the test statistic, we must determine
the expected cell numbers, E_i = nπ_i. These are given in the table below.

Category    Observed Cell Number, n_i    Expected Cell Number, E_i
1           120                          200(.50) = 100
2           60                           200(.25) = 50
3           10                           200(.10) = 20
4           10                           200(.15) = 30

• Because all the expected cell numbers are relatively large,
we may calculate the chi-square statistic and compare it to a
tabulated value of the chi-square distribution:

$$\chi^2 = \sum_{i=1}^{4} \frac{(n_i - E_i)^2}{E_i} = \frac{(120-100)^2}{100} + \frac{(60-50)^2}{50} + \frac{(10-20)^2}{20} + \frac{(10-30)^2}{30} = 24.33$$

• For the probability of a Type I error set at α = .05, we look up the
value of the chi-square statistic for α = .05 and df = k - 1 = 3.
• The critical value from the chi-square table is 7.815.

R.R.: Reject H0 if χ² > 7.815.
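
The arithmetic behind the test statistic can be reproduced with a short Python sketch. It uses the observed counts from the sample data table (120, 60, 10, 10) and the hypothesized probabilities under H0; the function name is only illustrative, and the critical value 7.815 is copied from the text rather than computed.

```python
def goodness_of_fit(observed, probabilities):
    """Return (statistic, degrees_of_freedom, expected) for a multinomial sample."""
    n = sum(observed)
    expected = [n * p for p in probabilities]            # E_i = n * pi_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                               # df = k - 1
    return statistic, df, expected


observed = [120, 60, 10, 10]               # test-drug counts, n = 200
hypothesized = [0.50, 0.25, 0.10, 0.15]    # cell probabilities under H0
critical_value = 7.815                     # alpha = .05, df = 3 (from the text)

stat, df, expected = goodness_of_fit(observed, hypothesized)
print(f"chi-square = {stat:.2f}, df = {df}")
print("reject H0" if stat > critical_value else "do not reject H0")
```

The four cell contributions are 4, 2, 5 and about 13.33, so most of the statistic comes from category 4, where far fewer patients than expected showed a stationary or increased blood pressure.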


Conclusion: The computed value of χ² (24.33) is greater than 7.815, so
we reject the null hypothesis and conclude that at least one of the
cell probabilities differs from that specified under H0. Practically,
it appears that a much higher proportion of patients treated with
the test preparation falls into the moderate and marked
improvement categories. The p-value for this test is p < .001.
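
The size of the p-value can be checked without statistical tables. The sketch below relies on the closed-form upper-tail probability of a chi-square variable with 3 degrees of freedom, P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/π) exp(-x/2), which holds only for df = 3; the helper name is illustrative.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    # Closed form valid for df = 3 only
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


tail_at_critical = chi2_sf_df3(7.815)   # should be close to alpha = .05
p_value = chi2_sf_df3(24.33)            # tail area beyond the computed statistic

print(f"tail area beyond 7.815: {tail_at_critical:.3f}")
print(f"p-value for 24.33: {p_value:.1e}")
```

The tail area beyond 7.815 comes out at approximately .05, confirming the critical value, and the tail area beyond 24.33 is on the order of 10^-5, well below .001.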
Measure of strength of Association
Coefficients for Measuring Association

When the chi-square test value is significant and there is a relationship between the
variables, the strength of this relationship can be measured by using these coefficients.

1. Phi (φ): used only on 2 × 2 contingency tables. Interpreted as a measure of the
relative strength of an association between two variables, ranging from 0 to 1.

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

2. Pearson’s contingency coefficient (C)

The coefficient will always be less than 1, and its upper limit varies according to
the number of rows and columns.

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
3. Cramér’s V coefficient (V)

• Useful for comparing multiple test statistics, and it generalizes across
contingency tables of varying sizes.

• It is not affected by sample size and therefore is very useful in situations
where you suspect a statistically significant chi-square was the result of
large sample size rather than any substantive relationship between the
variables.

$$V = \sqrt{\frac{\chi^2}{n(q - 1)}}$$

where q = the smaller of the number of rows and the number of columns.


Describing strength of Association
Characterizations

Value of coefficient    Characterization
> 0.5                   High association
0.3 to 0.5              Moderate association
0.1 to 0.3              Low association
0 to 0.1                Little if any association
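
The three coefficients and the characterization scale can be sketched together in Python. The sketch assumes the usual definitions φ = sqrt(χ²/n), C = sqrt(χ²/(χ² + n)) and V = sqrt(χ²/(n(q - 1))). The input values are hypothetical: they correspond to a made-up 2 × 2 table with counts 10, 20, 20, 10, and both function names are illustrative. The scale above does not say which band a boundary value such as exactly 0.3 belongs to, so the choice made in the code is an assumption.

```python
import math


def association_coefficients(chi_square, n, n_rows, n_cols):
    """Return (phi, C, V) for a chi-square test value from an r x c table."""
    q = min(n_rows, n_cols)                        # smaller table dimension
    phi = math.sqrt(chi_square / n)                # meant for 2 x 2 tables
    c = math.sqrt(chi_square / (chi_square + n))   # Pearson's contingency coefficient
    v = math.sqrt(chi_square / (n * (q - 1)))      # Cramer's V
    return phi, c, v


def describe_strength(value):
    """Map a coefficient to the verbal scale; boundary handling is an assumption."""
    if value > 0.5:
        return "high association"
    if value > 0.3:
        return "moderate association"
    if value > 0.1:
        return "low association"
    return "little if any association"


# Hypothetical 2 x 2 table [[10, 20], [20, 10]]: chi-square = 20/3, n = 60.
phi, c, v = association_coefficients(20 / 3, 60, 2, 2)
print(f"phi = {phi:.3f}, C = {c:.3f}, V = {v:.3f}")
print(describe_strength(v))
```

For a 2 × 2 table q - 1 = 1, so V reduces to φ; here both are about 0.333, which falls in the moderate band.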
