Biostatistics I: Basic for Public Health

Lecture No.: KUI 6111


Starting Date: 01/10/2019

Hypothesis Testing:
Categorical Data Analysis
Module: 6

Copyright © 2019, S.A. Wilopo,


Department of Biostatistics, Epidemiology, and Population Health
Faculty of Medicine, Public Health, and Nursing, Gadjah Mada University, Yogyakarta, Indonesia
Contents
1 Learning Objectives
2 Activities
3 Class Exercise
3.1 Categorical Data I
3.2 Categorical Data II
3.3 Categorical Data III
4 Homework
4.1 Critical Appraisal for Categorical Data
4.2 Analyzing Categorical Data using STATA
4.2.1 Agreement Analysis
4.2.2 Categorical Analysis for PMA2020 Data
5 References
5.1 Articles for Critical Appraisal
5.2 Required Reading
5.3 Suggested Reading
6 Output
7 Log sheet

1 Learning Objectives
Upon completion of the course unit, students should be able to:
1. describe methods commonly used to analyze categorical data (nominal or ordinal)

2. perform hypothesis testing for categorical data analysis

3. analyze, read, and interpret published research and/or research data using
categorical analysis

2 Activities
1. Discussion: Categorical data analysis

2. Laboratory Session:

(a) Performing hypothesis testing for categorical data using the Z test, χ2 test, and
McNemar's test
(b) Calculating, interpreting, and testing odds ratio and relative risk estimates
(c) Performing hypothesis testing for categorical data using stratification, such as
the Mantel-Haenszel test and the log-rank test
(d) Calculating, interpreting, and testing the Kappa statistic

3. Homework:

(a) Analyzing, reading, and interpreting published research and/or research data
that used categorical analysis

3 Class Exercise
3.1 Categorical Data I
1. A recent study investigated the relationship between cigarette smoking and
subsequent mortality in men with a prior history of coronary disease. It was found
that 264 out of 1731 non-smokers and 208 out of 1058 smokers had died in the 5-year
period after the study began. Assuming that the age distributions of the two groups
are comparable, what is an appropriate statistical procedure to compare the mortality
rates in the two groups?

2. A hypothesis has been suggested that the principal benefit of physical activity is to
prevent sudden death from heart attack. The following study was designed to test
this hypothesis. One hundred (100) men who died from a first heart attack and 100
men who survived a first heart attack in the 50-59 age group were identified and their
wives were each given a detailed questionnaire concerning their husband’s physical
activity in the year preceding their heart attacks. The men were then classified as
active or inactive. Suppose 30 of the men who survived and 10 of the 100 who died
were classified as physically active. What is the general design? What is the appropri-
ate statistical procedure for assessing the relationship between physical activity and
sudden death from a heart attack?

3. Joseph Lister, a British physician of the late 19th century, decided that something
had to be done about the high death rate from post-operative complications, which
were mostly due to infection. Based on work of Louis Pasteur, he thought that the
infections had an organic cause, and decided to experiment with carbolic acid as a
disinfectant for the operating room. Lister performed 75 amputations over a period
of years. Forty of the amputations were done with carbolic acid and 35 were done
without carbolic acid. For those done with carbolic acid, 34 of the patients lived;
for those done without carbolic acid, 19 patients lived. Arrange these data in an
appropriate table.

(a) What is the study design?


(b) Determine if there is a significant association between mortality rate and the use
of carbolic acid.
(c) What is the odds ratio of living if carbolic acid was used?
(d) Is it reasonable to compute the relative risk of living if carbolic acid was used?
If so, what is the relative risk?

4. You are wondering: does aspirin really help prevent heart attacks? Searching for
an answer in the literature, you find the final report on the aspirin component of
the Physicians' Health Study (NEJM (1989) 321(3): 129-135). During the 1980s,
approximately 22,000 physicians over the age of 40 agreed to participate in a
long-term health study in which one important question was whether aspirin helps to
lower the rate of heart attacks (myocardial infarctions). The treatments in this part
of the study were aspirin or placebo, and the physicians were randomly assigned to
one treatment or the other as they entered the study. After the assignment, neither
the participating physicians nor the medical personnel who treated them knew who was
taking aspirin and who was taking placebo (double-blinded study). The physicians were
observed carefully for an extended period of time, and all heart attacks as well as
other problems that occurred were recorded. Other than aspirin, many variables could
affect the rate of heart attacks in the two groups of physicians. For example, the
amount of exercise they get and whether they smoke are two factors that could be
controlled in the study so that the true effect of aspirin can be measured.

Table 1: Study of Aspirin and Heart Attack

Treatment            Heart Attack
Assignment        Yes       No    Total
Aspirin           139    10782    10921
Placebo           239    10668    10907
Total             378    21450    21828

a) What is the study design?


b) The study reported 139 heart attacks among the aspirin users and 239 in the
placebo group. What are the appropriate conditional proportions to study if we
want to compare the rates of heart attacks for the two treatment groups? Is it
reasonable to compute the relative risk of heart attack for aspirin users
compared with non-aspirin users? How about the odds ratio? The difference in the
two rates of heart attacks? Explain.
c) Is there a significant association between heart attacks and aspirin use?
d) Heart attacks are not the only cause for concern in the Physicians' Health Study.
Another is that too much aspirin can cause an increased risk of stroke. Among
the aspirin users in the study, 119 had strokes during the observation period.
Within the placebo group, only 98 participants had strokes. Place these data in
an appropriate two-way table and comment on the association between aspirin
use and strokes.
e) Is there a significant association between strokes and aspirin use?
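
As a rough cross-check of the hand calculations for Table 1, the sketch below computes the conditional proportions, risk difference, relative risk, odds ratio, and the uncorrected Pearson χ2 statistic for a 2x2 table. It is only an illustration in Python, not part of the STATA laboratory; the function name compare_two_groups and the dictionary keys are assumptions made for this example.

```python
import math

def compare_two_groups(events1, total1, events2, total2):
    """Summarize a 2x2 table given the events and group size of two groups."""
    a, b = events1, total1 - events1   # group 1: events, non-events
    c, d = events2, total2 - events2   # group 2: events, non-events
    n = total1 + total2
    p1, p2 = a / total1, c / total2    # conditional proportions (risks)
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / (total1 * total2 * (a + c) * (b + d))
    return {
        "risk1": p1,
        "risk2": p2,
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
        "chi2": chi2,
        # upper tail of the chi-square distribution with 1 degree of freedom
        "p_value": math.erfc(math.sqrt(chi2 / 2)),
    }

# Table 1: heart attacks among aspirin users vs. placebo users
aspirin = compare_two_groups(139, 10921, 239, 10907)
for name, value in aspirin.items():
    print(f"{name}: {value:.4g}")
```

The same function can be reused for the stroke data in question d) once the two-way table has been set up.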

3.2 Categorical Data II
A 1980 study investigated the relationship between the use of oral contraceptives and the
development of endometrial cancer. It was found that of 117 endometrial cancer patients,
6 had used oral contraceptives at some time in their lives while of the 395 controls, 8 had
used oral contraceptives.

a. What is the study design?

b. State the null and alternative hypotheses for a test of association between
endometrial cancer and oral contraceptive use.

c. Put the data in a two-way table and set up the calculations for the appropriate
test statistic.

d. What are the degrees of freedom of your test statistic?

e. Is there a significant association?
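
When choosing the test statistic for this table, it may help to look at the expected cell counts first, because the χ2 approximation is usually considered unreliable when an expected count falls below 5. The sketch below, written in Python purely as an illustration, computes the expected counts and a two-sided Fisher's exact p-value from the hypergeometric distribution. The function names are assumptions for this example, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention among several.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # sum over all tables that are as probable as, or less probable than, the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: cases, controls; columns: ever used oral contraceptives, never used
table = [[6, 111], [8, 387]]
expected = expected_counts(table)
p_exact = fisher_exact_two_sided(6, 111, 8, 387)
print("smallest expected count:", round(min(min(row) for row in expected), 2))
print("Fisher exact p-value:", round(p_exact, 4))
```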

3.3 Categorical Data III


1. Improvement in control of blood glucose levels is an important motivation for the
use of insulin pumps for diabetic patients. However, certain side effects have been re-
ported with pump therapy. Table 2 below provides data on the occurrence of diabetic
ketoacidosis (DKA) in patients before and after the onset of pump therapy.

Table 2: Study of Insulin Pump and Diabetic Ketoacidosis

After               Before Pump Therapy
Pump Therapy        No DKA       DKA
No DKA                 237         7
DKA                     19         7

What is the appropriate statistical procedure to test if the rate of DKA is different before
and after the onset of pump therapy?

2. A study of methods for detecting cytomegalovirus (CMV) reported the results of dif-
ferent assays on 195 broncho-alveolar lavage (BAL) specimens from bone-marrow-
transplant recipients with pneumonia. Table 3 below shows the distribution of the
BAL specimens according to cytological examination result and immunofluorescence
(IF) assay results. Do the data provide evidence that the population proportion of BAL
specimens with positive cytological examination results differs from the population
proportion of BAL specimens with positive IF assay results?

Table 3: Study of Cytological Exam and Assay

Cytological         IF Assay Results
Exam Result         Positive    Negative
Positive                 165           7
Negative                  16           7

3. A study was carried out to see if patients whose skin did not respond to
dinitrochlorobenzene (DNCB), a contact allergen, would show an equally negative
response to croton oil, a skin irritant. Table 4 shows the results of simultaneous
skin reaction tests to DNCB and croton oil in 173 patients with skin cancer.

Table 4: Study of Dinitrochlorobenzene (DNCB) and Croton Oil

Croton          Dinitrochlorobenzene (DNCB)
Oil             Positive    Negative    Total
Positive              81          48      129
Negative              23          21       44
Total                104          69      173

a) State the null and alternative hypotheses.

b) Using information in the table, set up the calculations for the appropriate test.

4. Suppose we are interested in comparing the effectiveness of two different
antibiotics, A and B, in treating gonorrhea. Each person receiving antibiotic A is
matched with an equivalent person (age within 5 years, same sex), to whom antibiotic
B is given. These patients are asked to return to the clinic within 1 week to see if
the gonorrhea has been eliminated. The results are as follows:

– For 40 pairs of patients, both antibiotics are successful.
– For 20 pairs, antibiotic A is effective whereas antibiotic B is not.
– For 16 pairs, antibiotic B is effective whereas antibiotic A is not.
– For 3 pairs, neither antibiotic is effective.

Questions:

a) Set up the appropriate table.


b) State the null and alternative hypotheses.
c) Using numbers from the table, set up the calculations for the appropriate test
statistic.

5. Suppose we have 500 pairs of pregnant women who participate in a study of
premature births and are paired in such a way that the body weights of the women in a
pair are within 5 lbs. of each other (i.e., matching on body weight). We then give
one of the two women a placebo and the other drug A. We wish to test whether drug A
has an effect in preventing premature births. Suppose that in 30 pairs of women, both
women in a pair have a premature child; in 420 pairs of women, both women have a
normal child; in 35 pairs of women, the woman taking drug A has a normal child and
the woman taking the placebo has a premature child; in 15 pairs of women, the woman
taking drug A has a premature child and the woman taking the placebo has a normal
child. Display the numbers in a table and indicate what statistical procedure would
be appropriate.

6. A study is designed to compare the effectiveness of a commonly used drug therapy
to an experimental treatment involving mental imaging and diet therapy. For this
purpose patients were carefully matched on age, sex and clinical condition. One
patient in each pair was given the drug therapy while the other received the
competing regimen. The treatment results based on 120 matched pairs show that for 89
matched pairs both therapies are ineffective; for 14 pairs both therapies are
effective; for 5 matched pairs the drug therapy is effective, whereas the imaging and
diet therapy was ineffective; for 12 matched pairs the imaging and diet were
effective, whereas the drug therapy was ineffective.

a) How many pairs in this study would be considered concordant?


b) How many pairs would be considered discordant?
c) Perform a significance test to assess any differences in effectiveness between the
two therapies.
d) Compute the odds ratio.
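
For the paired tables in this section, only the discordant pairs carry information about a difference between the two conditions. The sketch below illustrates McNemar's test in Python using the discordant cells of Table 2 (7 patients with DKA before pump therapy only, 19 with DKA after pump therapy only). The function name mcnemar and its argument names are assumptions for this example; it reports the uncorrected and continuity-corrected χ2 statistics, an exact binomial p-value, and the matched-pairs odds ratio.

```python
from math import comb, erfc, sqrt

def mcnemar(only_first, only_second):
    """McNemar's test from the two discordant cell counts of a paired 2x2 table."""
    n = only_first + only_second
    chi2 = (only_first - only_second) ** 2 / n
    chi2_corrected = (abs(only_first - only_second) - 1) ** 2 / n
    k = min(only_first, only_second)
    # exact two-sided p-value: discordant pairs follow Binomial(n, 0.5) under H0
    exact_p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return {
        "chi2": chi2,
        "chi2_corrected": chi2_corrected,
        "p_chi2": erfc(sqrt(chi2 / 2)),   # chi-square upper tail, 1 df
        "exact_p": exact_p,
        # matched-pairs odds ratio (second condition vs. first); None if undefined
        "odds_ratio": only_second / only_first if only_first else None,
    }

# Table 2: DKA before pump therapy only = 7, DKA after pump therapy only = 19
dka = mcnemar(only_first=7, only_second=19)
for name, value in dka.items():
    print(f"{name}: {value:.4g}")
```

The same function applies to exercises 4 to 6 once the discordant pairs have been identified from the description of each study.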

4 Homework
4.1 Critical Appraisal for Categorical Data
You are expected to read one of these 3 articles according to your concentration
(area of interest). Please answer the questions related to the article you selected.

A. Please read the article by Interrante et al. (2017), which presents risk
management outcomes at 12 months post-counseling for usual care vs. telephone
counseling. You should read table 1 and conduct an appropriate test for categorical
data on this table. The goal is to check that the two groups are comparable.

Questions

a. What are your hypotheses?
b. Do you think the groups are similar at the baseline of the intervention? For
example, are BRCA1/2 test results similar between the two groups? Conduct a
statistical test to support your judgment.
c. Table 4 shows risk management outcomes at 12 months post-counseling for
usual care vs. telephone counseling. All numbers are presented as a single
statistic (percentage only) and tested using the chi-square test. For example,
the result for risk-reducing surgery is statistically significant. However, no
confidence intervals are presented. Can you create a table which shows 95 percent
confidence intervals for all outcomes presented in table 3?
d. Can you explain those confidence intervals?

B. In the paper by Lee et al. (2009), all respondents were asked whether they or
someone in their household received cash assistance through Temporary Assistance for
Needy Families (TANF) at baseline. TANF receipt was used as a proxy for abject
poverty.

Questions

a. At the baseline of the study, do you think the characteristics and risk
indicators of the prenatal subsample are similar between the two groups (control
vs. intervention)? What are your hypotheses?
b. Are you concerned about this baseline condition? Why?
c. Table 1 shows characteristics and risk indicators of the prenatal subsample,
Healthy Families New York (HFNY) RCT. All numbers are presented as a single
statistic (percentage only) and tested using the chi-square test. For example,
the proportion receiving TANF is lower in the control group than in the HFNY
group, and the difference is statistically significant. No confidence intervals
are presented. Can you create a table which shows 95% confidence intervals for
all variables presented in table 1?
d. How are you going to interpret and summarize those confidence intervals?

C. The paper by Ugaz et al. (2016) claimed that wealthier women are more likely than
poorer women to use long-acting and permanent methods of contraception instead of
short-acting methods.

Questions

a. From table 2, can you extract the Indonesian data and then present the 95%
confidence intervals for each type of contraceptive method used?
b. From table 4, can you present 95% confidence intervals for each type of
contraceptive method used according to wealth index (Q1 to Q5)?
c. What are your hypotheses?
d. How are you going to interpret and summarize those confidence intervals for
Indonesia?
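
The confidence intervals requested in the questions above need both the percentage and its denominator, since an interval cannot be built from a percentage alone. The sketch below shows one possible way to compute a 95% interval for a single proportion in Python, using both the simple Wald formula and the Wilson score formula. The counts (30 events among 100 subjects) are hypothetical and are not taken from any of the three articles; the function name proportion_ci is likewise an assumption for this example.

```python
import math

def proportion_ci(events, n, z=1.96):
    """Return the proportion with its Wald and Wilson score confidence intervals."""
    p = events / n
    # Wald interval: p +/- z * standard error, truncated to [0, 1]
    se = math.sqrt(p * (1 - p) / n)
    wald = (max(0.0, p - z * se), min(1.0, p + z * se))
    # Wilson score interval: better behaved for small n or extreme proportions
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    wilson = (centre - half, centre + half)
    return p, wald, wilson

# Hypothetical example: 30 of 100 subjects with the outcome
p, wald, wilson = proportion_ci(30, 100)
print(f"proportion = {p:.3f}")
print(f"Wald 95% CI   = ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"Wilson 95% CI = ({wilson[0]:.3f}, {wilson[1]:.3f})")
```

Repeating the call for each row of a published table gives the interval for each group, which can then be compared side by side.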

4.2 Analyzing Categorical Data using STATA


4.2.1 Agreement Analysis

Suppose that each of a sample of n subjects is rated independently by the same two clini-
cians, with the ratings being on a categorical scale consisting of 3 categories.

Table 5: Diagnoses on n=100 patients by Clinician A and B

                          Clinician B
Clinician A    Psychotic   Neurotic   Organic   Total
Psychotic           0.75       0.01      0.04    0.80
Neurotic            0.05       0.04      0.01    0.10
Organic             0          0         0.10    0.10
Total               0.80       0.05      0.15    1.00

Examine table 5 in which each cell entry is the proportion of all subjects classified into one
of 3 diagnostic categories by clinician A and into another by clinician B. Thus, for example,
5% of all subjects were diagnosed neurotic by clinician A and psychotic by clinician B.
Please analyze this table regarding clinicians’ agreement in making the diagnosis of those
100 patients.
Questions

1. How are you going to test the hypothesis that there is an association between
the diagnoses of Clinician A and Clinician B?

2. Since some of the cells are empty, which statistical test would be preferred?

3. What is the agreement between the two clinicians?

Please note that for intermediate values, Landis and Koch (1977a, 165) suggest the
following interpretations:

1. Below 0.00: Poor

2. 0.00–0.20: Slight

3. 0.21–0.40: Fair

4. 0.41–0.60: Moderate

5. 0.61–0.80: Substantial

6. 0.81–1.00: Almost perfect

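Hints for calculation using STATA

1. Use the command tabi

2. Create the following data set, with one line per cell of Table 5 (cell proportion
multiplied by n=100):

     row  col  pop
  1.   1    1   75
  2.   1    2    1
  3.   1    3    4
  4.   2    1    5
  5.   2    2    4
  6.   2    3    1
  7.   3    1    0
  8.   3    2    0
  9.   3    3   10

3. Execute the command: kap row col [freq=pop], tab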
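
To see what the kap command is estimating, the kappa statistic for Table 5 can also be worked out by hand: kappa = (po - pe) / (1 - pe), where po is the observed proportion of agreement (the diagonal of the table) and pe is the agreement expected by chance from the row and column totals. The Python sketch below is only an illustrative cross-check of that arithmetic; the function name cohen_kappa is an assumption for this example, and it does not reproduce the standard error or test that STATA reports.

```python
def cohen_kappa(table):
    """Unweighted kappa for a square table of counts (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n        # po: diagonal agreement
    row_marg = [sum(row) / n for row in table]               # rater A marginals
    col_marg = [sum(col) / n for col in zip(*table)]         # rater B marginals
    expected = sum(r * c for r, c in zip(row_marg, col_marg))  # pe: chance agreement
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa

# Table 5 as counts out of n = 100 (order: psychotic, neurotic, organic)
diagnoses = [
    [75, 1, 4],
    [5, 4, 1],
    [0, 0, 10],
]
po, pe, kappa = cohen_kappa(diagnoses)
print(f"po = {po:.2f}, pe = {pe:.2f}, kappa = {kappa:.3f}")
```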

4.2.2 Categorical Analysis for PMA2020 Data

We use lab_12.dta (laboratory 6, STATA version 12), extracted from PMA2020 round one
(2015), which has 7 variables. You can read the codebook by typing one of the
following commands:

describe
or
codebook

Now suppose we wish to compare the prevalence of total unmet need (unmettot) across
age groups of women. The two categorical variables can be cross-tabulated and the
appropriate chi-squared statistic calculated using a single STATA command:

tabulate age_cat unmettot, row chi2

Here the row option was used to display row percentages, making it easier to compare
the age groups of women. Note that this test does not take account of the ordinal
nature of the age groups and is therefore likely to be less sensitive than, for
example, ordinal regression.

The col option can be used to display column-percentages.

tabulate age_cat unmettot, col chi2
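
What the tabulate command reports with the row and chi2 options can be sketched by hand as follows. The Python example below computes row percentages, the Pearson χ2 statistic, and its degrees of freedom for an r x c table. The counts are hypothetical (three age groups by unmet need yes/no) and are not taken from lab_12.dta; the function name chi_square is an assumption for this example.

```python
def chi_square(table):
    """Pearson chi-square, degrees of freedom, and row percentages for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    row_pct = [[100 * x / row_totals[i] for x in row] for i, row in enumerate(table)]
    return chi2, df, row_pct

# Hypothetical counts: rows are three age groups, columns are unmet need (yes, no)
counts = [
    [30, 70],
    [45, 55],
    [60, 40],
]
chi2, df, row_pct = chi_square(counts)
print(f"chi2 = {chi2:.3f} with {df} degrees of freedom")
for pct in row_pct:
    print(["%.1f%%" % x for x in pct])
```

The statistic is then compared with the χ2 distribution on the stated degrees of freedom, which is the p-value that STATA prints below the table.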

Questions:

1. Please present your analysis in tabular format and summarize the results of
comparing total unmet need according to wealth index.

2. Please test the null hypothesis that the proportion of women who have unmet need
does not differ between rich women and poor women. You can obtain the relevant
table and the χ2 statistic.

3. Compute the 95% confidence interval for each wealth category. For example, for
wealth category 1 (poor), issue the command:

ci proportions unmettot if wealth==1

The other categories can be calculated in the same way.

4. Please construct a table which shows the 95% confidence interval of total unmet
need for each wealth category. Interpret the results!

5. We would like to examine the association between unmet need and residence (urban
or rural). The hypothesis is that urban women have a lower unmet need compared to
rural women. However, the wealth index is unequally distributed between urban and
rural women. Can you estimate the odds of unmet need among urban and rural women
while taking the wealth distribution into account?
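
One way to take the wealth distribution into account in question 5 is to stratify by wealth and pool the stratum-specific odds ratios with the Mantel-Haenszel estimator, OR_MH = sum(a*d/n) / sum(b*c/n), where each stratum contributes its own 2x2 table. The Python sketch below illustrates the arithmetic with hypothetical counts for two wealth strata; none of the numbers come from the PMA2020 data, and the function names are assumptions for this example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table: a, b = urban unmet/met; c, d = rural unmet/met."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts per wealth stratum:
# (urban unmet, urban met, rural unmet, rural met)
strata = {
    "poor": (10, 40, 60, 140),
    "rich": (30, 170, 10, 40),
}

# Crude odds ratio: collapse the strata into a single 2x2 table
crude = odds_ratio(*[sum(cell) for cell in zip(*strata.values())])
adjusted = mantel_haenszel_or(list(strata.values()))

for name, cells in strata.items():
    print(f"{name}: OR = {odds_ratio(*cells):.3f}")
print(f"crude OR = {crude:.3f}, Mantel-Haenszel OR = {adjusted:.3f}")
```

In this made-up example the crude odds ratio lies further from 1 than the pooled estimate, which is the pattern expected when wealth is associated with both residence and unmet need.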

5 References
5.1 Articles for Critical Appraisal
1. Interrante, M. K., Segal, H., Peshkin, B. N., Valdimarsdottir, H. B., Nusbaum, R.,
Similuk, M., . . . Schwartz, M. D. (2017). Randomized Noninferiority Trial of
Telephone vs In-Person Genetic Counseling for Hereditary Breast and Ovarian Cancer:
A 12-Month Follow-Up. JNCI Cancer Spectrum, 1(1), pkx002. doi:10.1093/jncics/pkx002

2. Lee, E., Mitchell-Herzfeld, S. D., Lowenfels, A. A., Greene, R., Dorabawila, V., &
DuMont, K. A. (2009). Reducing Low Birth Weight Through Home Visitation. American
Journal of Preventive Medicine, 36(2), 154-160. doi:10.1016/j.amepre.2008.09.029

3. Ugaz, J. I., Chatterji, M., Gribble, J. N., & Banke, K. (2016). Is Household Wealth
Associated With Use of Long-Acting Reversible and Permanent Methods of
Contraception? A Multi-Country Analysis. Global Health: Science and Practice, 4(1),
43-54. doi:10.9745/ghsp-d-15-00234

5.2 Required Reading
1. Bewick V, Cheek L, Ball J. Statistics review 8: Qualitative data – tests of
association. Critical Care 2004, 8(1):46-53. This article is online at
http://ccforum.com/content/8/1/46

2. Rosner, B (2016). Hypothesis Testing: Categorical Data. Chapter 10. Exercise of
Fundamentals of Biostatistics, 8th ed. Belmont, CA: Duxbury Press, pp: 372-456

5.3 Suggested Reading


Fleiss JL, Levin B, Paik MC (2003). Chapter 18: The Measurement of Interrater
Agreement, in Statistical Methods for Rates and Proportions, 3rd ed. John Wiley
and Sons, Inc; pp: 598-625.

6 Output
Competence to calculate, interpret, and apply categorical data analysis:

– Hypothesis testing/inference for categorical data

– Hypothesis testing for risk ratio and odds ratio

– Calculate χ2 and Fisher's exact statistics

– Calculate Mantel and Haenszel's test and log-rank test

– Calculate Kappa statistics

7 Log sheet

No   Activities                                      Date   Signature   Comment

1.   Group discussion on categorical data analysis
2.   Laboratory session: hypothesis testing for categorical data using the z-test,
     χ2 test, Fisher's exact test, McNemar test, and stratified analysis using
     the Mantel-Haenszel test
3.   Laboratory session: relative risk and odds ratio estimation
5.   Homework: critical appraisal and computation for categorical data analysis,
     i.e., χ2 test, Fisher's exact test, McNemar test, and Kappa statistics

Score: ____________________ Instructor Signature: ____________________