Module 06
Hypothesis Testing:
Categorical Data Analysis
Module: 6
1 Learning Objectives
2 Activities
3 Class Exercise
3.1 Categorical Data I
3.2 Categorical Data II
3.3 Categorical Data III
4 Homework
4.1 Critical Appraisal for Categorical Data
4.2 Analyzing Categorical Data using STATA
4.2.1 Agreement Analysis
4.2.2 Categorical Analysis for PMA2020 Data
5 References
5.1 Articles for Critical Appraisal
5.2 Required Reading
5.3 Suggested Reading
6 Output
7 Log sheet
1 Learning Objectives
Upon completion of the course unit, students should be able to:
1. describe methods commonly used to analyze categorical data (nominal or ordinal)
3. analyze, read, and interpret published research and/or research data that use
categorical analysis
2 Activities
1. Discussion: Categorical data analysis
2. Laboratory Session:
(a) Performing hypothesis tests on categorical data using the Z test, the χ2 test,
McNemar’s test,
(b) Calculating, interpreting, and testing odds ratio and relative risk estimates
(c) Performing stratified hypothesis tests on categorical data, such as the
Mantel-Haenszel test and the log-rank test
(d) Calculating, interpreting, and testing the kappa statistic
3. Homework:
(a) Analyzing, reading, and interpreting published research and/or research data that
used categorical analysis
3 Class Exercise
3.1 Categorical Data I
1. A recent study investigated the relationship between cigarette smoking and subsequent
mortality in men with a prior history of coronary disease. It was found that 264
out of 1731 non-smokers and 208 out of 1058 smokers had died in the 5-year period
after the study began. Assuming that the age distributions of the two groups are
comparable, what is an appropriate statistical procedure to compare the mortality rates
in the two groups?
2. It has been hypothesized that the principal benefit of physical activity is to
prevent sudden death from heart attack. The following study was designed to test
this hypothesis. One hundred (100) men who died from a first heart attack and 100
men who survived a first heart attack in the 50-59 age group were identified, and their
wives were each given a detailed questionnaire concerning their husbands’ physical
activity in the year preceding their heart attacks. The men were then classified as
active or inactive. Suppose 30 of the 100 men who survived and 10 of the 100 who died
were classified as physically active. What is the general design? What is the
appropriate statistical procedure for assessing the relationship between physical
activity and sudden death from a heart attack?
3. Joseph Lister, a British physician of the late 19th century, decided that something
had to be done about the high death rate from post-operative complications, which
were mostly due to infection. Based on the work of Louis Pasteur, he thought that the
infections had an organic cause, and decided to experiment with carbolic acid as a
disinfectant for the operating room. Lister performed 75 amputations over a period
of years. Forty of the amputations were done with carbolic acid and 35 were done
without carbolic acid. For those done with carbolic acid, 34 of the patients lived;
for those done without carbolic acid, 19 patients lived. Arrange these data in an
appropriate table.
4. You are wondering: does aspirin really help prevent heart attacks? In searching for
the answer, you look for publications on this issue and find the final report on the
aspirin component of the Physicians’ Health Study (NEJM (1989) 321(3): 129-135).
During the 1980s, approximately 22,000 physicians over the age of 40 agreed to
participate in a long-term health study for which one important question was whether
aspirin helps to lower the rate of heart attacks (myocardial infarctions). The
treatments in this part of the study were aspirin or placebo, and the physicians were
randomly assigned to one treatment or the other as they entered the study. After the
assignment, neither the participating physicians nor the medical personnel who treated
them knew who was taking aspirin and who was taking placebo (a double-blinded study).
The physicians were observed carefully for an extended period of time, and all heart
attacks as well as other problems that occurred were recorded. Other than aspirin, many
variables could affect the rate of heart attacks in the two groups of physicians. The
amount of exercise they get and whether they smoke are two prime examples of factors
that could be controlled in the study so that the true effect of aspirin can be
measured.
Table 1: Study of Aspirin and Heart Attack
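As a rough sketch of how exercises 1 to 3 above might be set up by hand, the Python snippet below computes a pooled two-proportion z test (exercise 1), an odds ratio with a log-based (Woolf) 95% confidence interval for the case-control data (exercise 2), and a Pearson chi-square for Lister's 2x2 table (exercise 3). The choice of these particular procedures, the function names, and the use of Python rather than Stata are assumptions made for illustration; the module itself does not prescribe them.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test of H0: p1 == p2 for two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, normal_two_sided_p(z)


def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Odds ratio ad/bc with a Woolf (log-scale) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_hat) - z_crit * se_log)
    high = math.exp(math.log(or_hat) + z_crit * se_log)
    return or_hat, low, high


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi2 is the square of a standard normal variable.
    return chi2, normal_two_sided_p(math.sqrt(chi2))


# Exercise 1: deaths / group size for non-smokers, then smokers.
z, p_z = two_proportion_z(264, 1731, 208, 1058)
print(f"Exercise 1: z = {z:.2f}, two-sided p = {p_z:.4f}")

# Exercise 2: a, b = active / inactive among men who died (cases);
#             c, d = active / inactive among men who survived (controls).
or_hat, or_low, or_high = odds_ratio_ci(a=10, b=90, c=30, d=70)
print(f"Exercise 2: OR = {or_hat:.2f}, 95% CI {or_low:.2f} to {or_high:.2f}")

# Exercise 3: rows = carbolic acid used or not; columns = lived, died.
lister = {"carbolic acid": (34, 40 - 34), "no carbolic acid": (19, 35 - 19)}
print(f"{'':<18}{'lived':>6}{'died':>6}{'total':>7}")
for group, (lived, died) in lister.items():
    print(f"{group:<18}{lived:>6}{died:>6}{lived + died:>7}")
chi2, p_chi2 = chi2_2x2(34, 6, 19, 16)
print(f"Exercise 3: chi-square = {chi2:.2f}, p = {p_chi2:.4f}")
```

For a 2x2 table the pooled z test and the uncorrected Pearson chi-square are equivalent (z squared equals chi-square), so either function could be applied to exercise 1 or exercise 3.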
3.2 Categorical Data II
A 1980 study investigated the relationship between the use of oral contraceptives and the
development of endometrial cancer. It was found that of 117 endometrial cancer patients,
6 had used oral contraceptives at some time in their lives while of the 395 controls, 8 had
used oral contraceptives.
b. State the null and alternative hypotheses for a test of association between
endometrial cancer and oral contraceptive use.
c. Put the data in a two-way table and set up the calculations for the appropriate test
statistic.
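One way to set up the calculations is sketched below in Python. It arranges the counts from the study description in a two-way table, computes the expected counts under the null hypothesis of no association, and, because one expected count falls below 5, also computes a two-sided Fisher exact p-value. The choice of Fisher's exact test, the "sum of tables no more likely than the observed one" definition of the two-sided p-value, and all function and variable names are assumptions made for illustration, not part of the exercise.

```python
import math

# Rows: cancer cases, controls. Columns: ever used OC, never used OC.
table = [[6, 117 - 6], [8, 395 - 8]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under H0 (no association): row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def hypergeom_pmf(x, n_total, n_exposed, n_cases):
    """P(x exposed cases) when all table margins are held fixed."""
    return (math.comb(n_exposed, x)
            * math.comb(n_total - n_exposed, n_cases - x)
            / math.comb(n_total, n_cases))


def fisher_exact_two_sided(tab):
    """Sum the probabilities of all tables no more likely than the observed one."""
    (a, b), (c, d) = tab
    n_total, n_exposed, n_cases = a + b + c + d, a + c, a + b
    x_min = max(0, n_cases - (n_total - n_exposed))
    x_max = min(n_exposed, n_cases)
    p_obs = hypergeom_pmf(a, n_total, n_exposed, n_cases)
    total = 0.0
    for x in range(x_min, x_max + 1):
        p_x = hypergeom_pmf(x, n_total, n_exposed, n_cases)
        if p_x <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p_x
    return total


p_fisher = fisher_exact_two_sided(table)
print("expected counts:", [[round(e, 1) for e in row] for row in expected])
print(f"smallest expected count = {min_expected:.2f}")
print(f"odds ratio = {odds_ratio:.2f}, Fisher exact two-sided p = {p_fisher:.3f}")
```

Printing the expected counts first makes the reason for preferring an exact test visible before any p-value is reported.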
What is the appropriate statistical procedure to test if the rate of DKA is different before
and after the onset of pump therapy?
2. A study of methods for detecting cytomegalovirus (CMV) reported the results of dif-
ferent assays on 195 broncho-alveolar lavage (BAL) specimens from bone-marrow-
transplant recipients with pneumonia. Table 3 below shows the distribution of the
BAL specimens according to cytological examination result and immunofluorescence
(IF) assay results. Do the data provide evidence that the population proportion of BAL
specimens with positive cytological examination results differs from the population
proportion of BAL specimens with positive IF assay results?
3. A study was carried out to see if patients whose skin did not respond to
dinitrochlorobenzene (DNCB), a contact allergen, would show an equally negative response
to croton oil, a skin irritant. Table 4 shows the results of simultaneous skin
reaction tests to DNCB and croton oil in 173 patients with skin cancer.
Table 4: Study of dinitrochlorobenzene (DNCB) and croton oil
Questions:
5. Suppose we have 500 pairs of pregnant women who participate in a study of premature
births and are paired in such a way that the body weights of the women in a pair
are within 5 lbs. of each other (i.e., matching on body weight). We then give one
of the two women a placebo and the other drug A. We wish to test whether drug A has
an effect in preventing premature births. Suppose that in 30 pairs of women, both
women in a pair have a premature child; in 420 pairs of women, both women have
a normal child; in 35 pairs of women, the woman taking drug A has a normal child
and the woman taking the placebo has a premature child; in 15 pairs of women, the
woman taking drug A has a premature child and the woman taking the placebo has a
normal child. Display the numbers in a table and indicate what statistical procedure
would be appropriate.
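Because the women are matched in pairs, the unit of analysis is the pair, and one candidate procedure is McNemar's test, which uses only the discordant pairs. The Python sketch below lays out the pair-level table from the counts above and computes the McNemar chi-square with and without a continuity correction. Treating McNemar's test as the intended answer is an assumption, and the variable names are illustrative only.

```python
import math

# Pair-level table: key = (outcome of woman on drug A, outcome of woman on placebo).
pairs = {
    ("premature", "premature"): 30,
    ("premature", "normal"): 15,
    ("normal", "premature"): 35,
    ("normal", "normal"): 420,
}

print(f"{'drug A / placebo':<18}{'premature':>10}{'normal':>8}")
for drug_outcome in ("premature", "normal"):
    row = [pairs[(drug_outcome, placebo_outcome)]
           for placebo_outcome in ("premature", "normal")]
    print(f"{drug_outcome:<18}{row[0]:>10}{row[1]:>8}")

# Only discordant pairs carry information about the drug effect.
b = pairs[("normal", "premature")]   # drug A normal, placebo premature
c = pairs[("premature", "normal")]   # drug A premature, placebo normal

mcnemar_chi2 = (b - c) ** 2 / (b + c)                # uncorrected, 1 df
mcnemar_chi2_cc = (abs(b - c) - 1) ** 2 / (b + c)    # with continuity correction

# Two-sided p-value from the chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(mcnemar_chi2 / 2))
p_value_cc = math.erfc(math.sqrt(mcnemar_chi2_cc / 2))

print(f"McNemar chi-square = {mcnemar_chi2:.2f} (p = {p_value:.4f})")
print(f"with continuity correction = {mcnemar_chi2_cc:.2f} (p = {p_value_cc:.4f})")
```

The 450 concordant pairs do not enter the statistic at all, which is why an ordinary chi-square test on 1000 individual women would be the wrong analysis for this design.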
4 Homework
4.1 Critical Appraisal for Categorical Data
You are expected to read one of these 3 articles according to your concentration
(minat, your area of interest). Please answer the questions for the article you select.
A. Please read the article by Interrante et al. (2017), which presents risk management
outcomes at 12 months post-counseling for usual care versus telephone counseling. Read
Table 1 and conduct an appropriate test for categorical data on this table.
The goal is to check whether the two groups are comparable.
Questions
b. Do you think the two groups are similar at baseline? For example, are
BRCA1/2 test results similar between the two groups? Conduct a statistical test to
support your judgment.
c. Table 4 shows risk management outcomes at 12 months post-counseling for
usual care versus telephone counseling. All numbers are presented as a single
statistic (percentage only) and tested using the chi-square test. For example, the
result for risk-reducing surgery is statistically significant. However, no confidence
intervals are presented. Can you create a table that shows 95 percent confidence
intervals for all outcomes presented in table 3?
d. Can you explain those confidence intervals?
B. In the paper by Lee et al. (2009), all respondents were asked whether they or someone
in their household received cash assistance through Temporary Assistance for Needy
Families (TANF) at baseline. TANF receipt was used as a proxy for abject poverty.
Questions
a. At the baseline of the study, do you think the characteristics and risk indicators of
the prenatal subsample are similar between the two groups (control vs intervention)?
What are your hypotheses?
b. Are you concerned about this baseline condition? Why?
c. Table 1 shows characteristics and risk indicators of the prenatal subsample,
Healthy Families New York (HFNY) RCT. All numbers are presented as a single
statistic (percentage only) and tested using the chi-square test. For example, TANF
receipt is lower in the control group than in the HFNY group, and the difference is
statistically significant. No confidence intervals are presented. Can you create a table
that shows 95% confidence intervals for all variables presented in table 1?
d. How would you interpret and summarize those confidence intervals?
C. The paper by Ugaz et al. (2016) claims that wealthier women are more likely than
poorer women to use long-acting and permanent methods of contraception instead
of short-acting methods.
Questions
a. From table 2, can you extract the Indonesian data and then present the 95%
confidence intervals for each type of contraceptive method used?
b. From table 4, can you present 95% confidence intervals for each type of
contraceptive method used according to wealth index (Q1 to Q5)?
c. What are your hypotheses?
d. How would you interpret and summarize those confidence intervals for
Indonesia?
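All three sets of questions above ask for 95% confidence intervals around reported percentages. A minimal Python sketch of one way to compute them is given below, using the Wilson score interval. The choice of the Wilson interval over the simpler Wald interval, the helper for converting a reported percentage back to a count, and both function names are assumptions for illustration. Since no figures from the three articles are reproduced here, the usage example reuses the 208 deaths among 1058 smokers from class exercise 3.1.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (default 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


def percent_to_count(percent, n):
    """Recover an approximate count when a table reports percentages only."""
    return round(percent / 100 * n)


# Usage example with data from class exercise 3.1, item 1 (not from the articles).
deaths, smokers = 208, 1058
low, high = wilson_ci(deaths, smokers)
print(f"proportion = {deaths / smokers:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

When an article gives only a percentage and a group size, `percent_to_count` can supply the first argument of `wilson_ci`; rounding in the published percentage then limits the precision of the interval.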
4.2 Analyzing Categorical Data using STATA
4.2.1 Agreement Analysis
Suppose that each of a sample of n subjects is rated independently by the same two
clinicians, with the ratings on a categorical scale consisting of 3 categories.
Examine Table 5, in which each cell entry is the proportion of all subjects classified
into one of 3 diagnostic categories by clinician A and into another by clinician B.
Thus, for example, 5% of all subjects were diagnosed neurotic by clinician A and
psychotic by clinician B.
Please analyze this table with regard to the clinicians’ agreement in diagnosing those
100 patients.
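Agreement between two raters on a categorical scale is commonly summarized with Cohen's kappa, which compares the observed proportion of agreement with the agreement expected by chance from the marginal totals. The Python sketch below shows the calculation. The 3x3 counts in it are hypothetical values invented for illustration (they are not the entries of Table 5), and the function name is likewise an assumption.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table (rows: rater A, columns: rater B).

    Works for counts or proportions, since everything is divided by the total.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / total
    row_p = [sum(table[i]) / total for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    # Chance agreement: product of the two raters' marginal proportions.
    p_expected = sum(r * c for r, c in zip(row_p, col_p))
    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL counts for 100 subjects and 3 categories (not Table 5).
hypothetical = [
    [20, 5, 0],
    [5, 30, 5],
    [0, 5, 30],
]
kappa = cohen_kappa(hypothetical)
print(f"kappa = {kappa:.3f}")
```

Kappa equals 1 when the two raters agree on every subject and 0 when the observed agreement is no better than chance.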
Questions
1. How would you test the hypothesis that the ratings of clinicians A and B are
associated?
2. Since some cells are empty, which statistical test would be preferred?
Please note that for intermediate values, Landis and Koch (1977a, 165) suggest the
following interpretations:
4.2.2 Categorical Analysis for PMA2020 Data
We use lab_12.dta (laboratory 6, Stata version 12), extracted from PMA2020 round one
(2015), which has 7 variables. You can read the codebook by typing the following command:
describe
or
codebook
Now suppose we wish to compare the prevalence of total unmet need across age groups
of women. The two categorical variables can be cross-tabulated and the appropriate
chi-squared statistic calculated using a single STATA command:
Here the row option was used to display row percentages, making it easier to compare the
age groups of women. Note that this test does not take account of the ordinal nature of
the age groups and is therefore likely to be less sensitive than, for example, ordinal
regression.
Questions:
1. Please present your analysis in tabular format and summarize the results of
comparing total unmet need according to wealth index.
2. Please test the null hypothesis that the proportion of women who have unmet need
does not differ between rich women and poor women. You can obtain the relevant table
and both the χ2 .
3. Compute the 95% confidence interval for each wealth category by issuing a command as
follows:
for wealth category equal to 1 (poor).
4. Please construct a table that shows the 95% confidence interval for each wealth
category and for total unmet need. Interpret the results.
5. We would like to examine the association between unmet need and residence (urban or
rural). The hypothesis is that urban women have lower unmet need than rural women.
However, the wealth index is unequally distributed between the two groups.
Can you estimate the odds of unmet need among urban and rural women while
accounting for the wealth distribution?
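For question 5, one standard way to account for a third variable such as wealth is to stratify on it and pool the stratum-specific odds ratios with the Mantel-Haenszel estimator. The Python sketch below shows that calculation next to the crude (unstratified) odds ratio. The two strata and all counts in it are hypothetical values invented for illustration (they are not taken from lab_12.dta), and the function names are assumptions.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (a, b, c, d):
      a = urban with unmet need,  b = urban without unmet need,
      c = rural with unmet need,  d = rural without unmet need.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def crude_or(strata):
    """Odds ratio from the table collapsed over all strata."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


# HYPOTHETICAL counts for two wealth strata (not from the PMA2020 data).
hypothetical_strata = [
    (10, 20, 20, 50),   # e.g. poorer households
    (20, 30, 10, 40),   # e.g. wealthier households
]
or_mh = mantel_haenszel_or(hypothetical_strata)
or_crude = crude_or(hypothetical_strata)
print(f"crude OR = {or_crude:.2f}, Mantel-Haenszel OR = {or_mh:.2f}")
```

A clear gap between the crude and the pooled estimate would suggest that wealth confounds the association between residence and unmet need.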
5 References
5.1 Articles for Critical Appraisal
1. Interrante, M. K., Segal, H., Peshkin, B. N., Valdimarsdottir, H. B., Nusbaum, R.,
Similuk, M., . . . Schwartz, M. D. (2017). Randomized Noninferiority Trial of Telephone
vs In-Person Genetic Counseling for Hereditary Breast and Ovarian Cancer: A 12-Month
Follow-Up. JNCI Cancer Spectrum, 1(1), pkx002. doi:10.1093/jncics/pkx002
2. Lee, E., Mitchell-Herzfeld, S. D., Lowenfels, A. A., Greene, R., Dorabawila, V., &
DuMont, K. A. (2009). Reducing Low Birth Weight Through Home Visitation. American
Journal of Preventive Medicine, 36(2), 154-160.
doi: http://dx.doi.org/10.1016/j.amepre.2008.09.029
3. Ugaz, J. I., Chatterji, M., Gribble, J. N., & Banke, K. (2016). Is Household Wealth
Associated With Use of Long-Acting Reversible and Permanent Methods of
Contraception? A Multi-Country Analysis. Global Health: Science and Practice, 4(1), 43-54.
doi:10.9745/ghsp-d-15-00234
5.2 Required Reading
1. Bewick V, Cheek L, Ball J. Statistics review 8: Qualitative data - tests of association.
Critical Care 2004, 8(1):46-53. This article is online at
http://ccforum.com/content/8/1/46
6 Output
Competence to calculate, interpret, and apply categorical data analysis:
7 Log sheet