02 Simple-Logistic-Regression-An-Overview Simple Logistic Regression
The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under
rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.
Learning Objectives
► In this set of lectures, we will develop a framework for simple logistic regression, a
method for relating a binary outcome to a single predictor that can be binary, categorical,
or continuous
Logistic Regression in General—1
► For logistic regression, the equation is a bit more convoluted than with linear regression: the regression models the natural log odds of a binary outcome ($y$) as a function of a predictor $x_1$, where $p$ is the probability that $y = 1$:
$$\ln(\text{odds that } y = 1) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$$
► As noted in the previous section, $x_1$ can be binary, nominal categorical, or continuous
Logistic Regression in General—2
► For example
► If $y = 1$ when a child is breastfed at the time of the study, and 0 if not, then
$$\ln(\text{odds that } y = 1) = \ln(\text{odds of being breastfed})$$
Logistic Regression in General—3
► As with everything else we have done thus far, we will only be able to estimate the regression equation from a sample of data; to indicate the estimates, we write:
$$\ln(\text{odds that } y = 1) = \ln\left(\frac{\hat p}{1-\hat p}\right) = \hat\beta_0 + \hat\beta_1 x_1$$
► In a subsequent lecture section, the reason for this choice of scaling [ln(odds)] will be detailed
Logistic Regression in General—4
► For a given value of $x_1$, the resulting logistic regression equation can be used to estimate the ln(odds) of a binary outcome $y$ for a group of subjects with the same value of $x_1$:
$$\ln(\text{odds that } y = 1) = \ln\left(\frac{\hat p}{1-\hat p}\right) = \hat\beta_0 + \hat\beta_1 x_1$$
Logistic Regression in General—5
$$\ln(\text{odds that } y = 1) = \ln\left(\frac{\hat p}{1-\hat p}\right) = \hat\beta_0 + \hat\beta_1 x_1$$
► $\hat\beta_1$ is the change in the ln(odds that $y = 1$) for a one-unit change in $x_1$; in other words, the difference in the ln(odds that $y = 1$) for a one-unit difference in $x_1$
Logistic Regression in General—6
► So, $\hat\beta_1 = \ln(\text{odds that } y = 1 \mid x_1 = a + 1) - \ln(\text{odds that } y = 1 \mid x_1 = a)$, and by properties of logarithms,
$$\hat\beta_1 = \ln\left(\frac{\text{odds that } y = 1 \mid x_1 = a + 1}{\text{odds that } y = 1 \mid x_1 = a}\right) = \ln(\widehat{OR})$$
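A minimal Python sketch of the derivation above, using made-up coefficients ($\hat\beta_0 = -1.0$, $\hat\beta_1 = 0.5$; hypothetical values, not from any study): whatever starting value $a$ is chosen, the difference in ln(odds) between $x_1 = a + 1$ and $x_1 = a$ is $\hat\beta_1$, so the ratio of the two odds is always $e^{\hat\beta_1}$.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not from any study)
B0_HAT = -1.0
B1_HAT = 0.5

def log_odds(x1):
    """Estimated ln(odds that y = 1) at predictor value x1."""
    return B0_HAT + B1_HAT * x1

def odds(x1):
    """Estimated odds that y = 1 at predictor value x1."""
    return math.exp(log_odds(x1))

# For several starting values a, compare x1 = a + 1 with x1 = a
for a in (0, 3, 10.5):
    diff_log_odds = log_odds(a + 1) - log_odds(a)  # always equals B1_HAT
    odds_ratio = odds(a + 1) / odds(a)             # always equals e^B1_HAT
    print(f"a = {a}: difference in ln(odds) = {diff_log_odds:.3f}, OR = {odds_ratio:.3f}")

print(f"e^b1 = {math.exp(B1_HAT):.3f}")
```

The starting point $a$ cancels out, which is why a single odds ratio summarizes the whole line.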
► Simple logistic regression is a method for relating a binary outcome to a single predictor
that can be binary, categorical, or continuous
► The slope estimate(s) from a simple logistic regression, such as $\hat\beta_1$, have a ln(odds ratio) interpretation and can be exponentiated (anti-logged) to estimate an odds ratio comparing the odds that (the outcome) $y = 1$ for two groups who differ by one unit in $x_1$
► The intercept estimate, $\hat\beta_0$, is the estimated ln(odds that $y = 1$) when $x_1 = 0$
Simple Logistic Regression with a Binary
(or Categorical) Predictor
Learning Objectives
► Interpret the resulting intercept and slope(s) from a logistic regression model in which the
predictor of interest is binary or categorical
Obesity and Biological Sex—1
► Data from National Health and Nutrition Examination Survey (NHANES), 2013–2014
► Data include 10,000+ observations on persons 0–80 years old; 5,847 of these are adults (≥18 years) with a body mass index (BMI) measurement
Source: Centers for Disease Control and Prevention (CDC). 2014. National Health and Nutrition Examination Survey data. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Retrieved July 6, 2017, from https://wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Default.aspx?BeginYear=2013
Obesity and Biological Sex—2
Obesity and Biological Sex—3
$\ln(\text{odds of obesity}) = -0.74 + 0.38x_1$, where $x_1 = 1$ for females, 0 for males
► So:
► For females: $\ln(\text{odds of obesity}) = -0.74 + 0.38(1)$
► For males: $\ln(\text{odds of obesity}) = -0.74$
Obesity and Biological Sex—4
► So the slope, $\hat\beta_1 = 0.38$, estimates the difference in the ln(odds of obesity) for females compared to males
► It is the difference in ln(odds of obesity) for a one-unit difference in $x_1$, but the only possible one-unit difference in $x_1$ is 1 (the difference between $x_1 = 1$ and $x_1 = 0$)
► Recall, a difference in ln(odds) can be re-expressed as the ln of an odds ratio, so $\hat\beta_1 = 0.38 = \ln(\widehat{OR})$, where $\widehat{OR}$ is the odds ratio of obesity for females compared to males
Obesity and Biological Sex—5
► So:
► For males: $\ln(\text{odds of obesity}) = -0.74$
► So $\hat\beta_0 = -0.74$ estimates the ln(odds of obesity) for males, i.e., the ln(odds of obesity) when $x_1 = 0$
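The slope and intercept can be turned back into odds and an odds ratio with a few lines of Python. This is only a sketch that plugs in the coefficients reported on the slides; the variable names are illustrative.

```python
import math

# Fitted equation from the slides: ln(odds of obesity) = -0.74 + 0.38 * x1,
# with x1 = 1 for females and x1 = 0 for males (the reference group)
B0_HAT = -0.74
B1_HAT = 0.38

log_odds_male = B0_HAT + B1_HAT * 0    # equals the intercept
log_odds_female = B0_HAT + B1_HAT * 1

odds_male = math.exp(log_odds_male)          # e^(b0): odds for the reference group
odds_female = math.exp(log_odds_female)
or_female_vs_male = odds_female / odds_male  # equals e^(b1)

print(f"odds of obesity, males:   {odds_male:.3f}")
print(f"odds of obesity, females: {odds_female:.3f}")
print(f"odds ratio, females vs. males: {or_female_vs_male:.2f}")
```

It should print odds of about 0.477 for males and 0.698 for females, and an odds ratio of about 1.46, which is $e^{0.38}$.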
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—1
► Here, $\hat p_{CD4<250} = \frac{127}{503} = 0.252$ (25.2%) and $\hat p_{CD4\geq250} = \frac{79}{497} = 0.159$ (15.9%)
► The odds ratio of treatment response for subjects with baseline CD4 <250 compared to subjects with baseline CD4 ≥250 is
$$\widehat{OR} = \frac{\widehat{odds}_{CD4<250}}{\widehat{odds}_{CD4\geq250}} = \frac{0.252/0.748}{0.159/0.841} \approx 1.78$$
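The same quantities can be recomputed from the raw counts. A minimal Python sketch, assuming only the counts shown above (127 of 503 and 79 of 497 responders); with a single binary predictor, the fitted intercept and slope reproduce these table-based values, so the ln(odds) in the CD4 ≥250 group should come out close to −1.67 and the ln(OR) close to 0.58.

```python
import math

# Counts from the slide: (responders, group size) by baseline CD4 group
counts = {"CD4<250": (127, 503), "CD4>=250": (79, 497)}

def odds_from_proportion(p):
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1 - p)

p_low = counts["CD4<250"][0] / counts["CD4<250"][1]
p_high = counts["CD4>=250"][0] / counts["CD4>=250"][1]
odds_ratio = odds_from_proportion(p_low) / odds_from_proportion(p_high)

print(f"p-hat, CD4<250:  {p_low:.3f}")
print(f"p-hat, CD4>=250: {p_high:.3f}")
print(f"odds ratio:      {odds_ratio:.2f}")
# On the ln scale these match the fitted logistic regression coefficients
print(f"ln(odds), CD4>=250 group: {math.log(odds_from_proportion(p_high)):.2f}")  # intercept
print(f"ln(odds ratio):           {math.log(odds_ratio):.2f}")                    # slope
```

Using unrounded proportions gives an odds ratio of about 1.79; the 1.78 above comes from rounding $\hat p$ to three decimals before forming the odds.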
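► So, using the fitted equation $\ln(\text{odds of response}) = -1.67 + 0.58x_1$ (where $x_1 = 1$ for baseline CD4 <250 and 0 for baseline CD4 ≥250):
► For subjects with baseline CD4 <250: $\ln(\text{odds of response}) = -1.67 + 0.58(1)$
► For subjects with baseline CD4 ≥250: $\ln(\text{odds of response}) = -1.67$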
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—3
► So the slope, $\hat\beta_1 = 0.58$, estimates the difference in ln(odds of response) for subjects with baseline CD4 <250 compared to subjects with baseline CD4 ≥250
► It is the difference in ln(odds of response) for a one-unit difference in $x_1$, but the only possible one-unit difference in $x_1$ is 1 (the difference between $x_1 = 1$ and $x_1 = 0$)
► Recall, a difference in ln(odds) can be re-expressed as the ln of an odds ratio, so $\hat\beta_1 = 0.58 = \ln(\widehat{OR})$, where $\widehat{OR}$ is the odds ratio of response for subjects with baseline CD4 <250 compared to subjects with baseline CD4 ≥250
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—4
► The resulting logistic regression equation for this analysis is $\ln(\text{odds of response}) = -1.67 + 0.58x_1$, where $x_1 = 1$ for subjects with baseline CD4 <250, and 0 for subjects with baseline CD4 ≥250
► So:
► For subjects with baseline CD4 ≥250: $\ln(\text{odds of response}) = -1.67$
► So $\hat\beta_0 = -1.67$ estimates the ln(odds of response) for this reference group (the ln(odds of response) when $x_1 = 0$)
Respiratory Failure and Gestational Age—1
Source: Consortium on Safe Labor. (2010). Respiratory morbidity in late preterm births. JAMA, 304(4), 419–425.
Respiratory Failure and Gestational Age—2
Respiratory Failure and Gestational Age—3
► Even though the gestational age categories are ordinal, the authors did not want to assume that the relationship between the ln(odds of respiratory failure) and gestational age category is necessarily linear
► There are four categories: make one category the reference, and create binary indicator variables ($x$'s) for the other three; the authors used 37–40 weeks as the reference
► $x_1 = 1$ if gestational age = 34 weeks, 0 if not
► $x_2 = 1$ if gestational age = 35 weeks, 0 if not
► $x_3 = 1$ if gestational age = 36 weeks, 0 if not
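The indicator coding described above can be sketched in Python as follows; the category labels are assumptions for illustration, and the function name `indicators` is hypothetical.

```python
REFERENCE = "37-40 weeks"
INDICATOR_LEVELS = ("34 weeks", "35 weeks", "36 weeks")  # coded by x1, x2, x3

def indicators(category):
    """Return the indicator tuple (x1, x2, x3) for one infant's gestational age category."""
    if category != REFERENCE and category not in INDICATOR_LEVELS:
        raise ValueError(f"unknown category: {category!r}")
    # Exactly one indicator is 1 for a non-reference category; all are 0 for the reference
    return tuple(1 if category == level else 0 for level in INDICATOR_LEVELS)

# Hypothetical records, for illustration only
for category in ("34 weeks", "35 weeks", "36 weeks", "37-40 weeks"):
    print(f"{category:>12}: (x1, x2, x3) = {indicators(category)}")
```

Because each non-reference category gets its own indicator (and its own slope in $\ln(\text{odds}) = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$), no linear trend across the ordered categories is imposed; the reference group is the one with $x_1 = x_2 = x_3 = 0$.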
Respiratory Failure and Gestational Age—4
► In this model:
Summary—1
► Logistic regression is a method for relating a binary outcome to a predictor via a linear
equation
► The predictor can be binary, categorical, or continuous
► When the predictor is binary: $\ln(\text{odds that } y = 1) = \hat\beta_0 + \hat\beta_1 x_1$
► The slope $\hat\beta_1$ is the estimated ln(odds ratio of $y = 1$) for the group with $x_1 = 1$ compared to the reference group with $x_1 = 0$; this result can be exponentiated ($e^{\hat\beta_1}$) to get the estimated odds ratio
► The intercept $\hat\beta_0$ is the estimated ln(odds that $y = 1$) for the group with $x_1 = 0$; this result can be exponentiated ($e^{\hat\beta_0}$) to get the estimated odds
Summary—2
► When the predictor is categorical with $p$ categories (one reference category and indicators $x_1, \dots, x_{p-1}$ for the others):
► The slopes $\hat\beta_1$ to $\hat\beta_{p-1}$ are the estimated ln(odds ratios of $y = 1$) for the group with $x_i = 1$ compared to the reference group with $x_1 = x_2 = \cdots = x_{p-1} = 0$; these results can be exponentiated to get the estimated odds ratios
► The intercept $\hat\beta_0$ is the estimated ln(odds that $y = 1$) for the reference group with $x_1 = x_2 = \cdots = x_{p-1} = 0$; this result can be exponentiated ($e^{\hat\beta_0}$) to get the estimated odds
Summary—3
► So why are we doing this? Didn't the regressions shown in this section just re-present analyses that were done in Statistical Reasoning 1? And aren't the end results odds and odds ratios that could easily have been computed without using logistic regression?
Simple Logistic Regression with a
Continuous Predictor
Learning Objectives
► Use a LOWESS plot to get a snapshot of the relationship between the ln(odds of $y = 1$) and the continuous predictor $x_1$
► Interpret the slope and intercept from simple logistic regression models
Background—1
► Why model the ln(odds of $y = 1$) as a linear function of a continuous predictor $x_1$, as in $\ln(\text{Odds}) = \hat\beta_0 + \hat\beta_1 x_1$, instead of modeling the odds of $y = 1$ or the probability of $y = 1$ directly?
Background—2
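► With some algebra, one can solve $\ln(\text{Odds}) = \ln\left(\frac{p}{1-p}\right) = \hat\beta_0 + \hat\beta_1 x_1$ for $p$:
$$p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{e^{\hat\beta_0 + \hat\beta_1 x_1}}{1 + e^{\hat\beta_0 + \hat\beta_1 x_1}}$$
► Because $e^{\hat\beta_0 + \hat\beta_1 x_1}$ is always positive, this expression always lies strictly between 0 and 1, whatever the value of $x_1$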
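A small Python sketch of this back-transformation makes the point concrete; the helper names `logit` and `inv_logit` are our own, not from a particular library. Any ln(odds) value, however large or small, maps to a probability strictly between 0 and 1.

```python
import math

def inv_logit(log_odds):
    """Convert ln(odds) to a probability: e^L / (1 + e^L)."""
    if log_odds >= 0:
        # Algebraically identical form 1 / (1 + e^-L); avoids overflow for large L
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

def logit(p):
    """ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for L in (-10, -2, 0, 2, 10):
    p = inv_logit(L)
    print(f"ln(odds) = {L:>3}: p = {p:.5f}, back to ln(odds) = {logit(p):.3f}")
```

The branch on the sign of the input is a common numerical precaution that keeps `math.exp` from overflowing for very large |ln(odds)|; mathematically both branches compute $e^{L}/(1 + e^{L})$.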
Obesity and HDL Cholesterol—1
► Data from National Health and Nutrition Examination Survey (NHANES), 2013–2014
► Data include 10,000+ observations on persons 0–80 years old; 5,847 of these are adults (≥18 years) with body mass index (BMI) and HDL cholesterol measurements
Obesity and HDL Cholesterol—2
► This formulation makes a strong assumption about the nature of the relationship between the ln(odds of obesity) and $x_1$ = HDL cholesterol (mg/dL): that it is linear
Obesity and HDL Cholesterol—3
Obesity and HDL Cholesterol—4
$\ln(\text{odds of obesity}) = 1.20 - 0.034x_1$, where $x_1$ = HDL (mg/dL)
Obesity and HDL Cholesterol—5
► So the slope, $\hat\beta_1 = -0.034$, estimates the difference in ln(odds of obesity) for two groups of persons whose HDL levels differ by 1 mg/dL
► It is the difference in ln(odds of obesity) for a one-unit difference in $x_1$, i.e., a 1 mg/dL difference in HDL
► Recall, a difference in ln(odds) can be re-expressed as the ln of an odds ratio: so $\hat\beta_1 = -0.034 = \ln(\widehat{OR})$, where $\widehat{OR}$ is the odds ratio of obesity for two groups whose HDL levels differ by 1 mg/dL
Obesity and HDL Cholesterol—6
► So the odds ratio estimate is $e^{-0.034} \approx 0.967$, or ≈0.97: the odds ratio of being obese for two groups of persons who differ by one mg/dL in HDL levels is 0.97, higher HDL compared to lower HDL
► In other words, subjects with higher HDL (by one mg/dL) have about 3% lower odds of being obese compared to subjects with lower HDL
► This estimate applies to any two groups who differ by one mg/dL in HDL in the population from which the sample was taken
► 60 mg/dL to 59 mg/dL
► 44 mg/dL to 43 mg/dL
► Etc.
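Because a $\Delta$-unit difference in $x_1$ shifts the ln(odds) by $\Delta \cdot \hat\beta_1$, the odds ratio for any difference is $e^{\Delta \hat\beta_1}$. Here is a minimal Python sketch, assuming the HDL slope reported above; the function name is hypothetical.

```python
import math

B1_HAT = -0.034  # estimated slope per 1 mg/dL of HDL (from the slides)

def odds_ratio_for_difference(delta, slope=B1_HAT):
    """OR comparing two groups whose predictor values differ by `delta` units (higher vs. lower)."""
    # A delta-unit difference changes ln(odds) by slope * delta
    return math.exp(slope * delta)

print(f"HDL differs by 1 mg/dL:  OR = {odds_ratio_for_difference(1):.3f}")   # 0.967
print(f"HDL differs by 10 mg/dL: OR = {odds_ratio_for_difference(10):.3f}")
```

Passing `slope=-0.24` gives the breastfeeding odds ratio for a one-month age difference later in this section, about 0.79.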
Obesity and HDL Cholesterol—7
$\ln(\text{odds of obesity}) = 1.20 - 0.034x_1$, where $x_1$ = HDL (mg/dL)
► So:
► When $x_1 = 0$: $\ln(\text{odds of obesity}) = 1.20$
► So $\hat\beta_0 = 1.20$ estimates the ln(odds of obesity) for persons with HDL = 0 mg/dL
Obesity and HDL Cholesterol—8
► What is the odds ratio of being obese for persons with HDL of 100 mg/dL versus persons
with HDL of 80 mg/dL?
Breastfeeding Status and Child Age—1
► Data on a random sample of 192 Nepalese children between 1 and 3 years old (12–36 months)
► Information includes breastfeeding status at the time of the study (1 = yes, 0 = no) and the age of the child in months
$\ln(\text{odds of being breastfed}) = \hat\beta_0 + \hat\beta_1 x_1$, where $x_1$ = age (months)
Breastfeeding Status and Child Age—2
Breastfeeding Status and Child Age—3
$\ln(\text{odds of being breastfed}) = 7.29 - 0.24x_1$, where $x_1$ = age (months)
► For age = $a$: $\ln(\text{odds of being breastfed}) = 7.29 - 0.24a$
Breastfeeding Status and Child Age—4
► So the slope, $\hat\beta_1 = -0.24$, estimates the difference in ln(odds of being breastfed) for two groups of children who differ by one month in age
► It is the difference in ln(odds of being breastfed) for a one-unit difference in $x_1$, i.e., a one-month difference in age
► Recall, a difference in ln(odds) can be re-expressed as the ln of an odds ratio: so $\hat\beta_1 = -0.24 = \ln(\widehat{OR})$, where $\widehat{OR}$ is the odds ratio of being breastfed for two groups of children who differ by one month in age
Breastfeeding Status and Child Age—5
► The odds ratio of being breastfed for two groups of children who differ by one month in age is $e^{-0.24} \approx 0.79$, older compared to younger
► In other words, older children (by one month) have 21% lower odds of being breastfed
when compared to the younger children
► This estimate is for any two groups who differ by one month of age in the population of
Nepalese children age 12–36 months
► 15 months to 14 months
► 27 months to 26 months, etc.
Breastfeeding Status and Child Age—6
► $\ln(\text{odds of being breastfed}) = 7.29 - 0.24x_1$, where $x_1$ = age (months)
► So $\hat\beta_0 = 7.29$ estimates the ln(odds of being breastfed) for newborn children ($x_1 = 0$ months): not necessarily a realistic quantity based on our sample (children 12–36 months old)
Breastfeeding Status and Child Age—7
► What is the estimated relative odds (odds ratio) of being breastfed for children who are
30 months old, compared to children who are 24 months old?
What if Linearity Assumption Is Not Met?—1
► Suppose the relationship between the ln(odds of $y = 1$) and $x_1$ is not linear, and/or the researcher does not wish to treat $x_1$ as continuous
What if Linearity Assumption Is Not Met?—2
► For example, in the respiratory failure and gestational age example from the previous section, gestational age was categorized
► Recall: the authors used 37–40 weeks as the reference, with $x_1 = 1$ if gestational age = 34 weeks, $x_2 = 1$ if gestational age = 35 weeks, and $x_3 = 1$ if gestational age = 36 weeks
Summary
► Simple logistic regression can be done with binary, categorical, and continuous predictors
► When the predictor $x_1$ is continuous, the model estimates a linear relationship between the ln(odds that $y = 1$) and $x_1$
► This assumption should be investigated empirically (LOWESS smoothing, categorization
of continuous predictors) prior to fitting a model
► The resulting estimated slope from logistic regression with a continuous predictor still has a ln(odds ratio) interpretation, and the intercept has a ln(odds when $x_1 = 0$) interpretation
Simple Logistic Regression: Accounting
for Uncertainty in the Estimates
Learning Objectives
► Create 95% CIs for the intercept and slopes from simple logistic regression and convert
these to 95% CIs for odds and odds ratios
► Estimate p-values for testing the null hypothesis $H_0: \beta_1 = 0$ (and, hence, $OR = 1$)
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—1
► In the previous sections, we showed the results from several simple logistic regression
models
► For example, when relating response to treatment to baseline CD4 counts, the resulting logistic regression equation for this analysis is $\ln(\text{odds of response}) = -1.67 + 0.58x_1$, where $x_1 = 1$ for subjects with baseline CD4 <250 and 0 for subjects with baseline CD4 ≥250
► This was estimated from the individual-level data, using a computer package
► What is the algorithm to estimate this equation? There must be some algorithm that will
always yield the same results for the same data set, regardless of the computer package
used to fit the regression model
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—2
► For logistic regression, this approach is called “maximum likelihood”: the estimates for the intercept ($\hat\beta_0$) and the slope ($\hat\beta_1$) are the values that make the observed data “most” likely among all possible choices for $\hat\beta_0$ and $\hat\beta_1$
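One way to see what maximum likelihood means in practice is to fit the CD4 model directly. The sketch below is a minimal illustration under stated assumptions, not a description of what any particular package does internally: it applies Newton-Raphson (equivalently, iteratively reweighted least squares for the logit link) to the binomial log-likelihood for grouped data, using the counts from the CD4 table (79 of 497 responders with CD4 ≥250, 127 of 503 with CD4 <250). Standard errors come from the inverse of the information matrix at the estimates. The function name is hypothetical.

```python
import math

def fit_logistic_binary_x(events, totals, iterations=25):
    """Maximum likelihood fit of ln(odds that y = 1) = b0 + b1*x for a 0/1 predictor.

    events[x] = number of subjects with y = 1 in group x (x = 0 or 1)
    totals[x] = number of subjects in group x
    Returns (b0, b1, se_b0, se_b1).
    """

    def score_and_information(b0, b1):
        u0 = u1 = 0.0          # score: gradient of the log-likelihood
        i00 = i01 = i11 = 0.0  # Fisher information: negative Hessian
        for x in (0, 1):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            residual = events[x] - totals[x] * p
            weight = totals[x] * p * (1.0 - p)
            u0 += residual
            u1 += residual * x
            i00 += weight
            i01 += weight * x
            i11 += weight * x * x
        return (u0, u1), (i00, i01, i11)

    b0 = b1 = 0.0  # start at ln(odds) = 0, i.e., p = 0.5 in both groups
    for _ in range(iterations):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: (b0, b1) += I^{-1} U, using the explicit 2x2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det

    # Standard errors: square roots of the diagonal of I^{-1} at the estimates
    _, (i00, i01, i11) = score_and_information(b0, b1)
    det = i00 * i11 - i01 * i01
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)

# CD4 example: x = 1 for baseline CD4 < 250, x = 0 for CD4 >= 250
b0, b1, se_b0, se_b1 = fit_logistic_binary_x(events={0: 79, 1: 127}, totals={0: 497, 1: 503})
print(f"b0 = {b0:.2f} (SE {se_b0:.2f}), b1 = {b1:.2f} (SE {se_b1:.2f})")
```

It should print $b_0 \approx -1.67$ (SE ≈ 0.12) and $b_1 \approx 0.58$ (SE ≈ 0.16), matching the fitted equation $-1.67 + 0.58x_1$. Any package maximizing the same likelihood should arrive at the same values, which is the "same results regardless of the computer package" property described above.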
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—3
► The values chosen for $\hat\beta_0$ and $\hat\beta_1$ are just estimates based on a single sample; for a different random sample of 1,000 subjects from the same HIV+ population, the resulting estimates would likely be different
► As such, all regression coefficients have an associated standard error that can be used to make statements about the true relationship between ln(odds that $y = 1$) and $x_1$ (for example, the true slope $\beta_1$), based on a single sample
► The method of maximum likelihood yields standard errors for the slope and intercept estimates, $\hat\beta_1$ and $\hat\beta_0$
95% CIs and p-Values—1
► The standard errors allow for the computation of 95% CIs and p-values for the slope and intercept
► Hence, it is “business as usual” for getting 95% CIs and doing hypothesis tests, with one caveat: the CIs are computed on the ln scale, and the endpoints can then be exponentiated to get CIs for odds and odds ratios
95% CIs and p-Values—2
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—4
► Again, when relating response to treatment to baseline CD4 counts, the resulting logistic regression equation for this analysis is $\ln(\text{odds of response}) = -1.67 + 0.58x_1$, where $x_1 = 1$ for subjects with baseline CD4 <250 and 0 for subjects with baseline CD4 ≥250
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—5
► 95% CI for population slope $\beta_1$:
$$\hat\beta_1 \pm 2\widehat{SE}(\hat\beta_1) \rightarrow 0.58 \pm 2(0.16) \rightarrow \approx (0.26, 0.90)$$
► $\widehat{OR}_{CD4<250 \text{ vs. } CD4\geq250} = e^{0.58} \approx 1.78$, and the 95% CI for the population odds ratio (OR) is given by $(e^{0.26}, e^{0.90}) \rightarrow (1.30, 2.46)$
► p-value for population slope $\beta_1$: $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$ (equivalently, $H_0: e^{\beta_1} = OR = 1$ vs. $H_A: e^{\beta_1} = OR \neq 1$)
► Assume the null is true and calculate the distance of the slope estimate $\hat\beta_1$ from 0 in units of standard error:
$$z = \frac{\hat\beta_1}{\widehat{SE}(\hat\beta_1)} = \frac{0.58}{0.16} \approx 3.6$$
► Translate to a p-value
► In this example, the p-value is very small, <0.001
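The CI and test recipe above can be packaged as a short Python function. This sketch follows the slides' "estimate ± 2 SE" rule (1.96 is the more precise normal multiplier) and uses the standard normal distribution for the two-sided p-value via `math.erfc`; the function name `wald_summary` is hypothetical.

```python
import math

def wald_summary(beta_hat, se, multiplier=2.0):
    """Large-sample CI and test for a logistic regression coefficient.

    Returns the CI on the ln scale, the exponentiated estimate (odds ratio) and its CI,
    the z statistic, and the two-sided p-value.
    """
    lo, hi = beta_hat - multiplier * se, beta_hat + multiplier * se
    z = beta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {
        "ci_log": (lo, hi),
        "or": math.exp(beta_hat),
        "ci_or": (math.exp(lo), math.exp(hi)),
        "z": z,
        "p": p_value,
    }

cd4 = wald_summary(0.58, 0.16)
hdl = wald_summary(-0.034, 0.002)
for label, res in (("CD4 <250 vs >=250", cd4), ("HDL, per 1 mg/dL", hdl)):
    lo, hi = res["ci_or"]
    print(f"{label}: OR = {res['or']:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), "
          f"z = {res['z']:.1f}, p = {res['p']:.2g}")
```

Applied to the CD4 slope it gives an OR CI of about (1.30, 2.46) and p ≈ 0.0003; applied to the HDL slope (−0.034, SE 0.002) it gives about (0.963, 0.970) and z = −17.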
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—6
► One could use a similar approach to get a 95% CI for the true population-level intercept $\beta_0$ and a 95% CI for the odds ($e^{\beta_0}$) of response in the CD4 ≥250 cells/mm³ population
► Summary of findings:
► This research used simple logistic regression to estimate the association between treatment response and baseline CD4 counts in a population of HIV+ individuals, using data on a random sample of 1,000 individuals
► A statistically significant association was found ($p < 0.001$)
► The results estimate that individuals with lower CD4 counts at the time of treatment
(<250) have 78% greater odds of responding to ART, as compared to individuals with
higher CD4 counts (95% CI 30%, 146%)
Obesity and HDL Cholesterol—1
► The resulting logistic regression equation for this analysis is $\ln(\text{odds of obesity}) = 1.20 - 0.034x_1$, where $x_1$ = HDL (mg/dL)
Obesity and HDL Cholesterol—2
► 95% CI for population slope $\beta_1$:
$$\hat\beta_1 \pm 2\widehat{SE}(\hat\beta_1) \rightarrow -0.034 \pm 2(0.002) \rightarrow \approx (-0.038, -0.030)$$
► $\widehat{OR} = e^{-0.034} \approx 0.967$, and the 95% CI for the population odds ratio (OR) is given by $(e^{-0.038}, e^{-0.030}) \rightarrow (0.963, 0.970)$
► p-value for population slope $\beta_1$: $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$ (equivalently, $H_0: e^{\beta_1} = OR = 1$ vs. $H_A: e^{\beta_1} = OR \neq 1$)
► Assume the null is true and calculate the distance of the slope estimate $\hat\beta_1$ from 0 in units of standard error:
$$z = \frac{\hat\beta_1}{\widehat{SE}(\hat\beta_1)} = \frac{-0.034}{0.002} = -17$$
► Translate to a p-value
► In this example, the p-value is very small, <0.001
Obesity and HDL Cholesterol—3
► One could use a similar approach to get a 95% CI and p-value for the true population-level intercept $\beta_0$ and a 95% CI for the odds ($e^{\beta_0}$) of obesity in persons with HDL cholesterol of 0 mg/dL (but of course, this is not scientifically relevant)
► Summary of findings:
► This research used simple logistic regression to estimate the association between obesity and HDL cholesterol levels using data on adults from the 2013–2014 National Health and Nutrition Examination Survey (NHANES)
► A statistically significant association was found ($p < 0.001$)
► The results estimate that each additional mg/dL of HDL cholesterol is associated with a 3.3% reduction in the odds of obesity (odds ratio = 0.967, 95% CI 0.963, 0.970)
Summary
► The construction of confidence intervals for logistic regression slopes and intercepts is
“business as usual”: Take the estimate and add/subtract 2 estimated standard errors for
“large samples”
► In smaller samples, the 95% CI and p-values are based on exact computations, but this
detail will be handled by a computer
► The interpretations of the CIs and p-values are the same regardless of sample size
► Confidence intervals for slopes are confidence intervals for ln(odds ratios); these endpoints can be exponentiated to get a confidence interval for an odds ratio
► Confidence intervals for intercepts are confidence intervals for the ln(odds that $y = 1$) for a specific group ($x_1 = 0$), which is not always relevant when $x_1$ is continuous; these endpoints can be exponentiated to get a confidence interval for the odds
Estimating Risk and Functions of Risk
from Logistic Regression Results
Learning Objectives
► While the results from logistic regression can be interpreted in terms of odds and odds
ratios (after exponentiation), for prospective cohort studies, risks can be estimated
► With a little bit of work, the results from logistic regression can be converted to
probabilities (proportions, risks) and presented on this scale
Probability (Risk) Estimates Based on Logistic Regression
► In the last several sections, we have explored how to relate a binary outcome to a
predictor (binary, ordinal and nominal categorical, continuous) via simple logistic
regression
► We have shown how to translate the results into estimates of odds and odds ratios
► The results from logistic regression can also be used to get estimated risks and functions
of risk (if the study design allows for risk estimates)
Relationship Between Odds and Probability (and Logistic Results)
► Recall, the estimated odds of an event for a single group is $\widehat{odds} = \frac{\hat p}{1 - \hat p}$
► Solving this equation for $\hat p$ yields $\hat p = \frac{\widehat{odds}}{1 + \widehat{odds}}$
Respiratory Failure and Gestational Age—1
► Even though the gestational age categories are ordinal, the authors did not want to assume that the relationship between the ln(odds of respiratory failure) and gestational age category is necessarily linear
► There are four categories: Make one category the reference, and make binary 𝑥𝑥’s
indicators for the other three; the authors used 37–40 weeks as the reference
► 𝑥𝑥1 = 1 if gestational age = 34 weeks
► 𝑥𝑥2 = 1 if gestational age = 35 weeks
► 𝑥𝑥3 = 1 if gestational age = 36 weeks
Respiratory Failure and Gestational Age—2
$$\hat p_{37\text{ to }40\text{ weeks}} = \frac{\widehat{odds}_{37\text{ to }40\text{ weeks}}}{1 + \widehat{odds}_{37\text{ to }40\text{ weeks}}} = \frac{0.0041}{1.0041} \approx 0.004 \ (0.4\%)$$
Respiratory Failure and Gestational Age—3
► To compute the estimated risk (probability, proportion) of respiratory failure for the 34-week group: $x_1 = 1$ and $x_2 = x_3 = 0$
$$\ln(\text{odds of respiratory failure for 34 weeks}) = -5.5 + 3.4 = -2.1$$
$$\widehat{odds}_{34\text{ weeks}} = e^{-2.1} \approx 0.122$$
$$\hat p_{34\text{ weeks}} = \frac{\widehat{odds}_{34\text{ weeks}}}{1 + \widehat{odds}_{34\text{ weeks}}} = \frac{0.122}{1.122} \approx 0.11 \ (11\%)$$
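A minimal Python sketch of the same calculation, assuming only the intercept (−5.5, the 37–40 week reference group) and the 34-week coefficient (3.4) shown above; because these inputs are rounded, the results match the slides only to about two decimal places. The last line forms a risk difference and a relative risk from the two estimated risks, which is meaningful only when the study design allows risks to be estimated.

```python
import math

B0_HAT = -5.5  # ln(odds) for the 37-40 week reference group (x1 = x2 = x3 = 0)
B1_HAT = 3.4   # coefficient for the 34-week indicator x1

def risk_from_log_odds(log_odds):
    """Convert an estimated ln(odds) to an estimated risk: odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)

p_ref = risk_from_log_odds(B0_HAT)          # 37-40 weeks
p_34 = risk_from_log_odds(B0_HAT + B1_HAT)  # 34 weeks: x1 = 1, x2 = x3 = 0

print(f"risk, 37-40 weeks: {p_ref:.4f}")
print(f"risk, 34 weeks:    {p_34:.3f}")
# Functions of risk (valid only when the design allows risks to be estimated)
print(f"risk difference: {p_34 - p_ref:.3f}, relative risk: {p_34 / p_ref:.1f}")
```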
Breastfeeding Status and Child Age—1
► What is the estimated probability (risk) of being breastfed among 24-month-old children?
$$\ln(\text{odds of being breastfed at 24 months}) = 7.29 + (-0.24)(24) = 1.53$$
$$\widehat{odds}_{24\text{ months}} = e^{1.53} \approx 4.6$$
$$\hat p_{24\text{ months}} = \frac{\widehat{odds}_{24\text{ months}}}{1 + \widehat{odds}_{24\text{ months}}} = \frac{4.6}{5.6} \approx 0.82 \ (82\%)$$
Breastfeeding Status and Child Age—2
► What is the estimated probability (risk) of being breastfed among 16-month-old children?
$$\ln(\text{odds of being breastfed at 16 months}) = 7.29 + (-0.24)(16) = 3.45$$
$$\widehat{odds}_{16\text{ months}} = e^{3.45} \approx 31.5$$
$$\hat p_{16\text{ months}} = \frac{\widehat{odds}_{16\text{ months}}}{1 + \widehat{odds}_{16\text{ months}}} = \frac{31.5}{32.5} \approx 0.97 \ (97\%)$$
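The same two steps (ln(odds) to odds, then odds to probability) can be run for any age in the sampled range. Here is a Python sketch, assuming the fitted equation above; restricting ages to 12 to 36 months is a design choice of this sketch (the sample contained no younger or older children), not part of the original analysis.

```python
import math

B0_HAT, B1_HAT = 7.29, -0.24  # ln(odds of being breastfed) = 7.29 - 0.24 * age (months)

def prob_breastfed(age_months):
    """Estimated probability of being breastfed at a given age (months)."""
    if not 12 <= age_months <= 36:
        raise ValueError("age outside the 12-36 month range of the sample")
    odds = math.exp(B0_HAT + B1_HAT * age_months)
    return odds / (1 + odds)

for age in (12, 16, 24, 30, 36):
    print(f"{age} months: p = {prob_breastfed(age):.2f}")

# Functions of risk: risk difference and relative risk, 24 vs. 16 months
p24, p16 = prob_breastfed(24), prob_breastfed(16)
print(f"risk difference = {p24 - p16:.2f}, relative risk = {p24 / p16:.2f}")
```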
Breastfeeding Status and Child Age—3
Breastfeeding Status and Child Age—4
Summary
► For most types of studies (case-control studies excepted), the results from logistic regression can be used to estimate risk (probability, proportion) and, hence, risk differences and relative risks
► For specific values of $x_1$ (or $x_1, x_2, \dots, x_{p-1}$ for a multi-categorical predictor), the logistic regression equation can be used to calculate the estimated ln(odds) of the binary outcome of interest, $\widehat{\ln(odds)}$
► $\widehat{odds} = e^{\widehat{\ln(odds)}}$
► $\hat p = \frac{\widehat{odds}}{1 + \widehat{odds}}$
Additional Exercises
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—1
► What would the resulting estimates of the intercept and slope be in the model $\ln(\text{odds of response}) = \hat\beta_0^* + \hat\beta_1^* x_1$ if $x_1$ is coded as 1 for subjects with baseline CD4 ≥250, and 0 for subjects with baseline CD4 <250?
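Recoding the predictor only relabels the groups, so the new coefficients follow from the old ones: since $x_1^* = 1 - x_1$, the reference group becomes CD4 <250 and the slope changes sign. A minimal Python check, assuming the fitted values −1.67 and 0.58 from earlier:

```python
B0_HAT, B1_HAT = -1.67, 0.58  # original coding: x1 = 1 for CD4 < 250, 0 for CD4 >= 250

# New coding x1* = 1 - x1, so the new reference group (x1* = 0) is CD4 < 250
b0_star = B0_HAT + B1_HAT  # ln(odds of response) for CD4 < 250
b1_star = -B1_HAT          # ln(OR) for CD4 >= 250 compared with CD4 < 250

# Both codings give the same fitted ln(odds) for each group
for x1 in (0, 1):
    x1_star = 1 - x1
    original = B0_HAT + B1_HAT * x1
    recoded = b0_star + b1_star * x1_star
    assert abs(original - recoded) < 1e-12

print(f"b0* = {b0_star:.2f}, b1* = {b1_star:.2f}")
```

On the odds ratio scale the sign flip becomes a reciprocal: $e^{-0.58}$ is $1/e^{0.58}$.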
Response to ART, HIV+ Individuals and Baseline CD4 Count Groups—2
Respiratory Failure and Gestational Age—1
► Even though the gestational age categories are ordinal, the authors did not want to assume that the relationship between the ln(odds of respiratory failure) and gestational age category is necessarily linear
► There are four categories: make one category the reference, and create binary indicator variables ($x$'s) for the other three; the authors used 37–40 weeks as the reference
► $x_1 = 1$ if gestational age = 34 weeks, 0 if not
► $x_2 = 1$ if gestational age = 35 weeks, 0 if not
► $x_3 = 1$ if gestational age = 36 weeks, 0 if not
Respiratory Failure and Gestational Age—2
► Based on the results of this model what is the odds ratio of respiratory failure for children
with gestational age of 36 weeks compared to children with gestational age of 34 weeks?
Respiratory Failure and Gestational Age—3
► Linearity assumption?
Obesity and HDL Cholesterol
► What is the odds ratio of being obese for persons with HDL of 100 mg/dL versus persons
with HDL of 80 mg/dL?
Breastfeeding Status and Child Age—1
► $\ln(\text{odds of being breastfed}) = 7.29 - 0.24x_1$, where $x_1$ = age (months). Also, $\widehat{SE}(\hat\beta_0) = 1.07$ and $\widehat{SE}(\hat\beta_1) = 0.04$. What is the estimated odds ratio (and 95% CI) of being breastfed for children who are 30 months old compared to children who are 24 months old?
Breastfeeding Status and Child Age—2
► $\ln(\text{odds of being breastfed}) = 7.29 - 0.24x_1$, where $x_1$ = age (months)
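A sketch of the calculation for the 30- versus 24-month exercise, assuming the "estimate ± 2 SE" rule used earlier in this lecture: a 6-month difference multiplies both the slope and its standard error by 6 before exponentiating, because $\widehat{SE}(6\hat\beta_1) = 6\,\widehat{SE}(\hat\beta_1)$.

```python
import math

B1_HAT, SE_B1 = -0.24, 0.04  # slope and its standard error (from the exercise slide)
delta = 30 - 24              # age difference in months

log_or = delta * B1_HAT          # difference in ln(odds) for a 6-month difference
se_log_or = abs(delta) * SE_B1   # multiplying an estimate by a constant scales its SE
ci_log = (log_or - 2 * se_log_or, log_or + 2 * se_log_or)

odds_ratio = math.exp(log_or)
ci_or = (math.exp(ci_log[0]), math.exp(ci_log[1]))
print(f"OR (30 vs. 24 months) = {odds_ratio:.2f}, 95% CI ({ci_or[0]:.2f}, {ci_or[1]:.2f})")
```

The intercept's standard error plays no role here, because the intercept cancels when two ages are compared.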