PC program implementing an alternative to the paired t-test
which adjusts for regression to the mean
Charles J. Kowalski^a, Emet D. Schneiderman*^b, Stephen M. Willis^b
^a Department of Biologic and Materials Sciences and The Center for Statistical Consultation and Research, University of Michigan,
Ann Arbor, MI 48109, USA
^b Department of Oral and Maxillofacial Surgery and Pharmacology, Baylor College of Dentistry, PO Box 660677, Dallas,
TX 75266-0677, USA
Received 18 January 1994; accepted 25 February 1994
Abstract
In many biomedical research contexts, treatment effects are estimated from studies based on subjects who have been
recruited because of high (low) measurements of a response variable, e.g., high blood pressure or low scores on a stress
test. In this situation, simple change scores will overestimate the treatment effect; and the use of the paired t-test may
find significant change due not to the treatment per se but, rather, due to regression towards the mean. A PC program
implementing a procedure for adjusting the observed change for the regression effect in simple pre-test-post-test experiments is described, illustrated, and made available to interested readers. The method is due to Mee and Chua (Am
Stat, 45 (1991) 39-42), and may be considered as an alternative to the paired t-test which separates the effect of the
treatment from the so-called regression effect.
Keywords: Treatment effects; Regression to the mean; Paired t-test; PC program
1. Introduction
The term regression to the mean (or, simply, the
regression effect) is used to describe the phenomenon that a variable which is extreme on its first
measurement will tend to be closer to the center of
the distribution when measured at a later point in
time. This phenomenon is exemplified in patients
with initially high blood pressure measurements
who, if measured on a second occasion, will tend
to have more average values at that time. It can be
demonstrated in its most simple form if we assume
that the first (Z) and second (X) measurements
have a bivariate normal distribution with means
$\mu_Z$ and $\mu_X$, variances $\sigma_Z^2$ and $\sigma_X^2$, and correlation
$\rho$. Then the regression function of X on Z, or the
conditional expectation of X given that Z = z, is

$$E(X \mid Z = z) = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Z}\,(z - \mu_Z) \qquad (1)$$

If we rewrite (1) in the form

$$E(X \mid Z = z) - \mu_X = \rho\,\frac{\sigma_X}{\sigma_Z}\,(z - \mu_Z) \qquad (2)$$

it is seen that, for a given z, the expected distance
of X from $\mu_X$ (the left-hand side of Eq. 2) is proportional to the distance between z and $\mu_Z$, the
constant of proportionality being the regression
coefficient

$$\beta_{XZ} = \rho\,\frac{\sigma_X}{\sigma_Z}$$

Thus we would expect X to be closer to its mean
than z was to its, if $|\beta_{XZ}| < 1$, i.e., if $|\rho| < \sigma_Z/\sigma_X$.

* Corresponding author.
0020-7101/94/$07.00 © 1994 Elsevier Science Ireland Ltd. All rights reserved
SSDI 0020-7101(94)01017-U
C.J. Kowalski et al. / Int. J. Biomed. Comput. 37 (1994) 189-194
An important special case of (1), which we will
need in our development, has been called dynamic
equilibrium [1] and is characterized by equal
means and variances for Z and X, viz.,

$$\mu_Z = \mu_X = \mu, \qquad \sigma_Z^2 = \sigma_X^2 = \sigma^2 \qquad (3)$$

and by a correlation which is somewhat less than
perfect ($|\rho| < 1$). This is a ‘no change’ model
which could be used, e.g., in the situation in which
a treatment was applied following a baseline
measurement and the treatment was completely
ineffective. Nevertheless, the regression phenomenon must still be contended with: in this case

$$E(X \mid Z = z) = \mu + \rho(z - \mu) \qquad (4)$$

and, as shown clearly by Lord [1], subjects with
sufficiently high (low) values of z are virtually
guaranteed to decrease (increase) when the second
measurement is taken. Writing this in the form (2)

$$E(X \mid Z = z) - \mu = \rho(z - \mu) \qquad (5)$$

it is clear that we expect X to be closer to the mean
than z, provided only that $|\rho| \ne 1$. Notice that,
if $z > \mu$ and $\rho$ is positive, we still expect that $X - \mu$
will be positive, but at a reduced level.
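As a small numerical illustration of (4) and (5), the following Python sketch evaluates the expected second measurement for a few first measurements. The function name and the values mu = 100 and rho = 0.6 are hypothetical, chosen only to make the pattern visible; they do not come from any data set discussed in this paper.

```python
def expected_second(z, mu, rho):
    """Conditional mean E(X | Z = z) under dynamic equilibrium, Eq. (4)."""
    return mu + rho * (z - mu)


# Hypothetical population: common mean 100, correlation 0.6 (illustration only).
mu, rho = 100.0, 0.6
for z in (70.0, 100.0, 130.0):
    x = expected_second(z, mu, rho)
    # The expected change x - z is positive below the mean and negative above it.
    print(f"z = {z:6.1f}   E(X|Z=z) = {x:6.1f}   expected change = {x - z:+6.1f}")
```

Subjects selected for low first values are expected to rise, and those selected for high first values are expected to fall, even though nothing has been done to them.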
In many biomedical research contexts the inclusion criteria for a study will necessarily include
that subjects have sufficiently high (low) values of
the variable of interest, e.g., high values of blood
pressure, cholesterol level, or gingivitis; or low
scores on an achievement or physical stress test.
Since such subjects can be expected to change even
in the absence of treatment, it is important to correct the observed change for the regression effect
to properly assess the portion of the change which
may be attributed to the treatment. In this paper
we describe, illustrate and make available a PC
program implementing a procedure which explicitly adjusts for that portion of the observed change
attributable to regression to the mean. The procedure was developed by Mee and Chua [2], and we
will follow them in describing a scenario in which
its use would be appropriate.
2. A typical application
Mee and Chua [2] considered data taken from
[3, p. 476], reproduced in Table 1 below, which
was gathered in the following context. High school
students in Florida are required to pass a literacy
test before they can graduate. Students who do not
pass the examination on their first attempt may
take a refresher course, and then retake an equivalent test. The problem is to decide whether or not
the refresher course was effective in raising the test
scores.
Notice first that it would not be appropriate in
this situation to do a simple paired t-test. This test
fails to account for change due to the regression
effect and thus produces a biased estimate of the
effect of the course. As we have seen, even if nothing is done between the first and second exams,
those individuals who have low scores on the first
exam will, on the average, show improvement on
the second.
Table 1
Literacy test scores before and after taking a remedial course

Student   Before   After   Difference
1         45       49        4
2         52       50       -2
3         63       70        7
4         68       71        3
5         57       53       -4
6         55       61        6
7         60       62        2
8         59       67        8

Example data. Source: McClave and Dietrich [3].
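For reference, the Difference column of Table 1 and the naive paired t statistic (the test argued above to be inappropriate in this setting) can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are not part of the program described in this paper.

```python
import math

# Table 1: scores before and after the remedial course.
before = [45, 52, 63, 68, 57, 55, 60, 59]
after = [49, 50, 70, 71, 53, 61, 62, 67]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = sum(diffs) / n
# Sample standard deviation of the differences (divisor n - 1).
sd_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
# Paired t statistic: mean difference over its standard error.
t_paired = d_bar / (sd_d / math.sqrt(n))

print("differences:", diffs)
print(f"mean = {d_bar:.3f}, SD = {sd_d:.4f}, paired t = {t_paired:.2f} on {n - 1} df")
```

This statistic treats the whole mean difference as treatment effect, which is exactly what the regression effect calls into question.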
We let Z denote a score on the first test, and X
the corresponding score on the second. Z will be
available for all the students in the school; X will
be observed only for those students with values of
Z < k, where k is a suitable cut-off point. Nevertheless, we define $\mu$ and $\sigma^2$ to be the common
mean and variance of the pre- and post-test scores
which refer to the situation in which all the
students take both examinations. Similarly, $\rho$ is the
correlation between the pre- and post-scores when
all students take both exams. We let $\mu_Z$ and $\mu_X$ be
the means for the students who actually take both
exams; $\sigma_Z^2$ and $\sigma_X^2$ are the corresponding variances. Even if the refresher course is completely ineffective in raising test scores, the regression effect means that we do not expect the differences X - Z to average zero. In fact, the expected
difference between the means is

$$\mu_X - \mu_Z = \mu + \rho(\mu_Z - \mu) - \mu_Z \qquad (6)$$

as was noted in [2]. Because selection on Z < k makes $\mu_Z < \mu$, this will be zero only if $\rho = 1$.
The property of conditional expectations referred
to in [2], necessary to show $\mu_X = \mu + \rho(\mu_Z - \mu)$,
and hence (6), is that $\mu_X = E(X \mid Z < k) = E[E(X \mid Z) \mid Z < k] = E[\mu + \rho(Z - \mu) \mid Z < k] = \mu + \rho(\mu_Z - \mu)$. We see, then, that even in the absence of any intervention (the school is in dynamic
equilibrium), students selected using the criterion
Z < k may be expected to change (improve). The
amount of change expected is given by (6). It is this
amount of change which must be subtracted from
the observed change to obtain the change attributable to the intervention. The method of accomplishing this developed in [2] is outlined next.
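The size of the regression effect in (6) is easy to evaluate numerically. The Python sketch below does so for one set of inputs; the function name and the values (population mean 75, selected-group pre-test mean 57, population correlation 0.8) are hypothetical and serve only to illustrate the formula.

```python
def regression_effect(mu, rho, mu_z):
    """Expected change mu_X - mu_Z with no treatment effect, Eq. (6)."""
    return mu + rho * (mu_z - mu) - mu_z


# Hypothetical inputs, for illustration only.
mu, rho, mu_z = 75.0, 0.8, 57.0
effect = regression_effect(mu, rho, mu_z)
print(f"expected change with an ineffective course: {effect:.2f} points")

# Eq. (6) rearranges to (1 - rho) * (mu - mu_Z): the effect vanishes when
# rho = 1 and grows as the selected group lies further from the mean.
assert abs(effect - (1 - rho) * (mu - mu_z)) < 1e-9
```

With these inputs an improvement of 3.6 points would be expected from the regression effect alone.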
3. Mee and Chua’s procedure
Mee and Chua consider their method to be an
alternative to the paired t-test which more properly accounts for the regression effect described
above. It is based on regressing X on $Z - \mu$, taking
$\mu$, the common value of the means of Z and X
under dynamic equilibrium, as known. This will be
a reasonable assumption when, in the context of
our example, the school is large, for then one can
use this large sample to estimate the mean of Z.
The null model may be written, from (4), as

$$H_0: X = \mu + \rho(Z - \mu) + \epsilon \qquad (7)$$

and the alternative is

$$H_A: X = \beta_0 + \rho(Z - \mu) + \epsilon \qquad (8)$$

with $\beta_0 > \mu$. This form of the hypothesis (with $\beta_0 > \mu$) is for situations in which the treatment or intervention is intended to raise the scores of treated
subjects. We continue to consider this case in this
section. We show the (slight) changes necessary to
deal with the opposite scenario ($\beta_0 < \mu$) later. In
(7) and (8), $\epsilon$ is a random error (residual) with
$E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$. We define $z_i = Z_i - \mu$
and compute
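
$$\bar z = \frac{1}{N}\sum z_i, \qquad \bar X = \frac{1}{N}\sum X_i$$

$$\hat\beta_1 = \frac{\sum (z_i - \bar z)(X_i - \bar X)}{\sum (z_i - \bar z)^2}, \qquad \hat\beta_0 = \bar X - \hat\beta_1 \bar z$$

and

$$s^2 = \frac{1}{N-2}\sum \left(X_i - \hat\beta_0 - \hat\beta_1 z_i\right)^2$$

The fitted model $\hat X_i = \hat\beta_0 + \hat\beta_1 (Z_i - \mu)$ is the least
squares regression line; and $s^2$ is the mean
squared error of the regression (an estimate of $\sigma^2$).
Then the test of $H_0$ vs. $H_A$ is based on

$$t = \frac{\hat\beta_0 - \mu}{s\sqrt{\dfrac{1}{N} + \dfrac{\bar z^2}{\sum (z_i - \bar z)^2}}} \qquad (9)$$
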
where t has N - 2 degrees of freedom. Note that
the test may be rephrased as one of $\beta_0 = \mu$ vs.
$\beta_0 > \mu$ so that a one-sided P-value is appropriate
and is printed in our program. We also compute
and print the $(1 - \alpha) \times 100\%$ one-sided confidence interval for $\beta_0 - \mu$. The user selects the
value of $1 - \alpha$ (e.g., 0.95) to be used. The confidence interval is given by

$$\beta_0 - \mu \ge (\hat\beta_0 - \mu) - t_{1-\alpha}(N-2)\, s\sqrt{\frac{1}{N} + \frac{\bar z^2}{\sum (z_i - \bar z)^2}} \qquad (10)$$

where $t_{1-\alpha}(N-2)$ is the $100(1 - \alpha)$th percentile of
the t distribution with N - 2 degrees of freedom.
The right-hand side (RHS) of the inequality in (10)
represents a lower bound for the difference
$\beta_0 - \mu$. When trying to raise the value of the measurement, we expect $\beta_0 - \mu \ge 0$ and the RHS is
then the smallest increase consistent with the data.
If this is positive, we may conclude, with confidence $1 - \alpha$, that the TX (treatment or intervention) effect has in fact raised the mean score, and
this increase is at least RHS. If this is negative,
a negative increase (a decrease) is
consistent with the data, the interval contains zero,
and the TX effect is non-significant.
For the example data in Table 1, the fitted
regression function is

$$\hat X = 79.959 + 1.1112\,(Z - \mu)$$

with $s^2 = 20.296$. Mee and Chua took $\mu = 75$ as
the true value of $\mu$ and showed that this leads to
t = 1.08, and the corresponding (one-sided) P-value is 0.16. The simple paired t-test, on the other
hand, gives t = 2.0 with P = 0.04, which will be
judged by most as indicating a significant effect.
Thus while the ‘total change’ may be significant,
when the amount of change due to the regression
effect is subtracted - leaving that portion of the
change which can be attributed to the intervention
- the adjusted change need not be. The one-sided
95% confidence interval for $\beta_0 - \mu$ has the lower
bound -3.9450. It answers the question, ‘How bad
might the treatment be?’ In our example, we are
95% confident that the refresher course will not
lower the postmeasure by more than 3.945 points.
The fact that this change can be negative is consistent with the result of the test based on (9) which
showed no significant treatment effect.
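The calculations for the Table 1 example can be retraced with the following Python sketch, which uses only the standard library. It is not the GAUSS program described in this paper: the function names are illustrative, the tail probability is obtained by a simple Simpson's-rule integration of the t density rather than a library routine, and the critical value 1.943 is the tabled 95th percentile of t with 6 degrees of freedom. Under these assumptions the sketch should agree with the values quoted above to about two or three decimals; small differences in the confidence bound can arise from the t quantile used.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom.

    Simpson's rule on the t density between 0 and t; a simple stand-in
    for a library routine, adequate for illustration (steps must be even).
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - total * h / 3


def mee_chua(Z, X, mu):
    """Regress X on z = Z - mu and form the statistic of Eq. (9)."""
    n = len(Z)
    z = [zi - mu for zi in Z]
    z_bar, x_bar = sum(z) / n, sum(X) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, X))
    b1 = szx / szz
    b0 = x_bar - b1 * z_bar
    # Mean squared error of the regression, with N - 2 degrees of freedom.
    mse = sum((xi - b0 - b1 * zi) ** 2 for zi, xi in zip(z, X)) / (n - 2)
    # Standard error of b0 - mu: the denominator of Eq. (9).
    se = math.sqrt(mse * (1 / n + z_bar ** 2 / szz))
    return {"b0": b0, "b1": b1, "mse": mse, "se": se,
            "t": (b0 - mu) / se, "df": n - 2}


Z = [45, 52, 63, 68, 57, 55, 60, 59]   # Table 1, Before
X = [49, 50, 70, 71, 53, 61, 62, 67]   # Table 1, After
MU = 75.0                              # value of mu taken by Mee and Chua
T_CRIT = 1.943                         # tabled t(0.95; 6 df)

fit = mee_chua(Z, X, MU)
p_one_sided = t_upper_tail(fit["t"], fit["df"])
lower = (fit["b0"] - MU) - T_CRIT * fit["se"]   # lower bound, Eq. (10)

print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}, MSE = {fit['mse']:.4f}")
print(f"SE = {fit['se']:.4f}, t = {fit['t']:.4f}, one-sided P = {p_one_sided:.4f}")
print(f"lower 95% bound for beta0 - mu = {lower:.3f}")
```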
4. The program
The program is invoked by issuing the command ‘gsruni meechua’. The user is first asked for
the location and name of the (ASCII or GAUSS)
file containing the values of the Z/X (pre/post)
measurements. The data set should be rectangular
with N rows and 2 columns; the premeasures in
column 1, followed by the postmeasures. The user
is then asked to enter the value of $\mu$ (MU), and the
level of confidence (e.g., 0.95) to be employed in
constructing the confidence interval for $\beta_0 - \mu$.
Finally, the user indicates whether the TX is expected to INCREASE or DECREASE the mean
score for treated subjects.
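The rectangular input layout described above (N rows, 2 columns, premeasure first) can be pictured with the short Python sketch below. It assumes whitespace-delimited ASCII text, which is an assumption on our part since the delimiter conventions accepted by the GAUSS program are not stated here; the function name is likewise illustrative.

```python
def read_pre_post(text):
    """Parse rows of 'pre post' values into two lists (Z, X)."""
    Z, X = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, found {len(fields)}")
        Z.append(float(fields[0]))
        X.append(float(fields[1]))
    return Z, X


# Table 1 laid out as N = 8 rows and 2 columns (pre, post).
sample = """45 49
52 50
63 70
68 71
57 53
55 61
60 62
59 67
"""

Z, X = read_pre_post(sample)
print(f"N = {len(Z)}, first row = ({Z[0]}, {X[0]}), last row = ({Z[-1]}, {X[-1]})")
```

Rejecting rows that do not have exactly two fields catches the most common layout error, a missing post-measurement, before any statistics are computed.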
The output includes plots of the ‘growth profiles’ (pre- and post-measurements connected by
straight lines) for each of the N individuals; descriptive statistics (means, standard deviations and
correlation coefficients) for Z, X, D = X - Z and
ZW = z; the regression coefficients $\hat\beta_0$ and $\hat\beta_1$; the
value of $R^2$ and the mean square error for the
regression of X on Z; the P-value for the test of a
significant treatment effect; and the confidence interval for $\beta_0 - \mu$. For the example data, taking
$\mu = 75$, the numerical form of the output is as
follows:
DESCRIPTIVE STATISTICS
ZBAR = 57.375, SDZ = 6.9885, VZ = 48.8393
XBAR = 60.375, SDX = 8.8146, VX = 77.6964
DBAR = 3.000, SDD = 4.2426, VD = 18.000
ZWBAR = -17.625, SDZW = 6.989, VZW = 48.839
R(ZX) = 0.8810
R(ZD) = 0.1831
R(XZW) = 0.8810
RSQ(XZW) = 0.776
In the above, ZW denotes $z = Z - \mu$. The value of
RSQ(XZW) (the square of R(XZW)) measures the
strength of the linear relationship between ZW and
X, so is useful in judging the goodness-of-fit of the
assumed model. At this point in the program we
also produce a scatterplot of X vs. ZW to help
facilitate this judgement. The descriptive statistics
and plot are followed by
ESTIMATES OF PARAMETERS
BETA0 = 79.9590
BETA1 = 1.1112
MSE = 20.2960
SE = 4.5803
MSE is the mean squared error of the regression
of X on ZW ($s^2$). SE is the standard error of
$\hat\beta_0 - \mu$ as defined by the denominator of expression (9). Following this we plot the fitted regression of X on ZW superimposed on the previous
scatterplot and then print
TEST AND CONFIDENCE INTERVAL
OBSERVED VALUE OF t = 1.0827
CORRESPONDING ONE-SIDED P-VALUE = 0.1603
THE LOWER BOUND OF THE 95% ONE-SIDED CONFIDENCE
INTERVAL FOR BETA0 - MU IS = -3.9450.
YOU ARE 95% CONFIDENT THAT BETA0 - MU IS GREATER THAN -3.9450.
Hardware requirements and information concerning the availability of this program, along with
other programs for longitudinal data analysis, are
given in the Appendix.
5. Discussion
The program described above can be used to adjust or correct simple change scores for the regression effect, and provides a valid test of the null
hypothesis that the treatment has had no effect. In
order to see more clearly the relationship between
the test and the regression effect, it may be helpful
to note that $H_0: X = \mu + \rho(Z - \mu) + \epsilon$ implies that
$E(X) = \mu_X = \mu + \rho(\mu_Z - \mu)$ which, subtracting $\mu_Z$
from both sides, is (6), which is the expected difference between $\mu_X$ and $\mu_Z$ when only the regression
effect is present (dynamic equilibrium). Similarly,
$H_A: X = \beta_0 + \rho(Z - \mu) + \epsilon$ with $\beta_0 > \mu$ implies
$E(X) = \mu_X = \beta_0 + \rho(\mu_Z - \mu)$ which in turn implies
that $\mu_X - \mu_Z > \mu + \rho(\mu_Z - \mu) - \mu_Z$, i.e., that the
expected change exceeds the regression effect.
Thus, $H_0$ and $H_A$, previously stated in terms of
regression equations, have direct interpretations in
terms of the regression effect: $H_0$ implies that the
expected change is due solely to the regression
effect; $H_A$ that the expected change exceeds this
amount.
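The decomposition implied by the two hypotheses can be written out explicitly. The lines below are merely an algebraic restatement of (6) and (8) for the selected subjects, with no new assumptions; the labels under the braces are added here for readability.

```latex
\begin{align*}
\mu_X - \mu_Z &= \beta_0 + \rho(\mu_Z - \mu) - \mu_Z \\
              &= (\beta_0 - \mu) + \bigl[\mu + \rho(\mu_Z - \mu) - \mu_Z\bigr] \\
              &= \underbrace{(\beta_0 - \mu)}_{\text{treatment effect}}
               + \underbrace{(1 - \rho)(\mu - \mu_Z)}_{\text{regression effect, Eq. (6)}}
\end{align*}
```

Setting $\beta_0 = \mu$ recovers the null model, in which the whole expected change is the regression effect.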
Note also that $H_A$ as specified in (8) models an
additive treatment effect, i.e., the effect is $\beta_0 - \mu$
no matter what the initial value, Z. This is clear
from comparison of (7) and (8): the difference in
the expected responses under $H_A$ and $H_0$ is $\beta_0 - \mu$
since the $\rho Z$ terms cancel out. In certain situations,
multiplicative effects, where the effect of the treatment is allowed to depend on the value of Z (so
that, e.g., those with the lower scores will show the
most improvement) may be more appropriate.
Methods which can be used in these situations are
described in [4,5]. For a good discussion of the differences between these two types of effects, see [6,
p. 68]. Further comparison of (7) and (8), and the
method used to obtain the regression of X on Z,
prompts another remark which may provide some
insight into the structure of the technique. $\hat\beta_1$
estimates $\beta_1 = \rho\sigma_X/\sigma_Z$, the population regression coefficient
of X on Z. This may or may not equal $\rho$. It will
only if $\sigma_Z = \sigma_X$ (cf. (3)), and this may not be true if
the treatment has had an effect. Mee and Chua [2]
suggest that their test will be most sensitive to a
treatment effect when $\beta_1 > \rho$; it may be less effective if $\beta_1 < \rho$. In our example, $\hat\beta_1 = 1.111$
necessarily exceeds $\rho$ (since $\rho \le 1$), so the test may be presumed
to be ‘sensitive.’
The reader will have noted that our discussion
and example have focused on the situation in
which low scores were selected and the treatment
was intended to increase the value of the response
variable. Our program works equally well when
the opposite scenario obtains, i.e., when high
values (e.g., blood pressure) are selected and the
aim of the intervention is to decrease the value of
the measurement. In this case, the user will indicate that the TX is intended to DECREASE the
scores of treated subjects. The one-sided confidence interval in this case is given by

$$\beta_0 - \mu \le (\hat\beta_0 - \mu) + t_{1-\alpha}(N-2)\, s\sqrt{\frac{1}{N} + \frac{\bar z^2}{\sum (z_i - \bar z)^2}} \qquad (11)$$

and the RHS is an upper bound for the expected
decrease in the mean scores for the selected individuals. Should this be positive, the data do not
rule out an increase, and the TX will be judged to
be non-significant. If negative, the TX has had a
significant effect and the RHS represents the
smallest decrease compatible with the data, i.e.,
the reduction will be no smaller in magnitude than the RHS.
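The two one-sided bounds, (10) for an expected INCREASE and (11) for an expected DECREASE, differ only in the sign attached to the margin. The Python sketch below makes this explicit. The function name is illustrative; the first call uses the estimates printed for the Table 1 example with the tabled value t(0.95; 6 df) = 1.943, while the inputs to the second call are invented purely for illustration.

```python
def one_sided_bound(b0_hat, mu, se, t_crit, direction):
    """Bound for beta0 - mu.

    Lower bound (Eq. 10) if an INCREASE is expected,
    upper bound (Eq. 11) if a DECREASE is expected.
    """
    estimate = b0_hat - mu
    if direction == "INCREASE":
        return estimate - t_crit * se
    if direction == "DECREASE":
        return estimate + t_crit * se
    raise ValueError("direction must be 'INCREASE' or 'DECREASE'")


# Table 1 example: BETA0 = 79.9590, MU = 75, SE = 4.5803 from the output.
lower = one_sided_bound(79.9590, 75.0, 4.5803, 1.943, "INCREASE")

# Hypothetical blood-pressure style example; all four numbers are invented.
upper = one_sided_bound(142.0, 150.0, 3.0, 1.943, "DECREASE")

print(f"INCREASE: beta0 - mu is at least {lower:.3f}")
print(f"DECREASE: beta0 - mu is at most  {upper:.3f}")
```

In the first case the bound is negative, so a decrease is not ruled out; in the invented second case the bound is negative as well, which under (11) would indicate a significant reduction.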
Finally, we might remark that while the regression phenomenon has been recognized for a long
time, it continues to be overlooked in many situations like the one considered in this paper. For a
number of additional examples in published research, see [7]. It is hoped that the present discussion, and the availability of software to carry out
the associated computations, will have a positive
impact on this situation. On the other hand, it
needs to be recognized that purely statistical adjustments can never represent a completely adequate substitute for the concurrent study of a
control group [8, p. 193]. Such adjustments, including the one described in this paper, should be
used only when the use of a control group is not
feasible.
Acknowledgement
Supported by DE08730 from the National Institute of Dental Research.
Appendix
A set of PC programs performing this and related procedures can be obtained on high density
5.25” or 3.5” diskettes (please request type) by
sending $25 to defray the cost of handling and
license fees. These programs require an 80386- or
80486-based personal computer (PC) running the
MS-DOS operating system (version 5.0 or higher
is recommended, although versions as low as 3.3
will suffice). 80386 computers must also be equipped with an 80387 math coprocessor. At least 4
MB of memory are required, and must be available
to GAUSS386i, i.e., not in use by memory resident
programs such as Windows. EGA or VGA graphic
capabilities are required to display the color
graphics; VGA or SVGA is suggested to display
optimally the graphic results. Runtime modules
are supplied with the programs so that no additional software (i.e., compiler or interpreter) is
required to run these programs. One can create
and edit ASCII data sets for use by these programs
using the full screen editor supplied with MS-DOS
version 5.0. The programs are written and compiled using GAUSS386i, version 3.0, require no additional installation or modification, and are run
with a single command. When requesting the programs, address inquiries to the corresponding
author and make checks payable to Baylor College
of Dentistry.
References
[1] Lord FM: Elementary models for measuring change. In: Problems in Measuring Change (Ed: CW Harris), Univ. Wisconsin Press, Madison, 1963.
[2] Mee RW and Chua TC: Regression toward the mean and the paired sample t test, Am Stat, 45 (1991) 39-42.
[3] McClave JT and Dietrich FH: Statistics, 4th ed., Dellen, San Francisco, 1988.
[4] James KE: Regression toward the mean in uncontrolled clinical studies, Biometrics, 29 (1973) 121-130.
[5] Senn SJ and Brown RA: Estimating treatment effects in clinical trials subject to regression to the mean, Biometrics, 41 (1985) 555-560.
[6] Hills M: Statistics for Comparative Studies, Chapman and Hall, London, 1974.
[7] Nesselroade J, Stigler S and Baltes P: Regression towards the mean and the study of change, Psychol Bull, 88 (1980) 622-637.
[8] Fleiss JL: The Design and Analysis of Clinical Experiments, Wiley, New York, 1986.