PC program implementing an alternative to the paired t-test which adjusts for regression to the mean

Charles Kowalski

Charles J. Kowalski (a), Emet D. Schneiderman* (b), Stephen M. Willis (b)

(a) Department of Biologic and Materials Sciences and The Center for Statistical Consultation and Research, University of Michigan, Ann Arbor, MI 48109, USA
(b) Department of Oral and Maxillofacial Surgery and Pharmacology, Baylor College of Dentistry, PO Box 660677, Dallas, TX 75266-0677, USA
* Corresponding author.

Received 18 January 1994; accepted 25 February 1994

Abstract

In many biomedical research contexts, treatment effects are estimated from studies based on subjects who have been recruited because of high (low) measurements of a response variable, e.g., high blood pressure or low scores on a stress test. In this situation, simple change scores will overestimate the treatment effect, and the paired t-test may find a significant change that is due not to the treatment per se but, rather, to regression towards the mean. A PC program implementing a procedure for adjusting the observed change for the regression effect in simple pre-test/post-test experiments is described, illustrated, and made available to interested readers. The method is due to Mee and Chua (Am Stat, 45 (1991) 39-42), and may be considered as an alternative to the paired t-test which separates the effect of the treatment from the so-called regression effect.

Keywords: Treatment effects; Regression to the mean; Paired t-test; PC program

1. Introduction

The term regression to the mean (or, simply, the regression effect) is used to describe the phenomenon that a variable which is extreme on its first measurement will tend to be closer to the center of the distribution when measured at a later point in time. This phenomenon is exemplified in patients with initially high blood pressure measurements who, if measured on a second occasion, will tend to have more average values at that time. It can be
demonstrated in its simplest form if we assume that the first (\(Z\)) and second (\(X\)) measurements have a bivariate normal distribution with means \(\mu_Z\) and \(\mu_X\), variances \(\sigma_Z^2\) and \(\sigma_X^2\), and correlation \(\rho\). Then the regression function of \(X\) on \(Z\), or the conditional expectation of \(X\) given that \(Z = z\), is

\[ E(X \mid Z = z) = \mu_X + \rho \frac{\sigma_X}{\sigma_Z} (z - \mu_Z) \tag{1} \]

If we rewrite (1) in the form

\[ E(X \mid Z = z) - \mu_X = \rho \frac{\sigma_X}{\sigma_Z} (z - \mu_Z) \tag{2} \]

it is seen that, for a given \(z\), the expected distance of \(X\) from \(\mu_X\) (the left-hand side of Eq. 2) is proportional to the distance between \(z\) and \(\mu_Z\), the constant of proportionality being the regression coefficient \(\beta_{XZ} = \rho \sigma_X / \sigma_Z\). Thus we would expect \(X\) to be closer to its mean than \(z\) was to its own if \(|\beta_{XZ}| < 1\), i.e., if \(|\rho| < \sigma_Z / \sigma_X\).

0020-7101/94/$07.00 © 1994 Elsevier Science Ireland Ltd. All rights reserved. SSDI 0020-7101(94)01017-U
C.J. Kowalski et al. / Int. J. Biomed. Comput. 37 (1994) 189-194

An important special case of (1), which we will need in our development, has been called dynamic equilibrium [1] and is characterized by equal means and variances for \(Z\) and \(X\), viz.,

\[ \mu_Z = \mu_X = \mu, \qquad \sigma_Z^2 = \sigma_X^2 = \sigma^2 \tag{3} \]

and by a correlation which is somewhat less than perfect (\(|\rho| < 1\)). This is a 'no change' model which could be used, e.g., in the situation in which a treatment was applied following a baseline measurement and the treatment was completely ineffective. Nevertheless, the regression phenomenon must still be contended with: in this case

\[ E(X \mid Z = z) = \mu + \rho (z - \mu) \tag{4} \]

and, as shown clearly by Lord [1], subjects with sufficiently high (low) values of \(z\) are virtually guaranteed to decrease (increase) when the second measurement is taken. Writing this in the form (2),

\[ E(X \mid Z = z) - \mu = \rho (z - \mu) \tag{5} \]

it is clear that we expect \(X\) to be closer to the mean than \(z\), provided only that \(|\rho| \neq 1\). Notice that, if \(z > \mu\) and \(\rho\) is positive, we still expect that \(X - \mu\) will be positive, but at a reduced level.
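The 'no change' model in (3)-(5) can be illustrated with a small simulation. The following Python sketch is not part of the program described in this paper; the function name, the parameter values (mean 75, standard deviation 10, correlation 0.7) and the selection rule (keep subjects whose first measurement falls below a cut-off of 65) are all assumptions chosen only for illustration. It draws pairs in dynamic equilibrium, keeps the low scorers, and compares their mean change with the value implied by (5), namely \((\rho - 1)(\bar{z} - \mu)\).

```python
import math
import random


def simulate_no_change(n=100_000, mu=75.0, sigma=10.0, rho=0.7, cutoff=65.0, seed=1):
    """Simulate (Z, X) pairs in dynamic equilibrium and keep those with Z < cutoff.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    z_sel, x_sel = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        z = mu + sigma * a
        # X has the same mean and variance as Z and correlation rho with it,
        # so no treatment effect of any kind is present.
        x = mu + sigma * (rho * a + math.sqrt(1.0 - rho * rho) * b)
        if z < cutoff:
            z_sel.append(z)
            x_sel.append(x)
    mean_z = sum(z_sel) / len(z_sel)
    mean_x = sum(x_sel) / len(x_sel)
    # Mean change implied by (5) for the selected group: (rho - 1)(mean_z - mu).
    predicted = (rho - 1.0) * (mean_z - mu)
    return mean_z, mean_x, predicted


if __name__ == "__main__":
    mean_z, mean_x, predicted = simulate_no_change()
    print(f"selected mean of Z:      {mean_z:.2f}")
    print(f"selected mean of X:      {mean_x:.2f}")
    print(f"observed mean change:    {mean_x - mean_z:.2f}")
    print(f"change implied by (5):   {predicted:.2f}")
```

Under these assumptions the selected group improves on average even though nothing was done between the two measurements, and the size of the improvement should be close to the value implied by (5).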
In many biomedical research contexts the inclusion criteria for a study will necessarily require that subjects have sufficiently high (low) values of the variable of interest, e.g., high values of blood pressure, cholesterol level, or gingivitis; or low scores on an achievement or physical stress test. Since such subjects can be expected to change even in the absence of treatment, it is important to correct the observed change for the regression effect in order to assess properly the portion of the change which may be attributed to the treatment. In this paper we describe, illustrate and make available a PC program implementing a procedure which explicitly adjusts for that portion of the observed change attributable to regression to the mean. The procedure was developed by Mee and Chua [2], and we will follow them in describing a scenario in which its use would be appropriate.

2. A typical application

Mee and Chua [2] considered data taken from [3, p. 476], reproduced in Table 1 below, which were gathered in the following context. High school students in Florida are required to pass a literacy test before they can graduate. Students who do not pass the examination on their first attempt may take a refresher course, and then retake an equivalent test. The problem is to decide whether or not the refresher course was effective in raising the test scores. Notice first that it would not be appropriate in this situation to do a simple paired t-test. This test fails to account for change due to the regression effect and thus produces a biased estimate of the effect of the course. As we have seen, even if nothing is done between the first and second exams, those individuals who have low scores on the first exam will, on the average, show improvement on the second.

Table 1
Literacy test scores before and after taking a remedial course

Student   Before   After   Difference
1         45       49       4
2         52       50      -2
3         63       70       7
4         68       71       3
5         57       53      -4
6         55       61       6
7         60       62       2
8         59       67       8

Example data.
Source: McClave and Dietrich [3].

We let \(Z\) denote a score on the first test, and \(X\) the corresponding score on the second. \(Z\) will be available for all the students in the school; \(X\) will be observed only for those students with values of \(Z < k\), where \(k\) is a suitable cut-off point. Nevertheless, we define \(\mu\) and \(\sigma^2\) to be the common mean and variance of the pre- and post-test scores which refer to the situation in which all the students take both examinations. Similarly, \(\rho\) is the correlation between the pre- and post-scores when all students take both exams. We let \(\mu_Z\) and \(\mu_X\) be the means for the students who actually take both exams; \(\sigma_Z^2\) and \(\sigma_X^2\) are the corresponding variances.

Even assuming that the refresher course is completely ineffective in raising test scores, we do not expect, because of the regression effect, that the differences \(X - Z\) will be zero. In fact, the expected difference between the means is

\[ \mu_X - \mu_Z = \mu + \rho(\mu_Z - \mu) - \mu_Z \tag{6} \]

as was noted in [2]. This will be zero only if \(\rho = 1\). The property of conditional expectations referred to in [2], necessary to show \(\mu_X = \mu + \rho(\mu_Z - \mu)\), and hence (6), is that

\[ \mu_X = E(X \mid Z < k) = E[E(X \mid Z) \mid Z < k] = E[\mu + \rho(Z - \mu) \mid Z < k] = \mu + \rho(\mu_Z - \mu). \]

We see, then, that even in the absence of any intervention (the school is in dynamic equilibrium), students selected using the criterion \(Z < k\) may be expected to change (improve). The amount of change expected is given by (6). It is this amount of change which must be subtracted from the observed change to obtain the change attributable to the intervention. The method of accomplishing this developed in [2] is outlined next.

3. Mee and Chua's procedure

Mee and Chua consider their method to be an alternative to the paired t-test which more properly accounts for the regression effect described above. It is based on regressing \(X\) on \(Z - \mu\), taking \(\mu\), the common value of the means of \(Z\) and \(X\) under dynamic equilibrium, as known.
This will be a reasonable assumption when, in the context of our example, the school is large, for then one can use this large sample to estimate the mean of \(Z\).

The null model may be written, from (4), as

\[ H_0: X = \mu + \rho(Z - \mu) + \epsilon \tag{7} \]

and the alternative is

\[ H_A: X = \beta_0 + \rho(Z - \mu) + \epsilon \tag{8} \]

with \(\beta_0 > \mu\). This form of the hypothesis (with \(\beta_0 > \mu\)) is for situations in which the treatment or intervention is intended to raise the scores of treated subjects. We continue to consider this case in this section. We show the (slight) changes necessary to deal with the opposite scenario (\(\beta_0 < \mu\)) later. In (7) and (8), \(\epsilon\) is a random error (residual) with \(E(\epsilon) = 0\) and \(\mathrm{Var}(\epsilon) = \sigma^2\).

We define \(\tilde{Z}_i = Z_i - \mu\) and compute

\[ \bar{\tilde{Z}} = \frac{1}{N} \sum_i \tilde{Z}_i, \qquad \bar{X} = \frac{1}{N} \sum_i X_i, \]

\[ \hat{\beta}_1 = \frac{\sum_i (\tilde{Z}_i - \bar{\tilde{Z}})(X_i - \bar{X})}{\sum_i (\tilde{Z}_i - \bar{\tilde{Z}})^2}, \qquad \hat{\beta}_0 = \bar{X} - \hat{\beta}_1 \bar{\tilde{Z}}, \]

and

\[ s^2 = \frac{1}{N - 2} \sum_i (X_i - \hat{X}_i)^2. \]

The fitted model \(\hat{X}_i = \hat{\beta}_0 + \hat{\beta}_1 (Z_i - \mu)\) is the least squares regression line, and \(s^2\) is the mean squared error of the regression (an estimate of \(\sigma^2\)). Then the test of \(H_0\) vs. \(H_A\) is based on

\[ t = \frac{\hat{\beta}_0 - \mu}{s \sqrt{\dfrac{1}{N} + \dfrac{\bar{\tilde{Z}}^2}{\sum_i (\tilde{Z}_i - \bar{\tilde{Z}})^2}}} \tag{9} \]

where \(t\) has \(N - 2\) degrees of freedom. Note that the test may be rephrased as one of \(\beta_0 = \mu\) vs. \(\beta_0 > \mu\), so that a one-sided P-value is appropriate and is printed in our program. We also compute and print the \((1 - \alpha) \times 100\%\) one-sided confidence interval for \(\beta_0 - \mu\). The user selects the value of \(1 - \alpha\) (e.g., 0.95) to be used. The confidence interval is given by

\[ \beta_0 - \mu \geq (\hat{\beta}_0 - \mu) - t_{1-\alpha}(N - 2) \, s \sqrt{\frac{1}{N} + \frac{\bar{\tilde{Z}}^2}{\sum_i (\tilde{Z}_i - \bar{\tilde{Z}})^2}} \tag{10} \]

where \(t_{1-\alpha}(N - 2)\) is the \(100(1 - \alpha)\)th percentile of the t distribution with \(N - 2\) degrees of freedom. The right-hand side (RHS) of the inequality in (10) represents a lower bound for the difference \(\beta_0 - \mu\). When trying to raise the value of the measurement, we expect \(\beta_0 - \mu \geq 0\) and the RHS is then the smallest increase consistent with the data. If this is positive, we may conclude, with confidence \(1 - \alpha\), that the TX (treatment or intervention) effect has in fact raised the mean score, and this increase is at least RHS.
If this is negative, this indicates that a negative increase (a decrease) is consistent with the data, the interval contains zero, and the TX effect is non-significant.

For the example data in Table 1, the fitted regression function is \(\hat{X} = 79.959 + 1.1112(Z - \mu)\) with \(s^2 = 20.296\). Mee and Chua took \(\mu = 75\) as the true value of \(\mu\) and showed that this leads to \(t = 1.08\), and the corresponding (one-sided) P-value is 0.16. The simple paired t-test, on the other hand, gives \(t = 2.0\) with \(P = 0.04\), which will be judged by most as indicating a significant effect. Thus while the 'total change' may be significant, when the amount of change due to the regression effect is subtracted, leaving that portion of the change which can be attributed to the intervention, the adjusted change need not be. The one-sided 95% confidence interval for \(\beta_0 - \mu\) has the lower bound -3.9450. It answers the question, 'How bad might the treatment be?' In our example, we are 95% confident that the refresher course will not lower the postmeasure by more than 3.945 points. The fact that this change can be negative is consistent with the result of the test based on (9), which showed no significant treatment effect.

4. The program

The program is invoked by issuing the command 'gsruni meechua'. The user is first asked for the location and name of the (ASCII or GAUSS) file containing the values of the Z/X (pre/post) measurements. The data set should be rectangular with N rows and 2 columns: the premeasures in column 1, followed by the postmeasures. The user is then asked to enter the value of \(\mu\) (MU), and the level of confidence (e.g., 0.95) to be employed in constructing the confidence interval for \(\beta_0 - \mu\). Finally, the user indicates whether the TX is expected to INCREASE or DECREASE the mean score for treated subjects.
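The computations that the program carries out on these inputs can be sketched in a few lines of Python. This is an illustrative sketch only, not the GAUSS code distributed with the program: the function names `mee_chua` and `t_upper_tail` are hypothetical, the critical value of the t distribution is assumed to be supplied by the user from tables (e.g., 1.943 for 6 degrees of freedom and 95% confidence), and the P-value is obtained by a simple numerical integration of the t density rather than by a library routine.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson integration of the density from 0 to t."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def dens(u):
        return const * (1.0 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps  # signed step, so negative t is handled as well
    total = dens(0.0) + dens(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    return 0.5 - total * h / 3.0


def mee_chua(pre, post, mu, t_crit, increase=True):
    """Regression-adjusted test of beta0 = mu, following (9)-(11).

    pre, post : premeasures (Z) and postmeasures (X)
    mu        : assumed known common mean under dynamic equilibrium
    t_crit    : table value of the t percentile for N - 2 degrees of freedom
    increase  : True if the treatment is meant to raise scores, else False
    """
    n = len(pre)
    zw = [z - mu for z in pre]                       # shifted premeasures (ZW)
    zw_bar = sum(zw) / n
    x_bar = sum(post) / n
    szz = sum((z - zw_bar) ** 2 for z in zw)
    szx = sum((z - zw_bar) * (x - x_bar) for z, x in zip(zw, post))
    beta1 = szx / szz
    beta0 = x_bar - beta1 * zw_bar
    sse = sum((x - beta0 - beta1 * z) ** 2 for z, x in zip(zw, post))
    mse = sse / (n - 2)                              # s^2
    se = math.sqrt(mse * (1.0 / n + zw_bar ** 2 / szz))  # denominator of (9)
    t = (beta0 - mu) / se
    upper = t_upper_tail(t, n - 2)
    if increase:
        p_value, bound = upper, (beta0 - mu) - t_crit * se        # lower bound, (10)
    else:
        p_value, bound = 1.0 - upper, (beta0 - mu) + t_crit * se  # upper bound, (11)
    return {"beta0": beta0, "beta1": beta1, "mse": mse, "se": se,
            "t": t, "p_value": p_value, "bound": bound}


if __name__ == "__main__":
    pre = [45, 52, 63, 68, 57, 55, 60, 59]    # Table 1, Before
    post = [49, 50, 70, 71, 53, 61, 62, 67]   # Table 1, After
    for name, value in mee_chua(pre, post, mu=75.0, t_crit=1.943).items():
        print(f"{name:8s} = {value:.4f}")
```

Applied to the data of Table 1 with \(\mu = 75\), the sketch should give estimates close to those reported for the example; the confidence bound depends on how the critical value is rounded, so small differences in the later decimals are to be expected.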
The output includes plots of the 'growth profiles' (pre- and post-measurements connected by straight lines) for each of the N individuals; descriptive statistics (means, standard deviations and correlation coefficients) for \(Z\), \(X\), \(D = X - Z\) and \(ZW = \tilde{Z}\); the regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\); the value of \(R^2\) and the mean square error for the regression of \(X\) on \(Z\); the P-value for the test of a significant treatment effect; and the confidence interval for \(\beta_0 - \mu\). For the example data, taking \(\mu = 75\), the numerical form of the output is as follows:

    DESCRIPTIVE STATISTICS
    ZBAR = 57.375, SDZ = 6.9885, VZ = 48.8393
    XBAR = 60.375, SDX = 8.8146, VX = 77.6964
    DBAR = 3.000, SDD = 4.2426, VD = 18.000
    ZWBAR = -17.625, SDZW = 6.989, VZW = 48.839
    R(ZX) = 0.8810
    R(ZD) = 0.1831
    R(XZW) = 0.8810
    RSQ(XZW) = 0.776

In the above, ZW denotes \(\tilde{Z}\). The value of RSQ(XZW) (the square of R(XZW)) measures the strength of the linear relationship between ZW and X, so is useful in judging the goodness-of-fit of the assumed model. At this point in the program we also produce a scatterplot of X vs. ZW to facilitate this judgement. The descriptive statistics and plot are followed by

    ESTIMATES OF PARAMETERS
    BETA0 = 79.9590
    BETA1 = 1.1112
    MSE = 20.2960
    SE = 4.5803

MSE is the mean squared error of the regression of X on ZW (\(s^2\)). SE is the standard error of \(\hat{\beta}_0 - \mu\) as defined by the denominator of expression (9). Following this we plot the fitted regression of X on ZW superimposed on the previous scatterplot and then print

    TEST AND CONFIDENCE INTERVAL
    OBSERVED VALUE OF t = 1.0827
    CORRESPONDING ONE-SIDED P-VALUE = 0.1603
    THE LOWER BOUND OF THE 95% ONE-SIDED CONFIDENCE INTERVAL FOR BETA0 - MU IS = -3.9450.
    YOU ARE 95% CONFIDENT THAT BETA0 - MU IS GREATER THAN -3.9450.

Hardware requirements and information concerning the availability of this program, along with other programs for longitudinal data analysis, are given in the Appendix.

5. Discussion

The program described above can be used to adjust or correct simple change scores for the regression effect, and provides a valid test of the null hypothesis that the treatment has had no effect. In order to see more clearly the relationship between the test and the regression effect, it may be helpful to note that \(H_0: X = \mu + \rho(Z - \mu) + \epsilon\) implies that \(E(X) = \mu_X = \mu + \rho(\mu_Z - \mu)\) which, subtracting \(\mu_Z\) from both sides, is (6), the expected difference between \(\mu_X\) and \(\mu_Z\) when only the regression effect is present (dynamic equilibrium). Similarly, \(H_A: X = \beta_0 + \rho(Z - \mu) + \epsilon\) with \(\beta_0 > \mu\) implies \(E(X) = \mu_X = \beta_0 + \rho(\mu_Z - \mu)\), which in turn implies that \(\mu_X - \mu_Z > \mu + \rho(\mu_Z - \mu) - \mu_Z\), i.e., that the expected change exceeds the regression effect. Thus, \(H_0\) and \(H_A\), previously stated in terms of regression equations, have direct interpretations in terms of the regression effect: \(H_0\) implies that the expected change is due solely to the regression effect; \(H_A\) that the expected change exceeds this amount.

Note also that \(H_A\) as specified in (8) models an additive treatment effect, i.e., the effect is \(\beta_0 - \mu\) no matter what the initial value, \(Z\). This is clear from comparison of (7) and (8): the difference in the expected responses under \(H_A\) and \(H_0\) is \(\beta_0 - \mu\), since the \(\rho(Z - \mu)\) terms cancel out. In certain situations, multiplicative effects, where the effect of the treatment is allowed to depend on the value of \(Z\) (so that, e.g., those with the lower scores will show the most improvement), may be more appropriate. Methods which can be used in these situations are described in [4,5]. For a good discussion of the differences between these two types of effects, see [6, p. 68].

Further comparison of (7) and (8), and the method used to obtain the regression of \(X\) on \(\tilde{Z}\), prompts another remark which may provide some insight into the structure of the technique. \(\hat{\beta}_1\) estimates \(\beta_1\), the population regression coefficient of \(X\) on \(Z\). This may or may not equal \(\rho\).
It will only if \(\sigma_Z = \sigma_X\) (cf. (3)), and this may not be true if the treatment has had an effect. Mee and Chua [2] suggest that their test will be most sensitive to a treatment effect when \(\beta_1 > \rho\); it may be less effective if \(\beta_1 < \rho\). In our example, \(\hat{\beta}_1 = 1.111\) necessarily exceeds \(\rho\) (since \(|\rho| \leq 1\)), so the test may be presumed to be 'sensitive.'

The reader will have noted that our discussion and example have focused on the situation in which low scores were selected and the treatment was intended to increase the value of the response variable. Our program works equally well when the opposite scenario obtains, i.e., when high values (e.g., blood pressure) are selected and the aim of the intervention is to decrease the value of the measurement. In this case, the user will indicate that the TX is intended to DECREASE the scores of treated subjects. The one-sided confidence interval in this case is given by

\[ \beta_0 - \mu \leq (\hat{\beta}_0 - \mu) + t_{1-\alpha}(N - 2) \, s \sqrt{\frac{1}{N} + \frac{\bar{\tilde{Z}}^2}{\sum_i (\tilde{Z}_i - \bar{\tilde{Z}})^2}} \tag{11} \]

and the RHS is an upper bound for \(\beta_0 - \mu\), the expected change in the mean scores for the selected individuals. Should this be positive, the data do not rule out an increase, and the TX will be judged to be non-significant. If negative, the TX has had a significant effect and the RHS represents the smallest decrease compatible with the data, i.e., the reduction will be no smaller in magnitude than the RHS.

Finally, we might remark that while the regression phenomenon has been recognized for a long time, it continues to be overlooked in many situations like the one considered in this paper. For a number of additional examples in published research, see [7]. It is hoped that the present discussion, and the availability of software to carry out the associated computations, will have a positive impact on this situation. On the other hand, it needs to be recognized that purely statistical adjustments can never represent a completely adequate substitute for the concurrent study of a control group [8, p. 193].
Such adjustments, including the one described in this paper, should be used only when the use of a control group is not feasible.

Acknowledgement

Supported by DE08730 from the National Institute of Dental Research.

Appendix

A set of PC programs performing this and related procedures can be obtained on high density 5.25" or 3.5" diskettes (please request type) by sending $25 to defray the cost of handling and license fees. These programs require an 80386- or 80486-based personal computer (PC) running the MS-DOS operating system (version 5.0 or higher is recommended, although versions as low as 3.3 will suffice). 80386 computers must also be equipped with an 80387 math coprocessor. At least 4 MB of memory are required, and must be available to GAUSS386i, i.e., not in use by memory resident programs such as Windows. EGA or VGA graphic capabilities are required to display the color graphics; VGA or SVGA is suggested to display the graphic results optimally. Runtime modules are supplied with the programs so that no additional software (i.e., compiler or interpreter) is required to run these programs. One can create and edit ASCII data sets for use by these programs using the full screen editor supplied with MS-DOS version 5.0. The programs are written and compiled using GAUSS386i, version 3.0, require no additional installation or modification, and are run with a single command. When requesting the programs, address inquiries to the corresponding author and make checks payable to Baylor College of Dentistry.

References

[1] Lord FM: Elementary models for measuring change. In: Problems in Measuring Change (Ed: CW Harris), Univ. Wisconsin Press, Madison, 1963.
[2] Mee RW and Chua TC: Regression toward the mean and the paired sample t test, Am Stat, 45 (1991) 39-42.
[3] McClave JT and Dietrich FH: Statistics, 4th ed., Dellen, San Francisco, 1988.
[4] James KE: Regression toward the mean in uncontrolled clinical studies, Biometrics, 29 (1973) 121-130.
[5] Senn SJ and Brown RA: Estimating treatment effects in clinical trials subject to regression to the mean, Biometrics, 41 (1985) 555-560.
[6] Hills M: Statistics for Comparative Studies, Chapman and Hall, London, 1974.
[7] Nesselroade J, Stigler S and Baltes P: Regression towards the mean and the study of change, Psychol Bull, 88 (1980) 622-637.
[8] Fleiss JL: The Design and Analysis of Clinical Experiments, Wiley, New York, 1986.
