Correlation and Simple Linear Regression: Statistical Concepts Series
Radiology
Figure 1. Scatterplots of four sets of data generated by means of the following Pearson correlation coefficients (from left to right): r = 0 (uncorrelated data), r = 0.8 (strongly positively correlated), r = 1.0 (perfectly positively correlated), and r = −1 (perfectly negatively correlated).
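Data like those shown in Figure 1 can be simulated with the standard construction y = ρx + √(1 − ρ²)·z for independent standard normal x and z, which yields a population correlation of exactly ρ. A minimal sketch (the function name `correlated_pairs` is illustrative, not from the article):

```python
import math
import random

def correlated_pairs(rho, n=1000, seed=0):
    """Generate n (x, y) pairs whose population correlation is rho,
    using y = rho*x + sqrt(1 - rho**2)*z with x, z standard normal."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho * rho) * z
        pairs.append((x, y))
    return pairs
```

With rho = 0.8 the sample correlation of a few thousand pairs lands close to 0.8; with rho = 1 or −1 the points fall exactly on a line, as in the rightmost panels of Figure 1.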
Regression

We first conducted a simple linear regression analysis of the data on a log scale (n = 19); results are shown in Table 3. The value calculated for R² was 0.73, which suggests that 73% of the variability of the data could be explained by the linear regression.

The regression line, expressed in the form given in Equation (1), is Y = −9.28 + 2.83X, where the predictor variable X represents the log of total time and the outcome variable Y represents the log of dose. The estimated regression parameters are a = −9.28 (intercept) and b = 2.83 (slope) (Fig 4). This regression line can be interpreted as follows: at X = 0, the value of Y is −9.28, and for every one-unit increase in X, the value of Y will increase on average by 2.83. The effects of both the intercept and the slope are statistically significant (P < .005) (Excel; Microsoft, Redmond, Wash); therefore, the null hypothesis (H0: the dose remains constant as the total procedure time increases) is rejected. Thus, we confirm the alternative hypothesis (H1: the dose increases with the total procedure time).

The regression line may be used to give predicted values of Y. For example, if in a future CT fluoroscopy procedure the log total time is specified at x = 4 (translated to e^4 = 55 minutes, approximately), then the log dose that is to be applied is approximately y = −9.28 + 2.83 × 4 = 2.04 (translated to e^2.04 = 7.69 rad). On the other hand, if the log total time is specified at x = 4.5 (translated to e^4.5 = 90 minutes, approximately), then the log dose that is to be applied is approximately y = −9.28 + 2.83 × 4.5 = 3.46 (translated to e^3.46 = 31.82 rad). Such prediction can be useful for future clinical practice.

Figure 3. Scatterplot of the log of dose (y axis) versus the log of total time (x axis). Each point in the scatterplot represents the values of two variables for a given observation.

Figure 4. Scatterplot of the log of dose (y axis) versus the log of total time (x axis). The regression line has the intercept a = −9.28 and slope b = 2.83. We conclude that there is a possible association between the radiation dose and the total time of the procedure.

TABLE 3
Results Based on Correlation and Regression Analysis for Example Data

Regression Statistic          Numerical Result
Correlation coefficient r     0.85
R-square (R²)                 0.73
Regression parameter
  Intercept                   −9.28
  Slope                       2.83

Source.—Reference 11.

SUMMARY AND REMARKS

Two important statistical concepts, correlation and regression, which are used commonly in radiology research, are reviewed and demonstrated herein. Addi-
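The fitting and prediction steps described in the Regression section can be sketched in a few lines of Python. The coefficients below are the values reported in the article; the helper names `fit_line` and `predict_dose` are illustrative, not from the article:

```python
import math

def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx       # slope
    a = my - b * mx     # intercept
    return a, b

# Regression parameters reported in the article for the log-log data
A, B = -9.28, 2.83

def predict_dose(total_time_min):
    """Predicted dose (rad) for a total procedure time in minutes,
    obtained by applying the fitted line on the log-log scale."""
    log_dose = A + B * math.log(total_time_min)
    return math.exp(log_dose)
```

For a log total time of 4 (about 55 minutes), `predict_dose` reproduces the worked example in the text: a log dose of about 2.04, that is, roughly 7.7 rad.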
The Pearson correlation coefficient is computed as

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}},

where X̄ and Ȳ are the sample means of the Xi and Yi values, respectively.

The Pearson correlation coefficient may be computed by means of a computer-based statistics program (Excel; Microsoft) by using the option "Correlation" under the option "Data Analysis Tools". Alternatively, it may be computed by means of a built-in software function "Cor" (Insightful; MathSoft, Seattle, Wash [MathSoft S-Plus 4 guide to statistics, 1997; 89–96]. Available at: www.insightful.com) or with free software.

The Spearman ρ may also be computed by first reducing the continuous data to their marginal ranks by using the "rank and percentile" option with Data Analysis Tools (Excel; Microsoft), the "rank" function (Insightful; MathSoft), or free software. Both software programs correctly rank the data in ascending order. However, the rank and percentile option in Excel ranks the data in descending order (the largest is 1). Therefore, to compute the correct ranks, one may first multiply all of the data by −1 and then apply the rank function. Excel also gives integer ranks in the presence of ties, in contrast with methods that yield possible noninteger ranks, as described in the standard statistics literature (19). Subsequently, the sample correlation coefficient is computed on the basis of the ranks.

Correlation coefficient.—A statistic between −1 and 1 that measures the association between two variables.
Intercept.—The constant a in the regression equation, which is the value for y when x = 0.
Least squares method.—The regression line that is the best fit to the data, for which the sum of the squared residuals is minimized.
Outlier.—An extreme observation far away from the bulk of the data, often caused by faulty measuring equipment or recording error.
Pearson correlation coefficient.—Sample correlation coefficient for measuring the linear relationship between two variables.
R².—The square of the Pearson correlation coefficient r, which is the fraction of the variability in Y that can be explained by the linear regression.
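The Pearson formula above, and the rank-based Spearman variant using average ("midrank") values for ties — the noninteger ranks mentioned in the text — can be sketched in plain Python (function names are illustrative, not from the article):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from the formula."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def midranks(values):
    """Ascending ranks; tied values get the average (possibly noninteger) rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rho: the Pearson coefficient applied to the marginal ranks."""
    return pearson_r(midranks(xs), midranks(ys))
```

Note that `spearman_rho` returns 1 for any strictly increasing (not necessarily linear) relationship, whereas `pearson_r` measures only the linear component — the distinction drawn in the glossary entries below.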
Residual.—The difference between the observed values of the outcome variable and the fitted values based on a linear regression analysis.
Scatterplot.—A plot of the observed bivariate outcome variable (y axis) against its predictor variable (x axis), with a dot for each pair of bivariate observations.
Simple linear regression analysis.—A linear regression analysis with one predictor and one outcome variable.
Skewed data.—A distribution is skewed if there are more extreme data on one side of the mean. Otherwise, the distribution is symmetric.
Slope.—The constant b in the regression equation, which is the change in y that corresponds to a one-unit increase (or decrease) in x.
Spearman ρ.—A rank correlation coefficient for measuring the monotone relationship between two variables.

Acknowledgments: We thank Kimberly E. Applegate, MD, MS, and Philip E. Crewson, PhD, co-editors of this Statistical Concepts Series in Radiology, for their constructive comments on earlier versions of this article.

References
1. Krzanowski WJ. Principles of multivariate analysis: a user's perspective. Oxford, England: Clarendon, 1988; 405–432.
2. Rodriguez RN. Correlation. In: Kotz S, Johnson NL, eds. Encyclopedia of statistical …
… Prentice Hall, 1985; 72–98.
5. Freund JE. Mathematical statistics. 5th ed. Upper Saddle River, NJ: Prentice Hall, 1992; 494–546.
6. Spearman C. The proof and measurement of association between two things. Am J Psychol 1904; 15:72–101.
7. Fieller EC, Hartley HO, Pearson ES. Tests for rank correlation coefficients. I. Biometrika 1957; 44:470–481.
8. Fieller EC, Pearson ES. Tests for rank correlation coefficients. II. Biometrika 1961; 48:29–40.
9. Kruskal WH. Ordinal measurement of association. J Am Stat Assoc 1958; 53:814–861.
10. David FN, Mallows CL. The variance of Spearman's rho in normal samples. Biometrika 1961; 48:19–28.
11. Silverman SG, Tuncali K, Adams DF, Nawfel RD, Zou KH, Judy PF. CT fluoroscopy-guided abdominal interventions: techniques, results, and radiation exposure. Radiology 1999; 212:673–681.
12. Daniel WW. Biostatistics: a foundation for analysis in the health sciences. 7th ed. New York, NY: Wiley, 1999.
13. Altman DG. Practical statistics for medical research. Boca Raton, Fla: CRC, 1990.
14. Neter J, Wasserman W, Kutner MH. Applied linear models: regression, analysis of variance, and experimental designs. 3rd ed. Homewood, Ill: Irwin, 1990; 38–44, 62–104.
15. Galton F. Typical laws of heredity. Proc R Inst Great Britain 1877; 8:282–301.
16. Galton F. Correlations and their measurements, chiefly from anthropometric data. Proc R Soc London 1888; 45:219–247.
17. Pearson K. Mathematical contributions to the theory of evolution. III. Regression, …
… based on ranks. Malabar, Fla: Krieger, 1991; 200–205.
20. Kendall M, Gibbons JD. Rank correlation methods. 5th ed. New York, NY: Oxford University Press, 1990; 8–10.
21. Zou KH, Hall WJ. On estimating a transformation correlation coefficient. J Appl Stat 2002; 29:745–760.
22. Fisher RA. Frequency distributions of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 1915; 10:507–521.
23. Duncan OD. Path analysis: sociological examples. In: Blalock HM Jr, ed. Causal models in the social sciences. Chicago, Ill: Alpine-Atherton, 1971; 115–138.
24. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Ed Psych 1974; 66:688–701.
25. Holland P. Statistics and causal inference. J Am Stat Assoc 1986; 81:945–970.
26. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996; 91:444–455.
27. Seber GAF. Linear regression analysis. New York, NY: Wiley, 1997; 48–51.
28. Carroll RJ, Ruppert D. Transformation and weighting in regression. New York, NY: Chapman & Hall, 1988; 2–61.
29. Box GEP, Cox DR. An analysis of transformation. J R Stat Soc Series B 1964; 42:71–78.
30. Tello R, Crewson PE. Hypothesis testing II: means. Radiology 2003; 227:1–4.
31. Mudholkar GS, McDermott M, Srivastava DK. A test of p-variate normality. Biometrika 1992; 79:850–854.