Correlation and Simple Linear Regression: Statistical Concepts Series
Figure 1. Scatterplots of four sets of data generated by means of the following Pearson correlation coefficients (from left to right): r = 0 (uncorrelated data), r = 0.8 (strongly positively correlated), r = 1.0 (perfectly positively correlated), and r = −1 (perfectly negatively correlated).
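The limiting cases shown in Figure 1 can be checked numerically. The following Python sketch is an illustration only and is not part of the original analysis: it computes the sample Pearson correlation coefficient directly from its definition, that is, the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the small data sets are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) data, not the data plotted in Figure 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2.0 * v + 1.0 for v in x])    # exact increasing line
r_neg = pearson_r(x, [-3.0 * v for v in x])         # exact decreasing line
r_zero = pearson_r(x, [1.0, -1.0, 0.0, -1.0, 1.0])  # no linear trend

print(r_pos, r_neg, r_zero)
```

Any exact straight-line relationship gives r = 1 or r = −1 regardless of the steepness of the line, because r measures the strength of the linear association and not the size of the slope.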
We first conducted a simple linear regression analysis of the data on a log scale (n = 19); results are shown in Table 3. The value calculated for R² was 0.73, which suggests that 73% of the variability of the data could be explained by the linear regression.

The regression line, expressed in the form given in Equation (1), is Y = −9.28 + 2.83X, where the predictor variable X represents the log of total time, and the outcome variable Y represents the log of dose. The estimated regression parameters are a = −9.28 (intercept) and b = 2.83 (slope) (Fig 4). This regression line can be interpreted as follows: At X = 0, the value of Y is −9.28. For every one-unit increase in X, the value of Y will increase on average by 2.83. Effects of both the intercept and slope are statistically significant (P < .005) (Excel; Microsoft, Redmond, Wash); therefore, the null hypothesis (H0, the dose remains constant as the total procedure time increases) is rejected. Thus, we confirm the alternative hypothesis (H1, the dose increases with the total procedure time).

The regression line may be used to give predicted values of Y. For example, if in a future CT fluoroscopy procedure, the log total time is specified at x = 4 (translated to e^4 = 55 minutes, approximately), then the log dose that is to be applied is approximately y = −9.28 + 2.83 × 4 = 2.04 (translated to e^2.04 = 7.69 rad). On the other hand, if the log total time is specified at x = 4.5 (translated to e^4.5 = 90 minutes, approximately), then the log dose that is to be applied is approximately y = −9.28 + 2.83 × 4.5 = 3.46 (translated to e^3.46 = 31.82 rad). Such prediction can be useful for future clinical practice.

SUMMARY AND REMARKS

Two important statistical concepts, correlation and regression, which are used commonly in radiology research, are reviewed and demonstrated herein. Addi-

Figure 3. Scatterplot of the log of dose (y axis) versus the log of total time (x axis). Each point in the scatterplot represents the values of two variables for a given observation.

Figure 4. Scatterplot of the log of dose (y axis) versus the log of total time (x axis). The regression line has the intercept a = −9.28 and slope b = 2.83. We conclude that there is a possible association between the radiation dose and the total time of the procedure.

Results Based on Correlation and Regression Analysis for Example Data

Regression Statistic          Numerical Result
Correlation coefficient r     0.85
R-square (R²)                 0.73
Regression parameter
  Intercept                   −9.28
  Slope                       2.83

Source: Reference 11.
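The worked predictions from the fitted line Y = −9.28 + 2.83X can be reproduced with a few lines of Python. This is a minimal sketch for illustration, not the software used in the original analysis: the intercept and slope are the reported estimates, whereas the function names predict_log_dose and predict_dose_rad are assumptions introduced here. Natural logarithms are assumed, consistent with the back-transformation by powers of e used in the text.

```python
import math

INTERCEPT = -9.28  # a, reported estimate of the intercept
SLOPE = 2.83       # b, reported estimate of the slope


def predict_log_dose(log_time):
    """Predicted log of dose for a given log of total time (Y = a + bX)."""
    return INTERCEPT + SLOPE * log_time


def predict_dose_rad(minutes):
    """Predicted dose in rad for a total time given in minutes."""
    return math.exp(predict_log_dose(math.log(minutes)))


for log_time in (4.0, 4.5):
    log_dose = predict_log_dose(log_time)
    print(
        f"log time {log_time}: about {math.exp(log_time):.0f} minutes, "
        f"log dose {log_dose:.3f}, dose about {math.exp(log_dose):.2f} rad"
    )
```

For x = 4.5 the unrounded log dose is 3.455, and exponentiating that value gives about 31.7 rad; the value of 31.82 rad arises when the log dose is first rounded to 3.46. Because the model is linear on the log scale, it corresponds on the original scale to dose = e^(−9.28) × (total time)^2.83, which is what predict_dose_rad evaluates.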
\[
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}},
\]

where \bar{X} and \bar{Y} are the sample means of the X_i and Y_i values, respectively.

The Pearson correlation coefficient may be computed by means of a computer-based statistics program (Excel; Microsoft) by using the option “Correlation” under the option “Data Analysis Tools”. Alternatively, it may also be computed by means of a built-in software function “Cor” (Insightful; MathSoft, Seattle, Wash [MathSoft S-Plus 4 guide to statistics, 1997; 89–96]. Available at: or with a free soft-

The Spearman correlation coefficient may also be computed by first reducing the continuous data to their marginal ranks by using the “rank and percentile” option with Data Analysis Tools (Excel; Microsoft) or the “rank” function (Insightful; MathSoft) or the free software. Both software programs correctly rank the data in ascending order. However, the rank and percentile option in Excel ranks the data in descending order (the largest is 1). Therefore, to compute the correct ranks, one may first multiply all of the data by −1 and then apply the rank function. Excel also gives integer ranks in the presence of ties compared with the methods that yield possible noninteger ranks, as described in the standard statistics literature (19). Subsequently, the sample correlation coefficient is computed on the basis of the

Correlation coefficient: A statistic between −1 and 1 that measures the association between two variables.
Intercept: The constant a in the regression equation, which is the value for y when x = 0.
Least squares method: The regression line that is the best fit to the data for which the sum of the squared residuals is minimized.
Outlier: An extreme observation far away from the bulk of the data, often caused by faulty measuring equipment or recording error.
Pearson correlation coefficient: Sample correlation coefficient for measuring the linear relationship between two variables.
R²: The square of the Pearson correlation coefficient r, which is the fraction of the variability in Y that can be explained by
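The rank-based computation described for the Spearman coefficient can be sketched in Python. This sketch is an illustration under stated assumptions and does not reproduce the behavior of Excel or S-Plus: average_ranks assigns ascending ranks with tied values sharing their mean rank (which may be a noninteger), descending_ranks is a hypothetical stand-in for a tool that gives rank 1 to the largest value, and the data are made up.

```python
import math


def average_ranks(values):
    """Ascending ranks (smallest value gets 1); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def descending_ranks(values):
    """Hypothetical tool behavior: the largest value receives rank 1."""
    n = len(values)
    return [n + 1 - r for r in average_ranks(values)]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman coefficient: Pearson r applied to the ascending ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up data with one tie in the first variable.
times = [10.0, 20.0, 20.0, 40.0]
doses = [1.0, 3.0, 2.0, 9.0]

ascending = average_ranks(times)
descending = descending_ranks(times)
# Multiplying the data by -1 before a descending ranking restores ascending ranks.
restored = descending_ranks([-t for t in times])
rho = spearman_rho(times, doses)

print(ascending, descending, restored, rho)
```

If both variables are ranked in the same direction, the resulting coefficient is the same whether the ranks ascend or descend; the sign is reversed only when one variable is ranked in ascending order and the other in descending order.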
