0% found this document useful (0 votes)

17 views

Biostat Lecture 10

The document discusses linear regression and correlation methods. It covers topics like linear relationships between variables, fitting regression lines using least squares, inferences about regression line parameters, goodness of fit measures like R-squared, and hypothesis testing using F-tests and t-tests.

Uploaded by

Makadiewo ba

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views

Biostat Lecture 10

Uploaded by

Makadiewo ba

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 47

BIO5312 Biostatistics

Lecture 10
10::Regression and Correlation Methods

Dr. Junchao Xia

Center of Biophysics and Computational Biology

Fall 2016

11/1/2016 1/46
Outline

In this lecture , we will discuss topics in Chapter 11:

Methods of regression and correlation analysis in which two

different variables in the same sample are related.

Multiple-regression analysis, where the relationship between

more than two variables at a time is considered.

Linear regression methods where we learn how to relate a

normally distributed outcome variable y to one or more predictor
variables x1, …, xk where the x’s may be either continuous or
categorical variables.
More details about theoretical derivation can be found on the book,
Mathematical Statistics and Data Analysis, by John A. Rice,
ISBN-13: 978-81-315-0587-8
11/1/2016 2/46
General Concepts
Linear relationship between y and x: E(y|x) =  + x
The line y =  + x is the regression line,  is the intercept and  is the slope of the line.
y =  + x is not expected to be true for every points. Need add an error term e, which
assumes a normal distribution with mean 0 and variance 2. Then, we have y =  + x + e
For any linear-regression equation of the form y =  + x + e, y is called the dependent
variable and x is called the independent variable because we a retrying to predict y as a
function of x.

11/1/2016 3/46
Linear--Regression Examples
Linear

If the variance 2 equal to 0, then If  is equal to 0, then there is no linear

every point would fall exactly on the relationship between x and y.
regresion line.
The larger 2 is the more scatter
occurs about the regression line.

11/1/2016 4/46
Fitting Regression Lines—The Method of Least Squares
The least-squares line, or estimated regression line, is the line y = a + bx that
minimizes the sum of the squared distances of the sample points from the line
given by .
This method of estimating the parameters of a regression line is known as the
method of least squares.

n
S   ( yi  a  bxi ) 2
i 1

S n
  2( yi  a  bxi )(1)  0
a i 1
S n
  2( yi  a  bxi )( xi )  0
b i 1

11/1/2016 5/46
Sum of Squares and Estimations of the Least-Squares Line
The following notation is needed to define the slope and intercept of a regression line.
 Raw sum of squares for x is defined by
Corrected sum of squares for x is denoted by Lxx =
It represents the sum of squares of the deviations of the xi from the mean.
Raw sum of squares for y is defined by
Corrected sum of squares for y is denoted by Lyy =
Raw sum of cross products is defined by
Corrected sum of cross products is defined by
Lxy = With a short form

The coefficients of the least-squares line y = a + bx are given by

b = Lxy/Lxx and

The predicted or average value of y for a given value of x, as estimated from the
fitted regression line, is denoted by = a + bx. Thus, the point (x, a+bx) is always on
the regression line.
11/1/2016 6/46
Inferences about Parameters from Regression Lines
The point (x, y ) falls on the regression line. This is common to all
estimated regression lines because a regression line can be
represented as y = a + bx = y – bx + bx = y + b(x - x) or y - y = b(x - x)

For any sample point (xi,yi), the residual or residual component, of

that point about the regression line is defined by yi – .
For any sample point (xi,yi), the regression component of that point
about the regression line is defined by – y.
11/1/2016 7/46
Regression Lines with Varying Residual Components

11/1/2016 8/46
Decomposition of the Total Sum of Squares
Total sum of squares or Total SS is the sum of squares of the
deviations of the individual sample points from the sample mean

Regression sum of squares or Reg SS is the sum of squares of the

regression components:

Residual sum of squares or Res SS is the sum of squares of the

residual components:

Decomposition of the total sum of squares into regression and

residual components

or Total SS = Reg SS + Res SS

11/1/2016 9/46
F Test for Simple Linear Regression
Goodness-of-fit is considered as the ratio of the regression sum of
squares to the residual sum of squares. A large ratio indicates a good fit,
whereas a small ratio indicates a poor fit.

The regression mean square, or Reg MS, is the Reg SS divided by the
number of predictor variables (k) in the model (not including the
constant). Thus, Reg MS = Reg SS/k. For simple linear regression, k = 1 and
thus Reg MS = Reg SS. For multiple regression, k is > 1. k is referred to as
the degrees of freedom for the regression sum of squares or Reg df.

The residual mean square, or Res MS, is the ratio of the Res SS divided
by (n – k - 1), or Res MS = Res SS/(n – k - 1). For simple linear regression, k
= 1 and Res MS = Res SS/(n-2). We refer to n – k -1 as the degrees of
freedom for the residual sum of squares, or Res df. Res MS is also
sometimes denoted by s2y.x.
s2x=Lxx/(n-1) s2y=Lyy/(n-1)
s2y.x s2xy = Lxy/(n-1)
11/1/2016 10/46
F Test for Simple Linear Regression
Short Computational Form for Regression and Residual SS
Regression SS = bLxy = b2Lxx = L2xy/Lxx
Residual SS = Total SS – Regression SS = Lyy – L2xy/Lxx

F test for simple linear regression

To test H0: =0 vs. H1:   0, use the following procedure:
1)Compute the test statistic
F = Reg MS/Res MS = (L2xy/Lxx)/[(Lyy – L2xy/Lxx)/(n-2)]
That follows an F1,n-2 distribution under H0.
2) For a two-sided test with significance level , if
F > F1,n-2,1- then reject H0; if F ≤ F1,n-2,1- then accept H0.
3) The exact p-value is given by Pr(F1,n-2 > F).

11/1/2016 11/46
Acceptance and Rejection Regions, ANOVA

11/1/2016 12/46
F Test for Simple Linear Regression: Example

The computation of the p-value for regression F test are summarized in an

analysis of variance (ANOVA) table. Results displayed in the ANOVA table
have been obtained using the MINITAB REGRESSION program.

11/1/2016 13/46
R2 for Measuring Goodness of Fit

A summary measure of goodness of fit frequently referred to in the

literature is R2, which is defined as Reg SS/Total SS. R2 can be thought of as
the proportion of the variance y that is explained by x.
i. if R2 = 1, then all variation in y can be explained by variation in x, and all
data points fall on the regression line.
ii. If R2 = 0, then x gives no information about y and the variance of y is
the same with or without knowing x.
iii. If 0 < R2 < 1, then for a given value of x, the variance of y is lower than it
would be if x were unknown but is still greater than 0.
R2 = Reg SS/Total SS,  Adjusted R2 =1-s2 yx/ s2 y when n is large.
Adjusted R2 is more sensitive when n is small.

11/1/2016 14/46
t Test for Simple Linear Regression
To test the hypothesis H0:  = 0 vs. H1:   0, use the following procedure:
1) Compute the test statistic t = b/(s2y.x/Lxx)1/2
2) For a two-sided test with significance level ,
If t > tn-2,1-/2 or t < tn-2,/2 = -tn-2,1-/2 then reject H0.
If -tn-2,1-/2 ≤ t ≤ tn-2,1-/2 then accept H0.
3) The p-value is given by
p = 2 × (area to the left of t under a
tn-2 distribution) if t < 0
p = 2 × (area to the right of t under a
tn-2 distribution0 if t  0

11/1/2016 15/46
Interval Estimation for Linear Regression
Interval estimates for the parameters of a regression line:
Standard errors and are often computed to determine the precision of
estimates.

Two-sided 100% × (1-) confidence intervals for the parameters of a

regression line
If b and a are the estimated slope and intercept of a regression line, resp.,
and se(b), se(a) are the estimated standard errors, then the two-sided
100% × (1-) confidence intervals for  and  are given by
b  tn-2,1-/2se(b) and a  tn-2,1-/2se(a) , resp.

11/1/2016 16/46
Interval Estimation for Linear Regression
Predictions made from regression lines for Individual Observations
The distribution of observed y values for the subset of individuals with
independent variable x is normal with mean = = a+ bx and standard
deviation given by

Furthermore, 100% × (1-) of the observed values will fall within the
interval  tn-2,1-/2se1( )
This interval is sometimes called a 100% × (1-) prediction interval for y.
Standard error and confidence interval for predictions made from
regression lines for the average value of y for a given x
The best estimate of the average value of y for a given x is = a + bx. Its
standard error, denoted by se2( ) , is given by

Furthermore, a two-sided 100% × (1-) confidence interval for the

average value of y is  tn-2,1-/2se2( )
11/1/2016 17/46
Assessing the Goodness of Fit of Regression Lines
Assumptions made in linear-regression models
1. For any given value of x, the corresponding value of y has an average value  +
x, which is a linear function of x.
2. For any given value of x, the corresponding value of y is normally distributed
about  + x with the same variance 2 for any x.
3. For any two data points (x1,y1), (x2,y2) the error terms e1,e2 are independent of
each other.
These assumptions may be tested by using several different kinds of plots. The
simplest being the x-y scatter plot. Here, we plot the dependent variable y vs. the
independent variable x and superimpose the regression line y = a + bx on the
same plot.
Standard deviation of residuals about the fitted regression line
Let (xi,yi) be a sample point used in estimating the regression line, y =  + x.
If y = a + bx is the estimated regression line and = residual for the point (xi,yi)
about the estimated regression line, then = yi – (a + bxi) and

The Studentized residual corresponding to the point (xi,yi) is

11/1/2016 18/46
Plot of Studentized Residuals (Example)

11/1/2016 19/46
Outliers and Influential Points
One commonly used strategy that can be used if unequal residual variances are
present is to transform the dependent variable (y) to a different scale. This type of
transformation is called a variance-stabilizing transformation.
The most common transformations when the residual variance is an increasing function
of x are either the ln or square-root transformations.
The square-root transformation is useful when the residual variance is proportional to
the average value of y. The log transformation is useful when the residual variance is
proportional to the square of the average values.
Sometimes, data may be retained in the original scale but a weighted regression
employed in which the weight is approximately inversely proportional to the residual
variance.
Goodness-of-fit of a regression line may also be judged based on outliers and
influential points.
Influential points are defined heuristically as points that have an important influence on
the coefficients of the fitted regression lines.
An outlier (xi,yi) may or may not be influential depending on its location relative to the
remaining sample points.
If |xi – x| is small, then even a gross outlier will have a relatively small influence on the
slope estimate but will have an important influence on the intercept estimate.

11/1/2016 20/46
Correlation Coefficient
The sample (Pearson) correlation coefficient (r) is defined by Lxy/√LxxLyy. The
correlation is not affected by changes in location or scale in either variable and
must lie between -1 and +1. It is a useful tool for quantifying the relationship
between variables.

Interpretation of the sample correlation coefficient

If the correlation is greater than 0, then the variables are said to be positively
correlated. Two variables (x,y) are positively correlated if as x increases, y tends to
increase, whereas as x decreases, y tends to decrease.
If the correlation is less than 0, then the variables are said to be negatively
correlated. Two variables (x,y) are negatively correlated if as x increases, y tends
to decrease, whereas as x decreases, y tends to increase.
If the correlation is exactly 0, then the variables are said to be uncorrelated. Two
variables (x,y) are uncorrelated if there is no linear relationship between x and y.
The correlation coefficient provides a quantitative measure of the dependence
between two variables: the closer |r| is to 1, the more closely related the
variables are; if |r| = 1, then one variable can be predicted exactly from the other.

11/1/2016 21/46
Relationship between the Sample Correlation Coefficient
(r) and the Populaion Correlation Coefficient ()
Interpreting the sample correlation coefficient (r) in terms of degree of
dependence is only correct if the variables x and y are normally distributed and in
certain other special cases. If the variables are not normally distributed, then the
interpretation may not be correct.
The sample correlation coefficient (r) can be written as

Where s2x = Lxx/(n-1) and s2y = Lyy/(n-1) are sample variances. If we defined sample
covariance s2xy = Lxy/(n-1), we can now re-express the relation as

which is completely analogous to the definition of the population correlation

coefficient ,
 = Corr(X,Y) = Cov(X,Y)/(xy)= xy /(xy)

11/1/2016 22/46
Relationship between Sample Regression Coefficient (b) and the
Sample Correlation Coefficient (r)

OR
Where the regression coefficient (b) can be interpreted as a rescaled
version of the correlation coefficient (r) where the scale factor is the ratio
of the standard deviation of y to that of x. r will be unchanged by a change
in the units of x or y, whereas b is in the units of y/x.

The regression coefficient is used when we specifically want to predict

one variable from another.
The correlation coefficient is used when we simply want to describe the
linear relationship between two variables but do not want to make
predictions.

11/1/2016 23/46
Statistical Inference for Correlation Coefficients
One sample t test for a correlation coefficient
To test the hypothesis H0:  = 0 vs. H1:   0 use the following procedure
1) Compute the sample correlation coefficient r
2) Compute the test statistic t = r(n-2)1/2/(1-r2)1/2
Which under H0 follows a t distribution with n-2 df
For a two-sided level  test,
if t > tn-2,1-/2 or t < -tn-2,1-/2 then reject H0.
If -tn-2,1-/2 ≤ t ≤ tn-2,1-/2, then accept H0.
3) The p-value is given by
p = 2 × (area to the left of t under a tn-2 distribution) if t < 0
p = 2 × (area to the right of t under a tn-2 distribution) if t  0
We assume an underlying normal distribution for each of the random variables used to compute r.

11/1/2016 24/46
One-Sample z Test for a Correlation Coefficient
Fisher’s z transformation of the sample correlation coefficient r
The z transformation of r is z = ½ ln[(1+r)/(1-r)]
is approximately normally distributed under H0 with
mean z0 =1/2 ln[(1+0)/(1-0)] and variance 1/(n-3).
The z transformation is very close to r for small values of r but tends to deviate
substantially from r for larger values of r.
One sample z test for a correlation coefficient
To test the hypothesis H0:  = 0 vs. H1:   0, use the following procedure
1)Compute the sample correlation coefficient r
and the z transformation of r
2)Compute the test statistic  = (z – z0)√n-3
If  > z1-/2 or  < -z1-/2 reject H0.
If -z1-/2 ≤  ≤ z1-/2 accept H0.
3) The exact p-value is given by
p = 2 ×() if  ≤ 0
p = 2 ×[1-()] if  > 0
Assume and underlying normal distribution for each of the random variables used
to compute r and z.
11/1/2016 25/46
One-Sample z Test .vs. t Test

The z test is used to test hypotheses about nonzero null correlations,

whereas the t test is used to test hypotheses about null correlations of
zero.
The z test can also be used to test correlations of zero under the null
hypothesis, but the t test is slightly more powerful and is preferred.
However, if 0  0, then the one-sample z test is very sensitive to non-
normality of either x or y.

11/1/2016 26/46
Interval Estimation for Correlation Coefficients
Confidence limits for  can be derived based on the approximate normality of Fisher’s
z transformation of r. Suppose we have a sample correlation coefficient r based on a
sample of n pairs of observations. To obtain a two-sided 100% × (1-) confidence
interval for the population correlation coefficient ().
1) Compute Fisher’s z transformation of r = z = ½ ln[(1+r)/(1-r)].
2) Let z = Fisher’s z transformation of  = ½ ln[(1+ )/(1- )].
A two-sided 100% ×(1-) confidence interval is given for z = (z1,z2) where
z1 = z – z1-/2/√n-3
z2 = z + z1-/2/√n-3
And z1-/2 = 100% ×(1-/2) percentile of an N(0,1) distribution
3) A two-sided 100% ×(1-) confidence interval for  is then given by (1,2)

The interval (z1,z2) = z  z1-/2/√n-3

Solving for r in terms of z, we get r = (e2z - 1)/(e2z + 1)

11/1/2016 27/46
Two-Sample Test for Correlations
Fisher’s z transformation can also be extended to two-sample
(independent) problems for comparing two correlation coefficients.
To test the hypothesis H0: 1 = 2 vs. H1: 1  2, use the following
procedure:
1) Compute the sample correlation coefficients (r1,r2) and Fisher’s z
transformation (z1,z2) for each of the two samples
2) Compute the test statistic

If  > z1-/2 or  > -z1-/2 reject H0.

If -z1-/2 ≤  ≤ z1-/2 accept H0.
3) The exact p-value is given by
P = 2() if  ≤ o
P = 2×[1 - ()] if  > o
Assume an underlying normal distribution for each of the random
variables used to compute r1,r2 and z1,z2.
11/1/2016 28/46
Wolfe’s Test for Comparing Dependent Correlation Coefficients

Previous slide pertains to the comparison of correlation coefficients obtained

from two independent samples. In some cases, we are interested in comparing
two correlation coefficients obtained from the same subjects.
Suppose we want to test the hypothesis H0: XZ = YZ vs. H1: XZ  YZ
where X,Y, and Z are obtained from the same subjects. We assume X = Y.
1) Under this assumption, these hypotheses are equivalent to the hypothesis:
H0: X-Y,Z = 0 vs. H1: X-Y,Z  0. Hence,
2) We use the one-sample t test for correlation based on the following
test statistic t = r√n-2/√1-r2 ~ tn-2 under H0, where r = Corr(Xi – Yi, Zi).
We reject H0 if t > tn-2,1-/2 or if t < tn-2,/2
We accept H0 if tn-2,/2 ≤ t ≤ tn-2,1-/2
3) The p-value is given by
2 × Pr(tn-2 > t) if t  0,
2 × Pr(tn-2 < t) if t < 0.

11/1/2016 29/46
Multiple Regression

Multiple regression analysis involves determining the relationship

between each of the more than one independent variables (x1,…,xk)
and the dependent variable (y) after taking into account the
remaining independent variables.

Estimation of the regression equation y =  + 1x1 + 2x2 + e where

e is an error term that is normally distributed with mean 0 and
variance 2
If we have k independent variables x1,…, xk then a linear-regression
model relating y to x1,…, xk is of the form y =  + jxj + e
We estimate , 1 ,…, k by a, b1 ,…, bk using the same method of
least squares, where we minimize the sum of
2
 k 
 y  yˆ    y  (a   b j x j )
2

 j 1 
11/1/2016 30/46
An Example of Multiple Regression

Use the SAS PROC REG program to obtain the least squares estimates.

11/1/2016 31/46
An Example of Multiple Regression

11/1/2016 32/46
Partial Regression Coefficients
Suppose we consider the multiple-regression model
y =  + jxj + e
where e follows a normal distribution with mean 0 and variance 2. The j, j = 1, 2…,k are
referred to as partial-regression coefficients. j represents the average increase in y per unit
increase in xj, with all other variables held constant (or stated another way, after adjusting
for all other variables in the model), and is estimated by the parameter bj.
Partial regression coefficients differ from simple linear-regression coefficients. The latter
represent the average increase in y per unit increase in x, without considering any other
independent variables.
If there are strong relationships among the independent variables in a multiple-regression
model, then the partial-regression coefficients may differ considerably from the simple
linear-regression coefficients obtained from considering each independent variable
separately.
The standardized regression coefficient (bs) is given by b × (sx/sy). It represents the
estimated average increase in y (expressed in standard deviation units of y) per standard
deviation increase in x, after adjusting for all other variables in the model.
It is a useful measure for comparing the predictive value of several independent variables
because it tells us the predicted increase in standard-deviation units of y per standard-
deviation increase in x.
By expressing change in standard-deviation units of x, we can control for differences in the
units of measurement for different independent variables.
11/1/2016 33/46
Hypothesis Testing for Multiple Linear Regression: F Test
F test for testing the hypothesis
H0: 1 = 2 = …= k = 0 vs. H1: at least one of the j  0 in multiple linear regression
1) Estimate the regression parameters using the method of least squares, and
compute Reg SS and Res SS, where

xij = jth independent variable for the ith subject, j = 1,…,k; i = 1,…,n
2) Compute Reg MS = Red SS/k, Res MS = Res SS/(n-k-1)
3) Compute the test statistic F = Reg MS/Res MS which follows an Fk,n-k-1
distribution under H0.
4) For a level  test, if F > Fk,n-k-1,1- then reject H0. if F ≤ Fk,n-k-1,1- then accept H0.
5) The exact p-value is given by the area to the right of F under an F > Fk,n-k-1
distribution = Pr(Fk,n-k-1 >F)

11/1/2016 34/46
Rejection Regions and p-Value for F Test

11/1/2016 35/46
Hypothesis Testing for Independent Contribution: t Test
The significant p-values for the previous F test can be attributed to
either variable. We would like to perform significance tests to
identify the independent contributions of each variable.
t test for testing the hypothesis H0: l = 0, All other j  0 vs. H1: l 
0 , all other j  0 in multiple linear regression
1) Compute t = bl/se(bl) which should follow a t distribution
with n – k -1 df under H0.
If t < tn-k-1,/2 or t > tn-k-1,/2 then reject H0.
If tn-k-1,/2 ≤ t ≤ tn-k-1,1-/2 then accept H0.
2) The exact p-value is given by
2 × Pr(tn-k-1 > t) if t  0
2 × Pr(tn-k-1 ≤ t) if t < 0

11/1/2016 36/46
Rejection Regions and p-Value for t Test

11/1/2016 37/46
Partial F test for Partial-Regression Coefficients in
Multiple Linear Regression
To test the hypothesis H0: l = 0, all other j  0 vs. H1: l  0, all other j  0
in multiple linear regression, we
1) Compute F as
F = (Reg SSfull model – Reg SSall variables except l in the model) / Res MSfull model
which should follow an F1,n-k-1 distribution under H0.
2) The exact p-value is given by Pr(F1,n-k-1 > F)
3) It can be shown that the p-value from using the partial F test given in 2)
is the same as the p-value obtained from using the t test as previous slide.

11/1/2016 38/46
Criteria for Goodness of Fit: Partial-Residual Plot
Residual analysis can be performed as the simple linear regression case. Outliers
(with a Studentized residual > 3.0) can be removed for another fitting model.
In a multiple regression model, y is normally distributed with expected value = l +
lxl and variance 2 where l =  + 1x1 + … + l-1xl-1 + l+1xl+1 +….+kxk
Given the values of all other independent variables (x1 ,…, xl-1, xl+1,…, xk)
1) The average value of y is linearly related to xl
2) The variance of y is constant (ie., 2)
3) y is normally distributed.
These assumptions can be validated by a partial-residual plot.
A partial-residual plot characterizing the relationship between the
dependent variable y and a specific independent variable xl in a multiple-
regression setting is constructed as follows:
1) A multiple regression is performed of y on all predictors other than xl (ie., x1 ,…, xl-1,
xl+1 ,.., xk) and the residuals are saved.
2) A multiple regression is performed of xl on all other predictors (ie., x1 ,…, xl-1, xl+1 ,..,
xk) and the residuals are saved.
3) The partial-residual plot is a scatter plot of the residuals in step 1 on the y axis
against the residuals in step 2 on the x axis.
11/1/2016 39/46
Partial-residual Plot Example
If the multiple-regression model holds, then the residuals in step 1
should be linearly related to the residuals in step 2 with slop=l and
variance of 2 .

11/1/2016 40/46
Partial Correlation and Multiple Correlation
Partial Correlation: assess the degree of association between two variables after
controlling for other covariates.
Suppose we are interested in the association between two variables x and y but
want to control for other covariates z1,…., zk. The partial correlation is defined as
the Pearson correlation between two derived variables ex and ey, where
ex = the residual from the linear regression of x on z1 ,…, zk
ey = the residual from the linear regression of y on z1 ,…, zk
Multiple Correlation: assess the degree fo association between one outcome
variable and a linear combination of multiple variables.
Suppose we have an outcome variable y and a set of predictors x1,…, xk.
The maximum possible correlation between y and a linear combination of the
predictors c1x1 + … + ckxk is given by the correlation between y and the regression
function 1x1+…+kxk and is called the multiple correlation between y and [x1,…xk].
It is estimated by the Pearson correlation between y and b1x1+…+bkxk, where b1,…,
bk are the least-squares estimates of 1,…, k.
The multiple correlation can also be shown to be equivalent to √Reg SS/Total SS =
√R2

11/1/2016 41/46
Rank Correlation and t Test
Spearman rank-correlation coefficient (rs) is an ordinary correlation
coefficient based on ranks. Thus, rs = Lxy/ √Lxx Lxy where the L’s are
computed from the ranks rather than from the actual scores.
T test for Spearman Rank Correlation
1) Compute the test statistic ts = rs(√n -2)/ √1-rs2 which under the null
hypothesis of no correlation follows a t distribution with n-2 degrees of
freedom.
2) For a two-sided level  test, if ts > tn-2,1-/2 or ts < tn-2,/2 = -tn-2,1-/2 then
reject H0; otherwise, accept H0.
3) The exact p-value is given by
p = 2 × (area to the left of ts under a tn-2 distribution) if ts < 0
p = 2 × (area to the right of ts under a tn-2 distribution) if ts  0
4) This test is valid only if n  10

11/1/2016 42/46
Rejection Regions and p-Value

11/1/2016 43/46
Interval Estimation for Spearman Rank-Correlation Coefficients
Suppose we have an estimated Spearman rank-correlation rs based on a
sample of size n. To obtain an approximate two-sided 100% ×(1-)
confidence interval for s (the underlying rank correlation) we proceed as
follows:
1) Compute the sample probit Hi and Hi* corresponding to Xi, Yi, where Hi =
-1(Pi), Hi* = -1(Pi*) and Pi = rank(Xi)/(n+1) and Pi* = rank(Xi)/(n+1). The
probit has been referred to as the inverse normal distribution. Thus,
probit (0.5) = z.5 = 0, probit (0.975) = z.975 = 1.96, etc
2) Compute the Pearson correlation r between sample probits given by rh =
corr(Hi,Hi*),which is a sample estimate of the probit correlation h =
corr(Hi,Hi*) where Hi = -1(Pi), Hi* = -1(Pi*).

3) Because rh is a slightly negatively biased estimate of h, we compute the

bias-corrected estimator of h given by rcor,h = rh{1+(1-rh2)/[2(n-4)]}.

4) Let zh = Fisher’s z transform of h  0.5 ln[(1+ h)/(1- h)].

11/1/2016 44/46
5) Compute a 100% × (1-) confidence interval for zh given by (z1h,z2h) = zh
 z1-/2 / √n-3 where zh = Fisher’s z-transform of rcor,h = 0.5 ln[(1-
rcor,h)/(1-rcor,h)].

6) The corresponding 100% ×(1-) confidence interval for h is (r1h,r2h),

where r1h = [exp(2z1h)-1]/[exp(2z1h)+1], r2h= [exp(2z2h)-1]/[exp(2z2h)+1].

7) Furthermore, a 100% ×(1-) confidence interval for s is (rs1,rs2), where

(rs1,rs2) = [(6/)sin-1(r1h/2),(6/)sin-1(r2h/2)].

8) This procedure is valid for n  10. The rationale for this procedure is
that for normally distributed scales such as H and H*, there is a
relationship between the underlying rank correlation and Pearson
correlation given by s,h = (6/)sin-1(h/2) where h = corr(Hi,Hi*) and
s,h = corr(Pi,Pi*). However, because the probit transformation is rank-
preserving, Pi and Pi* are the same in the probit scale and the original
scale. Thus, s,h = s = corr(Pi,Pi*)
11/1/2016 45/46
Summary
In this lecture for Chapter 11, we discussed
 Statistical inference methods for investigating the relationship between
two or more variables.
If only two variables, both of which are continuous, are being studied,
and we wish to predict one variable (the dependent variable) as a
function of the other variable (the independent variable) then simple
linear regression analysis is used.
 Pearson correlation methods are used to determine the association
between two normally distributed variables without distinguishing
between dependent and independent variables.
 Rank correlation may be used if both variables are continuous but not
normally distributed or are ordinal variables.
Multiple regression methods may be used to predict the value of one
variable (the dependent variable which is normally distributed) as a
function of several independent variables.

11/1/2016 46/46
The End

11/1/2016 47/46

qbus 5001 quiz 答案
No ratings yet
qbus 5001 quiz 答案
30 pages
SOLUTIONS 2022 Intro Stats Exam2
No ratings yet
SOLUTIONS 2022 Intro Stats Exam2
13 pages
Confidential: Turn in Exam Question Paper
No ratings yet
Confidential: Turn in Exam Question Paper
20 pages
Cheat Sheet - BT1101
100% (2)
Cheat Sheet - BT1101
29 pages
File4-Session3-Introduction To Regression
No ratings yet
File4-Session3-Introduction To Regression
50 pages
Regression Basics: Predicting A DV With A Single IV
No ratings yet
Regression Basics: Predicting A DV With A Single IV
20 pages
Chapter 02
No ratings yet
Chapter 02
14 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
27 pages
Regression Analysis and Multiple Regression: Session 7
No ratings yet
Regression Analysis and Multiple Regression: Session 7
100 pages
Regression and Multiple Regression Analysis
100% (1)
Regression and Multiple Regression Analysis
21 pages
Topic 8 - Regression Analysis
No ratings yet
Topic 8 - Regression Analysis
51 pages
Simple Linear Regression and Multiple Linear Regression: MAST 6474 Introduction To Data Analysis I
No ratings yet
Simple Linear Regression and Multiple Linear Regression: MAST 6474 Introduction To Data Analysis I
15 pages
Regression Equation: Independent Variable Predictor Variable Explanatory Variable Dependent Variable Response Variable
No ratings yet
Regression Equation: Independent Variable Predictor Variable Explanatory Variable Dependent Variable Response Variable
60 pages
What Is Multiple Linear Regression
No ratings yet
What Is Multiple Linear Regression
23 pages
Business Statistics II
100% (2)
Business Statistics II
100 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
63 pages
06 Least Squar Regression
No ratings yet
06 Least Squar Regression
25 pages
Regression Notes- Part-1
No ratings yet
Regression Notes- Part-1
17 pages
Lecture 6 Simple Linear Regression
No ratings yet
Lecture 6 Simple Linear Regression
36 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
51 pages
Simple Linear Regression 1. Review of Least Squares Procedure 2. Inference For Least Squares Lines
No ratings yet
Simple Linear Regression 1. Review of Least Squares Procedure 2. Inference For Least Squares Lines
51 pages
STAT630Slide Adv Data Analysis
No ratings yet
STAT630Slide Adv Data Analysis
238 pages
Simple Linear Regression Analysis - Final
No ratings yet
Simple Linear Regression Analysis - Final
46 pages
Regression Equation
No ratings yet
Regression Equation
56 pages
Linear Regression
No ratings yet
Linear Regression
12 pages
CH 16 Aslr
No ratings yet
CH 16 Aslr
41 pages
Lect5 Math231
No ratings yet
Lect5 Math231
31 pages
Simple_linear_regression-Presentation -Review-analysis -covariance
No ratings yet
Simple_linear_regression-Presentation -Review-analysis -covariance
10 pages
BST 32202 LINEAR REGRESSION 6 SLR ASSUMPTIONS LSE
No ratings yet
BST 32202 LINEAR REGRESSION 6 SLR ASSUMPTIONS LSE
20 pages
Stats101A - Chapter 2
No ratings yet
Stats101A - Chapter 2
59 pages
Simple Regression
100% (1)
Simple Regression
50 pages
Regression
No ratings yet
Regression
24 pages
ch11
No ratings yet
ch11
55 pages
Regression PDF
No ratings yet
Regression PDF
18 pages
Chapter 2: Simple Linear Regression
No ratings yet
Chapter 2: Simple Linear Regression
58 pages
Regression Equation For SI
No ratings yet
Regression Equation For SI
12 pages
09 Inference For Regression Part1
No ratings yet
09 Inference For Regression Part1
12 pages
Linear Regression
No ratings yet
Linear Regression
56 pages
Linear Regression-Part 2
No ratings yet
Linear Regression-Part 2
26 pages
Lecture 4
No ratings yet
Lecture 4
22 pages
Topic 2 Simple Regression Model
No ratings yet
Topic 2 Simple Regression Model
39 pages
Linear Regression
No ratings yet
Linear Regression
47 pages
Linear Regression
No ratings yet
Linear Regression
15 pages
10 - Regression 1
No ratings yet
10 - Regression 1
58 pages
Unit4 Multivariate Analysis
No ratings yet
Unit4 Multivariate Analysis
20 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
20 pages
Regression1
No ratings yet
Regression1
32 pages
03 Revisions L Regression
No ratings yet
03 Revisions L Regression
25 pages
Inference For Regression
No ratings yet
Inference For Regression
24 pages
Topic 6B Regression
No ratings yet
Topic 6B Regression
13 pages
Chapter 10 - 2 - 2
No ratings yet
Chapter 10 - 2 - 2
33 pages
Regression Analysis
No ratings yet
Regression Analysis
49 pages
Linear Regression Analysis and Least Square Methods
No ratings yet
Linear Regression Analysis and Least Square Methods
65 pages
Chapter 8 Regression Model - 2023
No ratings yet
Chapter 8 Regression Model - 2023
21 pages
Simple Linear Regression Analysis
No ratings yet
Simple Linear Regression Analysis
55 pages
Regression and Correlation
No ratings yet
Regression and Correlation
14 pages
Chapter 2-Simple Regression Model
No ratings yet
Chapter 2-Simple Regression Model
25 pages
CVEN2002 Week11
No ratings yet
CVEN2002 Week11
49 pages
Topic Simple Linear Regression
No ratings yet
Topic Simple Linear Regression
38 pages
Chapter2 (Simple Linear Regression)
No ratings yet
Chapter2 (Simple Linear Regression)
11 pages
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
Correlation and Regression: Six Sigma Thinking, #8
From Everand
Correlation and Regression: Six Sigma Thinking, #8
Sumeet Savant
5/5 (1)
Exercises of Advanced Statistics
From Everand
Exercises of Advanced Statistics
Simone Malacrida
No ratings yet
Application-form-final-UCAD (1)
No ratings yet
Application-form-final-UCAD (1)
4 pages
Finances Publiques
100% (1)
Finances Publiques
300 pages
05 MCF CDD SC Eco
No ratings yet
05 MCF CDD SC Eco
2 pages
2023 01 09 Article Taxation v003
No ratings yet
2023 01 09 Article Taxation v003
27 pages
Statistics For The Behavioral Sciences 4th Edition Susan A. Nolan
No ratings yet
Statistics For The Behavioral Sciences 4th Edition Susan A. Nolan
49 pages
Chapter Seven: Multi-Sample Methods
No ratings yet
Chapter Seven: Multi-Sample Methods
52 pages
STM 3rd Periodical
No ratings yet
STM 3rd Periodical
7 pages
Lev&Thiagaraian 1993JAR
No ratings yet
Lev&Thiagaraian 1993JAR
27 pages
Comparative Analysis of Zomato Swiggy Basedon Consumer Perception
No ratings yet
Comparative Analysis of Zomato Swiggy Basedon Consumer Perception
42 pages
UF Forensic PharmChem ClinicalTox Syllabi Current Catalog
No ratings yet
UF Forensic PharmChem ClinicalTox Syllabi Current Catalog
86 pages
Week 9-1 - H0 and H1 (Updated)
No ratings yet
Week 9-1 - H0 and H1 (Updated)
11 pages
Assignment For The Graded Points
No ratings yet
Assignment For The Graded Points
5 pages
Mann Whitney - Practical
No ratings yet
Mann Whitney - Practical
3 pages
Test Bank Questions Chapters 8 and 9
No ratings yet
Test Bank Questions Chapters 8 and 9
3 pages
Annexure-81. B.A. (Programme)
No ratings yet
Annexure-81. B.A. (Programme)
60 pages
Business Statistics 3 Ed. Edition Govind Chand Beri Download PDF
100% (2)
Business Statistics 3 Ed. Edition Govind Chand Beri Download PDF
84 pages
ABE 463: Engineerin G Statistics
No ratings yet
ABE 463: Engineerin G Statistics
40 pages
Quantitative Methods Midterm - FInal Quiz 2
No ratings yet
Quantitative Methods Midterm - FInal Quiz 2
90 pages
Upload Final Stats 1040 Revised 1 0
No ratings yet
Upload Final Stats 1040 Revised 1 0
6 pages
Where To Put The Hypothesis in A Research Paper
100% (3)
Where To Put The Hypothesis in A Research Paper
8 pages
11430-Article Text-32711-2-10-20200430
No ratings yet
11430-Article Text-32711-2-10-20200430
17 pages
Science Investigatory Project
100% (1)
Science Investigatory Project
25 pages
Even You Can Learn Statistics and Analytics: An Easy to Understand Guide, 4th Edition David M. Levine 2024 scribd download
100% (8)
Even You Can Learn Statistics and Analytics: An Easy to Understand Guide, 4th Edition David M. Levine 2024 scribd download
66 pages
Normality Test
No ratings yet
Normality Test
25 pages
Statistics For Dummies
100% (3)
Statistics For Dummies
41 pages
Chi Square Test
33% (3)
Chi Square Test
7 pages
Solution To Exam 1
No ratings yet
Solution To Exam 1
8 pages
BM 31 1 010502
No ratings yet
BM 31 1 010502
27 pages
Introduction To Hypothesis Testing - III
No ratings yet
Introduction To Hypothesis Testing - III
33 pages
Mathematics - T3 - BAS-403 - Dr. Shiv Charan
No ratings yet
Mathematics - T3 - BAS-403 - Dr. Shiv Charan
5 pages