Chapter 9 Correlation and Regression
[Figure: two scatterplots with Income on the horizontal axis. First plot: Slope = -0.010. Second plot: Slope = .023.]
c. Those two points would almost certainly draw the line toward them, which will
flatten the slope. If we remove those countries we have the second graph with a
steeper slope.
The minimum sample size in this example is 25, and we will use that. We would need t =
2.069 for a two-tailed test on N – 2 = 23 df. A little (well, maybe a lot) of algebra will
show that a correlation of .396 will produce that t value.
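The algebra mentioned above amounts to solving t = r*sqrt(N - 2)/sqrt(1 - r^2) for r, which gives r = t/sqrt(t^2 + df). A minimal sketch of that check follows; the function names are illustrative and not from the text.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a correlation r against zero with n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def critical_r(t_crit: float, df: int) -> float:
    """Invert the t formula: the smallest |r| that reaches t_crit on df = N - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_needed = critical_r(2.069, 23)      # critical t and df quoted in the text
t_check = t_from_r(r_needed, 25)      # plugging r back in should recover t
print(round(r_needed, 3), round(t_check, 3))
```

Substituting the result back into the t formula is a quick way to confirm that the inversion was done correctly.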
9.5 If we put these two predictors together using methods covered in Chapter 15, the multiple
correlation will be .58, which is only a small amount higher than Income alone.
9.7 I suspect that a major reason why this variable does not play a more important role is the
fact that it has very little variance. The range is 3% - 7%. One cause of this may be the
very high death rate among women in sub-Saharan Africa. There are many fewer women
giving birth at ages above 40. To quote from a United Nations report
(http://www.un.org/ecosocdev/geninfo/women/women96.htm):
Women are becoming increasingly affected by HIV. Today about 42 per cent of
estimated cases are women, and the number of infected women is expected to reach
15 million by the year 2000.
Approximately 585,000 women die every year, over 1,600 every day, from causes
related to pregnancy and childbirth. In sub-Saharan Africa, 1 in 13 women will die
from pregnancy or childbirth related causes, compared to 1 in 3,300 women in the
United States.
Globally, 43 per cent of all women and 51 per cent of pregnant women suffer from
iron-deficiency anemia.
9.9 Psychologists are very much interested in studying variables related to behavior and in
finding ways to change behavior. I would guess that they would have a good deal to say
about educating women in ways that would decrease infant mortality.
\[ d = \rho_1 = .20 \]
\[ \delta = \rho_1\sqrt{N - 1} = .20\sqrt{24} = 0.98 \]
power = .17
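The power figure can be roughly reproduced with a normal approximation. The sketch below assumes a two-tailed test at alpha = .05 (critical z = 1.96), which the text does not state explicitly, and uses illustrative function names.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_for_correlation(rho: float, n: int, z_crit: float = 1.96):
    """Return (delta, power) using delta = rho * sqrt(N - 1).

    z_crit = 1.96 assumes a two-tailed test at alpha = .05.
    """
    delta = rho * math.sqrt(n - 1)
    # Probability of landing in either rejection region when the true shift is delta
    power = (1.0 - normal_cdf(z_crit - delta)) + normal_cdf(-z_crit - delta)
    return delta, power

delta, power = power_for_correlation(0.20, 25)
print(round(delta, 2), round(power, 2))
```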
9.15 Number of symptoms predicted for a stress score of 8 using the data in Table 9.2 :
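\[ s'_{Y \cdot X} = s_{Y \cdot X}\sqrt{1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{(N - 1)s_X^2}} = 0.1726\sqrt{1 + \frac{1}{107} + \frac{(X_i - \bar{X})^2}{106(156.05)}} \]
\[ \hat{Y} = 0.00856X + 4.30 \]
\[ t_{\alpha/2} = 1.983 \]
\[ CI(Y) = \hat{Y} \pm (t_{\alpha/2})(s'_{Y \cdot X}) \]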
For several different values of X, calculate Ŷ and s'Y.X and plot the results.
X:     0      10     20     30     40     50     60
Ŷ:   4.300  4.386  4.471  4.557  4.642  4.728  4.814
The curvature is hard to see, but it is there, as can be seen in the graphic on the right,
which plots the width of the interval as a function of X. (It’s fun to play with R).
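The interval calculation above can be sketched in a few lines. The regression constants come from the text; the mean of X (21.29) is an assumption, since it is not given in this section, and the function names are illustrative.

```python
import math

N = 107                      # sample size from the text
S_YX = 0.1726                # standard error of estimate from the text
VAR_X = 156.05               # variance of X from the text
SLOPE, INTERCEPT = 0.00856, 4.30
T_CRIT = 1.983               # t(alpha/2) from the text
X_MEAN = 21.29               # ASSUMPTION: mean of X is not given in this section

def predict(x: float) -> float:
    """Predicted Y for a given X."""
    return SLOPE * x + INTERCEPT

def se_prediction(x: float, x_mean: float = X_MEAN) -> float:
    """Standard error of a predicted value at X = x."""
    return S_YX * math.sqrt(1 + 1 / N + (x - x_mean) ** 2 / ((N - 1) * VAR_X))

def interval(x: float):
    """Lower and upper limits of the interval around the prediction at X = x."""
    half_width = T_CRIT * se_prediction(x)
    return predict(x) - half_width, predict(x) + half_width

for x in range(0, 61, 10):
    lower, upper = interval(x)
    print(f"X={x:2d}  Yhat={predict(x):.3f}  width={upper - lower:.4f}")
```

The printed widths grow as X moves away from the assumed mean, which is the curvature described above.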
a. [SPSS Coefficients tables (unstandardized and standardized coefficients)]
c. Child means: [SPSS Descriptives output for child]
   Parent means: [SPSS Descriptives output for midparent]
e.
\[ \delta = \rho_1\sqrt{N - 1} \]
\[ 2.80 = .40\sqrt{N - 1} \]
\[ \sqrt{N - 1} = 2.80 / .40 = 7 \]
\[ N = 50 \]
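The same rearrangement, N = (delta/rho)^2 + 1, can be sketched as a small helper. The function name is illustrative, and the rounding step is only there to guard against floating-point noise before rounding up to a whole subject.

```python
import math

def required_n(delta: float, rho: float) -> int:
    """Solve delta = rho * sqrt(N - 1) for N and round up to a whole subject."""
    n_exact = (delta / rho) ** 2 + 1
    # Round away tiny floating-point error before taking the ceiling
    return math.ceil(round(n_exact, 6))

n_needed = required_n(2.80, 0.40)
print(n_needed)
```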
9.25 It is difficult to tell whether the significant difference between the results of the two
previous problems is attributable to the larger sample sizes or to the higher (and thus
more different) values of r'. It is likely to be the former.
                                 ALCOHOL   TOBACCO
ALCOHOL   Pearson Correlation     1.000     .224
          Sig. (2-tailed)           .       .509
          N                        11        11
TOBACCO   Pearson Correlation      .224    1.000
          Sig. (2-tailed)          .509       .
          N                        11        11
b. The data suggest that people from Northern Ireland actually drink relatively little.
[Figure: scatterplot of Alcohol use (vertical axis) against Tobacco use (horizontal axis).]
c. With Northern Ireland excluded from the data the correlation is .784, which is
significant at p = .007.
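The two significance levels reported here (p = .509 with all 11 regions, p = .007 with Northern Ireland removed) can be checked approximately from r and N alone. The sketch below uses only the standard library, integrating the t density numerically with Simpson's rule; the function names are illustrative, and small discrepancies are expected because r is rounded to three decimals.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t for testing r against zero on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability between 0 and |t|
    return 1 - 2 * area

p_all = two_tailed_p(t_statistic(0.224, 11), 9)        # all 11 regions
p_excluded = two_tailed_p(t_statistic(0.784, 10), 8)   # Northern Ireland removed
print(round(p_all, 3), round(p_excluded, 3))
```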
[Figure: scatterplot of Weight (vertical axis) against Height (horizontal axis).]
The regression solution that follows was produced by SPSS and gives all relevant results.
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .604 (a)   .364       .353                14.9917
a. Predictors: (Constant), HEIGHT
b. Gender = Male
ANOVA (b,c)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression     7087.800        1    7087.800     31.536   .000 (a)
   Residual      12361.253       55     224.750
   Total         19449.053       56
a. Predictors: (Constant), HEIGHT
b. Dependent Variable: WEIGHT
c. Gender = Male
Coefficients (a,b)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B           Std. Error        Beta                        t        Sig.
1  (Constant)     -149.934    54.917                                        -2.730   .008
   HEIGHT            4.356      .776            .604                         5.616   .000
a. Dependent Variable: WEIGHT
b. Gender = Male
With a slope of 4.36, the data predict that two males who differ by one inch will also
differ by approximately 4 1/3 pounds. The intercept has no meaning because people are
not 0 inches tall, but the fact that it is so large and negative suggests that there is some
curvilinearity in this relationship for low values of Height.
Tests on the correlation and the slope are equivalent tests when we have one predictor,
and these tests tell us that both are significant. Weight increases reliably with increases in
height.
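The equivalence of the two tests can be seen directly in the SPSS output: with one predictor, the square root of F equals the t for the slope, and r is the square root of SS(regression)/SS(total). A minimal sketch using the sums of squares reported above follows; the variable names are illustrative.

```python
import math

# Values taken from the ANOVA table for males
ss_regression = 7087.800
ss_residual = 12361.253
df_regression, df_residual = 1, 55

ss_total = ss_regression + ss_residual
ms_residual = ss_residual / df_residual
f_ratio = (ss_regression / df_regression) / ms_residual

r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (df_regression + df_residual) / df_residual
std_error_estimate = math.sqrt(ms_residual)
t_slope = math.sqrt(f_ratio)          # with one predictor, t squared equals F

print(round(f_ratio, 3), round(t_slope, 3), round(r, 3),
      round(adjusted_r_squared, 3), round(std_error_estimate, 4))
```

The printed values can be compared against the Model Summary, ANOVA, and Coefficients tables.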
a. I weigh 146 pounds. (Well, I did two years ago.) Therefore the residual in the
prediction is Y - Ŷ = 146 - 146.27 = -0.27.
b. If the students on which this equation is based under- or over-estimated their own
height or weight, the prediction for my weight will be based on invalid data and will
be systematically in error.
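The residual calculation in part a can be sketched from the regression equation for males. The height of 68 inches is an assumption: it is not stated in this section, but it is the value that reproduces the predicted weight of 146.27 from the SPSS coefficients. The function name is illustrative.

```python
INTERCEPT = -149.934   # (Constant) from the SPSS Coefficients table
SLOPE = 4.356          # HEIGHT coefficient from the same table

def predict_weight(height_in: float) -> float:
    """Predicted weight in pounds for a male of the given height in inches."""
    return INTERCEPT + SLOPE * height_in

height = 68.0          # ASSUMPTION: inferred because it yields the 146.27 in the text
observed = 146.0       # observed weight reported in the text
predicted = predict_weight(height)
residual = observed - predicted    # residual = Y - Yhat

print(round(predicted, 2), round(residual, 2))
```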
9.35 The male would be predicted to weigh 137.562 pounds, while the female would be
predicted to weigh 125.354 pounds. The predicted difference between them would be
12.208 pounds (137.562 - 125.354).
The data were plotted by “trial”, where a larger trial number represents an observation
later in the sequence.
Although the regression line has a slight positive slope, the slope is not significantly
different from zero. This is shown below.
ANALYSIS OF VARIANCE
SOURCE SUM-OF-SQUARES DF MEAN-SQUARE F-RATIO P
There is not a systematic linear or cyclical trend over time, and we would probably be
safe in assuming that the observations can be treated as if they were independent. Any
slight dependency would not alter our results to a meaningful degree.
Eris doesn’t fit the plot as well as I would have liked. It is a bit too far away.
9.41 Comparing correlations in males and females.
\[ z = \frac{r_1' - r_2'}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}} \]
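The test for the difference between two independent correlations can be sketched as follows, where r' is Fisher's transformation of r (the inverse hyperbolic tangent). The correlations and sample sizes in the example call are hypothetical, chosen only to illustrate the computation; they are not the data for this exercise.

```python
import math

def fisher_r_prime(r: float) -> float:
    """Fisher's transformation: r' = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """z for the difference between two independent correlations."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_r_prime(r1) - fisher_r_prime(r2)) / standard_error

# HYPOTHETICAL values for illustration only
z = z_difference(0.50, 53, 0.30, 53)
print(round(z, 3))
```

The result is compared against the standard normal distribution, so |z| of 1.96 or more would be significant at the two-tailed .05 level.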