Correlation & Regression
CORRELATION & REGRESSION EXERCISE
Divya Krishnamohan
Student ID: 200292988
Q1
Pearson’s product-moment correlation
Testing for an association between leaf area and root starch concentration in a clonal tree species (both variables continuous and normally distributed).
Spearman’s rank correlation
Testing for an association between the diversity of flowering plant species (ranked) and the number of visiting pollinator species within the study area (normal distribution of variables not required).
Linear regression
Testing the relationship between the degree of nuptial shading* (continuous measure,
residuals normally distributed) and the availability of mates in a species of fish (linearly
related to dependent variable).
Logistic regression
Testing the effect of genetic distance on the sex* of sterile offspring (binomial distribution) in
five hybrid species pairs.
Analysis of covariance
Testing the effect of soil permeability (ranked), litter depth and soil type (covariate) on the
rate of ant re-colonisation* (residuals normally distributed).
Q2
i. Number of sites used in the Christmas Bird Count at which the species
was recorded (abbreviated as CBC circles)
ii. Number seen in the Christmas Bird Count (abbreviated as CBC
number).
Histograms of the CBC number and CBC circles data sets reveal a strong skew in the former and a moderate skew in the latter (Fig. 1).
A logarithmic transformation is usually applied to correct a strong skew, and a square root transformation to correct a moderate skew. Accordingly, a log10 transformation was applied to CBC number and a square root transformation to CBC circles, and normality was then assessed by means of the Shapiro-Wilk test (used because there are fewer than 50 cases) (Fig. 2, Table 1).
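The effect of a transformation on skew can also be checked numerically. The sketch below is a minimal illustration, assuming Python and a hypothetical right-skewed sample (not the CBC data): it computes the adjusted Fisher-Pearson skewness coefficient before and after a log10 transformation. If a formal normality test is wanted in Python, the Shapiro-Wilk test is available in SciPy as `scipy.stats.shapiro`.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness coefficient (G1)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def transform(values, kind):
    """Apply a log10 (strong skew) or square-root (moderate skew) transformation."""
    if kind == "log10":
        return [math.log10(v) for v in values]  # requires all values > 0
    if kind == "sqrt":
        return [math.sqrt(v) for v in values]   # requires all values >= 0
    raise ValueError(f"unknown transformation: {kind}")

# Hypothetical right-skewed counts, NOT the CBC data from this exercise.
counts = [120, 340, 560, 900, 1500, 2800, 5200, 9800, 21000, 64000, 250000, 1200000]

raw = sample_skewness(counts)
logged = sample_skewness(transform(counts, "log10"))
print(f"skewness: raw = {raw:.2f}, after log10 = {logged:.2f}")
```

A skewness close to zero after transformation is consistent with, but does not by itself establish, approximate normality; that is what the Shapiro-Wilk test is for.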
Fig. 1. Histograms showing the distribution of untransformed data: CBC number (N = 32, mean = 194135.47, SD = 358003.762) and CBC circles (N = 32, mean = 375.66, SD = 262.151).
Fig. 2. Histograms of the transformed data: log10 (CBC number) (N = 32, mean = 4.8193, SD = 0.70313) and square root (CBC circles) (N = 32, mean = 18.0935, SD = 7.05973).
Table 1. Tests of Normality (Shapiro-Wilk)
Variable              Statistic   df   Sig.
CBC number            .513        32   .000
log10 (CBC number)    .948        32   .124
CBC circles           .922        32   .024
sqrt (CBC circles)    .947        32   .121
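The transformations have normalised the data: neither log10 (CBC number) (P = .124) nor sqrt (CBC circles) (P = .121) departs significantly from normality, whereas both untransformed variables do (P < .001 and P = .024, respectively).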
The results of a Pearson’s product-moment correlation are described in the table
below.
Table 3. Spearman’s Rank Correlation for CBC circles and CBC number
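Both correlation coefficients named in this section can be sketched in a few lines. The example below assumes Python and uses hypothetical paired values (not the exercise data). Spearman’s rho is computed as Pearson’s r on the ranks, with tied values given their average rank.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical values standing in for CBC circles and CBC number.
circles = [50, 120, 200, 310, 450, 600, 820]
numbers = [900, 4000, 2500, 30000, 52000, 48000, 400000]

r = pearson_r([math.sqrt(c) for c in circles], [math.log10(v) for v in numbers])
rho = spearman_rho(circles, numbers)
print(f"Pearson r (transformed data) = {r:.3f}, Spearman rho (raw data) = {rho:.3f}")
```

Because log10 and square root are strictly increasing, they leave the ranks unchanged, so Spearman’s rho is the same whether it is computed on the raw or the transformed data; Pearson’s r is not.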
Because the data are counts treated as continuous variables, the most appropriate test for determining whether there is an association between a measure of host abundance and Mallophaga diversity is a linear regression analysis.
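A simple least-squares regression of this kind can be sketched as follows. This is a minimal illustration, assuming Python, a log10-transformed predictor, and hypothetical host and louse-species counts (not the exercise data); the fitted intercept and slope play the role of the unstandardized coefficients that would be used for prediction.

```python
import math

def fit_simple_ols(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    ss_total = sum((b - my) ** 2 for b in y)
    ss_resid = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, 1 - ss_resid / ss_total

# Hypothetical host counts and louse-species counts (not the exercise data).
host_counts = [1000, 5000, 20000, 80000, 300000, 1000000]
lice_species = [2, 3, 3, 5, 6, 7]

log_hosts = [math.log10(h) for h in host_counts]
b0, b1, r2 = fit_simple_ols(log_hosts, lice_species)
print(f"species = {b0:.2f} + {b1:.2f} * log10(count), R^2 = {r2:.3f}")

# Prediction for a new, hypothetical host abundance.
print("predicted species at 50,000 birds:", round(b0 + b1 * math.log10(50000), 2))
```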
Fig. 3. Residual plot of the Mallophaga species variable (dependent variable: Mallophaga spp; observed, predicted and standardised residual values; model: intercept), showing a homoscedastic distribution of variance.
Scatterplots of Mallophaga species against CBC number, CBC circles and log (CBC number).
NB: Transforming CBC circles, or even the Mallophaga species variable, does not make the relationship between these two variables more linear.
Variables Entered/Removed
Model   Variables Entered               Variables Removed   Method
1       CBC circles, log (CBC number)   none                Enter
2       none                            CBC circles         Backward (criterion: Probability of F-to-remove >= .100)
The backward multiple regression analysis compared two possible models: Model 1 (dependent variable: Mallophaga species; independent variables: log (CBC number) and CBC circles) and Model 2 (dependent variable: Mallophaga species; independent variable: log (CBC number)). Model 1 had a marginally higher R² than Model 2 (R² = 0.133 and R² = 0.130, respectively). However, only Model 2 is statistically significant (F = 4.502, P = .042, against F = 2.232, P = .125 for Model 1), so removing CBC circles costs almost no explanatory power.
ANOVA
Model              Sum of Squares   df   Mean Square   F       Sig.
1   Regression     11.855           2    5.927         2.232   .125
    Residual       77.020           29   2.656
    Total          88.875           31
2   Regression     11.596           1    11.596        4.502   .042
    Residual       77.279           30   2.576
    Total          88.875           31
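The ANOVA table can be cross-checked by hand: each mean square is a sum of squares divided by its df, F is the ratio of the regression and residual mean squares, and R² is the regression sum of squares divided by the total. The short Python sketch below recomputes these quantities from the values in the table above.

```python
def anova_summary(ss_regression, df_regression, ss_residual, df_residual):
    """Recompute mean squares, F and R^2 from a regression ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_stat = ms_regression / ms_residual
    r_squared = ss_regression / (ss_regression + ss_residual)
    return ms_regression, ms_residual, f_stat, r_squared

# (SS regression, df regression, SS residual, df residual) from the ANOVA table.
models = {
    "Model 1": (11.855, 2, 77.020, 29),
    "Model 2": (11.596, 1, 77.279, 30),
}
for name, row in models.items():
    ms_reg, ms_res, f_stat, r2 = anova_summary(*row)
    print(f"{name}: MS_reg={ms_reg:.3f}, MS_res={ms_res:.3f}, F={f_stat:.3f}, R^2={r2:.3f}")
```

The recomputed R² values (about 0.133 and 0.130) match those quoted in the text.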
Collinearity Diagnostics
                                                  Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   log CBC number   CBC circles
1       1           2.789        1.000             .00          .00              .02
1       2           .204         3.695             .02          .01              .65
1       3           .007         20.069            .98          .99              .33
2       1           1.990        1.000             .01          .01
2       2           .010         13.999            .99          .99
The condition index (the square root of the ratio of the largest eigenvalue to each successive eigenvalue) indicates a possible problem with collinearity when it exceeds 15, as it does for Model 1, dimension 3 (condition index = 20.069).
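The condition indices can be recomputed from the printed eigenvalues. In this Python sketch the eigenvalues are the three-decimal values shown in the collinearity table, so the recomputed indices differ slightly from the reported ones (about 19.96 rather than 20.069 for Model 1, dimension 3), but the same dimension exceeds the threshold of 15.

```python
import math

def condition_indices(eigenvalues):
    """Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Eigenvalues as printed (rounded to three decimals) in the collinearity table.
diagnostics = {"Model 1": [2.789, 0.204, 0.007], "Model 2": [1.990, 0.010]}
THRESHOLD = 15  # rule-of-thumb cut-off used in the text

for model, eigs in diagnostics.items():
    indices = condition_indices(eigs)
    flagged = [dim for dim, ci in enumerate(indices, start=1) if ci > THRESHOLD]
    print(model, [round(ci, 3) for ci in indices], "dimensions above 15:", flagged)
```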
Further, the backward multiple regression analysis shows that the excluded variable, CBC circles, has a test statistic and significance indicating no significant linear relationship with the dependent variable once log (CBC number) is in the model (t = 0.312, P = .757, where t tests the null hypothesis that the regression coefficient is zero).
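The t statistic for the excluded variable is consistent with the ANOVA table. When a single predictor is dropped, the partial F for that predictor (the drop in regression sum of squares divided by the full model’s residual mean square) equals t². The Python sketch below computes this and then a two-sided P value from Student’s t distribution with 29 df (Model 1’s residual df), integrating the density with Simpson’s rule instead of calling a statistics library.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=1000):
    """Two-sided P value: 1 - 2 * P(0 < T < |t|), integrated by Simpson's rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(a + i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area

# From the ANOVA table: dropping CBC circles lowers the regression SS from
# 11.855 (Model 1) to 11.596 (Model 2); Model 1's residual MS is 77.020 / 29.
partial_f = (11.855 - 11.596) / 1 / (77.020 / 29)
t_stat = math.sqrt(partial_f)  # magnitude only; the sign follows the coefficient
print(f"partial F = {partial_f:.4f}, |t| = {t_stat:.3f}, P = {two_sided_p(t_stat, 29):.3f}")
```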
Table 7. Coefficients (unstandardized and standardized)
Conclusion:
The coefficients table (Table 7) may be used to predict the diversity of Mallophaga for a given abundance of duck hosts.
Note, however, that the regression is weak: the R² value of the model indicates that only 13% of the variance in Mallophaga diversity is explained. This suggests that other factors have a greater influence on Mallophaga diversity; however, these factors lie outside the scope of this analysis.