L1 Introduction To Multivariate Analysis PDF
WELCOME!
Lecture 1: Introduction
Introduction
- Bradley Efron (1938 – )
REVIEW
$n \ge 16 \times \left(\dfrac{\sigma}{d}\right)^2$
*Lehr, R.: Sixteen s-squared over d-squared: a relation for crude sample size estimates. Statistics in Medicine 11: 1099-1102, 1992.
*Cohen, J.: Statistical Power Analysis for the Behavioral Sciences, second edition, Academic Press, New York, 1988.
Review
What sample size for each group should be taken to satisfy the
above conditions?
Review Table 1-1, page 10 of the text
About 65 0.8
Sample size formula: $n \ge 16 \times \left(\dfrac{\sigma}{d}\right)^2 = 16 \times \left(\dfrac{20}{10}\right)^2 = 64$
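The arithmetic above can be checked with a short Python sketch. The function name `lehr_sample_size` and its argument names are illustrative choices, not part of the lecture. The rule of thumb is usually stated for a two-sided 5% test with 80% power, so the result is only a crude per-group estimate.

```python
import math


def lehr_sample_size(sigma: float, d: float) -> int:
    """Crude per-group sample size from the rule n >= 16 * (sigma / d)^2.

    sigma: assumed common standard deviation of the two groups
    d:     difference in means that the study should be able to detect
    """
    if sigma <= 0 or d == 0:
        raise ValueError("sigma must be positive and d must be nonzero")
    # Round up, because a sample size must be a whole number of subjects.
    return math.ceil(16 * (sigma / d) ** 2)


# Values from the review example: sigma = 20, d = 10.
n_per_group = lehr_sample_size(sigma=20, d=10)
print(n_per_group)  # 64
```

Halving the detectable difference quadruples the required sample size, because d enters the formula squared.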
Multivariate Data
Multiple Linear Regression: fit a plane to a set of points in three dimensions, $Y = X_1 + X_2$.
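As a rough illustration of fitting a plane, the sketch below estimates an intercept and two slopes by ordinary least squares using only the Python standard library. The slide's $Y = X_1 + X_2$ is schematic. The explicit coefficients `b0`, `b1`, `b2`, the helper names, and the toy data are all assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_plane(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Toy data lying exactly on the plane y = 1 + 2*x1 - 0.5*x2 (made up for illustration).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [1.0 + 2.0 * u - 0.5 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_plane(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

In practice a statistics package would be used instead. The normal equations are written out here only to show what "fitting a plane" computes.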
Multivariate Linear Regression:
$Y_1, Y_2, Y_3, \ldots, Y_m = X_1 + X_2 + X_3 + \cdots + X_p$
The variate
Example. Variate:
𝑂𝑣𝑒𝑟𝑎𝑙𝑙 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 =
• overall satisfaction
• the likelihood to recommend
• the probability of purchasing again
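A minimal sketch of this example, assuming the usual definition of a variate as a weighted linear combination of variables. The weights and respondent scores below are hypothetical and chosen only for illustration. In a real analysis the weights are estimated by the multivariate technique, not set by hand.

```python
def variate_value(scores, weights):
    """Return the weighted sum of the scores, one weight per variable."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same variables")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical weights for the three components of overall performance.
weights = {
    "overall_satisfaction": 0.5,
    "likelihood_to_recommend": 0.3,
    "probability_of_purchasing_again": 0.2,
}

# Hypothetical scores for one respondent, each on a 0-10 scale.
respondent = {
    "overall_satisfaction": 8.0,
    "likelihood_to_recommend": 7.0,
    "probability_of_purchasing_again": 9.0,
}

overall_performance = variate_value(respondent, weights)
print(round(overall_performance, 2))
```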
𝑂𝑣𝑒𝑟𝑎𝑙𝑙 𝑠𝑎𝑡𝑖𝑠𝑓𝑎𝑐𝑡𝑖𝑜𝑛 =
Validity and reliability: dart board analogy
[Figure: dart board panels illustrating validity and reliability; the plotted points were not recoverable]
Multivariate Data
Independent variables:
• Age
• Education
• Physical activity level
• Candy intake
Dependent variable (one response variable):
• Body Mass Index
• Age (numerical)
• Education (ordinal)
• Physical activity level (numerical)
• Candy intake (ordinal)
• Body Mass Index (numerical)
A classification of multivariate techniques (M = metric, N = nonmetric)
[Figure: Dependence Techniques and Interdependence Techniques]
Exercise.
SERVER 1, 4, 9
FOOD 5, 8*
Variables
Cases/Observations/Subjects 𝒀𝟏 𝒀𝟐 𝒀𝟑 … 𝒀𝒎 𝑿𝟏 𝑿𝟐 𝑿𝟑 … 𝑿𝒑
1 67.3 4.2 158.6 … M 0.114 …
2 71.3 … M …
3 65.8 … F …
… … …
n 68.7 … M …
Multivariate Statistical Methods
• Missing data
• Outliers
• Multicollinearity
Missing data
Data layout (“.” = missing value):
Variables
Cases 𝑿𝟏 𝑿𝟐 𝑿𝟑 𝑿𝟒
1 72.5 M 3.44 1456
2 . F 2.72 .
3 54.8 M 3.94 2127
4 65.8 M . 1548
Important considerations:
Major breakthroughs in the handling of missing data came in the 1970s with the advent of
and
• Multiple imputation
Availability of these methods in statistical packages did not occur until the 1990s.
Missing data
General guidelines:
(i) Any variable or case with 50% or more missing values should be deleted.
(ii) Under 10% missing data for an individual case can generally be ignored if the
missingness is random.
(iii) If less than 25% of the data are missing, then parameter estimates will generally be
accurate*.
*Demirtas, H., Freels, S.A., & Yucel, R.M. (2008). Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: a simulation assessment. Journal of Statistical Computation and Simulation, 78, 69-84.
Missing data
SIMPLE METHODS FOR HANDLING MISSING DATA
Listwise deletion: calculations based only on complete cases (cases with nonmissing values for all variables).
Variables
Case 𝑿𝟏 𝑿𝟐 𝑿𝟑 𝑿𝟒
1 72.5 M . 227
2 68.3 F 2.72 185
3 . M 3.94 178
4 59.7 M 3.54 278
Example:
Pairwise deletion: $\bar{X}_1 = 66.833$ is based on 3 observations, $\bar{X}_4 = 217.0$ is based on 4 observations.
Listwise deletion: all variable means are based on 2 observations.
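The two deletion schemes in this example can be reproduced with a short sketch. The data are the four cases from the table above, with `None` standing in for the “.” marker. The function names are illustrative, and only the numeric variables X1 and X4 are averaged.

```python
# Four cases from the example table; None marks a missing value.
rows = [
    {"X1": 72.5, "X2": "M", "X3": None, "X4": 227},
    {"X1": 68.3, "X2": "F", "X3": 2.72, "X4": 185},
    {"X1": None, "X2": "M", "X3": 3.94, "X4": 178},
    {"X1": 59.7, "X2": "M", "X3": 3.54, "X4": 278},
]


def pairwise_mean(rows, variable):
    """Mean of one variable using every case where that variable is observed."""
    observed = [r[variable] for r in rows if r[variable] is not None]
    return sum(observed) / len(observed), len(observed)


def listwise_mean(rows, variable):
    """Mean of one variable using only cases that are complete on all variables."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    return sum(r[variable] for r in complete) / len(complete), len(complete)


pw_x1, pw_n1 = pairwise_mean(rows, "X1")
pw_x4, pw_n4 = pairwise_mean(rows, "X4")
lw_x1, lw_n1 = listwise_mean(rows, "X1")
lw_x4, lw_n4 = listwise_mean(rows, "X4")

print(round(pw_x1, 3), pw_n1)  # pairwise mean of X1 and its sample size
print(round(pw_x4, 3), pw_n4)  # pairwise mean of X4 and its sample size
print(round(lw_x1, 3), lw_n1, round(lw_x4, 3), lw_n4)
```

Only cases 2 and 4 are complete, so listwise deletion discards half of this small data set, while pairwise deletion uses a different sample size for each variable.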
Outliers
(i) Never omit an outlier without a strong justification for doing
so.
*Tabachnick, B.G. and Fidell, L.S.: Using Multivariate Statistics, 5th edition, 2007, page 88.
**Myers, R.H.: Classical and Modern Regression With Applications, second edition, PWS-Kent Publishing Co., Boston, MA, 1990, page 369.
Multicollinearity, continued
Simplest solutions:
RED