Methods notes
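A confidence interval for a mean needs the sample mean, SD, and sample size (plus a critical
value for the chosen confidence level). A 95% CI is about the procedure, not one interval: if we
drew many samples and built a CI from each, about 95% of those intervals would include the
population mean.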
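As a sketch of the repeated-sampling reading, the Python below builds a t-based CI from exactly those ingredients and then checks its coverage by simulation. The population (normal, mean 50, SD 10), n = 30, the seed, and the critical value 2.045 (t for df = 29, 95% two-sided) are illustrative assumptions, and `mean_ci` is a hypothetical helper name. The share of intervals that capture the true mean should come out close to 0.95.

```python
import math
import random
from statistics import mean, stdev

def mean_ci(sample_mean, sd, n, crit):
    """Confidence interval for a mean from mean, SD, n, and a critical value.

    crit is the t critical value for df = n - 1 (about 2.045 for n = 30 at 95%).
    """
    half_width = crit * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Monte Carlo check of the repeated-sampling interpretation (hypothetical population).
rng = random.Random(123)
mu, sigma, n, reps = 50.0, 10.0, 30, 2000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = mean_ci(mean(sample), stdev(sample), n, crit=2.045)
    hits += lo <= mu <= hi  # True counts as 1
coverage = hits / reps  # share of intervals that captured mu
```

Any single interval either contains mu or it does not; the 95% describes the long-run hit rate of the procedure.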
The logarithm and square root transformations are commonly used for
positive data, and the multiplicative inverse (reciprocal) transformation
can be used for non-zero data. The power transformation is a family of
transformations parameterized by a real value λ that includes the logarithm (λ = 0),
square root (λ = 1/2), and multiplicative inverse (λ = −1) as special cases.
If values are naturally restricted to be in the range 0 to 1, not including
the end-points, then a logit transformation may be appropriate: this
yields values in the range (−∞,∞).
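A minimal Python sketch of both ideas, assuming the Box-Cox parameterization of the power transformation: (x^λ − 1)/λ for λ ≠ 0 and log x at λ = 0. That form is a linear rescaling of x^λ, so λ = 1/2 and λ = −1 act like the square root and the reciprocal; dividing by a negative λ also keeps the data in their original order. The logit is log(p / (1 − p)). The function names are hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform for positive x.

    lam = 0 gives log(x); otherwise (x**lam - 1) / lam, a rescaled x**lam
    (lam = 0.5 behaves like sqrt(x), lam = -1 like 1/x with order preserved).
    """
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(x)  # limit of (x**lam - 1) / lam as lam -> 0
    return (x ** lam - 1) / lam

def logit(p):
    """Map a proportion strictly between 0 and 1 onto (-inf, inf)."""
    if not 0 < p < 1:
        raise ValueError("logit requires 0 < p < 1")
    return math.log(p / (1 - p))
```

Defining the λ = 0 case as the log keeps the family continuous, because (x^λ − 1)/λ approaches log x as λ approaches 0.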
OLS assumptions:
⁃ Weak exogeneity. This essentially means that the predictor
variables x can be treated as fixed values, rather than random
variables. This means, for example, that the predictor variables are
assumed to be error-free—that is, not contaminated with measurement
errors.
⁃ Linearity. This means that the mean of the response variable is
a linear combination of the parameters (regression coefficients) and
the predictor variables.
⁃ Constant variance (a.k.a. homoscedasticity). This means that
the variance of the errors does not depend on the values of the
predictor variables.
⁃ Independence of errors. This assumes that the errors of the
response variables are uncorrelated with each other.
⁃ Lack of perfect multicollinearity in the predictors.
If more than 20% of a case's data are missing, delete the case. If less, replace each missing
value with the median of that person's responses to similar items.
If more than 20% of a variable's data are missing, delete the variable. If less, replace missing
values as above (the median of that case's responses to similar items).
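A minimal Python sketch of these two rules, under stated assumptions: "similar responses" is read as the same person's other items on the same construct (supplied through a hypothetical `construct_of` mapping), cases are screened before variables, and exactly 20% missing is kept because the rule says "more than". The function name `clean_survey` and the example data are hypothetical.

```python
from statistics import median

def clean_survey(rows, construct_of, threshold=0.20):
    """Apply the 20% deletion rules, then median-impute remaining gaps.

    rows: list of dicts mapping item name -> response (None = missing).
    construct_of: hypothetical mapping item name -> construct, used to decide
    which items count as "similar responses".
    """
    items = list(construct_of)
    # 1. Delete cases missing more than `threshold` of the items.
    kept = [r for r in rows
            if sum(r[i] is None for i in items) / len(items) <= threshold]
    # 2. Delete variables missing in more than `threshold` of the remaining cases.
    kept_items = [i for i in items
                  if kept and sum(r[i] is None for r in kept) / len(kept) <= threshold]
    # 3. Impute each remaining gap with the median of that person's other
    #    answers on the same construct (left as None if there are none).
    cleaned = []
    for r in kept:
        row = {}
        for i in kept_items:
            value = r[i]
            if value is None:
                peers = [r[j] for j in kept_items
                         if j != i and construct_of[j] == construct_of[i] and r[j] is not None]
                value = median(peers) if peers else None
            row[i] = value
        cleaned.append(row)
    return cleaned, kept_items

# Hypothetical 5-item survey with two constructs
construct_of = {"sat1": "SAT", "sat2": "SAT", "sat3": "SAT", "tr1": "TR", "tr2": "TR"}
rows = [
    {"sat1": 4, "sat2": None, "sat3": 5, "tr1": 3, "tr2": 2},     # 20% missing: kept
    {"sat1": None, "sat2": None, "sat3": 3, "tr1": 2, "tr2": 2},  # 40% missing: deleted
    {"sat1": 2, "sat2": 3, "sat3": 2, "tr1": 5, "tr2": 4},
    {"sat1": 5, "sat2": 5, "sat3": 4, "tr1": None, "tr2": 3},
    {"sat1": 3, "sat2": 3, "sat3": 3, "tr1": 4, "tr2": 4},
    {"sat1": 1, "sat2": 2, "sat3": 2, "tr1": 3, "tr2": 3},
]
cleaned, kept_items = clean_survey(rows, construct_of)
```

The order of steps 1 and 2 matters: deleting incomplete cases first changes the denominators used for the variable rule.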
The best way to deal with missing values is to avoid them in the first
place. If you can prevent missing values, then you don’t have to handle
them later. Missing values are often a result of:
1. Survey fatigue, which can be prevented by using a shorter survey
and short questions.
2. Question ambiguity, which can be prevented by thoroughly
pretesting your measures.
3. Response reluctance, which can be prevented by careful wording
so that no question feels threatening.
4. Unfinished survey, which cannot be prevented, but you can still
put your most important questions (such as DVs) at the beginning.
A univariate outlier is an unusual, unexpected, or out-of-scope value for
a single variable (whereas multivariate outliers are unusual or
unexpected cases with regard to the relationship among multiple
variables).
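One common way to operationalize the distinction, sketched in Python: flag univariate outliers by |z| > 3, and flag bivariate outliers by squared Mahalanobis distance compared against a chi-square(df = 2) cutoff at p < .001. Both cutoffs are widely used conventions assumed here, not rules from these notes; the sketch handles only two variables, and the data are hypothetical. Case 20 (x = 15, y = 5) is ordinary on each variable alone but breaks the strong x-y relationship.

```python
import math
from statistics import mean, stdev

def z_outliers(values, cutoff=3.0):
    """Indices of univariate outliers: |z| above a rule-of-thumb cutoff."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) / s > cutoff]

def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the bivariate mean."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if x and y are perfectly collinear
    # Entries of the inverse of the covariance matrix [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
            for x, y in zip(xs, ys)]

# Chi-square(df = 2) critical value at p = .001; for df = 2, P(D2 > c) = exp(-c / 2)
CHI2_DF2_P001 = -2 * math.log(0.001)  # about 13.82

# Hypothetical data: y tracks x closely, except case 20 (x = 15, y = 5)
xs = list(range(1, 21)) + [15]
ys = [x + (1 if x % 2 else -1) for x in range(1, 21)] + [5]
d2 = mahalanobis_sq(xs, ys)
flagged = [i for i, d in enumerate(d2) if d > CHI2_DF2_P001]
```

Here neither `z_outliers(xs)` nor `z_outliers(ys)` flags anything, while the distance-based check singles out case 20.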
Experimental Designs
Single group design
o XO
o OXO
o X is an intervention
o O is an observation
o In S&S speak, these are “one-shot case study” and “one-group
pretest-posttest design”
Single Group Threats
o History
o Maturation
o Testing (pre-post design only)
   o Did subjects “learn” how to answer questions?
o Instrumentation (pre-post design only)
   o Did any change occur during the study in the way the
     dependent variable was measured?
o Mortality (attrition)
o Regression to the mean
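To see why regression to the mean threatens a one-group pretest-posttest design, the Python sketch below models each score as a stable true score plus independent measurement error, applies no intervention at all, and then looks only at the top 10% of pretest scorers. All numbers (true-score mean 100, SDs of 10, 5,000 hypothetical people, the seed) are assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical measurement model: observed score = true score + independent error
rng = random.Random(1)
true_scores = [rng.gauss(100, 10) for _ in range(5000)]
pre = [t + rng.gauss(0, 10) for t in true_scores]
post = [t + rng.gauss(0, 10) for t in true_scores]  # no intervention between O and O

# Select the people who scored in the top 10% at pretest
cut = sorted(pre)[int(0.9 * len(pre))]
selected = [i for i, p in enumerate(pre) if p >= cut]
pre_mean = mean(pre[i] for i in selected)
post_mean = mean(post[i] for i in selected)
```

The selected group's posttest mean falls back toward 100 even though nothing happened, which an O X O design could mistake for a treatment effect.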
Static Group comparison
o XO
o O
o X is an intervention
o O is an observation
o Largely controls for history, testing, instrumentation, and
regression to the mean
o Other internal validity threats persist, most notably selection (the groups are not randomly assigned)
Completely randomized design
o R X O vs. R O
o R O X O vs. R O O
o True experimental designs
o X is an intervention
o O is an observation
o R is random assignment
o What can’t be addressed is whether randomization did its job
(top, since there is no pretest), the effect of the pretest itself (bottom), or whether there is an
interaction b/w pretest and IV (bottom)
o Solomon four-group design
o R O1 X O2 vs. R O3 O4 vs. R X O5 vs. R O6
o Can examine effect of IV, pretesting alone, interaction b/w
pretest and IV, and randomization (O1 vs. O3)
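The "R" step can be sketched in Python as shuffle-then-deal, which gives every participant the same chance of each condition and keeps group sizes within one of each other. The function name, the seed, the 40 hypothetical participants, and the four Solomon-style condition labels are assumptions for illustration.

```python
import random
from collections import Counter

def randomize(participants, conditions, seed=None):
    """Completely randomized design: shuffle participants, then deal them into
    conditions in turn so group sizes differ by at most one."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    return {p: conditions[k % len(conditions)] for k, p in enumerate(pool)}

# Hypothetical Solomon four-group study with 40 participants
groups = ["pretest + X", "pretest + control", "X only", "control only"]
assignment = randomize(range(1, 41), groups, seed=7)
sizes = Counter(assignment.values())
```

Dealing from a shuffled list, rather than flipping a coin per person, avoids badly unequal group sizes in small samples.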
Randomized block design
o Certain characteristics can’t be randomized and might be seen
as a nuisance variable (e.g., job type, sex, health)
o Doesn’t immediately randomly assign participants to
experimental conditions
o We first sort them into blocks on a characteristic (the blocking variable)
that we expect to affect the DV, then randomly assign within each block
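A minimal Python sketch of blocking followed by within-block randomization, assuming job type as a hypothetical nuisance variable and two conditions. Each block is shuffled and dealt separately, so every job type ends up balanced across treatment and control; the names and data are illustrative only.

```python
import random
from collections import defaultdict

def block_randomize(participants, block_of, conditions, seed=None):
    """Randomized block design: group participants by a blocking variable,
    then shuffle and deal conditions separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in participants:
        blocks[block_of[p]].append(p)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for k, p in enumerate(members):
            assignment[p] = conditions[k % len(conditions)]
    return assignment

# Hypothetical data: blocking on job type
block_of = {"ann": "office", "bo": "office", "cy": "office", "di": "office",
            "ed": "field", "fay": "field", "gus": "field", "hal": "field"}
assignment = block_randomize(list(block_of), block_of, ["treatment", "control"], seed=3)
```

Because each block is balanced by design, differences between job types cannot pile up in one condition by chance.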
Latin square design
o Weird
Matched pairs design
o Comparing 2 treatments with same experimental units (i.e.,
participants)
o Everybody gets both treatments
o Randomize treatment order & then compare performance on
DV
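A sketch of both steps in Python: counterbalance treatment order at random (half A then B, half B then A), then compare performance with a paired-samples t statistic on each person's difference score. The paired t test is one common choice of analysis assumed here, and the participant IDs and DV scores are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def counterbalanced_orders(participants, seed=None):
    """Randomly give half the participants order A then B and half B then A."""
    rng = random.Random(seed)
    orders = [("A", "B"), ("B", "A")] * (len(participants) // 2 + 1)
    orders = orders[:len(participants)]  # one extra A-first slot if n is odd
    rng.shuffle(orders)
    return dict(zip(participants, orders))

def paired_t(scores_a, scores_b):
    """Paired-samples t statistic on within-person differences (df = n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical study with six participants
people = ["p1", "p2", "p3", "p4", "p5", "p6"]
orders = counterbalanced_orders(people, seed=11)
t = paired_t([12, 13, 11, 14, 10, 12], [10, 10, 10, 10, 10, 10])
```

Working with difference scores removes stable person-to-person differences, which is the main payoff of giving everybody both treatments.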
Similar experimental units design (matched samples)
o Similar to a block design, but the block is a continuous variable or a set of variables
o Sleep deprivation and test performance
o Can’t have the same subject in both the sleep-deprivation and control groups (they would
know what’s on the test the second time)
o May match them on some nuisance variable (like in block designs)
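One simple way to build matched samples, sketched in Python under assumptions: sort units on a single continuous nuisance variable (a hypothetical baseline GPA), pair neighbours, and randomly send one member of each pair to sleep deprivation and the other to control. Matching on a set of variables would need a distance measure instead of a sort; that extension is not shown. Names and data are hypothetical.

```python
import random

def match_and_assign(scores, seed=None):
    """Pair units with adjacent values of a continuous nuisance variable, then
    randomly assign one member of each pair to each condition.

    With an odd number of units, the last one in sorted order stays unmatched.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    assignment = {}
    for a, b in zip(ranked[::2], ranked[1::2]):
        first, second = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[first] = "sleep_deprived"
        assignment[second] = "control"
    return assignment

# Hypothetical baseline GPA used as the matching variable
gpa = {"p1": 3.9, "p2": 2.1, "p3": 3.8, "p4": 2.0}
assignment = match_and_assign(gpa, seed=5)
```

Here p4 is paired with p2 and p3 with p1, so each condition gets one lower-GPA and one higher-GPA student.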
Simulations
• Robustness of a technique or finding can be tested via simulation
– You create the population parameters
– Sample characteristics (e.g., N)
– Run repeated trials to determine difference between sample statistics and
population parameters
• Monte Carlo studies are dependent on the representativeness of the conditions
modeled
• One potential concern is that “the constructed model may not reflect real-world
conditions.” If true, “even the most elegantly designed study may not be informative if the conditions
included are not relevant to the type of data one typically encounters in practice” (Bandalos &
Gagne, 2012, p. 96).
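A small Monte Carlo study following those steps, sketched in Python: we set the population parameter (a normal population with SD 10), choose a sample characteristic (N = 5 vs. N = 50), run repeated trials, and compare the average sample SD with the known parameter. The scenario, the replication count, and the seed are assumptions for illustration. For normal data the sample SD is known to underestimate σ on average, by about 6% at N = 5, so the small-N bias should come out clearly negative and the large-N bias near zero.

```python
import random
from statistics import mean, stdev

def mc_sd_bias(n, sigma=10.0, reps=2000, seed=0):
    """Monte Carlo estimate of the bias E[s] - sigma of the sample SD.

    The population (normal, mean 0, SD sigma) and the sample size n are set by
    us, so the average sample statistic can be compared with the true parameter.
    """
    rng = random.Random(seed)
    estimates = [stdev([rng.gauss(0, sigma) for _ in range(n)]) for _ in range(reps)]
    return mean(estimates) - sigma

bias_small = mc_sd_bias(n=5)
bias_large = mc_sd_bias(n=50)
```

The same caveat applies here: the result only speaks to normal populations at these sample sizes, so the conditions should resemble the data one actually expects to analyze.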