Workshop: R For Statistical Analysis
Pre-workshop setup
For Windows:
For Mac:
Learning Objectives
Learn how to identify the types of variables in R
Learn the basic commands for descriptive statistics
Learn the basic commands for inferential statistics
Data refers to facts or pieces of information that can either be quantitative or qualitative.
Variable refers to any property that can be observed or measured.
Types of Variables
It is important to understand the different types of variables because the type of a variable determines which
statistical analysis method is appropriate.
Interval: Numeric values with equal spacing between them and no absolute zero. Example: SAT scores
Ratio: Numeric values with equal spacing between them and an absolute zero. Example: Age
Definitions
Character: Text
Factor: Integer associated with a specific category
Numeric: Number with decimal point
Integer: Number with no decimal point
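A minimal sketch of how these four types look in R. The vectors below are made-up illustrative values, not part of the workshop data; class() reports the type R has assigned to each one.

```r
# Illustrative vectors (hypothetical values, not from the workshop data)
name   <- c("Ana", "Ben", "Cy")            # character: text
group  <- factor(c("low", "high", "low"))  # factor: categories stored as integer codes
height <- c(1.62, 1.80, 1.75)              # numeric: numbers with a decimal point
count  <- c(3L, 5L, 2L)                    # integer: the L suffix marks whole numbers

class(name)
class(group)
class(height)
class(count)

# A factor keeps its category labels in levels() and integer codes underneath
levels(group)
as.integer(group)
```

By default the levels of a factor are sorted alphabetically, so "high" is stored as code 1 and "low" as code 2 here.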
Getting Started
Set working directory in RStudio
You can set the working directory using Session > Set Working Directory > Choose Directory.
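The same step can be done from the console with setwd(). This is only a sketch: workshop_dir is a hypothetical placeholder (tempdir() is used so the lines run anywhere) and should be replaced with the folder that holds sat.act.csv.

```r
# Hypothetical placeholder; replace with the folder that contains sat.act.csv
workshop_dir <- tempdir()

old_dir <- setwd(workshop_dir)  # setwd() returns the previous working directory
getwd()                         # confirm where R is now looking for files
list.files()                    # files visible from the working directory

setwd(old_dir)                  # go back to where we started
```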
3 Measures Of Ability: SATV, SATQ, ACT: "Self reported scores on the SAT Verbal, SAT Quantitative and ACT
were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality
assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here
as a demonstration set for correlation and analysis" (Revelle et al., 2009).
(https://www.rdocumentation.org/packages/psych/versions/1.9.12.31/topics/sat.act)
library(psych)
scores <- read.csv("sat.act.csv")
is.factor(scores$gender)
TRUE
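The TRUE above presumably means gender is already stored as a factor. If read.csv() brings it in as numeric codes instead, a conversion along the following lines would be needed. This is a sketch on a small stand-in data frame with made-up values; the coding 1 = male, 2 = female is an assumption taken from the sat.act documentation.

```r
# Small stand-in for the workshop data (hypothetical values);
# in the workshop, scores comes from read.csv("sat.act.csv")
scores <- data.frame(gender = c(1, 2, 2, 1, 2),
                     SATV   = c(500, 650, 700, 580, 610))

is.factor(scores$gender)   # numeric codes are not a factor yet

# Assumed coding: 1 = male, 2 = female
scores$gender <- factor(scores$gender, levels = c(1, 2),
                        labels = c("male", "female"))

is.factor(scores$gender)   # now a factor
table(scores$gender)       # number of rows in each category
str(scores)                # type of every column at a glance
```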
Extra information
Exercise #1
Using the as.factor command, change 'education' to a factor.
Using the is.factor command, check whether 'education' is defined as a factor.
Answer to Exercise #1
scores$education <- as.factor(scores$education)
is.factor(scores$education)
TRUE
Descriptive Statistics
Descriptive statistics summarize the data in a meaningful way. The purpose of using descriptive statistics is to
explore the observed data and not to draw inferences.
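hist(scores$SATV)
# Redraw on the density scale so that a normal curve can be overlaid
hist(scores$SATV, freq = FALSE)
x <- 200:800
# Normal density with mean 612.23 and standard deviation 112.90
y <- dnorm(x = x, mean = 612.23, sd = 112.90)
lines(x = x, y = y)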
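The mean and standard deviation passed to dnorm() above can also be computed from the data rather than typed in by hand. The sketch below does this on a simulated stand-in for scores$SATV (hypothetical values, so the numbers will not match the workshop output).

```r
set.seed(1)
# Simulated stand-in for the workshop data (hypothetical values)
scores <- data.frame(SATV = round(rnorm(200, mean = 600, sd = 110)))

summary(scores$SATV)                  # min, quartiles, median, mean, max
m <- mean(scores$SATV, na.rm = TRUE)  # sample mean, ignoring missing values
s <- sd(scores$SATV, na.rm = TRUE)    # sample standard deviation

# Histogram on the density scale with the fitted normal curve on top
hist(scores$SATV, freq = FALSE)
curve(dnorm(x, mean = m, sd = s), add = TRUE)
```

Using na.rm = TRUE matters for columns with missing values; without it, mean() and sd() return NA.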
Inferential Statistics
Unlike descriptive statistics, inferential statistics use the observed data to make inferences about the population.
In this workshop, we will cover four parametric tests: independent t-test, one-way ANOVA, Pearson's correlation,
and simple linear regression. These tests are called parametric because they assume that the data follow a specific
probability distribution (typically the normal distribution).
Model assumptions
Common model assumptions found in parametric tests:
1. Independence
2. Normality
3. Equal variance
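A rough sketch of how the second and third assumptions might be checked in R, using simulated stand-in data (hypothetical values). The choice of the Shapiro-Wilk and Bartlett tests is an assumption of this example, not something prescribed by the workshop; independence depends on how the data were collected and is not tested here.

```r
set.seed(2)
# Simulated stand-in for the workshop data (hypothetical values)
scores <- data.frame(
  gender = factor(rep(c("male", "female"), each = 50)),
  SATV   = round(rnorm(100, mean = 600, sd = 110))
)

# Normality: Shapiro-Wilk test within each group (H0: the data are normal)
normality <- tapply(scores$SATV, scores$gender,
                    function(v) shapiro.test(v)$p.value)
normality

# Equal variance: Bartlett's test (H0: the group variances are equal)
equal_var <- bartlett.test(SATV ~ gender, data = scores)
equal_var$p.value
```

In both tests a small p-value is evidence against the assumption, so large p-values are what we hope to see.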
Both Pearson's correlation and Simple linear regression have some additional assumptions. For more
information, click on the following links:
Independent T-test
It is used to test whether the mean of a numeric variable differs between two groups.
For example, do males and females have different average SAT verbal scores?
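A sketch of the test call, run on simulated stand-in data (hypothetical values, so the output will differ from the workshop results). The 698 degrees of freedom reported in the conclusion below equal 700 - 2, which suggests the pooled-variance form of the test, so var.equal = TRUE is assumed here; R's default is the Welch test, which does not assume equal variances.

```r
set.seed(3)
# Simulated stand-in for the workshop data (hypothetical values)
scores <- data.frame(
  gender = factor(rep(c("male", "female"), times = c(40, 60))),
  SATV   = round(rnorm(100, mean = 600, sd = 110))
)

# Pooled-variance independent t-test: SATV compared across the two genders
result <- t.test(SATV ~ gender, data = scores, var.equal = TRUE)
result

result$statistic   # t
result$parameter   # df = n - 2 = 98 for this stand-in
result$p.value     # p-value
```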
t: the computed test statistic
df: degrees of freedom
p-value: the probability of observing a test statistic at least this extreme if the null hypothesis is true.
0.6187 is greater than α = 0.05, so we fail to reject the null hypothesis
Conclusion
There was no statistically significant difference in SAT verbal scores between males and females, t(698) = 0.50,
p = 0.62.
One-way ANOVA
It is used to determine whether the mean of a numeric variable differs among more than two groups.
For example, do SAT verbal scores significantly differ by education level (1 = HS, 2 = some college degree, 3 =
2-year college degree, 4 = 4-year college degree, 5 = graduate work)?
H0: Mean SATV of students who have HS degree = Mean SATV of students who have some college degree = ...
H1: At least one education level has a mean SATV that differs from the others
boxplot(scores$SATV ~ scores$education)
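A sketch of the ANOVA call on simulated stand-in data (hypothetical values, so the output will differ from the workshop results). Six education levels coded 0 to 5 are assumed, to match the 5 between-group degrees of freedom reported in the conclusion below.

```r
set.seed(4)
# Simulated stand-in for the workshop data (hypothetical values)
scores <- data.frame(
  education = factor(rep(0:5, each = 20)),   # must be a factor, not a number
  SATV      = round(rnorm(120, mean = 600, sd = 110))
)

fit <- aov(SATV ~ education, data = scores)
anova_table <- summary(fit)[[1]]   # the ANOVA table as a data frame
anova_table

anova_table[["Df"]]                # between-group and residual df
anova_table[["F value"]][1]        # F statistic
anova_table[["Pr(>F)"]][1]         # p-value
```

If education were left as a number, aov() would fit a single slope instead of comparing group means, which is why the as.factor step from Exercise #1 matters.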
Conclusion
There were no significant group differences in SAT verbal scores according to students' educational levels, F(5,
694) = 1.269, p = 0.275.
Extra information
Pearson's Correlation
It is used to examine the strength and direction of the linear relationship between two numeric variables.
cor.test(scores$SATV,scores$SATQ)
plot(scores$SATV,scores$SATQ)
abline(lm(scores$SATQ ~ scores$SATV)) #to add regression line
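cor.test() returns a list, so the numbers quoted in a write-up can be pulled out by name. A sketch on simulated stand-in data (hypothetical values, so the output will differ from the workshop results):

```r
set.seed(6)
# Simulated stand-in for the workshop data (hypothetical values)
SATV   <- round(rnorm(100, mean = 600, sd = 110))
scores <- data.frame(SATV = SATV,
                     SATQ = round(0.6 * SATV + rnorm(100, mean = 250, sd = 90)))

res <- cor.test(scores$SATV, scores$SATQ)  # Pearson's correlation by default

res$estimate   # Pearson's r
res$p.value    # p-value for H0: true correlation is 0
res$conf.int   # 95% confidence interval for r
```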
Conclusion
There was a statistically significant positive correlation between SAT verbal scores and SAT quantitative scores (r
= 0.644, p < 0.001).
Simple Linear Regression
Conclusion
ACT scores significantly predicted SAT verbal scores. We would expect a 13.13-point increase in SAT verbal
score for every one-point increase in ACT score.
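A sketch of how a slope like the one in this conclusion can be obtained with lm(). The data below are simulated stand-ins (hypothetical values), so the estimated slope will not be 13.13.

```r
set.seed(5)
# Simulated stand-in for the workshop data (hypothetical values)
ACT    <- round(rnorm(100, mean = 28, sd = 5))
scores <- data.frame(ACT  = ACT,
                     SATV = round(220 + 13 * ACT + rnorm(100, mean = 0, sd = 90)))

# Simple linear regression: SATV is the outcome, ACT the single predictor
fit <- lm(SATV ~ ACT, data = scores)
summary(fit)       # coefficients, standard errors, p-values, R-squared

coef(fit)["ACT"]   # slope: expected change in SATV per one-point change in ACT

# Scatterplot with the fitted regression line
plot(scores$ACT, scores$SATV)
abline(fit)
```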
Questions?
Reference(s):
Revelle, William, Wilt, Joshua, and Rosenthal, Allen. (2009). Personality and Cognition: The Personality-
Cognition Link. In Gruszka, Alexandra and Matthews, Gerald and Szymura, Blazej (Eds.) Handbook of Individual
Differences in Cognition: Attention, Memory and Executive Control, Springer.