Workshop: R For Statistical Analysis

The document provides an overview of an R workshop on statistical analysis. It outlines the learning objectives, types of variables in R, and basic commands for descriptive and inferential statistics. Descriptive statistics include functions like describe() and hist() to summarize data. Inferential statistics covered are the independent t-test, one-way ANOVA, Pearson's correlation, and simple linear regression, which make inferences about populations while meeting assumptions like independence and normality. An example t-test finds no significant difference in SAT verbal scores between males and females.

Uploaded by

Omeet Hannah


7/13/2020 R for Statistical Analysis

Workshop: R for Statistical Analysis

Data Analysis Team:

Matty Jullamon (GAA)

Amir Michalovich (GAA)
Jeremy Buhler (Data Librarian)
Sarah Parker (Data Librarian)

Pre-workshop setup

Download and install R

For Windows:

1. Visit R Project (https://www.r-project.org/) to learn about R versions.

2. Download and install R from your preferred CRAN mirror here (https://cran.r-project.org/mirrors.html)
A. Choose "0-Cloud" or a mirror site near you.

For Mac:

1. Check that your macOS system is up-to-date

2. Download and install R from The Comprehensive R Archive Network (https://cran.r-project.org/)

Download and install RStudio

For Windows and Mac:

1. Download and install RStudio from here (https://rstudio.com/products/rstudio/download/#download)

https://ubc.syzygy.ca/jupyter/user/mj2653/nbconvert/html/R for Statistical Analysis.ipynb?download=false 1/16


Learning Objectives
Learn how to identify the types of variables in R
Learn the basic commands for descriptive statistics
Learn the basic commands for inferential statistics

Overview of Quantitative Research Process

A systematic research process that involves collecting objective, measurable data, using statistics to analyze
the data, and generalizing the results to a larger population to explain a phenomenon. Usually, software programs
assist with data analysis.

Data Analysis in Quantitative Research

Definitions

Data refers to facts or pieces of information that can either be quantitative or qualitative.
Variable refers to any property that can be observed or measured.


Types of Variables

It is important to understand the different types of variables because they will determine the statistical analysis
method.

Type       Description                                                  Example
Nominal    Labels or descriptions that cannot be ordered                Gender
Ordinal    Labels or descriptions that can be ordered                   Education level
Interval   Numeric values with equal intervals and no absolute zero     SAT scores
Ratio      Numeric values with equal intervals and an absolute zero     Age

Categorize these variables in R

Nominal/Ordinal -> Character or Factor

Interval/Ratio -> Numeric or Integer

Definitions

Character: Text
Factor: Integer associated with a specific category
Numeric: Number with decimal point
Integer: Number with no decimal point
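
As a minimal sketch of how these four types look in practice, the block below builds a small data frame and asks R which type it assigned to each column. The data frame `toy` and all of its values are hypothetical, invented for illustration only; they are not part of the workshop dataset.

```r
# Hypothetical toy data; names and values are invented for illustration
toy <- data.frame(
  name      = c("Ana", "Ben", "Cy"),          # character
  education = factor(c("HS", "BA", "HS")),    # factor
  gpa       = c(3.5, 3.9, 2.8),               # numeric (decimal values)
  age       = c(19L, 22L, 25L),               # integer (the L suffix forces integer)
  stringsAsFactors = FALSE
)

# class() reports the type R assigned to each column
types <- sapply(toy, class)
print(types)

# A factor stores integer codes plus a set of labels (levels)
edu_levels <- levels(toy$education)     # labels, sorted alphabetically by default
edu_codes  <- as.integer(toy$education) # the integer code behind each value
```

The last two lines show why a factor is described as an integer associated with a category: each value is stored as a position in the list of levels.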

Getting Started
Set working directory in RStudio

You can set the working directory using Session > Set Working Directory > Choose Directory.

Loading a built-in R dataset


About the data

3 Measures Of Ability: SATV, SATQ, ACT: "Self reported scores on the SAT Verbal, SAT Quantitative and ACT
were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality
assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here
as a demonstration set for correlation and analysis" (Revelle et al., 2009).
(https://www.rdocumentation.org/packages/psych/versions/1.9.12.31/topics/sat.act)

Write and run the following commands to load the dataset

install.packages("psych") #install this if you haven't done so.

library(psych)

scores <- sat.act


In [ ]: scores <- read.csv("sat.act.csv")

Identifying and Renaming Variables

str(df): To check the structure of your data

Question: What do you notice?

as.factor(df$columnname): To change a variable to factor

In [64]: scores$gender <- as.factor(scores$gender)

is.factor(df$columnname): To check if a variable is defined as factor

In [62]: is.factor(scores$gender)

TRUE

Extra information

is.integer(df$columnname): To check if a variable is defined as integer

is.numeric(df$columnname): To check if a variable is defined as numeric

is.character(df$columnname): To check if a variable is defined as character

as.integer(df$columnname): To change a variable to integer

as.numeric(df$columnname): To change a variable to numeric

as.character(df$columnname): To change a variable to character
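
One conversion deserves a short illustration: calling as.numeric() on a factor returns the internal integer codes, not the numbers shown in the labels. The sketch below uses a hypothetical factor `f` (not a column of the workshop data) to show the difference and a common way around it.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

codes  <- as.numeric(f)                 # internal codes of the levels, not the labels
values <- as.numeric(as.character(f))   # convert to text first to recover the numbers

print(codes)
print(values)

# Text that is not a number becomes NA (R also emits a warning, silenced here)
bad <- suppressWarnings(as.numeric("abc"))
```

Running str() or summary() after a conversion is a quick way to confirm that the column now holds what you expect.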


Exercise #1
Using the as.factor command, change 'education' to a factor.
Using the is.factor command, check if 'education' is defined as a factor.

Answer to Exercise #1
In [75]: scores$education <- as.factor(scores$education)

In [76]: is.factor(scores$education)

TRUE

Descriptive Statistics

Descriptive statistics summarize the data in a meaningful way. The purpose of using descriptive statistics is to
explore the observed data and not to draw inferences.

We will use describe() from the psych package and hist() from base R to perform descriptive statistics.

describe(df): To obtain descriptive statistics for all variables

hist(df$columnname): To graphically describe the distribution using a histogram


In [78]: hist(scores$SATV)


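In [95]: # To add the distribution curve (mean and sd of SATV taken from the describe() output)

hist(scores$SATV, freq = FALSE) # freq = FALSE plots densities so the curve is on the same scale
x <- 200:800
y <- dnorm(x = x, mean = 612.23, sd = 112.90)
lines(x = x, y = y)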
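
A possible variation, sketched under the assumption that you would rather not type the mean and standard deviation by hand: compute them from the data and pass them to dnorm(). The vector `x_obs` below is simulated and purely hypothetical; with the workshop data you would substitute the column of interest.

```r
set.seed(1)                                  # make the simulated numbers reproducible
x_obs <- rnorm(200, mean = 600, sd = 100)    # hypothetical scores, for illustration only

m <- mean(x_obs, na.rm = TRUE)               # na.rm = TRUE skips missing values
s <- sd(x_obs, na.rm = TRUE)

hist(x_obs, freq = FALSE)                    # density scale, comparable to the curve
grid <- seq(min(x_obs), max(x_obs), length.out = 200)
lines(grid, dnorm(grid, mean = m, sd = s))   # normal curve with the sample mean and sd
```

Computing the values keeps the curve in step with the data if rows are added or removed later.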

Inferential Statistics

Unlike descriptive statistics, inferential statistics use the observed data to make inferences about the population.

In this workshop, we will cover four parametric tests: Independent t-test, One-way ANOVA, Pearson's correlation,
and Simple linear regression. These tests are called parametric because they assume that the data follow a
particular probability distribution, typically the normal distribution.


Model assumptions
Common model assumptions found in parametric tests:

1. Independence
2. Normality
3. Equal variance

Both Pearson's correlation and Simple linear regression have some additional assumptions. For more
information, click on the following links:

Pearson's correlation assumptions (https://www.statisticssolutions.com/correlation-pearson-kendall-spearman/)

Simple linear regression assumptions (https://www.statisticssolutions.com/assumptions-of-linear-regression/)
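
Two of the three common assumptions can be screened with functions from base R; independence is mainly a matter of how the data were collected. The following is a rough sketch on simulated, hypothetical groups `g1` and `g2`, not a prescribed workflow for this workshop.

```r
set.seed(42)
g1 <- rnorm(50, mean = 600, sd = 100)   # hypothetical group 1
g2 <- rnorm(50, mean = 610, sd = 100)   # hypothetical group 2

# Normality: Shapiro-Wilk test (H0: the sample comes from a normal distribution)
norm_p <- shapiro.test(g1)$p.value

# Equal variance: F test comparing two variances (H0: the variances are equal)
var_p <- var.test(g1, g2)$p.value

# Visual check: points close to the line suggest approximate normality
qqnorm(g1)
qqline(g1)

print(c(normality = norm_p, equal_variance = var_p))
```

A large p-value in either test means the data give no strong evidence against the assumption; it does not prove the assumption holds.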

Independent T-test
It is used to test whether the mean of a numeric variable differs between two independent groups.

For example, do males and females have different average SAT verbal scores?

H0: Mean SATV for males = Mean SATV for females

H1: Mean SATV for males != Mean SATV for females

In [ ]: #Write and run this command:

t.test(SATV ~ gender, data = scores, var.equal = TRUE)


Interpreting the results

t: the test statistic
df: degrees of freedom
p-value: statistical significance. Because 0.6187 is larger than α = 0.05, we fail to reject (retain) the null
hypothesis.

Conclusion

There was no statistically significant difference in SAT verbal scores between males and females, t(698) = 0.50,
p = 0.62.
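
To make the reported t, df, and p-value less of a black box, the sketch below recomputes an equal-variance t-test by hand and compares it with t.test(). The vectors `a` and `b` are hypothetical values invented for illustration.

```r
a <- c(5.1, 4.9, 6.2, 5.8, 5.5)   # hypothetical group A
b <- c(4.8, 5.0, 5.2, 4.6, 5.4)   # hypothetical group B

n1 <- length(a)
n2 <- length(b)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)

# t statistic: difference in means divided by its standard error
t_manual <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

res <- t.test(a, b, var.equal = TRUE)
print(c(manual = t_manual, t.test = unname(res$statistic)))
```

The degrees of freedom are n1 + n2 - 2, which is why a sample of 700 subjects yields df = 698 in the workshop example.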

One-way ANOVA
It is used to determine whether the mean of a numeric variable differs across more than two groups.

For example, do SAT verbal scores significantly differ by educational levels (1= HS, 2= some college degree, 3 =
2-year college degree, 4= 4-year college degree, 5= graduate work)?

H0: Mean SATV of students who have HS degree = Mean SATV of students who have some college degree = ...

H1: At least one educational level has a mean SATV that differs from the others.

In [ ]: #Write and run this command:

m1 <- aov(SATV ~ education, data = scores)

summary(m1)


Interpreting the results

Df: degrees of freedom
Sum Sq: sum of squares
Mean Sq: mean squares
F value: the test statistic
Pr(>F): statistical significance. Because 0.275 is larger than α = 0.05, we fail to reject (retain) the null
hypothesis. We do not have to run post hoc tests because the group differences are not significant.

Does this make sense?

Check using boxplot.

In [ ]: # Write and run this command:

boxplot(scores$SATV ~ scores$education)


Conclusion

There were no significant group differences in SAT verbal scores according to students' educational levels, F(5,
694) = 1.269, p = 0.275.
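
The F value in the ANOVA table is the ratio of between-group to within-group mean squares. As an illustrative sketch, the block below reproduces it by hand on a hypothetical data frame `toy` with three groups; the values are invented and unrelated to the workshop data.

```r
# Hypothetical data: three groups of three observations each
toy <- data.frame(
  score = c(4, 5, 6, 7, 8, 9, 5, 6, 7),
  group = factor(rep(c("g1", "g2", "g3"), each = 3))
)

fit <- aov(score ~ group, data = toy)
tab <- summary(fit)[[1]]                    # the ANOVA table as a data frame

grand <- mean(toy$score)                    # overall mean
means <- tapply(toy$score, toy$group, mean) # mean of each group
n_per <- tapply(toy$score, toy$group, length)

# Between-group and within-group sums of squares
ss_between <- sum(n_per * (means - grand)^2)
ss_within  <- sum((toy$score - means[as.character(toy$group)])^2)

k <- nlevels(toy$group)                     # number of groups
n <- nrow(toy)                              # number of observations
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))

print(c(manual = f_manual, aov = tab[["F value"]][1]))
```

The two degrees of freedom reported with F are k - 1 and n - k, matching the F(5, 694) format used in the conclusion above.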

Extra information

There are different types of post hoc tests
(https://www.rdocumentation.org/packages/DescTools/versions/0.99.36/topics/PostHocTest), but Tukey's HSD
is the most popular post hoc test for comparing multiple pairings.

In [2]: # R command for Tukey's HSD:

# TukeyHSD(aov(SATV ~ education, data = scores), conf.level = 0.95)
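
For a sense of what the output looks like, here is a small sketch that runs TukeyHSD() on a hypothetical three-group data frame `toy` (invented values, not the workshop data) and pulls out the table of pairwise comparisons.

```r
# Hypothetical data: three groups of three observations each
toy <- data.frame(
  score = c(4, 5, 6, 7, 8, 9, 5, 6, 7),
  group = factor(rep(c("g1", "g2", "g3"), each = 3))
)

fit <- aov(score ~ group, data = toy)
tk  <- TukeyHSD(fit, conf.level = 0.95)

# The result is a list with one matrix per factor in the model.
# Each row is a pair of groups; columns give the mean difference,
# the confidence interval bounds, and the adjusted p-value.
pairs <- tk$group
print(pairs)
```

With k groups there are k(k - 1)/2 pairwise comparisons, so three groups produce three rows.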

Pearson's Correlation
It is used to examine the linear relationship between two variables (represented by numeric data).

For example, is there a relationship between SATV and SATQ?

H0: There is no relationship between SATV and SATQ.

H1: There is a relationship between SATV and SATQ.

In [ ]: # Write and run this command:

cor.test(scores$SATV, scores$SATQ)


In [ ]: # Write and run this command to see scatterplot:

plot(scores$SATV, scores$SATQ)
abline(lm(scores$SATQ ~ scores$SATV)) # to add regression line

Conclusion

There was a statistically significant positive correlation between SAT verbal scores and SAT quantitative scores (r
= 0.644, p < 0.001).
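
Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks that definition against cor.test() using two short hypothetical vectors, `x` and `y`, invented for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)   # hypothetical variable 1
y <- c(2, 1, 4, 3, 6, 5)   # hypothetical variable 2

# Pearson's r from its definition
r_manual <- cov(x, y) / (sd(x) * sd(y))

# t statistic for H0: no linear relationship, with n - 2 degrees of freedom
n <- length(x)
t_manual <- r_manual * sqrt((n - 2) / (1 - r_manual^2))

ct <- cor.test(x, y)
print(c(manual = r_manual, cor.test = unname(ct$estimate)))
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, from 0 (none) to 1 (perfectly linear).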

Simple Linear Regression (for your reference)

It is used to explain/predict the phenomenon of interest based on one independent variable.

For example, do ACT scores predict SAT verbal scores?

In [ ]: # Write and run this command:

m2 <- lm(SATV ~ ACT, data = scores)

summary(m2)


Interpreting the results

The estimated regression line equation: SATV = 237.34 + 13.13(ACT scores)

31% of the variability in the SAT verbal scores was explained by ACT scores, the only predictor in the regression model.
The overall regression model significantly explained the SAT verbal scores.

Conclusion

ACT scores significantly predicted SAT verbal scores. We would expect a 13.13-point increase in SAT verbal
scores for every one-point increase in ACT score.
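
The fitted equation can also be used to predict the outcome for a new value of the predictor. A minimal sketch on a hypothetical data frame `toy` (invented values) follows; it assumes the model was fitted with a formula plus a data argument, because predict() matches the columns of newdata by name.

```r
# Hypothetical data, invented for illustration
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

fit <- lm(y ~ x, data = toy)

b  <- coef(fit)                  # intercept and slope
r2 <- summary(fit)$r.squared     # proportion of variance explained

# Predict y for a new observation with x = 6
new  <- data.frame(x = 6)
pred <- predict(fit, newdata = new)

# The same prediction from the regression equation: intercept + slope * x
manual <- b[["(Intercept)"]] + b[["x"]] * 6
print(c(predict = unname(pred), manual = manual))
```

Predictions are most trustworthy for predictor values inside the range that was observed when the model was fitted.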


Questions?


Reference(s):

Revelle, William, Wilt, Joshua, and Rosenthal, Allen. (2009). Personality and Cognition: The Personality-Cognition
Link. In Gruszka, Alexandra, Matthews, Gerald, and Szymura, Blazej (Eds.), Handbook of Individual
Differences in Cognition: Attention, Memory and Executive Control. Springer.
