Appendix R
The following is indicative R syntax for running the main analyses used in the data analysis
examples in this book. In Chapter 3 we show you the basic way of running the main statistical analyses
with SPSS; below we set out examples of how to run these in R. Those familiar with R will be aware
that there are many ways to run any one type of analysis. Here we suggest syntax that produces
most (if not all) of the output that SPSS produces. There may be more elegant ways to conduct the
analyses, but these examples are how the authors run these analyses with R.
Note that we run the syntax files from the same working directory that the data
files are stored in; that way we do not have to indicate the directory location of our datafiles here.
However, the working directory will obviously need to be adjusted to match the location where the
analyst has stored the datafiles. These examples will run using the datafiles indicated, assuming that
you have downloaded these from the Kogan Page web address: koganpage.com/PHRA2
To find out what working directory your R programme is currently running from, type:
getwd()
A simple way of managing your syntax and directory locations is to save the syntax file
in the same directory as your working directory and to save your datafile in that directory also.
To set the working directory you use the setwd("") command, e.g. here we are working off
our C: hard drive:
setwd("C:\\Nameofyourdesiredworkingdirectory\\")
setwd("C:\\Users\\MyRFiles\\Nameofyourdesiredworkingdirectory\\")
You can also select the directory interactively (using your mouse, on Windows) using the following command:
setwd(choose.dir())
You will find the R syntax and datafiles for every analysis conducted in the data analysis Chapters 4-
11 in the data and syntax download files available at the following web address:
koganpage.com/PHRA2
A syntax file (called "Appendix R Syntax.R") with all of the syntax set out below is included in the R
syntax files download.
#################################
# Appendix R1 #
#R Code Matching Figure 3.1: running a Crosstabs and Chi square
#read the datafile (here Appendix R1.dat) in working directory location and give name
#(here DataForChi) for the data that R is reading
DataForChi<-read.table("Appendix R1.dat",header=T,sep="\t")
#obtain names in the file
names(DataForChi)
#e.g. "Gender" "LeaverStatus" "Country"
#attach the datafile for analyses
attach(DataForChi)
#create Factor (call it a new name or leave it as is) and label values for categorical variables
LeaverNoYesFact <- factor(LeaverStatus,
levels = c(0,1),
labels = c("Stayer", "Leaver"))
#make a frequency table of this, call this table Freq1 here
Freq1=table(LeaverNoYesFact)
Freq1
#Create a table of this factor
PcntLeaver <- table(LeaverNoYesFact)
#ask for proportions of the levels in the factor
prop.table(PcntLeaver)
#follow same procedure with the other categorical variable that you are going to use:
CountryCode <- factor(Country,
levels = c(1,2,3,4,5,6,7,8,9,10),
labels = c("Belgium", "Sweden", "Italy", "France", "Poland", "Mexico", "Spain", "UK", "USA",
"Australia"))
Freq2=table(CountryCode)
Freq2
PcntCountry <- table(CountryCode)
prop.table(PcntCountry)
#produce basic frequency cross-tabulation across these two variables
CountryByLeaver=table(CountryCode , LeaverNoYesFact)
CountryByLeaver
#This gives you your marginal Freq of country (regardless of stayer V Leaver)
margin.table(CountryByLeaver,1)
#the following will give you the proportions of stayer V Leaver in each country:
prop.table(CountryByLeaver,1)
#This gives you your marginal Stayer V Leaver in your table
margin.table(CountryByLeaver,2)
#the following will give you the proportions of Country in each level of Stayer Versus Leaver
prop.table(CountryByLeaver,2)
#round this to 2 decimal places
round(prop.table(CountryByLeaver,2),digits=2)
#This gives you your Chi square test
chisq.test(CountryByLeaver)
##################################
# Appendix R2 #
#R Code Matching Figure 3.2: running a Binary Logistic Regression
#read the datafile (here Appendix R2.txt) in working directory location and give name (here
#DataForLOG) for the data that R is reading
DataForLOG<-read.table("Appendix R2.txt",header=T,sep="\t")
#obtain names in the read file
names(DataForLOG)
# e.g. some of the variables in this dataset:
#"ApplicantCode" "Gender" "BAMEyn" "ShortlistedNY"
#make the dataset live
attach(DataForLOG)
#you then set up (and name, modelFig32 here) the logistic regression model
#notice the glm is the key command here, then the regression formula is set out: DV~IV1 + IV2
modelFig32 <- glm(ShortlistedNY ~ Gender + BAMEyn, family=binomial("logit"))
#this gives results of the logistic model
modelFig32
summary(modelFig32)
#for odds ratios:
exp(coef(modelFig32))
#for some Logistic regression stats (Nagelkerke and pseudo R sq) you need to install package
#rcompanion
#install.packages("rcompanion") if not installed already
library(rcompanion)
nagelkerke(modelFig32, null = NULL, restrictNobs = TRUE)
#################################
# Appendix R3 #
#R Code Matching Figure 3.5: running an Independent t-test
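#NOTE: the worked example for this section is not reproduced here; the code below is an
#illustrative sketch only. The datafile name (Appendix R3.txt) and the variable names used
#(SalaryLevel as the outcome, Gender as the two-level grouping variable) are assumptions for
#illustration - replace them with the variables from your own datafile
DataForIndT<-read.table("Appendix R3.txt",header=T,sep="\t")
attach(DataForIndT)
#SPSS reports Levene's test for equality of variances alongside the t-test; car::leveneTest gives this
#install.packages("car") if not already installed
library(car)
leveneTest(SalaryLevel ~ factor(Gender))
#t-test assuming equal variances (the "equal variances assumed" row of the SPSS output)
t.test(SalaryLevel ~ Gender, var.equal=TRUE)
#Welch t-test (the "equal variances not assumed" row of the SPSS output)
t.test(SalaryLevel ~ Gender, var.equal=FALSE)
#group means and standard deviations
tapply(SalaryLevel, Gender, mean, na.rm=TRUE); tapply(SalaryLevel, Gender, sd, na.rm=TRUE)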
################################
# Appendix R4 #
#R Code Matching Figure 3.6: running a paired t-test
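#NOTE: as with Appendix R3, the worked example for this section is not reproduced here; the code
#below is an illustrative sketch only. The datafile name (Appendix R4.txt) and the variable names
#(EngagementT1 and EngagementT2 as the two paired measurements) are assumptions for
#illustration - replace them with the variables from your own datafile
DataForPairedT<-read.table("Appendix R4.txt",header=T,sep="\t")
attach(DataForPairedT)
#paired t-test comparing the two repeated measurements taken on the same cases
t.test(EngagementT1, EngagementT2, paired=TRUE)
#means and standard deviations of the two time points (the descriptives SPSS prints with the test)
mean(EngagementT1, na.rm=TRUE); sd(EngagementT1, na.rm=TRUE)
mean(EngagementT2, na.rm=TRUE); sd(EngagementT2, na.rm=TRUE)
#the correlation between the two measurements (also reported by SPSS)
cor(EngagementT1, EngagementT2, use="complete.obs")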
################################
# Appendix R6 #
#R Code Matching Figure 3.10: running a One-Way repeated Measures ANOVA
#read the datafile (here Appendix R6.txt) in working directory location and
#give name (here DataForRepOneAnova) for the data that R is reading
DataForRepOneAnova<-read.table("Appendix R6.txt",header=T,sep="\t")
#obtain names in the file
names(DataForRepOneAnova)
#e.g. names:
# "ID" "FUNCTION" "ValsCompositeT1" "ValsCompositeT2" "ValsCompositeT3"
#make the dataset live
attach(DataForRepOneAnova)
#install.packages("car") if not already installed
library(car)
#this approach produces lots of output that matches SPSS
# make a multivariate linear model with only the intercept as a
# predictor for your within-participants observations
multmodel=lm(cbind(ValsCompositeT1,ValsCompositeT2, ValsCompositeT3) ~ 1)
# create a factor for your repeated-measures variable
Time=factor(c("ValsCompositeT1","ValsCompositeT2","ValsCompositeT3"), ordered=F)
model1=Anova(multmodel,idata=data.frame(Time),idesign=~Time,type="III")
model1
summary(model1,multivariate=F)
################################
# Appendix R7 #
#R Code Matching Figure 3.11: running a Repeated Measures ANOVA with a Between Subject
#Factor
#This has an added level of complication: we need to restructure our wide dataset
#into a tall setup and we need to create a new ID variable
#read the datafile (here Appendix R7.txt) in working directory location and
#give name (here DataForANOVAWithBet) for the data that R is reading
DataForANOVAWithBet<-read.table("Appendix R7.txt",header=T,sep="\t")
#obtain names in the file
names(DataForANOVAWithBet)
#e.g.: "FUNCTION" "ValsCompositeT1" "ValsCompositeT2" "ValsCompositeT3" "Country"
#notice in this dataset there is no ID variable - for this analysis we have to
#create this as it is needed later
#make the dataset live
attach(DataForANOVAWithBet)
#install.packages("car")
library(car)
#need to add an ID variable as this will be needed when transposing the dataset to tall (rather
#than wide) format
#install.packages("tidyverse") # if not already installed
library(tidyverse)
# this adds an ID variable to the datafile read and creates a new name (Valsdata) for the
#datafile that now has this extra variable included
Valsdata <- tibble::rowid_to_column(DataForANOVAWithBet, "ID")
names(Valsdata)
# select the variables that you are going to use in the dataset, call this subset ValuesData here:
#the variables you need here are the three repeated-measures variables, the grouping variable and
#the new ID variable
ValuesData <-subset(Valsdata, select = c(ID, FUNCTION, ValsCompositeT1, ValsCompositeT2,
ValsCompositeT3))
# trim the NAs (if they exist) from this subset of variables and create a clean dataframe called
#"Trimmed" here
Trimmed <- na.omit(ValuesData)
# now we need to transform the wide dataset to a tall stacked version - this will create new
#variables of Time and ValueLevel from your "Trimmed" dataframe and stack the ID and composite
#variables. This new stacked dataset is called "Datalong" here. Note your new categorical variable
#indicating which Time period the data refers to is called "Time", the "ValueLevel" variable is your
#new dependent variable name, and you indicate the three repeated-measures variables that get
#stacked into this "ValueLevel" variable. The ID variable gets stacked also
Datalong <- gather(Trimmed, Time, ValueLevel, ValsCompositeT1:ValsCompositeT3,
factor_key=TRUE)
Datalong
#change new categorical variables into categorical factors within the tall/stacked dataframe
Datalong <- within(Datalong, {
FUNCTION <- factor(FUNCTION)
Time <- factor(Time)
ID <- factor(ID)})
#check your new stacked dataframe
names(Datalong)
# you can look at the data here; the data editor window needs to be closed before running the ANOVA
fix(Datalong)
#install.packages("ez") to enable ezAnova command
#load this
library(ez)
#run the within- and between-subjects ANOVA from the stacked dataframe with the DV of
#ValueLevel, ID as the case identifier, the "Time" variable as the within-subjects (time) condition
#indicator and "FUNCTION" as the between-subjects factor
ezANOVA(data = Datalong, dv=.(ValueLevel), wid=.(ID), within=.(Time),
between=.(FUNCTION), detailed=T, type=3)
# now create the interaction plot:
with(Datalong, interaction.plot(Time, FUNCTION, ValueLevel,
ylim = c(3.9, 4.4), lty= c(1, 12), lwd = 3,
ylab = "mean of Values", xlab = "Time", trace.label = "FUNCTION"))
################################
# Appendix R8 #
#R Code Matching Figure 3.14: running correlations
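#NOTE: the data read for this section was not set out above; the datafile name used below
#(Appendix R8.txt) follows the naming pattern of the other appendices and is an assumption -
#adjust it to the correlation datafile you downloaded
DataForCorr<-read.table("Appendix R8.txt",header=T,sep="\t")
#obtain names in the file, e.g. "PerformanceRating2014" "SickDays2014"
names(DataForCorr)
#attach the datafile for analyses
attach(DataForCorr)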
#call for correlations of all variables (excluding cases where there are NAs)
cor(DataForCorr, use = "complete.obs")
#specify particular variables to correlate
cor(PerformanceRating2014, SickDays2014, use = "complete.obs")
#this doesn't give p values
#install (if not already installed) and load the Hmisc package: #install.packages("Hmisc")
library(Hmisc)
rcorr(as.matrix(DataForCorr))
rcorr(PerformanceRating2014, SickDays2014, type="pearson")
################################
# Appendix R9 #
#R Code Matching Figure 3.18: running Exploratory Factor Analyses (EFA)
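#NOTE: the worked example for this section is not reproduced here; the code below is an
#illustrative sketch only. The datafile name (Appendix R9.txt) and the items chosen for the factor
#analysis (Eng1-Eng4 and pos1-pos3) are assumptions for illustration - replace them with the items
#from your own datafile
DataForEFA<-read.table("Appendix R9.txt",header=T,sep="\t")
#the psych package produces output closest to the SPSS factor analysis output
#install.packages("psych") if not already installed
library(psych)
#select the items to be factor analysed and remove cases with missing values
EFAItems <- na.omit(subset(DataForEFA, select = c(Eng1, Eng2, Eng3, Eng4, pos1, pos2, pos3)))
#KMO sampling adequacy and Bartlett's test of sphericity (as reported by SPSS)
KMO(EFAItems)
cortest.bartlett(cor(EFAItems), n = nrow(EFAItems))
#extract two factors using principal axis factoring with a varimax rotation
fa(EFAItems, nfactors = 2, fm = "pa", rotate = "varimax")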
################################
# Appendix R10 #
#R Code Matching Figure 3.17: running a Multiple Linear (OLS) Regression
#read the datafile (here Appendix R10.txt) in working directory location and
#give name (here DataForOLSReg) for the data that R is reading
DataForOLSReg<-read.table("Appendix R10.txt",header=T,sep="\t")
#obtain names in the file
names(DataForOLSReg)
#e.g. some of the variable names in this dataset: "DepartmentGroupNumber" "GroupSize"
#"PercentMale" "BAME" "NumberTeamLeads" "NumberFeMaleTeamLeads"
#"Location" "LondonorNot" "Function" "EMPsurvEngagement"
attach(DataForOLSReg)
#this matches Fig 3.17 of the book, so we call this model "modelFig317" here
modelFig317=lm(BAME ~ LondonorNot + Function + GroupSize + NumberFeMaleTeamLeads +
PercentMale)
modelFig317
anova(modelFig317)
summary(modelFig317)
coef(modelFig317)
#to get standardised Beta coefficients
#install (if not already installed) and load the QuantPsyc package: #install.packages("QuantPsyc")
library(QuantPsyc)
lm.beta(modelFig317)
################################
# Appendix R11 #
#R Code Matching Figure 3.19: running reliability analyses
#read the datafile (here Appendix R11.txt) in working directory location and give
#name (here DataForReliable) for the data that R is reading
DataForReliable<-read.table("Appendix R11.txt",header=T,sep="\t")
#obtain names in the file
names(DataForReliable)
#e.g. "sex" "jbstatus" "age" "tenure" "ocb1" "ocb2" "ocb3" "ocb4"
#"Eng1" "Eng2" "Eng3" "Eng4" "pos1" "pos2" "pos3"
#attach the datafile for analyses
attach(DataForReliable)
#install and load the psych package
library(psych)
#select variables for reliability dataframe
dATAFORRel <-subset(DataForReliable, select = c(ocb1, ocb2, ocb3, ocb4))
#create new data frame and clear NAs
TrimmedData <- na.omit(dATAFORRel)
alpha(TrimmedData, keys=NULL,cumulative=FALSE, title=NULL, max=10,na.rm = TRUE,
check.keys=FALSE,n.iter=1,delete=TRUE,use="pairwise",warnings=TRUE,n.obs=NULL)
################################
# Appendix R12 #
#R Code for the Kaplan-Meier survival analyses discussed in Chapter 3 – Figure 3.4
#read the datafile (here Appendix R12.dat) in working directory location and give
#name (here DataForSurvival) for the data that R is reading
DataForSurvival<-read.table("Appendix R12.dat",header=T,sep="\t")
#obtain names in the file
names(DataForSurvival) #E.G. "Gender" "LengthOfService" "LeaverStatus"
#attach the datafile for analyses
attach(DataForSurvival)
#create a factor for the Gender variable and label its values
SEX <- factor(Gender,
levels = c(0,1),
labels = c("Female", "Male"))
#create a factor for the LeaverStatus variable and label its values
LeaverStatus <- factor(LeaverStatus,
levels = c(0,1),
labels = c("Stayer", "Leaver"))
#install survival package if not already installed
#install.packages("survival")
library(survival)
#install survminer package if not already installed
#install.packages("survminer")
library(survminer)
#the following sets out the variables to use for the survival analyses: leaver status on overall tenure
IndTO.survival <- with(DataForSurvival, Surv(LengthOfService,LeaverStatus))
#survival analyses of all data in dataset
kmall <- survfit(IndTO.survival~1,data=DataForSurvival)
#prints the survival table statistics
summary(kmall)
#produce event statistics and mean overall survival statistics
print(kmall, print.rmean=TRUE)
#Plot of the overall turnover/tenure data
plot(kmall, xlab="Job Tenure",
ylab="% Surviving", yscale=100,
main="Survival Distribution (Overall)")
#conduct survival analyses BY Gender
kmGEN <- survfit(IndTO.survival~Gender,data=DataForSurvival)
#prints the survival table statistics
summary(kmGEN)
#the following prints the number of events (leaver) and mean survival by gender
print(kmGEN, print.rmean=TRUE)
#Plot of the different survival curves across males V females
plot(kmGEN, xlab="Job Tenure",
ylab="% Surviving", yscale=100, col=c("blue","red"),
main="Survival Distributions by Gender")
legend("topright", title="Gender", legend=c("Female", "Male"),
fill=c("blue" , "red"))
#to get the significance of any differences in survival patterns across gender
survdiff(IndTO.survival~Gender, data=DataForSurvival)
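#the survminer package loaded above can also be used to draw these survival curves; the following
#is an optional sketch of an equivalent ggsurvplot presentation of the by-gender curves (the object
#name kmGENplot is used here purely for illustration)
kmGENplot <- survfit(Surv(LengthOfService, LeaverStatus) ~ Gender, data=DataForSurvival)
ggsurvplot(kmGENplot, data=DataForSurvival, pval=TRUE,
           legend.title="Gender", legend.labs=c("Female","Male"),
           xlab="Job Tenure", ylab="Proportion Surviving")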