
PART 1

PATH ANALYSIS
Consider a situation where a variable Z depends on a variable Y, which in turn depends on a variable X. In such a scenario, the model becomes complex, and path analysis comes in handy. Path analysis is an extension of multiple regression. It allows for the analysis of more complicated models, such as those with multiple intermediate dependent variables and chains of dependence like X → Y → Z. It can also compare different models to determine which one best fits the data.

Path analysis was earlier known as 'causal modeling'; however, after strong criticism, people now refrain from using that term, because it is not possible to establish causal relationships using statistical techniques alone. Causal relationships can only be established through experimental designs. Path analysis can be used to disprove a model that suggests a causal relationship among variables; however, it cannot be used to prove that a causal relation exists among variables.
Let's understand the terminology used in path analysis. We don't refer to variables as independent or dependent here; rather, we call them exogenous or endogenous variables. Exogenous variables (the independent variables in the world of regression) are variables that have arrows starting from them but none pointing towards them. Endogenous variables have at least one arrow pointing towards them. The reason for this nomenclature is that the factors that cause or influence exogenous variables lie outside the system, while the factors that cause endogenous variables lie within the system. In the X → Y → Z example above, X is an exogenous variable, while Y and Z are endogenous variables. A typical path diagram is shown below: A, B, C, D and E are exogenous variables, while I and O are endogenous variables, and 'd' is a disturbance term, which is analogous to the residuals in regression.
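In lavaan model syntax, which this article uses throughout, each endogenous variable is written to the left of a tilde (~) and its predictors to the right. As a minimal sketch, the X → Y → Z chain above would be specified like this (illustrative only; a similar model is fitted on simulated data below):

# X -> Y -> Z chain written in lavaan model syntax
chain_model <- 'Y ~ X
                Z ~ Y'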
Now, let's go through the assumptions that we need to consider before we use path analysis. Since path analysis is an extension of multiple regression, most of the assumptions of multiple regression hold true for path analysis as well.
1. All the variables should have linear relations with each other.
2. Endogenous variables should be continuous. In the case of ordinal data, there should be a minimum of five categories.
3. There should be no interaction among variables. If an interaction is present, a separate term or variable that reflects the interaction between the two variables can be added (see the sketch after this list).
4. The disturbance terms should be uncorrelated, i.e., the covariance among the disturbance terms should be zero.
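As a hypothetical sketch of point 3 (df, x1 and x2 are illustrative names, not taken from the examples below), an interaction can be represented by adding the product of the two variables as a new column:

# Hypothetical sketch: represent an x1-x2 interaction as its own variable,
# then include x1x2 as an extra predictor in the path model
df$x1x2 <- df$x1 * df$x2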
Now, let's move a step ahead and look at the implementation of path analysis in R. We will first try a toy example and then take a standard dataset available in R.
# Install the packages used in this article (one-time step)
install.packages("lavaan")    # path analysis / SEM model fitting
install.packages("OpenMx")    # an alternative SEM engine
install.packages("semPlot")   # path diagrams
install.packages("GGally")    # scatterplot matrices
install.packages("corrplot")  # correlation plots
library(lavaan)
library(semPlot)
library(OpenMx)
library(GGally)
library(corrplot)

Now, let's create our own dataset and try out path analysis. Please note that the rationale for this exercise is to develop intuition for path analysis.
For example:
# Let's create our own dataset and play around with it first
set.seed(11)
a = 0.5
b = 5
c = 7
d = 2.5
x1 = rnorm(20, mean = 0, sd = 1)
x2 = rnorm(20, mean = 0, sd = 1)
x3 = runif(20, min = 2, max = 5)
Y = a*x1 + b*x2   # Y is an exact linear function of x1 and x2 (no error term)
Z = c*x3 + d*Y    # Z is an exact linear function of x3 and Y
data1 = cbind(x1, x2, x3, Y, Z)
head(data1, n = 10)

> head(data1, n = 10)


x1 x2 x3 Y Z
[1,] -0.59103110 -0.68251762 2.152597 -3.70810366 5.797922
[2,] 0.02659437 -0.01585819 3.488896 -0.06599378 24.257289
[3,] -1.51655310 -0.44260479 3.524391 -2.97130048 17.242488
[4,] -1.36265335 0.35255750 2.707776 1.08146082 21.658085
[5,] 1.17848916 0.07317058 4.441204 0.95509749 33.476170
[6,] -0.93415132 0.00715880 3.257310 -0.43128166 21.722969
[7,] 1.32360565 -0.18760011 2.574199 -0.27619773 17.328901
[8,] 0.62491779 -0.76570065 3.946699 -3.51604433 18.836781
[9,] -0.04572296 -0.22105682 4.439842 -1.12814558 28.258531
[10,] -1.00412058 -0.98358859 2.676505 -5.42000323 5.185524

Now that we have created this dataset, let's look at the correlation matrix for these variables. It will tell us which variables are correlated with each other, and how strongly.

> cor1 = cor(data1)


> corrplot(cor1, method = 'square')

The above chart shows us that Y is very strongly correlated with x2, while Z is strongly correlated with x2 and Y. The impact of x1 on Y is not as strong as that of x2.
# Specify the path model: Z depends on x1, x2, x3 and Y; Y depends on x1 and x2
model1 = 'Z ~ x1 + x2 + x3 + Y
          Y ~ x1 + x2'
fit1 = cfa(model1, data = data1)
summary(fit1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

> summary(fit1, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)


** WARNING ** lavaan (0.6-1) did NOT converge after 90 iterations
** WARNING ** Estimates below are most likely unreliable

Number of observations 20
Estimator ML
Model Fit Test Statistic NA
Degrees of freedom NA
P-value NA

Parameter Estimates:

Information Expected
Information saturated (h1) model Structured
Standard Errors Standard

Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Z ~
x1 0.721 NA 0.721 0.072
x2 0.328 NA 0.328 0.028
x3 1.915 NA 1.915 0.179
Y 1.998 NA 1.998 0.867
Y ~
x1 0.500 NA 0.500 0.115
x2 5.000 NA 5.000 0.968

Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Z 14.773 NA 14.773 0.215
.Y 0.000 NA 0.000 0.000

R-Square:
Estimate
Z 0.785
Y 1.000

> semPaths(fit1, 'std', layout = 'circle')


The above plot shows us that Z is strongly dependent on Y and weakly dependent on x3 and x1, while Y is strongly dependent on x2 and weakly dependent on x1. This is the same intuition that we built earlier from the correlation matrix; this is the beauty of path analysis and how it can be used.
The values on the lines are path coefficients. Path coefficients are standardized regression coefficients, similar to the beta coefficients of multiple regression. These path coefficients should be statistically significant, which can be checked from the summary output (we will see this in the next example).
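A side note on the convergence warning in the summary above: it appears because Y was generated as an exact linear function of x1 and x2, so its residual variance is zero (note the .Y variance of 0.000 and the R-square of 1.000). A minimal sketch of a fix, with arbitrary noise levels, is to add error terms when generating the data:

# Regenerate Y and Z with error terms so that no variable is an exact
# linear combination of its predictors (sd values are arbitrary)
Y = a*x1 + b*x2 + rnorm(20, mean = 0, sd = 1)
Z = c*x3 + d*Y + rnorm(20, mean = 0, sd = 1)
data1 = data.frame(x1, x2, x3, Y, Z)
fit1 = cfa(model1, data = data1)   # typically converges without the warning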
Let's move to our second example. In this example, we will use the standard dataset 'mtcars' available in R.
# Let's take a second example using the standard dataset 'mtcars' available in R
data2 = mtcars
head(data2, n = 10)

> head(data2, n = 10)


mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4

model2 = 'mpg ~ hp + gear + cyl + disp + carb + am + wt
          hp ~ cyl + disp + carb'
fit2 = cfa(model2, data = data2)

> summary(fit2)
lavaan (0.6-1) converged normally after 62 iterations

Number of observations 32

Estimator ML
Model Fit Test Statistic 7.901
Degrees of freedom 3
P-value (Chi-square) 0.048

Parameter Estimates:

Information Expected
Information saturated (h1) model Structured
Standard Errors Standard

Regressions:
Estimate Std.Err z-value P(>|z|)
mpg ~
hp -0.022 0.016 -1.388 0.165
gear 0.586 1.247 0.470 0.638
cyl -0.848 0.710 -1.194 0.232
disp 0.006 0.012 0.512 0.609
carb -0.472 0.620 -0.761 0.446
am 1.624 1.542 1.053 0.292
wt -2.671 1.267 -2.109 0.035
hp ~
cyl 7.717 6.554 1.177 0.239
disp 0.233 0.087 2.666 0.008
carb 20.273 3.405 5.954 0.000

Variances:
Estimate Std.Err z-value P(>|z|)
.mpg 5.011 1.253 4.000 0.000
.hp 644.737 161.184 4.000 0.000

In the above summary output, we can see that wt is a significant variable for mpg at the 5 percent level, while disp and carb are significant variables for hp. hp itself is not a significant variable for mpg. We will examine this model using a path diagram drawn with the semPlot package.
> semPaths(fit2, 'std', 'est', curveAdjacent = TRUE, style = "lisrel")
The above plot shows that mpg is strongly dependent on wt, while hp is strongly dependent on disp and carb. There is only a weak relation between hp and mpg. The same inference was derived from the summary output above.
The semPaths function can draw the above chart in multiple ways. You can go through the documentation for semPaths and explore the different options; a couple of variations are sketched below.
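For instance (a sketch; the layout values 'tree' and 'spring' and the edge.label.cex argument come from the semPlot documentation):

# Two alternative ways of drawing the same model
semPaths(fit2, what = 'std', layout = 'tree')                          # tree layout, standardized estimates
semPaths(fit2, what = 'est', layout = 'spring', edge.label.cex = 1.2)  # spring layout, larger edge labels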
There are a few considerations that you should keep in mind while doing path analysis. Path analysis is very sensitive to the omission or addition of variables in the model: omitting a relevant variable or adding an extra variable may significantly impact the results. Also, path analysis is a technique for testing models, not building them. If you were to use path analysis to build models, you could end up with an endless number of combinations of models, and choosing the right one would not be possible. So, path analysis should be used to test a specific model or to compare multiple models and choose the best possible one, as sketched below.
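A hedged sketch of such a comparison (the model syntax here is illustrative; fitMeasures() and anova() are standard lavaan functions):

# Compare two candidate models fitted to the same data
fitA = sem('mpg ~ wt + hp
            hp ~ disp + carb', data = mtcars)
fitB = sem('mpg ~ wt
            hp ~ disp + carb', data = mtcars)   # fitB drops the hp -> mpg path
fitMeasures(fitA, c('aic', 'bic', 'cfi', 'rmsea'))
anova(fitA, fitB)   # likelihood-ratio test; fitB is nested in fitA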
There are numerous other ways you can use path analysis. We would love to hear
your experiences of using path analysis in different contexts. Please share your
examples and experiences in the comments section below.
Path analysis is a special case of structural equation modeling (SEM). There are a few packages to do SEM in R, such as lavaan and sem.
A simple example: x1 and x2 both affect x3, and x1 affects x2.
##############R-code##############
library(lavaan)
model1 <- 'x3 ~ x1 + x2
           x2 ~ x1'
# 'mydata' is a placeholder for a data frame containing x1, x2 and x3
fit1 <- sem(model1, data = mydata)
# Summary of the fitted model
summary(fit1)
# Check the coefficients
coef(fit1)
# ... and as a data frame
parameterEstimates(fit1)
############end R-code############
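A self-contained version of the same example on simulated data might look like this (coefficients and sample size are arbitrary):

##############R-code##############
library(lavaan)
set.seed(1)
x1 <- rnorm(100)
x2 <- 0.4*x1 + rnorm(100)            # x1 affects x2
x3 <- 0.5*x1 + 0.3*x2 + rnorm(100)   # x1 and x2 affect x3
mydata <- data.frame(x1, x2, x3)
fit1 <- sem(model1, data = mydata)
summary(fit1)
############end R-code############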
For more details and examples on the lavaan package, see http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf

PART 2
How to Run Path Analysis with R
For this path analysis practice exercise, I continue to use the election data I used in the previous post. Instead of using datasets that I am not quite familiar with, using my own data really helps make my learning experience more relatable and personal.

Step 1: Install and load lavaan package


It looks like one R package I can use to perform path analysis is called lavaan, by Yves Rosseel (many thanks!). Before doing anything else, I have to install and load the lavaan package. This step is pretty simple. I just type the following.

install.packages("lavaan")
library(lavaan)

Step 2: Specify a model


To run path analysis with lavaan, I next have to specify a model to estimate. Path analysis necessarily involves some kind of indirect effect from X to Y mediated by a third variable Z (or several Zs).

In this practice exercise, I have 2 mediating variables, party affiliation and political interest, which carry the prior effects of age, sex, race, education, and income on support for Trump.

In other words, age, sex, race, education, and income are specified to predict party affiliation and political interest, respectively. Then, party affiliation and political interest, respectively, are specified to predict support for Trump. It might be that people who are older, male, Caucasian, less educated, and less rich lean toward Republicans and have more interest in this election, which in turn predicts support for Donald Trump.

Disclaimer: the model I outline here is not based on any theory. It's more
of a post-hoc model. When I first ran path analysis, I included political
ideology in the model as another mediating variable. But, this model did
not fit the data well. For some reason, when I removed political ideology,
the model fit the data well. So, I just decided to use the above model for
pretty much this reason. It’s always good to see good fit indices!
Now, with lavaan, it looks like I have to first store a model in a new variable, which I label model1. Each mediator and the final outcome variable is placed on the left-hand side, followed by a tilde (~). Then, I place the predictors of each mediator and of the outcome variable on the right-hand side. A model is enveloped in single quotes (' '). So, I type the following.
model1 <- 'party ~ age + sex + race + educ + inco
           inter ~ age + sex + race + educ + inco
           suppt ~ party + inter'

Step 3: Run analysis!


Once a model is specified, I estimate the model. Doing so seems pretty straightforward. I can use the sem() function. Inside the parentheses, I type model1 – the variable in which I stored the model I specified – and the data.

results1 <- sem(model1, data=election)

When I run this code, R stores the results in a new variable that I create, results1. To see the results stored in results1, I use the summary() function and enter results1 inside the parentheses.

summary(results1)

Then, I get the following results.

> summary(results1)

lavaan (0.5-22) converged normally after 36 iterations

                                   Used  Total
  Number of observations            630    677

  Estimator                          ML
  Minimum Function Test Statistic  13.542
  Degrees of freedom                    6
  P-value (Chi-square)              0.035

Parameter Estimates:

  Information                 Expected
  Standard Errors             Standard

Regressions:
           Estimate  Std.Err  z-value  P(>|z|)
  party ~
    age       0.056    0.047    1.185    0.236
    sex       0.239    0.160    1.491    0.136
    race     -1.188    0.185   -6.408    0.000
    educ      0.145    0.051    2.819    0.005
    inco     -0.125    0.039   -3.228    0.001
  inter ~
    age       0.181    0.044    4.144    0.000
    sex      -0.185    0.148   -1.248    0.212
    race     -0.034    0.171   -0.198    0.843
    educ      0.018    0.047    0.387    0.699
    inco      0.104    0.036    2.900    0.004
  suppt ~
    party    -0.567    0.022  -25.368    0.000
    inter     0.145    0.025    5.878    0.000

Variances:
           Estimate  Std.Err  z-value  P(>|z|)
   .party     3.479    0.196   17.748    0.000
   .inter     2.973    0.168   17.748    0.000
   .suppt     1.199    0.068   17.748    0.000

These results indicate that respondents who were Caucasians and who
had higher income were stronger Republicans. In contrast, those who
had higher education were stronger Democrats.

Age and income had a positive relationship with political interest. Older
and richer respondents showed higher levels of interest in politics. Then,
stronger Republicans and more politically interested individuals more
strongly supported Trump.

Before moving forward!


Before moving forward, I need to consider a few things.
1. The coefficients I get are all unstandardized. If I want to see the
relative importance of each predictor and directly compare which
variable plays a bigger or smaller role in this model, maybe I need
standardized coefficients.

2. Maybe, I need to know how much variance in support for Trump this
model accounts for.

3. Looking at the model fit index – the Minimum Function Test Statistic (namely the chi-square statistic) and its corresponding p-value – it's significant. This means the model was significantly different from the data; in other words, the model did not fit the data well. Maybe I need to modify the model a little bit – free up some parameters to improve the fit. How do I do that? With lavaan, I can request modification indices (see the sketch after this list).

4. Finally, the above result only shows one model fit index: the Minimum Function Test Statistic (chi-square). But chi-square tends to be sensitive to sample size. When the data have a large sample size, chi-square tends to be significant (indicating that the model is significantly different from the data, instead of approximating the data). So I may end up drawing an erroneous conclusion. I need to see other model fit indices.
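As a sketch of point 3, modification indices can also be requested directly with the modindices() function (the sort. and minimum.value arguments exist in recent lavaan versions; check your version's documentation):

# List modification indices, largest first, hiding trivial ones
modindices(results1, sort. = TRUE, minimum.value = 3)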

What if I need these statistics? I learned I can add a few things to the summary() function.

summary(results1, standardized=TRUE, fit.measures=TRUE, rsq=TRUE, modindices=TRUE)

 standardized=TRUE means I am requesting standardized coefficients.
 fit.measures=TRUE means I am requesting other model fit indices.
 rsq=TRUE means I am requesting R-square values.
 modindices=TRUE means I am requesting modification indices.

Then, I get the following results!

> summary(results1, standardized=TRUE, fit.measures=TRUE, rsq=TRUE, modindices=TRUE)

lavaan (0.5-22) converged normally after 36 iterations

                                   Used  Total
  Number of observations            630    677

  Estimator                          ML
  Minimum Function Test Statistic  13.542
  Degrees of freedom                    6
  P-value (Chi-square)              0.035

Model test baseline model:

  Minimum Function Test Statistic  566.979
  Degrees of freedom                    18
  P-value                            0.000

User model versus baseline model:

  Comparative Fit Index (CFI)        0.986
  Tucker-Lewis Index (TLI)           0.959

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)          -7954.627
  Loglikelihood unrestricted model (H1)  -7947.856

  Number of free parameters                     15
  Akaike (AIC)                           15939.254
  Bayesian (BIC)                         16005.940
  Sample-size adjusted Bayesian (BIC)    15958.316

Root Mean Square Error of Approximation:

  RMSEA                              0.045
  90 Percent Confidence Interval  0.011  0.077
  P-value RMSEA <= 0.05              0.558

Standardized Root Mean Square Residual:

  SRMR                               0.019

Parameter Estimates:

  Information                 Expected
  Standard Errors             Standard

Regressions:
           Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
  party ~
    age       0.056    0.047    1.185    0.236    0.056    0.047
    sex       0.239    0.160    1.491    0.136    0.239    0.061
    race     -1.188    0.185   -6.408    0.000   -1.188   -0.270
    educ      0.145    0.051    2.819    0.005    0.145    0.122
    inco     -0.125    0.039   -3.228    0.001   -0.125   -0.138
  inter ~
    age       0.181    0.044    4.144    0.000    0.181    0.167
    sex      -0.185    0.148   -1.248    0.212   -0.185   -0.052
    race     -0.034    0.171   -0.198    0.843   -0.034   -0.009
    educ      0.018    0.047    0.387    0.699    0.018    0.017
    inco      0.104    0.036    2.900    0.004    0.104    0.126
  suppt ~
    party    -0.567    0.022  -25.368    0.000   -0.567   -0.701
    inter     0.145    0.025    5.878    0.000    0.145    0.162

Variances:
           Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
   .party     3.479    0.196   17.748    0.000    3.479    0.914
   .inter     2.973    0.168   17.748    0.000    2.973    0.948
   .suppt     1.199    0.068   17.748    0.000    1.199    0.481

R-Square:
           Estimate
    party     0.086
    inter     0.052
    suppt     0.519

Modification Indices:

     lhs op   rhs    mi    epc sepc.lv sepc.all sepc.nox
16   age ~~   age 0.000  0.000   0.000    0.000    0.000
17   age ~~   sex 0.000  0.000   0.000    0.000    0.000
18   age ~~  race 0.000  0.000   0.000    0.000    0.000
19   age ~~  educ 0.000  0.000   0.000    0.000    0.000
20   age ~~  inco 0.000  0.000   0.000    0.000    0.000
21   sex ~~   sex 0.000  0.000   0.000    0.000    0.000
22   sex ~~  race 0.000  0.000   0.000    0.000    0.000
23   sex ~~  educ 0.000  0.000   0.000    0.000    0.000
24   sex ~~  inco 0.000  0.000   0.000    0.000    0.000
25  race ~~  race 0.000  0.000   0.000    0.000    0.000
26  race ~~  educ 0.000  0.000   0.000    0.000    0.000
27  race ~~  inco 0.000  0.000   0.000    0.000    0.000
28  educ ~~  educ 0.000  0.000   0.000    0.000    0.000
29  educ ~~  inco 0.000  0.000   0.000    0.000    0.000
30  inco ~~  inco 0.000  0.000   0.000    0.000    0.000
31 party ~~ inter 0.100 -0.041  -0.041   -0.012   -0.012
32 party ~~ suppt 3.215  0.498   0.498    0.162    0.162
33 inter ~~ suppt 4.524  0.699   0.699    0.250    0.250
34 party  ~ inter 0.100 -0.014  -0.014   -0.012   -0.012
35 party  ~ suppt 1.490  0.223   0.223    0.181    0.181
36 inter  ~ party 0.100 -0.012  -0.012   -0.013   -0.013
37 inter  ~ suppt 0.637  0.050   0.050    0.045    0.045
38 suppt  ~   age 1.049 -0.028  -0.028   -0.029   -0.018
39 suppt  ~   sex 0.056 -0.021  -0.021   -0.007   -0.013
40 suppt  ~  race 1.822  0.137   0.137    0.038    0.087
41 suppt  ~  educ 9.283 -0.082  -0.082   -0.085   -0.052
42 suppt  ~  inco 4.226 -0.042  -0.042   -0.058   -0.027
43   age  ~ party 0.000  0.000   0.000    0.000    0.000
44   age  ~ inter 0.000  0.000   0.000    0.000    0.000
45   age  ~ suppt 0.834 -0.051  -0.051   -0.049   -0.049
46   age  ~   sex 0.000  0.000   0.000    0.000    0.000
47   age  ~  race 0.000  0.000   0.000    0.000    0.000
48   age  ~  educ 0.000  0.000   0.000    0.000    0.000
49   age  ~  inco 0.000  0.000   0.000    0.000    0.000
50   sex  ~ party 0.000  0.000   0.000    0.000    0.000
51   sex  ~ inter 0.000  0.000   0.000    0.000    0.000
52   sex  ~ suppt 0.855 -0.015  -0.015   -0.047   -0.047
53   sex  ~   age 0.000  0.000   0.000    0.000    0.000
54   sex  ~  race 0.000  0.000   0.000    0.000    0.000
55   sex  ~  educ 0.000  0.000   0.000    0.000    0.000
56   sex  ~  inco 0.000  0.000   0.000    0.000    0.000
57  race  ~ party 0.000  0.000   0.000    0.000    0.000
58  race  ~ inter 0.000  0.000   0.000    0.000    0.000
59  race  ~ suppt 2.287  0.021   0.021    0.075    0.075
60  race  ~   age 0.000  0.000   0.000    0.000    0.000
61  race  ~   sex 0.000  0.000   0.000    0.000    0.000
62  race  ~  educ 0.000  0.000   0.000    0.000    0.000
63  race  ~  inco 0.000  0.000   0.000    0.000    0.000
64  educ  ~ party 0.000  0.000   0.000    0.000    0.000
65  educ  ~ inter 0.000  0.000   0.000    0.000    0.000
66  educ  ~ suppt 4.297 -0.105  -0.105   -0.101   -0.101
67  educ  ~   age 0.000  0.000   0.000    0.000    0.000
68  educ  ~   sex 0.000  0.000   0.000    0.000    0.000
69  educ  ~  race 0.000  0.000   0.000    0.000    0.000
70  educ  ~  inco 0.000  0.000   0.000    0.000    0.000
71  inco  ~ party 0.000  0.000   0.000    0.000    0.000
72  inco  ~ inter 0.000  0.000   0.000    0.000    0.000
73  inco  ~ suppt 0.607 -0.052  -0.052   -0.038   -0.038
74  inco  ~   age 0.000  0.000   0.000    0.000    0.000
75  inco  ~   sex 0.000  0.000   0.000    0.000    0.000
76  inco  ~  race 0.000  0.000   0.000    0.000    0.000
77  inco  ~  educ 0.000  0.000   0.000    0.000    0.000
Looking at the fit indices, the model seems pretty good.


Step 4: Test indirect effects

When I look at the results under "Regressions," I just see path coefficients. I do not see specific significance tests for the indirect effects of age, sex, race, education, and income on support for Trump through party affiliation and political interest.

To test the indirect effects with lavaan, apparently I need to give labels to each parameter and use those labels in the model syntax. Then, I use the ":=" operator to define new parameters. So, I type the following.
model2 <- 'party ~ a1*age + a2*sex + a3*race + a4*educ + a5*inco
           inter ~ a6*age + a7*sex + a8*race + a9*educ + a10*inco
           suppc ~ b1*party + b2*inter + c1*age + c2*sex + c3*race + c4*educ + c5*inco
           a1b1 := a1*b1
           a2b1 := a2*b1
           a3b1 := a3*b1
           a4b1 := a4*b1
           a5b1 := a5*b1
           a6b2 := a6*b2
           a7b2 := a7*b2
           a8b2 := a8*b2
           a9b2 := a9*b2
           a10b2 := a10*b2
           total := c1 + c2 + c3 + c4 + c5 + (a1*b1) + (a2*b1) + (a3*b1) + (a4*b1) + (a5*b1) + (a6*b2) + (a7*b2) + (a8*b2) + (a9*b2) + (a10*b2)'

What the above code means is that I am assigning a label to each parameter to estimate. I use a for paths from exogenous variables to mediators, b for paths from mediators to the outcome variable, and c for direct paths from exogenous variables to the outcome variable.

For example, the path from age to party gets a1. With *, this path is labeled a1*age; apparently, I need to use this asterisk (*) between a1 and the variable name. The indirect effect of age on support for Trump through party affiliation is then defined on its own line: I type a1b1 – this is just a descriptive label that I give to this effect – on the left-hand side, followed by the := operator, with the product a1*b1 on the right-hand side. The path from sex to party is labeled a2*sex, and its indirect effect through party is defined as a2b1 in the same way.

The path from party to suppc (support for Trump) gets b1*party. The path from inter to suppc gets b2*inter. The direct path from age to suppc gets c1*age, and so forth. And everything is enveloped in single quotes (' '). Hope this is all correct!

I run this code and use the sem() function next.

results2 <- sem(model2, data=election)

Then I look at the results stored in results2 – I remove modindices=TRUE, because it makes the output a bit too lengthy.

summary(results2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)

Now, I get the following results.

> results2 <- sem(model2, data=election)
> summary(results2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)

lavaan (0.5-22) converged normally after 44 iterations

                                   Used  Total
  Number of observations            630    677

  Estimator                          ML
  Minimum Function Test Statistic   0.100
  Degrees of freedom                    1
  P-value (Chi-square)              0.752

Model test baseline model:

  Minimum Function Test Statistic  576.823
  Degrees of freedom                    18
  P-value                            0.000

User model versus baseline model:

  Comparative Fit Index (CFI)        1.000
  Tucker-Lewis Index (TLI)           1.029

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)          -7941.289
  Loglikelihood unrestricted model (H1)  -7941.239

  Number of free parameters                     20
  Akaike (AIC)                           15922.578
  Bayesian (BIC)                         16011.492
  Sample-size adjusted Bayesian (BIC)    15947.994

Root Mean Square Error of Approximation:

  RMSEA                              0.000
  90 Percent Confidence Interval  0.000  0.073
  P-value RMSEA <= 0.05              0.884

Standardized Root Mean Square Residual:

  SRMR                               0.002

Parameter Estimates:

  Information                 Expected
  Standard Errors             Standard

Regressions:
               Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
  party ~
    age   (a1)    0.056    0.047    1.185    0.236    0.056    0.047
    sex   (a2)    0.239    0.160    1.491    0.136    0.239    0.061
    race  (a3)   -1.188    0.185   -6.408    0.000   -1.188   -0.270
    educ  (a4)    0.145    0.051    2.819    0.005    0.145    0.122
    inco  (a5)   -0.125    0.039   -3.228    0.001   -0.125   -0.138
  inter ~
    age   (a6)    0.181    0.044    4.144    0.000    0.181    0.167
    sex   (a7)   -0.185    0.148   -1.248    0.212   -0.185   -0.052
    race  (a8)   -0.034    0.171   -0.198    0.843   -0.034   -0.009
    educ  (a9)    0.018    0.047    0.387    0.699    0.018    0.017
    inco  (a10)   0.104    0.036    2.900    0.004    0.104    0.126
  suppc ~
    party (b1)    0.571    0.023   24.937    0.000    0.571    0.706
    inter (b2)    0.078    0.025    3.155    0.002    0.078    0.088
    age   (c1)   -0.034    0.027   -1.236    0.217   -0.034   -0.035
    sex   (c2)    0.113    0.092    1.226    0.220    0.113    0.036
    race  (c3)   -0.268    0.110   -2.436    0.015   -0.268   -0.075
    educ  (c4)    0.021    0.030    0.714    0.475    0.021    0.022
    inco  (c5)    0.004    0.023    0.180    0.857    0.004    0.006

Variances:
           Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
   .party     3.479    0.196   17.748    0.000    3.479    0.914
   .inter     2.973    0.168   17.748    0.000    2.973    0.948
   .suppc     1.150    0.065   17.748    0.000    1.150    0.461

R-Square:
           Estimate
    party     0.086
    inter     0.052
    suppc     0.539

Defined Parameters:
           Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
    a1b1      0.032    0.027    1.184    0.236    0.032    0.033
    a2b1      0.136    0.092    1.489    0.137    0.136    0.043
    a3b1     -0.678    0.109   -6.207    0.000   -0.678   -0.190
    a4b1      0.083    0.029    2.802    0.005    0.083    0.086
    a5b1     -0.072    0.022   -3.201    0.001   -0.072   -0.098
    a6b2      0.014    0.006    2.511    0.012    0.014    0.015
    a7b2     -0.014    0.012   -1.161    0.246   -0.014   -0.005
    a8b2     -0.003    0.013   -0.198    0.843   -0.003   -0.001
    a9b2      0.001    0.004    0.384    0.701    0.001    0.001
    a10b2     0.008    0.004    2.135    0.033    0.008    0.011
    total    -0.656    0.165   -3.982    0.000   -0.656   -0.151

When I look at the results at the very bottom of the output, I see the statistical significance of each indirect effect specified. For example, the indirect effect of age on support for Trump through party affiliation (a1b1) is not significant (p = .236). However, education has an indirect effect on support for Trump through party affiliation (b = .083, p = .005).

Generate confidence intervals (CIs)


It seems the summary() function provides the basic information. But if I want more detailed information, I can use the parameterEstimates() function. Based on the lavaan package page, I type the following.

parameterEstimates(results2, ci = TRUE, level = 0.95, boot.ci.type = "perc", standardized = TRUE)

With the above code, I am requesting 95% percentile bootstrap confidence intervals, as well as standardized coefficients. (One caveat: boot.ci.type only takes effect when the model was fitted with bootstrap standard errors; otherwise lavaan reports ordinary delta-method CIs – see the sketch below.) Then, I get the following. I think lhs in the header refers to the left-hand-side column, rhs to the right-hand-side column, and op to the operator.
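As a hedged sketch of obtaining true percentile bootstrap CIs (se = "bootstrap" and the bootstrap argument are standard lavaan options; 1000 draws is an arbitrary choice):

# Refit with bootstrap standard errors, then request percentile CIs
results2b <- sem(model2, data = election, se = "bootstrap", bootstrap = 1000)
parameterEstimates(results2b, ci = TRUE, level = 0.95, boot.ci.type = "perc")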

> parameterEstimates(results2, ci=TRUE, level=0.95, boot.ci.type="perc", standardized=TRUE)

     lhs op   rhs label    est    se       z pvalue ci.lower ci.upper std.lv std.all std.nox
1  party  ~   age    a1  0.056 0.047   1.185  0.236   -0.037    0.148  0.056   0.047   0.029
2  party  ~   sex    a2  0.239 0.160   1.491  0.136   -0.075    0.552  0.239   0.061   0.122
3  party  ~  race    a3 -1.188 0.185  -6.408  0.000   -1.551   -0.825 -1.188  -0.270  -0.609
4  party  ~  educ    a4  0.145 0.051   2.819  0.005    0.044    0.245  0.145   0.122   0.074
5  party  ~  inco    a5 -0.125 0.039  -3.228  0.001   -0.202   -0.049 -0.125  -0.138  -0.064
6  inter  ~   age    a6  0.181 0.044   4.144  0.000    0.095    0.266  0.181   0.167   0.102
7  inter  ~   sex    a7 -0.185 0.148  -1.248  0.212   -0.475    0.105 -0.185  -0.052  -0.104
8  inter  ~  race    a8 -0.034 0.171  -0.198  0.843   -0.370    0.302 -0.034  -0.009  -0.019
9  inter  ~  educ    a9  0.018 0.047   0.387  0.699   -0.075    0.111  0.018   0.017   0.010
10 inter  ~  inco   a10  0.104 0.036   2.900  0.004    0.034    0.175  0.104   0.126   0.059
11 suppc  ~ party    b1  0.571 0.023  24.937  0.000    0.526    0.616  0.571   0.706   0.706
12 suppc  ~ inter    b2  0.078 0.025   3.155  0.002    0.030    0.127  0.078   0.088   0.088
13 suppc  ~   age    c1 -0.034 0.027  -1.236  0.217   -0.088    0.020 -0.034  -0.035  -0.022
14 suppc  ~   sex    c2  0.113 0.092   1.226  0.220   -0.068    0.294  0.113   0.036   0.072
15 suppc  ~  race    c3 -0.268 0.110  -2.436  0.015   -0.483   -0.052 -0.268  -0.075  -0.170
16 suppc  ~  educ    c4  0.021 0.030   0.714  0.475   -0.037    0.079  0.021   0.022   0.013
17 suppc  ~  inco    c5  0.004 0.023   0.180  0.857   -0.040    0.049  0.004   0.006   0.003
18 party ~~ party        3.479 0.196  17.748  0.000    3.095    3.864  3.479   0.914   0.914
19 inter ~~ inter        2.973 0.168  17.748  0.000    2.645    3.302  2.973   0.948   0.948
20 suppc ~~ suppc        1.150 0.065  17.748  0.000    1.023    1.276  1.150   0.461   0.461
21   age ~~   age        2.672 0.000      NA     NA    2.672    2.672  2.672   1.000   2.672
22   age ~~   sex        0.053 0.000      NA     NA    0.053    0.053  0.053   0.064   0.053
23   age ~~  race        0.164 0.000      NA     NA    0.164    0.164  0.164   0.227   0.164
24   age ~~  educ        0.307 0.000      NA     NA    0.307    0.307  0.307   0.115   0.307
25   age ~~  inco        0.308 0.000      NA     NA    0.308    0.308  0.308   0.088   0.308
26   sex ~~   sex        0.250 0.000      NA     NA    0.250    0.250  0.250   1.000   0.250
27   sex ~~  race        0.081 0.000      NA     NA    0.081    0.081  0.081   0.364   0.081
28   sex ~~  educ       -0.061 0.000      NA     NA   -0.061   -0.061 -0.061  -0.075  -0.061
29   sex ~~  inco       -0.056 0.000      NA     NA   -0.056   -0.056 -0.056  -0.053  -0.056
30  race ~~  race        0.196 0.000      NA     NA    0.196    0.196  0.196   1.000   0.196
31  race ~~  educ       -0.051 0.000      NA     NA   -0.051   -0.051 -0.051  -0.071  -0.051
32  race ~~  inco        0.015 0.000      NA     NA    0.015    0.015  0.015   0.015   0.015
33  educ ~~  educ        2.689 0.000      NA     NA    2.689    2.689  2.689   1.000   2.689
34  educ ~~  inco        1.590 0.000      NA     NA    1.590    1.590  1.590   0.451   1.590
35  inco ~~  inco        4.616 0.000      NA     NA    4.616    4.616  4.616   1.000   4.616
36  a1b1 :=  a1*b1    a1b1  0.032 0.027   1.184  0.236   -0.021    0.085  0.032   0.033   0.020
37  a2b1 :=  a2*b1    a2b1  0.136 0.092   1.489  0.137   -0.043    0.316  0.136   0.043   0.086
38  a3b1 :=  a3*b1    a3b1 -0.678 0.109  -6.207  0.000   -0.893   -0.464 -0.678  -0.190  -0.430
39  a4b1 :=  a4*b1    a4b1  0.083 0.029   2.802  0.005    0.025    0.140  0.083   0.086   0.052
40  a5b1 :=  a5*b1    a5b1 -0.072 0.022  -3.201  0.001   -0.116   -0.028 -0.072  -0.098  -0.045
41  a6b2 :=  a6*b2    a6b2  0.014 0.006   2.511  0.012    0.003    0.025  0.014   0.015   0.009
42  a7b2 :=  a7*b2    a7b2 -0.014 0.012  -1.161  0.246   -0.039    0.010 -0.014  -0.005  -0.009
43  a8b2 :=  a8*b2    a8b2 -0.003 0.013  -0.198  0.843   -0.029    0.024 -0.003  -0.001  -0.002
44  a9b2 :=  a9*b2    a9b2  0.001 0.004   0.384  0.701   -0.006    0.009  0.001   0.001   0.001
45 a10b2 := a10*b2   a10b2  0.008 0.004   2.135  0.033    0.001    0.016  0.008   0.011   0.005
46 total := c1+c2+c3+c4+c5+(a1*b1)+(a2*b1)+(a3*b1)+(a4*b1)+(a5*b1)+(a6*b2)+(a7*b2)+(a8*b2)+(a9*b2)+(a10*b2)
            total -0.656 0.165  -3.982  0.000   -0.979   -0.333 -0.656  -0.151  -0.416

Wrapping Up
I think there are some other things I should do, such as analyzing localized residuals (see the sketch below) and replicating the results with other existing SEM packages. But with lavaan, I learned I can do many things.
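As a sketch of the first point, lavaan's residuals() method can show localized discrepancies between the observed and model-implied correlations (type = "cor" requests the residual correlation matrix):

# Inspect localized residuals: large entries flag poorly reproduced correlations
residuals(results2, type = "cor")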

When I was reading webpages on lavaan, I could only find mediation examples involving a single mediator. But with the above code, I could run mediation analysis with multiple mediators. I hope this is all correct – at least R did not give me any error messages.
