Review of Correlation & Covariance
Structural Equation Models
Validity and Reliability
Classical Test Theory
Parallel Measures
Factor Analysis
Factor Analysis of GSS Job Values
PRELIS to Create a Matrix
SIMPLIS Commands
Confirmatory Factor Analysis
Model Fit Statistics
A Two-Factor Model
Modification Indexes
A MIMIC Model
A Chain Model
Equality Constraints on Parameters
Comparing Models for Groups
A Path Model
Model Identification
Factor Analysis of Dichotomies
SEM with Ordinal Variables
Because structural equation models (SEMs) are based on analyses of
covariance or correlation matrices, a brief review of these descriptive
statistics may be helpful.
The Pearson product-moment correlation coefficient for two continuous
variables, Y and X, measures the amount of dispersion (spread) around a
linear least-squares regression line. For a population, using a Greek letter
to indicate a parameter, the OLS estimator for the bivariate regression
slope of Y on X is:
$$\beta_{YX} = \frac{\sum_{i=1}^{N} (Y_i - \mu_Y)(X_i - \mu_X)}{\sum_{i=1}^{N} (X_i - \mu_X)^2}$$
The numerator of this parameter is the sum across the N observations of
the cross-product of deviations of both variables around their means. The
denominator is the sum of squared deviations of X around its mean. If we
divide both the numerator and denominator by N, the regression slope
formula becomes:
$$\beta_{YX} = \frac{\sum_{i=1}^{N} (Y_i - \mu_Y)(X_i - \mu_X) / N}{\sum_{i=1}^{N} (X_i - \mu_X)^2 / N}$$
The numerator is called the covariance of Y and X and the denominator is
the variance of X. Thus, we can simplify the OLS estimator of the bivariate
regression coefficient as the ratio of those two components:
$$\beta_{YX} = \frac{\sigma_{YX}}{\sigma_X^2}$$
Depending on the direction of the covariance of Y and X, a bivariate
regression slope may have a positive or negative sign, indicating the
direction of the relationship between Y and X in the population.
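To make the covariance-to-variance ratio concrete, here is a minimal Python sketch. The data values and the function names (mean, covariance, variance, ols_slope) are made up for illustration; the calculations use the population versions described above, dividing by N.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(y, x):
    """Population covariance: the mean cross-product of deviations."""
    mean_y, mean_x = mean(y), mean(x)
    return sum((yi - mean_y) * (xi - mean_x) for yi, xi in zip(y, x)) / len(x)


def variance(x):
    """Population variance: the covariance of a variable with itself."""
    return covariance(x, x)


def ols_slope(y, x):
    """Bivariate regression slope of Y on X as covariance / variance."""
    return covariance(y, x) / variance(x)


# Hypothetical scores for five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("covariance of Y and X:", covariance(y, x))
print("variance of X:", variance(x))
print("slope of Y on X:", ols_slope(y, x))
```

Because the variance in the denominator is never negative, the slope always takes the sign of the covariance.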
In a bivariate regression the population coefficient of determination, ρ²,
indicates the proportion of total variation in Y that is determined by its
linear relationship with X. One of its formulas (see SSDA4 p. 184 for
details) involves the ratio of the squared covariance to the product of both
variances:
$$\rho_{YX}^2 = \frac{\sigma_{YX}^2}{\sigma_Y^2 \, \sigma_X^2}$$
Because of the squaring, the coefficient of determination cannot have a
negative sign.
The Pearson product-moment correlation coefficient is defined as the
square root of the coefficient of determination. It summarizes the linear
relationship and takes the same sign (plus or minus) as the regression
slope:
$$\rho_{YX} = \sqrt{\rho_{YX}^2} = \frac{\sigma_{YX}}{\sigma_Y \, \sigma_X}$$
Thus, the correlation is also defined as the covariance of Y and X divided
by the product of the standard deviations of both variables. It ranges
between +1.00 and –1.00 and has a value of 0 when the two variables do
not covary (i.e., are unrelated). The sign attached to the correlation must
be the same as the signs of the covariance and the regression slope.
Both correlations and covariances are symmetric; that is, $\rho_{YX} = \rho_{XY}$ and
$\sigma_{YX} = \sigma_{XY}$, which can be ascertained by noting that the order of
cross-product multiplication is irrelevant in the regression slope formula above.
One important relation between covariance and correlation emerges
when both X and Y are standardized variables; that is,
turned into Z-scores by subtracting the mean and dividing by the standard
deviation. Into the formula above for ρYX, substitute Z-scores for both
variables:
$$\rho_{Z_Y Z_X} = \frac{\sigma_{Z_Y Z_X}}{\sigma_{Z_Y} \, \sigma_{Z_X}} = \frac{\sigma_{Z_Y Z_X}}{(1)(1)} = \sigma_{Z_Y Z_X}$$
Because the standard deviation of a Z-score is 1.00, the correlation
coefficient for two standardized measures equals their covariance.
Correlation coefficients are “scale-free”; that is, they are unaffected by whether
the units of measurement are the original scales or their transformed
Z-scores. We will see that structural equation models can be estimated
using either covariances or correlations (or both).
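The two claims in this passage can be checked numerically. The following Python sketch uses made-up data and illustrative helper names; it shows that the covariance of two Z-scored variables reproduces their correlation, and that a linear change of units leaves the correlation unchanged.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance (divides by N)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def std_dev(values):
    return sqrt(covariance(values, values))


def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / (std_dev(a) * std_dev(b))


def z_scores(values):
    """Subtract the mean and divide by the standard deviation."""
    m, sd = mean(values), std_dev(values)
    return [(v - m) / sd for v in values]


# Hypothetical scores for five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_yx = correlation(y, x)
cov_zy_zx = covariance(z_scores(y), z_scores(x))

# Express X in different units (a positive linear transformation)
x_rescaled = [10 * xi + 3 for xi in x]
r_rescaled = correlation(y, x_rescaled)

print("correlation of Y and X:       ", r_yx)
print("covariance of their Z-scores: ", cov_zy_zx)
print("correlation after rescaling X:", r_rescaled)
```

All three printed values agree, apart from floating-point rounding.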
This part of this course examines the basics of structural equation models
(SEMs), specifically the LISREL (LInear Structural RELations) approach
developed three decades ago by Swedish psychometricians Karl Jöreskog
and Dag Sörbom. We’ll be using LISREL 8.54 to analyze General Social
Survey data. These notes use the simplified commands (SIMPLIS), as does
Chapter 12 in SSDA4. To understand causal diagrams, a good preparation
would be to skim Chapter 11 on causal models and path analysis.
However, I will try to develop everything we need on that topic in these
lecture notes, primarily by working through increasingly complex data
analysis examples.
As with every statistical method, the structural equation approach is more
suitable to some types of data and measures than to others. Two major
uses of LISREL are: (1) to model social psychological attitudes (factor
structures), in which one or more unobserved constructs generate the
variation in several observed indicators; and (2) to estimate parameters for
a causal model, in which some variables are treated as causes of other
variables (the effects). The chief advantage of LISREL over alternative
methods (such as path analysis and index construction) lies in its power
to combine observed measures with relations among unobserved
constructs into a single integrated system.
I like to imagine that the relationship between structural and measurement
levels of analysis can be traced back to a famous philosophical metaphor
in Plato’s Republic: the shadows that the unenlightened prisoners see
on the cave wall are obscure reflections of an underlying reality which
analysts cannot view directly but can only seek to comprehend through
intellectual reasoning. Concepts and the objects they indicate are not
identical phenomena (the point of René Magritte’s droll painting, “Ceci
n’est pas une pipe”). Similarly, as Plato elsewhere reasoned, a triangle
drawn with pencil and paper is a flawed representation of the abstract,
eternal concept of “triangle” that exists beyond the realm of sensual
perception. By analogy, social scientists can never accurately observe
people’s attitudes (not even their visible behaviors), but can only infer their
existence by making noisy, error-prone measurements – such as
respondents’ responses to survey questions – that are only partially
influenced by their unobservable true beliefs (or actions).
“The famous pipe. How people reproached me for it! And yet, could
you stuff my pipe? No, it’s just a representation, is it not? So if I had
written on my picture ‘This is a pipe,’ I'd have been lying!”
- René Magritte
Modern measurement theory concerns the relationships between a latent
construct at the theoretical or conceptual level and observed indicators at
the level of empirical observations:
[Diagram: a latent construct, such as status (SES), linked to its observed indicators]

Complete these examples:

Latent construct      Observed indicators
Religiosity           ______________________________
Industrialization     ______________________________
Delinquency           ______________________________
Centralization        ______________________________
Intelligence          ______________________________
_________________     Sudden numbness, confusion, difficulty seeing,
                      severe headache, loss of coordination & balance
_________________     Fewer social services; low tax rates;
                      stronger national defense
Measurement theory seeks to represent a latent construct with one or more
observable indicators (operational measures or variables) that accurately
capture that theoretical construct. Two desirable properties of empirical
quantitative measures are high levels of validity and reliability:
• Validity: The degree to which a variable’s operationalization
accurately reflects the concept it is intended to measure.
• Reliability: The extent to which different operationalizations of the
same concept produce consistent results. The proportion of an
item’s variance that is attributable to the unobserved cause or
Many validity issues concern how well or poorly an observable variable
reflects its latent counterpart. Another central concern is with accurately
depicting the (causal or covariational) relationships among several
theoretical constructs, using information about the covariation among
observed indicators. This latter interest lies at the heart of the factor
analysis and structural equation models examined in later sections.
Reliability refers to the replicability of a measure under the same
conditions. A perfectly reliable measure must generate the same scores
when conditions are identical. A measure may be very reliable but not
valid; that is, an instrument can precisely measure some phenomenon yet
represent complete nonsense. For example, your bathroom scale
consistently gives identical readings when you step off and on, but it
invalidly operationalizes your true weight (you dialed it back 5 pounds).
To be valid, a measure or indicator must be reliable. In the extreme,
if a measure’s reliability is zero, its validity is also zero. However, a given
indicator may vary in the extent of its validity as a measure of different
concepts. For example, education, measured as years of formal schooling,
might be used both as an indicator of educational persistence and as an
indicator of socioeconomic status (SES). Validity is clearly affected by the
choice of one’s indicator(s). For example, we can treat church attendance
as a measure of Americans’ religiosity, but this indicator might have only
moderate validity because some highly religious persons don’t attend
services, and some go to church mainly for social purposes. A more valid
measure of religiosity would include not only attendance at religious
services, but also would query people about their religious beliefs (e.g., in
the efficacy of prayer, the existence of an afterlife, and infallibility of
Unfortunately, researchers never obtain perfect measurements in the real
world; that is, all measures are subject to measurement error, hence they
are all unreliable and invalid to some greater or lesser degree.
Measurement theory is therefore also a theory about the magnitudes and
sources of errors in empirical observations.
Reliability assumes random errors. When a measurement is repeated over
numerous occasions under the same conditions, if random error occurs,
then the resulting variations in scores form a normal distribution about the
measure’s true value. The standard error of that distribution represents
the magnitude of the measurement error: the larger the standard error, the
lower the measure’s reliability. By definition, random errors are
uncorrelated with any variable, including other random error variables.
Natural scientists also face measurement reliability problems. For
example, astronomers made important contributions to measurement
theory by developing techniques for estimating the true transit times
of Jupiter’s moons from erroneous telescopic observations. (See
Stephen M. Stigler. 1986. The History of Statistics: The Measurement
of Uncertainty Before 1900. Cambridge, MA: Harvard University Press.)
Systematic error (nonrandom error) implies a miscalibration of the
measuring instrument that biases the scores by consistently over- or
underestimating a latent construct (e.g., your miscalibrated bathroom
scale). Such consistent biases don’t alter the measure’s reliability, but
they clearly alter its validity because they prevent the indicator from
accurately representing the theoretical concept.
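The bathroom-scale example can be put in numbers. This Python sketch uses entirely hypothetical readings: a constant 5-pound miscalibration leaves the spread of repeated readings untouched (so consistency is unaffected) but shifts their mean away from the true value (so validity suffers).

```python
from statistics import mean, pvariance

# Hypothetical example: five weighings of a person whose true weight is 150 lb
true_weight = 150.0
random_errors = [-0.4, 0.1, 0.3, -0.2, 0.2]  # random errors that sum to zero

accurate_scale = [true_weight + e for e in random_errors]
# Systematic error: the dial has been set back by 5 lb
miscalibrated_scale = [reading - 5.0 for reading in accurate_scale]

spread_accurate = pvariance(accurate_scale)
spread_biased = pvariance(miscalibrated_scale)
bias = mean(miscalibrated_scale) - true_weight

print("variance of readings, accurate scale:     ", spread_accurate)
print("variance of readings, miscalibrated scale:", spread_biased)
print("average error of miscalibrated scale:     ", bias)
```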
The research methodology literature discusses several types of validity,
but we lack space to examine all these conceptual distinctions (Box 12.1
defines a variety of validity concepts). For purposes of explicating
structural equation models, we’ll assume that the empirical observations
we use have adequate content validity as indicators of the designated
latent constructs. Therefore, we turn next to the quantification of reliability
in classical test theory.
BOX 12.1 Varieties of Validity
Validity indicates the appropriateness of a measurement instrument, such
as a battery of test items, for the concept it intends to measure. In other
words, an instrument’s validity denotes the extent to which it measures
what it is supposed to measure. Validity can be established by experts
knowledgeable about a substantive domain, or by demonstrating a
measure’s consistency with the theoretical concepts it is designed to
represent. Three traditional types of measurement validity are construct,
criterion-related, and content validity. Brief definitions and examples of
these various validity types are:
Construct validity: the extent to which a measure agrees with theoretical
expectations; for example, IQ test items try to measure theoretically
hypothesized dimensions of intelligence. Measures with high convergent
validity and discriminant validity exhibit high agreement with theoretically
similar measures but low correlations with dissimilar measures, respectively.
Criterion-related validity: the extent to which a measure accurately
predicts performances on some subsequently observable activity (the
criterion); for example, how highly a written driving-test score correlates
with people’s actual skills in operating an automobile. A measure’s
concurrent validity is assessed by its ability to discriminate between
persons with and without the criterion. A measure’s predictive validity is
demonstrated by its accuracy in forecasting future behavior.
Content validity: the extent to which a measure adequately represents
the defined domain of interest that it was designed to measure; for
example, a mathematical ability test should cover the full range of
students’ mathematical knowledge.
Classical Test Theory
Classical test theory depicts the observed score (X) of respondent i on a
measuring instrument, such as a test battery or survey item, as arising
from two hypothetical unobservable sources: the respondent’s “true
score” and an error component:
$$X_i = T_i + e_i$$
A person’s true score is the average that would be obtained across
infinitely repeated measures of X. In the theoretical definition of random
error, the distribution of error forms a normal distribution around a mean
value of zero. Because the ± error deviations around the true score cancel
one another, the expected value (mean) of the errors is zero and the
expected value of the observed scores equals respondent i’s true
score:
$$E(X_i) = \mu_{T_i}$$
Further, the error term is assumed to be uncorrelated with its true score
(which makes sense if the errors are really random). Hence, both
components make unique contributions to the variances of the observed
scores in a population:
$$\sigma_X^2 = \sigma_T^2 + \sigma_e^2$$
That is, the observed score variance is the sum of the true score variance
plus error variance. For good measures, the error variance is small relative
to the observed variance; poor measures have the opposite pattern.
[Diagram: the observed score X_i receives one arrow from True_i and one from Error_i; X_i = T_i + e_i]
The reliability of X is defined as the ratio between true score and observed
score variances (“rho” here is not the same as the Pearsonian correlation):
$$\rho_X = \frac{\sigma_T^2}{\sigma_X^2}$$
Note that reliability ranges between 0 (when the true score variance is zero)
and 1 (when the error variance is zero). Values between these extremes
reflect the relative proportions of error and true score variation in the
measure of X.
Rearranging the definition of reliability reveals that the true score variance
equals the observed score variance times the reliability:
$$\sigma_T^2 = \rho_X \, \sigma_X^2$$
Hence, we can estimate the unobserved true score variance from a
measure’s reliability and its observed variance.
Finally, reliability can also be expressed using the error term*:
$$\rho_X = 1 - \frac{\sigma_e^2}{\sigma_X^2}$$
This formula again demonstrates that reliability ranges between 0 and 1: if
the entire observed variance is error, ρX = 0; but if no random error exists,
then ρX = 1.
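The reliability formulas are simple enough to verify with a few lines of Python. In this sketch the variance values are hypothetical and the function names are mine; it computes reliability both as the true-to-observed variance ratio and as one minus the error share, then recovers the true score variance from the reliability.

```python
def reliability(true_var, error_var):
    """Reliability as true score variance over observed score variance."""
    observed_var = true_var + error_var  # observed = true + error variance
    return true_var / observed_var


def reliability_from_error(observed_var, error_var):
    """Equivalent form: one minus the error share of observed variance."""
    return 1.0 - error_var / observed_var


def true_score_variance(rel, observed_var):
    """Rearranged definition: true variance = reliability * observed variance."""
    return rel * observed_var


# Hypothetical measure: true score variance 4.0, error variance 1.0
true_var, error_var = 4.0, 1.0
observed_var = true_var + error_var

rho = reliability(true_var, error_var)
print("reliability:", rho)
print("via error term:", reliability_from_error(observed_var, error_var))
print("recovered true variance:", true_score_variance(rho, observed_var))

# The two extremes described in the text
print("all error:", reliability(0.0, 5.0))
print("no error: ", reliability(5.0, 0.0))
```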
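* The derivation of the error-term expression above is:
$$\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_X^2 - \sigma_e^2}{\sigma_X^2} = 1 - \frac{\sigma_e^2}{\sigma_X^2}$$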
Parallel Measures
If we had a second measure of the same unobservable construct that
differed from the first indicator only in its errors (the true scores are
equal), we would have two parallel measures where:
$$X_{1i} = T_i + e_{1i}$$
$$X_{2i} = T_i + e_{2i}$$
Assuming that the population variances of their error terms are equal, then
a measure’s reliability is the correlation of the parallel forms. The proof:
1. The correlation coefficient for two variables is defined as the ratio of the
covariance to the product of standard deviations:
$$\rho_{X_1 X_2} = \frac{\sigma_{X_1 X_2}}{\sigma_{X_1} \, \sigma_{X_2}}$$
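2. In the numerator, substitute the two variables’ true and error scores and
multiply the subscripts:
$$\sigma_{(T + e_1)(T + e_2)} = \sigma_T^2 + \sigma_{T e_1} + \sigma_{T e_2} + \sigma_{e_1 e_2}$$
The last three terms on the right are zero because the error terms are
uncorrelated with the true scores and with each other, leaving only the
true score variance.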
3. Because the standard deviations of parallel measures are equal, the
denominator simplifies to:
$$\sigma_{X_1} \, \sigma_{X_2} = \sigma_X^2$$
4. Hence, by substituting this term into the denominator in step 1, the
correlation coefficient for parallel measures becomes:
$$\rho_{X_1 X_2} = \frac{\sigma_T^2}{\sigma_X^2}$$
5. We previously defined the right-side expression in step 4 as the
reliability; therefore:
$$\rho_{X_1 X_2} = \rho_X$$
An important consequence of this identity is that the true score’s variance
can be estimated as the product of just two empirical measures, the
correlation coefficient and the variance. Rearranging step 4 above:
$$\sigma_T^2 = \rho_{X_1 X_2} \, \sigma_X^2$$
The correlation between the true score and an observed variable equals
the square root of the reliability, which is also the square root of the
correlation between two parallel measures:
$$\rho_{T X_1} = \sqrt{\rho_X} = \sqrt{\rho_{X_1 X_2}}$$
This equation shows that the correlation between an observable indicator
and the unobservable true score it measures can be estimated as the
square root of the reliability of indicator X. For example, if X has reliability
= 0.64, then the correlation with its true score = 0.80. What is the reliability
of X if its true-score correlation = 0.81?
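The parallel-measures result can also be checked with a small constructed data set. In this Python sketch the scores are hypothetical, built so that the true scores and the two error terms are exactly uncorrelated with one another and the error variances are equal, which are the assumptions of the proof above. Helper names are illustrative.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance (divides by N)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def correlation(a, b):
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))


# Hypothetical true scores and errors for eight respondents, constructed to be
# mutually uncorrelated with means of zero (true variance 4, error variances 1)
T = [2, 2, 2, 2, -2, -2, -2, -2]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]

# Two parallel measures of the same true score
X1 = [t + e for t, e in zip(T, e1)]
X2 = [t + e for t, e in zip(T, e2)]

reliability = covariance(T, T) / covariance(X1, X1)  # true / observed variance
r_parallel = correlation(X1, X2)                     # correlation of parallel forms
r_true = correlation(T, X1)                          # correlation with true score

print("reliability of X1:             ", reliability)
print("correlation of X1 with X2:     ", r_parallel)
print("correlation of X1 with T:      ", r_true)
print("square root of the reliability:", sqrt(reliability))
```

In real data the true scores are unobserved, so only the correlation between the two parallel forms could actually be computed; the other quantities are shown here because the example constructs T directly.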
The measurement theory principles discussed in this section are
incorporated into structural equation models, which I introduce next
through the confirmatory factor analytic approach to modeling the
relationships between observed indicators and latent constructs.
Factor analysis refers to a family of statistical methods that represent the
relationships among a set of observed variables in terms of a
hypothesized smaller number of latent constructs, or common factors. The
common factors are assumed to generate the observed variables’
covariations (or correlations, if all measures are standardized with zero
means and unit variances). For example, respondents’ observed scores on
several mental ability tests (e.g., IQ, SAT, GRE exams) allegedly result from
unobserved common verbal and quantitative factors. Or covariations
among numerous socioeconomic indicators of urban communities depend
on latent industrialization, health, and welfare factors.
Of the two major classes of factor analysis, exploratory and confirmatory,
we limit our discussion to the latter. In confirmatory factor analysis (CFA)
a researcher posits an a priori theoretical measurement model to describe
or explain the relationship between the underlying unobserved constructs
(“factors”) and the empirical measures. Then, the analyst uses statistical
fit criteria to assess the degree to which the sample data are consistent
with the posited model; that is, to ask whether the results confirm the
hypothesized model. In practice, however, researchers seldom conduct
only one test of a confirmatory factor model. Rather, based on initial
estimates, they typically alter some model specifications and re-analyze
the new model, trying to improve its fit to the data. Hence, most
applications of CFA to investigate latent factors involve successive
modeling attempts. We apply this successive model-fitting strategy in
estimating alternative models to explain the empirical relationships among
a set of observed variables.
Researchers use confirmatory factor analysis to estimate the parameters of
a measurement model. Consider this diagram showing a single latent
factor measured by four empirical variables.
F = latent common factor
Xi = observed variable i (indicator)
ei = unobserved “error” source (unique factor) for variable Xi
bi = “factor loading” effect of common factor F on observed variable Xi
di = effect of unique factor ei on observed variable Xi
This diagram implies that, if the latent variable were observed, it would
produce values of the indicators. Each observed score is a linear
combination of this common factor plus a unique error term. We can see
these relationships clearly by writing the four implied measurement
equations, which closely resemble the classical test theory equation:
X1 = b1 F + d1 e1
X2 = b2 F + d2 e2
X3 = b3 F + d3 e3
X4 = b4 F + d4 e4
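[Diagram: common factor F with arrows to indicators X1, X2, X3, and X4, each of which also receives an arrow from its own error term e1, e2, e3, e4]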
Note the non-coincidental similarity of these factor analytic equations to
classical test theory’s representation of an observed score as a sum of a
true score and an error term.
The diagram above shows that the error terms are uncorrelated with the
factors and among themselves. Hence, the only sources of indicator i’s
variance are the common factor F and the indicator’s unique error term:
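$$\sigma_{X_i}^2 = \beta_i^2 \, \sigma_F^2 + \Theta_{\varepsilon_i}$$
where $\Theta_{\varepsilon_i}$ signifies the variance of the error in Xi. Because F is
unobserved, its variance is unknown. And because it is unknown, we can
assume it is a standardized variable, which means that its variance = 1.0.
Hence:
$$\sigma_{X_i}^2 = \beta_i^2 + \Theta_{\varepsilon_i}$$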
Note that this formulation closely resembles the classical test theory
equation in which the variance of a measure equals the sum of two
components -- the true score variance plus the error variance. Next note
that if we standardize Xi, then the sum of these two components must
equal 1.0. A CFA model has another similarity to the classical test theory.
The reliability of indicator Xi is defined as the squared correlation between
a factor and an indicator. This value is the proportion of variation in Xi that
is statistically “explained” by the common factor (the “true score” in
classical test theory) that it purports to measure.
$$\rho_{X_i} = \rho_{F X_i}^2 = \beta_i^2$$
Hence, item reliability equals the square of its factor loading.
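Finally, the covariance between two indicators in a single-factor model is
the expected value of the product of their two measurement equations
(with all variables expressed as deviations from their means):
$$\sigma_{X_1 X_2} = E[(\beta_1 F + \varepsilon_1)(\beta_2 F + \varepsilon_2)]$$
which, because the error terms are uncorrelated with the factor and with
each other, simplifies to:
$$\sigma_{X_1 X_2} = \beta_1 \beta_2 \, \sigma_F^2 = \beta_1 \beta_2$$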
When all variances are standardized, this relationship further simplifies to:
$$\sigma_{Z_1 Z_2} = \rho_{Z_1 Z_2}$$
That is, the correlation of a pair of observed variables loading on a
common factor is the product of their standardized factor loadings.
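These single-factor relationships translate directly into a few lines of Python. The sketch below assumes a standardized solution and uses four hypothetical loadings (not taken from any model in these notes): each reliability is a squared loading, each error variance is one minus that square, and each implied correlation is the product of two loadings.

```python
# Hypothetical standardized loadings for indicators X1..X4 on one factor
loadings = [0.9, 0.8, 0.7, 0.6]

# Item reliability = squared loading; error variance = remainder of unit variance
reliabilities = [b ** 2 for b in loadings]
error_variances = [1.0 - b ** 2 for b in loadings]


def implied_correlations(loadings):
    """Correlation matrix implied by a standardized single-factor model."""
    k = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
        for i in range(k)
    ]


matrix = implied_correlations(loadings)

for i, (rel, err) in enumerate(zip(reliabilities, error_variances), start=1):
    print(f"X{i}: reliability = {rel:.2f}, error variance = {err:.2f}")

print("Implied correlations:")
for row in matrix:
    print("  ".join(f"{value:.2f}" for value in row))
```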
What are the reliabilities of each indicator X, the error variances, and the
expected correlations between every pair of X’s for this single-factor
model?
[Diagram: single-factor model with indicators X1, X2, X3, and X4]
Factor Analysis of GSS Job Values
To illustrate LISREL procedures for estimating a single-factor model, I use
responses in the 1998 General Social Survey to seven questions about the
importance of particular job values:
On the following list there are various aspects of jobs. Please circle one number to
show how important you personally consider it is in a job:
• SECJOB: Job security?
• HIINC: High income?
• PROMOTN: Good opportunities for advancement?
• INTJOB: An interesting job?
• WRKINDP: A job that allows someone to work independently?
• HLPOTHS: A job that allows someone to help other people?
• HLPSOC: A job that is useful to society?
The response categories ranged from “Very important” = 1 to “Not
important at all” = 5. Here are SPSS recode commands that reverse-code
those values, then write out a data file to be subsequently read by the
PRELIS program and create a covariance matrix for input into LISREL:
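* Reverse-code the seven items so that 5 = very important; all other codes become -999.
RECODE secjob hiinc promotn intjob wrkindp hlpoths hlpsoc
  (1=5)(2=4)(3=3)(4=2)(5=1)(ELSE = -999).
* Write the recoded items to a fixed-format ASCII file for PRELIS.
WRITE OUTFILE='JOBVALS.TXT' /
  secjob hiinc promotn intjob wrkindp hlpoths hlpsoc (7F5.0).
FREQUENCIES VAR=secjob hiinc promotn intjob wrkindp hlpoths hlpsoc.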
Including (ELSE = -999) tells SPSS to change the three missing value codes
on each variable to -999. The WRITE OUTFILE command stores the seven
recoded variables as a fixed-format ASCII file (JOBVALS.TXT). The format
(7F5.0) creates five-column fields to contain the new variable values,
allowing at least one space separation between each score. Here are a few
lines from the JOBVALS.TXT file:
4 3 4 4 4 4 4
5 5 4 4 4 4 4
4 4 4 5 4 4 4
-999 -999 -999 -999 -999 -999 -999
-999 -999 -999 -999 -999 -999 -999
4 4 3 4 3 4 4
5 4 4 4 4 3 3
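For readers without SPSS, the recode-and-write step can be mimicked in Python. This is only a sketch of the same logic: the function names are mine, and the raw codes 0, 8, and 9 in the second example are hypothetical stand-ins for the missing value codes, which these notes do not list.

```python
MISSING = -999


def reverse_code(value):
    """Map 1..5 onto 5..1; any other code becomes the missing value."""
    return 6 - value if value in (1, 2, 3, 4, 5) else MISSING


def format_record(values, width=5):
    """Write one case as right-justified fixed-width fields, like (7F5.0)."""
    return "".join(f"{reverse_code(v):{width}d}" for v in values)


# A respondent who answered 2, 3, 2, 2, 2, 2, 2 on the original 1-5 scale
print(format_record([2, 3, 2, 2, 2, 2, 2]))

# A respondent with only (hypothetical) missing codes
print(format_record([8, 9, 0, 8, 9, 0, 8]))
```

Each output line is 35 characters wide, seven fields of five columns, so even -999 leaves one blank column between adjacent scores.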
PRELIS to Create a Matrix
The ultimate data analyzed by LISREL 8.54 comprise a matrix of
covariations or product-moment (Pearson) correlations among the
indicator variables. PRELIS 2.30 in the LISREL program can set up a
matrix from data that are either entered interactively or imported from an
SPSS.SAV file. This section shows another option, where these PRELIS
commands are saved in an ASCII text file called JOBVALS.PR2:
DATA NI=7 NO=2832 MI=-999 TR=LI
To run this job, launch LISREL 8.54, click “File” on the upper left
toolbar, then “Open”. In the window select JOBVALS.PR2, then click
“Open.” Again click “File” on the upper left toolbar, then “Run
PRELIS” to execute. The printout will be stored in a file named
NOTES: The first line is an optional title; I included its own file name
DATA is the input data description, where:
NI = number of observed indicators (variables in the datafile)
NO = number of observations (total number of cases)
MI = missing value codes (if more than one, separated with commas)
TR = LI indicates listwise deletion: calculations are based only on
cases with no missing values on any variable. TR=PA is
pairwise deletion: for each pair of variables, computations are
based on all cases with nonmissing values on both variables.
RAW specifies the external file where the “raw” data are stored:
FI=JOBVALS.TXT. The FO option indicates that the format will appear on
the next line. If FO is omitted, the format is either the first line of the
external raw data file, or the data are stored in free format (separated by
spaces, commas, or return characters).
(7F5.0) indicates that the case records in the external file consist of
seven 5-column fields with no decimals.
LABELS command assigns the sequence of names on the next line to the
NI variables; maximum label length is eight characters.
CONTINUOUS defines the listed variables as interval-level measures. By
default, PRELIS2 treats variables with fewer than 16 values as ordinal.
OUTPUT command where:
MATRIX = CM computes a covariance matrix
MATRIX = KM computes a correlation matrix
SM = FILENAME for storing the matrix to be read into LISREL
The covariance matrix below (edited from the output) was computed using
listwise data from 1,129 respondents:

            SECJOB    HIINC  PROMOTN   INTJOB  WRKINDP  HLPOTHS   HLPSOC
          -------- -------- -------- -------- -------- -------- --------
SECJOB       0.442
HIINC        0.167    0.594
PROMOTN      0.194    0.270    0.514
INTJOB       0.119    0.132    0.207    0.399
WRKINDP      0.068    0.139    0.178    0.213    0.616
HLPOTHS      0.068    0.062    0.159    0.170    0.244    0.596
HLPSOC       0.098    0.065    0.146    0.178    0.214    0.430    0.649
          -------- -------- -------- -------- -------- -------- --------
Means        4.508    3.982    4.233    4.461    4.051    4.069    4.060
Std Devs     0.665    0.770    0.717    0.632    0.785    0.772    0.806
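To see what listwise deletion and the covariance computation involve, here is a minimal Python sketch applied to the seven records shown earlier from JOBVALS.TXT. It is not PRELIS's own procedure: the function names are mine, and I assume the usual sample covariance with an N - 1 denominator.

```python
MISSING = -999


def listwise(rows, missing=MISSING):
    """Keep only the cases with no missing code on any variable."""
    return [row for row in rows if missing not in row]


def covariance_matrix(rows):
    """Covariance matrix of the cases, assuming an N - 1 denominator."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


# The seven records listed earlier from JOBVALS.TXT
records = [
    [4, 3, 4, 4, 4, 4, 4],
    [5, 5, 4, 4, 4, 4, 4],
    [4, 4, 4, 5, 4, 4, 4],
    [-999, -999, -999, -999, -999, -999, -999],
    [-999, -999, -999, -999, -999, -999, -999],
    [4, 4, 3, 4, 3, 4, 4],
    [5, 4, 4, 4, 4, 3, 3],
]

complete = listwise(records)
cov = covariance_matrix(complete)

print("complete cases:", len(complete))
for row_index, row in enumerate(cov):
    # Print the lower triangle only, as in the matrix above
    print(" ".join(f"{value:6.3f}" for value in row[: row_index + 1]))
```

With only five complete cases the numbers are of course nothing like the full-sample matrix; the point is the mechanics of dropping incomplete cases before computing covariances.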
SIMPLIS Commands
In the earliest versions of LISREL, analysts had to specify the full set of
eight parameter matrices, indicating which coefficients were constrained to
zero and which were free to vary. A major benefit of this approach was to
force researchers to think very carefully and completely about their
models. However, the opportunities for errors were numerous and
frustrating to the learning process. LISREL 8 introduced the SIMPLIS
(SIMPlified LISREL) command language that avoids the necessity to
completely specify the parameter matrices. It undoubtedly speeds the
model-testing process and reduces trial-and-error learning. These Notes
present a variety of examples using SIMPLIS commands.
Methodologists usually describe factor analysis with LISREL as
confirmatory factor analysis (CFA) because the researcher formulates an a
priori theoretical model to describe or explain the empirical data. Then,
statistical analyses determine whether the sample data are consistent with
the imposed model; that is, do the results confirm the substantively
generated model? In practice, however, researchers seldom conduct only
one test of a factor model. Rather, based on the initial parameter
estimates, they typically alter some specifications and re-analyze the new
model, trying to improve its fit to the data. Hence, most applications of
LISREL to investigate latent factors are mixtures of exploratory and
confirmatory procedures.
During my initial analyses of GSS job values, I discovered that a single
factor could not account for the observed covariances among the seven
items. So, to demonstrate how LISREL estimates a single-factor model, I
concentrate on the relations among the first four GSS indicators (SECJOB,
HIINC, PROMOTN, INTJOB), most of which appear to emphasize “extrinsic”
job rewards based on such external benefits as money, promotions, and
job security.
Here’s the SIMPLIS command file, saved in file LISJV1.LS8:
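Single Factor LISREL Model with 4 Job Indicators (LISJV1.LS8)
Observed Variables: SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
Covariance Matrix From File JOBVALS.MAT
Sample Size = 1129
Latent Variables: Jobvalue
Relationships:
SECJOB = Jobvalue
HIINC = Jobvalue
PROMOTN = 1*Jobvalue
INTJOB = Jobvalue
LISREL Output: SC MI
Path Diagram
End of Problem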
To run this job, launch LISREL 8.54, click “File” on the upper left toolbar,
then “Open”. In the window select LISJV1.LS8, then click “Open.”
Again click “File” on the upper left toolbar, then “Run LISREL” to
execute. The printout will be stored in a file named LISJV1.OUT, and a
path diagram in LISJV1.PTH.
NOTES: The first line is the job title
The Observed variables line lists all seven GSS variable names in their
exact order of occurrence in the covariance matrix previously created by
PRELIS.
Covariance Matrix identifies the JOBVALS.MAT file where PRELIS stored
that covariance matrix.
Sample Size reports the number of observations used to compute the
covariances. (The listwise deletion in PRELIS found 1,129 cases with no
missing data on any of the seven variables.)
Latent Variables provides a name for the single unobserved factor.
Relationships is followed by a specification for the factor loadings to be
estimated. The observed variables’ names appear on the left-hand side of
an equal sign and the factor name on the right-hand side.
SCALING LATENT CONSTRUCTS: A latent construct is unobserved
and hence has no definite scale; that is, its origin and unit of
measurement are arbitrary. A researcher usually fixes the origin by
assuming a latent construct to have zero mean; LISREL automatically
does this unless otherwise instructed. The unit of measurement can be
scaled one of two ways: (1) Assume that a latent construct is
standardized to have a variance = 1.00; this is the LISREL default
option. (2) Assign a unit measure to the construct by fixing the factor
loading for one indicator to a nonzero value (typically = 1.00). This
method defines the latent construct scale in terms of an observed
reference variable, usually an indicator that the researcher believes
best represents the factor. I used the second procedure for scaling
constructs in this course. I chose PROMOTN as the reference indicator
for the unobserved “Jobvalue” factor (based on preliminary analyses
showing it to have the highest factor loading).
LISREL Output: SC MI requests a completely standardized solution and
modification indices.
End of Problem signals the termination of the model specification.
After five iterations, LISREL produced these maximum likelihood estimates
(MLE) of the parameters, with standard errors in parentheses, and t-ratios
in the third rows:
LISREL Estimates (Maximum Likelihood)
Measurement Equations
SECJOB = 0.56*Jobvalue, Errorvar.= 0.33 , R² = 0.25
(0.042) (0.016)
13.41 21.03
HIINC = 0.75*Jobvalue, Errorvar.= 0.39 , R² = 0.34
(0.051) (0.020)
14.69 19.33
PROMOTN = 1.00*Jobvalue, Errorvar.= 0.15 , R² = 0.70
INTJOB = 0.56*Jobvalue, Errorvar.= 0.28 , R² = 0.29
(0.040) (0.014)
13.99 20.44
The loading for the reference indicator, PROMOTN, was fixed to 1.00, so its
coefficient doesn’t have a standard error or t-test. This observed variable
has the largest proportion of variance explained by “Jobvalue” (70 percent)
suggesting that it was a good choice for fixing the latent construct’s scale.
The other three observed variables all have highly significant loadings on
the latent factor. But each parameter estimate is smaller than the fixed
value for the reference indicator and their R-squares are also much
smaller.
LISREL also draws a diagram corresponding to the model. To save it, click
“File” on the top toolbar, then “Export As Gif file (.gif)”. I inserted it on the
next page, and cropped the excess borders with MS Word’s
“Format/Picture” options.
The factor loadings are based on the covariances among the four
indicators, which are measured in the original 5-point scales. A completely
standardized factor solution rescales both the latent factor(s) and all
observed variables to have standard deviations equal to one (i.e.,
transformed to Z-scores). Therefore, all variances also equal one. This
rescaling produces parameter estimates that are proportional to the MLE
parameters. If no indicator’s scale is fixed to 1.00, then LISREL will
automatically set the latent construct’s variance = 1.00. (See the Jobvalue
for PHI in the output below.) In that case, all corresponding MLE and SC
parameters will be identical; why? [HINT: What is the relationship between
correlation and covariance?]
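A minimal Python sketch (not part of the LISREL output) of how the two solutions connect. It assumes each indicator loads on only one factor, in which case the completely standardized loading is the indicator's correlation with the factor, and its square is the R² printed in the measurement equations. The names `r_squared` and `std_loading` are illustrative, and the inputs are the rounded R² values, so results agree with LISREL only to about two decimals.

```python
import math

# R-squared values copied from the LISREL measurement equations.
r_squared = {"SECJOB": 0.25, "HIINC": 0.34, "PROMOTN": 0.70, "INTJOB": 0.29}

# With one factor per indicator, the completely standardized loading
# is the (positive) square root of the indicator's R-squared.
std_loading = {name: math.sqrt(r2) for name, r2 in r_squared.items()}

for name, value in std_loading.items():
    print(f"{name:8s} {value:.2f}")
```

The positive root is taken because all four ML loadings are positive.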
Here’s the completely standardized solution, which appears at the end of
the output:
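Completely Standardized Solution
LAMBDA-X
           Jobvalue
           --------
SECJOB         0.50
HIINC          0.58
PROMOTN        0.84
INTJOB         0.54
PHI
           Jobvalue
           --------
Jobvalue       1.00
THETA-DELTA
  SECJOB    HIINC  PROMOTN   INTJOB
-------- -------- -------- --------
    0.75     0.66     0.30     0.71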
These standardized factor loadings clearly reveal that PROMOTN has the
strongest relationship with the “Jobvalue” construct. Hence, PROMOTN is
that factor’s most reliable indicator: (.84)² = 0.71. SECJOB and INTJOB
have the lowest factor loadings; what are their reliabilities? The variance
of Jobvalue equals 1.00 (reported in PHI), consistent with a standardized
solution.
The completely standardized solution reveals that the squared factor
loadings and squared error terms (the error variances reported in
THETA-DELTA) jointly account for all the variation in each indicator, as
required in classical test theory. That is, the sum of a squared factor
loading plus its squared error term = 1.00. For example,
SECJOB = (0.50)² + 0.75 = 0.25 + 0.75 = 1.00.
What are these sums for the other three indicators?
To view a model diagram with the standardized coefficients, click “View”
on the top toolbar, then “Estimations” and “Standardized Solutions.”
In the figure below, I converted the LISREL error term values into path
coefficients. To measure all effects in a standard-deviation metric, take the
positive square root of each LISREL error:
Now the sum of the squared factor loading plus the squared error equals
1.0 for each indicator: PROMOTN = (0.55)² + (0.84)² = 0.30 + 0.70 = 1.00
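The conversion can be sketched in Python. This is an illustration only: it assumes the four THETA-DELTA values belong to SECJOB, HIINC, PROMOTN and INTJOB in that order, and the dictionary names are hypothetical.

```python
import math

# Completely standardized loadings and THETA-DELTA error variances
# for the one-factor Jobvalue model (rounded values from the text).
loadings = {"SECJOB": 0.50, "HIINC": 0.58, "PROMOTN": 0.84, "INTJOB": 0.54}
error_var = {"SECJOB": 0.75, "HIINC": 0.66, "PROMOTN": 0.30, "INTJOB": 0.71}

# Error path coefficient = positive square root of the error variance.
error_path = {name: math.sqrt(v) for name, v in error_var.items()}

# Squared loading plus squared error path should be about 1.00;
# small departures come from rounding in the printed output.
totals = {name: loadings[name] ** 2 + error_path[name] ** 2 for name in loadings}

for name in loadings:
    print(f"{name:8s} error path = {error_path[name]:.2f}  total = {totals[name]:.3f}")
```

Because the inputs are rounded to two decimals, the totals differ from 1.00 in the third decimal place.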
Calculate the expected correlations by multiplying pairs of factor loadings;
calculate the differences by subtracting the observed correlations:
Observed Variables   Expected r            Observed r   Difference
SECJOB-HIINC         (.50)(.58) = 0.290    0.326
SECJOB-PROMOTN       (.50)(.84) =          0.408
SECJOB-INTJOB        (.50)(.54) =          0.284
HIINC-PROMOTN        (.58)(.84) =          0.489
HIINC-INTJOB         (.58)(.54) =          0.272
PROMOTN-INTJOB       (.84)(.54) =          0.458
The discrepancies are fairly small, implying that the unobserved Jobvalue
factor accounts reasonably well for most correlations among its four
indicators. We can more precisely assess how well a model fits the data
with several goodness of fit statistics generated by LISREL.
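As a rough check on the table, here is a short Python sketch that multiplies each pair of standardized loadings and subtracts the observed correlation. It rests on one assumption that should be verified against the original handout: that the single number printed beside each pair is the observed correlation.

```python
from itertools import combinations

# Completely standardized factor loadings from the text.
loadings = {"SECJOB": 0.50, "HIINC": 0.58, "PROMOTN": 0.84, "INTJOB": 0.54}

# Assumption: the number printed beside each pair in the table
# is the observed correlation.
observed = {
    ("SECJOB", "HIINC"): 0.326,
    ("SECJOB", "PROMOTN"): 0.408,
    ("SECJOB", "INTJOB"): 0.284,
    ("HIINC", "PROMOTN"): 0.489,
    ("HIINC", "INTJOB"): 0.272,
    ("PROMOTN", "INTJOB"): 0.458,
}

residuals = {}
for a, b in combinations(loadings, 2):
    expected = loadings[a] * loadings[b]      # model-implied correlation
    residuals[(a, b)] = expected - observed[(a, b)]
    print(f"{a}-{b}: expected {expected:.3f}, "
          f"observed {observed[(a, b)]:.3f}, "
          f"difference {residuals[(a, b)]:+.3f}")

largest = max(abs(r) for r in residuals.values())
```

Under that assumption no difference exceeds about 0.04 in absolute value.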
As with logistic regression, not only the individual parameters but an entire
LISREL model’s fit to the data can be assessed statistically. A specific
structural equation model implies an expected covariance matrix (or a
correlation matrix) for the k observed variables, Σ(θ), where θ is a vector of
parameters to be estimated. PRELIS uses the N sample cases to create the
observed covariance matrix, S, which LISREL then uses to estimate the
expected model parameters. LISREL fits the analyst’s hypothesized model
to the empirical data by minimizing a fit function F involving both matrices.
In matrix algebra notation, this function is:
$$F[S, \Sigma(\theta)] = \ln|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \ln|S| - k$$
where k is the number of observed variables, and tr means
“trace”, the sum of the elements in a matrix diagonal. The F function is
non-negative and is zero only if a perfect fit occurs, that is, if S = Σ
(in that case tr(SΣ⁻¹) = k and the two log-determinants cancel).
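To make the behavior of the fit function concrete, here is a small Python sketch for the 2 x 2 case. It uses the usual maximum likelihood form, which subtracts the number of observed variables k so that F is zero at a perfect fit. The matrices are made-up values for illustration, and `ml_fit` is a hypothetical helper, not a LISREL routine.

```python
import math

def ml_fit(S, Sigma):
    """ML discrepancy F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - k for 2x2 matrices."""
    (a, b), (c, d) = Sigma
    det_sigma = a * d - b * c
    det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # Inverse of the 2x2 model-implied matrix.
    inv = [[d / det_sigma, -b / det_sigma],
           [-c / det_sigma, a / det_sigma]]
    # Trace of the matrix product S * Sigma^-1.
    trace = sum(S[i][j] * inv[j][i] for i in range(2) for j in range(2))
    k = 2  # number of observed variables
    return math.log(det_sigma) + trace - math.log(det_s) - k

S = [[1.0, 0.5], [0.5, 1.0]]           # hypothetical observed matrix
sigma_good = [[1.0, 0.5], [0.5, 1.0]]  # model reproduces S exactly
sigma_poor = [[1.0, 0.0], [0.0, 1.0]]  # model implies zero covariance

f_perfect = ml_fit(S, sigma_good)
f_poor = ml_fit(S, sigma_poor)
print(f_perfect, f_poor)
```

The first value is zero apart from floating-point error, while the second is positive because the model-implied matrix ignores the covariance of 0.5.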
For a large sample N, multiplying F[S, Σ(θ)] by (N-1) yields a test statistic
that is approximately distributed as a χ² with degrees of freedom equal to:
$$d = \frac{k(k+1)}{2} - t$$
where k is the number of observed indicators and t is the number of
independent parameters estimated. The term k(k+1)/2 counts the distinct
elements of S: k variances plus k(k-1)/2 covariances. Because d must be
nonnegative, t cannot be more than (k² + k)/2. For example, if k = 5
indicators, what is the maximum number of parameters that LISREL could
estimate?
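The degrees-of-freedom count can be sketched in a few lines of Python. The sketch assumes that the k variances on the diagonal of S are counted together with the k(k-1)/2 covariances, which is the count that reproduces the 2 degrees of freedom LISREL reports for the one-factor Jobvalue model. The tally of free parameters in the comment is my own reading of that model, not something printed in the output.

```python
def model_df(k, t):
    """Degrees of freedom: distinct elements of S minus free parameters."""
    distinct_elements = k * (k + 1) // 2  # k variances + k(k-1)/2 covariances
    return distinct_elements - t

# One-factor Jobvalue model with 4 indicators. Assumed free parameters:
# 3 loadings (PROMOTN is fixed to 1.00) + 4 error variances + 1 factor variance.
df_one_factor = model_df(k=4, t=3 + 4 + 1)
print(df_one_factor)
```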
A researcher’s strategy for finding a best-fitting LISREL model involves
using the fit function to conduct chi-square tests on a series of nested
models with successive parameter constraints. Ideally, poorer-fitting
models will be rejected in favor of alternative models yielding
improved fits to the data. The ultimate goal is to specify a best-fitting
LISREL model that cannot be rejected, indicating that the hypothesized
model’s covariance matrix closely approximates the observed covariance matrix.
(This strategy is opposite to the conventional chi-square testing approach
for a crosstab, where the goal is to reject the null hypothesis of
independence between variables.)
To use the minimum fit function in chi-square tests, a researcher chooses
an α level of significance at which to reject a hypothesized model; for
example, by setting α = .05. If the model χ² exceeds the (1 - α) percentile of
the chi-square distribution with d degrees of freedom, then that model
must be rejected as producing a poor fit to the observed
variance-covariance matrix. For example, a model is a poor fit if p < .05; but if p >
.05, then the model would have a fit acceptable to a researcher setting the
region of rejection at α = .05.
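For readers who want to reproduce a p-value by hand, here is a hedged Python sketch that uses only the standard library. `chi2_sf` is a hypothetical helper that works for integer degrees of freedom through a standard recurrence for the upper incomplete gamma function. The example uses the minimum fit function chi-square of 8.31 on 2 df that LISREL reports for the one-factor Jobvalue model.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

p_value = chi2_sf(8.31, 2)
print(round(p_value, 3))  # 0.016, matching the P printed by LISREL
```

A statistics library would normally be used for this; the hand-rolled version is shown only to keep the example self-contained.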
In practice, a researcher who wants to find an acceptable latent structure
model (i.e., not seeking to reject the model) hopes to obtain a low
chi-square value relative to the degrees of freedom. Because the minimum fit
function χ² test statistic increases in proportion to sample size, (N-1),
obtaining low chi-square values with large samples often proves difficult.
Many analysts come to regard chi-square more usefully as an overall
“goodness-of-fit” measure rather than as a test statistic. That is, χ²
measures the distance (difference) between the sample covariance matrix
and the expected covariance matrix, (S - Σ). Jöreskog and Sörbom
half-jokingly refer to χ² as a “badness of fit” measure in the sense that large
chi-square corresponds to a bad fit and low chi-square to a good fit. Zero
is a “perfect” fit.
LISREL prints several goodness-of-fit measures that are functions of chi-
square. Two measures that do not depend explicitly on sample size
measure how much better the specified model fits the data, compared to
no model at all. Both indices range between 0 and 1, with values closer to
1 indicating a better fit of model to data. Most researchers seek values of
0.95 or higher.
The goodness-of-fit index (GFI) is:
$$GFI = 1 - \frac{F[S, \Sigma(\hat{\theta})]}{F[S, \Sigma(0)]}$$
where the numerator is the minimum of the hypothesized model’s fit
function and the denominator is the fit function for a model whose
parameters all equal zero (the null hypothesis model). (This latter model is
conceptually equivalent to the “constant only” equation in logistic
regression or EHA, used to calculate the model chi-square value for a
hypothesized equation.)
The adjusted goodness-of-fit index (AGFI) deflates the GFI by taking into
account the degrees of freedom consumed in estimating the parameters:
$$AGFI = 1 - \frac{k(k+1)}{2d}\,(1 - GFI)$$
where k = the number of observed indicators and d is the model df.
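A quick numerical check of the adjustment, sketched in Python with the usual LISREL definition AGFI = 1 - [k(k+1)/(2d)](1 - GFI). The inputs are the rounded values reported later in these notes for the two-factor model (5 indicators, 4 df, GFI = 0.99), so the result is approximate. `agfi` is an illustrative function name.

```python
def agfi(gfi, k, d):
    """Adjusted GFI: 1 - [k(k+1) / (2d)] * (1 - GFI)."""
    return 1.0 - (k * (k + 1) / (2.0 * d)) * (1.0 - gfi)

# Two-factor job values model: k = 5 indicators, d = 4, GFI = 0.99.
agfi_two_factor = agfi(gfi=0.99, k=5, d=4)
print(round(agfi_two_factor, 2))  # 0.96, the AGFI reported for that model
```

The penalty grows as d shrinks relative to k(k+1)/2, that is, as more parameters are estimated.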
Using chi-square as a test statistic assumes that the model holds exactly in
the population, an implausible assumption. Models that hold
approximately in the population will be rejected for large samples.
An alternative approach takes into account the errors of approximation in
the population and the precision of the fit measure. The estimated
population discrepancy function (PDF) is defined as:
$$\hat{F}_0 = \max\{\hat{F} - d/(N-1),\ 0\}$$
where F̂ = the minimum value of the fit function and d = degrees of
freedom.
Because the PDF usually decreases as additional parameters are added to
the model, the Root Mean Square Error of Approximation (RMSEA)
measures the discrepancy per degree of freedom:
$$\varepsilon = \sqrt{\hat{F}_0 / d}$$
An RMSEA value of ε ≤ 0.05 indicates a “close fit”, while values up to 0.08
indicate “reasonable” errors of approximation in the population. A 90-
percent confidence interval for RMSEA indicates whether the sample point
estimate falls into a range that also includes the 0.05 criterion.
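The RMSEA arithmetic can be verified with a short Python sketch. It takes the population discrepancy value (F0 = 0.0054) and the degrees of freedom (2) from the Goodness of Fit Statistics printed for the one-factor Jobvalue model. The function names are illustrative, and `f0_hat` assumes the truncated form max(F - d/(N-1), 0).

```python
import math

def f0_hat(f_min, d, n):
    """Estimated population discrepancy, truncated at zero."""
    return max(f_min - d / (n - 1), 0.0)

def rmsea(f0, d):
    """RMSEA: square root of the population discrepancy per degree of freedom."""
    return math.sqrt(f0 / d)

# F0 = 0.0054 and df = 2 are printed in the one-factor output.
epsilon = rmsea(0.0054, 2)
print(round(epsilon, 3))  # 0.052, the RMSEA that LISREL reports
```

The truncation at zero matters only when the fit function falls below d/(N-1), in which case the estimated RMSEA is 0.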
Here are all the Goodness of Fit Statistics for the one-factor Jobvalue
model. Can you conclude that the one-factor model is a good fit?
Goodness of Fit Statistics
Degrees of Freedom = 2
Minimum Fit Function Chi-Square = 8.31 (P = 0.016)
Normal Theory Weighted Least Squares Chi-Square = 8.11 (P = 0.017)
Estimated Non-centrality Parameter (NCP) = 6.11
90 Percent Confidence Interval for NCP = (0.78 ; 18.91)
Minimum Fit Function Value = 0.0074
Population Discrepancy Function Value (F0) = 0.0054
90 Percent Confidence Interval for F0 = (0.00069 ; 0.017)
Root Mean Square Error of Approximation (RMSEA) = 0.052
90 Percent Confidence Interval for RMSEA = (0.019 ; 0.092)
P-Value for Test of Close Fit (RMSEA < 0.05) = 0.39
Expected Cross-Validation Index (ECVI) = 0.021
90 Percent Confidence Interval for ECVI = (0.017 ; 0.033)
ECVI for Saturated Model = 0.018
ECVI for Independence Model = 0.88
Chi-Square for Independence Model with 6 Degrees of Freedom = 987.81
Independence AIC = 995.81
Model AIC = 24.11
Saturated AIC = 20.00
Independence CAIC = 1019.92
Model CAIC = 72.34
Saturated CAIC = 80.29
Normed Fit Index (NFI) = 0.99
Non-Normed Fit Index (NNFI) = 0.98
Parsimony Normed Fit Index (PNFI) = 0.33
Comparative Fit Index (CFI) = 0.99
Incremental Fit Index (IFI) = 0.99
Relative Fit Index (RFI) = 0.97
Critical N (CN) = 1251.05
Root Mean Square Residual (RMR) = 0.0087
Standardized RMR = 0.018
Goodness of Fit Index (GFI) = 1.00
Adjusted Goodness of Fit Index (AGFI) = 0.98
Parsimony Goodness of Fit Index (PGFI) = 0.20
Although the chi-square test rejects exact fit at the .05 level of
significance (p = .016), the other fit statistics suggest a quite acceptable
fit of the single-factor model to the data.
My initial effort to fit a single-factor model using all seven GSS job items
produced a terrible fit: Chi-Square = 828.1, df = 14, p <.000; GFI = 0.83;
AGFI = 0.65; and RMSEA = 0.23. However, the underlying attitude structure
may consist of two intercorrelated latent factors, each of which influences
the variation in different subsets of observed variables. Such a
specification could resemble this diagram:
After trying several alternative model specifications, I discovered that two
latent factors could plausibly account for the covariations among five of
the eight indicators: (1) a subset consisting of SECJOB HIINC PROMOTN;
and (2) another subset of HLPOTHS HLPSOC, which may reflect “intrinsic”
job rewards from helping others or accomplishing something worthwhile.
The new LISREL commands are:
Latent Variables: Jobval1 Jobval2
PROMOTN = 1*Jobval1
HLPOTHS = 1*Jobval2
HLPSOC = Jobval2
Latent Variables assigns two distinct construct names, while Relationships
specifies the pair of reference indicators and identifies the other variables’
factor loadings. Here’s a diagram of that model specification:
The overall fit statistics indicate much better fit: Chi-Square = 27.7, df = 4, p
= .00; GFI = 0.99; AGFI = 0.96; and RMSEA = 0.073 (a “reasonable fit,” with
the 90% confidence interval from 0.049 to 0.099). Given a sample size of
more than a thousand, I would be tempted to stop trying to improve the fit.
However, I want to demonstrate how to use LISREL’s modification indexes
for clues about altering a model’s specification to fit the data better.
Modification Indexes
LISREL’s modification indexes are powerful diagnostic tools for identifying
which parameters might be added to a model (that is, set free rather than
constrained to equal 0). By adding “MI” to the “LISREL Output:” line,
modification values will be generated for every missing parameter. These
values are predictions about the decrease in model Chi-square that will
occur if a particular parameter were added to the model.
Here are two sets of MIs for the two-factor model above:
Modification Indices and Expected Change
Modification Indices for LAMBDA-X
            Jobval1  Jobval2
           -------- --------
SECJOB        - -       0.03
HIINC         - -      15.63
PROMOTN       - -      11.71
HLPOTHS       - -       - -
HLPSOC        - -       - -
Modification Indices for THETA-DELTA
             SECJOB    HIINC  PROMOTN  HLPOTHS   HLPSOC
           -------- -------- -------- -------- --------
SECJOB        - -
HIINC        11.71     - -
PROMOTN      15.63     0.03     - -
HLPOTHS       7.63     4.45    13.81     - -
HLPSOC        9.55     0.65     2.19     - -      - -
The MI for LAMBDA-X indicates that adding an arrow from Jobval2 to HIINC
should reduce Chi-square by 15.63. Similarly, three MIs in THETA-DELTA
indicate that correlating pairs of errors would improve Chi-square by more
than 10.00. Because I wanted to allow each indicator to load on just one
factor, I chose the latter respecification. Correlating two error terms will
use one of the four degrees of freedom, but should produce a much better
fit. Although the PROMOTN-SECJOB pair has the largest value, they are
indicators of the same unmeasured construct. My attempt to correlate
them produced some unusual estimates, so instead I correlated errors of
the PROMOTN-HLPOTHS indicators.
Because LISREL computes the MIs independently of one another, you
generally should make only one parameter change at a time. Then, use
the new MI results to decide which further changes to try.
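Screening a table of modification indices is easy to automate. The Python sketch below is illustrative only: it copies the THETA-DELTA values from the output and assumes the columns run SECJOB, HIINC, PROMOTN, HLPOTHS, HLPSOC. It then lists the error covariances whose MI exceeds the cutoff of 10.00 used in the text. Ranking candidates this way does not replace the substantive judgment about which respecification makes sense.

```python
# Modification indices for THETA-DELTA (pairs of indicators whose
# error covariance is currently fixed at zero).
theta_delta_mi = {
    ("HIINC", "SECJOB"): 11.71,
    ("PROMOTN", "SECJOB"): 15.63,
    ("PROMOTN", "HIINC"): 0.03,
    ("HLPOTHS", "SECJOB"): 7.63,
    ("HLPOTHS", "HIINC"): 4.45,
    ("HLPOTHS", "PROMOTN"): 13.81,
    ("HLPSOC", "SECJOB"): 9.55,
    ("HLPSOC", "HIINC"): 0.65,
    ("HLPSOC", "PROMOTN"): 2.19,
}

threshold = 10.0  # the cutoff used in the text

# Keep the pairs above the cutoff, largest expected chi-square drop first.
candidates = sorted(
    ((pair, mi) for pair, mi in theta_delta_mi.items() if mi > threshold),
    key=lambda item: item[1],
    reverse=True,
)
for pair, mi in candidates:
    print(pair, mi)
```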
Inserted before the last line, the SIMPLIS command to correlate the errors
of two indicators closely resembles natural language:
Let the Errors of PROMOTN and HLPOTHS Correlate
End of Problem
This new model’s fit statistics are better: Chi-Square = 13.9, df = 3, p < .003;
GFI = 1.00; AGFI = 0.98; RMSEA = 0.057 (“reasonable fit”), with 90% CI
from 0.029 to 0.089. So let’s examine the diagram with the completely
standardized values attached to the unconstrained parameters:
The small but significant correlated errors of HLPOTHS and PROMOTN
(0.08) suggest that their covariation arises from an additional unspecified
common source. PROMOTN now clearly has the highest factor loading
on Jobval1 (and thus the highest reliability), while HLPSOC is the most
reliable Jobval2 indicator. My substantive interpretation is that the second
factor represents an “intrinsic rewards” dimension, in contrast to the
“extrinsic rewards” dimension of the first factor. The two latent factors
correlate moderately (0.32), indicating that respondents who report
extrinsic job rewards as important to them also tend to view intrinsic
values as important. What substantive interpretation might you venture
about the correlated errors?
LISREL also can be used to estimate several regression-like structural
equation models, in which one or more dependent variables are predicted
by several independent variables. Some or all of these variables may be
latent constructs with two or more observed indicators. Structural
equation models combine two conceptually distinct levels of analysis -- a
measurement level and a structural level.
In parallel to confirmatory factor analysis, the parameter estimates at the
measurement level show how well (or poorly) the observed variables serve
as indicators of the unobserved theoretical concepts. Parameters at the
structural level show the magnitudes and significance of the hypothesized
relations among the latent concepts. And, again in common with factor
analysis, the various goodness-of-fit statistics reveal how well the
combined measurement and structural equation models reproduce the
matrix of covariances among the indicators.
Our first example of a structural equation model is a Multiple Indicator-
Multiple Cause (MIMIC) model. This model’s relationships involve a latent
dependent variable, indicated by several observed measures, that is
predicted by a set of exogenous or predetermined variables, each of which
has just one indicator (see SSDA pp. 475-8). These predictors can be
termed “directly observed variables.” More complex models discussed
below have multiple indicators for both independent and dependent
variables.
The MIMIC example involves four indicators of attitudes towards the
federal government’s role in solving social problems, using the 1998 GSS.
Each observed variable is measured on a five-point scale where: “I
strongly agree with [the governmental involvement position]” is 1, “I
strongly agree with [the individualist position]” is 5 and “I agree with both
answers” is 3. The four item wordings:
• HELPPOOR: I'd like to talk with you about issues some people tell us are important.
Please look at CARD AT. Some people think that the government in Washington should
do everything possible to improve the standard of living of all poor Americans; they are
at Point 1 on this card. Other people think it is not the government's responsibility, and
that each person should take care of himself; they are at Point 5.
• HELPNOT: Now look at CARD AU. Some people think that the government in
Washington is trying to do too many things that should be left to individuals and private
businesses. Others disagree and think that the government should do even more to solve
our country's problems. Still others have opinions somewhere in between.
• HELPSICK: Look at CARD AV. In general, some people think that it is the
responsibility of the government in Washington to see to it that people have help in
paying for doctors and hospital bills. Others think that these matters are not the
responsibility of the federal government and that people should take care of these things
themselves.
• HELPBLK: Now look at CARD AW. Some people think that (Blacks/Negroes/African-
Americans) have been discriminated against for so long that the government has a special
obligation to help improve their living standards. Others believe that the government
should not be giving special treatment to (Blacks/Negroes/African-Americans).
The four single-indicator independent variables are AGE, POLVIEWS,
EDUC and WHITE (a 1-0 dichotomy from recoding RACE(1=1)(2,3=0)).
Here’s a diagram of the model specification to be estimated:
Arrows from the latent construct (Help) to the four indicators are the
measurement level of analysis, while the arrows to Help coming directly
from the four independent variables occur at the structural level.
After recoding missing values to –999 and writing a raw data file
(HELP.TXT), I used the following PRELIS commands to create the
covariance matrix for input to LISREL:
DATA NI=8 NO=2832 MI=-999 TR=LI
Note designation of WHITE as an ordinal variable. PRELIS distinguishes
only three levels of measurement – continuous, ordinal, and “censored” –
so any measures that you are unwilling to consider as continuous
(including dummy variables) should be labeled as ordinal. PRELIS will
compute: (1) Pearson product-moment correlations for pairs of continuous
variables; (2) polyserial correlations for an ordinal-continuous pair; and (3)
polychoric correlations for pairs of ordinal variables.* More below.
The covariance matrix (HELP.MAT), based on a pairwise average of about
1,654 cases:
            HELPBLK  HELPNOT HELPPOOR HELPSICK     EDUC      AGE POLVIEWS    WHITE
           -------- -------- -------- -------- -------- -------- -------- --------
HELPNOT       0.519    1.378
HELPPOOR      0.521    0.609    1.300
HELPSICK      0.448    0.519    0.577    1.451
EDUC         -0.168    0.319    0.403    0.115    8.133
AGE           1.080    1.956    1.214    2.176   -7.113  289.143
POLVIEWS      0.351    0.382    0.371    0.394   -0.159    1.904    1.917
WHITE         0.379    0.369    0.307    0.231    0.529    3.246    0.170    1.000
           -------- -------- -------- -------- -------- -------- -------- --------
Means         3.566    3.164    3.051    2.548   13.262   45.505    4.139    0.826
Std Dev       1.199    1.174    1.140    1.204    2.852   17.004    1.384    1.000
           -------- -------- -------- -------- -------- -------- -------- --------
*For details, see Jöreskog and Sörbom. 1996. PRELIS2: User’s Reference Guide.
Chicago: SSI Scientific Software International. Pp. 18-25.
The LISREL commands are similar to a factor analysis, except the latent
dependent variable (Help) is regressed directly onto the four observed
independent variables (EDUC AGE POLVIEWS WHITE):
Observed variables:
Covariance Matrix From File: HELP.MAT
Sample Size: 1654
Latent Variables: Help
End of Problem
The minimum fit function chi-square test = 99.1 for 14 degrees of freedom,
which is not a good fit; however, the large sample size makes model
rejection relatively easy. Other statistics suggest a somewhat better fit:
GFI = 0.99 and AGFI = 0.96. Moreover, the root mean square error of
approximation (0.061) falls into the intermediate range of a “reasonable fit”
(the 90% confidence interval for RMSEA is 0.050 to 0.072).
A portion of the modification indices (MI):
Modification Indices for THETA-DELTA-EPS
-------- -------- -------- --------
EDUC 48.29 3.31 16.33 0.02
AGE 8.78 1.06 0.22 5.37
POLVIEWS 0.00 0.11 0.86 2.22
WHITE 34.56 1.89 9.41 14.49
which implies that the error term of HELPBLK correlates significantly with
both WHITE and EDUC and together may account for most of the model
To improve the model fit, I added these commands and re-ran the analysis:
Let the Errors between WHITE and HELPBLK Correlate
Let the Errors between EDUC and HELPBLK Correlate
End of Problem
The minimum fit function chi-square test now falls to 30.55 for 12 df, which
has a probability = .002. But the other fit statistics attained very desirable
values: GFI = 1.00, AGFI = 0.99, and RMSEA = 0.030, a “close” fit (90% CI
from 0.017 to 0.044).
Here is the completely standardized solution:
At the measurement level, the indicators of the latent Help construct all
have highly significant factor loadings (p<.001), and roughly equal
At the structural level, three of the four path coefficients are highly
significant (p<.001), according to t-tests in the LISREL output. The AGE
effect (0.05) does not differ from zero at α= .05 for a two-tailed research
hypothesis. POLVIEWS and WHITE have much stronger standardized
effects (0.32 and 0.34 standard deviations) than EDUC (0.10) on the latent
Help construct. The predictors jointly explain about 27 percent of the
variation in the Help construct (R² = .27).
High scores on the four social problems variables mean that respondents
prefer the individual or nongovernmental solutions to social problems.
Therefore, the positive path coefficients reveal that conservatives, whites,
and highly educated respondents were more likely to endorse such policy
The diagram does not show the two correlated error terms:
-------- -------- -------- --------
EDUC -0.44 - - - - - -
AGE - - - - - - - -
POLVIEWS - - - - - - - -
WHITE 0.11 - - - - - -
The error term for HELPBLK has significant correlations with the error
terms for both EDUC (-0.44) and WHITE (0.11). This re-specification was
estimated mainly to demonstrate a technique for improving a model’s
statistical fit. However, I didn’t have a priori reasons for correlating these
error terms, while post hoc interpretations – e.g., classist or racist
stereotypical responses – are vulnerable to taking advantage of chance
occurrences in the sample.
A widespread application of LISREL involves models in which several or all
the latent constructs have multiple indicators. The simplest version is a
chain model, involving one independent and one dependent variable. The
example below modifies the preceding MIMIC model, by including a second
political measure (PARTYID) with POLVIEWS as indicators of unobserved
political ideology.
The LISREL commands:
Observed variables:
Covariance Matrix From File: CHAIN.MAT
Sample Size: 1615
Latent Variables: Help Politic
POLVIEWS = Politic
PARTYID = Politic
Help = Politic
End of Problem
In the Relationships commands, I did not specify that one indicator of the
latent Politic construct should serve as the reference variable. Instead,
LISREL automatically set the scale by standardizing the variance of Politic
to 1.00. To estimate the structural-level regression coefficient, use a
command involving the latent constructs: “Help = Politic”.
The minimum fit function χ² = 15.9 for 8 df (p = .044). Other fit statistics --
GFI = 1.00; AGFI = 0.99; RMSEA = 0.025 -- indicate a very “close” fit. The
standardized coefficients are all highly significant:
The structural parameter (0.63) indicates that a difference of one standard
deviation in political ideology accounts for a three-fifths standard deviation
difference in attitude towards the federal government’s role in solving
social problems. The positive sign means that more conservative
respondents favor more individualistic solutions.
The value of the error term for the Help construct (from the output, but not
shown in the diagram) is the square root of PSI: √0.60 = 0.77. Show that the
sum of the squared structural parameter plus its squared error term
accounts for all the variance of Help.
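The same arithmetic in a brief Python sketch, using the rounded values from the text (structural coefficient 0.63, PSI = 0.60). The variable names are illustrative. Because the inputs are rounded, the total comes out slightly below 1.00.

```python
import math

beta = 0.63  # completely standardized path from Politic to Help
psi = 0.60   # PSI: residual (error) variance of Help

error_path = math.sqrt(psi)          # error term in standard-deviation units
explained = beta ** 2                # share of Help variance due to Politic
total = explained + error_path ** 2  # should be close to 1.00

print(round(error_path, 2), round(total, 2))
```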
For each indicator in the diagram, show that the squared factor loading
plus the (squared) error term accounts for all of that indicator’s variance:
Indicator    λ² + θδ = 1.00
Equality Constraints on Parameters
The chain model above shows that the estimated parameters from the
unobserved constructs to the indicators have roughly similar magnitudes
(0.55 to 0.69). LISREL allows an explicit statistical test for the hypothesis
that one parameter equals another in the population. That is:
$$H_0: \beta_i = \beta_j$$
Hypothesizing that a pair of “paths” are equal (rather than free to take on
independent values) requires that LISREL estimate just one parameter
instead of two. As a result, one degree of freedom now becomes available
to test whether the two models’ chi-square goodness-of-fit statistics differ
at a chosen alpha-level. If no significant difference occurs in the free and
constrained pair of parameters, then the more parsimonious version with
equal parameters is accepted (i.e., the model with fewer free parameters).
The LISREL commands for constraining paths to equality have this syntax,
inserted before the “End of Problem” line:
Set Path Help -> HELPSICK = Path Help -> HELPNOT
To constrain multiple parameters, continue the command:
Set Path Help -> HELPSICK = Path Help -> HELPNOT = Path Help -> HELPBLK
Because the HELPPOOR indicator is already used as the reference variable
for Help, I did not include it in the constraints.
This table shows the chi-squares for alternative models constraining
several combinations of parameters among three Help indicators:
Equality constraints                            χ²      df
1. No equality constraints (baseline model)     15.90    8
3. HELPNOT = HELPBLK                            23.64    9
In comparison to the initial model with no equality constraints (#1),
specifying equal factor loadings doesn’t result in significantly worse fits to
the data. For example, the difference in χ² for model #2 versus model #1 is
(17.84 – 15.90 = 1.94) for 1 df. The critical value at α = .05 is 3.84, so we
cannot reject the null hypothesis that these two parameters are equal in the
population. Perform a chi-square difference test for model #1 versus
model #3 and decide whether the latter is the most parsimonious model.
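The chi-square difference test is simple enough to script. The following Python sketch reproduces the comparison of model #1 with model #2 described above; `chi_square_difference` is a hypothetical helper, and the critical value of 3.84 is the one quoted in the text for 1 df at α = .05.

```python
def chi_square_difference(chi2_free, df_free, chi2_constrained, df_constrained):
    """Return the chi-square and df differences for two nested models."""
    return chi2_constrained - chi2_free, df_constrained - df_free

CRITICAL_1DF_05 = 3.84  # critical chi-square value for 1 df at alpha = .05

# Model #1 (no constraints): 15.90 on 8 df; model #2: 17.84 on 9 df.
diff, ddf = chi_square_difference(15.90, 8, 17.84, 9)
reject_equality = diff > CRITICAL_1DF_05
print(round(diff, 2), ddf, reject_equality)
```

The same function can be applied to any other pair of nested models, provided the critical value is changed to match the difference in degrees of freedom.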
Researchers frequently want to compare structural equation models for
multiple (sub)groups, such as women vs. men, whites vs. blacks,
Republicans vs. Democrats. LISREL is a powerful tool for analyzing
multiple samples simultaneously, with some or all parameters constrained
to be equal across the groups. To illustrate this application, I compare the
preceding chain model for a subset of white and black respondents
(omitting the “other” race.) In a variation on PRELIS-LISREL procedures,
these commands show how to enter separate matrices of correlations and
descriptive statistics into the command file, which LISREL will convert to
the respective covariances:
Observed variables:
Correlation Matrix:
.419 1.000
.190 .231 1.000
.226 .258 .450 1.000
.246 .242 .443 .350 1.000
.187 .232 .367 .337 .278 1.000
Means: 3.00 4.15 3.15 3.28 2.61 3.71
Standard Deviations: 1.92 1.40 1.11 1.13 1.21 1.14
Sample Size: 1282
Latent Variables: Help Politic
POLVIEWS = Politic
PARTYID = Politic
Help = Politic
Correlation Matrix:
.111 1.000
.008 .138 1.000
.137 -.009 .365 1.000
.082 .091 .328 .389 1.000
.153 .065 .299 .369 .335 1.000
Means: 1.43 3.93 2.46 2.62 2.16 2.73
Standard Deviations: 1.51 1.27 1.14 1.19 1.15 1.24
Sample Size: 260
End of Problem
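The conversion that LISREL performs on these inputs follows from the definition of a correlation: each covariance equals the correlation times the two standard deviations. Here is a minimal Python sketch using the first three variables of the first group's matrix. The variables are left unnamed because their order is not shown in the surviving commands, and `corr_to_cov` is an illustrative helper rather than a LISREL function.

```python
def corr_to_cov(corr, sds):
    """Convert a correlation matrix to covariances: cov_ij = r_ij * s_i * s_j."""
    n = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]

# First three variables of the first group's lower-triangular matrix,
# written out as a full symmetric matrix.
corr = [
    [1.000, 0.419, 0.190],
    [0.419, 1.000, 0.231],
    [0.190, 0.231, 1.000],
]
sds = [1.92, 1.40, 1.11]

cov = corr_to_cov(corr, sds)
for row in cov:
    print([round(value, 3) for value in row])
```

The diagonal of the result holds the variances (the squared standard deviations), which is why a correlation matrix alone is not enough to recover the covariances.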
My initial comparison involves setting exactly equal corresponding
parameters across the two groups. Because the Relationships commands
appearing in the white specification do not also appear for the black group,
LISREL assumes by default that all pairs of parameters must be
constrained to equality.
The minimum fit function χ² = 94.88 for 29 df (p < .0001). Other fit statistics
are GFI = 0.93 (AGFI is not calculated) and RMSEA = 0.052 (“reasonable”),
90% CI from 0.040 to 0.064. The structural coefficient for the regression of
Help on Politic = 0.53 for both groups.
My next model specification freed all the factor loadings and structural
regression parameters, by repeating the same Relationships for the black
group that appeared in the white group. In addition, I also allowed these
indicators’ error terms to vary freely across groups, and for the error term
of the Help construct to take on different values, by inserting these lines at
the end of both subgroups’ commands:
Set the Error Variances of HELPPOOR-HELPBLK Free
Set the Error Variances of PARTYID-POLVIEWS Free
Set the Error Variance of Help Free
This completely free model has χ² = 35.27 for 16 df (p < .0037); GFI = 0.99
and RMSEA = 0.038 (“close fit”), 90% CI from 0.020 to 0.056. Here are some
completely standardized parameter estimates for the two racial groups:
Parameters Blacks Whites
HELPPOOR .68 .68
HELPNOT .84 .60
HELPSICK .70 .58
HELPBLK .74 .49
PARTYID .34 .63
POLVIEWS .25 .70
Politic -> Help .33 .57
Help Error Covar .56 .74
The factor loadings for the two political indicators are substantially smaller
for blacks than for whites, perhaps reflecting the narrower span of black
political ideology. The estimated magnitude of the structural effect of
Politic on Help is much larger for whites (0.57) than for blacks (0.33).
Finally, to determine whether any parameters may be constrained to
equality without worsening the model fit, additional LISREL analyses can
be conducted that delete a single Relationship line from just one
subgroup. For example, to test for racial equality of the structural effect, I
deleted the “Help = Politic” line from the black subgroup, which forces
LISREL to replicate all the relationships in the second group except the
deleted one. The result was a χ² = 36.50 for 17 df, which does not differ
significantly from the model above (i.e., the difference in chi-squares is just
1.23 for one df). In other words, the population regression coefficients are
probably equal (both estimates = 0.56). The large standard errors due to
the small black sample size may have prevented rejection of this null
hypothesis.
This section extends the preceding MIMIC and chain examples to a causal
model of the relationships among three unobservable constructs with
multiple indicators. I introduce some basic principles of path analysis (see
Chapter 11 in SSDA4 for more details). At the structural equation level, a
causal diagram is indispensable to displaying the hypothesized causal
effects among the latent constructs:
The diagram displays several path analytic principles:
BOX 11.1 Rules for Constructing Causal Diagrams
1. Variable names are represented either by short keywords or letters.
2. Variables placed to the left in a diagram are assumed to be causally
prior to those on the right.
3. Causal relationships between variables are represented by single-
headed arrows.
4. Variables assumed to be correlated but not causally related are
linked by a curved double-headed arrow.
5. Variables assumed to be correlated but not causally related should
be at the same point on the horizontal axis of the causal diagram.
6. The causal effect presumed between two variables is indicated by
placing + or - signs along the causal arrows to show how increases or
decreases in one variable affect the other.
• The model asserts that a respondent’s socioeconomic status (SES) directly
causes both political ideology (Politic) and attitude about the
governmental role (Help).
• SES also indirectly affects Help, via its impact on Politic (i.e.,
transmitted via the compound product of the path from SES to
Politic times the path from Politic to Help). [See SSDA Chapter 11
for a detailed discussion of disaggregating the covariation between
any pair of variables into direct, indirect, and so-called correlated
effects.]
• The unsourced arrows to the two endogenous variables (Politic and
Help) mean that other sources of their variation aren’t included in this
model. These residual effects presumably operate independently of
(i.e., are uncorrelated with) the explicitly included causes.
At the measurement equation level, each construct has two or more
observable indicators. The variables for Politic and Help are the same as
above; the two SES indicators are EDUC and INCOME98 (measured as
midpoints of the $000 ranges). Here are the LISREL commands:
Observed variables:
Covariance Matrix From File: PATH.MAT
Sample Size: 1396
Latent Variables: Help Politic SES
POLVIEWS = Politic
PARTYID = Politic
Help = Politic
Help = SES
Politic = SES
Path Diagram
End of Problem
Although the minimum fit function chi-square is much higher than desirable
(χ² = 143.58 for 24 df; p < .0001), the other statistics indicate a good fit:
GFI = 0.98; AGFI = 0.96; RMSEA = 0.060 (“reasonable fit”), 90% CI from 0.051 to
0.069. The coefficients for the factor loadings and structural effects were all
highly significant.
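The reported RMSEA can be reproduced from the chi-square, df, and sample size. This is a minimal Python sketch that assumes the common point-estimate formula with N - 1 in the denominator; some programs use N instead, so treat the function as illustrative and not as LISREL's exact computation.

```python
import math

def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Values reported for the path model: chi-square = 143.58, 24 df, N = 1396.
path_model = rmsea(143.58, 24, 1396)
print(f"RMSEA = {path_model:.3f}")
```

The formula penalizes chi-square only for the amount by which it exceeds its degrees of freedom, scaled by sample size, which is why a large-sample model can show a significant chi-square yet a reasonable RMSEA.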
The completely standardized path coefficients:
The factor loadings for Politic and Help are similar to those in the chain
model. The three SES indicators also have comparably high loadings. At
the structural level, the strongest path coefficient is from Politic to Help, at
0.60 standard deviations. The direct path from SES to Help is just 0.11, and
its indirect path via Politic is almost as strong: (0.14)(0.60) = 0.08. Thus,
higher status persons are more politically conservative and support more
individualistic solutions to social policies.
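The decomposition of the SES effect on Help can be written out explicitly. A minimal Python sketch using the three standardized coefficients quoted above; the variable names are mine.

```python
# Completely standardized path coefficients quoted in the text.
ses_to_help = 0.11       # direct path SES -> Help
ses_to_politic = 0.14    # path SES -> Politic
politic_to_help = 0.60   # path Politic -> Help

# An indirect effect is the product of the paths along the chain.
indirect = ses_to_politic * politic_to_help
total = ses_to_help + indirect

print(f"direct = {ses_to_help:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
```

The total causal effect of SES on Help is the sum of the direct and indirect effects.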
To be estimable, a latent variable structural equation model must be
identified. Models are identified if one optimal (best) value exists for each
parameter whose value is not known. Models that are identified usually
converge to best estimates for these parameters. Models in which at least
one parameter does not have a unique solution are called “not identified”
or “underidentified.” Underidentification occurs when the specified
equations contain more unknowns to be estimated than known values. To
illustrate, does one unique pair of values for X and Y solve this algebraic equation?
Y = 4 + 3X
No: infinitely many (X, Y) pairs satisfy it. However, if we add a second
equation, the system of two equations becomes “just identified”: given two
equations with two unknowns, unique X and Y values are easily calculated:
Y = 14 – 2X
(HINT: set the two righthand sides equal and solve for X.) This example
illustrates a “just identified” model because it has exactly as many knowns
as unknowns and yields precise estimates.
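The hint can be followed mechanically. A minimal Python sketch, with a helper name of my own choosing, that sets the two righthand sides equal and solves for X and then Y:

```python
from fractions import Fraction

def intersect(a1, b1, a2, b2):
    """Solve Y = a1 + b1*X and Y = a2 + b2*X for the unique (X, Y)."""
    if b1 == b2:
        raise ValueError("equal slopes: no unique solution")
    # a1 + b1*X = a2 + b2*X  =>  X = (a2 - a1) / (b1 - b2)
    x = Fraction(a2 - a1, b1 - b2)
    y = a1 + b1 * x
    return x, y

# Y = 4 + 3X and Y = 14 - 2X
x, y = intersect(4, 3, 14, -2)
print(x, y)  # 2 10
```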
If we include a third equation in the system, such as Y = 2 + 4X, this
“overidentified” system would allow us to solve for the same precise X and Y
values using any of the three different pairs of equations. However, if that
third equation were Y = 2 + 2X, would an exact solution be possible? (No: the
three pairs would yield three different solutions.) SEM computer programs
iteratively calculate overidentified parameter estimates that minimize discrepancies
between the observed and expected covariance matrices, (S - Σ(θ)).
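The inconsistent three-equation system gives a small-scale analogy to what an estimator does with an overidentified model. The sketch below is illustrative Python, not LISREL's fit function: it solves each pair of equations exactly, then finds the single (X, Y) that minimizes the sum of squared discrepancies across all three equations (ordinary least squares via the normal equations). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each equation Y = a + b*X is stored as (a, b); the third is Y = 2 + 2X.
equations = [(4, 3), (14, -2), (2, 2)]

def intersect(eq1, eq2):
    """Exact solution of a pair of equations."""
    (a1, b1), (a2, b2) = eq1, eq2
    x = Fraction(a2 - a1, b1 - b2)
    return x, a1 + b1 * x

def least_squares(eqs):
    """Minimize the sum of (Y - a - b*X)^2 over all equations."""
    n = len(eqs)
    sum_a = sum(Fraction(a) for a, _ in eqs)
    sum_b = sum(Fraction(b) for _, b in eqs)
    sum_bb = sum(Fraction(b) * b for _, b in eqs)
    sum_ab = sum(Fraction(a) * b for a, b in eqs)
    # Normal equations: n*Y - sum_b*X = sum_a ; sum_b*Y - sum_bb*X = sum_ab
    x = (n * sum_ab - sum_b * sum_a) / (sum_b ** 2 - n * sum_bb)
    y = (sum_a + sum_b * x) / n
    return x, y

pair_solutions = [intersect(e1, e2) for e1, e2 in combinations(equations, 2)]
x_ls, y_ls = least_squares(equations)

print("pairwise solutions:", [(str(px), str(py)) for px, py in pair_solutions])
print("least-squares compromise:", float(x_ls), float(y_ls))
```

No pair agrees with any other, so the compromise solution satisfies none of the equations exactly but keeps the overall discrepancy as small as possible.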
The known values in structural equation models are the observed
variances and covariances, while the unknowns are those model
parameters you allow to vary freely. For example, if we have 5 indicators,
the covariance matrix contains ((5)(5+1))/2 = 15 nonduplicate values.
Hence, a CFA or SEM would not be identified if you specify more than 15
free parameters to be estimated.
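The counting rule is easy to automate. A minimal Python sketch follows; the single-factor CFA used as the example (one loading fixed to 1.0) is a hypothetical illustration, not one of the models estimated in these notes. Passing this count is necessary for identification but does not guarantee it.

```python
def nonduplicate_moments(p: int) -> int:
    """Number of distinct variances and covariances among p indicators."""
    return p * (p + 1) // 2

def degrees_of_freedom(p: int, free_parameters: int) -> int:
    """Knowns minus unknowns; a negative value means underidentification."""
    return nonduplicate_moments(p) - free_parameters

# Hypothetical single-factor CFA with 5 indicators:
# 4 free loadings (one fixed to 1.0), 5 error variances, 1 factor variance.
free = 4 + 5 + 1
knowns = nonduplicate_moments(5)
df = degrees_of_freedom(5, free)
print(f"knowns = {knowns}, free parameters = {free}, df = {df}")
```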
SEM programs usually provide information about a model’s identifiability.
If one or more parameters are not identified, the program will be unable to
produce standard errors for the parameter estimates. In such cases, you
should try to identify the model by placing additional constraints on
appropriate parameters (consistent with theory). For example, set some
parameter value(s) equal to 0, 1, or equal to another free parameter. Run
the model again to see whether it yields a complete set of estimates.
SEM experts disagree on whether computer programs can be trusted to
find all instances of nonidentification, particularly for very complex models.
That is, programs sometimes produce standard errors for models that are
not identified. Purists argue that researchers should check whether both
the measurement and structural models are separately identified prior to
submitting a structural equation model for computer estimation. Researchers
can study the necessary-and-sufficient rules and requirements for model
identification. Two good sources are: (1) Kenneth Bollen. 1989. Structural
Equations with Latent Variables. New York: Wiley. (2) David A. Kenny’s
Rules for Identification Webpage <http://w3.nai.net/~dakenny/identify.htm>.
For this course, identification is unlikely to be problematic for the types of
CFAs and SEMs that most students will estimate -- multiple-indicator
recursive models, in which every observed variable loads on only one
latent construct, and one indicator per construct is fixed to 1.0 to set the
latent factor’s scale.
If a LISREL analysis includes some or all variables measured at the ordinal
or discrete (dichotomous) level, computing a covariance or Pearson
correlation matrix from such scores and applying maximum likelihood
estimation (MLE) may lead to distorted parameter estimates and inaccurate
test statistics. Jöreskog and Sörbom recommend alternative correlation
coefficients and estimation methods for such situations.
An observed variable whose values represent a set of ordered
categories might be viewed as a crude classification of an unobserved
(latent) continuous variable z* with a standard normal distribution. For
example, a low-medium-high measure X could be trichotomized at two
threshold values for z*:
X is scored 1 if z* ≤ α1
X is scored 2 if α1 < z* ≤ α2
X is scored 3 if α2 < z*
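Under the standard normal assumption, the thresholds follow directly from the observed category proportions. A minimal Python sketch is shown below; the 30/50/20 percent split is a made-up illustration, and the function name is mine.

```python
from statistics import NormalDist

def thresholds(proportions):
    """Return the z* cut points implied by ordered category proportions."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    cuts, cumulative = [], 0.0
    for share in proportions[:-1]:   # k categories need k - 1 thresholds
        cumulative += share
        cuts.append(std_normal.inv_cdf(cumulative))
    return cuts

# Hypothetical shares of low, medium, and high responses.
alpha1, alpha2 = thresholds([0.30, 0.50, 0.20])
print(f"alpha1 = {alpha1:.3f}, alpha2 = {alpha2:.3f}")
```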
A variety of correlation coefficients can be calculated when one or both
observed variables are ordinal:
• Polychoric correlation coefficient for two ordinal variables assumes
their underlying continuous measures have a bivariate normal
distribution
• Tetrachoric correlation, a subtype of polychoric, is used for two
dichotomous variables
• Polyserial correlation involves an ordinal and a continuous variable,
and also assumes an underlying bivariate normal distribution
• Biserial correlation, a subtype of polyserial, is used for a
dichotomous and a continuous variable
To include an ordinal variable in a linear relationship, LISREL iteratively
computes polychoric and polyserial correlations not from the observed
scores but from the theoretical correlations of the underlying z* variables.
A matrix of estimated correlation coefficients is created from the separate
crosstabulations for every pair of observed continuous, ordinal, or
dichotomous variables.
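To see why a correlation for the underlying z* variables exceeds the Pearson correlation of the observed 0/1 scores, consider one 2 x 2 crosstabulation. The Python sketch below uses the classical cosine-pi approximation to the tetrachoric correlation, which is not the maximum likelihood estimator that PRELIS applies; the cell counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Pearson correlation of two 0/1 variables from cells [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation: cos(pi / (1 + sqrt(ad / bc)))."""
    if min(a, b, c, d) <= 0:
        raise ValueError("approximation requires all four cells to be positive")
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))

# Hypothetical table: a = both 1, b and c = discordant pairs, d = both 0.
a, b, c, d = 40, 10, 10, 40
phi = phi_coefficient(a, b, c, d)
tet = tetrachoric_cos_pi(a, b, c, d)
print(f"phi = {phi:.3f}, tetrachoric (approx.) = {tet:.3f}")
```

Dichotomizing discards information about the latent continuum, so the correlation of the observed scores understates the correlation of the underlying variables.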
As an alternative to MLE, LISREL obtains correct large-sample standard
errors and chi-square values using a weighted least squares (WLS)
estimation method. A weight matrix required for WLS is the inverse of an
estimated asymptotic covariance matrix (W) of polychoric and polyserial
correlations. This inversion will be performed by LISREL, based on input
of a W matrix generated by PRELIS and stored on the computer as a binary
file. PRELIS computes estimates of the asymptotic covariances of the
correlations when instructed:
The PM option instructs PRELIS to compute a matrix of polychoric
correlations when some or all variables have been declared as ordinal; SM
saves this correlation matrix in a first named file with extension “mat”; SA
saves the asymptotic covariance matrix in another file with extension
“acm”; and PA tells it to write the W matrix in the PRELIS output file.
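The role of the weight matrix can be illustrated with a toy case. The following Python sketch assumes the usual WLS quadratic form, F = d' W^-1 d, where d holds the differences between observed and model-implied correlations; it handles only two correlations, and every number in it is hypothetical and does not come from PRELIS.

```python
def wls_discrepancy(observed, implied, acov):
    """Quadratic form d' W^-1 d for two correlations (2 x 2 matrix W)."""
    d1 = observed[0] - implied[0]
    d2 = observed[1] - implied[1]
    (w11, w12), (w21, w22) = acov
    det = w11 * w22 - w12 * w21
    if det == 0:
        raise ValueError("asymptotic covariance matrix is singular")
    # Inverse of a 2 x 2 matrix, written out element by element.
    i11, i12, i21, i22 = w22 / det, -w12 / det, -w21 / det, w11 / det
    return d1 * (i11 * d1 + i12 * d2) + d2 * (i21 * d1 + i22 * d2)

# Hypothetical observed and implied correlations, and their asymptotic
# covariance matrix W.
observed = [0.50, 0.30]
implied = [0.45, 0.35]
acov = [[0.004, 0.001],
        [0.001, 0.002]]

fit = wls_discrepancy(observed, implied, acov)
print(f"WLS discrepancy = {fit:.3f}")
```

Correlations estimated with less sampling variability receive more weight, so a given discrepancy on a precisely estimated correlation counts more heavily against the model.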
To illustrate, I analyze the seven GSS2000 items on confidence in
institutions, whose responses were recoded into dichotomies (1 = a great
deal of confidence; 0 = only some or hardly any confidence):
165. I am going to name some institutions in this country. As far as the people running these
institutions are concerned, would you say you have a great deal of confidence, only some
confidence, or hardly any confidence at all in them?
CONFINAN: Banks and financial institutions
CONBUS: Major companies
CONCLERG: Organized religion
CONEDUC: Education
CONFED: Executive branch of the federal government
CONLABOR: Organized labor
CONMEDIC: Medicine
CONJUDGE: U.S. Supreme Court
CONSCI: Scientific Community
CONLEGIS: Congress
CONARMY: Military
The research question is whether a single factor or multiple factors are
required to represent the tetrachoric correlations among these
indicators.
These SPSS commands recode all 13 indicators into dichotomies, replace
all missing values with –999, and write the raw datafile to
CONFIDE.TXT:
MISSING VALUES confinan to conarmy ().
RECODE confinan to conarmy (2,3=0)(1=1)(ELSE=-999).
FREQ VAR = confinan to conarmy.
WRITE OUTFILE = CONFIDE.TXT/confinan to conarmy (13F5.0).
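For readers without SPSS, the recoding rule in the RECODE command can be sketched in Python. This is an illustrative equivalent only; the function and constant names are mine, and it covers the value mapping, not the file writing.

```python
MISSING = -999  # missing-value code expected by PRELIS (MI=-999)

def recode_confidence(response):
    """Map 1 (a great deal) to 1, 2 or 3 to 0, anything else to MISSING."""
    if response == 1:
        return 1
    if response in (2, 3):
        return 0
    return MISSING

# Example responses, including a don't-know code and a blank.
raw = [1, 2, 3, 8, None]
recoded = [recode_confidence(value) for value in raw]
print(recoded)  # [1, 0, 0, -999, -999]
```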
These PRELIS commands estimate the asymptotic covariance matrix for
listwise deletion of cases:
DATA NI=13 NO=2817 MI=-999 TR=LI
Here’s a matrix of tetrachoric correlations among 8 indicators analyzed
below, based on N = 1,496 cases:
        FINAN    FED  PRESS  MEDIC     TV  JUDGE    SCI  LEGIS
        -----  -----  -----  -----  -----  -----  -----  -----
FINAN    1.00
FED      0.33   1.00
PRESS    0.32   0.40   1.00
MEDIC    0.41   0.38   0.35   1.00
TV       0.32   0.32   0.61   0.41   1.00
JUDGE    0.39   0.52   0.37   0.35   0.23   1.00
SCI      0.43   0.33   0.22   0.45   0.15   0.55   1.00
LEGIS    0.49   0.72   0.47   0.41   0.41   0.68   0.40   1.00
In contrast, the Pearsonian correlation coefficients, which treat the
confidence dichotomies as continuous variables, are substantially smaller:
        FINAN    FED  PRESS  MEDIC     TV  JUDGE    SCI  LEGIS
        -----  -----  -----  -----  -----  -----  -----  -----
FINAN    1.00
FED      0.18   1.00
PRESS    0.16   0.20   1.00
MEDIC    0.25   0.19   0.17   1.00
TV       0.16   0.14   0.33   0.19   1.00
JUDGE    0.25   0.28   0.19   0.22   0.11   1.00
SCI      0.27   0.17   0.10   0.30   0.07   0.36   1.00
LEGIS    0.26   0.45   0.24   0.20   0.19   0.38   0.20   1.00
For a single-factor model, LISREL automatically applies WLS when
instructed to read the asymptotic covariance matrix:
Observed variables:
Correlation Matrix from File: CONFIDE.MAT
Asymptotic Covariance Matrix from File: CONFIDE.ACM
Sample size: 1496
Latent Variables: Confide1
FINAN - ARMY = Confide1
Path Diagram
End of Problem
This single-factor model yielded an absurdly high χ² = 2,063.1 (14 df; p <
.0001), and two other statistics indicated poor fits: GFI = 0.83; AGFI
= 0.76. However, RMSEA = 0.053 (“reasonable fit”), with a 90% CI lower
bound of 0.048.
I decided to estimate a multi-factor model, incrementally adding an indicator
and observing the change in fit statistics. After several trials, I concluded
that a model having three factors with 8 indicators and four correlated error
terms produced the best fit. Here are the commands:
Observed variables:
Correlation Matrix from File: CONFIDE.MAT
Asymptotic Covariance Matrix from File: CONFIDE.ACM
Sample size: 1496
Latent Variables: Confide1 Confide2 Confide3
TV PRESS = Confide2
Let the Errors between JUDGE and SCI Correlate
Let the Errors between MEDIC and PRESS Correlate
Let the Errors between FINAN and FED Correlate
Let the Errors between JUDGE and PRESS Correlate
Path Diagram
End of Problem
The factor loadings are high, reflecting the polychoric correlations on
which they are based. The three factors are strongly correlated: r12 = 0.59,
r23 = 0.59, and r13 = 0.71. As usual, the substantive meanings of latent
constructs can be inferred from the contents of the specific measures which
load on them. What dimensions, if any, do you conjecture?
The importance of using WLS in conjunction with the asymptotic
covariance matrix becomes evident when the preceding analysis uses only
the tetrachoric correlation matrix. The fit statistics are much worse: χ² =
159.3 (13 df; p < .0001) and RMSEA = 0.087.
Continuing to examine how LISREL handles ordinal indicators, I next
estimate a structural equation model that treats all indicators as ordinal
variables. The data are from GSS98. The dependent variable is a latent
abortion attitude construct with three dichotomous indicators of support for
legal abortion under specific circumstances (1 = yes, 0 = no):
Please tell me whether or not you think it should be possible for a pregnant woman to obtain
a legal abortion if. . .
ABDEFECT: If there is a strong chance of serious defect in the baby?
ABHLTH: If the woman's own health is seriously endangered by the pregnancy?
ABRAPE: If she became pregnant as a result of rape?
The two independent constructs affecting variation in abortion attitude are:
(1) a political orientation construct with indicators PARTYID and
POLVIEWS, each having seven ordered categories; and (2) a religiosity
construct consisting of two items having six and eight ordered categories,
respectively:
PRAY: About how often do you pray?
PRIVPRAY: How often do you pray privately in places other than at church or synagogue?
The structural model has a curved two-headed arrow to indicate the
antecedent constructs are correlated but not causally related:
After recoding the two religiosity indicators to make frequent praying the
high scores, I used SPSS to write the datafile. Next, these PRELIS
commands created the correlation matrix and ACM files:
The polychoric correlation matrix:
          ABDEFECT   ABHLTH   ABRAPE     PRAY PRIVPRAY  PARTYID POLVIEWS
          -------- -------- -------- -------- -------- -------- --------
ABDEFECT     1.000
ABHLTH       0.866    1.000
ABRAPE       0.802    0.859    1.000
PRAY        -0.233   -0.289   -0.263    1.000
PRIVPRAY    -0.295   -0.351   -0.303    0.824    1.000
PARTYID     -0.243   -0.105   -0.166    0.075    0.124    1.000
POLVIEWS    -0.068   -0.085   -0.120   -0.001    0.061    0.434    1.000
Here are the commands:
Observed variables:
Correlation Matrix from File: PRAY.MAT
Asymptotic Covariance Matrix from File: PRAY.ACM
Sample size: 598
Latent Variables: Abortion Prays Politics
Abortion = Prays
Abortion = Politics
Path Diagram
End of Problem
The path diagram with completely standardized estimates:
Notice that the error variance for PRIVPRAY is negative (-0.02), a nonsensical
value. To fix that parameter at zero, which gains 1 df because one fewer
parameter is estimated, include this LISREL command:
Let the Error Variance of PRIVPRAY Equal 0
What are your substantive interpretations about the relative impacts of
political orientation and religiosity on abortion attitudes?

