Mplus User Guide Ver - 7 - r6 - Web
User’s Guide
Linda K. Muthén
Bengt O. Muthén
Following is the correct citation for this document:
Muthén, L.K. and Muthén, B.O. (1998-2012). Mplus User’s Guide. Seventh Edition.
Los Angeles, CA: Muthén & Muthén
The development of this software has been funded in whole or in part with Federal funds
from the National Institute on Alcohol Abuse and Alcoholism, National Institutes of
Health, under Contract No. N44AA52008 and Contract No. N44AA92009.
The new features that have been added between Version 6 and Version 7 would never
have been accomplished without two very important team members, Tihomir
Asparouhov and Thuy Nguyen. It may be hard to believe that the Mplus team has only
two programmers, but these two programmers are extraordinary. Tihomir has developed
and programmed sophisticated statistical algorithms to make the new modeling possible.
Without his ingenuity, they would not exist. His deep insights into complex modeling
issues and statistical theory are invaluable. Thuy has developed the post-processing
graphics module, the Mplus editor and language generator, and the Mplus Diagrammer
based on a framework designed by Delian Asparouhov. In addition, Thuy has
programmed the Mplus language and is responsible for keeping control of the entire code
which has grown enormously. Her unwavering consistency, logic, and steady and calm
approach to problems keep everyone on target. We feel fortunate to work with such a
talented team. Not only are they extremely bright, but they are also hard-working, loyal,
and always striving for excellence. Mplus Version 7 would not have been possible
without them.
Another important team member is Michelle Conn. Michelle was with us at the
beginning when she was instrumental in setting up the Mplus office and has been
managing the office for the past ten years. In addition, Michelle is responsible for
creating the pictures of the models in the example chapters of the Mplus User’s Guide.
She has patiently and quickly changed them time and time again as we have repeatedly
changed our minds. She is also responsible for keeping the website updated and
interacting with customers. She was the driving force behind the design of the new
shopping cart. With the vastly increased customer base, her efficiency in multi-tasking
and calm under pressure are much appreciated. Sarah Hastings recently joined the
Mplus team. She is responsible for testing the Graphics Module and the Mplus
Diagrammer in addition to providing assistance to Bengt. She has proven to be a
valuable team member.
We would also like to thank all of the people who have contributed to the development of
Mplus in past years. These include Stephen Du Toit, Shyan Lam, Damir Spisic, Kerby
Shedden, and John Molitor.
Initial work on Mplus was supported by SBIR contracts and grants from NIAAA that we
acknowledge gratefully. We thank Bridget Grant for her encouragement in this work.
Linda K. Muthén
Bengt O. Muthén
Los Angeles, California
September 2012
Introduction
CHAPTER 1
INTRODUCTION
[Figure: general modeling framework diagram with panels A and B, Within and Between parts, and variables f, y, c, and u]
• Regression analysis
• Path analysis
• Exploratory factor analysis
• Confirmatory factor analysis
• Item response theory modeling
• Structural equation modeling
• Growth modeling
• Discrete-time survival analysis
• Continuous-time survival analysis
Special features available with the above models for all observed
outcome variable types are:
are models in the full modeling framework that can be estimated using
Mplus:
Most of the special features listed above are available for models with
both continuous and categorical latent variables. The following special
features are also available.
Most of the special features listed above are available for modeling of
complex survey data.
The analysis model can be different from the data generation model. For
example, variables can be generated as categorical and analyzed as
continuous or generated as a three-class model and analyzed as a two-class model.
GRAPHICS
Mplus includes a dialog-based, post-processing graphics module that
provides graphical displays of observed data and analysis results
including outliers and influential observations.
DIAGRAMMER
The Diagrammer can be used to draw an input diagram, to automatically
create an output diagram, and to automatically create a diagram using an
Mplus input without an analysis or data. To draw an input diagram, the
Diagrammer is accessed through the Open Diagrammer menu option of
the Diagram menu in the Mplus Editor. The Diagrammer uses a set of
drawing tools and pop-up menus to draw a diagram. When an input
diagram is drawn, a partial input is created which can be edited before
the analysis. To automatically create an output diagram, an input is
LTA CALCULATOR
Conditional probabilities, including latent transition probabilities, for
different values of a set of covariates can be computed using the LTA
Calculator. It is accessed by choosing LTA calculator from the Mplus
menu of the Mplus Editor.
LANGUAGE GENERATOR
Mplus includes a language generator to help users create Mplus input
files. The language generator takes users through a series of screens that
prompt them for information about their data and model. The language
generator contains all of the Mplus commands except DEFINE,
MODEL, PLOT, and MONTECARLO. Features added after Version 2
are not included in the language generator.
It is not necessary to read the entire User’s Guide before using the
program. A user may go straight to Chapter 2 for an overview of Mplus
and then to one of the example chapters.
Getting Started With Mplus
CHAPTER 2
GETTING STARTED WITH Mplus
After Mplus is installed, the program can be run from the Mplus Editor.
The Mplus Editor for Windows includes a language generator and a
graphics module. The graphics module provides graphical displays of
observed data and analysis results.
• TITLE
• DATA (required)
• VARIABLE (required)
• DEFINE
• ANALYSIS
• MODEL
• OUTPUT
• SAVEDATA
• PLOT
• MONTECARLO
The TITLE command is used to provide a title for the analysis. The
DATA command is used to provide information about the data set to be
analyzed. The VARIABLE command is used to provide information
about the variables in the data set to be analyzed. The DEFINE
command is used to transform existing variables and create new
variables. The ANALYSIS command is used to describe the technical
details of the analysis.
The Mplus commands may come in any order. The DATA and
VARIABLE commands are required for all analyses. All commands
must begin on a new line and must be followed by a colon. Semicolons
separate command options. There can be more than one option per line.
The records in the input setup must be no longer than 90 columns. They
can contain upper and/or lower case letters and tabs.
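A quick way to respect the 90-column limit is to check an input file before running it. The sketch below is a hypothetical Python helper, not part of Mplus; it assumes that every character, including a tab, counts as one column.

```python
def long_lines(input_text, max_columns=90):
    """Return (line_number, length) pairs for lines longer than max_columns.

    Hypothetical helper: each character, tabs included, counts as one column.
    Line numbers start at 1.
    """
    offenders = []
    for number, line in enumerate(input_text.splitlines(), start=1):
        if len(line) > max_columns:
            offenders.append((number, len(line)))
    return offenders


example_input = (
    "TITLE: this is an example of a simple linear regression\n"
    "DATA: FILE IS ex3.1.dat;\n"
    "MODEL: y1 ON x1 x3;\n"
)
print(long_lines(example_input))  # prints []
```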
The second example shows the input file for a growth model with time-
invariant covariates. It illustrates the new simplified Mplus language for
specifying growth models.
The third example shows the input file for a latent class analysis with
covariates and a direct effect.
The fourth example shows the input file for a multilevel regression
model with a random intercept and a random slope varying across
clusters.
The examples presented do not cover all models that can be estimated
using Mplus but do cover the major areas of modeling. They can be
seen as building blocks that can be put together as needed. For example,
a model can combine features described in an example from one chapter
with features described in an example from another chapter. Many
unique and unexplored models can therefore be created. In each chapter,
all commands and options for the first example are discussed. After that,
only the highlighted parts of each example are discussed.
For clarity, certain conventions are used in the input setups. Program
commands, options, settings, and keywords are written in upper case.
Information provided by the user is written in lower case. Note,
however, that Mplus is not case sensitive. Upper and lower case can be
used interchangeably in the input setups.
For simplicity, the input setups for the examples are generic. Observed
continuous and censored outcome variable names start with a y;
observed binary or ordered categorical (ordinal), unordered categorical
(nominal), and count outcome variable names start with a u; time-to-
event variables in continuous-time survival analysis start with a t;
observed background variable names start with an x; observed time-
varying background variables start with an a; observed between-level
background variables start with a w; continuous latent variable names
start with an f; categorical latent variable names start with a c; intercept
growth factor names start with an i; and slope growth factor names and
random slope names start with an s or a q. Note, however, that variable
names are not limited to these choices.
The Mplus Base and Mixture Add-On program covers the analyses
described in Chapters 3, 5, 6, 7, 8, 11, 13, and parts of Chapters 4 and
12. The Mplus Base and Mixture Add-On program does not include
analyses with TYPE=TWOLEVEL, TYPE=THREELEVEL, or
TYPE=CROSSCLASSIFIED.
The Mplus Base and Multilevel Add-On program covers the analyses
described in Chapters 3, 5, 6, 9, 11, 13, and parts of Chapters 4 and 12.
The Mplus Base and Multilevel Add-On program does not include
analyses with TYPE=MIXTURE.
The Mplus Base and Combination Add-On program covers the analyses
described in all chapters. There are no restrictions on the analyses that
can be requested.
Examples: Regression And Path Analysis
CHAPTER 3
EXAMPLES: REGRESSION AND
PATH ANALYSIS
All regression and path analysis models can be estimated using the
following special features:
available for the total sample, by group, by class, and adjusted for
covariates. The PLOT command includes a display showing a set of
descriptive statistics for each variable. The graphical displays can be
edited and exported as a DIB, EMF, or JPEG file. In addition, the data
for each graphical display can be saved in an external file for use by
another graphics program.
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex3.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
been selected for analysis. Because the scale of the dependent variable
is not specified, it is assumed to be continuous.
MODEL: y1 ON x1 x3;
The difference between this example and Example 3.1 is that the
dependent variable is a censored variable instead of a continuous
variable. The CENSORED option is used to specify which dependent
variables are treated as censored variables in the model and its
estimation, whether they are censored from above or below, and whether
a censored or censored-inflated model will be estimated. In the example
above, y1 is a censored variable. The b in parentheses following y1
indicates that y1 is censored from below, that is, has a floor effect, and
that the model is a censored regression model. The censoring limit is
determined from the data. The default estimator for this type of analysis
is a robust weighted least squares estimator. By specifying
ESTIMATOR=MLR, maximum likelihood estimation with robust
standard errors is used. The ON statement describes the censored
regression of y1 on the covariates x1 and x3.
The difference between this example and Example 3.1 is that the
dependent variable is a censored variable instead of a continuous
variable. The CENSORED option is used to specify which dependent
variables are treated as censored variables in the model and its
estimation, whether they are censored from above or below, and whether
a censored or censored-inflated model will be estimated. In the example
above, y1 is a censored variable. The bi in parentheses following y1
indicates that y1 is censored from below, that is, has a floor effect, and
that a censored-inflated regression model will be estimated. The
censoring limit is determined from the data.
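For readers who want to see what a censored-from-below (floor) model computes, here is a minimal Python sketch of the log-likelihood. It assumes a normal latent outcome with one linear predictor per observation and takes the censoring limit as the smallest observed value, one simple reading of "determined from the data". The optional inflation argument (a hypothetical name) adds a class that sits at the limit, which is the idea behind the censored-inflated model; it is held constant here for simplicity.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd**2) at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def censored_below_loglik(ys, means, sd, inflation=0.0):
    """Log-likelihood for outcomes censored from below.

    ys: observed values; means: linear predictors such as a + b1*x1 + b3*x3
    sd: residual standard deviation of the uncensored latent outcome
    inflation: probability of the extra class at the limit (hypothetical
               argument); 0.0 gives the ordinary censored regression model
    """
    limit = min(ys)  # censoring limit taken from the data
    total = 0.0
    for y, mean in zip(ys, means):
        if y <= limit:
            # at the floor: either in the inflation class or censored
            p = inflation + (1.0 - inflation) * normal_cdf((limit - mean) / sd)
            total += math.log(p)
        else:
            total += math.log(1.0 - inflation) + normal_logpdf(y, mean, sd)
    return total
```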
The difference between this example and Example 3.1 is that the
dependent variable is a binary or ordered categorical (ordinal) variable
instead of a continuous variable. The CATEGORICAL option is used to
specify which dependent variables are treated as binary or ordered
categorical (ordinal) variables in the model and its estimation. In the
example above, u1 is a binary or ordered categorical variable. The
program determines the number of categories. The ON statement
describes the probit regression of u1 on the covariates x1 and x3. The
default estimator for this type of analysis is a robust weighted least
squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. An explanation of
the other commands can be found in Example 3.1.
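The probit regression can be written out directly. The sketch below assumes the threshold convention P(u >= k | x) = Phi(-tau_k + b'x), which makes a binary u the special case with a single threshold; the function name and coefficient values are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(thresholds, slopes, covariates):
    """Return [P(u = 0), ..., P(u = K-1)] for an ordered probit regression.

    Assumed convention: P(u >= k | x) = Phi(-tau_k + b'x) for k = 1..K-1,
    with increasing thresholds tau_1 < ... < tau_{K-1}.
    """
    eta = sum(b * x for b, x in zip(slopes, covariates))
    # P(u >= k) for k = 0..K, padded with 1 (k = 0) and 0 (k = K)
    at_least = [1.0] + [normal_cdf(-tau + eta) for tau in thresholds] + [0.0]
    return [at_least[k] - at_least[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical estimates for a binary u1 regressed on x1 and x3
probs = ordered_probit_probs([0.5], [0.8, -0.4], [1.0, 2.0])
# here b'x = 0, so probs[1] equals Phi(-0.5), roughly 0.31
```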
The difference between this example and Example 3.1 is that the
dependent variable is a binary or ordered categorical (ordinal) variable
instead of a continuous variable. The CATEGORICAL option is used to
specify which dependent variables are treated as binary or ordered
categorical (ordinal) variables in the model and its estimation. In the
example above, u1 is a binary or ordered categorical variable.
The difference between this example and Example 3.1 is that the
dependent variable is an unordered categorical (nominal) variable
instead of a continuous variable. The NOMINAL option is used to
specify which dependent variables are treated as unordered categorical
variables in the model and its estimation. In the example above, u1 is a
three-category unordered variable. The program determines the number
of categories. The ON statement describes the multinomial logistic
regression of u1 on the covariates x1 and x3 when comparing categories
one and two of u1 to the third category of u1. The intercept and slopes
of the last category are fixed at zero as the default. The default estimator
for this type of analysis is maximum likelihood with robust standard
errors. The ESTIMATOR option of the ANALYSIS command can be
used to select a different estimator. An explanation of the other
commands can be found in Example 3.1.
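A minimal Python sketch of the multinomial logistic regression described above: each of the first K-1 categories has its own intercept and slopes, and the last category serves as the reference with intercept and slopes fixed at zero. The coefficient values are hypothetical.

```python
import math


def multinomial_logit_probs(intercepts, slopes, covariates):
    """Category probabilities for a multinomial logistic regression.

    intercepts[k] and slopes[k] belong to category k + 1 for the first K-1
    categories; the last (reference) category has them fixed at zero.
    """
    logits = [a + sum(b * x for b, x in zip(bs, covariates))
              for a, bs in zip(intercepts, slopes)]
    logits.append(0.0)  # reference category
    top = max(logits)  # subtract the largest logit for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical estimates for u1#1 and u1#2 regressed on x1 and x3
probs = multinomial_logit_probs([0.2, -0.1], [[0.5, 0.3], [-0.2, 0.4]], [1.0, 0.5])
# log(probs[0] / probs[2]) recovers 0.2 + 0.5*1.0 + 0.3*0.5 = 0.85
```

Subtracting the largest logit before exponentiating does not change the probabilities; it only guards against overflow.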
where u1#1 refers to the first category of u1 and u1#2 refers to the
second category of u1. The categories of an unordered categorical
variable are referred to by adding to the name of the unordered
categorical variable the number sign (#) followed by the number of the
category. This alternative specification allows individual parameters to
be referred to in the MODEL command for the purpose of giving starting
values or placing restrictions.
The difference between this example and Example 3.1 is that the
dependent variable is a count variable instead of a continuous variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the
example above, u1 is a count variable that is not inflated. The ON
statement describes the Poisson regression of u1 on the covariates x1
and x3. The default estimator for this type of analysis is maximum
likelihood with robust standard errors. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator. An
explanation of the other commands can be found in Example 3.1.
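Poisson regression conventionally uses a log link, so the expected count is exp(intercept + b'x) and exp(slope) acts as a multiplicative effect on that mean. The sketch assumes this standard form; the numbers are hypothetical.

```python
import math


def poisson_rate(intercept, slopes, covariates):
    """Expected count under a log link: exp(intercept + sum(b * x))."""
    return math.exp(intercept + sum(b * x for b, x in zip(slopes, covariates)))


def poisson_pmf(count, rate):
    """P(u = count) for a Poisson variable with the given rate."""
    return math.exp(-rate + count * math.log(rate) - math.lgamma(count + 1))


# Hypothetical estimates for u1 regressed on x1 and x3
rate = poisson_rate(0.1, [0.4, -0.2], [1.0, 2.0])
probabilities = [poisson_pmf(k, rate) for k in range(6)]
```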
The difference between this example and Example 3.1 is that the
dependent variable is a count variable instead of a continuous variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the first
part of this example, a zero-inflated Poisson regression is estimated. In
the example above, u1 is a count variable. The i in parentheses
following u1 indicates that a zero-inflated Poisson model will be
estimated. In the second part of this example, a negative binomial model
is estimated.
used to represent individuals who are able to assume values of zero and
above and individuals who are unable to assume any value except zero.
This approach allows the estimation of the probability of being in each
class and the posterior probabilities of being in each class for each
individual.
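The two-class description can be made concrete. In the sketch below, zero_class_prob is the probability of the class that can only be zero and rate is the Poisson mean for the other class; both are held fixed for simplicity, and the function names are hypothetical. By Bayes' rule, an observed zero is split between the two classes in proportion to the probability each assigns to it.

```python
import math


def poisson_pmf(count, rate):
    """P(u = count) for a Poisson variable with the given rate."""
    return math.exp(-rate + count * math.log(rate) - math.lgamma(count + 1))


def zip_pmf(count, rate, zero_class_prob):
    """P(u = count) for a zero-inflated Poisson variable."""
    from_poisson = (1.0 - zero_class_prob) * poisson_pmf(count, rate)
    if count == 0:
        return zero_class_prob + from_poisson
    return from_poisson


def posterior_zero_class(count, rate, zero_class_prob):
    """Posterior probability of being in the zero-only class given the count."""
    if count > 0:
        return 0.0  # a positive count rules out the zero-only class
    return zero_class_prob / zip_pmf(0, rate, zero_class_prob)
```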
The difference between this part of the example and the first part is that
a regression for a count outcome using a negative binomial model is
estimated instead of a zero-inflated Poisson model. The negative
binomial model estimates a dispersion parameter for each of the
outcomes (Long, 1997; Hilbe, 2011).
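A sketch of the negative binomial probability function, assuming the common NB-2 form in which the variance is mean + dispersion * mean**2, so the variance approaches the Poisson variance as the dispersion approaches zero. Whether this matches the exact parameterization Mplus reports is an assumption here; the names and values are illustrative.

```python
import math


def negbin_pmf(count, mean, dispersion):
    """NB-2 probability of a count; variance = mean + dispersion * mean**2.

    dispersion must be positive.
    """
    size = 1.0 / dispersion  # the 'number of successes' parameter of the NB
    log_prob = (math.lgamma(count + size) - math.lgamma(size)
                - math.lgamma(count + 1)
                + size * math.log(size / (size + mean))
                + count * math.log(mean / (size + mean)))
    return math.exp(log_prob)


# Check the first two moments numerically for hypothetical values
mean, dispersion = 2.0, 0.5
probs = [negbin_pmf(k, mean, dispersion) for k in range(400)]
mu_hat = sum(k * p for k, p in enumerate(probs))
var_hat = sum((k - mu_hat) ** 2 * p for k, p in enumerate(probs))
```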
[Figure: model diagram with covariates x1 and x2, outcome y, and random slope s]
In this example, theory specifies the following probabilities for the four
categories of an unordered categorical (nominal) variable: ½ + ¼ p, ¼
(1-p), ¼ (1-p), ¼ p, where p is a probability parameter to be estimated.
These restrictions on the category probabilities correspond to non-linear
constraints on the logit parameters for the categories in the multinomial
logistic model. This example is based on Dempster, Laird, and Rubin
(1977, p. 2).
When two parameters are referred to using the same label, they are held
equal. The MODEL CONSTRAINT command is used to define linear
and non-linear constraints on the parameters in the model. The non-
linear constraint for the logits follows from the four probabilities given
above after some algebra. The default estimator for this type of analysis
is maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Example 3.1.
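The "some algebra" can be spelled out. With the fourth category as the reference and its logit fixed at zero, each logit is the log of a category probability divided by the fourth probability: log((1/2 + p/4)/(p/4)) = log((2 + p)/p) for category one and log((1 - p)/p) for categories two and three, which is why those two logits share a label. The Python sketch below checks this mapping numerically; it illustrates the algebra and is not the Mplus input for the example.

```python
import math


def theory_probs(p):
    """Category probabilities implied by the theory for a given p."""
    return [0.5 + 0.25 * p, 0.25 * (1 - p), 0.25 * (1 - p), 0.25 * p]


def constrained_logits(p):
    """Logits of categories 1-3 relative to category 4 (logit fixed at 0)."""
    logit_1 = math.log((2 + p) / p)
    logit_23 = math.log((1 - p) / p)  # shared by categories 2 and 3
    return [logit_1, logit_23, logit_23]


def probs_from_logits(logits):
    """Multinomial logistic probabilities, reference category appended last."""
    weights = [math.exp(v) for v in logits] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]
```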
[Figure: path analysis model with covariates x1, x2, and x3, mediators y1 and y2, and outcome y3]
In this example, the path analysis model shown in the picture above is
estimated. The dependent variables in the analysis are continuous. Two
of the dependent variables y1 and y2 mediate the effects of the
covariates x1, x2, and x3 on the dependent variable y3.
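With continuous variables and linear regressions, the indirect effect of a covariate on y3 through a mediator is the product of the two path coefficients, and the total effect adds any direct path. The sketch below assumes, for illustration only, that y3 is regressed on y1, y2, and x2; the function name and coefficient values are hypothetical.

```python
def effects_on_y3(a, b, direct):
    """Indirect and total effects of x1, x2, x3 on y3 in a linear path model.

    a[m][x]: coefficient of mediator m (y1 or y2) regressed on covariate x
    b[m]:    coefficient of y3 regressed on mediator m
    direct:  coefficients of y3 regressed directly on covariates (may omit some)
    """
    effects = {}
    for x in ("x1", "x2", "x3"):
        indirect = sum(a[m][x] * b[m] for m in ("y1", "y2"))
        effects[x] = {"indirect": indirect, "total": indirect + direct.get(x, 0.0)}
    return effects


# Hypothetical coefficients
a = {"y1": {"x1": 0.5, "x2": 0.2, "x3": 0.1},
     "y2": {"x1": 0.3, "x2": 0.4, "x3": 0.0}}
b = {"y1": 0.6, "y2": 0.5}
direct = {"x2": 0.25}  # assumed direct path from x2 to y3
effects = effects_on_y3(a, b, direct)
```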
The difference between this example and Example 3.11 is that the
dependent variables are binary and/or ordered categorical (ordinal)
variables instead of continuous variables. The CATEGORICAL option
is used to specify which dependent variables are treated as binary or
ordered categorical (ordinal) variables in the model and its estimation.
In the example above, u1, u2, and u3 are binary or ordered categorical
variables. The program determines the number of categories for each
variable. The first ON statement describes the probit regressions of u1
and u2 on the covariates x1, x2, and x3. The second ON statement
describes the probit regression of u3 on the mediating variables u1 and
u2 and the covariate x2. The default estimator for this type of analysis is
a robust weighted least squares estimator. The ESTIMATOR option of
the ANALYSIS command can be used to select a different estimator. If
the maximum likelihood estimator is selected, the regressions are logistic
regressions.
The difference between this example and Example 3.12 is that the Theta
parameterization is used instead of the default Delta parameterization.
In the Delta parameterization, scale factors for continuous latent
response variables of observed categorical dependent variables are
allowed to be parameters in the model, but residual variances for
continuous latent response variables are not. In the Theta
parameterization, residual variances for continuous latent response
variables of observed categorical dependent variables are allowed to be
parameters in the model, but scale factors for continuous latent response
variables are not. An explanation of the other commands can be found
in Examples 3.1 and 3.12.
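Why one of the two must be fixed: for a probit regression of u on x through a continuous latent response u* = b*x + e, only ratios such as b divided by the standard deviation of e are identified. Multiplying the threshold, slope, and residual standard deviation by the same constant therefore changes nothing observable, and the Delta and Theta parameterizations differ in which scale-related quantity is treated as the parameter. This minimal sketch with hypothetical values checks the invariance numerically.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(threshold, slope, x, residual_sd):
    """P(u = 1 | x) when u* = slope * x + e, e ~ N(0, residual_sd**2),
    and u = 1 whenever u* exceeds the threshold."""
    return normal_cdf((slope * x - threshold) / residual_sd)


xs = (-1.0, 0.0, 1.5)
base = [probit_prob(0.3, 0.8, x, 1.0) for x in xs]
rescaled = [probit_prob(0.3 * 2.5, 0.8 * 2.5, x, 2.5) for x in xs]
# base and rescaled agree up to floating-point rounding
```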
The difference between this example and Example 3.11 is that the
dependent variables are a combination of continuous and binary or
ordered categorical (ordinal) variables instead of all continuous
variables. The CATEGORICAL option is used to specify which
dependent variables are treated as binary or ordered categorical (ordinal)
variables in the model and its estimation. In the example above, y1 and
y2 are continuous variables and u1 is a binary or ordered categorical
variable. The program determines the number of categories. The first
ON statement describes the linear regressions of y1 and y2 on the
covariates x1, x2, and x3. The second ON statement describes the probit
regression of u1 on the mediating variables y1 and y2 and the covariate
x2. The default estimator for this type of analysis is a robust weighted
least squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. If a maximum
likelihood estimator is selected, the regression for u1 is a logistic
regression. An explanation of the other commands can be found in
Example 3.1.
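Logistic and probit coefficients are on different scales. A well-known approximation is that the logistic distribution function evaluated at 1.702*z stays within about 0.01 of the standard normal distribution function, so logistic coefficients tend to be roughly 1.7 times probit coefficients for the same data. This is a rule of thumb rather than an exact conversion; the sketch below checks the approximation on a grid.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


grid = [i / 100.0 for i in range(-800, 801)]
# Largest gap between the probit curve and a rescaled logistic curve
max_gap_scaled = max(abs(normal_cdf(z) - logistic_cdf(1.702 * z)) for z in grid)
# Without rescaling the curves are much further apart
max_gap_raw = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)
```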
The difference between this example and Example 3.11 is that the
dependent variables are a combination of censored, binary or ordered
categorical (ordinal), and unordered categorical (nominal) variables
instead of continuous variables. The CENSORED option is used to
specify which dependent variables are treated as censored variables in
the model and its estimation, whether they are censored from above or
below, and whether a censored or censored-inflated model will be
estimated. In the example above, y1 is a censored variable. The a in
parentheses following y1 indicates that y1 is censored from above, that
is, has a ceiling effect, and that the model is a censored regression
model. The censoring limit is determined from the data. The
CATEGORICAL option is used to specify which dependent variables
are treated as binary or ordered categorical (ordinal) variables in the
model and its estimation. In the example above, u1 is a binary or
ordered categorical variable. The program determines the number of
categories. The NOMINAL option is used to specify which dependent
variables are treated as unordered categorical (nominal) variables in the
model and its estimation. In the example above, u2 is a three-category
unordered variable. The program determines the number of categories.
categories one and two of u2 to the third category of u2. The intercept
and slopes of the last category are fixed at zero as the default. The
default estimator for this type of analysis is maximum likelihood with
robust standard errors. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. An explanation of
the other commands can be found in Example 3.1.
where u2#1 refers to the first category of u2 and u2#2 refers to the
second category of u2. The categories of an unordered categorical
variable are referred to by adding to the name of the unordered
categorical variable the number sign (#) followed by the number of the
category. This alternative specification allows individual parameters to
be referred to in the MODEL command for the purpose of giving starting
values or placing restrictions.
Monte Carlo (MCMC) chain when the potential scale reduction (PSR)
convergence criterion (Gelman & Rubin, 1992) is used. The number in
parentheses in the BITERATIONS option sets the minimum number of
iterations, here 30,000; the maximum remains the default of 50,000.
The large minimum value is chosen to obtain a smooth plot.
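The PSR compares the spread of draws between chains with the spread within them; values near 1 suggest the chains agree. The sketch below implements the classic Gelman and Rubin (1992) form for one parameter. The exact variant Mplus uses (for example, which portion of each chain enters the computation) is not reproduced here and may differ.

```python
import math


def psr(chains):
    """Potential scale reduction for one parameter, classic Gelman-Rubin form.

    chains: list of m chains (m >= 2), each a list of n draws (n >= 2).
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


chain_a = [0.0, 1.0] * 100
chain_b = [1.0, 0.0] * 100
chain_c = [v + 3.0 for v in chain_a]
print(psr([chain_a, chain_b]))  # slightly below 1: the chains agree
print(psr([chain_a, chain_c]))  # far above 1: the chains disagree
```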
Examples: Exploratory Factor Analysis
CHAPTER 4
EXAMPLES: EXPLORATORY
FACTOR ANALYSIS
All EFA models can be estimated using the following special features:
• Missing data
• Complex survey data
• Mixture modeling
The default is to estimate the model under missing data theory using all
available data. The LISTWISE option of the DATA command can be
used to delete all observations from the analysis that have missing values
on one or more of the analysis variables. Corrections to the standard
errors and chi-square test of model fit that take into account
stratification, non-independence of observations, and unequal probability
of selection are obtained by using the TYPE=COMPLEX option of the
ANALYSIS command in conjunction with the STRATIFICATION, CLUSTER, and
WEIGHT options of the VARIABLE command.
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex4.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
OUTPUT: MODINDICES;
The difference between this part of the example and the first part is that
an exploratory factor analysis for four factors is carried out using
exploratory structural equation modeling (ESEM). In the MODEL
command, the BY statement specifies that the factors f1 through f4 are
measured by the continuous factor indicators y1 through y12. The label
1 following an asterisk (*) in parentheses following the BY statement is
used to indicate that f1, f2, f3, and f4 are a set of EFA factors. When no
rotation is specified using the ROTATION option of the ANALYSIS
command, the default oblique GEOMIN rotation is used. The intercepts
and residual variances of the factor indicators are estimated and the
residuals are not correlated as the default. The variances of the factors
are fixed at one as the default. The factors are correlated under the
default oblique GEOMIN rotation. The results are the same as for the
four-factor EFA in the first part of the example.
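To see what the default rotation rewards, here is a minimal sketch of the GEOMIN criterion as it is usually defined: for each indicator, take the geometric mean of its squared loadings, each shifted by a small constant epsilon, and sum over indicators. Rotation searches for loadings that minimize this value, which favors rows with one large loading and the rest near zero. The sketch only evaluates the criterion for given loadings; the epsilon value is illustrative and not necessarily the Mplus default.

```python
def geomin_criterion(loadings, epsilon=0.01):
    """Sum over rows of the geometric mean of (loading**2 + epsilon).

    loadings: list of rows, one row per indicator, one entry per factor.
    epsilon: small positive constant (illustrative value).
    """
    factors = len(loadings[0])
    total = 0.0
    for row in loadings:
        product = 1.0
        for value in row:
            product *= value * value + epsilon
        total += product ** (1.0 / factors)
    return total


# Two hypothetical loading patterns for four indicators and two factors
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]]
cross_loaded = [[0.57, 0.57], [0.5, 0.5], [0.57, 0.57], [0.5, 0.5]]
# the simple structure gives the smaller (preferred) criterion value
```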
The difference between this example and Example 4.1 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. Estimation of factor analysis models with binary
variables is discussed in Muthén (1978) and Muthén et al. (1997). The
CATEGORICAL option is used to specify which dependent variables
are treated as binary or ordered categorical (ordinal) variables in the
model and its estimation. In the example above, all twelve factor
indicators are binary or ordered categorical variables. Categorical
variables can be binary or ordered categorical. The program determines
the number of categories for each variable. The default estimator for
this type of analysis is a robust weighted least squares estimator. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. With maximum likelihood estimation, numerical
integration is used with one dimension of integration for each factor. To
reduce computational time with several factors, the number of
integration points per dimension can be reduced from the default of 7 for
exploratory factor analysis to as few as 3 for an approximate solution.
An explanation of the other commands can be found in Example 4.1.
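The computational cost grows quickly because a product integration grid has one node for every combination of points across factors: with q points per dimension and m factors there are q**m nodes. The sketch shows that arithmetic and, as one illustration of how a few well-placed points can still be accurate, a 3-point Gauss-Hermite rule for a standard normal factor that reproduces E[z**2] = 1 and E[z**4] = 3 exactly. Whether Mplus uses Gauss-Hermite points by default is not claimed here.

```python
import math


def grid_size(points_per_dimension, factors):
    """Number of nodes in a product integration grid."""
    return points_per_dimension ** factors


# 3-point Gauss-Hermite rule for E[g(z)] with z ~ N(0, 1)
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def normal_expectation(g):
    """Approximate E[g(z)] for a standard normal z with the 3-point rule."""
    return sum(w * g(z) for z, w in zip(NODES, WEIGHTS))


print(grid_size(7, 4), grid_size(3, 4))  # 2401 81
```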
The difference between this example and Example 4.1 is that the factor
indicators are a combination of continuous, censored, binary or ordered
categorical (ordinal), and count variables instead of all continuous
variables. The CENSORED option is used to specify which dependent
variables are treated as censored variables in the model and its
estimation, whether they are censored from above or below, and whether
a censored or censored-inflated model will be estimated. In the example
above, y4, y5, and y6 are censored variables. The b in parentheses
indicates that they are censored from below, that is, have a floor effect,
and that the model is a censored regression model. The censoring limit
is determined from the data. The CATEGORICAL option is used to
specify which dependent variables are treated as binary or ordered
categorical (ordinal) variables in the model and its estimation. In the
example above, the factor indicators u1, u2, and u3 are binary or ordered
categorical variables. The program determines the number of categories
for each variable. The COUNT option is used to specify which
dependent variables are treated as count variables in the model and its
estimation and whether a Poisson or zero-inflated Poisson model will be
estimated. In the example above, u4, u5, and u6 are count variables.
The variables y1, y2, and y3 are continuous variables.
the between part of the model. In both parts of the model, one- and
two-factor solutions and an unrestricted solution will be obtained. The
unrestricted solution for the within part of the model is specified by UW
and the unrestricted solution for the between part of the model is
specified by UB. The within and between specifications are crossed.
Factor solutions will be obtained for one factor within and one factor
between, two factors within and one factor between, unrestricted within
and one factor between, one factor within and unrestricted between, and
two factors within and unrestricted between. Rotations are not given for
unrestricted solutions. The default rotation is the oblique rotation of
GEOMIN. The ROTATION option of the ANALYSIS command can be
used to select a different rotation. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Example 4.1.
The difference between this example and Example 4.5 is that there is a
combination of individual-level categorical factor indicators and
between-level continuous factor indicators. The exploratory factor
analysis structure for the within part of the model includes only the
individual-level factor indicators whereas the exploratory factor analysis
structure for the between part of the model includes the between part of
the individual-level factor indicators and the between-level factor
indicators. Rotated solutions with standard errors are obtained for both
the within and between parts of the model.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level and modeled only on the between
level. Variables not mentioned on the WITHIN or the BETWEEN
statements are measured on the individual level and can be modeled on
both the within and between levels. The default rotation is the oblique
rotation of GEOMIN. The ROTATION option of the ANALYSIS
command can be used to select a different rotation. The default
estimator for this type of analysis is a robust weighted least squares
estimator using a diagonal weight matrix (Asparouhov & Muthén, 2007).
The ESTIMATOR option of the ANALYSIS command can be used to
select a different estimator. The SWMATRIX option of the
SAVEDATA command is used with TYPE=TWOLEVEL and weighted
least squares estimation to specify the name and location of the file that
contains the within- and between-level sample statistics and their
corresponding estimated asymptotic covariance matrix. It is
recommended to save this information and use it in subsequent analyses
along with the raw data to reduce computational time during model
estimation. An explanation of the other commands can be found in
Examples 4.1, 4.3, and 4.5.
CHAPTER 5
EXAMPLES: CONFIRMATORY FACTOR ANALYSIS AND STRUCTURAL EQUATION MODELING
All CFA, MIMIC and SEM models can be estimated using the following
special features:
[Figure: CFA model in which factor f1 is measured by y1, y2, and y3 and factor f2 by y4, y5, and y6]
TITLE: this is an example of a CFA with
continuous factor indicators
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex5.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
MODEL: f1 BY y1-y3;
f2 BY y4-y6;
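Put together, the commands discussed for this example would form an input roughly like the one below. The VARIABLE command is an assumption, because only the TITLE, DATA, and MODEL parts appear in this text. It presumes that the six columns of ex5.1.dat are y1 through y6 in that order.

```
TITLE:     this is an example of a CFA with
           continuous factor indicators
DATA:      FILE IS ex5.1.dat;
VARIABLE:  NAMES ARE y1-y6;    ! assumed column order of ex5.1.dat
MODEL:     f1 BY y1-y3;        ! f1 measured by y1, y2, y3
           f2 BY y4-y6;        ! f2 measured by y4, y5, y6
```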
The difference between this example and Example 5.1 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. The CATEGORICAL option is used to specify
which dependent variables are treated as binary or ordered categorical
(ordinal) variables in the model and its estimation. In the example
above, all six factor indicators are binary or ordered categorical
variables. The program determines the number of categories for each
factor indicator. The default estimator for this type of analysis is a
robust weighted least squares estimator (Muthén, 1984; Muthén, du Toit,
& Spisic, 1997). With this estimator, probit regressions for the factor
indicators regressed on the factors are estimated. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Example 5.1.
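As an illustration, the relevant parts of the input might read as follows. The indicator names u1-u6 and the three-per-factor split (mirroring Example 5.1) are assumptions.

```
VARIABLE:  NAMES ARE u1-u6;         ! hypothetical indicator names
           CATEGORICAL ARE u1-u6;   ! binary or ordinal indicators
MODEL:     f1 BY u1-u3;
           f2 BY u4-u6;
```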
The difference between this example and Example 5.1 is that the factor
indicators are a combination of binary or ordered categorical (ordinal)
and continuous variables instead of all continuous variables. The
CATEGORICAL option is used to specify which dependent variables
are treated as binary or ordered categorical (ordinal) variables in the
model and its estimation. In the example above, the factor indicators u1,
u2, and u3 are binary or ordered categorical variables whereas the factor
indicators y4, y5, and y6 are continuous variables. The program
determines the number of categories for each factor indicator. The
default estimator for this type of analysis is a robust weighted least
squares estimator. With this estimator, probit regressions are estimated
for the categorical factor indicators, and linear regressions are estimated
for the continuous factor indicators. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator. With
maximum likelihood estimation, logistic regressions are estimated for
the categorical dependent variables using a numerical integration
algorithm. Note that numerical integration becomes increasingly more
computationally demanding as the number of factors and the sample size
increase. An explanation of the other commands can be found in
Example 5.1.
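Below is a sketch of the corresponding input, using the variable names given in the text. Two parts are assumptions: which indicators load on which factor, and the commented-out ESTIMATOR line. That line marks where maximum likelihood would be requested in place of the default.

```
VARIABLE:  NAMES ARE u1-u3 y4-y6;
           CATEGORICAL ARE u1-u3;    ! y4-y6 remain continuous
! ANALYSIS: ESTIMATOR = ML;          ! uncomment for ML with numerical integration
MODEL:     f1 BY u1-u3;              ! assumed assignment of indicators
           f2 BY y4-y6;
```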
The difference between this example and Example 5.1 is that the factor
indicators are a combination of censored and count variables instead of
all continuous variables. The CENSORED option is used to specify
which dependent variables are treated as censored variables in the model
and its estimation, whether they are censored from above or below, and
whether a censored or censored-inflated model will be estimated. In the
example above, y1, y2, and y3 are censored variables. The a in
parentheses following y1-y3 indicates that y1, y2, and y3 are censored
from above, that is, have ceiling effects, and that the model is a censored
regression model. The censoring limit is determined from the data. The
COUNT option is used to specify which dependent variables are treated
as count variables in the model and its estimation and whether a Poisson
or zero-inflated Poisson model will be estimated. In the example above,
u4, u5, and u6 are count variables. Poisson regressions are estimated for
the count dependent variables and censored regressions are estimated for
the censored dependent variables.
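The CENSORED and COUNT specifications described above could be written as follows. The variable names come from the text. The assignment of y1-y3 to f1 and u4-u6 to f2 is an assumption.

```
VARIABLE:  NAMES ARE y1-y3 u4-u6;
           CENSORED ARE y1-y3 (a);   ! censored from above (ceiling effects)
           COUNT ARE u4-u6;          ! Poisson model for the counts
MODEL:     f1 BY y1-y3;              ! assumed factor assignment
           f2 BY u4-u6;
```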
is fixed at one as the default to define the metric of the factor. Instead
the metric of the factor is defined by fixing the factor variance at one in
line with IRT. For one-factor models with no covariates, results are
presented both in a factor model parameterization and in a conventional
IRT parameterization. The OUTPUT command is used to request
additional output not included as the default. The TECH1 option is used
to request the arrays containing parameter specifications and starting
values for all free parameters in the model. The TECH8 option is used
to request that the optimization history in estimating the model be
printed in the output. TECH8 is printed to the screen during the
computations as the default. TECH8 screen printing is useful for
determining how long the analysis takes. The PLOT command is used to
request graphical displays of observed data and analysis results. These
graphical displays can be viewed after the analysis is completed using a
post-processing graphics module. Item characteristic curves and
information curves are available. When covariates are included in the
model with direct effects on one or more factor indicators, item
characteristic curves can be plotted for each value of the covariate to
show differential item functioning (DIF). An explanation of the other
commands can be found in Example 5.1.
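The options described above could be combined roughly as shown below. Three details are assumptions: the item names u1-u6, the ESTIMATOR = MLR line, and the PLOT setting. The asterisk frees the first factor loading, and f@1 sets the metric of the factor through its variance, as the text describes.

```
VARIABLE:  NAMES ARE u1-u6;         ! hypothetical item names
           CATEGORICAL ARE u1-u6;
ANALYSIS:  ESTIMATOR = MLR;         ! assumed estimator for this sketch
MODEL:     f BY u1-u6*;             ! * frees the first factor loading
           f@1;                     ! factor variance fixed at one, as in IRT
OUTPUT:    TECH1 TECH8;
PLOT:      TYPE = PLOT3;            ! assumed setting for the curve plots
```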
[Figure: second-order factor f5 measured by the first-order factors f1, f2, f3, and f4]
The first four BY statements specify that f1 is measured by y1, y2, and
y3; f2 is measured by y4, y5, and y6; f3 is measured by y7, y8, and y9;
and f4 is measured by y10, y11, and y12. The fifth BY statement
specifies that the second-order factor f5 is measured by f1, f2, f3, and f4.
The metrics of the first- and second-order factors are set automatically
by the program by fixing the first factor loading in each BY statement to
1. This option can be overridden. The intercepts and residual variances
of the first-order factor indicators are estimated and the residuals are not
correlated as the default. The residual variances of the first-order factors
are estimated as the default. The residuals of the first-order factors are
not correlated as the default. The variance of the second-order factor is
estimated as the default. The default estimator for this type of analysis
is maximum likelihood. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. An explanation of
the other commands can be found in Example 5.1.
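The five BY statements described above can be written as follows. The NAMES line is assumed.

```
VARIABLE:  NAMES ARE y1-y12;
MODEL:     f1 BY y1-y3;
           f2 BY y4-y6;
           f3 BY y7-y9;
           f4 BY y10-y12;
           f5 BY f1-f4;       ! second-order factor measured by f1-f4
```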
[Figure: MIMIC model in which f1 (y1, y2, y3) and f2 (y4, y5, y6) are regressed on covariates x1, x2, and x3]
In this example, the CFA model with covariates (MIMIC) shown in the
figure above is estimated. The two factors are regressed on three
covariates.
The first BY statement specifies that f1 is measured by y1, y2, and y3.
The second BY statement specifies that f2 is measured by y4, y5, and y6.
The metric of the factors is set automatically by the program by fixing
the first factor loading in each BY statement to 1. This option can be
overridden. The intercepts and residual variances of the factor
indicators are estimated and the residuals are not correlated as the
default. The residual variances of the factors are estimated as the
default. The residuals of the factors are correlated as the default because
residuals are correlated for latent variables that do not influence any
other variable in the model except their own indicators. The ON
statement describes the linear regressions of f1 and f2 on the covariates
x1, x2, and x3. The ESTIMATOR option of the ANALYSIS command
can be used to select a different estimator. An explanation of the other
commands can be found in Example 5.1.
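The BY and ON statements described above correspond to the following MODEL command. The NAMES line is assumed.

```
VARIABLE:  NAMES ARE y1-y6 x1-x3;
MODEL:     f1 BY y1-y3;
           f2 BY y4-y6;
           f1 f2 ON x1-x3;    ! both factors regressed on the three covariates
```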
[Figure: factor f1 measured by y1a, y1b, and y1c; factor f2 measured by y2a, y2b, and y2c]
In this example, the CFA model shown in the picture above, in which two
factors are each measured by three equivalent test forms, is estimated.
The three equivalent test forms are referred to as a, b, and c.
To reflect the hypothesis that the three test forms are equivalent with
respect to their measurement intercepts, the first bracket statement
specifies that the intercepts for y1a, y1b, and y1c are equal and the
second bracket statement specifies that the intercepts for y2a, y2b, and
y2c are equal. Equalities are designated by a number in parentheses. All
parameters in a statement followed by the same number in parentheses
are held equal. The means of the two factors are fixed at zero as the
default. The default estimator for this type of analysis is maximum
likelihood. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Example 5.1.
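Below is a sketch of the measurement part together with the two bracket statements. The variable names follow the text. Whether the original input also constrains the factor loadings is not stated here, so no loading constraints are added.

```
VARIABLE:  NAMES ARE y1a y1b y1c y2a y2b y2c;
MODEL:     f1 BY y1a y1b y1c;
           f2 BY y2a y2b y2c;
           [y1a y1b y1c] (1);   ! intercepts of the f1 test forms held equal
           [y2a y2b y2c] (2);   ! intercepts of the f2 test forms held equal
```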
The difference between this example and Example 5.9 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. The CATEGORICAL option is used to specify
which dependent variables are treated as binary or ordered categorical
(ordinal) variables in the model and its estimation. In the example
above, all six factor indicators are binary or ordered categorical
variables. The program determines the number of categories for each
factor indicator. In this example, it is assumed that the factor indicators
are binary variables with one threshold each.
that the three test forms are equivalent with respect to their measurement
thresholds, the (1) after the first bracket statement specifies that the
thresholds for u1a, u1b, and u1c are constrained to be equal and the (2)
after the second bracket statement specifies that the thresholds for u2a,
u2b, and u2c are constrained to be equal. The default estimator for this
type of analysis is a robust weighted least squares estimator. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. With maximum likelihood, logistic regressions are
estimated using a numerical integration algorithm. Note that numerical
integration becomes increasingly more computationally demanding as
the number of factors and the sample size increase. An explanation of
the other commands can be found in Examples 5.1 and 5.9.
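For binary indicators, each threshold is referred to by the variable name followed by $1. Below is a sketch that uses the variable names in the text; the VARIABLE command is assumed.

```
VARIABLE:  NAMES ARE u1a u1b u1c u2a u2b u2c;
           CATEGORICAL ARE u1a u1b u1c u2a u2b u2c;
MODEL:     f1 BY u1a u1b u1c;
           f2 BY u2a u2b u2c;
           [u1a$1 u1b$1 u1c$1] (1);   ! $1 is the single threshold of each item
           [u2a$1 u2b$1 u2c$1] (2);
```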
[Figure: SEM with latent variables f1, f2, f3, and f4; f1 measured by y1, y2, and y3 and f2 by y4, y5, and y6]
In this example, the SEM model with four continuous latent variables
shown in the picture above is estimated. The factor indicators are
continuous variables.
In the IND statement above, the variable on the left-hand side of IND is
the dependent variable. The last variable on the right-hand side of IND
is the independent variable. Other variables on the right-hand side of
IND are mediating variables. The IND statement requests the specific
indirect effect from f1 to f3 to f4. The default estimator for this type of
analysis is maximum likelihood. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator. An
explanation of the other commands can be found in Examples 5.1 and
5.11.
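The IND statement is given in the MODEL INDIRECT command. In the sketch below, three parts are assumptions suggested by the picture rather than stated in the text: the indicators of f3 and f4, and the structural paths from f1 and f2 to f3 and from f3 to f4.

```
MODEL:           f1 BY y1-y3;
                 f2 BY y4-y6;
                 f3 BY y7-y9;       ! assumed indicators
                 f4 BY y10-y12;     ! assumed indicators
                 f3 ON f1 f2;       ! assumed structural paths
                 f4 ON f3;
MODEL INDIRECT:  f4 IND f3 f1;      ! specific indirect effect f1 -> f3 -> f4
```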
[Figure: SEM with latent variables f1, f2, f3, and f4; f1 measured by y1, y2, and y3 and f2 by y4, y5, and y6]
The difference between this example and Example 5.8 is that this is a
multiple group rather than a single group analysis. The GROUPING
option is used to identify the variable in the data set that contains
information on group membership when the data for all groups are
stored in a single data set. The information in parentheses after the
grouping variable name assigns labels to the values of the grouping
variable found in the data set. In the example above, observations with g
equal to 1 are assigned the label male, and observations with g equal to 2
are assigned the label female. These labels are used in conjunction with
the MODEL command to specify model statements specific to each
group.
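A sketch of how the GROUPING option and the group labels might be used is shown below. The group-specific statement is a hypothetical illustration. Mentioning in the female-specific MODEL command a loading that is held equal across groups by default frees that loading for females.

```
VARIABLE:     NAMES ARE y1-y6 x1-x3 g;
              GROUPING IS g (1 = male 2 = female);
MODEL:        f1 BY y1-y3;
              f2 BY y4-y6;
              f1 f2 ON x1-x3;
MODEL female: f1 BY y3;   ! hypothetical: loading of y3 free for females
```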
The difference between this example and Example 5.14 is that means are
included in the model. In multiple group analysis, when a model
includes a mean structure, both the intercepts and factor loadings of the
continuous factor indicators are held equal across groups as the default
to specify measurement invariance. The intercepts of the factors are
fixed at zero in the first group and are free to be estimated in the other
groups as the default. The group-specific MODEL command for
females specifies that the intercept of y3 for females is free and not
equal to the intercept for males. Intercepts are referred to by using
square brackets. The default estimator for this type of analysis is maximum likelihood.
The difference between this example and Example 5.15 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. For multiple-group CFA with categorical factor
indicators, see Muthén and Christoffersson (1981) and Muthén and
Asparouhov (2002).
Because the factor indicators are categorical, scale factors are required
for multiple group analysis when the default Delta parameterization is
used. Scale factors are referred to using curly brackets ({}). By default,
scale factors are fixed at one in the first group and are free to be
estimated in the other groups. When a threshold and a factor loading for
a categorical factor indicator are free across groups, the scale factor for
that variable must be fixed at one in all groups for identification
purposes. Therefore, the scale factor for u3 is fixed at one for females.
The default estimator for this type of analysis is a robust weighted least
squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. With maximum
likelihood, logistic regressions are estimated using a numerical
integration algorithm. Note that numerical integration becomes
increasingly more computationally demanding as the number of factors
and the sample size increase. An explanation of the other commands can
be found in Examples 5.1, 5.8, 5.14, and 5.15.
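Below is a sketch of the group-specific MODEL command described above. The overall model (indicator names, factor assignment, and covariates carried over from Examples 5.8 and 5.14) is assumed.

```
VARIABLE:     NAMES ARE u1-u6 x1-x3 g;
              CATEGORICAL ARE u1-u6;
              GROUPING IS g (1 = male 2 = female);
MODEL:        f1 BY u1-u3;
              f2 BY u4-u6;
              f1 f2 ON x1-x3;
MODEL female: f1 BY u3;    ! loading of u3 free across groups
              [u3$1];      ! threshold of u3 free across groups
              {u3@1};      ! scale factor fixed at one for identification
```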
The difference between this example and Example 5.16 is that the Theta
parameterization is used instead of the Delta parameterization. In the
Delta parameterization, scale factors are allowed to be parameters in the
model, but residual variances for latent response variables of observed
categorical dependent variables are not. In the alternative Theta
parameterization, residual variances for latent response variables are
allowed to be parameters in the model but scale factors are not. The
Theta parameterization is selected by specifying
PARAMETERIZATION=THETA in the ANALYSIS command.
When the Theta parameterization is used, the residual variances for the
latent response variables of the observed categorical dependent variables
are fixed at one in the first group and are free to be estimated in the other
groups as the default. When a threshold and a factor loading for a
categorical factor indicator are free across groups, the residual variance
for the variable must be fixed at one in these groups for identification
purposes. In the group-specific MODEL command for females, the
residual variance for u3 is fixed at one. An explanation of the other
commands can be found in Examples 5.1, 5.8, 5.14, 5.15, and 5.16.
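Compared with an input like that of Example 5.16, the parts that change might look as follows. This is a sketch: the VARIABLE command and the overall MODEL command are assumed unchanged and are omitted.

```
ANALYSIS:     PARAMETERIZATION = THETA;
MODEL female: f1 BY u3;    ! loading of u3 free across groups
              [u3$1];      ! threshold of u3 free across groups
              u3@1;        ! residual variance of the latent response variable fixed at one
```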
[Figure: univariate twin model with outcomes y1 and y2 and factors A1, C1, E1, A2, C2, and E2]
In this example, the univariate twin model shown in the picture above is
estimated. This is a two-group twin model for a continuous outcome
where factors represent the ACE components (Neale & Cardon, 1992).
The WITH statement for the A factors is used to fix the covariance
(correlation) between the A factors to 1.0 for monozygotic twin pairs.
The group-specific MODEL command is used to fix the covariance
between the A factors to 0.5 for the dizygotic twin pairs. The WITH
statement for the C factors is used to fix the covariance between the C
factors to 1. The default estimator for this type of analysis is maximum
likelihood. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Examples 5.1 and 5.14.
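The restrictions described above imply the standard ACE decomposition of the twin variances and covariances (Neale & Cardon, 1992). For illustration, let a, c, and e be the loadings of each twin's outcome on its A, C, and E factors. These symbols are introduced here and are not from the text. The sketch further assumes that the factor variances are one, that the loadings are equal for the two twins, and that the E factors are uncorrelated.

```latex
\begin{aligned}
\operatorname{Var}(y_1) = \operatorname{Var}(y_2) &= a^2 + c^2 + e^2, \\
\operatorname{Cov}_{\mathrm{MZ}}(y_1, y_2) &= 1.0\,a^2 + c^2, \\
\operatorname{Cov}_{\mathrm{DZ}}(y_1, y_2) &= 0.5\,a^2 + c^2.
\end{aligned}
```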
The difference between this example and Example 5.18 is that the
outcomes are binary or ordered categorical instead of continuous
variables. Because of this, the outcomes have no freely estimated
residual variances and therefore the E factors are not part of the model.
With categorical outcomes, the twin model is formulated for normally
distributed latent response variables, also called liabilities, that underlie
the categorical outcomes. This model is referred to as the
threshold model for liabilities (Neale & Cardon, 1992). More complex
examples of such models are given in Prescott (2004). A simpler
alternative way of specifying this model is shown in Example 5.22
where parameter constraints are used instead of the A and C factors.
Because the outcomes are categorical, scale factors are required for
multiple group analysis when the default Delta parameterization is used.
Scale factors are referred to using curly brackets ({}). By default, scale
factors are fixed at one in the first group and are free to be estimated in
the other groups. In this model where the variance contributions from
the A and C factors are assumed equal across the two groups, the scale
factors are fixed at one in both groups to represent the equality of
variance for latent response variables underlying u1 and u2. The
statement in curly brackets in the group-specific MODEL command
specifies that the scale factors are fixed at one. The variance
contribution from the E factor is a remainder obtained by subtracting the
variance contributions of the A and C factors from the unit variance of
the latent response variables underlying u1 and u2. These are obtained
as part of the STANDARDIZED option of the OUTPUT command.
The default estimator for this type of analysis is a robust weighted least
squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. With maximum
likelihood and categorical factor indicators, numerical integration is
required. Note that numerical integration becomes increasingly more
computationally demanding as the number of factors and the sample size
increase. An explanation of the other commands can be found in
Examples 5.1, 5.14, and 5.18.
In the MODEL command, labels are defined for twelve parameters. The
list function can be used when assigning labels to a list of parameters.
The labels lam2, lam3, lam5, and lam6 are assigned to the factor
loadings for y2, y3, y5, and y6. The labels vf1 and vf2 are assigned to
the factor variances for f1 and f2. The labels ve1, ve2, ve3, ve4, ve5,
and ve6 are assigned to the residual variances of y1, y2, y3, y4, y5, and
y6.
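The twelve labels could be assigned as shown below, using the list function. The first loading of each factor stays fixed at one by default and therefore has no label. Any constraints that use these labels are not shown in this sketch.

```
MODEL:  f1 BY y1
           y2-y3 (lam2-lam3);   ! list function: labels lam2 and lam3
        f2 BY y4
           y5-y6 (lam5-lam6);
        f1-f2 (vf1-vf2);        ! factor variances
        y1-y6 (ve1-ve6);        ! residual variances
```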
[Figure: twin model for outcomes y1 and y2]
In this example, the model shown in the picture above is estimated using
parameter constraints. The model estimated is the same as the model in
Example 5.18.
In the MODEL command, labels are defined for three parameters. The
label var is assigned to the variances of y1 and y2. Because they are
given the same label, these parameters are held equal. In the overall
MODEL command, the label covmz is assigned to the covariance
between y1 and y2 for the monozygotic twins. In the group-specific
MODEL command, the label covdz is assigned to the covariance
between y1 and y2 for the dizygotic twins.
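The three labels are typically tied to the A, C, and E components in MODEL CONSTRAINT. Below is a sketch, assuming a grouping variable g with values 1 (mz) and 2 (dz) and new parameters a, c, and e introduced with NEW. The constraint equations are the standard ACE expressions and are not quoted from the original input.

```
VARIABLE:  NAMES ARE y1 y2 g;
           GROUPING IS g (1 = mz 2 = dz);
MODEL:     y1-y2 (var);            ! equal variances
           y1 WITH y2 (covmz);     ! MZ covariance
MODEL dz:  y1 WITH y2 (covdz);     ! DZ covariance
MODEL CONSTRAINT:
           NEW(a c e);
           var   = a**2 + c**2 + e**2;
           covmz = a**2 + c**2;
           covdz = 0.5*a**2 + c**2;
```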
The difference between this example and Example 5.21 is that the
outcomes are binary or ordered categorical instead of continuous
variables. Because of this, the outcomes have no freely estimated
residual variances. The ACE variance and covariance restrictions are
placed on normally distributed latent response variables, also called
liabilities, that underlie the categorical outcomes. This model is
referred to as the threshold model for liabilities (Neale & Cardon, 1992).
The model estimated is the same as the model in Example 5.19.
The default estimator for this type of analysis is a robust weighted least
squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. With maximum
likelihood, logistic or probit regressions are estimated using a numerical
integration algorithm. Note that numerical integration becomes
increasingly more computationally demanding as the number of factors
and the sample size increase. An explanation of the other commands can
be found in Examples 5.1, 5.14, 5.19, and 5.21.
[Figure: QTL model for outcomes y1 and y2 with the observed variable pihat]
In this example, the model shown in the picture above is estimated. This
is a QTL model for two siblings (Marlow et al., 2003; Posthuma et al.,
2004) for continuous outcomes where parameter constraints are used to
represent the A, E, and Q components. The A component represents the
additive genetic effects, which correlate 0.5 for siblings. The E
component represents uncorrelated environmental effects. The Q
component represents a quantitative trait locus (QTL). The observed
variable pihat contains the estimated proportion of alleles that the
siblings share identical by descent (IBD) and moderates the effect of the
Q component on the covariance between the outcomes.
In the MODEL command, the (1) following the first bracket statement
specifies that the intercepts of y1 and y2 are held equal across the two
siblings. In addition, labels are defined for two parameters. The label
var is assigned to the variances of y1 and y2. Because they are given the
same label, these parameters are held equal. The label cov is assigned to
the covariance between y1 and y2.
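Below is a sketch of how the A, E, and Q components might be tied to the labels var and cov. Several parts are assumptions based on the standard variance-components QTL model, not quotations from the original input: the CONSTRAINT option of the VARIABLE command, the parameter names a, e, and q, and the constraint equations.

```
VARIABLE:  NAMES ARE y1 y2 pihat;
           USEVARIABLES ARE y1 y2;
           CONSTRAINT = pihat;            ! pihat used only in MODEL CONSTRAINT
MODEL:     [y1-y2] (1);                   ! equal intercepts for the two siblings
           y1-y2 (var);
           y1 WITH y2 (cov);
MODEL CONSTRAINT:
           NEW(a e q);
           var = a**2 + e**2 + q**2;
           cov = 0.5*a**2 + pihat*q**2;   ! A correlates 0.5; Q weighted by pihat
```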
[Figure: factors f1 and f2 with indicators y1-y8 and covariates x1 and x2]
[Figure: EFA factors f1 and f2 measured by y1-y6, and CFA factors f3 and f4]
In this example, the SEM with EFA and CFA factors with continuous
factor indicators shown in the picture above is estimated. This is an
exploratory structural equation model (ESEM; Asparouhov & Muthén,
2009a). The factors f1 and f2 are EFA factors which have the same
factor indicators. Unlike CFA, no factor loadings are fixed at zero.
Instead, the four restrictions on the factor loadings, factor variances, and
factor covariances necessary for identification are imposed by rotating
the factor loading matrix and fixing the factor variances at one. The
factors f3 and f4 are CFA factors.
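In the MODEL command, the EFA factors are identified as a set with a label following an asterisk, as in the (*1) used later in this chapter. Below is a sketch of the structure. The indicators of f3 and f4 and the structural paths are assumptions, since the text does not give them.

```
MODEL:  f1-f2 BY y1-y6 (*1);   ! EFA factors; *1 names the EFA set
        f3 BY y7-y9;           ! assumed CFA indicators
        f4 BY y10-y12;         ! assumed CFA indicators
        f3 ON f1-f2;           ! assumed structural paths
        f4 ON f3;
```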
[Figure: EFA factors f1 and f2 at the first time point and f3 and f4 at the second time point]
In this example, the EFA at two time points with factor loading
invariance and correlated residuals across time shown in the picture
above is estimated. This is an exploratory structural equation model
(ESEM; Asparouhov & Muthén, 2009a). The factor indicators y1
through y6 and y7 through y12 are the same variables measured at two
time points. The factors f1 and f2 are one set of EFA factors which have
the same factor indicators and the factors f3 and f4 are a second set of
EFA factors which have the same factor indicators. Unlike CFA, no
factor loadings are fixed at zero in either set. Instead, for each set, the
four restrictions on the factor loadings, factor variances, and factor
covariances necessary for identification are imposed by rotating the
factor loading matrix and fixing the factor variances at one at the first
time point. For the other time point, factor variances are free to be estimated.
For EFA factors, the intercepts and residual variances of the factor
indicators are estimated and the residuals are not correlated as the
default. The intercepts are not held equal across time as the default.
The means of the factors are fixed at zero at both time points and the
variances of the factors are fixed at one as the default. In this example
because the factor loadings are constrained to be equal across time, the
factor variances are fixed at one at the first time point and are free to be
estimated at the other time point. The factors are correlated as the
default under the oblique GEOMIN rotation. The PWITH statement
specifies that the residuals for each factor indicator are correlated over
time. The default estimator for this type of analysis is maximum
likelihood. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Example 5.1.
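Below is a sketch of the two EFA sets and the PWITH statement. The set labels t1 and t2 are hypothetical. The convention assumed here is that a shared number after the set labels (here 1) holds the factor loadings equal across the two sets. Check this convention against the MODEL command chapter before relying on it.

```
MODEL:  f1-f2 BY y1-y6 (*t1 1);    ! EFA set at the first time point
        f3-f4 BY y7-y12 (*t2 1);   ! EFA set at the second time point; shared 1 = equal loadings (assumed)
        y1-y6 PWITH y7-y12;        ! pairs y1 with y7, y2 with y8, and so on
```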
[Figure: factors f1 and f2 measured by y1-y10]
MODEL: f1-f2 by y1-y10 (*1);
The default in multiple group EFA is that the factor means are fixed to
zero in the first group and are free to be estimated in the other groups.
The bracket statement in the MODEL command specifies that the factor
means are fixed at zero in both groups.
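Combined with a GROUPING option, the MODEL command shown above might sit in an input like the following. The grouping variable g and the group labels g1 and g2 are hypothetical.

```
VARIABLE:  NAMES ARE y1-y10 g;
           GROUPING IS g (1 = g1 2 = g2);   ! hypothetical group labels
MODEL:     f1-f2 BY y1-y10 (*1);
           [f1-f2@0];                       ! factor means fixed at zero in both groups
```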
In the MODEL command, the BY statement specifies that the factors fg,
f1, and f2 are measured by the continuous factor indicators y1 through
y10. The factor fg is a general factor and f1 and f2 are specific factors.
The label 1 following an asterisk (*) in parentheses following the BY
statement is used to indicate that fg, f1, and f2 are a set of EFA factors.
The intercepts and residual variances of the factor indicators are
estimated and the residuals are not correlated as the default. The
variances of the factors are fixed at one as the default. In the OUTPUT
command, the STDY option is chosen for standardization with respect to
y. This puts the results in the metric of an EFA. The default estimator
for this type of analysis is maximum likelihood. The ESTIMATOR
option of the ANALYSIS command can be used to select a different estimator.
of the factor indicators are estimated and the residuals are not correlated
as the default. In the OUTPUT command, the STDY option is chosen
for standardization with respect to y. This puts the results in the metric
of an EFA. The default estimator for this type of analysis is maximum
likelihood. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Example 5.1.
In this example, a bi-factor CFA with two items loading on only the
general factor and cross-loadings with zero-mean and small-variance
priors is carried out using the Bayes estimator. This is a Bayesian
structural equation model (BSEM; Muthén & Asparouhov, 2012). By
specifying ESTIMATOR=BAYES, a Bayesian analysis will be carried
out. In Bayesian estimation, the default is to use two independent
Markov chain Monte Carlo (MCMC) chains. If multiple processors are
available, using PROCESSORS=2 will speed up computations.
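The ANALYSIS settings described above could be combined with labeled cross-loadings and their priors roughly as follows. This is a hypothetical sketch. The assignment of items to the specific factors, the labels xa1-xa4 and xb1-xb4, and the prior variance of 0.01 are all illustrative assumptions. Only y1 and y2 load on the general factor alone.

```
ANALYSIS:  ESTIMATOR = BAYES;
           PROCESSORS = 2;            ! lets the two default MCMC chains run in parallel
MODEL:     fg BY y1-y10*;             ! general factor; y1 and y2 load only here
           f1 BY y3-y6*               ! main loadings of specific factor f1
                 y7-y10* (xa1-xa4);   ! cross-loadings, labeled for priors
           f2 BY y7-y10*
                 y3-y6* (xb1-xb4);
           fg@1 f1@1 f2@1;            ! factor variances fixed at one
           fg WITH f1@0 f2@0;         ! orthogonal bi-factor structure
           f1 WITH f2@0;
MODEL PRIORS:
           xa1-xa4 ~ N(0, 0.01);      ! zero-mean, small-variance priors
           xb1-xb4 ~ N(0, 0.01);
```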
In the overall part of the model, labels are assigned to the factor loadings
and the intercepts using automatic labeling for groups. The labels must
include the number sign (#) followed by the underscore (_) symbol
followed by a number. The number sign (#) refers to a group and the
number refers to a parameter. The label lam#_1 is assigned to the factor
loading for y1; the label lam#_2 is assigned to the factor loading for y2;
and the label lam#_3 is assigned to the factor loading for y3. These
labels are expanded to include group information. For example, the
label for parameter 1 is expanded across the ten groups to give labels
lam1_1, lam2_1 through lam10_1. In MODEL PRIORS, these expanded
labels are used to assign zero-mean and small-variance priors to the
differences across groups of the factor loadings and intercepts using the
DO and DIFFERENCE options. They can be used together to simplify
the assignment of priors to a large set of difference parameters for
models with multiple groups and multiple time points. For the DO
option, the numbers in parentheses give the range of values for the do
loop. The number sign (#) is replaced by these values during the
execution of the do loop. The numbers refer to the six factor indicators.
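Below is a sketch of the automatic group labels and the DO loop. Several parts are assumptions: the intercept labels nu#_1 to nu#_6, the abbreviation DIFF for the DIFFERENCE option, and the prior variance of 0.01. The statements that set the metric of the factor in each group are omitted.

```
MODEL:  f BY y1* (lam#_1)     ! # is replaced by the group number
             y2  (lam#_2)
             y3  (lam#_3)
             y4  (lam#_4)
             y5  (lam#_5)
             y6  (lam#_6);
        [y1] (nu#_1);         ! hypothetical intercept labels
        [y2] (nu#_2);
        [y3] (nu#_3);
        [y4] (nu#_4);
        [y5] (nu#_5);
        [y6] (nu#_6);
MODEL PRIORS:
        DO(1,6) DIFF(lam1_#-lam10_#) ~ N(0, 0.01);   ! here # runs over the six indicators
        DO(1,6) DIFF(nu1_#-nu10_#) ~ N(0, 0.01);
```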
CHAPTER 6
EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS
In Mplus, there are two options for handling the relationship between the
outcome and time. One approach allows time scores to be parameters in
the model so that the growth function can be estimated. This is the
approach used in structural equation modeling. The second approach
allows time to be a variable that reflects individually-varying times of
observations. This variable has a random slope. This is the approach
used in multilevel modeling. Random effects in the form of random
All growth and survival models can be estimated using the following
special features:
for each graphical display can be saved in an external file for use by
another graphics program.
[Figure: growth model with growth factors i and s]
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex6.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
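For reference, a complete input for this kind of model might look as follows. Only the data file name comes from the text. The TITLE wording, the outcome names y11-y14 (borrowed from Example 6.2), and the time scores 0 to 3 for four equidistant occasions are assumptions.

```
TITLE:     hypothetical linear growth model for a continuous outcome
DATA:      FILE IS ex6.1.dat;
VARIABLE:  NAMES ARE y11-y14;
MODEL:     i s | y11@0 y12@1 y13@2 y14@3;   ! intercept i, slope s, time scores 0-3
```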
estimated and allowed to be different across time and the residuals are
not correlated as the default.
The difference between this example and Example 6.1 is that the
outcome variable is a censored variable instead of a continuous variable.
The CENSORED option is used to specify which dependent variables
are treated as censored variables in the model and its estimation, whether
they are censored from above or below, and whether a censored or
censored-inflated model will be estimated. In the example above, y11,
y12, y13, and y14 are censored variables. They represent the outcome
variable measured at four equidistant occasions. The b in parentheses
following y11-y14 indicates that y11, y12, y13, and y14 are censored
from below, that is, have floor effects, and that the model is a censored
regression model. The censoring limit is determined from the data. The
residual variances of the outcome variables are estimated and allowed to
be different across time and the residuals are not correlated as the
default.
The default estimator for this type of analysis is a robust weighted least
squares estimator. By specifying ESTIMATOR=MLR, maximum
likelihood estimation with robust standard errors using a numerical
integration algorithm is used. Note that numerical integration becomes
increasingly more computationally demanding as the number of factors
and the sample size increase. In this example, two dimensions of
integration are used with a total of 225 integration points. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator.
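The 225 integration points mentioned above are consistent with 15 points on each of the two dimensions (15 × 15 = 225). Because a full product grid places the same number of points on every dimension, the total grows exponentially as dimensions are added. The minimal sketch below assumes 15 points per dimension, as in this example; the function name is hypothetical.

```python
def total_integration_points(points_per_dimension, dimensions):
    """Number of points in a full product (Cartesian) integration grid."""
    # Each dimension gets the same number of points, so the grid size is
    # points_per_dimension multiplied by itself once per dimension.
    return points_per_dimension ** dimensions


for dims in range(1, 5):
    print(dims, total_integration_points(15, dims))
# 1 15
# 2 225
# 3 3375
# 4 50625
```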
The difference between this example and Example 6.1 is that the
outcome variable is a censored variable instead of a continuous variable.
The CENSORED option is used to specify which dependent variables
are treated as censored variables in the model and its estimation, whether
they are censored from above or below, and whether a censored or
censored-inflated model will be estimated. In the example above, y11,
y12, y13, and y14 are censored variables. They represent the outcome
variable measured at four equidistant occasions. The bi in parentheses
following y11-y14 indicates that y11, y12, y13, and y14 are censored
from below, that is, have floor effects, and that a censored-inflated
regression model will be estimated. The censoring limit is determined
from the data. The residual variances of the outcome variables are
estimated and allowed to be different across time and the residuals are
not correlated as the default.
name of the censored variable the number sign (#) followed by the
number 1.
In the parameterization of the growth model for the inflation part of the
outcome, the intercepts of the outcome variable at the four time points
are held equal as the default. The mean of the intercept growth factor is
fixed at zero. The mean of the slope growth factor and the variances of
the intercept and slope growth factors are estimated as the default, and
the growth factor covariance is estimated as the default because the
growth factors are independent (exogenous) variables.
In this example, the variance of the slope growth factor si for the
inflation part of the outcome is fixed at zero. Because of this, the
covariances among si and all of the other growth factors are fixed at zero
as the default. The covariances among the remaining three growth
factors are estimated as the default.
The difference between this example and Example 6.1 is that the
outcome variable is a binary or ordered categorical (ordinal) variable
instead of a continuous variable. The CATEGORICAL option is used to
specify which dependent variables are treated as binary or ordered
categorical (ordinal) variables in the model and its estimation. In the
example above, u11, u12, u13, and u14 are binary or ordered categorical
variables. They represent the outcome variable measured at four
equidistant occasions.
The default estimator for this type of analysis is a robust weighted least
squares estimator. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator. With the weighted
least squares estimator, the probit model and the default Delta
parameterization for categorical outcomes are used. The scale factor for
the latent response variable of the categorical outcome at the first time
point is fixed at one as the default, while the scale factors for the latent
response variables at the other time points are free to be estimated. If a
maximum likelihood estimator is used, the logistic model for categorical
The difference between this example and Example 6.4 is that the Theta
parameterization instead of the default Delta parameterization is used.
In the Delta parameterization, scale factors for the latent response
variables of the observed categorical outcomes are allowed to be
parameters in the model, but residual variances for the latent response
variables are not. In the Theta parameterization, residual variances for
latent response variables are allowed to be parameters in the model, but
scale factors are not. Because the Theta parameterization is used, the
residual variance for the latent response variable at the first time point is
fixed at one as the default, while the residual variances for the latent
response variables at the other time points are free to be estimated. An
explanation of the other commands can be found in Examples 6.1 and
6.4.
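The two parameterizations express the same latent response variable y* in different units. The sketch below assumes the usual definition of the scale factor as the inverse standard deviation of y*, where the variance of y* is the part contributed by the growth factors plus the residual variance. Under that assumption, it shows how a residual variance (Theta) implies a scale factor (Delta) at one time point and vice versa. Function names and numbers are hypothetical.

```python
import math


def scale_factor_from_residual(systematic_variance, residual_variance):
    """Scale factor (Delta) implied by a residual variance (Theta).

    Assumes the scale factor is the inverse standard deviation of y*,
    whose variance is the growth-factor part plus the residual variance.
    """
    return 1.0 / math.sqrt(systematic_variance + residual_variance)


def residual_from_scale_factor(systematic_variance, scale_factor):
    """Residual variance (Theta) implied by a scale factor (Delta)."""
    return 1.0 / scale_factor ** 2 - systematic_variance


# Hypothetical: the growth factors contribute variance 0.8 to y* at time 1,
# where the Theta parameterization fixes the residual variance at one.
delta_1 = scale_factor_from_residual(0.8, 1.0)
print(round(delta_1, 4))                                    # 0.7454
print(round(residual_from_scale_factor(0.8, delta_1), 4))  # 1.0
```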
The difference between this example and Example 6.1 is that the
outcome variable is a count variable instead of a continuous variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the
example above, u11, u12, u13, and u14 are count variables. They
represent the outcome variable measured at four equidistant occasions.
The difference between this example and Example 6.1 is that the
outcome variable is a count variable instead of a continuous variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the
example above, u11, u12, u13, and u14 are count variables. They
represent the outcome variable u1 measured at four equidistant
occasions. The i in parentheses following u11-u14 indicates that a zero-
inflated Poisson model will be estimated.
In the parameterization of the growth model for the count part of the
outcome, the intercepts of the outcome variables at the four time points
are fixed at zero as the default. The means and variances of the growth
factors are estimated as the default, and the growth factor covariance is
estimated as the default because the growth factors are independent
(exogenous) variables.
In the parameterization of the growth model for the inflation part of the
outcome, the intercepts of the outcome variable at the four time points
are held equal as the default. The mean of the intercept growth factor is
fixed at zero. The mean of the slope growth factor and the variances of
the intercept and slope growth factors are estimated as the default, and
the growth factor covariance is estimated as the default because the
growth factors are independent (exogenous) variables.
In this example, the variances of the slope growth factor s for the count
part and the slope growth factor si for the inflation part of the outcome
are fixed at zero. Because of this, the covariances among s, si, and the
other growth factors are fixed at zero as the default. The covariance
between the i and ii intercept growth factors is estimated as the default.
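A zero-inflated Poisson outcome produces zeros from two sources: the inflation part, which can only be zero, and the Poisson count part. The sketch below computes the outcome probabilities for one person at each time point, assuming a log link for the count part (log rate = i + s * t, with outcome intercepts fixed at zero) and a logit link for the inflation part (with the common intercept absorbed into ii). All values and function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def zip_probability(k, rate, inflation_prob):
    """P(u = k) under a zero-inflated Poisson model.

    rate: mean of the Poisson (count) part.
    inflation_prob: probability of belonging to the always-zero part.
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the inflation part and the Poisson part.
        return inflation_prob + (1.0 - inflation_prob) * poisson
    return (1.0 - inflation_prob) * poisson


# Hypothetical growth factor values for one person at time scores 0-3.
i, s = 0.5, 0.2     # count part: log rate = i + s * t
ii, si = -1.0, 0.1  # inflation part: logit = ii + si * t
for t in range(4):
    rate = math.exp(i + s * t)
    pi = logistic(ii + si * t)
    print(t, round(zip_probability(0, rate, pi), 3))
```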
The difference between this example and Example 6.1 is that two of the
time scores are estimated. The | statement highlighted above shows how
to specify free time scores by using the asterisk (*) to designate a free
parameter. Starting values are specified as the value following the
asterisk (*). For purposes of model identification, two time scores must
be fixed for a growth model with two growth factors. In the example
above, the first two time scores are fixed at zero and one, respectively.
The third and fourth time scores are free to be estimated at starting
values of 2 and 3, respectively. The default estimator for this type of
analysis is maximum likelihood. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator. An
explanation of the other commands can be found in Example 6.1.
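Assuming, as in the linear growth model of Example 6.1, that the outcome intercepts are fixed at zero, the model-implied mean at each occasion is the intercept factor mean plus the slope factor mean times that occasion's time score. Fixing two time scores sets where the slope scale starts (0) and what one unit means (1), so the estimated scores are read in those units. The sketch below uses hypothetical estimates for the two free time scores; the function name is also hypothetical.

```python
def implied_means(intercept_mean, slope_mean, time_scores):
    # With outcome intercepts fixed at zero, the mean at occasion t is
    # E(i) + E(s) * time_score_t.
    return [intercept_mean + slope_mean * t for t in time_scores]


# Time scores 0 and 1 are fixed; hypothetical estimates 2.5 and 4.0 for the
# last two occasions would indicate faster-than-linear change.
print(implied_means(10.0, 1.5, [0.0, 1.0, 2.5, 4.0]))
# [10.0, 11.5, 13.75, 16.0]
```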
The difference between this example and Example 6.1 is that the
quadratic growth model shown in the picture above is estimated. A
quadratic growth model requires three random effects: an intercept
factor (i), a linear slope factor (s), and a quadratic slope factor (q). The |
symbol is used to name and define the intercept and slope factors in the
growth model. The names i, s, and q on the left-hand side of the |
symbol are the names of the intercept, linear slope, and quadratic slope
factors, respectively. In the example above, the linear slope factor has
equidistant time scores of 0, 1, 2, and 3. The time scores for the
quadratic slope factor are the squared values of the linear time scores.
These time scores are automatically computed by the program.
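The automatic computation amounts to squaring each linear time score. The minimal sketch below (the function name is hypothetical) builds the loadings of all three growth factors from the linear time scores used in this example.

```python
def growth_time_scores(linear_scores):
    # Intercept loadings are fixed at one; the quadratic factor's time
    # scores are the squares of the linear ones.
    return {
        "i": [1] * len(linear_scores),
        "s": list(linear_scores),
        "q": [t * t for t in linear_scores],
    }


print(growth_time_scores([0, 1, 2, 3]))
# {'i': [1, 1, 1, 1], 's': [0, 1, 2, 3], 'q': [0, 1, 4, 9]}
```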
The difference between this example and Example 6.1 is that time-
invariant and time-varying covariates as shown in the picture above are
included in the model.
In this example, the piecewise growth model shown in the picture above
is estimated. In a piecewise growth model, different phases of
development are captured by more than one slope growth factor. The
first | statement specifies a linear growth model for the first phase of
development which includes the first three time points. The second |
statement specifies a linear growth model for the second phase of
development which includes the last three time points. Note that there is
The TSCORES option is used to identify the variables in the data set that
contain information about individually-varying times of observation for
the outcomes. The TYPE option is used to describe the type of analysis
that is to be performed. By selecting RANDOM, a growth model with
random slopes will be estimated.
the outcome variable. Two growth factors are used in the model, a
random intercept, i, and a random slope, s.
The second, third, fourth, and fifth | statements use the ON option to
name and define the random slope variables in the model. The name on
the left-hand side of the | symbol names the random slope variable. The
statement on the right-hand side of the | symbol defines the random slope
variable. In the second | statement, the random slope st is defined by the
linear regression of the dependent variable y1 on the time-varying
covariate a21. In the third | statement, the random slope st is defined by
the linear regression of the dependent variable y2 on the time-varying
covariate a22. In the fourth | statement, the random slope st is defined
by the linear regression of the dependent variable y3 on the time-varying
covariate a23. In the fifth | statement, the random slope st is defined by
the linear regression of the dependent variable y4 on the time-varying
covariate a24. Random slopes with the same name are treated as one
variable during model estimation. The ON statement describes the linear
regressions of the intercept growth factor i, the slope growth factor s,
and the random slope st on the covariate x. The intercepts and residual
variances of i, s, and st are free as the default. The residual covariance
between i and s is estimated as the default. The residual covariances
between st and i and s are fixed at zero as the default. The default
estimator for this type of analysis is maximum likelihood with robust
standard errors. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Example 6.1.
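For one person, the statements above imply y_t = i + s * time_t + st * a2t + e_t, where time_t is that person's individually-varying time of observation. The sketch below computes the expected outcomes with residuals set to zero, assuming the first | statement uses the TSCORES variables as time scores. A single st value is used at all four occasions because random slopes with the same name are treated as one variable. The function name and values are hypothetical.

```python
def expected_outcomes(i, s, st, times, tv_covariate):
    """Expected y at each occasion for one person, residuals set to zero.

    times: the person's individually-varying times of observation.
    tv_covariate: the time-varying covariate values (a21-a24 above).
    """
    return [i + s * t + st * a for t, a in zip(times, tv_covariate)]


# Hypothetical person measured at slightly irregular times.
print(expected_outcomes(1.0, 0.5, 2.0, [0.0, 1.2, 2.1, 3.4], [0.0, 1.0, 0.0, 1.0]))
```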
In this example, the model for two parallel processes shown in the
picture above is estimated. Regressions among the growth factors are
included in the model.
The | statements are used to name and define the intercept and slope
growth factors for the two linear growth models. The names i1 and s1
on the left-hand side of the first | statement are the names of the intercept
and slope growth factors for the first linear growth model. The names i2
and s2 on the left-hand side of the second | statement are the names of
the intercept and slope growth factors for the second linear growth
model. The values on the right-hand side of the two | statements are the
time scores for the two slope growth factors. For both growth models,
the time scores of the slope growth factors are fixed at 0, 1, 2, and 3 to
define a linear growth model with equidistant time points. The zero time
score for the slope growth factor at time point one defines the intercept
factors as initial status factors. The coefficients of the intercept growth
factors are fixed at one as part of the growth model parameterization.
The residual variances of the outcome variables are estimated and
allowed to be different across time, and the residuals are not correlated
as the default.
y33. The metric of the three factors is set automatically by the program
by fixing the first factor loading in each BY statement to one. This
option can be overridden. The residual variances of the factor indicators
are estimated and the residuals are not correlated as the default.
The | statement is used to name and define the intercept and slope factors
in the growth model. The names i and s on the left-hand side of the | are
the names of the intercept and slope growth factors, respectively. The
values on the right-hand side of the | are the time scores for the slope
growth factor. The time scores of the slope growth factor are fixed at 0,
1, and 2 to define a linear growth model with equidistant time points.
The zero time score for the slope growth factor at time point one defines
the intercept growth factor as an initial status factor. The coefficients of
the intercept growth factor are fixed at one as part of the growth model
parameterization. The residual variances of the factors f1, f2, and f3 are
estimated and allowed to be different across time, and the residuals are
not correlated as the default.
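MODEL:  f1 BY u11
           u21-u31 (1-2);          ! loadings labeled 1 and 2
        f2 BY u12
           u22-u32 (1-2);          ! same labels: loadings equal over time
        f3 BY u13
           u23-u33 (1-2);
        [u11$1 u12$1 u13$1] (3);   ! thresholds held equal over time
        [u21$1 u22$1 u23$1] (4);
        [u31$1 u32$1 u33$1] (5);
        {u11-u31@1 u12-u33};       ! scale factors: one at time 1, free later
        i s | f1@0 f2@1 f3@2;      ! linear growth model for the factors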
The difference between this example and Example 6.14 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. The CATEGORICAL option is used to specify
which dependent variables are treated as binary or ordered categorical
(ordinal) variables in the model and its estimation. In the example
above, all of the factor indicators are categorical variables. The program
determines the number of categories for each indicator.
1. If the value of the original variable is missing, both the new binary
and the new continuous variable values are missing.
2. If the value of the original variable is greater than the cutpoint value,
the new binary variable value is one and the new continuous variable
value is the log of the original variable as the default.
3. If the value of the original variable is less than or equal to the
cutpoint value, the new binary variable value is zero and the new
continuous variable value is missing.
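The three rules can be written as a small function. This is a minimal sketch, not the program's own code: the function name is hypothetical, None stands for a missing value, and the cutpoint of 0 is an assumed example value.

```python
import math


def two_part(value, cutpoint=0.0):
    """Split one observation into (binary, continuous) parts."""
    if value is None:
        return None, None            # rule 1: both parts missing
    if value > cutpoint:
        return 1, math.log(value)    # rule 2: binary 1, continuous log(value)
    return 0, None                   # rule 3: binary 0, continuous missing


print([two_part(v) for v in [None, 0.0, 1.0, 5.0]])
```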
The first | statement specifies a linear growth model for the binary
outcome. The second | statement specifies a linear growth model for the
continuous outcome. In the parameterization of the growth model for
the binary outcome, the thresholds of the outcome variable at the four
time points are held equal as the default. The mean of the intercept
growth factor is fixed at zero. The mean of the slope growth factor and
the variances of the intercept and slope growth factors are estimated as
the default. In this example, the variance of the slope growth factor is
fixed at zero for simplicity. In the parameterization of the growth model
for the continuous outcome, the intercepts of the outcome variables at
the four time points are fixed at zero as the default. The means and
variances of the growth factors are estimated as the default, and the
growth factors are correlated as the default because they are independent
(exogenous) variables.
It is often the case that not all growth factor covariances are significant
in two-part growth modeling. Fixing the nonsignificant ones at zero stabilizes the
estimation. This is why the growth factor covariance between iu and sy
is fixed at zero. The OUTPUT command is used to request additional
output not included as the default. The TECH1 option is used to request
the arrays containing parameter specifications and starting values for all
free parameters in the model. The TECH8 option is used to request that
the optimization history in estimating the model be printed in the output.
TECH8 is printed to the screen during the computations as the default.
TECH8 screen printing is useful for determining how long the analysis
takes. An explanation of the other commands can be found in Example
6.1.
The difference between this example and Example 6.1 is that first-order
autocorrelated residuals have been added to the model. In a model with
first-order autocorrelated residuals, one residual variance parameter and one
residual auto-correlation parameter are estimated.
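One common structure that uses exactly one residual variance and one autocorrelation parameter is AR(1). In this structure, the residual covariance between occasions t and u equals the variance times the autocorrelation raised to the power |t - u|. The sketch below assumes that structure, with hypothetical values and function name, and builds the implied residual covariance matrix for four equidistant occasions.

```python
def ar1_residual_covariance(variance, autocorrelation, n_times):
    # Cov(e_t, e_u) = variance * autocorrelation ** |t - u|, so two
    # parameters determine every element of the matrix.
    return [[variance * autocorrelation ** abs(t - u) for u in range(n_times)]
            for t in range(n_times)]


for row in ar1_residual_covariance(2.0, 0.5, 4):
    print(row)
# [2.0, 1.0, 0.5, 0.25]
# [1.0, 2.0, 1.0, 0.5]
# [0.5, 1.0, 2.0, 1.0]
# [0.25, 0.5, 1.0, 2.0]
```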
In this example, the multiple group multiple cohort growth model shown
in the picture above is estimated. Longitudinal research studies often
collect data on several different groups of individuals defined by their
birth year or cohort. This allows the study of development over a wider
age range than the length of the study and is referred to as an accelerated
or sequential cohort design. The interest in these studies is the
development of an outcome over age, not measurement occasion. This
can be handled by rearranging the data so that age is the time axis using
the DATA COHORT command or using a multiple group approach as
described in this example. The advantage of the multiple group
approach is that it can be used to test assumptions of invariance of
growth parameters across cohorts.
In the multiple group approach, the variables in the data set represent the
measurement occasions. In this example, there are four measurement
occasions: 2000, 2002, 2004, and 2006. Therefore, there are four
variables to represent the outcome. In this example, there are three
cohorts with birth years 1988, 1989, and 1990. It is the combination of
the time of measurement and birth year that determines the ages
represented in the data. This is shown in the table below where rows
represent cohort and columns represent measurement occasion. The
entries in the table represent the ages. In this example, ages 10 to 18 are
represented.
The model that is estimated uses the time axis of age as shown in the
table below where rows represent cohort and columns represent age.
The entries for the first three rows in the table are the years of the
measurement occasions. The entries for the last row are the time scores
for a linear model.
Cohort/Age   10    11    12    13    14    15    16    17    18
1988                     2000        2002        2004        2006
1989               2000        2002        2004        2006
1990         2000        2002        2004        2006
Time score   0     .1    .2    .3    .4    .5    .6    .7    .8
As shown in the table, three ages are represented by more than one
cohort. Age 12 is represented by cohorts 1988 and 1990 measured in
2000 and 2002; age 14 is represented by cohorts 1988 and 1990
measured in 2002 and 2004; and age 16 is represented by cohorts 1988
and 1990 measured in 2004 and 2006. This information is needed to
constrain parameters to be equal in the multiple group model.
The table also provides information about the time scores for each
cohort. The time scores are obtained as the difference between the age at
a measurement occasion and age 10, the youngest age represented, divided
by ten. The division is used to avoid large time scores, which can lead to
convergence problems. Cohort 1990
provides information for ages 10, 12, 14, and 16. The time scores for
cohort 1990 are 0, .2, .4, and .6. Cohort 1989 provides information for
ages 11, 13, 15, and 17. The time scores for cohort 1989 are .1, .3, .5,
and .7. Cohort 1988 provides information for ages 12, 14, 16, and 18.
The time scores for cohort 1988 are .2, .4, .6, and .8.
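The age and time-score bookkeeping can be checked mechanically. The sketch below recomputes each cohort's time scores as (age - 10) / 10 and lists the ages observed in more than one cohort, which are the ages where parameters can be constrained to be equal across groups. The constant and function names are hypothetical.

```python
MEASUREMENT_YEARS = [2000, 2002, 2004, 2006]
COHORTS = [1990, 1989, 1988]
YOUNGEST_AGE = 10


def time_scores(birth_year):
    # Age at each occasion, re-centered at age 10 and divided by ten.
    return [round((year - birth_year - YOUNGEST_AGE) / 10, 1)
            for year in MEASUREMENT_YEARS]


def shared_ages():
    # Map each age to the (cohort, year) pairs that observe it and keep
    # only the ages seen in more than one cohort.
    seen = {}
    for cohort in COHORTS:
        for year in MEASUREMENT_YEARS:
            seen.setdefault(year - cohort, []).append((cohort, year))
    return {age: obs for age, obs in sorted(seen.items()) if len(obs) > 1}


for cohort in COHORTS:
    print(cohort, time_scores(cohort))
print(shared_ages())
```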
The GROUPING option is used to identify the variable in the data set
that contains information on group membership when the data for all
groups are stored in a single data set. The information in parentheses
after the grouping variable name assigns labels to the values of the
grouping variable found in the data set. In the example above,
individuals with g equal to 1 will be assigned the label 1990,
individuals with g equal to 2 will be assigned the label 1989, and
individuals with g equal to 3 will be assigned the label 1988. These
labels are used in conjunction with the MODEL command to specify
model statements specific to each group.
occurred, and a missing value flag means that the event has occurred in a
preceding time period or that the individual has dropped out of the study
(Muthén & Masyn, 2005). The factor f is used to specify a proportional
odds assumption for the hazards of the event.
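In discrete-time survival analysis, each binary indicator records whether the event occurs in that period, given that it has not occurred before, so its probability is the hazard for that period. A proportional odds assumption means a covariate shifts the logit of the hazard by the same amount in every period, so the odds ratio is constant over time. The generic sketch below illustrates that property. It is not the Mplus parameterization, which works with thresholds; here a single term, effect * x, plays the role the text assigns to the factor f. All values and names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def hazards(baseline_logits, effect, x):
    # Proportional odds: logit h_t(x) = baseline_logit_t + effect * x.
    return [logistic(a + effect * x) for a in baseline_logits]


def survival(hazard_list):
    # Probability of not having experienced the event by the end of each period.
    s, out = 1.0, []
    for h in hazard_list:
        s *= 1.0 - h
        out.append(s)
    return out


def odds(p):
    return p / (1.0 - p)


baseline = [-2.0, -1.5, -1.0, -0.5]   # hypothetical period-specific logits
h0 = hazards(baseline, 0.7, x=0)
h1 = hazards(baseline, 0.7, x=1)
print([round(odds(b) / odds(a), 4) for a, b in zip(h0, h1)])  # same ratio each period
print([round(v, 3) for v in survival(h1)])
```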
Examples: Mixture Modeling With Cross-Sectional Data
CHAPTER 7
EXAMPLES: MIXTURE MODELING WITH CROSS-SECTIONAL DATA
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex7.1.dat. Because the data set is
in free format, which is the default, a FORMAT statement is not required.
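MODEL:
%OVERALL%
y ON x1 x2;   ! regression of y on the covariates
c ON x1;      ! logistic regression of class membership on x1
%c#2%
y ON x2;      ! class-specific slope of y on x2 in class 2
y;            ! class-specific residual variance of y in class 2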
In the model for class 2, the ON statement describes the linear regression
of y on the covariate x2. This specification relaxes the default equality
c#1 ON x1;
where c#1 refers to the first class of c. The classes of a categorical latent
variable are referred to by adding to the name of the categorical latent
variable the number sign (#) followed by the number of the class. This
alternative specification allows individual parameters to be referred to in
the MODEL command for the purpose of giving starting values or
placing restrictions.
The difference between this example and Example 7.1 is that the
dependent variable is a count variable instead of a continuous variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the
example above, u is a count variable. The i in parentheses following u
indicates that a zero-inflated Poisson model will be estimated.
In this example, the latent class analysis (LCA) model with binary latent
class indicators shown in the picture above is estimated using automatic
starting values and random starts. Because c is a categorical latent
variable, the interpretation of the picture is not the same as for models
with continuous latent variables. The arrows from c to the latent class
indicators u1, u2, u3, and u4 indicate that the thresholds of the latent
class indicators vary across the classes of c. This implies that the
probabilities of the latent class indicators vary across the classes of c.
The arrows correspond to the regressions of the latent class indicators on
a set of dummy variables representing the categories of c.
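Under these regressions, the model for the binary indicators is a finite mixture. The probability of a response pattern is the sum over classes of the class probability times the product of the class-specific item probabilities, because the indicators are independent within class. The sketch below, with hypothetical parameter values and function names, computes pattern probabilities and the posterior class probabilities used to classify individuals.

```python
import itertools
import math


def class_joint_probs(pattern, class_probs, item_probs):
    """P(class k and u = pattern) for each class k (binary indicators)."""
    # Within a class the indicators are independent: a product of
    # Bernoulli terms, weighted by the class probability.
    return [
        pk * math.prod(p if u == 1 else 1.0 - p for u, p in zip(pattern, probs))
        for pk, probs in zip(class_probs, item_probs)
    ]


def pattern_probability(pattern, class_probs, item_probs):
    return sum(class_joint_probs(pattern, class_probs, item_probs))


def posterior_class_probs(pattern, class_probs, item_probs):
    joint = class_joint_probs(pattern, class_probs, item_probs)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class solution for u1-u4.
class_probs = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.7, 0.9],   # P(u_j = 1 | class 1)
              [0.2, 0.1, 0.3, 0.2]]   # P(u_j = 1 | class 2)

patterns = list(itertools.product([0, 1], repeat=4))
print(round(sum(pattern_probability(p, class_probs, item_probs) for p in patterns), 10))
print([round(q, 3) for q in posterior_class_probs((1, 1, 1, 1), class_probs, item_probs)])
```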
The differences between this example and Example 7.3 are that user-
specified starting values are used instead of automatic starting values
and there are no random starts. By specifying STARTS=0 in the
ANALYSIS command, random starts are turned off.
In the MODEL command, user-specified starting values are given for the
thresholds of the binary latent class indicators. For binary and ordered
categorical dependent variables, thresholds are referred to by adding to a
variable name a dollar sign ($) followed by a threshold number. The
number of thresholds is equal to the number of categories minus one.
Because the latent class indicators are binary, they have one threshold.
The thresholds of the latent class indicators are referred to as u1$1,
u2$1, u3$1, and u4$1. Square brackets are used to specify starting
values in the logit scale for the thresholds of the binary latent class
indicators. The asterisk (*) is used to assign a starting value. It is placed
after a variable with the starting value following it. In the example
above, the threshold of u1 is assigned the starting value of 1 for class 1
and -1 for class 2. The threshold of u4 is assigned the starting value of
-1 for class 1 and 1 for class 2. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 7.1 and 7.3.
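On the logit scale, each threshold corresponds to a probability. The minimal sketch below assumes the convention P(u = 1) = 1 / (1 + exp(threshold)) for a binary indicator, so a higher threshold means a lower probability of scoring one. It translates the starting values for u1 in this example; the function name is hypothetical.

```python
import math


def prob_one(threshold):
    # Logit threshold: P(u = 1) = 1 / (1 + exp(threshold)).
    return 1.0 / (1.0 + math.exp(threshold))


# Starting values for u1: threshold 1 in class 1, -1 in class 2.
print(round(prob_one(1.0), 3), round(prob_one(-1.0), 3))  # 0.269 0.731
```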
The difference between this example and Example 7.4 is that random
starts are used. In this example, the random perturbations are based on
user-specified starting values. The STARTS option is used to specify
the number of initial stage random sets of starting values to generate and
the number of final stage optimizations to use. The default is 20 random
sets of starting values for the initial stage and 4 optimizations for the
final stage. In the example above, the STARTS option specifies that 100
random sets of starting values for the initial stage and 10 final stage
optimizations will be used. The STITERATIONS option is used to
specify the maximum number of iterations allowed in the initial stage.
In this example, 20 iterations are allowed in the initial stage instead of
the default of 10. The default estimator for this type of analysis is
maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Examples 7.1, 7.3, and 7.4.
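The two-stage strategy can be sketched generically. The code below is a toy illustration, not the Mplus algorithm. It minimizes a hypothetical one-parameter function with two local minima by plain gradient descent and perturbs a user-specified starting value with random normal draws. Each candidate runs for a few iterations (the analog of STITERATIONS), the best candidates are kept (the analog of the second STARTS number), and only those are run to convergence. All names and settings are hypothetical.

```python
import random


def objective(x):
    # Hypothetical fit function to minimize, with a local minimum near
    # x = 0.96 and the global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def optimize(x, iterations, step=0.01):
    # Plain gradient descent stands in for the real optimization steps.
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def two_stage_starts(user_start, n_initial=100, n_final=10,
                     initial_iterations=20, final_iterations=2000, seed=0):
    rng = random.Random(seed)
    # Initial stage: perturb the user-specified start and iterate briefly.
    initial = [optimize(user_start + rng.gauss(0.0, 1.0), initial_iterations)
               for _ in range(n_initial)]
    # Keep the n_final candidates with the lowest objective values.
    kept = sorted(initial, key=objective)[:n_final]
    # Final stage: optimize the kept candidates fully and return the best.
    finals = [optimize(x, final_iterations) for x in kept]
    return min(finals, key=objective)


best = two_stage_starts(user_start=0.5)
print(round(best, 3), round(objective(best), 3))
```

A single run from 0.5 without perturbation would stop at the local minimum near 0.96; the random starts are what allow the global minimum to be found.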
The difference between this example and Example 7.4 is that the latent
class indicators are ordered categorical (ordinal) variables with three
categories instead of binary variables. When latent class indicators are
ordered categorical variables, each latent class indicator has more than
one threshold. The number of thresholds is equal to the number of
categories minus one. When user-specified starting values are used, they
must be specified for all thresholds and they must be in increasing order
for each variable within each class. For example, in class 1 the threshold
starting values for latent class indicator u1 are .5 for the first threshold
and 1 for the second threshold. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 7.1, 7.3, and 7.4.
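The requirement of increasing order follows from how thresholds define category probabilities. Assuming the cumulative logit convention P(u <= k) = 1 / (1 + exp(-threshold_k)), which matches the binary case, each category's probability is the difference between successive cumulative probabilities. That difference is positive only if the thresholds increase. The sketch below uses the class 1 starting values for u1; the function name is hypothetical.

```python
import math


def category_probabilities(thresholds):
    """Category probabilities for an ordinal indicator from logit thresholds."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(u <= k), closed off by 1 for the top category.
    cumulative = [1.0 / (1.0 + math.exp(-t)) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Class 1 starting values for u1: .5 and 1.
print([round(p, 3) for p in category_probabilities([0.5, 1.0])])  # [0.622, 0.109, 0.269]
```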
The difference between this example and Example 7.3 is that the latent
class indicators are unordered categorical (nominal) variables instead of
binary variables. The NOMINAL option is used to specify which
dependent variables are treated as unordered categorical (nominal)
variables in the model and its estimation. In the example above, u1, u2,
u3, and u4 are three-category unordered variables. The categories of an
unordered categorical variable are referred to by adding to the name of
the unordered categorical variable the number sign (#) followed by the
number of the category. The default estimator for this type of analysis is
maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different estimator.
The difference between this example and Example 7.7 is that user-
specified starting values are used instead of automatic starting values.
Means are referred to by using bracket statements. The categories of an
unordered categorical variable are referred to by adding to the name of
the unordered categorical variable the number sign (#) followed by the
number of the category. In this example, u1#1 refers to the first category
of u1 and u1#2 refers to the second category of u1. Starting values of 0
and 1 are given for the means in class 1 and starting values of -1 are
given for the means in class 2. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 7.1, 7.3, and 7.7.
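The means of a nominal indicator act as multinomial logits relative to a reference category. The sketch below assumes that the last category is the reference, with its mean fixed at zero. Under that assumption, it shows the category probabilities implied by the starting values in this example; the function name is hypothetical.

```python
import math


def nominal_probabilities(means):
    """Category probabilities for a nominal indicator (multinomial logit).

    means: logit means for every category except the last, which is the
    reference category with its mean fixed at zero.
    """
    logits = list(means) + [0.0]
    denom = sum(math.exp(a) for a in logits)
    return [math.exp(a) / denom for a in logits]


print([round(p, 3) for p in nominal_probabilities([0.0, 1.0])])    # class 1
print([round(p, 3) for p in nominal_probabilities([-1.0, -1.0])])  # class 2
```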
The difference between this example and Example 7.3 is that the latent
class indicators are continuous variables instead of binary variables.
When there is no specification in the VARIABLE command regarding
the scale of the dependent variables, it is assumed that they are
continuous. Latent class analysis with continuous latent class indicators
is often referred to as latent profile analysis.
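With continuous indicators, each class contributes a product of normal densities, and the mixture density weights these products by the class probabilities. The sketch below assumes class-specific means with variances shared across classes; this restriction is an assumption of the sketch. Names and values are hypothetical.

```python
import math


def normal_pdf(y, mean, variance):
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def lpa_density(y, class_probs, class_means, variances):
    """Mixture density of a continuous indicator vector y.

    Indicators are independent within class; means vary across classes and
    the variances are shared across classes.
    """
    total = 0.0
    for pk, means in zip(class_probs, class_means):
        within = 1.0
        for value, m, v in zip(y, means, variances):
            within *= normal_pdf(value, m, v)
        total += pk * within
    return total


# Hypothetical two-class profile for y1-y4.
print(lpa_density([1.0, 2.0, 0.5, 1.5], [0.5, 0.5],
                  [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]], [1.0, 1.0, 1.0, 1.0]))
```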
The difference between this example and Example 7.4 is that the latent
class indicators are continuous variables instead of binary variables. As
a result, starting values are given for means instead of thresholds.
The means and variances of the latent class indicators and the mean of
the categorical latent variable are estimated as the default. In the models
for class 1 and class 2, by mentioning the variances of the latent class
The difference between this example and Example 7.4 is that the latent
class indicators are a combination of binary, censored, unordered
categorical (nominal) and count variables instead of binary variables.
in the model and its estimation. In the example above, the latent class
indicator u1 is a binary variable. The CENSORED option is used to
specify which dependent variables are treated as censored variables in
the model and its estimation, whether they are censored from above or
below, and whether a censored or censored-inflated model will be
estimated. In the example above, y1 is a censored variable. The b in
parentheses following y1 indicates that y1 is censored from below, that
is, has a floor effect, and that the model is a censored regression model.
The censoring limit is determined from the data. The NOMINAL option
is used to specify which dependent variables are treated as unordered
categorical (nominal) variables in the model and its estimation. In the
example above, u2 is a three-category unordered variable. The program
determines the number of categories. The categories of an unordered
categorical variable are referred to by adding to the name of the
unordered categorical variable the number sign (#) followed by the
number of the category. In this example, u2#1 refers to the first category
of u2 and u2#2 refers to the second category of u2. The COUNT option
is used to specify which dependent variables are treated as count
variables in the model and its estimation and whether a Poisson or zero-
inflated Poisson model will be estimated. In the example above, u3 is a
count variable. The i in parentheses following u3 indicates that a zero-
inflated model will be estimated. The inflation part of the count variable
is referred to by adding to the name of the count variable the number
sign (#) followed by the number 1. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 7.1 and 7.4.
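To make the two parts of the zero-inflated Poisson model concrete, the Python sketch below computes count probabilities outside Mplus. It assumes that the inflation part (u3#1) is a logit for the probability of a structural zero and that the count part is Poisson with mean rate; the function name and the parameter values are hypothetical.

```python
import math


def zip_probability(k: int, inflation_logit: float, rate: float) -> float:
    """P(u = k) under a zero-inflated Poisson model (illustrative sketch).

    inflation_logit: assumed logit of being a structural zero (the u3#1 part).
    rate: mean of the Poisson count part.
    """
    pi = 1.0 / (1.0 + math.exp(-inflation_logit))  # probability of a structural zero
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # Zeros can come from the inflation part or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical values: inflation logit 0 (pi = .5) and Poisson mean 2.
probs = [zip_probability(k, 0.0, 2.0) for k in range(50)]
print(round(probs[0], 4))  # 0.5677
```

A zero count is far more likely here than under a plain Poisson model with the same mean (exp(-2), about .135); this excess of zeros is what the inflation part is meant to capture.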
[Figure: model diagram with x, c, and u1-u4]
The difference between this example and Example 7.3 is that the model
contains a covariate and a direct effect. The first ON statement
MODEL cu:
%cu#1%
[u1$1-u4$1];
%cu#2%
[u1$1-u4$1];
MODEL cy:
%cy#1%
[y1-y4];
%cy#2%
[y1-y4];
%cy#3%
[y1-y4];
OUTPUT: TECH1 TECH8;
[Figure: model diagram with cu, cy, u1-u4, and y1-y4]
where cu#1 refers to the first class of cu, cy#1 refers to the first class of
cy, and cy#2 refers to the second class of cy. The classes of a
categorical latent variable are referred to by adding to the name of the
categorical latent variable the number sign (#) followed by the number
of the class. This alternative specification allows individual parameters
to be referred to in the MODEL command for the purpose of giving
starting values or placing restrictions.
between the first two variables being zero for each of the two levels of
the third variable.
[Figure: model diagram with c, f, and u1-u4]
[Figure: mixture CFA model diagram with c, f, and y1-y5]
In this example, the mixture CFA model shown in the picture above is
estimated (Muthén, 2008). The mean of the factor f varies across the
classes of the categorical latent variable c. The residual arrow pointing
to f indicates that the factor varies within class. Because f is normally
distributed within each class but its mean differs across classes, the
overall distribution of f is a mixture of normal distributions and can
therefore be non-normal. Other parameters of the CFA model can also be
allowed to vary across classes.
The BY statement specifies that f is measured by y1, y2, y3, y4, and y5.
The factor mean varies across the classes. All other model parameters
are held equal across classes as the default. The default estimator for
this type of analysis is maximum likelihood with robust standard errors.
The ESTIMATOR option of the ANALYSIS command can be used to
select a different estimator. An explanation of the other commands can
be found in Example 7.1.
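The claim that f can be non-normal can be checked numerically. The Python sketch below (an illustration, not Mplus output) assumes that f is normal within each class with a common within-class variance and class-specific means, and computes the overall mean, variance, and third central moment; the class proportions and means are hypothetical. A nonzero third central moment means the overall distribution is skewed and therefore not normal.

```python
def mixture_moments(weights, means, within_variance):
    """Mean, variance, and third central moment of a normal mixture whose
    components share one variance (illustrative sketch)."""
    total_mean = sum(w * m for w, m in zip(weights, means))
    deviations = [m - total_mean for m in means]
    # Law of total variance: within-class variance plus variance of class means.
    between = sum(w * d ** 2 for w, d in zip(weights, deviations))
    # Each normal component is symmetric and sum(w * d) = 0, so only the
    # spread of the class means contributes to the third central moment.
    third = sum(w * d ** 3 for w, d in zip(weights, deviations))
    return total_mean, within_variance + between, third


mean, variance, third = mixture_moments([0.3, 0.7], [0.0, 2.0], 1.0)
print(mean, variance, third)  # approximately 1.4, 1.84, -0.672
```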
[Figure: model diagram with c1 and c2]
In the overall model, the BY statement names the second order factor f.
The ON statement specifies that f influences both categorical latent
variables equally by imposing an equality constraint on the
two multinomial logistic regression coefficients. The slope in the
multinomial regression of c on f reflects the strength of association
[Figure: model diagram with f, c, and u1-u8]
In this example, the model with both a continuous and categorical latent
variable shown in the picture above is estimated. The categorical latent
variable c is regressed on the continuous latent variable f in a
multinomial logistic regression.
[Figure: model diagram with f1, f2, and y1-y6]
[Figure: multiple group mixture model diagram with cg, c, and y1-y4]
In this example, the multiple group mixture model shown in the picture
above is estimated. The groups are represented by the classes of the
categorical latent variable cg, which has known class (group)
membership.
[Figure: model diagram with y1-y4]
y3, and y4. The default estimator for this type of analysis is maximum
likelihood with robust standard errors. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator. An
explanation of the other commands can be found in Example 7.1.
[Figure: CACE model diagram with x1 and x2]
In this example, the mixture model for randomized trials using CACE
(Complier-Average Causal Effect) estimation with training data shown
in the picture above is estimated (Little & Yau, 1998). The continuous
dependent variable y is regressed on the covariate x1 and the treatment
dummy variable x2. The categorical latent variable c is compliance
status, with class 1 referring to non-compliers and class 2 referring to
compliers. Compliance status is observed in the treatment group and
unobserved in the control group. Because c is a categorical latent
variable, the interpretation of the picture is not the same as for models
with continuous latent variables. The arrow from c to the y variable
indicates that the intercept of y varies across the classes of c. The arrow
from c to the arrow from x2 to y indicates that the slope in the regression
of y on x2 varies across the classes of c. The arrow from x1 to c
represents the multinomial logistic regression of c on x1.
In the model for class 1, a starting value of zero is given for the intercept
of y as the default. The residual variance of y is specified to relax the
default across-class equality constraint. The ON statement describes the
linear regression of y on x2 where the slope is fixed at zero. This is
done because non-compliers do not receive treatment. In the model for
class 2, a starting value of .5 is given for the intercept of y. The residual
variance of y is specified to relax the default across-class equality
constraint. The regression of y ON x2, which represents the CACE
treatment effect, is not fixed at zero for class 2. The default estimator
for this type of analysis is maximum likelihood with robust standard
errors. The ESTIMATOR option of the ANALYSIS command can be
used to select a different estimator. An explanation of the other
commands can be found in Example 7.1.
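Because the slope of y on x2 is fixed at zero for non-compliers and free for compliers, the model implies a simple link between the CACE and the overall treatment-control difference. The Python sketch below illustrates this under stated assumptions: x2 is randomized and therefore unrelated to class membership, the logistic regression of c on x1 uses class 2 as the reference class, and all parameter values are hypothetical. The class-specific intercepts of y and the effect of x1 on y cancel in the difference, so at a given x1 the expected difference equals the complier proportion times the CACE.

```python
import math


def itt_difference(x1: float, a: float, b: float, cace: float) -> float:
    """Expected treatment-minus-control difference in y at covariate value x1.

    a, b: hypothetical intercept and slope for the logit of class 1
          (non-compliers) versus class 2 (compliers, the reference class).
    cace: slope of y on x2 in class 2; the class 1 slope is fixed at zero.
    """
    p_noncomplier = 1.0 / (1.0 + math.exp(-(a + b * x1)))
    p_complier = 1.0 - p_noncomplier
    # Non-compliers contribute a treatment effect of zero.
    return p_noncomplier * 0.0 + p_complier * cace


print(itt_difference(0.0, 0.0, 0.0, 0.8))  # 0.4
```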
MODEL:
%OVERALL%
y ON x1 x2;
c ON x1;
%c#1%
[u$1@15];
[y];
y;
y ON x2@0;
%c#2%
[u$1@-15];
[y*.5];
y;
OUTPUT: TECH1 TECH8;
[Figure: CACE model diagram with u, y, x1, and x2]
The difference between this example and Example 7.23 is that a binary
latent class indicator u has been added to the model. This binary
variable represents observed compliance status. Treatment compliers
have a value of 1 on this variable; treatment non-compliers have a value
of 0 on this variable; and individuals in the control group have a missing
value on this variable. The latent class indicator u is used instead of
training data.
In the model for class 1, the threshold of the latent class indicator
variable u is set to a logit value of 15. In the model for class 2, the
threshold of the latent class indicator variable u is set to a logit value of
–15. These logit values reflect that c is perfectly measured by u.
Individuals in the non-complier class (class 1) have probability zero of
observed compliance and individuals in the complier class (class 2) have
probability one of observed compliance. The default estimator for this
type of analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 7.1 and 7.23.
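The size of these fixed thresholds can be checked with a short calculation. The Python sketch below assumes the logit parameterization used in this chapter, in which a binary u with threshold tau has P(u = 1) = 1 / (1 + exp(tau)); it is a numerical illustration, not Mplus input.

```python
import math


def prob_u_equals_one(threshold: float) -> float:
    """P(u = 1) for a binary indicator with logit threshold tau."""
    return 1.0 / (1.0 + math.exp(threshold))


for tau in (15.0, -15.0):
    # tau = 15 gives about 3.1e-07; tau = -15 gives about 0.9999997
    print(tau, prob_u_equals_one(tau))
```

A threshold of 0 corresponds to a probability of .5, so values of plus or minus 15 are far enough from zero to act, for practical purposes, as probabilities of zero and one.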
[Figure: model diagram with x1, x3, u1, and c]
[Figure: factor (IRT) mixture model diagram with c, f, and u1-u8]
In this example, the factor (IRT) mixture model shown in the picture
above is estimated (Muthén, 2008). The model is a generalization of the
latent class model where the latent class model assumption of
conditional independence between the latent class indicators within class
is relaxed using a factor that influences the items within each class
(Muthén, 2006; Muthén & Asparouhov, 2006; Muthén, Asparouhov, &
Rebollo, 2006). The factor represents individual variation in response
probabilities within class. Alternatively, this model may be seen as an
Item Response Theory (IRT) mixture model. The broken arrows from
the categorical latent variable c to the arrows from the factor f to the
latent class indicators u1 to u8 indicate that the factor loadings vary
across classes.
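To see what class-varying loadings mean for the response probabilities, the Python sketch below evaluates a logistic item response function, P(u = 1 | f) = 1 / (1 + exp(-(-tau + lambda*f))), with a class-specific loading lambda and threshold tau. The logit form follows the threshold convention used in this chapter; the loading and threshold values are hypothetical.

```python
import math


def item_probability(f: float, loading: float, threshold: float) -> float:
    """P(u = 1 | f) for a logistic item with logit -threshold + loading * f."""
    return 1.0 / (1.0 + math.exp(-(-threshold + loading * f)))


# Hypothetical class-specific parameters for one item: (loading, threshold).
classes = {1: (0.5, 0.0), 2: (2.0, 0.0)}
for k, (loading, threshold) in classes.items():
    # The same factor value f = 1 gives a different probability in each class.
    print(k, round(item_probability(1.0, loading, threshold), 4))
```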
[Figure: twin model diagram with u1, u2, f1, and f2]
In this example, the model shown in the picture above is estimated. The
variables u1 and u2 represent a univariate outcome for each member of a
twin pair. Monozygotic and dizygotic twins are considered in a two-
group twin model for categorical outcomes using maximum likelihood
estimation. Parameter constraints are used to represent the ACE model
restrictions. The ACE variance and covariance restrictions are placed on
normally-distributed latent response variables, which are also called
liabilities, underlying the categorical outcomes. This model is referred
to as the threshold model for liabilities (Neale & Cardon, 1992). The
monozygotic and dizygotic twin groups are represented by latent classes
with known class membership.
In the overall model, the (1) following the first bracket statement
specifies that the thresholds of u1 and u2 are held equal across twins.
The two BY statements define a factor behind each outcome. This is
done because covariances of categorical outcomes are not part of the
model when maximum likelihood estimation is used. The covariances of
the factors become the covariances of the categorical outcomes or more
precisely the covariances of the latent response variables underlying the
categorical outcomes. The means of the factors are fixed at zero and
their variances are held equal across twins. The variance of each
underlying response variable is the sum of the factor variance and one,
where one is the residual variance in the probit regression of the
categorical outcome on the factor.
In the MODEL command, labels are defined for three parameters. The
label varf is assigned to the variances of f1 and f2. Because they are
given the same label, these parameters are held equal. The label covmz
is assigned to the covariance between f1 and f2 for the monozygotic
twins and the label covdz is assigned to the covariance between f1 and f2
for the dizygotic twins. In the MODEL CONSTRAINT command, the
NEW option is used to assign labels to three parameters that are not in
the analysis model: a, c, and h. The two parameters a and c are used to
decompose the covariances of u1 and u2 into genetic and environmental
components. The value .001 is added to the variance of the factors to
avoid a singular factor covariance matrix for the monozygotic twins,
which would otherwise occur because their factor variances and
covariance would be equal. The parameter h does not
impose restrictions on the model parameters but is used to compute the
heritability estimate and its standard error. This heritability estimate
uses the residual variances for the latent response variables which are
fixed at one. An explanation of the other commands can be found in
Example 7.1.
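The constraints can be traced with a few lines of arithmetic. The Python sketch below assumes the standard ACE decomposition, in which monozygotic twins share all of the additive genetic variance a² and dizygotic twins share half of it, while both share c²; the heritability uses a residual variance of one for the latent response variables, as described above. The specific formulas and the values of a and c are assumptions for illustration, not a copy of the MODEL CONSTRAINT statements.

```python
def ace_implied(a: float, c: float):
    """Implied factor variance, MZ and DZ covariances, and heritability."""
    varf = a ** 2 + c ** 2 + 0.001        # .001 keeps the MZ covariance matrix nonsingular
    covmz = a ** 2 + c ** 2               # MZ twins share all of a**2 and c**2
    covdz = 0.5 * a ** 2 + c ** 2         # DZ twins share half of a**2 and all of c**2
    h = a ** 2 / (a ** 2 + c ** 2 + 1.0)  # residual variance of each liability is 1
    return varf, covmz, covdz, h


varf, covmz, covdz, h = ace_implied(1.0, 0.5)
# Determinant of the MZ factor covariance matrix: positive only because of the .001.
print(varf ** 2 - covmz ** 2, round(h, 4))
```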
[Figure: twin factor model diagram with f1 and f2]
In this example, the model shown in the picture above is estimated. The
factors f1 and f2 represent a univariate variable for each member of the
twin pair. Monozygotic and dizygotic twins are considered in a two-
group twin model for factors with categorical factor indicators using
parameter constraints and maximum likelihood estimation. Parameter
constraints are used to represent the ACE model restrictions. The ACE
variance and covariance restrictions are placed on two factors instead of
two observed variables as in Example 7.28. The relationships between
the categorical factor indicators and the factors are logistic regressions.
Therefore, the factor model for each twin is a two-parameter logistic
Item Response Theory model (Muthén, Asparouhov, & Rebollo, 2006).
The monozygotic and dizygotic twin groups are represented by latent
classes with known class membership.
In the MODEL command, labels are defined for nine parameters. The
list function can be used when assigning labels. The label lam2 is
assigned to the factor loadings for u12 and u22; the label lam3 is
assigned to the factor loadings for u13 and u23; and the label lam4 is
assigned to the factor loadings for u14 and u24. Factor loadings with the
same label are held equal. The label t1 is assigned to the thresholds of
u11 and u21; the label t2 is assigned to the thresholds of u12 and u22;
the label t3 is assigned to the thresholds of u13 and u23; and the label t4
is assigned to the thresholds of u14 and u24. Parameters with the same
label are held equal. The label covmz is assigned to the covariance
between f1 and f2 for the monozygotic twins and the label covdz is
assigned to the covariance between f1 and f2 for the dizygotic twins.
MODEL:
%OVERALL%
t ON x;
%c#1%
[u$1@15];
[t@0];
%c#2%
[u$1@-15];
[t];
OUTPUT: TECH1 LOGRANK;
PLOT: TYPE = PLOT2;
[Figure: survival model diagram with u and t]
specification, class 1 is the control group. In the model for class 2, the
threshold for u is fixed at -15 so that the probability that u equals one is
one. By this specification, class 2 is the treatment group. In the overall
model, the ON statement describes the Cox regression for the survival
variable t on the covariate x. In class 1, the intercept in the Cox
regression is fixed at zero. In class 2, it is free. This intercept represents
the treatment effect. The LOGRANK option of the OUTPUT command
provides a logrank test of the equality of the treatment and control
survival curves (Mantel, 1966). By specifying PLOT2 in the PLOT
command, the following plots are obtained:
• Kaplan-Meier curve
• Sample log cumulative hazard curve
• Estimated baseline hazard curve
• Estimated baseline survival curve
• Estimated log cumulative baseline curve
• Kaplan-Meier curve with estimated baseline survival curve
• Sample log cumulative hazard curve with estimated log
cumulative baseline curve
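The first plot in the list is the Kaplan-Meier curve. For readers who want to see how that curve is formed, the Python sketch below computes the standard product-limit estimate from survival times and event indicators; the data are hypothetical, and the sketch illustrates the estimator rather than reproducing Mplus code.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times: observed survival or censoring times.
    events: 1 if the event occurred at that time, 0 if the case was censored.
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count events and all cases (events and censored) at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # cases at time t leave the risk set afterwards
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```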
CHAPTER 8
EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
[Figure: growth mixture model diagram with y1-y4, i, and s]
to the growth factors i and s indicate that the intercepts in the regressions
of the growth factors on x vary across the classes of c. This corresponds
to the regressions of i and s on a set of dummy variables representing the
categories of c. The arrow from x to c represents the multinomial
logistic regression of c on x. GMM is discussed in Muthén and Shedden
(1999), Muthén (2004), and Muthén and Asparouhov (2009).
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex8.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
MODEL:
%OVERALL%
i s | y1@0 y2@1 y3@2 y4@3;
i s ON x;
c ON x;
c#1 ON x;
where c#1 refers to the first class of c. The classes of a categorical latent
variable are referred to by adding to the name of the categorical latent
variable the number sign (#) followed by the number of the class. This
alternative specification allows individual parameters to be referred to in
the MODEL command for the purpose of giving starting values or
placing restrictions.
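The multinomial logistic regression behind c ON x (or c#1 ON x) can be sketched numerically. The Python code below assumes that the last class is the reference class, with its intercept and slope fixed at zero, and uses hypothetical parameter values; it illustrates the model, not Mplus input.

```python
import math


def class_probabilities(x: float, intercepts, slopes):
    """Class probabilities from a multinomial logistic regression of c on x.

    intercepts, slopes: values for classes 1..K-1; the last class is the
    reference class with intercept and slope fixed at zero.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two classes: hypothetical intercept 0 and slope 1 for c#1 ON x.
print(class_probabilities(1.0, [0.0], [1.0]))
```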
The difference between this example and Example 8.1 is that user-
specified starting values are used instead of automatic starting values. In
the MODEL command, user-specified starting values are given for the
intercepts of the intercept and slope growth factors. Intercepts are
referred to using bracket statements. The asterisk (*) is used to assign a
starting value for a parameter. It is placed after the parameter with the
starting value following it. In class 1, a starting value of 1 is given for
the intercept growth factor and a starting value of .5 is given for the
slope growth factor. In class 2, a starting value of 3 is given for the
intercept growth factor and a starting value of 1 is given for the slope
growth factor. The default estimator for this type of analysis is
maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Example 8.1.
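The effect of these starting values on the expected trajectories can be checked by hand. The Python sketch below computes model-implied outcome means from the growth factor intercepts, assuming the linear time scores 0, 1, 2, and 3 of Example 8.1, outcome intercepts fixed at zero, and a covariate value of x = 0; it is a numerical illustration, not Mplus input.

```python
TIME_SCORES = [0, 1, 2, 3]  # time scores from the | statement in Example 8.1


def implied_means(intercept_mean, slope_mean, time_scores=TIME_SCORES):
    """Implied outcome means at x = 0 for a linear growth model whose outcome
    intercepts are fixed at zero: mean(y_t) = mean(i) + mean(s) * time score."""
    return [intercept_mean + slope_mean * t for t in time_scores]


print(implied_means(1, 0.5))  # class 1 starting values: [1.0, 1.5, 2.0, 2.5]
print(implied_means(3, 1))    # class 2 starting values: [3, 4, 5, 6]
```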
The difference between this example and Example 8.1 is that the
outcome variable is a censored variable instead of a continuous variable.
The CENSORED option is used to specify which dependent variables
are treated as censored variables in the model and its estimation, whether
they are censored from above or below, and whether a censored or
censored-inflated model will be estimated. In the example above, y1, y2,
y3, and y4 are censored variables. They represent the outcome variable
measured at four equidistant occasions. The b in parentheses following
y1-y4 indicates that y1, y2, y3, and y4 are censored from below, that is,
have floor effects, and that the model is a censored regression model.
The censoring limit is determined from the data.
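Censoring from below can be illustrated with the tobit-type model that underlies a censored regression. In the Python sketch below, the outcome is a latent normal variable that is observed only above the censoring limit, so the probability of observing the floor value is the normal probability of falling at or below that limit. The mean, standard deviation, and limit are hypothetical; in Mplus the limit is determined from the data.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_at_floor(mean: float, sd: float, floor: float) -> float:
    """P(y = floor) when y is censored from below at `floor`: the latent
    normal variable falls at or below the censoring limit."""
    return normal_cdf((floor - mean) / sd)


print(round(prob_at_floor(0.0, 1.0, -1.0), 4))  # 0.1587
```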
The difference between this example and Example 8.1 is that the
outcome variable is a binary or ordered categorical (ordinal) variable
instead of a continuous variable. The CATEGORICAL option is used to
specify which dependent variables are treated as binary or ordered
categorical (ordinal) variables in the model and its estimation. In the
example above, u1, u2, u3, and u4 are binary or ordered categorical
variables. They represent the outcome variable measured at four
equidistant occasions.
MODEL:
%OVERALL%
i s q | u1@0 u2@.1 u3@.2 u4@.3 u5@.4 u6@.5
u7@.6 u8@.7;
ii si qi | u1#1@0 u2#1@.1 u3#1@.2 u4#1@.3
u5#1@.4 u6#1@.5 u7#1@.6 u8#1@.7;
s-qi@0;
i s ON x;
c ON x;
OUTPUT: TECH1 TECH8;
The difference between this example and Example 8.1 is that the
outcome variable is a count variable instead of a continuous variable. In
addition, the outcome is measured at eight occasions instead of four and
a quadratic rather than a linear growth model is estimated. The COUNT
option is used to specify which dependent variables are treated as count
variables in the model and its estimation and the type of model that will
be estimated. In the first part of this example a zero-inflated Poisson
model is estimated. In the example above, u1, u2, u3, u4, u5, u6, u7, and
u8 are count variables. They represent the outcome variable measured at
eight equidistant occasions. The i in parentheses following u1-u8
indicates that a zero-inflated Poisson model will be estimated.
by adding to the name of the count variable the number sign (#) followed
by the number 1.
In the parameterization of the growth model for the count part of the
outcome, the intercepts of the outcome variable at the eight time points
are fixed at zero as the default. The intercepts and residual variances of
the growth factors are estimated as the default, and the growth factor
residual covariances are estimated as the default because the growth
factors do not influence any variable in the model except their own
indicators. The intercepts of the growth factors are not held equal across
classes as the default. The residual variances and residual covariances
of the growth factors are held equal across classes as the default. In this
example, the variances of the slope growth factors s and q are fixed at
zero. This implies that the covariances between i, s, and q are fixed at
zero. Only the variance of the intercept growth factor i is estimated.
In the parameterization of the growth model for the inflation part of the
outcome, the intercepts of the outcome variable at the eight time points
are held equal as the default. The intercept of the intercept growth factor
is fixed at zero in all classes as the default. The intercept of the slope
growth factor and the residual variances of the intercept and slope
growth factors are estimated as the default, and the growth factor
residual covariances are estimated as the default because the growth
factors do not influence any variable in the model except their own
indicators. The intercept of the slope growth factor, the residual
variances of the growth factors, and residual covariances of the growth
factors are held equal across classes as the default. These defaults can
be overridden, but freeing too many parameters in the inflation part of
the model can lead to convergence problems. In this example, the
variances of the intercept and slope growth factors are fixed at zero.
This implies that the covariances between ii, si, and qi are fixed at zero.
An explanation of the other commands can be found in Example 8.1.
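For the count part, the growth curve describes the expected count at each occasion. The Python sketch below assumes, as in Poisson regression, that the quadratic growth model is on the log-rate scale, log(mu_t) = i + s*t + q*t², with the time scores 0 to .7 used above, and that a probability pi of a structural zero reduces the expected observed count to (1 - pi)*mu_t. The growth factor values and pi are hypothetical, pi is held constant over time for simplicity, and the sketch evaluates a single trajectory rather than averaging over the variance of i.

```python
import math

TIME_SCORES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]


def expected_counts(i: float, s: float, q: float, time_scores=TIME_SCORES):
    """Poisson means mu_t, assuming log(mu_t) = i + s*t + q*t**2."""
    return [math.exp(i + s * t + q * t * t) for t in time_scores]


def zip_means(mus, pi: float):
    """Expected observed counts when the zero-inflation probability is pi."""
    return [(1.0 - pi) * mu for mu in mus]


mus = expected_counts(0.5, 1.0, -0.8)
print([round(m, 3) for m in zip_means(mus, 0.2)])
```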
MODEL:
%OVERALL%
i s q | u1@0 u2@.1 u3@.2 u4@.3 u5@.4 u6@.5
u7@.6 u8@.7;
s-q@0;
i s ON x;
c ON x;
OUTPUT: TECH1 TECH8;
The difference between this part of the example and the first part is that
a growth mixture model (GMM) for a count outcome using a negative
binomial model is estimated instead of a zero-inflated Poisson model.
The negative binomial model estimates a dispersion parameter for each
of the outcomes (Long, 1997; Hilbe, 2011).
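The role of the dispersion parameter can be seen from the mean-variance relation. The Python sketch below assumes the NB2 form discussed by Long (1997) and Hilbe (2011), Var(u) = mu + alpha*mu², in which alpha = 0 reduces to the Poisson case; whether Mplus reports the dispersion on exactly this scale is an assumption here, and the values are hypothetical.

```python
def nb2_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count in the NB2 form: mu + alpha * mu**2."""
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    return mean + dispersion * mean ** 2


for alpha in (0.0, 0.5, 1.0):
    # alpha = 0 gives the Poisson variance, which equals the mean.
    print(alpha, nb2_variance(2.0, alpha))
```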
[Figure: growth mixture model diagram with y1-y4, i, s, x, c, and u]
The difference between this example and Example 8.1 is that a binary or
ordered categorical (ordinal) distal outcome has been added to the model
as shown in the picture above. The distal outcome u is regressed on the
categorical latent variable c using logistic regression. This is
represented as the thresholds of u varying across classes.
%c1#2%
[i1*1 s1];
%c1#3%
[i1*2 s1];
MODEL c2:
%c2#1%
[i2 s2];
%c2#2%
[i2*-1 s2];
OUTPUT: TECH1 TECH8;
[Figure: model diagram with y1-y8, i1, s1, i2, s2, c1, and c2]
The | statements in the overall model are used to name and define the
intercept and slope growth factors in the growth models. In the first |
statement, the names i1 and s1 on the left-hand side of the | symbol are
the names of the intercept and slope growth factors, respectively. In the
second | statement, the names i2 and s2 on the left-hand side of the |
symbol are the names of the intercept and slope growth factors,
respectively. In both | statements, the values on the right-hand side of
the | symbol are the time scores for the slope growth factor. For both
growth processes, the time scores of the slope growth factors are fixed at
0, 1, 2, and 3 to define linear growth models with equidistant time
points. The zero time scores for the slope growth factors at time point
one define the intercept growth factors as initial status factors. The
coefficients of the intercept growth factors i1 and i2 are fixed at one as
part of the growth model parameterization. In the parameterization of
the growth model shown here, the means of the outcome variables at the
four time points are fixed at zero as the default. The intercept and slope
growth factor means are estimated as the default. The variances of the
growth factors are also estimated as the default. The growth factors are
When there are multiple categorical latent variables, each one has its
own MODEL command. The MODEL command for each latent
variable is specified by MODEL followed by the name of the latent
variable. For each categorical latent variable, the part of the model that
differs for each class is specified by a label that consists of the
categorical latent variable followed by the number sign followed by the
class number. In the example above, the label %c1#1% refers to the part
of the model for class one of the categorical latent variable c1 that
differs from the overall model. The label %c2#1% refers to the part of
the model for class one of the categorical latent variable c2 that differs
from the overall model. The class-specific part of the model for each
categorical latent variable specifies that the means of the intercept and
slope growth factors are free to be estimated for each class. The default
estimator for this type of analysis is maximum likelihood with robust
standard errors. The ESTIMATOR option of the ANALYSIS command
can be used to select a different estimator. An explanation of the other
commands can be found in Example 8.1.
where c2#1 refers to the first class of c2, c1#1 refers to the first class of
c1, and c1#2 refers to the second class of c1. The classes of a
categorical latent variable are referred to by adding to the name of the
categorical latent variable the number sign (#) followed by the number
of the class. This alternative specification allows individual parameters
to be referred to in the MODEL command for the purpose of giving
starting values or placing restrictions.
[Figure: multiple group growth mixture model diagram with y1-y4, i, s, cg, x, and c]
The difference between this example and Example 8.1 is that this
analysis includes a categorical latent variable for which class
membership is known resulting in a multiple group growth mixture
model. The CLASSES option is used to assign names to the categorical
latent variables in the model and to specify the number of latent classes
in the model for each categorical latent variable. In the example above,
there are two categorical latent variables cg and c. Both categorical
latent variables have two latent classes. The KNOWNCLASS option is
used for multiple group analysis with TYPE=MIXTURE to identify the
categorical latent variable for which latent class membership is known
and is equal to observed groups in the sample. The KNOWNCLASS
option identifies cg as the categorical latent variable for which class
membership is known. The information in parentheses following the
categorical latent variable name defines the known classes using an
observed variable. In this example, the observed variable g is used to
define the known classes. The first class consists of individuals with the
value 0 on the variable g. The second class consists of individuals with
the value 1 on the variable g.
parts of the model, starting values are given for the growth factor
intercepts. The four classes correspond to a combination of the classes
of cg and c. They are referred to by combining the class labels using a
period (.). For example, the combination of class 1 of cg and class 1 of c
is referred to as cg#1.c#1. The default estimator for this type of analysis
is maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Example 8.1.
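The combined labels follow a simple pattern. The Python sketch below only generates the label text for every pairing of the classes of cg and c, which can help check that no combination has been left out when writing class-specific parts of the model; the function is hypothetical and is not part of Mplus.

```python
from itertools import product


def combined_class_labels(n_known: int, n_latent: int, known: str = "cg", latent: str = "c"):
    """Labels such as cg#1.c#1 for every combination of known and latent classes."""
    return [
        f"{known}#{k}.{latent}#{j}"
        for k, j in product(range(1, n_known + 1), range(1, n_latent + 1))
    ]


print(combined_class_labels(2, 2))
# ['cg#1.c#1', 'cg#1.c#2', 'cg#2.c#1', 'cg#2.c#2']
```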
[Figure: LCGA model diagram with u1-u4, i, and s]
The difference between this example and Example 8.4 is that an LCGA
for a binary outcome as shown in the picture above is estimated instead
of a GMM. The difference between these two models is that GMM
allows within-class variability and LCGA does not (Kreuter & Muthén,
2008; Muthén, 2004; Muthén & Asparouhov, 2009).
The difference between this example and Example 8.9 is that the
outcome variable is an ordered categorical (ordinal) variable instead of a
binary variable. Note that the statements that are commented out are not
necessary; without them, the input is identical to that of Example 8.9.
The statements are shown to illustrate how starting values can be given
for the thresholds and growth factor means in the model if needed.
Because the outcome is a three-category variable, it has two thresholds.
An explanation of the other commands can be found in Examples 8.1,
8.4, and 8.9.
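With two thresholds, the three category probabilities follow from cumulative logits. The Python sketch below assumes the logit threshold convention used in this chapter, P(u <= j) = 1 / (1 + exp(-tau_j)) for categories numbered from 0, which reduces to P(u = 1) = 1 / (1 + exp(tau)) for a binary outcome; the threshold values are hypothetical and the growth factors are ignored.

```python
import math


def category_probabilities(thresholds):
    """Probabilities of categories 0..J for an ordered outcome with J thresholds,
    assuming P(u <= j) = 1 / (1 + exp(-tau_j))."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-t)) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(u = j) = P(u <= j) - P(u <= j - 1)
        previous = value
    return probs


print([round(p, 4) for p in category_probabilities([-1.0, 1.0])])
# [0.2689, 0.4621, 0.2689]
```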
The difference between this example and Example 8.9 is that the
outcome variable is a count variable instead of a binary variable.
The COUNT option is used to specify which dependent variables are
treated as count variables in the model and its estimation and whether a
Poisson or zero-inflated Poisson model will be estimated. In the
example above, u1, u2, u3, and u4 are count variables and a zero-inflated
Poisson model is used. The count variables represent the outcome
measured at four equidistant occasions.
by adding to the name of the count variable the number sign (#) followed
by the number 1.
In the parameterization of the growth model for the count part of the
outcome, the intercepts of the outcome variable at the four time points
are fixed at zero as the default. The means of the growth factors are
estimated as the default. The variances of the growth factors are fixed
at zero. Because of this, the growth factor covariance is fixed at zero as
the default. The means of the growth factors are not held equal across
classes as the default.
In the parameterization of the growth model for the inflation part of the
outcome, the intercepts of the outcome variable at the four time points
are held equal as the default. The mean of the intercept growth factor is
fixed at zero in all classes as the default. The mean of the slope growth
factor is estimated and held equal across classes as the default. These
defaults can be overridden, but freeing too many parameters in the
inflation part of the model can lead to convergence problems. The
variances of the growth factors are fixed at zero. Because of this, the
growth factor covariance is fixed at zero. The default estimator for this
type of analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 8.1 and 8.9.
MODEL c1:
%c1#1%
[u1$1] (3);
%c1#2%
[u1$1] (4);
MODEL c2:
%c2#1%
[u2$1] (3);
%c2#2%
[u2$1] (4);
MODEL c3:
%c3#1%
[u3$1] (3);
%c3#2%
[u3$1] (4);
MODEL c4:
%c4#1%
[u4$1] (3);
%c4#2%
[u4$1] (4);
OUTPUT: TECH1 TECH8;
[Figure: hidden Markov model diagram with u1-u4 and c1-c4]
In this example, the hidden Markov model for a single binary outcome
measured at four time points shown in the picture above is estimated.
Although each categorical latent variable has only one latent class
indicator, this model allows the estimation of measurement error by
allowing latent class membership and observed response to disagree.
This is a first-order Markov process where the transition matrices are
specified to be equal over time (Langeheine & van de Pol, 2002). The
parameterization of this model is described in Chapter 14.
model for each categorical latent variable. In the example above, there
are four categorical latent variables c1, c2, c3, and c4. All of the
categorical latent variables have two latent classes. In the overall model,
the transition matrices are held equal over time. This is done by placing
(1) after the bracket statement for the intercepts of c2, c3, and c4 and by
placing (2) after each of the ON statements that represent the first-order
Markov relationships. When a model has more than one categorical
latent variable, MODEL followed by a label is used to describe the
analysis model for each categorical latent variable. Labels are defined
by using the names of the categorical latent variables. The class-specific
equalities (3) and (4) represent measurement invariance across time. An
explanation of the other commands can be found in Example 8.1.
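The equality constraints make the latent class proportions evolve as a first-order Markov chain with a single transition matrix. The Python sketch below assumes two classes with class 2 as the reference category in each logistic regression: alpha plays the role of the intercepts held equal by (1), and beta the slope held equal by (2) for being in class 1 at the previous occasion. The values of alpha, beta, and the initial class proportions are hypothetical.

```python
import math


def logistic(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def transition_matrix(alpha: float, beta: float):
    """Rows: class at time t-1; columns: class at time t (classes 1 and 2)."""
    p11 = logistic(alpha + beta)  # P(class 1 at t | class 1 at t-1)
    p21 = logistic(alpha)         # P(class 1 at t | class 2 at t-1)
    return [[p11, 1.0 - p11], [p21, 1.0 - p21]]


def propagate(initial, matrix, steps: int):
    """Class proportions at each occasion when the same matrix applies at every step."""
    history = [list(initial)]
    for _ in range(steps):
        prev = history[-1]
        history.append([sum(prev[k] * matrix[k][j] for k in range(2)) for j in range(2)])
    return history


matrix = transition_matrix(-1.0, 3.0)
for t, dist in enumerate(propagate([0.5, 0.5], matrix, 3), start=1):
    print(f"c{t}", [round(p, 3) for p in dist])
```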
%c1#3%
[u11$1] (11);
[u12$1] (12);
[u13$1] (13);
[u14$1] (14);
[u15$1] (15);
MODEL c2:
%c2#1%
[u21$1] (1);
[u22$1] (2);
[u23$1] (3);
[u24$1] (4);
[u25$1] (5);
%c2#2%
[u21$1] (6);
[u22$1] (7);
[u23$1] (8);
[u24$1] (9);
[u25$1] (10);
%c2#3%
[u21$1] (11);
[u22$1] (12);
[u23$1] (13);
[u24$1] (14);
[u25$1] (15);
OUTPUT: TECH1 TECH8 TECH15;
[Figure: model diagram with indicators u11 to u15 and u21 to u25 and categorical latent variables c1, c2, and cg]
When there are multiple categorical latent variables, each one has its
own MODEL command. The MODEL command for each categorical
latent variable is specified by MODEL followed by the name of the
categorical latent variable. In this example, MODEL cg describes the
group-specific parameters of the regression of c2 on c1. This allows the
binary covariate to influence the latent transition probabilities. MODEL
c1 describes the class-specific measurement parameters for variable c1
and MODEL c2 describes the class-specific measurement parameters for
variable c2. The model for each categorical latent variable that differs
for each class of that variable is specified by a label that consists of the
categorical latent variable name followed by the number sign followed
by the class number. For example, the label %c1#1% refers to class 1 of
categorical latent variable c1.
In this example, the thresholds of the latent class indicators for a given
class are held equal for the two categorical latent variables. The (1-5),
[Figure: model diagram with indicators u11 to u15 and u21 to u25 and categorical latent variables c1 and c2]
When there are multiple categorical latent variables, each one has its
own MODEL command. The MODEL command for each categorical
latent variable is specified by MODEL followed by the name of the
categorical latent variable. MODEL c1 describes the class-specific
In this example, the thresholds of the latent class indicators for a given
class are held equal for the two categorical latent variables. The (1-5),
(6-10), and (11-15) following the bracket statements containing the
thresholds use the list function to assign equality labels to these
parameters. For example, the label 1 is assigned to the thresholds u11$1
and u21$1 which holds these thresholds equal over time. The default
estimator for this type of analysis is maximum likelihood with robust
standard errors. The ESTIMATOR option of the ANALYSIS command can
be used to select a different estimator. An explanation of the other
commands can be found in Example 8.1.
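The list function can be pictured as expanding a range such as (1-5) into the labels 1, 2, 3, 4, 5 and pairing them, in order, with the parameters in the bracket statement; parameters that end up with the same label are held equal. The Python sketch below only illustrates that bookkeeping for the class-1 thresholds. It is not how Mplus parses its input, and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def expand_list_label(spec: str) -> list[int]:
    """Expand an equality-label range such as '(1-5)' into [1, 2, 3, 4, 5]."""
    m = re.fullmatch(r"\((\d+)-(\d+)\)", spec.strip())
    if m is None:
        raise ValueError(f"not a label range: {spec!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    return list(range(lo, hi + 1))


def assign_labels(params: list[str], spec: str) -> dict[str, int]:
    """Pair each parameter with one label from the expanded range, in order."""
    labels = expand_list_label(spec)
    if len(labels) != len(params):
        raise ValueError("number of labels must match number of parameters")
    return dict(zip(params, labels))


# Class-1 thresholds of the indicators of c1 (u11$1-u15$1) and c2 (u21$1-u25$1).
time1 = assign_labels([f"u1{j}$1" for j in range(1, 6)], "(1-5)")
time2 = assign_labels([f"u2{j}$1" for j in range(1, 6)], "(1-5)")

# Parameters sharing a label form one equality set.
groups = defaultdict(list)
for name, label in {**time1, **time2}.items():
    groups[label].append(name)

print(groups[1])  # ['u11$1', 'u21$1']
```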
[u33$1] (8);
[u34$1] (9);
[u35$1] (10);
%c3#3%
[u31$1] (11);
[u32$1] (12);
[u33$1] (13);
[u34$1] (14);
[u35$1] (15);
OUTPUT: TECH1 TECH8 TECH15;
[Figure: model diagram with categorical latent variables c1, c2, and c3]
When there are multiple categorical latent variables, each one has its
own MODEL command. The MODEL command for each categorical
latent variable is specified by MODEL followed by the name of the
categorical latent variable. MODEL c describes the class-specific
multinomial logistic regressions of c2 on c1 and c3 on c2 where the first
c class is the mover class and the second c class is the stayer class.
MODEL c1 describes the class-specific measurement parameters for
variable c1; MODEL c2 describes the class-specific measurement
parameters for variable c2; and MODEL c3 describes the class-specific
measurement parameters for variable c3. The model for each categorical
latent variable that differs for each class of that variable is specified by a
label that consists of the categorical latent variable name followed by the
number sign followed by the class number. For example, the label
%c1#1% refers to class 1 of categorical latent variable c1.
In this example, the thresholds of the latent class indicators for a given
class are held equal for the three categorical latent variables. The (1-5),
(6-10), and (11-15) following the bracket statements containing the
thresholds use the list function to assign equality labels to these
parameters. For example, the label 1 is assigned to the thresholds
u11$1, u21$1, and u31$1 which holds these thresholds equal over time.
The TECH15 option is used to obtain the transition probabilities for both
the mover and stayer classes. The default estimator for this type of
[Figure: model diagram with outcomes y1, y2, y3 and u1, u2, u3, u4 and factors i, s, and f]
In the overall model, the | symbol is used to name and define the
intercept and slope growth factors in a growth model. The names i and s
on the left-hand side of the | symbol are the names of the intercept and
slope growth factors, respectively. The statement on the right-hand side
of the | symbol specifies the outcomes and the time scores for the growth
model. The time scores for the slope growth factor are fixed at 0, 1, and
2 to define a linear growth model with equidistant time points. The zero
time score for the slope growth factor at time point one defines the
intercept growth factor as an initial status factor. The coefficients of the
intercept growth factor are fixed at one as part of the growth model
parameterization. The residual variances of the outcome variables are
estimated and allowed to be different across time and the residuals are
not correlated as the default.
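Because every outcome loads 1 on the intercept growth factor and its time score on the slope growth factor, the model-implied mean of each outcome is the intercept factor mean plus the time score times the slope factor mean. A minimal Python sketch with hypothetical factor means:

```python
def implied_means(mean_i: float, mean_s: float,
                  time_scores=(0.0, 1.0, 2.0)) -> list[float]:
    """Model-implied outcome means for a linear growth model.

    Loadings: 1 on the intercept factor i, the time score t on the
    slope factor s, so E(y_t) = 1 * E(i) + t * E(s).
    """
    return [1.0 * mean_i + t * mean_s for t in time_scores]


# Hypothetical growth-factor means; time scores 0, 1, 2 as in the example.
means = implied_means(mean_i=10.0, mean_s=2.0)   # [10.0, 12.0, 14.0]
```

Moving the zero time score to the last time point, for example with time scores -2, -1, and 0, reproduces the same means with an intercept mean of 14. This is why the location of the zero time score determines what the intercept growth factor represents.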
[Figure: model diagram with variables u1, u2, u3, u4, u5, and t]
CHAPTER 9
EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
• Missing data
• Random slopes
The default is to estimate the model under missing data theory using all
available data. The LISTWISE option of the DATA command can be
used to delete all observations from the analysis that have missing values
on one or more of the analysis variables. Random slopes are specified
by using the | symbol of the MODEL command in conjunction with the
ON option of the MODEL command.
[Figure: two-level regression model; Within level: x, y; Between level: w, xm]
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex9.1a.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
assign names to the variables in the data set. The data set in this
example contains five variables: y, x, w, xm, and clus.
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and modeled only on the within
level. They are specified to have no variance in the between part of the
model. The BETWEEN option is used to identify the variables in the
data set that are measured on the cluster level and modeled only on the
between level. Variables not mentioned on the WITHIN or the
BETWEEN statements are measured on the individual level and can be
modeled on both the within and between levels. Because y is not
mentioned on the WITHIN statement, it is modeled on both the within
and between levels. On the between level, it is a random intercept. The
CLUSTER option is used to identify the variable that contains clustering
information. The CENTERING option is used to specify the type of
centering to be used in an analysis and the variables that are to be
centered. In this example, grand-mean centering is chosen.
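Grand-mean centering subtracts the overall sample mean of a variable from every observation. The Python sketch below contrasts it with group-mean centering, which subtracts each cluster's own mean. Mplus performs the centering itself through the CENTERING option; the sketch only shows the arithmetic, and its data and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def grand_mean_center(values: list[float]) -> list[float]:
    """Subtract the overall sample mean from every observation."""
    m = fmean(values)
    return [v - m for v in values]


def group_mean_center(values: list[float], clusters: list[int]) -> list[float]:
    """Subtract each observation's own cluster mean (shown for contrast)."""
    members = defaultdict(list)
    for v, c in zip(values, clusters):
        members[c].append(v)
    means = {c: fmean(vs) for c, vs in members.items()}
    return [v - means[c] for v, c in zip(values, clusters)]


# Hypothetical data: six individuals in two clusters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
clus = [1, 1, 1, 2, 2, 2]
x_grand = grand_mean_center(x)          # [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x_group = group_mean_center(x, clus)    # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Grand-mean centering shifts every value by the same constant, so differences between cluster means are preserved; group-mean centering removes them.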
MODEL:
%WITHIN%
y ON x;
%BETWEEN%
y ON w xm;
In the between part of the model, the ON statement describes the linear
regression of the random intercept y on the observed cluster-level
covariates w and xm. The intercept and residual variance of y are
estimated as the default. The default estimator for this type of analysis
is maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator.
The difference between this part of the example and the first part is that
the covariate x is decomposed into two latent variable parts instead of
being treated as an observed variable as in conventional multilevel
regression modeling. The decomposition occurs when the covariate x is
not mentioned on the WITHIN statement and is therefore modeled on
both the within and between levels. When a covariate is not mentioned
on the WITHIN statement, it is decomposed into two uncorrelated latent
variables,
In the MODEL command, the label gamma10 in the within part of the
model and the label gamma01 in the between part of the model are
assigned to the regression coefficients in the linear regression of y on x
in both parts of the model for use in the MODEL CONSTRAINT
command. The MODEL CONSTRAINT command is used to define
linear and non-linear constraints on the parameters in the model. In the
MODEL CONSTRAINT command, the NEW option is used to
introduce a new parameter that is not part of the MODEL command.
This parameter is called betac and is defined as the difference between
gamma01 and gamma10. It corresponds to a “contextual effect” as
described in Raudenbush and Bryk (2002, p. 140, Table 5.11).
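To see what betac measures, the Python sketch below splits x and y into cluster means and deviations from those means, estimates a within-cluster slope (playing the role of gamma10) and a between-cluster slope (playing the role of gamma01) by ordinary least squares, and takes their difference. It uses observed cluster means, so it is only a rough illustration and not the latent covariate estimator Mplus uses in this example. The data are hypothetical and constructed so that the within slope is 0.5 and the between slope is 2.0.

```python
from statistics import fmean


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of the least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_between_slopes(x, y, clus):
    """Return (within slope, between slope, contextual effect)."""
    ids = sorted(set(clus))
    xbar = {c: fmean([a for a, k in zip(x, clus) if k == c]) for c in ids}
    ybar = {c: fmean([b for b, k in zip(y, clus) if k == c]) for c in ids}
    # Within: deviations from the cluster means.
    xw = [a - xbar[k] for a, k in zip(x, clus)]
    yw = [b - ybar[k] for b, k in zip(y, clus)]
    within = ols_slope(xw, yw)
    # Between: regression of cluster means on cluster means.
    between = ols_slope([xbar[c] for c in ids], [ybar[c] for c in ids])
    return within, between, between - within


# Hypothetical data: y = 0.5 * (x - cluster mean) + 2.0 * cluster mean.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5, 13.5, 14.0, 14.5]
clus = [1, 1, 1, 2, 2, 2, 3, 3, 3]
gamma10, gamma01, betac = within_between_slopes(x, y, clus)   # 0.5, 2.0, 1.5
```

A nonzero betac means that two individuals with the same value of x are expected to differ in y when they belong to clusters whose mean x differs.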
[Figure: two-level regression model with random slope s; Within level: x, y; Between level: w, xm, y, s]
The difference between this example and the first part of Example 9.1 is
that the model has both a random intercept and a random slope. In the
within part of the model, the filled circle at the end of the arrow from x
to y represents a random intercept that is referred to as y in the between
part of the model. The filled circle on the arrow from x to y represents a
random slope that is referred to as s in the between part of the model. In
the between part of the model, the random intercept and random slope
are shown in circles because they are continuous latent variables that
vary across clusters. The observed cluster-level covariate xm takes the
value of the mean of x for each cluster. The within and between parts of
the model correspond to level 1 and level 2 of a conventional multilevel
regression model with a random intercept and a random slope.
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random slope variables in the
model. The name on the left-hand side of the | symbol names the
random slope variable. The statement on the right-hand side of the |
symbol defines the random slope variable. Random slopes are defined
using the ON option. The random slope s is defined by the linear
regression of the dependent variable y on the observed individual-level
covariate x. The within-level residual variance in the regression of y on
x is estimated as the default.
In the between part of the model, the ON statement describes the linear
regressions of the random intercept y and the random slope s on the
observed cluster-level covariates w and xm. The intercepts and residual
variances of s and y are estimated and the residuals are not correlated as
the default. The WITH statement specifies that the residuals of s and y
are correlated. The default estimator for this type of analysis is
maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Example 9.1.
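The correspondence between the within and between parts of the model and levels 1 and 2 of a conventional multilevel regression can be illustrated by generating data from such a model. The Python sketch below is a simplification under stated assumptions: it omits the covariate xm, draws w from a standard normal distribution, and draws the residuals of the random intercept and the random slope independently, whereas the WITH statement in this example lets them correlate. All names and values are hypothetical.

```python
import random


def simulate_two_level(n_clusters, n_per_cluster, g00, g01, g10, g11,
                       sd_u0=1.0, sd_u1=0.5, sd_e=1.0, seed=1):
    """Generate (cluster, w, x, y) rows from a random-intercept, random-slope model.

    Level 2 (between):  y0_j = g00 + g01 * w_j + u0_j   (random intercept "y")
                        s_j  = g10 + g11 * w_j + u1_j   (random slope "s")
    Level 1 (within):   y_ij = y0_j + s_j * x_ij + e_ij
    Simplification: xm is omitted and u0_j, u1_j are independent.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        w = rng.gauss(0.0, 1.0)
        y0 = g00 + g01 * w + rng.gauss(0.0, sd_u0)
        s = g10 + g11 * w + rng.gauss(0.0, sd_u1)
        for _ in range(n_per_cluster):
            x = rng.gauss(0.0, 1.0)
            y = y0 + s * x + rng.gauss(0.0, sd_e)
            rows.append((j, w, x, y))
    return rows


# Hypothetical population values.
data = simulate_two_level(n_clusters=50, n_per_cluster=20,
                          g00=0.0, g01=0.4, g10=0.5, g11=0.3)
```

Setting all three standard deviations to zero reduces each row to y = (g00 + g01*w) + (g10 + g11*w)*x, which makes the cross-level interaction g11 visible as the dependence of the slope on w.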
Following is the second part of the example that shows how to plot a
cross-level interaction where the cluster-level covariate w moderates the
influence of the within-level covariate x on y.
MODEL: %WITHIN%
s | y ON x;
%BETWEEN%
y ON w xm;
[s] (gam0);
s ON w (gam1)
xm;
y WITH s;
MODEL CONSTRAINT:
PLOT(ylow yhigh);
LOOP(level1,-3,3,0.01);
ylow = (gam0+gam1*(-1))*level1;
yhigh = (gam0+gam1*1)*level1;
PLOT: TYPE = PLOT2;
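The LOOP statement in MODEL CONSTRAINT evaluates ylow and yhigh over a grid of level1 values from -3 to 3 in steps of 0.01. Because [s] carries the label gam0 and s ON w carries the label gam1, gam0 + gam1*w is the expected random slope at a given value of w when the unlabeled xm term is left out (that is, evaluated at xm = 0). ylow and yhigh are therefore the slopes of y on x at w = -1 and w = 1, multiplied by x. The Python sketch below repeats that arithmetic with hypothetical estimates; it does not use any Mplus plotting facility.

```python
def loop_values(start: float, stop: float, step: float) -> list[float]:
    """Grid analogous to LOOP(level1, -3, 3, 0.01): start to stop inclusive."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def cross_level_lines(gam0: float, gam1: float):
    """Evaluate ylow and yhigh as written in the MODEL CONSTRAINT command."""
    xs = loop_values(-3.0, 3.0, 0.01)
    ylow = [(gam0 + gam1 * (-1)) * x for x in xs]   # slope at w = -1
    yhigh = [(gam0 + gam1 * 1) * x for x in xs]     # slope at w = 1
    return xs, ylow, yhigh


# Hypothetical estimates of gam0 and gam1.
xs, ylow, yhigh = cross_level_lines(gam0=0.5, gam1=0.2)
```

If w is standardized, the two lines correspond to clusters one standard deviation below and above the mean of w, and the widening gap between them as x moves away from zero is the cross-level interaction.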
The difference between this part of the example and the first part of the
example is that the covariate x is latent instead of observed on the
between level. This is achieved when the individual-level observed
covariate is modeled in both the within and between parts of the model.
This is requested by not mentioning the observed covariate x on the
WITHIN statement in the VARIABLE command. When a random slope
is estimated, the observed covariate x is used on the within level and the
latent variable covariate xbj is used on the between level. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Example 9.1.
[Figure: two-level path analysis model (Within and Between panels) with variables x1, x2, y, and u]
In this example, the two-level path analysis model shown in the picture
above is estimated. The mediating variable y is a continuous variable
and the dependent variable u is a binary or ordered categorical variable.
The within part of the model describes the linear regression of y on x1
and x2 and the logistic regression of u on y and x2 where the intercepts
in the two regressions are random effects that vary across the clusters
and the slopes are fixed effects that do not vary across the clusters. In
the within part of the model, the filled circles at the end of the arrows
from x1 to y and x2 to u represent random intercepts that are referred to
as y and u in the between part of the model. In the between part of the
model, the random intercepts are shown in circles because they are
continuous latent variables that vary across clusters. The between part
of the model describes the linear regressions of the random intercepts y
and u on a cluster-level covariate w.
In the within part of the model, the first ON statement describes the
linear regression of y on the individual-level covariates x1 and x2 and
the second ON statement describes the logistic regression of u on the
mediating variable y and the individual-level covariate x2. The slopes in
these regressions are fixed effects that do not vary across the clusters.
The residual variance in the linear regression of y on x1 and x2 is
estimated as the default. There is no residual variance to be estimated in
the logistic regression of u on y and x2 because u is a binary or ordered
categorical variable. In the between part of the model, the ON statement
describes the linear regressions of the random intercepts y and u on the
cluster-level covariate w. The intercept and residual variance of y and u
are estimated as the default. The residual covariance between y and u is
free to be estimated as the default.
[Figure: two-level path analysis model (Within and Between panels) with variables x, y, u, and w]
The difference between this example and Example 9.3 is that the
between part of the model has an observed cluster-level mediating
variable z and a latent mediating variable y that is a random intercept.
The model is estimated using weighted least squares estimation instead
of maximum likelihood.
In the between part of the model, the first ON statement describes the
linear regression of the random intercept u on the cluster-level covariate
w, the random intercept y, and the observed cluster-level mediating
variable z. The third ON statement describes the linear regression of the
observed cluster-level mediating variable z on the cluster-level covariate
w. An explanation of the other commands can be found in Examples 9.1
and 9.3.
[Figure: two-level path analysis model with random slopes s1 and s2 (Within and Between panels) with variables x1, x2, y1, y2, and w]
The difference between this example and Example 9.3 is that the model
includes two random intercepts and two random slopes instead of two
random intercepts and two fixed slopes and the dependent variable is
continuous. In the within part of the model, the filled circle on the arrow
from the covariate x2 to the mediating variable y1 represents a random
slope and is referred to as s1 in the between part of the model. The filled
circle on the arrow from the mediating variable y1 to the dependent
variable y2 represents a random slope and is referred to as s2 in the
between part of the model. In the between part of the model, the
random slopes s1 and s2 are shown in circles because they are
continuous latent variables that vary across clusters.
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random slope variables in the
model. The name on the left-hand side of the | symbol names the
random slope variable. The statement on the right-hand side of the |
symbol defines the random slope variable. Random slopes are defined
In the between part of the model, the ON statement describes the linear
regressions of the random intercepts y1 and y2 and the random slopes s1
and s2 on the cluster-level covariate w. The intercepts and residual
variances of y1, y2, s2, and s1 are estimated as the default. The residual
covariances between y1, y2, s2, and s1 are fixed at zero as the default.
This default can be overridden. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 9.1 and 9.3.
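[Figure: two-level factor model (Within and Between panels) with indicators y1 to y4, covariates x1, x2, and w, within factor fw, and between factor fb]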
constrained to be equal across the within and the between levels, this
implies a model where the regression of the within factor on x1 and x2
has a random intercept varying across the clusters.
The difference between this example and Example 9.6 is that the factor
indicators are binary or ordered categorical (ordinal) variables instead of
continuous variables. The CATEGORICAL option is used to specify
which dependent variables are treated as binary or ordered categorical
(ordinal) variables in the model and its estimation. In the example
above, all four factor indicators are binary or ordered categorical. The
program determines the number of categories for each indicator. The
default estimator for this type of analysis is maximum likelihood with
robust standard errors using a numerical integration algorithm. Note that
numerical integration becomes increasingly computationally
demanding as the number of factors and the sample size increase. In this
example, two dimensions of integration are used with a total of 225
integration points. The ESTIMATOR option of the ANALYSIS
command can be used to select a different estimator.
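The total of 225 points is consistent with a product grid of 15 points in each of the two dimensions. The Python sketch below, which assumes such a rectangular grid with 15 points per dimension, shows why each additional dimension of integration multiplies the computational cost rather than adding to it.

```python
def total_integration_points(points_per_dimension: int, dimensions: int) -> int:
    """Number of nodes in a product (rectangular) quadrature grid."""
    return points_per_dimension ** dimensions


# Assumes 15 points per dimension, which reproduces the 225 points quoted above.
for dims in range(1, 5):
    print(dims, total_integration_points(15, dims))
# 1 15
# 2 225
# 3 3375
# 4 50625
```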
In the between part of the model, the residual variances of the random
intercepts of the categorical factor indicators are fixed at zero as the
default because the residual variances of random intercepts are often
very small and require one dimension of numerical integration each.
Weighted least squares estimation of between-level residual variances
[Figure: two-level factor model with random slopes s1 and s2 (Within and Between panels) with indicators y1 to y4, covariates x1, x2, and w, and factors fw and fb]
The difference between this example and Example 9.6 is that the model
has random slopes in addition to random intercepts and the random
slopes are regressed on a cluster-level covariate. In the within part of the
model, the filled circles on the arrows from x1 and x2 to fw represent
random slopes that are referred to as s1 and s2 in the between part of the
model. In the between part of the model, the random slopes are shown
in circles because they are latent variables that vary across clusters.
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random slope variables in the
model. The name on the left-hand side of the | symbol names the
random slope variable. The statement on the right-hand side of the |
symbol defines the random slope variable. Random slopes are defined
using the ON option. In the first | statement, the random slope s1 is
defined by the linear regression of the factor fw on the individual-level
covariate x1. In the second | statement, the random slope s2 is defined
by the linear regression of the factor fw on the individual-level covariate
x2. The within-level residual variance of fw is estimated as the default.
In the between part of the model, the ON statement describes the linear
regressions of fb, s1, and s2 on the cluster-level covariate w. The
residual variances of fb, s1, and s2 are estimated as the default. The
residuals are not correlated as the default. The default estimator for this
type of analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 9.1 and 9.6.
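[Figure: two-level factor model (Within and Between panels) with indicators u1 to u6, within factors fw1 and fw2, covariates x1, x2, and w, between factors fb and f, and cluster-level indicators y1 to y4]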
In this example, the model with two within factors and two between
factors shown in the picture above is estimated. The within-level factor
indicators are categorical. In the within part of the model, the filled
circles at the end of the arrows from the within factor fw1 to u1, u2, and
u3 and fw2 to u4, u5, and u6 represent random intercepts that are
referred to as u1, u2, u3, u4, u5, and u6 in the between part of the model.
In the between part of the model, the random intercepts are shown in
circles because they are continuous latent variables that vary across
clusters. The random intercepts are indicators of the between factor fb.
This example illustrates the common finding of fewer between factors
than within factors for the same set of factor indicators. The between
factor f has observed cluster-level continuous variables as factor
indicators.
In the within part of the model, the first BY statement specifies that fw1
is measured by u1, u2, and u3. The second BY statement specifies that
fw2 is measured by u4, u5, and u6. The metric of the factors is set
automatically by the program by fixing the first factor loading for each
factor to one. This option can be overridden. Residual variances of the
latent response variables of the categorical factor indicators are not
parameters in the model. They are fixed at one in line with the Theta
parameterization. Residuals are not correlated as the default. The ON
statement describes the linear regressions of fw1 and fw2 on the
individual-level covariates x1 and x2. The residual variances of the
factors are estimated as the default. The residuals of the factors are
correlated as the default because residuals are correlated for latent
variables that do not influence any other variable in the model except
their own indicators. The intercepts of the factors are fixed at zero as
the default.
In the between part of the model, the first BY statement specifies that fb
is measured by the random intercepts u1, u2, u3, u4, u5, and u6. The
metric of the factor is set automatically by the program by fixing the first
factor loading to one. This option can be overridden. The residual
variances of the factor indicators are estimated and the residuals are not
correlated as the default. Unlike maximum likelihood estimation,
weighted least squares estimation of between-level residual variances
does not require numerical integration in estimating the model. The
second BY statement specifies that f is measured by the cluster-level
factor indicators y1, y2, y3, and y4. The residual variances of the factor
indicators are estimated and the residuals are not correlated as the
default. The first ON statement describes the linear regression of fb on
the cluster-level covariate w and the factor f. The second ON statement
describes the linear regression of f on the cluster-level covariate w. The
residual variances of the factors are estimated as the default. The
intercepts of the factors are fixed at zero as the default.
[Figure: two-level factor model with random slope s (Within and Between panels) with indicators y1 to y4, outcome y5, factors fw and fb, and covariate w]
part of the model. In the between part of the model, the random
intercepts and random slope are shown in circles because they are
continuous latent variables that vary across clusters.
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random slope variables in the
model. The name on the left-hand side of the | symbol names the
random slope variable. The statement on the right-hand side of the |
symbol defines the random slope variable. Random slopes are defined
using the ON option. In the | statement, the random slope s is defined by
the linear regression of the dependent variable y5 on the within factor
fw. The within-level residual variance of y5 is estimated as the default.
[Figure: two-level factor model with indicators y1 to y6 on both levels and between factors fb1 and fb2]
and individuals with g equal to 2 are assigned the label g2. These labels
are used in conjunction with the MODEL command to specify model
statements specific to each group. The grouping variable should be a
cluster-level variable.
In the within part of the model, the BY statements specify that fw1 is
measured by y1, y2, and y3, and fw2 is measured by y4, y5, and y6. The
metric of the factors is set automatically by the program by fixing the
first factor loading in each BY statement to one. This option can be
overridden. The variances of the factors are estimated as the default.
The factors fw1 and fw2 are correlated as the default because they are
independent (exogenous) variables. In the between part of the model,
the BY statements specify that fb1 is measured by y1, y2, and y3, and
fb2 is measured by y4, y5, and y6. The metric of the factors is set
automatically by the program by fixing the first factor loading in each
BY statement to one. This option can be overridden. The variances of
the factors are estimated as the default. The factors fb1 and fb2 are
correlated as the default because they are independent (exogenous)
variables.
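[Figure: two-level growth model (Within and Between panels) with outcomes y1 to y4, within growth factors iw and sw, covariate x, and between growth factors ib and sb]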
In the within part of the model, the | statement names and defines the
within intercept and slope factors for the growth model. The names iw
and sw on the left-hand side of the | symbol are the names of the
intercept and slope growth factors, respectively. The values on the right-
hand side of the | symbol are the time scores for the slope growth factor.
The time scores of the slope growth factor are fixed at 0, 1, 2, and 3 to
define a linear growth model with equidistant time points. The zero time
score for the slope growth factor at time point one defines the intercept
growth factor as an initial status factor. The coefficients of the intercept
growth factor are fixed at one as part of the growth model
parameterization. The residual variances of the outcome variables are
constrained to be equal over time in line with conventional multilevel
growth modeling. This is done by placing (1) after them. The residual
covariances of the outcome variables are fixed at zero as the default.
Both of these restrictions can be overridden. The ON statement
describes the linear regressions of the growth factors on the individual-
level covariate x. The residual variances of the growth factors are free
to be estimated as the default. The residuals of the growth factors are
correlated as the default because residuals are correlated for latent
variables that do not influence any other variable in the model except
their own indicators.
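Under the within-level specification just described, the growth model implies a covariance matrix for y1 to y4 of the form Λ Ψ Λ' + θI, where each row of Λ is (1, time score), Ψ holds the residual variances and covariance of iw and sw, and θ is the single residual variance shared across time through the label (1). The Python sketch below computes that matrix for hypothetical values. Because Ψ here is a residual covariance matrix, the result is the within-level covariance conditional on x.

```python
def implied_within_covariance(time_scores, psi, theta):
    """Sigma = Lambda * Psi * Lambda' + theta * I for a linear growth model.

    Lambda has one row (1, t) per time point: loading 1 on the intercept
    factor and the time score t on the slope factor. psi is the 2 x 2
    (residual) covariance matrix of the growth factors. theta is the one
    outcome residual variance held equal over time; residual covariances
    are zero, as in the default described above.
    """
    lam = [(1.0, float(t)) for t in time_scores]
    n = len(lam)
    sigma = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            common = sum(lam[r][a] * psi[a][b] * lam[c][b]
                         for a in range(2) for b in range(2))
            sigma[r][c] = common + (theta if r == c else 0.0)
    return sigma


# Hypothetical within-level values; time scores 0, 1, 2, 3 as in the example.
psi_w = [[1.0, 0.1], [0.1, 0.25]]
sigma_w = implied_within_covariance((0, 1, 2, 3), psi_w, theta=0.5)
```

Subtracting the common part from each diagonal element leaves the same value, 0.5, at every time point, which is exactly the equality that the label (1) imposes.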
In the between part of the model, the | statement names and defines the
between intercept and slope factors for the growth model. The names ib
and sb on the left-hand side of the | symbol are the names of the intercept
and slope growth factors, respectively. The values on the right-hand side
of the | symbol are the time scores for the slope growth factor. The time
scores of the slope growth factor are fixed at 0, 1, 2, and 3 to define a
linear growth model with equidistant time points. The zero time score
for the slope growth factor at time point one defines the intercept factor
as an initial status factor. The coefficients of the intercept growth factor
are fixed at one as part of the growth model parameterization. The
The difference between this example and Example 9.12 is that the
outcome variable is a binary or ordered categorical (ordinal) variable
instead of a continuous variable.
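[Figure: two-level growth model with a time-varying covariate (Within and Between panels) with outcomes y1 to y4, covariates x, a1 to a4, and w, growth factors iw, sw, ib, and sb, and random slope s]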
The difference between this example and Example 9.12 is that the model
includes an individual-level time-varying covariate with a random slope
that varies on both the within and between levels. In the within part of
the model, the filled circles at the end of the arrows from a1 to y1, a2 to
y2, a3 to y3, and a4 to y4 represent random intercepts that are referred to
as y1, y2, y3, and y4 in the between part of the model. In the between
part of the model, the random intercepts are shown in circles because
they are continuous latent variables that vary across clusters. The broken
arrows from s to the arrows from a1 to y1, a2 to y2, a3 to y3, and a4 to
y4 indicate that the slopes in these regressions are random. The s is
shown in a circle in both the within and between parts of the model to
represent a decomposition of the random slope into its within and
between components.
[Figure: two-level growth model (Within and Between panels) with growth factors iw, sw, ib, and sb]
factor indicators in the between part of the model are estimated. The
residuals are not correlated as the default. Taken together with the
specification of equal factor loadings on the within and the between
parts of the model, this implies a model where the regressions of the
within factors on the growth factors have random intercepts that vary
across the clusters.
In the within part of the model, the three BY statements define a
within-level factor at each of the three time points. The metric of the
three factors is set automatically by the program by fixing the first
factor loading in each BY statement to one. This default can be
overridden. The (1-2) following the factor loadings uses the list
function to assign equality labels to these parameters. The label 1 is
assigned to the factor loadings of u21, u22, and u23, which holds these
factor loadings equal across time. The label 2 is assigned to the factor
loadings of u31, u32, and u33, which holds these factor loadings equal
across time. Residual variances of the latent response variables of the
categorical factor indicators are not free parameters to be estimated in
the model. They are fixed at one in line with the Theta
parameterization. Residuals are not correlated as the default. The |
statement names and defines the within intercept and slope growth
factors for the growth model. The names iw and sw on the left-hand side
of the | symbol are the names of the intercept and slope growth factors,
respectively. The names and values on the right-hand side of the |
symbol are the outcomes (here the within-level factors) and the time
scores for the slope growth factor. The time scores of the slope growth
factor are fixed at 0, 1, and 2 to define a linear growth model with
equidistant time points. The zero time score for the slope growth factor
at time point one defines the intercept growth factor as an initial
status factor. The coefficients of the intercept growth factor are fixed
at one as part of the growth model parameterization. The variances of
the growth factors are free to be estimated as the default. The
covariance between the growth factors is free to be estimated as the
default. The intercepts of the factors defined using BY
statements are fixed at zero. The residual variances of the factors are
free and not held equal across time. The residuals of the factors are
uncorrelated in line with the default of residuals for first-order factors.
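The within-level specification described above can be sketched as the following MODEL command fragment. It is a minimal sketch, not necessarily the exact input of this example. The factor names fw1, fw2, and fw3 are hypothetical. Each labeled loading is placed on its own line so that the label applies only to that loading, which has the same effect as the list function described in the text.

```
MODEL:  %WITHIN%
        ! fw1-fw3 are hypothetical names for the within-level factors
        ! at time points 1, 2, and 3
        fw1 BY u11
               u21 (1)
               u31 (2);
        fw2 BY u12
               u22 (1)
               u32 (2);
        fw3 BY u13
               u23 (1)
               u33 (2);
        ! linear growth with time scores 0, 1, and 2
        iw sw | fw1@0 fw2@1 fw3@2;
```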
In the between part of the model, the first three BY statements define a
between-level factor at each of the three time points. The (1-2)
following the factor loadings uses the list function to assign equality
labels to these parameters. The label 1 is assigned to the factor
loadings of u21, u22, and u23, which holds these factor loadings equal
across time as well as across levels. The label 2 is assigned to the
factor loadings of u31, u32, and u33, which holds these factor loadings
equal across time as well as across levels. Time-invariant thresholds for the three indicators are
specified using (3), (4), and (5) following the bracket statements. The
residual variances of the factor indicators are free to be estimated. The |
statement names and defines the between intercept and slope growth
factors for the growth model. The names ib and sb on the left-hand side
of the | symbol are the names of the intercept and slope growth factors,
respectively. The values on the right-hand side of the | symbol are the
time scores for the slope growth factor. The time scores of the slope
growth factor are fixed at 0, 1, and 2 to define a linear growth model
with equidistant time points. The zero time score for the slope growth
factor at time point one defines the intercept growth factor as an initial
status factor. The coefficients of the intercept growth factor are fixed at
one as part of the growth model parameterization. In the
parameterization of the growth model shown here, the intercept growth
factor mean is fixed at zero as the default for identification purposes.
The variances of the growth factors are free to be estimated as the
default. The covariance between the growth factors is free to be
estimated as the default. The intercepts of the factors defined using BY
statements are fixed at zero. The residual variances of the factors are
held equal across time. The residuals of the factors are uncorrelated in
line with the default of residuals for first-order factors.
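The between part described above can be sketched as follows. It is not necessarily the exact input of this example. The factor names fb1, fb2, and fb3 are hypothetical, and the thresholds are written for binary indicators ($1 only). Reusing labels 1 and 2 holds the loadings equal to the within-level loadings. The label (6), which holds the factor residual variances equal across time, is an assumption about how that restriction is imposed.

```
        %BETWEEN%
        ! fb1-fb3 are hypothetical between-level factor names
        fb1 BY u11
               u21 (1)
               u31 (2);
        fb2 BY u12
               u22 (1)
               u32 (2);
        fb3 BY u13
               u23 (1)
               u33 (2);
        ! time-invariant thresholds (binary indicators assumed)
        [u11$1 u12$1 u13$1] (3);
        [u21$1 u22$1 u23$1] (4);
        [u31$1 u32$1 u33$1] (5);
        ! free between-level residual variances of the indicators
        u11 u21 u31 u12 u22 u32 u13 u23 u33;
        ib sb | fb1@0 fb2@1 fb3@2;
        ! factor residual variances held equal across time (assumed label)
        fb1 fb2 fb3 (6);
```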
[Figure: model diagram with Within (time, y, s, a3) and Between (x1, x2, y, s) parts]
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random slope variables in the
model. The name on the left-hand side of the | symbol names the
random slope variable. The statement on the right-hand side of the |
symbol defines the random slope variable. Random slopes are defined
using the ON option. In the | statement, the random slope s is defined by
the linear regression of the dependent variable y on time. The within-
level residual variance of y is estimated as the default. The ON
statement describes the linear regression of y on the covariate a3.
In the between part of the model, the ON statement describes the linear
regressions of the random intercept y and the random slope s on the
covariates x1 and x2. The WITH statement is used to free the
covariance between y and s. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select a
different estimator. An explanation of the other commands can be found
in Example 9.1.
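The specification described above can be sketched as follows. The fragment shows only the ANALYSIS and MODEL commands. The DATA and VARIABLE commands that create the long-format variables y, time, and a3 are not shown, and the fragment is not necessarily the exact input of this example.

```
ANALYSIS:  TYPE = TWOLEVEL RANDOM;
MODEL:     %WITHIN%
           ! random slope s: regression of y on time
           s | y ON time;
           ! time-varying covariate
           y ON a3;
           %BETWEEN%
           ! random intercept y and random slope s on cluster-level covariates
           y s ON x1 x2;
           y WITH s;
```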
The difference between this example and Example 9.12 is that the
outcome variable is a count variable instead of a continuous variable.
In the within part of the model, the variances of the slope growth factors
sw and siw are fixed at zero. The ON statement describes the linear
regressions of the intercept and slope growth factors iw and sw for the
count part of the outcome on the covariate x. In the between part of the
model, the variances of the intercept growth factor iib and the slope
growth factors sb and sib are fixed at zero. The ON statement describes
the linear regression of the intercept growth factor ib on the covariate w.
An explanation of the other commands can be found in Examples 9.1
and 9.12.
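A sketch of the zero-inflated Poisson growth model described above follows. As in Example 9.12, it assumes four time points with time scores 0, 1, 2, and 3. The outcome names u1 through u4, the time scores, and the name iiw for the within-level inflation intercept growth factor are all assumptions. In this model, u1#1 through u4#1 refer to the inflation part of the count outcomes.

```
VARIABLE:  COUNT = u1-u4 (i);       ! (i) requests a zero-inflated Poisson model
MODEL:     %WITHIN%
           iw sw | u1@0 u2@1 u3@2 u4@3;              ! count part
           iiw siw | u1#1@0 u2#1@1 u3#1@2 u4#1@3;    ! inflation part
           sw@0 siw@0;
           iw sw ON x;
           %BETWEEN%
           ib sb | u1@0 u2@1 u3@2 u4@3;
           iib sib | u1#1@0 u2#1@1 u3#1@2 u4#1@3;
           iib@0 sb@0 sib@0;
           ib ON w;
```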
[Figure: model diagram with Within (x, t) and Between (w, t) parts]
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random factor loading
variables in the model. The name on the left-hand side of the | symbol
names the random factor loading variable. The statement on the right-
hand side of the | symbol defines the random factor loading variable.
Random factor loadings are defined using the BY option. The random
factor loading variables s1, s2, s3, and s4 are defined by the linear
regression of the factor indicators y1, y2, y3, and y4 on the factor f. The
factor variance is fixed at one to set the metric of the factor. The
residual variances of y1 through y4 are estimated and the residuals are
not correlated as the default. The ON statement describes the linear
regression of f on the individual-level covariates x1 and x2. In the
between part of the model, the ON statement describes the linear
regression of the random intercept f on the cluster-level covariate w.
The cluster-level residual variance of the factor is estimated. The
intercepts and the cluster-level residual variances of y1 through y4 are
estimated and the residuals are not correlated as the default.
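The specification described in this paragraph can be sketched as the following ANALYSIS and MODEL commands. TYPE=TWOLEVEL RANDOM follows from the use of the | symbol. The estimator setting of the original example is not shown and should be taken from its full input.

```
ANALYSIS:  TYPE = TWOLEVEL RANDOM;
MODEL:     %WITHIN%
           ! random factor loadings s1-s4 for the indicators y1-y4
           s1-s4 | f BY y1-y4;
           f@1;              ! factor variance fixed at one to set the metric
           f ON x1 x2;
           %BETWEEN%
           f ON w;           ! random intercept f on the cluster-level covariate
```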
MODEL:  %WITHIN%
        s1-s4 | f BY y1-y4;
        f@1;
        f ON x1 x2;
        %BETWEEN%
        fb BY y1-y4;
        fb ON w;
MODEL:  %WITHIN%
        s1-s4 | f BY y1-y4;
        f@1;
        f ON x1 x2;
        %BETWEEN%
        fb BY y1-y4* (lam1-lam4);
        fb ON w;
        [s1-s4] (lam1-lam4);
[Figure: model diagram with Within (x, y, s1), Between 2 (w, y, s1, s2, s12), and Between 3 (z, s1, s2, s12) parts]
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. All variables on the WITHIN list must be measured
on the individual level. An individual-level variable can be modeled on
all or some levels. If a variable measured on the individual level is
mentioned on the WITHIN list without a label, it is modeled on only
level 1. It has no variance on levels 2 and 3. If a variable is not
mentioned on the WITHIN list, it is modeled on all levels. The variable
x can be modeled on only level 1. The variable y can be modeled on all
levels.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. A cluster-level variable can be modeled on
all or some cluster levels. For TYPE=THREELEVEL, if a variable
measured on level 2 is mentioned on the BETWEEN list with a level 2
cluster label, it is modeled on only level 2. It has no variance on level 3.
A variable measured on level 3 must be mentioned on the BETWEEN
list with a level 3 cluster label. The variable w can be modeled on only
level 2. The variable z can be modeled on only level 3.
In the level 3 part of the model, the first ON statement describes the
linear regression of the level 3 random intercept y on the level 3
covariate z. The next three ON statements describe the linear
regressions of the level 3 random slopes s1, s2, and s12 on the level 3
covariate z. The intercepts and level 3 residual variances of y, s1, s2,
and s12 are estimated and the residuals are not correlated as the default.
The WITH statements specify that the level 3 residuals of y, s1, s2, and
s12 are correlated. The default estimator for this type of analysis is
maximum likelihood with robust standard errors. The ESTIMATOR
option of the ANALYSIS command can be used to select a different
estimator. An explanation of the other commands can be found in
Examples 9.1 and 9.3.
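The three-level model described above can be sketched as follows. The cluster variable names level2 and level3 are hypothetical placeholders. The level 2 part of the model is not described in this passage, so the definitions of the level 2 random slopes s2 and s12 are inferred from the diagram and from the level 3 description. The fragment is not necessarily the exact input of this example.

```
VARIABLE:  WITHIN = x;
           BETWEEN = (level2) w (level3) z;
           CLUSTER = level3 level2;       ! level 3 cluster variable listed first
ANALYSIS:  TYPE = THREELEVEL RANDOM;
MODEL:     %WITHIN%
           s1 | y ON x;
           %BETWEEN level2%
           s2 | y ON w;                   ! assumed from the diagram
           s12 | s1 ON w;                 ! assumed from the diagram
           %BETWEEN level3%
           y ON z;
           s1 ON z;
           s2 ON z;
           s12 ON z;
           y WITH s1 s2 s12;
           s1 WITH s2 s12;
           s2 WITH s12;
```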
[Figure: model diagram with Within (x, y, u), Between 2 (w, y, y2, u), and Between 3 (z, y, y2, y3, u) parts]
in the model and its estimation. In the example above, the variable u is
binary or ordered categorical.
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. All variables on the WITHIN list must be measured
on the individual level. An individual-level variable can be modeled on
all or some levels. If a variable measured on the individual level is
mentioned on the WITHIN list without a label, it is modeled on only
level 1. It has no variance on levels 2 and 3. If a variable is not
mentioned on the WITHIN list, it is modeled on all levels. The variable
x can be modeled on only level 1. The variables u and y can be modeled
on all levels.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. A cluster-level variable can be modeled on
all or some cluster levels. For TYPE=THREELEVEL, if a variable
measured on level 2 is mentioned on the BETWEEN list without a label,
it is modeled on levels 2 and 3. If a variable measured on level 2 is
mentioned on the BETWEEN list with a level 2 cluster label, it is
modeled on only level 2. It has no variance on level 3. A variable
measured on level 3 must be mentioned on the BETWEEN list with a
level 3 cluster label. The variable y2 can be modeled on levels 2 and 3.
The variable w can be modeled on only level 2. The variables z and y3
can be modeled on only level 3.
In the within part of the model, the first ON statement describes the
probit regression of u on the mediator y and the individual-level
covariate x. The second ON statement describes the linear regression of
the mediator y on the covariate x. The within-level residual variance of
y is estimated as the default. In the level 2 part of the model, the first
ON statement describes the linear regression of the level 2 random
intercept u on the level 2 covariate w, the level 2 random intercept y, and
the level 2 mediator y2. The second ON statement describes the linear
regression of the level 2 random intercept y on the level 2 covariate w.
The third ON statement describes the linear regression of the level 2
mediator y2 on the level 2 covariate w. The level 2 residual variances of
u, y, and y2 are estimated and the residuals are not correlated as the
default. The WITH statement specifies that the level 2 residuals of y and
y2 are correlated. In the level 3 part of the model, the first ON statement
describes the linear regression of the level 3 random intercept u on the
level 3 random intercepts y and y2. The second ON statement describes
the linear regression of the level 3 random intercept y on the level 3
covariate z. The third ON statement describes the linear regression of
the level 3 random intercept y2 on the level 3 covariate z. The fourth
ON statement describes the linear regression of the level 3 variable y3
on the level 3 random intercepts y and y2. The threshold of u; the
intercepts of y, y2, and y3; and the level 3 residual variances of u, y, y2,
and y3 are estimated and the residuals are not correlated as the default.
The first WITH statement specifies that the residuals of y and y2 are
correlated. The second WITH statement specifies that the residuals of u
and y3 are correlated. An explanation of the other commands can be
found in Examples 9.1, 9.3, and 9.20.
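A sketch of the VARIABLE and MODEL commands described above follows. The cluster variable names level2 and level3 are hypothetical placeholders. The ANALYSIS command, including the estimator that yields probit regressions for u, is not shown, and the fragment is not necessarily the exact input of this example.

```
VARIABLE:  CATEGORICAL = u;
           WITHIN = x;
           ! y2 has no label: modeled on levels 2 and 3
           BETWEEN = y2 (level2) w (level3) z y3;
           CLUSTER = level3 level2;
MODEL:     %WITHIN%
           u ON y x;
           y ON x;
           %BETWEEN level2%
           u ON w y y2;
           y ON w;
           y2 ON w;
           y WITH y2;
           %BETWEEN level3%
           u ON y y2;
           y ON z;
           y2 ON z;
           y3 ON y y2;
           y WITH y2;
           u WITH y3;
```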
[Figure: model diagram with Within (x1, x2, fw1, fw2, s, y1-y6), Between 2 (w, fb2, sf2, ss, s, y1-y6), and Between 3 (z, fb3, s, sf2, ss, y1-y6) parts]
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. All variables on the WITHIN list must be measured
on the individual level. An individual-level variable can be modeled on
all or some levels. If a variable measured on the individual level is
mentioned on the WITHIN list without a label, it is modeled on only
level 1. It has no variance on levels 2 and 3. If a variable is not
mentioned on the WITHIN list, it is modeled on all levels. The variables
x1 and x2 can be modeled on only level 1. The variables y1 through y6
can be modeled on all levels.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. A cluster-level variable can be modeled on
all or some cluster levels. For TYPE=THREELEVEL, if a variable
measured on level 2 is mentioned on the BETWEEN list with a level 2
cluster label, it is modeled on only level 2. It has no variance on level 3.
A variable measured on level 3 must be mentioned on the BETWEEN
list with a level 3 cluster label. The variable w can be modeled on only
level 2. The variable z can be modeled on only level 3.
In the within and level 2 parts of the model, the | symbol is used in
conjunction with TYPE=RANDOM to name and define the random
slope variables in the model. The name on the left-hand side of the |
symbol names the random slope variable. The statement on the right-
hand side of the | symbol defines the random slope variable. Random
slopes are defined using the ON option. In the within part of the model,
the random slope s is defined by the linear regression of fw2 on the
individual-level covariate x2.
In the level 2 part of the model, the BY statement specifies that the
factor fb2 is measured by the level 2 random intercepts y1 through y6.
The metric of the factor is set automatically by the program by fixing
the first factor loading to one. This default can be overridden. The
level 2 residual variances of the factor indicators are estimated and the
residuals are not correlated as the default. The variance of the factor
is estimated as the default. The random slope sf2 is defined by the
linear regression of fb2 on the level 2 covariate w. The random slope ss
is defined by the linear regression of the random slope s on the level 2
covariate w. The level 2 residual variances of fb2 and s are estimated
and the residuals are not correlated as the default.
In the level 3 part of the model, the BY statement specifies that the
factor fb3 is measured by the level 3 random intercepts y1 through y6.
The metric of the factor is set automatically by the program by fixing
the first factor loading to one. This default can be overridden. The
intercepts and level 3 residual variances of the factor indicators are
estimated and the residuals are not correlated as the default. The
residual variance of the factor is estimated as the default. The first ON
statement describes the linear regression of fb3 on the level 3 covariate
z. The second ON statement describes the linear regression of the random
slope s on the level 3 covariate z. The third ON statement describes the
linear regression of the random slope sf2 on the level 3 covariate z. The
fourth ON statement describes the linear regression of the random slope
ss on the level 3 covariate z. The intercepts of y1 through y6, s, sf2,
and ss and the level 3 residual variances of fb3, s, sf2, and ss are
estimated and the residuals are not correlated as the default. The WITH
statements specify that the level 3 residuals of fb3, s, sf2, and ss are
correlated. The default estimator for this type of analysis is maximum
likelihood with robust standard errors. The ESTIMATOR option of the
ANALYSIS command can be used to select a different estimator.
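The model described above can be sketched as follows. The within-level factors are read from the diagram rather than from the text: fw1 is measured by y1-y3 and regressed on x1, and fw2 is measured by y4-y6. The cluster variable names level2 and level3 are hypothetical. The fragment is not necessarily the exact input of this example.

```
VARIABLE:  WITHIN = x1 x2;
           BETWEEN = (level2) w (level3) z;
           CLUSTER = level3 level2;
ANALYSIS:  TYPE = THREELEVEL RANDOM;
MODEL:     %WITHIN%
           fw1 BY y1-y3;                  ! assumed from the diagram
           fw2 BY y4-y6;                  ! assumed from the diagram
           fw1 ON x1;                     ! assumed from the diagram
           s | fw2 ON x2;
           %BETWEEN level2%
           fb2 BY y1-y6;
           sf2 | fb2 ON w;
           ss | s ON w;
           %BETWEEN level3%
           fb3 BY y1-y6;
           fb3 ON z;
           s ON z;
           sf2 ON z;
           ss ON z;
           fb3 WITH s sf2 ss;
           s WITH sf2 ss;
           sf2 WITH ss;
```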
[Figure: growth model diagram with Within (x, iw, sw), Between 2 (ib2, sb2), and Between 3 (ib3, sb3) parts, each with outcomes y1-y4]
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. All variables on the WITHIN list must be measured
on the individual level. An individual-level variable can be modeled on
all or some levels. If a variable measured on the individual level is
mentioned on the WITHIN list without a label, it is modeled on only
level 1. It has no variance on levels 2 and 3. If a variable is not
mentioned on the WITHIN list, it is modeled on all levels. The variable
x can be modeled on only level 1. The variables y1 through y4 can be
modeled on all levels.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. A cluster-level variable can be modeled on
all or some cluster levels. For TYPE=THREELEVEL, if a variable
measured on level 2 is mentioned on the BETWEEN list with a level 2
cluster label, it is modeled on only level 2. It has no variance on level 3.
A variable measured on level 3 must be mentioned on the BETWEEN
list with a level 3 cluster label. The variable w can be modeled on only
level 2. The variable z can be modeled on only level 3.
shown here, the intercepts of the outcome variables at the four time
points are fixed at zero as the default. The ON statement describes the
linear regression of the intercept and slope growth factors on the
individual-level covariate x. The residual variances of the growth
factors are estimated and the residuals are correlated as the default. The
level 2 residual variances of y1 through y4 are estimated and allowed to
be different across time and the residuals are not correlated as the
default.
The growth model specified in the within part of the model is also
specified on levels 2 and 3. In the level 2 part of the model, the ON
statement describes the linear regression of the level 2 intercept and
slope growth factors on the level 2 covariate w. The level 2 residual
variances of the growth factors are estimated and the residuals are
correlated as the default. In the level 3 part of the model, the ON
statement describes the linear regression of the level 3 intercept and
slope growth factors on the level 3 covariate z. The intercepts and level
3 residual variances of the growth factors are estimated and the residuals
are correlated as the default. The level 3 residual variances of y1
through y4 are fixed at zero. The default estimator for this type of
analysis is maximum likelihood with robust standard errors. The
ESTIMATOR option of the ANALYSIS command can be used to select
a different estimator. An explanation of the other commands can be
found in Examples 9.1, 9.3, and 9.20.
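A sketch of the three-level growth model described above follows. The growth factor names are taken from the diagram. The time scores 0, 1, 2, and 3 and the cluster variable names level2 and level3 are assumptions, and the fragment is not necessarily the exact input of this example.

```
VARIABLE:  WITHIN = x;
           BETWEEN = (level2) w (level3) z;
           CLUSTER = level3 level2;
ANALYSIS:  TYPE = THREELEVEL;
MODEL:     %WITHIN%
           iw sw | y1@0 y2@1 y3@2 y4@3;
           iw sw ON x;
           %BETWEEN level2%
           ib2 sb2 | y1@0 y2@1 y3@2 y4@3;
           ib2 sb2 ON w;
           %BETWEEN level3%
           ib3 sb3 | y1@0 y2@1 y3@2 y4@3;
           ib3 sb3 ON z;
           y1-y4@0;                       ! level 3 residual variances fixed at zero
```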
[Figure: model diagram with Within (x1, x2, y, s), Level 2a, and Level 2b parts]
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. All variables on the WITHIN list must be measured
on the individual level. An individual-level variable can be modeled on
all or some levels. If a variable measured on the individual level is
mentioned on the WITHIN list without a label, it is modeled on only
level 1. It has no variance on levels 2a and 2b. If a variable is not
mentioned on the WITHIN list, it is modeled on all levels. The variables
x1 and x2 can be modeled on only level 1. The variable y can be
modeled on all levels.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. For TYPE=CROSSCLASSIFIED, a
variable measured on level 2a must be mentioned on the BETWEEN list
with a level 2a cluster label. It can be modeled on only level 2a. A
variable measured on level 2b must be mentioned on the BETWEEN list
with a level 2b cluster label. It can be modeled on only level 2b. The
variable w can be modeled on only level 2a. The variable z can be
modeled on only level 2b.
In the within part of the model, the ON statement describes the linear
regression of y on the individual-level covariate x1. The residual
variance of y is estimated as the default. The | symbol is used in
conjunction with TYPE=RANDOM to name and define the random
slope variables in the model. The name on the left-hand side of the |
symbol names the random slope variable. The statement on the right-
hand side of the | symbol defines the random slope variable. Random
slopes are defined using the ON option. The random slope s is defined
by the linear regression of y on the individual-level covariate x2. In the
level 2a part of the model, the first ON statement describes the linear
regression of the level 2a random intercept y on the level 2a covariate w.
The second ON statement describes the linear regression of the level 2a
random slope s on the level 2a covariate w. The residual variances of y
and s are estimated and the residuals are not correlated as the default. The WITH
statement specifies that the residuals of y and s are correlated. In the
level 2b part of the model, the first ON statement describes the linear
regression of the level 2b random intercept y on the level 2b covariate z.
The second ON statement describes the linear regression of the level 2b
random slope s on the level 2b covariate z. The residual variances of y
and s are estimated and the residuals are not correlated as the default.
The WITH statement specifies that the residuals of y and s are
correlated. The intercepts of y and s are estimated as the default on level
2b. An explanation of the other commands can be found in Examples
9.1 and 9.3.
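The cross-classified model described above can be sketched as follows. level2a and level2b stand for the two hypothetical cluster variable names. The VARIABLE command and the estimator setting are not shown, and the fragment is not necessarily the exact input of this example.

```
ANALYSIS:  TYPE = CROSSCLASSIFIED RANDOM;
MODEL:     %WITHIN%
           y ON x1;
           s | y ON x2;              ! random slope of y on x2
           %BETWEEN level2a%
           y ON w;
           s ON w;
           y WITH s;
           %BETWEEN level2b%
           y ON z;
           s ON z;
           y WITH s;
```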
[Figure: model diagram with Within (x, y1, y2), Level 2a (y1, y2), and Level 2b (y1, y2) parts]
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level(s) and to specify the level(s) on
which they are modeled. All variables on the BETWEEN list must be
measured on a cluster level. For TYPE=CROSSCLASSIFIED, a
variable measured on level 2a must be mentioned on the BETWEEN list
with a level 2a cluster label. It can be modeled on only level 2a. A
variable measured on level 2b must be mentioned on the BETWEEN list
with a level 2b cluster label. It can be modeled on only level 2b. The
variable w can be modeled on only level 2a. The variable z can be
modeled on only level 2b.
In the within part of the model, the first ON statement describes the
linear regression of y2 on the mediator y1 and the individual-level
covariate x. The second ON statement describes the linear regression of
y1 on the individual-level covariate x. The residual variances of y1 and
y2 are estimated and the residuals are not correlated as the default. In
the level 2a part of the model, the ON statement describes the linear
regressions of the level 2a random intercepts y1 and y2 on the level 2a
covariate w. The level 2a residual variances are estimated and the
residuals are not correlated as the default. The WITH statement specifies
that the level 2a residuals of y1 and y2 are correlated. In the level 2b
part of the model, the ON statement describes the linear regressions of
the level 2b random intercepts y1 and y2 on the level 2b covariate z. The
level 2b residual variances are estimated and the residuals are not
correlated as the default.
The WITH statement specifies that the level 2b residuals of y1 and y2
are correlated. The intercepts of y1 and y2 are estimated as the default
on level 2b. An explanation of the other commands can be found in
Examples 9.1, 9.3, and 9.24.
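A sketch of the corresponding commands follows. level2a and level2b are hypothetical names for the two cross-classified cluster variables. The fragment is not necessarily the exact input of this example.

```
ANALYSIS:  TYPE = CROSSCLASSIFIED;
MODEL:     %WITHIN%
           y2 ON y1 x;               ! y1 acts as a mediator
           y1 ON x;
           %BETWEEN level2a%
           y1 y2 ON w;
           y1 WITH y2;
           %BETWEEN level2b%
           y1 y2 ON z;
           y1 WITH y2;
```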
The WITHIN option is used to identify the variables in the data set that
are measured on the individual level and to specify the levels on which
they are modeled. If a variable is not mentioned on the WITHIN list, it
is modeled on all levels. The variable u can be modeled on the subject
and item levels.
The within part of the model is not used in this example. In the subject
part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random factor loading
variables in the model. The name on the left-hand side of the | symbol
names the random factor loading variable. The statement on the right-
hand side of the | symbol defines the random factor loading variable.
Random factor loadings are defined using the BY option. The random
factor loading variable s is defined by the probit regression of u on the
factor f. The factor variance is fixed at one to set the metric of the
factor. The across-subject variance of u is fixed at zero. In the item part
of the model, the variance of the random intercept u, the threshold of u,
and the mean and variance of the random factor loading s are estimated
as the default. An explanation of the other commands can be found in
Examples 9.1, 9.3, and 9.24.
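The subject and item parts described above can be sketched as follows. subject and item are hypothetical names for the two cluster variables. TYPE = CROSSCLASSIFIED RANDOM is an assumption based on the reference to Example 9.24. The within part is not used, so it is omitted. The item-level statements only restate parameters that are estimated as the default.

```
ANALYSIS:  TYPE = CROSSCLASSIFIED RANDOM;
MODEL:     %BETWEEN subject%
           s | f BY u;         ! random factor loading (discrimination)
           f@1;                ! factor variance fixed at one
           u@0;                ! across-subject variance of u fixed at zero
           %BETWEEN item%
           u;                  ! variance of the random intercept u
           [u$1];              ! threshold of u
           [s]; s;             ! mean and variance of the random loading s
```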
The variable timescor can be modeled on only level 1. The variables y1,
y2, and y3 can be modeled on level 1 and the time level. The DEFINE
command is used to transform existing variables and to create new
variables. The new variable timescor is a time score variable centered at
the first time point.
In the within part of the model, the | symbol is used in conjunction with
TYPE=RANDOM to name and define the random factor loading
variables in the model. The name on the left-hand side of the | symbol
names the random factor loading variable. The statement on the right-
hand side of the | symbol defines the random factor loading variable.
Random factor loadings are defined using the BY option. The random
factor loading variables s1, s2, and s3 are defined by the linear
regression of the factor indicators y1, y2, and y3 on the factor f. The
factor variance is fixed at one to set the metric of the factor. The
intercepts of the factor indicators are fixed at zero as part of the growth
model parameterization. The residual variances are estimated and the
residuals are not correlated as the default.
In the time part of the model, the means and variances of the random
factor loadings s1, s2, and s3 and the variances of the random intercepts
y1, y2, and y3 are estimated. The intercepts of y1, y2, and y3 are fixed
at zero as part of the growth model parameterization. The variances of
the random factor loadings s1, s2, and s3 and the variances of the
random intercepts y1, y2, and y3 represent measurement non-invariance
across time. The mean and variance of the random slope growth factor s
are fixed at zero.
Examples: Multilevel Mixture Modeling
CHAPTER 10
EXAMPLES: MULTILEVEL MIXTURE MODELING
[Figure: model diagram with Within (x1, x2, y) and Between (c#1, y) parts]
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
DATA: FILE IS ex10.1.dat;
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex10.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
VARIABLE:   NAMES ARE y x1 x2 w class clus;
            USEVARIABLES = y x1 x2 w;
            CLASSES = c (2);
            WITHIN = x1 x2;
            BETWEEN = w;
            CLUSTER = clus;
is one categorical latent variable c that has two latent classes. The
WITHIN option is used to identify the variables in the data set that are
measured on the individual level and modeled only on the within level.
They are specified to have no variance in the between part of the model.
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level and modeled only on the between
level. Variables not mentioned on the WITHIN or the BETWEEN
statements are measured on the individual level and can be modeled on
both the within and between levels. The CLUSTER option is used to
identify the variable that contains cluster information.
ANALYSIS:   TYPE = TWOLEVEL MIXTURE;
            STARTS = 0;
refers to the part of the model for class 2 that differs from the overall
model.
In the overall model in the within part of the model, the first ON
statement describes the linear regression of y on the individual-level
covariates x1 and x2. The second ON statement describes the
multinomial logistic regression of the categorical latent variable c on the
individual-level covariate x1 when comparing class 1 to class 2. The
intercept in the regression of c on x1 is estimated as the default. In the
model for class 1 in the within part of the model, the ON statement
describes the linear regression of y on the individual-level covariate x2,
which relaxes the default equality of regression coefficients across
classes. Because the residual variance of y is also mentioned in the model
for class 1, it is not held equal across classes.
In the overall model in the between part of the model, the first ON
statement describes the linear regression of the random intercept y on the
cluster-level covariate w. The second ON statement describes the linear
regression of the random intercept c#1 of the categorical latent variable c
on the cluster-level covariate w. The random intercept c#1 is a
continuous latent variable. Each class of the categorical latent variable c
except the last class has a random intercept. A starting value of one is
given to the residual variance of the random intercept c#1. In the class-
specific part of the between part of the model, the intercept of y is given
a starting value of 2 for class 1.
c#1 ON x1;
where c#1 refers to the first class of c. The classes of a categorical latent
variable are referred to by adding to the name of the categorical latent
variable the number sign (#) followed by the number of the class. This
alternative specification allows individual parameters to be referred to in
the MODEL command for the purpose of giving starting values or
placing restrictions.
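The two-level mixture model described above can be sketched as the following MODEL command. It restates the specification in the text and is not necessarily the exact input of this example. Starting values are given with an asterisk (*), and %c#1% marks the class-specific parts for class 1.

```
MODEL:  %WITHIN%
        %OVERALL%
        y ON x1 x2;
        c ON x1;           ! multinomial logistic regression of c on x1
        %c#1%
        y ON x2;           ! class 1 slope differs from the overall model
        y;                 ! class 1 residual variance not held equal
        %BETWEEN%
        %OVERALL%
        y ON w;
        c#1 ON w;          ! random intercept c#1 on the cluster-level covariate
        c#1*1;             ! starting value of 1 for its residual variance
        %c#1%
        [y*2];             ! starting value of 2 for the intercept of y
```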
[Figure: model diagram with Within (x1, x2, s1, s2) and Between (y, s1, s2, cb) parts]
The BETWEEN option is used to identify the variables in the data set
that are measured on the cluster level and modeled only on the between
level and to identify between-level categorical latent variables. In this
example, the categorical latent variable cb is a between-level variable.
Between-level classes consist of clusters such as schools instead of
individuals. The PROCESSORS option of the ANALYSIS command is
used to specify that 2 processors will be used in the analysis for parallel
computations.
In the overall part of the within part of the model, the | symbol is used in
conjunction with TYPE=RANDOM to name and define the random
slope variables in the model. The name on the left-hand side of the |
symbol names the random slope variable. The statement on the right-
hand side of the | symbol defines the random slope variable. Random
slopes are defined using the ON option. The random slopes s1 and s2
are defined by the linear regressions of the dependent variable y on the
individual-level covariates x1 and x2. The within-level residual variance
in the regression of y on x1 and x2 is estimated as the default.
In the overall part of the between part of the model, the ON statement
describes the multinomial logistic regression of the categorical latent
variable cb on the cluster-level covariate w and the linear regression of
the random intercept y on the cluster-level covariate w. The variances of
the random slopes s1 and s2 are fixed at zero. In the class-specific parts
of the between part of the model, the means of the random slopes are
specified to vary across the between-level classes of cb. The intercept of
the random intercept y varies across the between-level classes of cb as
the default.
MODEL:
%WITHIN%
%OVERALL%
y ON x1 x2;
%cb#1%
y ON x1 x2;
%cb#2%
y ON x1 x2;
%BETWEEN%
%OVERALL%
cb ON w;
y ON w;
[Figure: two-level path diagram (Within and Between parts) showing x1, x2, u1, u2, u3, u4, u5, u6, y, and cb]
In the overall part of the between part of the model, the first ON
statement describes the multinomial logistic regression of the categorical
latent variable cb on the cluster-level covariate w. The second ON
statement describes the linear regression of the random intercept y on the
cluster-level covariate w. The intercept of the random intercept y and
the thresholds of the between-level latent class indicators u1, u2, u3, u4,
u5, and u6 vary across the between-level classes of cb as the default.
[Figure: two-level path diagram (Within and Between parts) showing y1, y2, y3, y4, y5, c, fw, c#1, and fb]
for the factor indicators in the between part of the model are zero. If
factor loadings are constrained to be equal across the within and the
between levels, this implies a model where the mean of the within factor
varies across the clusters. The between part of the model specifies that
the random mean c#1 of the categorical latent variable c and the between
factor fb are uncorrelated. Other modeling possibilities are for fb and
c#1 to be correlated, for fb to be regressed on c#1, or for c#1 to be
regressed on fb. Regressing c#1 on fb, however, leads to an internally
inconsistent model where the mean of fb is influenced by c at the same
time as c#1 is regressed on fb, leading to a reciprocal interaction.
In the overall part of the within part of the model, the BY statement
specifies that fw is measured by the factor indicators y1, y2, y3, y4, and
y5. The metric of the factor is set automatically by the program by
fixing the first factor loading to one. This option can be overridden.
The residual variances of the factor indicators are estimated and the
residuals are not correlated as the default. The variance of the factor is
estimated as the default.
In the overall part of the between part of the model, the BY statement
specifies that fb is measured by the random intercepts y1, y2, y3, y4, and
y5. The residual variances of the random intercepts are fixed at zero as
the default because they are often very small and each residual variance
requires one dimension of numerical integration. The variance of fb is
estimated as the default. A starting value of one is given to the variance
of the random mean of the categorical latent variable c referred to as
c#1. In the model for class 1 in the between part of the model, the mean
of fb is given a starting value of 2.
[Figure: two-level path diagram (Within and Between parts) showing u1 through u8, c, f, and cb]
In this example, the two-level item response theory (IRT) mixture model
with binary factor indicators shown in the picture above is estimated.
The model has both individual-level classes and between-level classes.
Individual-level classes consist of individuals, for example, students.
Between-level classes consist of clusters, for example, schools. The
within part of the model is similar to the single-level model in Example
7.27. In the within part of the model, an IRT mixture model is specified
where the factor indicators u1, u2, u3, u4, u5, u6, u7, and u8 have
thresholds that vary across the classes of the individual-level categorical
latent variable c. The filled circles at the end of the arrows pointing to
the factor indicators show that the thresholds of the factor indicators are
random. They are referred to as u1, u2, u3, u4, u5, u6, u7, and u8 on the
between level. The random thresholds u1, u2, u3, u4, u5, u6, u7, and u8
are shown in circles in the between part of the model because they are
continuous latent variables that vary across clusters (between-level
units). The random thresholds have no within-class variance. They vary
across the classes of the between-level categorical latent variable cb.
For related models, see Asparouhov and Muthén (2008a).
In the class-specific part of the between part of the model, the random
thresholds are specified to vary across classes that are a combination of
the classes of the between-level categorical latent variable cb and the
individual-level categorical latent variable c. These classes are referred
to by combining the class labels using a period (.). For example, a
combination of class 1 of cb and class 1 of c is referred to as cb#1.c#1.
This represents an interaction between the two categorical latent
variables in their influence on the thresholds.
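The period-combined labels can be enumerated the same way. Below is a minimal Python sketch. It assumes, hypothetically, two classes for cb and two for c, and the helper name is illustrative. Each label stands for one class combination, which can have its own threshold values.

```python
from itertools import product


def combined_class_labels(between: str, n_between: int,
                          within: str, n_within: int) -> list:
    """All between-by-within class combinations, e.g. cb#1.c#1."""
    return [
        f"{between}#{b}.{within}#{w}"
        for b, w in product(range(1, n_between + 1), range(1, n_within + 1))
    ]


labels = combined_class_labels("cb", 2, "c", 2)
print(labels)  # ['cb#1.c#1', 'cb#1.c#2', 'cb#2.c#1', 'cb#2.c#2']
```

The number of combinations is the product of the two class counts. This is why the interaction can add many threshold parameters.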
When a model has more than one categorical latent variable, MODEL
followed by a label is used to describe the analysis model for each
categorical latent variable. Labels are defined by using the names of the
categorical latent variables. In the model for the individual-level
categorical latent variable c, the variances of the factor f are allowed to
vary across the classes of c.
[Figure: two-level path diagram (Within and Between parts) showing u1 through u6, c#1, and c#2]
[Figure: two-level path diagram (Within and Between parts) showing u1 through u10, cw, and cb]
In the overall part of the between part of the model, the ON statement
describes the linear regressions of cw#1, cw#2, and cw#3 on the
between-level categorical latent variable cb. This regression implies that
the means of these random means vary across the classes of the
categorical latent variable cb.
[Figure: two-level path diagram (Within and Between parts) showing y1, y2, y3, y4, iw, sw, s, ib, sb, and cb]
In the overall part of the within part of the model, the | statement is used
to name and define the random slope s which is used in the between part
of the model. In the overall part of the between part of the model, the
second ON statement describes the multinomial logistic regression of the
categorical latent variable cb on a cluster-level covariate w. The
variance of the random slope s is fixed at zero. In the class-specific parts
of the between part of the model, the intercepts of the growth factors ib
and sb and the mean of the random slope s are specified to vary across
the between-level classes of cb.
MODEL:
%WITHIN%
%OVERALL%
iw sw | y1@0 y2@1 y3@2 y4@3;
y1-y4 (1);
iw ON x;
sw ON x iw;
%cb#1%
sw ON iw;
%cb#2%
sw ON iw;
%BETWEEN%
%OVERALL%
ib sb | y1@0 y2@1 y3@2 y4@3;
y1-y4@0;
ib sb ON w;
cb ON w;
%cb#1%
[ib sb];
%cb#2%
[ib sb];
[Figure: two-level path diagram (Within and Between parts) showing y1, y2, y3, y4, iw, sw, ib, sb, and c#1]
framework. In the within part of the model, the filled circles at the end
of the arrows from the within growth factors iw and sw to y1, y2, y3, and
y4 represent random intercepts that vary across clusters. The filled
circle at the end of the arrow from x to c represents a random intercept.
The random intercepts are referred to in the between part of the model as
y1, y2, y3, y4, and c#1. In the between part of the model, the random
intercepts are shown in circles because they are continuous latent
variables that vary across clusters.
In the within part of the model, the | statement names and defines the
within intercept and slope factors for the growth model. The names iw
and sw on the left-hand side of the | symbol are the names of the
intercept and slope growth factors, respectively. The values on the right-
hand side of the | symbol are the time scores for the slope growth factor.
The time scores of the slope growth factor are fixed at 0, 1, 2, and 3 to
define a linear growth model with equidistant time points. The zero time
score for the slope growth factor at time point one defines the intercept
growth factor as an initial status factor. The coefficients of the intercept
growth factor are fixed at one as part of the growth model
parameterization. The residual variances of the outcome variables are
estimated and allowed to be different across time and the residuals are
not correlated as the default. The first ON statement describes the linear
regressions of the growth factors on the individual-level covariate x.
The residual variances of the growth factors are free to be estimated as
the default. The residuals of the growth factors are correlated as the
default because residuals are correlated for latent variables that do not
influence any other variable in the model except their own indicators.
The second ON statement describes the multinomial logistic regression
of the categorical latent variable c on the individual-level covariate x
when comparing class 1 to class 2. The intercept in the regression of c
on x is estimated as the default.
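The sketch below shows how a multinomial logistic regression of c on x turns into class probabilities. It assumes the usual convention that the last class is the reference, with its logit fixed at zero. That convention is consistent with the comparison of class 1 to class 2 described above. In this two-level model the intercept for c#1 is random, so read the intercept passed in as the value for one particular cluster. The parameter values are hypothetical.

```python
import math


def class_probabilities(intercepts, slopes, x):
    """Class probabilities from a multinomial logit with the last class as reference.

    intercepts and slopes hold one value per non-reference class (classes 1..K-1);
    the logit of the last class is fixed at zero.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Two classes; hypothetical intercept 0.5 for c#1 in one cluster and slope 1.0 on x.
print(class_probabilities([0.5], [1.0], x=0.0))
```

With two classes this reduces to an ordinary logistic regression for membership in class 1.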
In the overall part of the between part of the model, the | statement
names and defines the between intercept and slope factors for the growth
model. The names ib and sb on the left-hand side of the | symbol are the
names of the intercept and slope growth factors, respectively. The
values on the right-hand side of the | symbol are the time scores for the
slope growth factor. The time scores of the slope growth factor are fixed
at 0, 1, 2, and 3 to define a linear growth model with equidistant time
points. The zero time score for the slope growth factor at time point one
defines the intercept growth factor as an initial status factor. The
coefficients of the intercept growth factor are fixed at one as part of the
growth model parameterization. The residual variances of the outcome
variables are fixed at zero on the between level in line with conventional
multilevel growth modeling. This can be overridden. The first ON
statement describes the linear regressions of the growth factors on the
cluster-level covariate w. The residual variance of the intercept growth
factor is free to be estimated as the default. The residual variance of the
slope growth factor is fixed at zero because it is often small and each
residual variance requires one dimension of numerical integration.
Because the slope growth factor residual variance is fixed at zero, the
residual covariance between the growth factors is automatically fixed at
zero. The second ON statement describes the linear regression of the
random intercept c#1 of the categorical latent variable c on the cluster-
level covariate w. A starting value of one is given to the residual
variance of the random intercept of the categorical latent variable c
referred to as c#1.
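The time scores 0, 1, 2, and 3 and the intercept loadings fixed at one determine the model-implied mean and variance of each outcome. Below is a minimal single-level sketch. It ignores covariates, the within/between decomposition, and the mixture classes, and the numerical values are hypothetical.

```python
TIME_SCORES = [0, 1, 2, 3]  # slope loadings; intercept loadings are fixed at one


def implied_means(mean_i, mean_s, time_scores=TIME_SCORES):
    """E(y_t) = E(i) + t * E(s) for a linear growth model."""
    return [mean_i + t * mean_s for t in time_scores]


def implied_variances(var_i, var_s, cov_is, resid_vars, time_scores=TIME_SCORES):
    """V(y_t) = V(i) + t^2 V(s) + 2 t Cov(i, s) + residual variance of y_t."""
    return [
        var_i + t * t * var_s + 2 * t * cov_is + r
        for t, r in zip(time_scores, resid_vars)
    ]


print(implied_means(1.0, 2.0))  # [1.0, 3.0, 5.0, 7.0]
print(implied_variances(0.5, 0.2, 0.0, [0.5] * 4))
```

Because the first time score is zero, the implied mean at time point one equals the intercept growth factor mean. This is what makes the intercept growth factor an initial status factor.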
[Figure: two-level path diagram (Within and Between parts) showing y1, y2, y3, y4, iw, sw, ib, sb, ib2, c#1, and cb]
In this example, the two-level growth mixture model (GMM; Muthén &
Asparouhov, 2009) for a continuous outcome (three-level analysis)
shown in the picture above is estimated. This example is similar to
Example 10.9 except that a between-level categorical latent variable cb
has been added along with a second between-level intercept growth
factor ib2. The second intercept growth factor is added to the model so
that the intercept growth factor mean can vary across not only the classes
of the individual-level categorical latent variable c but also across the
classes of the between-level categorical latent variable cb. Individual-
level classes consist of individuals, for example, students. Between-
level classes consist of clusters, for example, schools.
In the overall part of the between part of the model, the second |
statement names and defines the second between-level intercept growth
factor ib2. This growth factor is used to represent differences in
intercept growth factor means across the between-level classes of the
categorical latent variable cb.
When a model has more than one categorical latent variable, MODEL
followed by a label is used to describe the analysis model for each
categorical latent variable. Labels are defined by using the names of the
categorical latent variables. In the model for the individual-level
categorical latent variable c, the intercepts of the intercept and slope
growth factors ib and sb are allowed to vary across the classes of the
individual-level categorical latent variable c. In the model for the
between-level categorical latent variable cb, the means of the intercept
growth factor ib2 are allowed to vary across clusters (between-level
units). The mean in one class is fixed at zero for identification purposes.
[Figure: two-level path diagram (Within and Between parts) showing u1, u2, u3, u4, i, s, c, and c#1]
In the overall part of the within part of the model, the variances of
the growth factors i and s are fixed at zero because latent class growth
analysis has no within-class variability. In the overall part of the
between part of the model, the two thresholds for the outcome are held
equal across the four time points. The growth factor means are specified
in the within part of the model because there are no between growth
factors.
MODEL c1:
%BETWEEN%
%c1#1%
[u11$1-u14$1] (1-4);
%c1#2%
[u11$1-u14$1] (5-8);
MODEL c2:
%BETWEEN%
%c2#1%
[u21$1-u24$1] (1-4);
%c2#2%
[u21$1-u24$1] (5-8);
OUTPUT: TECH1 TECH8;
[Figure: two-level path diagram (Within and Between parts) showing x, c1, c2, c1#1, and c2#1]
In the overall part of the between part of the model, the first ON
statement describes the linear regression of the random intercept c1#1 on
a cluster-level covariate w. The second ON statement describes the
linear regression of the random intercept c2#1 on the random intercept
c1#1 and the cluster-level covariate w. The residual variances of the
random intercepts c1#1 and c2#1 are estimated instead of being fixed at
the default value of zero.
[Figure: two-level path diagram (Within and Between parts) showing x, s, c1, c2, c1#1, c2#1, and cb]
In the overall part of the between part of the model, the first two ON
statements describe the linear regressions of c1#1 and c2#1 on the
between-level categorical latent variable cb. These regressions imply
that the means of the random intercepts vary across the classes of the
categorical latent variable cb. The variances of c1#1 and c2#1 within
the cb classes are zero as the default.
When a model has more than one categorical latent variable, MODEL
followed by a label is used to describe the analysis model for each
categorical latent variable. Labels are defined by using the names of the
categorical latent variables. In the class-specific part of the within part
of the model for the between-level categorical latent variable cb, the ON
statement describes the multinomial logistic regression of c2 on c1. This implies
that the random slope s varies across the classes of cb. The within-class
variance of s is zero as the default.
CHAPTER 11
EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS
Mplus provides estimation of models with missing data using both
frequentist and Bayesian analysis. Descriptive statistics and graphics are
available for understanding dropout in longitudinal studies. Bayesian
analysis provides multiple imputation for missing data as well as
plausible values for latent variables.
[Figure: path diagram of the growth model with outcomes y1, y2, y3, y4 and growth factors i and s]
In this example, the linear growth model at four time points with missing
data on a continuous outcome shown in the picture above is estimated
using a missing data correlate. The missing data correlate is not part of
the growth model but is used to improve the plausibility of the MAR
assumption of maximum likelihood estimation (Collins, Schafer, &
Kam, 2001; Graham, 2003; Enders, 2010). The missing data correlate is
allowed to correlate with the outcome while providing the correct
The TITLE command is used to provide a title for the analysis. The title
is printed in the output just before the Summary of Analysis.
The DATA command is used to provide information about the data set
to be analyzed. The FILE option is used to specify the name of the file
that contains the data to be analyzed, ex11.1.dat. Because the data set is
in free format, the default, a FORMAT statement is not required.
OUTPUT: TECH1;
on the variable and for those who drop out or not before the next time
point.
[Figure: path diagram showing outcomes y0 through y5, growth factors i, s, and q, and dropout indicators d1 through d5]
In this example, the quadratic growth model at six time points with missing
data on a continuous outcome shown in the picture above is estimated.
The data are not missing at random because dropout is related to both
past and current outcomes where the current outcome is missing for
those who drop out. In the picture above, y1 through y5 are shown in
both circles and squares where circles imply that dropout has occurred
and squares imply that dropout has not occurred. The Diggle-Kenward
selection model (Diggle & Kenward, 1994) is used to jointly estimate a
growth model for the outcome and a discrete-time survival model for the
dropout indicators (see also Muthén et al., 2011).
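In a discrete-time survival model, each dropout indicator records whether dropout happens at that time point among those still present. Below is a minimal Python sketch of that coding. It assumes monotone dropout (once y is missing it stays missing) and an observed baseline y0. It also assumes the common convention that d_t is 0 while y_t is observed, 1 at the first missing time point, and missing (None) afterward. The function is hypothetical and is not an Mplus option.

```python
from typing import List, Optional


def survival_dropout_indicators(y: List[Optional[float]]) -> List[Optional[int]]:
    """Discrete-time survival dropout indicators d1..d(T-1) for outcomes y0..y(T-1).

    Assumes monotone dropout: d_t = 0 if y_t is observed, 1 at the first
    missing time point, and None (missing) at every later time point.
    """
    indicators: List[Optional[int]] = []
    dropped = False
    for value in y[1:]:  # y0 is the baseline; dropout is tracked from y1 onward
        if dropped:
            indicators.append(None)
        elif value is None:
            indicators.append(1)
            dropped = True
        else:
            indicators.append(0)
    return indicators


print(survival_dropout_indicators([1.2, 1.5, None, None, None, None]))
# [0, 1, None, None, None]
```

Setting the indicators after dropout to missing keeps each person in the risk set only until the time point at which they drop out.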
[Figure: path diagram showing outcomes y0 through y5, growth factors i, s, and q, and dropout indicators d1 through d5]
In this example, the quadratic growth model at six time points with missing
data on a continuous outcome shown in the picture above is estimated.
The data are not missing at random because dropout is related to both
past and current outcomes where the current outcome is missing for
those who drop out. A pattern-mixture model (Little, 1995; Hedeker &
Gibbons, 1997; Demirtas & Schafer, 2003) is used to estimate a growth
model for the outcome with binary dummy dropout indicators used as
covariates (see also Muthén et al., 2011).
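In the pattern-mixture model the dropout indicators are covariates, so they must never be missing. Below is a minimal Python sketch of one such dummy coding. It assumes monotone dropout with an observed baseline y0, and it sets d_t = 1 only at the time point where the first missing outcome occurs, with completers receiving all zeros. The coding and the function name are assumptions made for illustration.

```python
from typing import List, Optional


def dropout_dummies(y: List[Optional[float]]) -> List[int]:
    """Binary dropout-pattern dummies d1..d(T-1) for outcomes y0..y(T-1).

    Assumes monotone dropout: d_t = 1 if y_t is the first missing outcome,
    otherwise 0. Completers have all zeros, so the dummies are never missing
    and can be used as covariates.
    """
    dummies = [0] * (len(y) - 1)
    for t in range(1, len(y)):
        if y[t] is None:
            dummies[t - 1] = 1
            break
    return dummies


print(dropout_dummies([1.2, 1.5, None, None, None, None]))  # [0, 1, 0, 0, 0]
```

Unlike the survival coding used in the selection model, these dummies identify a dropout pattern per person rather than a hazard per time point.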
[Figure: path diagram showing outcomes y1, y2, y3, y4, growth factors i and s, and covariates x1 and x2]
The maximum likelihood parameter estimates for the growth model are
averaged over the set of 10 analyses and standard errors are computed
using the average of the standard errors over the set of 10 analyses and
the between-analysis variation in the parameter estimates (Rubin, 1987; Schafer,
1997). A chi-square test of overall model fit is provided (Asparouhov &
Muthén, 2008c; Enders, 2010). The ESTIMATOR option is used to
specify the estimator to be used in the analysis. By specifying ML,
maximum likelihood estimation is used. An explanation of the other
commands can be found in Examples 11.1 and 11.5.
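Rubin's (1987) combining rules, as usually stated, work with variances. They average the squared standard errors (the within-imputation variance) and add the between-imputation variance of the estimates, inflated by 1 + 1/m. Below is a minimal Python sketch of these textbook rules for a single parameter. It is not Mplus's own code and does not cover the chi-square test. The example inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Combine one parameter across m imputed data sets with Rubin's (1987) rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two analyses with matching standard errors")
    pooled = mean(estimates)
    within = mean(se ** 2 for se in standard_errors)  # average sampling variance
    between = variance(estimates)                     # spread of the estimates (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


print(pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1]))
```

The pooled standard error is larger than any single-analysis standard error whenever the estimates differ across imputations. This reflects the extra uncertainty due to the missing data.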
[Figure: path diagram showing factors f1, f2, f3 and growth factors i and s]
In this example, plausible values (Mislevy et al., 1992; von Davier et al.,
2009) are obtained by multiple imputation (Rubin, 1987; Schafer, 1997)
based on a multiple indicator linear growth model for categorical
outcomes shown in the picture above using Bayesian estimation. The
plausible values in the multiple imputation data sets can be used for
subsequent analysis.
MODEL: %WITHIN%
f1w BY u11
u21 (1)
u31 (2);
f2w BY u12
u22 (1)
u32 (2);
f3w BY u13
u23 (1)
u33 (2);
%BETWEEN%
fb BY u11-u33*1;
fb@1;
DATA IMPUTATION:
IMPUTE = u11-u33(c);
SAVE = ex11.8imp*.dat;
OUTPUT: TECH1 TECH8;
[Figure: two-level path diagram (Within and Between parts) showing the between-level factor fb]
In this example, missing values are imputed for a set of variables using
multiple imputation (Rubin, 1987; Schafer, 1997). In the first part of
this example, imputation is done using the two-level factor model with
categorical outcomes shown in the picture above. In the second part of
this example, the multiple imputation data sets are used for a two-level
multiple indicator growth model with categorical outcomes using two-
level weighted least squares estimation.
[Figure: two-level path diagram (Within and Between parts) showing iw, sw, ib, and sb]
In the second part of this example, the data sets saved in the first part of
the example are used in the estimation of a two-level multiple indicator
growth model with categorical outcomes. The model is the same as in
Example 9.15. The two-level weighted least squares estimator described
in Asparouhov and Muthén (2007) is used in this example. This
estimator does not handle missing data using MAR. By doing Bayesian
multiple imputation as a first step, this disadvantage is avoided given
that there are no missing data for the weighted least squares analysis. To
save computational time in subsequent analyses, the two-level weighted
least squares sample statistics and weight matrix for each of the imputed
data sets are saved.
To use the saved within- and between-level sample statistics and their
corresponding estimated asymptotic covariance matrix for each
imputation in a subsequent analysis, specify:
DATA:
FILE = ex11.8implist.dat;
TYPE = IMPUTATION;
SWMATRIX = ex11.8swlist.dat;
CHAPTER 12
EXAMPLES: MONTE CARLO SIMULATION STUDIES
Monte Carlo simulation studies are often used for methodological
investigations of the performance of statistical estimators under various
conditions. They can also be used to decide on the sample size needed
for a study and to determine power (Muthén & Muthén, 2002). Monte
Carlo studies are sometimes referred to as simulation studies.
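Below is a toy Monte Carlo power study in Python rather than Mplus, to make the idea concrete. The steps are to generate many samples from a known population model, estimate the parameter in each, and count how often it is significant. The population model is a simple regression with standard normal x and error, and the sample size, number of replications, and z critical value of 1.96 were chosen for illustration. All of these are assumptions, and the function name is hypothetical.

```python
import math
import random


def slope_power(beta, n, reps, seed, crit=1.96):
    """Share of replications in which the OLS slope of y on x is significant.

    Data are generated as y = beta * x + e with x and e standard normal.
    With beta != 0 this estimates power; with beta == 0 it estimates Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx
        a = my - b * mx
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n - 2) / sxx)  # OLS standard error of the slope
        if abs(b / se) > crit:
            rejections += 1
    return rejections / reps


print(slope_power(beta=0.2, n=100, reps=500, seed=1))  # estimated power
print(slope_power(beta=0.0, n=100, reps=500, seed=1))  # estimated Type I error
```

Rerunning with different values of n shows how the estimated power changes with sample size. This is the kind of question a Monte Carlo study is used to answer.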
Mplus has extensive Monte Carlo simulation facilities for both data
generation and data analysis. Several types of data can be generated:
simple random samples, clustered (multilevel) data, missing data, and
data from populations that are observed (multiple groups) or unobserved
(latent classes). Data generation models can include random effects,
interactions between continuous latent variables, interactions between
continuous latent variables and observed variables, and interactions between
categorical latent variables. Dependent variables can be continuous,
censored, binary, ordered categorical (ordinal), unordered categorical
(nominal), counts, or combinations of these variable types. In addition,
two-part (semicontinuous) variables and time-to-event variables can be
generated. Independent variables can be binary or continuous. All or
some of the Monte Carlo generated data sets can be saved.
The analysis model can be different from the data generation model. For
example, variables can be generated as categorical and analyzed as
continuous or data can be generated as a three-class model and analyzed
as a two-class model. In some situations, a special external Monte Carlo
feature is needed to generate data by one model and analyze it by a
different model. For example, variables can be generated using a
clustered design and analyzed ignoring the clustering. Data generated
outside of Mplus can also be analyzed using this special Monte Carlo
feature.
Other special features that can be used with Monte Carlo simulation
studies include saving parameter estimates from the analysis of real data
to be used as population parameter and/or coverage values for data
generation in a Monte Carlo simulation study. In addition, analysis
results from each replication of a Monte Carlo simulation study can be
Monte Carlo data generation can include the following special features:
Internal Monte Carlo can be used whenever the analysis type and scales
of the dependent variables remain the same for both data generation and
analysis. Internal Monte Carlo can also be used with TYPE=GENERAL
when dependent variables are generated as categorical and analyzed as
continuous. Internal Monte Carlo can also be used when data are
generated and analyzed for a different number of latent classes. In all
other cases, data from all replications can be saved and subsequently
analyzed using external Monte Carlo.
Degrees of freedom 5
Mean 5.253
Std Dev 3.325
Number of successful computations 500
Proportions Percentiles
Expected Observed Expected Observed
0.990 0.988 0.554 0.372
0.980 0.976 0.752 0.727
0.950 0.958 1.145 1.193
0.900 0.894 1.610 1.539
0.800 0.804 2.343 2.367
0.700 0.710 3.000 3.090
0.500 0.532 4.351 4.555
0.300 0.330 6.064 6.480
0.200 0.242 7.289 7.870
0.100 0.136 9.236 9.950
0.050 0.062 11.070 11.576
0.020 0.022 13.388 13.394
0.010 0.014 15.086 15.146
The mean and standard deviation of the chi-square test statistic over the
replications of the Monte Carlo analysis are given. The column labeled
Proportions Expected (column 1) should be understood in conjunction
with the column labeled Percentiles Expected (column 3). Each value in
column 1 gives the probability of observing a chi-square value greater
than the corresponding value in column 3. The column 3 percentile
values are determined from a chi-square distribution with the degrees of
freedom given by the model, in this case 5. In this output, the column 1
value of 0.05 gives the probability that the chi-square value exceeds the
column 3 percentile value (the critical value of the chi-square
distribution) of 11.070. Columns 2 and 4 give the corresponding values
observed in the Monte Carlo replications. Column 2 gives the
proportion of replications for which the critical value is exceeded, which
for the 0.050 row is 0.062.
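The two Expected columns can be checked directly. Column 1 is the upper-tail probability of a chi-square distribution with 5 degrees of freedom, evaluated at the column 3 value. Below is a stdlib-only Python sketch that uses the closed-form series for integer degrees of freedom. The helper names and the toy list of statistics are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # even df: exp(-h) * sum_{i < df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # odd df: erfc(sqrt(h)) plus half-integer terms h^(i + 1/2) / Gamma(i + 3/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for i in range((df - 1) // 2):
        total += math.exp(-h) * term
        term *= h / (i + 1.5)
    return total


def proportion_exceeding(chi_squares, critical_value):
    """Observed share of replications whose statistic exceeds the critical value."""
    return sum(s > critical_value for s in chi_squares) / len(chi_squares)


print(round(chi2_sf(11.070, 5), 3))  # 0.05
print(proportion_exceeding([1.0, 12.0, 3.0, 20.0], 11.070))  # 0.5
```

Applied to the chi-square values from all replications, proportion_exceeding gives the Observed proportion that is compared with the Expected value.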
The summary of the analysis results includes the population value for
each parameter, the average of the parameter estimates across
replications, the standard deviation of the parameter estimates across
replications, the average of the estimated standard errors across
replications, the mean square error for each parameter (M.S.E.), 95
percent coverage, and the proportion of replications for which the null
hypothesis that a parameter is equal to zero is rejected at the .05 level.
MODEL RESULTS
         Population  Average  Std. Dev.  S.E. Average  M.S.E.  95% Cover  % Sig Coeff
S |
Y1 0.000 0.0000 0.0000 0.0000 0.0000 1.000 0.000
Y2 1.000 1.0000 0.0000 0.0000 0.0000 1.000 0.000
Y3 2.000 2.0000 0.0000 0.0000 0.0000 1.000 0.000
Y4 3.000 3.0000 0.0000 0.0000 0.0000 1.000 0.000
I WITH
S 0.000 0.0006 0.0301 0.0306 0.0009 0.958 0.042
Means
I 0.000 -0.0006 0.0473 0.0460 0.0022 0.950 0.050
S 0.200 0.2015 0.0278 0.0274 0.0008 0.946 1.000
Variances
I 0.500 0.4969 0.0704 0.0685 0.0050 0.936 1.000
S 0.200 0.1997 0.0250 0.0237 0.0006 0.930 1.000
Residual Variances
Y1 0.500 0.5016 0.0683 0.0657 0.0047 0.934 1.000
Y2 0.500 0.5018 0.0460 0.0451 0.0021 0.958 1.000
Y3 0.500 0.5025 0.0515 0.0532 0.0027 0.956 1.000
Y4 0.500 0.4991 0.0932 0.0918 0.0087 0.946 1.000
The column labeled Std. Dev. gives the standard deviation of the
parameter estimates across the replications of the Monte Carlo
simulation study. When the number of replications is large, this is
considered to be the population standard error. The column labeled S.E.
Average gives the average of the estimated standard errors across
replications of the Monte Carlo simulation study. To determine standard
error bias, subtract the population standard error value from the average
standard error value, divide this number by the population standard error
value, and multiply by 100.
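Below is the standard error bias calculation as a small Python function. It is applied to the variance of I from the output above (Std. Dev. 0.0704, S.E. Average 0.0685). The function name is hypothetical.

```python
def se_bias_percent(population_se, average_se):
    """Relative standard error bias in percent.

    population_se: standard deviation of the estimates across replications (Std. Dev.)
    average_se: average of the estimated standard errors (S.E. Average)
    """
    return (average_se - population_se) / population_se * 100.0


print(round(se_bias_percent(0.0704, 0.0685), 1))  # -2.7
```

A negative value means the estimated standard errors are on average somewhat too small relative to the empirical variability of the estimates.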
The column labeled M.S.E. gives the mean square error for each
parameter. M.S.E. is equal to the variance of the estimates across the
replications plus the square of the bias. For example, the M.S.E. for the
variance of i is equal to 0.0704 squared plus (0.4969 - 0.5) squared,
which is 0.00497, or 0.0050 after rounding. The column labeled 95% Cover
gives the proportion of replications for which the 95% confidence
interval contains the population parameter value. This gives the
coverage which indicates how well the parameters and their standard
errors are estimated. In this output, all coverage values are close to the
correct value of 0.95.
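Below are the M.S.E. and coverage definitions as Python functions. The M.S.E. check reproduces the variance of i example above. The coverage function assumes a symmetric normal-theory interval (estimate plus or minus 1.96 standard errors). The function names and the coverage inputs are hypothetical.

```python
def mse(population, average, sd):
    """M.S.E. as described above: variance of the estimates plus squared bias."""
    return sd ** 2 + (average - population) ** 2


def coverage(estimates, standard_errors, population, z=1.96):
    """Share of replications whose estimate +/- z * SE interval contains the population value."""
    hits = sum(
        est - z * se <= population <= est + z * se
        for est, se in zip(estimates, standard_errors)
    )
    return hits / len(estimates)


print(round(mse(0.5, 0.4969, 0.0704), 5))          # 0.00497
print(coverage([0.5, 0.9, 0.45], [0.1, 0.1, 0.1], 0.5))
```

Coverage well below 0.95 points to biased estimates, underestimated standard errors, or both.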
The column labeled % Sig Coeff gives the proportion of replications for
which the null hypothesis that a parameter is equal to zero is rejected at
the .05 level (two-tailed test with a critical value of 1.96). The statistical
test is the ratio of the parameter estimate to its standard error, an
approximately normally distributed quantity (z-score) in large samples.
For parameters with population values different from zero, this value is
an estimate of power with respect to a single parameter, that is, the
probability of rejecting the null hypothesis when it is false. For
parameters with population values equal to zero, this value is an estimate
of Type I error, that is, the probability of rejecting the null hypothesis
when it is true. In this output, the power to reject the hypothesis that
the slope growth factor mean is zero is estimated as 1.000, well above
the conventional standard of 0.8 power.
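Below is the % Sig Coeff column as a Python function: the share of replications in which the ratio of estimate to standard error exceeds 1.96 in absolute value. Read it as power when the population value is nonzero, and as a Type I error rate when it is zero. The function name and inputs are hypothetical.

```python
def percent_significant(estimates, standard_errors, crit=1.96):
    """Share of replications where |estimate / SE| exceeds the two-tailed .05 critical value."""
    rejected = sum(abs(est / se) > crit for est, se in zip(estimates, standard_errors))
    return rejected / len(estimates)


print(percent_significant([0.2, 0.05, -0.3, 0.01], [0.05] * 4))  # 0.5
```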
In this example, data are generated and analyzed according to the CFA
with covariates (MIMIC) model described in Example 5.8. Two factors
are regressed on two covariates and data are generated with patterns of
missing data.
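The PATMISS and PATPROBS values in this example's MONTECARLO command can be summarized as expected overall proportions missing per variable. Below is a minimal Python sketch. It assumes our reading of these options: PATPROBS gives the probability that an observation follows each pattern, and PATMISS gives the proportion missing for each variable within a pattern. The helper name is hypothetical.

```python
def marginal_missing(patmiss, patprobs):
    """Expected overall proportion missing per variable.

    patmiss: one dict per pattern mapping variable -> proportion missing in that pattern.
    patprobs: probability of each pattern (should sum to one).
    """
    if abs(sum(patprobs) - 1.0) > 1e-9:
        raise ValueError("pattern probabilities must sum to one")
    names = patmiss[0].keys()
    return {v: sum(p * pattern[v] for p, pattern in zip(patprobs, patmiss)) for v in names}


# Values taken from the PATMISS and PATPROBS options in this example.
PATMISS = [
    {"y1": 0.1, "y2": 0.2, "y3": 0.3, "y4": 1.0},
    {"y1": 1.0, "y2": 0.1, "y3": 0.2, "y4": 0.3},
]
PATPROBS = [0.4, 0.6]
print(marginal_missing(PATMISS, PATPROBS))
```

Under these assumptions, about 64, 14, 24, and 58 percent of the values of y1, y2, y3, and y4 are expected to be missing.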
The TITLE command is used to provide a title for the output. The title
is printed in the output just before the Summary of Analysis.
MONTECARLO:
NAMES ARE y1-y4 x1 x2;
NOBSERVATIONS = 500;
NREPS = 500;
SEED = 4533;
CUTPOINTS = x2(1);
PATMISS = y1(.1) y2(.2) y3(.3) y4(1) |
y1(1) y2(.1) y3(.2) y4(.3);
PATPROBS = .4 | .6;
MODEL POPULATION:
[x1-x2@0];
x1-x2@1;
f BY y1@1 y2-y4*1;
f*.5;
y1-y4*.5;
f ON x1*1 x2*.3;
for the analysis model can also be provided using the MODEL
COVERAGE command or the COVERAGE option of the
MONTECARLO command. Alternate starting values can be provided
using the STARTING option of the MONTECARLO command. Note
that the population parameter values for coverage given in the analysis
model are different from the population parameter values used for data
generation if the analysis model is misspecified.
OUTPUT: TECH9;
MODEL MISSING:
[y1-y4@-1];
y1 ON x1*.4 x2*.2;
y2 ON x1*.8 x2*.4;
y3 ON x1*1.6 x2*.8;
y4 ON x1*3.2 x2*1.6;
MODEL: i s | y1@0 y2@1 y3@2 y4@3;
[i*1 s*2];
i*1; s*.2; i WITH s*.1;
y1-y4*.5;
i ON x1*1 x2*.5;
s ON x1*.4 x2*.25;
OUTPUT: TECH9;
In this example, data are generated according to the two-class model
described in Example 8.1 and analyzed as a one-class model. This
results in a misspecified model. Differences between the parameter
values that generated the data and the estimated parameters can be
studied to determine the extent of the distortion.
The commented-out lines in the MODEL command show how the
MODEL command is changed from a two-class model to a one-class
model. An explanation of the other commands can be found in
Examples 12.1 and 8.1.
MODEL MISSING:
[y1-y4@-1];
y1 ON x*.4;
y2 ON x*.8;
y3 ON x*1.6;
y4 ON x*3.2;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
iw sw | y1@0 y2@1 y3@2 y4@3;
y1-y4*.5;
iw ON x*1;
sw ON x*.25;
iw*1; sw*.2;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2 y4@3;
y1-y4@0;
ib ON w*.5;
sb ON w*.25;
[ib*1 sb*.5];
ib*.2; sb*.1;
OUTPUT: TECH9 NOCHISQUARE;
In this example, data for the two-level growth model for a continuous
outcome (three-level analysis) described in Example 9.12 are generated
and analyzed. This Monte Carlo simulation study can be used to
estimate the power to detect that the binary cluster-level covariate w has
a significant effect on the growth slope factor sb.
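Power in a study like this is the proportion of replications in which the null hypothesis that the parameter (here the regression of sb on w) is zero is rejected. The Python sketch below shows that calculation, assuming a two-sided z test (estimate divided by standard error) at the .05 level; the function name and the three replication values are hypothetical and are not Mplus output.

```python
from statistics import NormalDist

def monte_carlo_power(estimates, std_errors, alpha=0.05):
    """Share of replications in which H0: parameter = 0 is rejected
    by a two-sided z test at level alpha."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    rejections = sum(
        1 for est, se in zip(estimates, std_errors) if abs(est / se) > crit
    )
    return rejections / len(estimates)

# Hypothetical estimates and standard errors of sb ON w from three replications.
print(monte_carlo_power([0.30, 0.05, 0.25], [0.10, 0.10, 0.10]))
```

With 500 replications, the same ratio is simply taken over all 500 estimate and standard error pairs.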
one as the default. The factors are correlated under the default oblique
GEOMIN rotation.
sb ON w*.25;
[ib*1 sb*.5];
ib*.2; sb*.1;
MODEL MISSING:
[y1-y4@-1];
y1 ON x*.4;
y2 ON x*.8;
y3 ON x*1.6;
y4 ON x*3.2;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
iw sw | y1@0 y2@1 y3@2 y4@3;
y1-y4*.5;
iw ON x*1;
sw ON x*.25;
iw*1; sw*.2;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2 y4@3;
y1-y4@0;
ib ON w*.5;
sb ON w*.25;
[ib*1 sb*.5];
ib*.2; sb*.1;
OUTPUT: TECH8 TECH9;
In this example, clustered data are generated and analyzed for the two-level
growth model for a continuous outcome (three-level analysis)
described in Example 9.12. The data are saved for a subsequent external
Monte Carlo simulation study. The REPSAVE and SAVE options of the
MONTECARLO command are used to save some or all of the data sets
generated in a Monte Carlo simulation study. The REPSAVE option
specifies the numbers of the replications for which the data will be
saved. In the example above, the keyword ALL specifies that all of the
data sets will be saved. The SAVE option is used to name the files to
which the data sets will be written. The asterisk (*) is replaced by the
replication number. For example, data from the first replication will be
saved in the file named ex12.6rep1.dat. A file is also produced where
the asterisk (*) is replaced by the word list. The file, in this case
ex12.6replist.dat, contains the names of the generated data sets. The
ANALYSIS command is used to describe the technical details of the
analysis. By selecting TYPE=TWOLEVEL, a multilevel model is
estimated. An explanation of the other commands can be found in
Examples 12.1, 12.2, 12.4 and Example 9.12.
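The naming rule for the saved files can be sketched in a few lines of Python. The SAVE setting itself is not shown in this excerpt; the pattern ex12.6rep*.dat is inferred from the file names given above, and the helper expand_save_pattern is hypothetical, written only to illustrate the substitution.

```python
def expand_save_pattern(pattern, nreps):
    """Mimic the naming rule described above: '*' becomes the replication
    number for each data file and the word 'list' for the list file."""
    data_files = [pattern.replace("*", str(r)) for r in range(1, nreps + 1)]
    list_file = pattern.replace("*", "list")
    return data_files, list_file

files, listing = expand_save_pattern("ex12.6rep*.dat", 3)
print(files)    # ['ex12.6rep1.dat', 'ex12.6rep2.dat', 'ex12.6rep3.dat']
print(listing)  # ex12.6replist.dat
```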
In this example, parameter estimates saved from a real data analysis are
used for population parameter values for data generation and coverage
using the POPULATION and COVERAGE options of the
MONTECARLO command. The POPULATION option is used to name
the data set that contains the population parameter values to be used in
data generation. The COVERAGE option is used to name the data set
that contains the parameter values to be used for computing coverage.
These values are printed in the first column of the output, labeled
Population. An explanation of the other commands can be found in
Example 12.1.
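Coverage is the proportion of replications whose confidence interval contains the population value. A minimal Python sketch, assuming a symmetric 95% interval of the estimate plus or minus 1.96 standard errors; the function and the replication values are hypothetical.

```python
def coverage(estimates, std_errors, population_value, z=1.96):
    """Proportion of replications whose interval (estimate +/- z * SE)
    contains the population value."""
    hits = sum(
        1 for est, se in zip(estimates, std_errors)
        if est - z * se <= population_value <= est + z * se
    )
    return hits / len(estimates)

# Hypothetical replications for a parameter whose population value is .5.
print(coverage([0.50, 0.90, 0.45], [0.10, 0.10, 0.10], 0.5))
```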
In this example, data are generated and analyzed for the two-level
continuous-time survival analysis using Cox regression with a random
intercept and a frailty shown in Example 9.16. Monte Carlo simulation
of continuous-time survival models is described in Asparouhov et al.
(2006).
MODEL:
%WITHIN%
c | y ON x;
b | y ON m;
a | m ON x;
m*1; y*1;
%BETWEEN%
y WITH m*0.1 b*0.1 a*0.1 c*0.1;
m WITH b*0.1 a*0.1 c*0.1;
a WITH b*0.1 (cab);
a WITH c*0.1;
b WITH c*0.1;
y*1 m*1 a*1 b*1 c*1;
[a*0.4] (ma);
[b*0.5] (mb);
[c*0.6];
MODEL CONSTRAINT:
NEW(m*0.3);
m=ma*mb+cab;
The label cab is assigned to the covariance between the random slopes a
and b. The labels ma and mb are assigned to the means of the random
slopes a and b. These labels are used in the MODEL CONSTRAINT
command. The MODEL CONSTRAINT command is used to define
linear and non-linear constraints on the parameters in the model. In the
MODEL CONSTRAINT command, the NEW option is used to
introduce a new parameter that is not part of the MODEL command.
The new parameter m is the indirect effect of the covariate x on the
outcome y. The two outcomes y and m can also be categorical. For a
discussion of indirect effects when the outcome y is categorical, see
MacKinnon et al. (2007).
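The constraint m = ma*mb + cab follows from the identity E(ab) = E(a)E(b) + Cov(a, b) for the two random slopes a and b. With the values used above (ma = .4, mb = .5, cab = .1), it gives .3, which matches the starting value in NEW(m*0.3). The Python sketch below checks the identity by simulation, assuming the slopes are bivariate normal with variances of 1 as in the statement a*1 b*1; both helper names are hypothetical.

```python
import math
import random

def indirect_effect(mean_a, mean_b, cov_ab):
    # E(a*b) = E(a)E(b) + Cov(a, b), i.e. m = ma*mb + cab
    return mean_a * mean_b + cov_ab

def simulated_mean_ab(mean_a, mean_b, var_a, var_b, cov_ab, n=200_000, seed=1):
    """Draw (a, b) from a bivariate normal and average the product a*b."""
    rng = random.Random(seed)
    sd_a = math.sqrt(var_a)
    slope = cov_ab / var_a                        # regression of b on a
    resid_sd = math.sqrt(var_b - slope ** 2 * var_a)
    total = 0.0
    for _ in range(n):
        a = mean_a + sd_a * rng.gauss(0.0, 1.0)
        b = mean_b + slope * (a - mean_a) + resid_sd * rng.gauss(0.0, 1.0)
        total += a * b
    return total / n

print(indirect_effect(0.4, 0.5, 0.1))              # about .3
print(simulated_mean_ab(0.4, 0.5, 1.0, 1.0, 0.1))  # close to .3
```

Dropping cab from the constraint would understate the average indirect effect whenever the two random slopes are correlated.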
Examples: Special Features
CHAPTER 13
EXAMPLES: SPECIAL FEATURES
The example above is based on Example 5.1 in which individual data are
analyzed. In this example, a covariance matrix is analyzed. The TYPE
option is used to specify that the input data set is a covariance matrix.
The NOBSERVATIONS option is required for summary data and is
used to indicate how many observations are in the data set used to create
the covariance matrix. Summary data are required to be in an external
data file in free format. Following is an example of the data:
1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0
.65 .87 .32 .56 1.0
.66 .78 .43 .45 .33 1.0
The example above is based on Example 5.9 in which individual data are
analyzed. In this example, means and a covariance matrix are analyzed.
The TYPE option is used to specify that the input data set contains
means and a covariance matrix. The NOBSERVATIONS option is
required for summary data and is used to indicate how many
observations are in the data set used to create the means and covariance
matrix. Summary data are required to be in an external data file in free
format. Following is an example of the data. The means come first
followed by the covariances. The covariances must start on a new
record.
.4 .6 .3 .5
1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0
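The free-format layout of both summary data sets above (optional means, then the lower triangle of the covariance matrix row by row) can be read back into a full symmetric matrix as sketched below in Python. This is an illustration of the layout only, not how Mplus reads the file; the function name is hypothetical and it assumes the number of variables is known.

```python
CORRELATIONS = """1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0
.65 .87 .32 .56 1.0
.66 .78 .43 .45 .33 1.0"""

MEANS_AND_COVARIANCES = """.4 .6 .3 .5
1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0"""

def read_summary_data(text, n_vars, has_means=False):
    """Return (means, full symmetric matrix) from free-format summary data."""
    values = [float(tok) for tok in text.split()]
    means = None
    if has_means:
        means, values = values[:n_vars], values[n_vars:]
    if len(values) != n_vars * (n_vars + 1) // 2:
        raise ValueError("lower triangle has the wrong number of values")
    matrix = [[0.0] * n_vars for _ in range(n_vars)]
    pos = 0
    for i in range(n_vars):
        for j in range(i + 1):                 # row i holds i + 1 values
            matrix[i][j] = matrix[j][i] = values[pos]
            pos += 1
    return means, matrix

means, cov = read_summary_data(MEANS_AND_COVARIANCES, 4, has_means=True)
print(means)
print(cov[1])
```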
The example above is based on Example 5.11 in which the data contain
no missing values. In this example, there are missing values and the
asterisk (*) is used as a missing value flag. The MISSING option is used
to identify the values or symbol in the analysis data set that will be
treated as missing or invalid. Non-numeric missing value flags are
applied to all variables in the data set.
The example above is based on Example 5.11 in which the data contain
no missing values. In this example, there are missing values and
numeric missing value flags are used. The MISSING option is used to
identify the values or symbol in the analysis data set that will be treated
as missing or invalid. Numeric missing value flags can be applied to a
The example above is based on Example 3.11 in which the entire data
set is analyzed. In this example, a subset of variables and a subset of
observations are analyzed. The USEVARIABLES option is used to
select variables for an analysis. In the example above, y1, y2, y3, x1, x2,
and x3 are selected. The USEOBSERVATIONS option is used to select
observations for an analysis by specifying a conditional statement. In
the example above, individuals with the value of 2 on variable x4 are
included in the analysis.
The example above is based on Example 3.11 where the variables are
not transformed. In this example, two variables are transformed using
the DEFINE command. The variable y1 is transformed by dividing it by
100. The variable x3 is transformed by taking the square root of it. The
transformed variables are used in the estimation of the model. The
DEFINE command can also be used to create new variables.
This example is based on the model in Example 5.1 where there are no
equality constraints on model parameters. In the example above, several
model parameters are constrained to be equal. Equality constraints are
specified by placing the same number in parentheses following the
parameters that are to be held equal. The label (1-2) following the factor
loadings uses the list function to assign equality labels to these
parameters. The label 1 is assigned to the factor loadings of y2 and y5
which holds these factor loadings equal. The label 2 is assigned to the
factor loadings of y3 and y6 which holds these factor loadings equal.
The third equality statement holds the residual variances of y1, y2, and
y3 equal using the label (3), and the fourth equality statement holds the
residual variances of y4, y5, and y6 equal using the label (4).
This example is based on Example 5.15 in which the model has two
groups. In this example, the model has three groups. Parameters are
constrained to be equal by placing the same number in parentheses
following the parameters that will be held equal. In multiple group
analysis, the overall MODEL command is used to set equalities across
groups. The group-specific MODEL commands are used to specify
equalities for specific groups or to relax equalities specified in the
overall MODEL command. In the example above, the first equality
statement holds the variance of f1 equal across the three groups in the
analysis using the equality label 1. The second equality statement holds
the residual variances of y1, y2, and y3 equal to each other and equal
across groups using the equality label 2. The third equality statement
uses the list function to hold the residual variances of y4, y5, and y6 equal
across groups by assigning the equality label 3 to the residual variance of
y4, the label 4 to the residual variance of y5, and the label 5 to the
residual variance of y6. The fourth and fifth equality statements hold
the variance of f2 equal across groups g1 and g3 using the equality label
6.
The input setup above shows the first step needed to do a chi-square
difference test for the WLSMV and MLMV estimators. In this analysis,
the less restrictive H1 model is estimated. The DIFFTEST option of the
SAVEDATA command is used to save the derivatives of the H1 model
for use in the second step of the analysis. The DIFFTEST option is used
to specify the name of the file in which the derivatives from the H1
model will be saved. In the example above, the file name is deriv.dat.
The input setup above shows the second step needed to do a chi-square
difference test for the WLSMV and MLMV estimators. In this analysis,
the more restrictive H0 model is estimated. The restriction is that the
covariances among the factors are fixed at zero in this model. The
DIFFTEST option of the ANALYSIS command is used to specify the
name of the file that contains the derivatives of the H1 model that was
estimated in the first step of the analysis. This file is deriv.dat.
The example above is based on Example 5.1 in which a single data set is
analyzed. In this example, data sets generated using multiple imputation
are analyzed. The FILE option of the DATA command is used to give
the name of the file that contains the names of the multiple imputation
data sets to be analyzed. The file named using the FILE option of the
DATA command must contain a list of the names of the multiple
imputation data sets to be analyzed. This file must be created by the
user unless the data are imputed using the DATA IMPUTATION
command in which case the file is created as part of the multiple
imputation. Each record of the file must contain one data set name. For
example, if five data sets are being analyzed, the contents of implist.dat
would be:
imp1.dat
imp2.dat
imp3.dat
imp4.dat
imp5.dat
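When the imputed data sets come from outside Mplus, the list file has to be written by hand or by a short script. A minimal Python sketch that writes implist.dat with one data set name per record, assuming the five imputed files are named imp1.dat through imp5.dat as in the listing above; the helper name is hypothetical.

```python
from pathlib import Path

def write_imputation_list(list_path, data_files):
    """Write one data set name per record, as required for the FILE option."""
    Path(list_path).write_text("".join(f"{name}\n" for name in data_files))

names = [f"imp{i}.dat" for i in range(1, 6)]
write_imputation_list("implist.dat", names)
print(Path("implist.dat").read_text())
```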
The example above is based on Example 3.11 in which the analysis data
are not saved. In this example, the SAVEDATA command is used to
save the analysis data set. The FILE option is used to specify the name
of the ASCII file in which the individual data used in the analysis will be
saved. In this example, the data will be saved in the file regress.sav.
The data are saved in fixed format as the default unless the FORMAT
option of the SAVEDATA command is used.
The example above is based on Example 5.8 in which factor scores are
not saved. In this example, the SAVEDATA command is used to save
the analysis data set and factor scores. The FILE option is used to
specify the name of the ASCII file in which the individual data used in
the analysis will be saved. In this example, the data will be saved in the
file mimic.sav. The SAVE option is used to specify that factor scores
will be saved along with the analysis data. The data are saved in fixed
This example shows how to merge two data sets using TYPE=BASIC.
Merging can be done with any analysis type. The first data set data1.dat
is named using the FILE option of the DATA command. The second
data set data2.dat is named using the MFILE option of the SAVEDATA
command. The NAMES option of the VARIABLE command gives the
names of the variables in data1.dat. The MNAMES option of the
SAVEDATA command gives the names of the variables in data2.dat.
The IDVARIABLE option of the VARIABLE command gives the name
of the variable to be used for merging. This variable must appear on
both the NAMES and MNAMES statements. The merged data set
data12.dat is saved in the file named using the FILE option of the
SAVEDATA command. The default format for this file is free and the
default missing value flag is the asterisk (*). These defaults can be
changed using the FORMAT and MISSFLAG options as shown above.
In the merged data set data12.dat, the missing value flags of asterisk (*)
in data1.dat and 99 in data2.dat are replaced by 999.
flag. If the data are not in free format, the FORMAT statement can be
used to specify a fixed format.
This example shows how to generate, use, and save replicate weights in
a factor analysis. Replicate weights summarize information about a
complex sampling design (Korn & Graubard, 1999; Lohr, 1999;
Asparouhov & Muthén, 2009b). When replicate weights are generated,
the REPSE option of the ANALYSIS command and the WEIGHT option
of the VARIABLE command along with the STRATIFICATION and/or
CLUSTER options of the VARIABLE command are used. The
WEIGHT option is used to identify the variable that contains sampling
weight information. In this example, the sampling weight variable is
weight. The STRATIFICATION option is used to identify the variable
in the data set that contains information about the subpopulations from
which independent probability samples are drawn. In this example, the
variable is strat. The CLUSTER option is used to identify the variable
in the data set that contains clustering information. In this example, the
variable is psu. Replicate weights can be generated and analyzed only
with TYPE=COMPLEX. The REPSE option is used to specify the
resampling method that will be used to create the replicate weights. The
setting BOOTSTRAP specifies that bootstrap draws will be used. The
BOOTSTRAP option specifies that 100 bootstrap draws will be carried
out. When replicate weights are generated, they can be saved for further
analysis using the FILE and SAVE options of the SAVEDATA
command. Replicate weights will be saved along with the other analysis
variables in the file named rweights.sav.
Special Modeling Issues
CHAPTER 14
SPECIAL MODELING ISSUES
• Model estimation
• Multiple group analysis
• Missing data
• Categorical mediating variables
• Calculating probabilities from probit regression coefficients
• Calculating probabilities from logistic regression coefficients
• Parameterization of models with more than one categorical latent variable
MODEL ESTIMATION
There are several important issues involved in model estimation beyond
specifying the model. The following general analysis considerations are
discussed below:
group and last class and are free and unequal in the other groups or
classes except when a categorical latent variable is regressed on a
continuous latent variable. In this case, the means and intercepts of
continuous latent variables are fixed at zero in all classes.
• Logit means and intercepts of categorical latent variables are fixed at
zero in the last class and free and unequal in the other classes.
GENERAL DEFAULTS
For situations where starting values depend on the analysis, the starting
values can be found using the TECH1 option of the OUTPUT command.
When growth models are specified using the | symbol of the MODEL
command and the outcome is continuous or censored, automatic starting
values for the growth factor means and variances are generated based on
individual regressions of the outcome variable on time. For other
outcome types, the defaults above apply.
class indicators, the threshold starting values for each variable must be
ordered from low to high. The exception to this is when equality
constraints are placed on adjacent thresholds for a variable in which case
the same starting value is used. It is a good idea to start the classes apart
from each other.
possible that a local solution has been reached, and the results should not
be interpreted without further investigation. Following is an example of
a set of ten final stage solutions that point to a good solution because all
of the final stage solutions have the same loglikelihood value:
-836.899 902278 21
-836.899 366706 29
-836.899 903420 5
-836.899 unperturbed 0
-836.899 27071 15
-836.899 967237 48
-836.899 462953 7
-836.899 749453 33
-836.899 637345 19
-836.899 392418 28
-835.247 902278 21
-837.132 366706 29
-840.786 903420 5
-840.786 unperturbed 0
-840.786 27071 15
-853.684 967237 48
-867.123 462953 7
-890.442 749453 33
-905.512 637345 19
-956.774 392418 28
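The check described above amounts to counting how many final stage solutions reach the highest loglikelihood. A minimal Python sketch using the values from the two listings; the tolerance of .001 is an assumption chosen to match the three decimals printed, and the function name is hypothetical.

```python
def best_loglikelihood_replicated(logliks, tol=1e-3):
    """Return the best loglikelihood and how many solutions reach it."""
    best = max(logliks)
    return best, sum(1 for ll in logliks if abs(ll - best) <= tol)

first_set = [-836.899] * 10
second_set = [-835.247, -837.132, -840.786, -840.786, -840.786,
              -853.684, -867.123, -890.442, -905.512, -956.774]
print(best_loglikelihood_replicated(first_set))   # (-836.899, 10)
print(best_loglikelihood_replicated(second_set))  # (-835.247, 1)
```

In the first listing the best value is reached by all ten solutions; in the second it is reached by only one of them.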
extracted. If the parameter values are very similar across the solutions,
the solution with the highest loglikelihood should be chosen.
CONVERGENCE PROBLEMS
Some combinations of models and data may cause convergence
problems. A message to this effect is found in the output. Convergence
problems are often related to variables in the model being measured on
very different scales, poor starting values, and/or a model being
estimated that is not appropriate for the data. In addition, certain models
are more likely to have convergence problems. These include mixture
models, two-level models, and models with random effects that have
small variances.
For both types of convergence problems, the first thing to check is that
the variables are measured on similar scales. Convergence problems
may occur when the sample variances of the variables fall well outside
the range of 1 to 10. This is particularly important with combinations
of categorical and continuous outcomes.
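A quick screen for this problem is to compute each sample variance before the analysis. The Python sketch below reads the guideline as keeping each variance roughly between 1 and 10, which is an interpretation; the function and the data are hypothetical. Dividing a variable by a constant c divides its variance by c squared, which is what a transformation such as dividing y1 by 100 in the DEFINE command accomplishes.

```python
from statistics import variance

def flag_variances(data, low=1.0, high=10.0):
    """Return the sample variances that fall outside [low, high]."""
    flagged = {}
    for name, values in data.items():
        v = variance(values)
        if not low <= v <= high:
            flagged[name] = v
    return flagged

raw = {"y1": [100, 250, 400, 175], "x3": [1.2, 3.4, 2.2, 5.0]}
print(flag_variances(raw))        # y1 is flagged

# Rescale y1 by 100: its variance shrinks by a factor of 100 ** 2.
rescaled = {"y1": [v / 100 for v in raw["y1"]], "x3": raw["x3"]}
print(flag_variances(rescaled))   # nothing flagged
```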
Random effect models can have convergence problems when the random
effect variables have small variances. Problems can arise in models in
which random effect variables are defined using the ON or AT options
of the MODEL command in conjunction with the | symbol of the
MODEL IDENTIFICATION
Not all models that can be specified in the program are identified. A
non-identified model is one that does not have meaningful estimates for
all of its parameters. Standard errors cannot be computed for non-
identified models because of a singular Fisher information matrix.
When a model is not identified, an error message is printed in the output.
In most cases, the error message gives the number of the parameter that
contributes to the non-identification. The parameter to which the
number applies is found using the TECH1 option of the OUTPUT
command. Additional restrictions on the parameters of the model are
often needed to make the model identified.
NUMERICAL INTEGRATION
Numerical integration is required for maximum likelihood estimation
when the posterior distribution of the latent variable does not have a
closed form expression. In the table below, the ON and BY statements
that require numerical integration are designated by a single or double
asterisk (*). A single asterisk (*) indicates that numerical integration is
always required. A double asterisk (**) indicates that numerical
integration is required when the mediating variable has missing data.
Numerical integration is also required for models with interactions
involving continuous latent variables and for certain models with
random slopes such as multilevel mixture models.
the first group is males. For summary data, the first group is the group
with the label, g1. This group is the group represented by the first set of
summary data found in the summary data set.
All structural parameters are free and not constrained to be equal across
groups as the default. Structural parameters include factor means,
variances, and covariances, and regression coefficients. Factor means
are fixed at zero in the first group and are free to be estimated in the
other groups as the default. This is because factor means generally
cannot be identified for all groups. The customary approach is to set the
factor means to zero in a reference group, here the first group.
MODEL: f1 BY y1 y2 y3;
f2 BY y4 y5 y6;
of each factor should not be included because including them frees their
factor loadings which should be fixed at one to set the metric of the
factors.
Factor means are fixed at zero in the first group and are estimated in
each of the other groups. The following group-specific MODEL
command relaxes the equality constraints on the intercepts and
thresholds of the observed dependent variables:
MODEL: f1 BY y1-y5;
f2 BY y6-y10;
f1 ON f2;
MODEL g1: f1 BY y5;
MODEL g2: f2 BY y9;
Differences between the overall model and the group-specific models are
specified using the MODEL command followed by a label. The two
group-specific MODEL commands above specify differences between
the overall model and the group-specific models. In the above example,
the factor loading for y5 in group g1 is not constrained to be equal to the
factor loading for y5 in the other two groups and the factor loading for
y9 in group g2 is not constrained to be equal to the factor loading for y9
in the other two groups. The model for g3 is identical to that of the
overall model because there is no group-specific model statement for g3.
y1 ON x1 (1) ;
y2 ON x2 (1) ;
y3 ON x3 (2) ;
y4 ON x4 (2) ;
y5 ON x5 (2) ;
MODEL: f1 BY y1-y5;
y1 (1)
y2 (2)
y3 (3)
y4 (4)
y5 (5);
equal to each other. Note that only one equality constraint can be
specified per line.
MODEL: f1 BY y1-y5;
y1-y5 (1);
MODEL: f1 BY y1-y5;
y1-y5 (1);
The overall MODEL command specifies the overall model for the three
groups as described above. Because there is no group-specific MODEL
command for g1, g1 uses the same model as that described in the overall
MODEL command. The group-specific MODEL commands describe
the differences between the overall model and the group-specific
models. The group g2 uses the overall model with the exception that the
one residual variance that is estimated is not constrained to be equal to
that of the other two groups. The group g3 uses the overall model with
the exception that five residual variances are estimated that are not
constrained to be equal to those of the other groups.
MEANS/INTERCEPTS/THRESHOLDS IN MULTIPLE GROUP ANALYSIS
In multiple group analysis, the intercepts and thresholds of observed
dependent variables that are factor indicators are constrained to be equal
across groups as the default. The means and intercepts of continuous
latent variables are fixed at zero in the first group and are free to be
estimated in the other groups as the default. Means, intercepts, and
thresholds are referred to by the use of square brackets.
MODEL: f1 BY y1-y5;
f2 BY y6-y10;
f1 ON f2;
MODEL g1: [f1 f2];
MODEL g2: [f1@0 f2@0];
In the above example, the intercepts and the factor loadings for the
factor indicators y1-y10 are held equal across the three groups as the
default. In the group-specific MODEL command for g1, the mean of f2
and the intercept of f1 are specified to be free. In the group-specific
MODEL command for g2, the mean of f2 and the intercept of f1 are
fixed at zero.
MODEL: f BY u1-u5;
MODEL g2: {u1-u5*.5};
In the above example, the scale factors of the latent response variables of
the observed categorical dependent variables in g1 are fixed at one as the
default. Starting values are given for the free scale factors in g2.
variances in a model with multiple groups where u1, u2, u3, u4, and u5
are observed categorical dependent variables.
MODEL: f BY u1-u5;
MODEL g2: u1-u5*2;
If individual data for several groups are stored in one data set, the data
set must include a variable that identifies the group to which each
observation belongs. The name of this variable is specified using the
GROUPING option of the VARIABLE command. Only one grouping
variable can be specified. If the groups to be analyzed are a combination
of more than one variable, for example, gender and ethnicity, a single
grouping variable can be created using the DEFINE command. An
example of how to specify the GROUPING option is:
the grouping variable that is not specified using the GROUPING option,
it is not included in the analysis.
For individual data stored in different data sets, the specification of the
FILE option of the DATA command has two differences for multiple
group analysis. First, a FILE statement is required for each data set.
Second, the FILE option allows a label to be specified that can be used
in the group-specific MODEL commands. In the situation where the
data for males are stored in a file named male.dat, and the data for
females are stored in a file named female.dat, the FILE option is
specified as follows:
The labels male and female can be used in the group-specific MODEL
commands to specify differences between the group-specific models for
males and females and the overall model.
When individual data are stored in different data sets, all of the data sets
must contain the same number of variables. These variables must be
assigned the same names and be read using the same format.
Summary data must be stored in one data set with the data for the first
group followed by the data for the second group, etc. For example, in
an analysis of means and a covariance matrix for two groups with four
observed variables, the data would appear as follows:
0 0 0 0
2
1 2
1 1 2
1 1 1 2
1 1 1 1
3
2 3
2 2 3
2 2 2 3
where the means for group 1 come first, followed by the covariances for
group 1, followed by the means for group 2, followed by the covariances
for group 2.
indicates that the summary data for males come from 180 observations
and the summary data for females come from 220 observations.
NGROUPS = 2;
which indicates that there are two groups in the analysis. For summary
data, the program automatically assigns the label g1 to the first group, g2
to the second group, etc. In this example, males would have the label g1
and females would have the label g2.
The information in the table above represents how the data look before
they are transformed. As a first step, each observation that does not
have complete data for 1982, 1983, 1987, and 1989 is deleted from the
data set. Following are the data after this step.
The second step is to rearrange the data so that age is the time
dimension. This results in the following data set where asterisks (*)
represent values that are missing by design.
Obs Coh HD17 HD18 HD19 HD20 HD22 HD23 HD24 HD25 HD26
1 63 * * 3 4 * * 5 * 6
4 63 * * 5 7 * * 6 * 3
5 63 * * 5 8 * * 7 * 9
6 64 * 3 6 * * 5 * 9 *
8 64 * 4 9 * * 8 * 6 *
10 64 * 3 9 * * 8 * 5 *
12 65 6 5 * * 5 * 5 * *
13 65 5 5 * * 5 * 5 * *
14 65 4 5 * * 6 * 7 * *
15 65 4 5 * * 5 * 4 * *
The model is specified in the MODEL command using the new variables
hd17 through hd26 instead of the original variables hd82, hd83, hd87,
and hd89. Note that there is no hd21 because no combination of survey
year and birth cohort represents this age. The data are analyzed using
the missing by design feature.
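The rearrangement from survey year to age can be sketched in Python as follows, assuming age = (survey year - 1900) - birth cohort, which reproduces the rows of the table above. The helper age_row is hypothetical, and the per-year scores passed to it are read back from the table rather than taken from the original data.

```python
AGES = [17, 18, 19, 20, 22, 23, 24, 25, 26]  # no age 21 in these data

def age_row(cohort, scores_by_year):
    """Re-index one person's scores from survey year to age.

    Ages not observed for the cohort are missing by design and are
    written as '*'.
    """
    by_age = {year - 1900 - cohort: score for year, score in scores_by_year.items()}
    return [by_age.get(age, "*") for age in AGES]

# Observation 1 (cohort 63) and observation 12 (cohort 65) from the table.
print(age_row(63, {1982: 3, 1983: 4, 1987: 5, 1989: 6}))
# ['*', '*', 3, 4, '*', '*', 5, '*', 6]
print(age_row(65, {1982: 6, 1983: 5, 1987: 5, 1989: 5}))
# [6, 5, '*', '*', 5, '*', 5, '*', '*']
```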
x -> u -> y
P (u = 1 | x) = F (a + b*x)
= F (-t + b*x),
Following is an output excerpt that shows the results from the probit
regression of a binary variable u on the covariate age:
u ON
age 0.055 0.001 43.075
Thresholds
u$1 3.581 0.062 57.866
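Using the formula P (u = 1 | x) = F (-t + b*x) with the threshold and slope from the output excerpt, the probability can be computed for any value of age. A minimal Python sketch, assuming F is the standard normal distribution function as in probit regression; the function name is hypothetical. The probability reaches .5 where -t + b*x = 0, that is, at age = t / b.

```python
from statistics import NormalDist

def probit_probability(threshold, slope, x):
    """P(u = 1 | x) = F(-t + b*x), with F the standard normal CDF."""
    return NormalDist().cdf(-threshold + slope * x)

t, b = 3.581, 0.055  # u$1 threshold and u ON age slope from the output above
for age in (40, 65, 80):
    print(age, round(probit_probability(t, b, age), 3))
```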
given x using the two thresholds t1 and t2 and the single probit regression
coefficient b,
P (u = 0 | x) = F (t1 - b*x),
P (u = 1 | x) = F (t2 - b*x) - F (t1 - b*x),
P (u = 2 | x) = F (- t2 + b*x).
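The three formulas above translate directly into code. The Python sketch below again assumes F is the standard normal distribution function; the thresholds and slope in the example call are hypothetical. Because F (- t2 + b*x) = 1 - F (t2 - b*x), the three probabilities always sum to one.

```python
from statistics import NormalDist

def ordered_probit_probabilities(t1, t2, b, x):
    """Category probabilities for a three-category ordered u (t1 < t2)."""
    F = NormalDist().cdf
    p0 = F(t1 - b * x)
    p1 = F(t2 - b * x) - F(t1 - b * x)
    p2 = F(-t2 + b * x)
    return p0, p1, p2

# Hypothetical thresholds and slope, for illustration only.
print(ordered_probit_probabilities(-0.5, 1.0, 0.3, 2.0))
```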
For a binary covariate x scored as 0 and 1, the log odds for u = 1 versus
u = 0 are,
such that the increase in the log odds is b as above. Given the
mathematical rule that log y – log z is equal to log (y / z), the difference
in the two log odds,
where exp (aR + bR*x) = exp (0 + 0*x) = 1 and the log odds for
comparing category r to category R is
C#1 ON
AGE94 -.285 .028 -10.045
MALE 2.578 .151 17.086
BLACK .158 .139 1.141
C#2 ON
AGE94 .069 .022 3.182
MALE .187 .110 1.702
BLACK -.606 .139 -4.357
C#3 ON
AGE94 -.317 .028 -11.311
MALE 1.459 .101 14.431
BLACK .999 .117 8.513
Intercepts
C#1 -1.822 .174 -10.485
C#2 -.748 .103 -7.258
C#3 -.324 .125 -2.600
Using (3), the log odds expression for a particular class compared to the
last class is,
In the first example, the values of the three covariates are all zero so that
only the intercepts contribute to the log odds. Probabilities are
computed using (2). In the first step, the estimated intercept log odds
In the second example, the values of the three covariates are all one so
that both the intercepts and the slopes contribute to the log odds. In the
first step, the log odds values for each class are computed. In the second
step, the log odds values are exponentiated and summed. In the last step,
the exponentiated value is divided by the sum to compute the probability
for each class of c.
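The three steps described above can be reproduced from the intercepts and slopes in the output excerpt. A minimal Python sketch, with class 4 as the reference class whose log odds are fixed at zero; the function name is hypothetical. With all covariates at zero the probabilities are about .069, .201, .307, and .424; with all covariates at one they are about .200, .036, .657, and .107.

```python
import math

# Intercepts and slopes for C#1-C#3 from the output above.
INTERCEPTS = [-1.822, -0.748, -0.324]
SLOPES = [  # age94, male, black
    [-0.285, 2.578, 0.158],
    [0.069, 0.187, -0.606],
    [-0.317, 1.459, 0.999],
]

def class_probabilities(age94, male, black):
    x = [age94, male, black]
    # Step 1: log odds of each class relative to the last class.
    log_odds = [a + sum(b_k * x_k for b_k, x_k in zip(b, x))
                for a, b in zip(INTERCEPTS, SLOPES)]
    log_odds.append(0.0)                    # reference class 4
    # Step 2: exponentiate and sum.
    expd = [math.exp(v) for v in log_odds]
    total = sum(expd)
    # Step 3: divide each exponentiated value by the sum.
    return [e / total for e in expd]

print([round(p, 3) for p in class_probabilities(0, 0, 0)])
print([round(p, 3) for p in class_probabilities(1, 1, 1)])
```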
In the output shown above, the variable male has the value of 1 for males
and 0 for females and the variable black has the value of 1 for blacks and
0 for non-blacks. The variable age94 has the value of 0 for age 16, 1 for
age 17, up to 7 for age 23. An interpretation of the logistic regression
coefficient for class 1 is that comparing class 1 to class 4, the log odds
decreases by .285 for a unit increase in age, is 2.578 higher for males
than for females, and is .158 higher for blacks than for non-blacks. This
implies that the odds ratio for being in class 1 versus class 4 when
comparing males to females is 13.17 (exp 2.578), holding the other two
covariates constant.
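Each odds ratio is the exponentiated coefficient, because the two log odds differ by b and log y - log z = log (y / z). A short Python sketch for the three class 1 coefficients in the output; the dictionary names are illustrative only.

```python
import math

class1_slopes = {"age94": -0.285, "male": 2.578, "black": 0.158}
odds_ratios = {name: math.exp(b) for name, b in class1_slopes.items()}
print({name: round(v, 2) for name, v in odds_ratios.items()})
```

The male odds ratio reproduces the 13.17 given above; the age94 odds ratio below one corresponds to the decrease in log odds with age.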
[Figure: Probability (y-axis) plotted against age94 (x-axis) for the four classes. Legend: Class 1, 12.7%; Class 2, 20.5%; Class 3, 30.7%; Class 4, 36.2%.]
LOGIT PARAMETERIZATION
Following is a description of the logistic regression parameterization,
specified using PARAMETERIZATION=LOGIT, for the following
MODEL command for two categorical latent variables with three classes
each:
MODEL:
%OVERALL%
c2#1 ON c1#1;
c2#1 ON c1#2;
c2#2 ON c1#1;
c2#2 ON c1#2;
where a3 = 0, b31 = 0, b32 = 0, and b33 = 0 because the last class is the
reference class, and sumj represents the sum of the exponentiations
across the classes of c2 for c1 = j (j = 1, 2, 3). The corresponding log
odds when comparing a c2 class to the last c2 class are summarized in
the table below.
                c2 = 1       c2 = 2       c2 = 3
c1 = 1          a1 + b11     a2 + b21     0
c1 = 2          a1 + b12     a2 + b22     0
c1 = 3          a1           a2           0
a1 [c2#1];
a2 [c2#2];
b11 c2#1 ON c1#1;
b12 c2#1 ON c1#2;
b21 c2#2 ON c1#1;
b22 c2#2 ON c1#2;
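The log odds in the table can be turned into conditional probabilities P(c2 | c1) by exponentiating each row and dividing by its sum, which is what sum_j denotes. A minimal Python sketch; the function name and the parameter values in the usage line are hypothetical:

```python
import math

def conditional_probs(a1, a2, b11, b12, b21, b22):
    """Return P(c2 = k | c1 = j) as a 3 x 3 list (rows j, columns k).

    Log odds relative to the last c2 class follow the table above;
    the last class of c1 and of c2 is the reference class.
    """
    log_odds = [
        [a1 + b11, a2 + b21, 0.0],  # c1 = 1
        [a1 + b12, a2 + b22, 0.0],  # c1 = 2
        [a1, a2, 0.0],              # c1 = 3
    ]
    table = []
    for row in log_odds:
        exps = [math.exp(v) for v in row]
        sum_j = sum(exps)  # sum of the exponentiations for this c1 class
        table.append([e / sum_j for e in exps])
    return table

probs = conditional_probs(a1=0.2, a2=-0.5, b11=1.5, b12=0.3, b21=-0.7, b22=0.9)
```

Each row is a separate multinomial logistic regression, so each row sums to one.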
LOGLINEAR PARAMETERIZATION
Following is a description of the loglinear parameterization for the
following MODEL command for two categorical latent variables with
three classes each:
MODEL:
%OVERALL%
c2#1 WITH c1#1;
c2#1 WITH c1#2;
c2#2 WITH c1#1;
c2#2 WITH c1#2;
a11 [c1#1];
a12 [c1#2];
a21 [c2#1];
a22 [c2#2];
w11 c2#1 WITH c1#1;
w12 c2#1 WITH c1#2;
w21 c2#2 WITH c1#1;
w22 c2#2 WITH c1#2;
The joint probabilities for the classes of c1 and c2 are computed using
the multinomial logistic regression formula (2) in the previous section,
summing over the nine cells shown in the table below.
              c2 = 1              c2 = 2              c2 = 3
c1 = 1        a11 + a21 + w11     a11 + a22 + w21     a11
c1 = 2        a12 + a21 + w12     a12 + a22 + w22     a12
c1 = 3        a21                 a22                 0
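Because all nine cells share one normalizing sum, exponentiating each cell and dividing by the total gives the joint probabilities. The marginal class proportions then follow from row and column sums. A Python sketch with hypothetical parameter values:

```python
import math

def joint_probs(a11, a12, a21, a22, w11, w12, w21, w22):
    """Joint P(c1 = j, c2 = k) from the loglinear cell values in the table.

    Rows index c1, columns index c2; the (3, 3) cell is the reference (0).
    """
    cells = [
        [a11 + a21 + w11, a11 + a22 + w21, a11],
        [a12 + a21 + w12, a12 + a22 + w22, a12],
        [a21, a22, 0.0],
    ]
    exps = [[math.exp(v) for v in row] for row in cells]
    total = sum(sum(row) for row in exps)  # sum over all nine cells
    return [[e / total for e in row] for row in exps]

p = joint_probs(0.3, -0.1, 0.5, 0.2, 1.0, -0.4, 0.6, 0.0)
c1_marginal = [sum(row) for row in p]
c2_marginal = [sum(p[j][k] for j in range(3)) for k in range(3)]
```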
PROBABILITY PARAMETERIZATION
Following is a description of the probability parameterization for the
following MODEL command for two categorical latent variables with
three classes each:
MODEL:
%OVERALL%
c2#1 ON c1#1;
c2#1 ON c1#2;
c2#1 ON c1#3;
c2#2 ON c1#1;
c2#2 ON c1#2;
c2#2 ON c1#3;
              c2 = 1    c2 = 2    c2 = 3
c1 = 1        p11       p12       0
c1 = 2        p21       p22       0
c1 = 3        p31       p32       0
[c1#1];
[c1#2];
TITLE, DATA, VARIABLE, And DEFINE Commands
CHAPTER 15
TITLE, DATA, VARIABLE, AND
DEFINE COMMANDS
The TITLE command can contain any letters and symbols except the
words used as Mplus commands when they are followed by a colon.
These words are: title, data, variable, define, analysis, model, output,
savedata, montecarlo, and plot. These words can be included in the title
if they are not followed by a colon. Colons can be used in the title as
long as they do not follow words that are used as Mplus commands.
Following is an example of how to specify a title:
The title is printed in the output just before the Summary of Analysis.
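The keyword rule can be expressed as a regular expression check. This is a hypothetical helper for illustration, not part of Mplus. It assumes that blanks between a command word and the colon also count as the word being followed by a colon, which the text does not state.

```python
import re

# Words that may appear in a title only if they are not followed by a colon
COMMAND_WORDS = ["title", "data", "variable", "define", "analysis",
                 "model", "output", "savedata", "montecarlo", "plot"]

# Whole word, any case, optionally followed by blanks, then a colon
_PATTERN = re.compile(r"\b(" + "|".join(COMMAND_WORDS) + r")\s*:",
                      re.IGNORECASE)

def title_is_valid(title: str) -> bool:
    """Return True if no command word in the title is followed by a colon."""
    return _PATTERN.search(title) is None

print(title_is_valid("Growth model for alcohol use: 1994 cohort"))
```

Colons elsewhere are accepted, matching the rule that colons can be used as long as they do not follow a command word.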
Data must be numeric except for certain missing value flags and must
reside in an external ASCII file. There is no limit on the number of
variables or observations. The maximum record length is 10,000.
Special features of the DATA command for multiple group analysis are
discussed in Chapter 14. Monte Carlo data generation is discussed in
Chapters 12 and 19. The estimator chosen for an analysis determines the
type of data required for the analysis. Some estimators require a data set
with information for each observation. Some estimators require only
summary information.
Following are the options for the DATA and the DATA transformation
commands:
DATA:
FILE IS file name;
FORMAT IS format statement; FREE
FREE;
TYPE IS INDIVIDUAL; INDIVIDUAL
COVARIANCE;
CORRELATION;
FULLCOV;
FULLCORR;
MEANS;
STDEVIATIONS;
MONTECARLO;
IMPUTATION;
NOBSERVATIONS ARE number of observations;
NGROUPS = number of groups; 1
LISTWISE = ON; OFF
OFF;
SWMATRIX = file name;
VARIANCES = CHECK; CHECK
NOCHECK;
DATA IMPUTATION:
IMPUTE = names of variables for which missing values
will be imputed;
NDATASETS = number of imputed data sets; 5
SAVE = names of files in which imputed data sets
are stored;
MODEL = COVARIANCE; depends on
SEQUENTIAL; analysis type
REGRESSION;
VALUES = values imputed data can take; no restrictions
ROUNDING = number of decimals for imputed continuous 3
variables;
THIN = k where every k-th imputation is saved; 100
DATA WIDETOLONG:
WIDE = names of old wide format variables;
LONG = names of new long format variables;
IDVARIABLE = name of variable with ID information; ID
REPETITION = name of variable with repetition information; REP
DATA LONGTOWIDE:
LONG = names of old long format variables;
WIDE = names of new wide format variables;
IDVARIABLE = name of variable with ID information;
REPETITION = name of variable with repetition information
(values); 0, 1, 2, etc.
DATA TWOPART:
NAMES = names of variables used to create a set of
binary and continuous variables;
CUTPOINT = value used to divide the original variables 0
into a set of binary and continuous variables;
BINARY = names of new binary variables;
CONTINUOUS = names of new continuous variables;
TRANSFORM = function to use to transform new continuous LOG
variables;
DATA MISSING:
NAMES = names of variables used to create a set of
binary variables;
BINARY = names of new binary variables;
TYPE = MISSING;
SDROPOUT;
DDROPOUT;
DESCRIPTIVE = sets of variables for additional descriptive
statistics separated by the | symbol;
DATA SURVIVAL:
NAMES = names of variables used to create a set of
binary event-history variables;
CUTPOINT = value used to create a set of binary event-
history variables from a set of original
variables;
BINARY = names of new binary variables;
DATA COHORT:
COHORT IS name of cohort variable (values);
COPATTERN IS name of cohort/pattern variable (patterns);
COHRECODE = (old value = new value);
TIMEMEASURES = list of sets of variables separated by the |
symbol;
TNAMES = list of root names for the sets of variables in
TIMEMEASURES separated by the |
symbol;
summary data are analyzed. This option is not required when individual
data are analyzed. Default settings are shown in the last column. If the
default settings are appropriate for the options that are not required,
nothing needs to be specified for these options.
FILE
The FILE option is used to specify the name and location of the ASCII
file that contains the data to be analyzed. The FILE option is required
for each analysis. It is specified for a single group analysis as follows:
FILE IS c:\analysis\data.dat;
where data.dat is the name of the ASCII file containing the data to be
analyzed. In this example, the file data.dat is located in the directory
c:\analysis. If the full path name of the data set contains any blanks, the
full path name must have quotes around it.
If the name of the data set is specified with a path, the directory
specified by the path is checked. If the name of the data set is specified
without a path, the local directory is checked. If the data set is not found
in the local directory, the directory where the input file is located is
checked.
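The search order can be sketched in Python. `find_data_file` is a hypothetical helper that mirrors the three steps above, and `input_dir` stands for the directory where the input file is located. Paths are handled with the standard `pathlib` module, so a Windows path such as c:\analysis\data.dat is only recognized as a path on Windows.

```python
from pathlib import Path
from typing import Optional

def find_data_file(name: str, input_dir: str,
                   local_dir: str = ".") -> Optional[Path]:
    """Mirror the lookup order described for the FILE option."""
    path = Path(name)
    # A name given with a path: only the directory in that path is checked
    if path.parent != Path("."):
        return path if path.is_file() else None
    # Otherwise the local directory is checked first ...
    candidate = Path(local_dir) / path
    if candidate.is_file():
        return candidate
    # ... then the directory where the input file is located
    candidate = Path(input_dir) / path
    return candidate if candidate.is_file() else None
```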
FORMAT
The FORMAT option is used to describe the format of the data set to be
analyzed. Individual data can be in fixed or free format. Free format is
the default. Fixed format is recommended for large data sets because it
is faster to read data using a fixed format. Summary data must be in free
format.
For data in fixed format, each observation must have the same number of
records. Information for a given variable must occupy the same position
on the same record for each observation. A FORTRAN-like format
statement describing the position of the variables in the data set is
required. Following is an example of how to specify a format statement:
There are three options for the format statement related to skipping
columns or records when reading data: x, t, and /. The x option instructs
the program to skip columns. The statement 10x says to skip 10
columns and begin reading in column 11. The t option instructs the
program to go to a particular column and begin reading. For example,
t130 says to go to column 130 and begin reading in column 130. The /
option is used to instruct the program to go to the next record. Consider
the following format statements:
1. In the first statement, for each record the program reads 20 four-digit
numbers followed by 13 five-digit numbers, then three two-digit
numbers with a total record length of 151.
2. In the second statement, for each record the program reads three four-
digit numbers with one digit to the right of the decimal, skips 25 spaces,
and then reads five five-digit numbers with a total record length of 62.
3. The third statement is the same as the second but uses the t option
instead of the x option. In the third statement, for each record the
program reads three four-digit numbers with one digit to the right of the
decimal, goes to column 38, and then reads five five-digit numbers.
4. In the fourth statement, each observation has four records. For record
one the program reads two four-digit numbers; for record two the
program reads fourteen four-digit numbers with two digits to the right of
the decimal; record three is skipped; and for record four the program
reads six three-digit numbers with one number to the right of the decimal
point.
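The column arithmetic behind these four statements can be checked with a small interpreter for F fields and the x, t, and / options. The format strings in the usage lines are reconstructed from the four descriptions, not quoted from the original statements. `layout` is a hypothetical helper that handles only these descriptors.

```python
import re

# nFw.d field group, nx column skip, tc column jump, or / record break
TOKEN = re.compile(r"\s*,?\s*(?:(\d*)F(\d+)(?:\.(\d+))?|(\d+)X|T(\d+)|(/))",
                   re.IGNORECASE)

def layout(fmt):
    """Map a format statement to fields grouped by record.

    Returns a list of records; each record is a list of
    (start_column, width, decimals) tuples with 1-based columns.
    """
    fmt = fmt.strip()
    records = [[]]
    col = 1  # next column to read on the current record
    pos = 0
    while pos < len(fmt):
        m = TOKEN.match(fmt, pos)
        if m is None:
            raise ValueError(f"cannot parse format at: {fmt[pos:]!r}")
        pos = m.end()
        count, width, dec, skip, tab, slash = m.groups()
        if width:  # n fields of width w with d digits right of the decimal
            for _ in range(int(count or 1)):
                records[-1].append((col, int(width), int(dec or 0)))
                col += int(width)
        elif skip:  # skip n columns
            col += int(skip)
        elif tab:  # go to column c and begin reading there
            col = int(tab)
        elif slash:  # go to the next record
            records.append([])
            col = 1
    return records

def record_length(fields):
    """Last column used by the fields on one record."""
    return max(start + width - 1 for start, width, _ in fields)

print(record_length(layout("20F4.0, 13F5.0, 3F2.0")[0]))
print(record_length(layout("3F4.1, 25x, 5F5.0")[0]))
```

Statements 2 and 3 produce identical field positions: skipping 25 columns after column 12 and jumping to column 38 land in the same place.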
123234
342765
348765
FORMAT IS 6F1.0;
or
FORMAT IS 6F1;
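Reading the three records above with FORMAT IS 6F1.0 amounts to slicing each line into six one-column fields. A minimal sketch with an illustrative function name. It applies the FORTRAN convention that a field without a decimal point has d implied decimal places, which goes beyond what the text states. It also treats a blank field as zero, as described for fixed format data under MISSING.

```python
def read_fixed(line, fields):
    """Read values from one record given (start_column, width, decimals) fields."""
    values = []
    for start, width, decimals in fields:
        text = line[start - 1:start - 1 + width].strip()
        if not text:
            values.append(0.0)  # blank, not declared missing: treated as zero
        elif "." in text:
            values.append(float(text))  # an explicit decimal point is used as is
        else:
            values.append(int(text) / 10 ** decimals)  # implied decimals
    return values

six_f1 = [(col, 1, 0) for col in range(1, 7)]  # 6F1.0: columns 1 to 6
data = ["123234", "342765", "348765"]
rows = [read_fixed(line, six_f1) for line in data]
```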
TYPE
The TYPE option is used in conjunction with the FILE option to
describe the contents of the file named using the FILE option. It has the
following settings:
INDIVIDUAL
The default for the TYPE option is INDIVIDUAL. The TYPE option is
not required if individual data are being analyzed where rows represent
observations and columns represent variables.
SUMMARY DATA
When summary data are analyzed and one or more dependent variables
are binary or ordered categorical (ordinal), only a correlation matrix can
be analyzed. When summary data are analyzed and all dependent
The external ASCII file for the above example contains the means,
standard deviations, and correlations in free format. Each type of data
must begin on a separate record even if the data fit on less than one
record. The means come first; the standard deviations begin on the
record following the last mean; and the entries of the lower triangular
correlation matrix begin on the record following the last standard
deviation. The data set appears as follows:
.4 .6 .3 .5 .5
.2 .5 .4 .5 .6
1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0
.65 .87 .32 .56 1.0
or alternatively:
.4 .6 .3 .5 .5
.2 .5 .4 .5 .6
1.0 .86 1.0 .56 .76 1.0 .78 .34 .48 1.0 .65 .87 .32 .56 1.0
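Both layouts can be read back with a short parser. In this sketch, `read_summary` is a hypothetical helper, and the number of variables is passed in explicitly. Each section must start on a new record, and the lower triangle is expanded into a full symmetric correlation matrix.

```python
def read_summary(text, p):
    """Parse means, standard deviations, and a lower-triangular correlation
    matrix for p variables, each section starting on a new record."""
    sizes = [p, p, p * (p + 1) // 2]
    sections, current = [], []
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        current.extend(float(t) for t in tokens)
        if len(current) > sizes[len(sections)]:
            raise ValueError("a new type of data must begin on a new record")
        if len(current) == sizes[len(sections)]:
            sections.append(current)
            current = []
            if len(sections) == len(sizes):
                break
    if len(sections) != len(sizes):
        raise ValueError("incomplete summary data")
    means, sds, lower = sections
    corr = [[0.0] * p for _ in range(p)]
    k = 0
    for i in range(p):
        for j in range(i + 1):
            corr[i][j] = corr[j][i] = lower[k]
            k += 1
    return means, sds, corr

layout_a = """.4 .6 .3 .5 .5
.2 .5 .4 .5 .6
1.0
.86 1.0
.56 .76 1.0
.78 .34 .48 1.0
.65 .87 .32 .56 1.0"""
layout_b = """.4 .6 .3 .5 .5
.2 .5 .4 .5 .6
1.0 .86 1.0 .56 .76 1.0 .78 .34 .48 1.0 .65 .87 .32 .56 1.0"""
means, sds, corr = read_summary(layout_a, 5)
```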
MONTECARLO
The MONTECARLO setting of the TYPE option is used when the data
sets being analyzed have been generated and saved either by the
REPSAVE option of the MONTECARLO command or by another
computer program. The file named using the FILE option of the DATA
command contains a list of the names of the data sets to be analyzed and
data1.dat
data2.dat
data3.dat
data4.dat
data5.dat
IMPUTATION
The IMPUTATION setting of the TYPE option is used when the data
sets being analyzed have been generated using multiple imputation
procedures. The file named using the FILE option of the DATA
command must contain a list of the names of the multiple imputation
data sets to be analyzed. Parameter estimates are averaged over the set
of analyses. Standard errors are computed using the average of the
squared standard errors over the set of analyses and the between analysis
imp1.dat
imp2.dat
imp3.dat
imp4.dat
imp5.dat
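The description of how results are combined is cut off above. The sketch below assumes the standard Rubin's rules combination: the total variance is the average squared standard error plus (1 + 1/m) times the between-analysis variance of the estimates. Treat it as an illustration of that rule, not as a statement of the exact Mplus computation. The estimates and standard errors are made-up values for five imputed data sets.

```python
import math
from statistics import mean, variance

def pool(estimates, std_errors):
    """Combine one parameter across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)  # average of the parameter estimates
    within = mean(se ** 2 for se in std_errors)  # average squared standard error
    between = variance(estimates)  # between-analysis variance, m - 1 denominator
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical results for one parameter from imp1.dat ... imp5.dat
est, se = pool([0.52, 0.48, 0.55, 0.50, 0.47], [0.10, 0.11, 0.10, 0.12, 0.10])
```

When the imputed data sets give identical estimates, the between-analysis term vanishes and the pooled standard error reduces to the root of the average squared standard error.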
NOBSERVATIONS
The NOBSERVATIONS option is required when summary data are
analyzed. When individual data are analyzed, the program counts the
number of observations. The NOBSERVATIONS option can, however,
be used with individual data to limit the number of records used in the
analysis. For example, if a data set contains 20,000 observations, it is
possible to analyze only the first 1,000 observations by specifying:
NOBSERVATIONS = 1000;
NGROUPS
The NGROUPS option is used for multiple group analysis when
summary data are analyzed. It specifies the number of groups in the
analysis. It is specified as follows:
NGROUPS = 3;
LISTWISE
The LISTWISE option is used to indicate that any observation with one
or more missing values on the set of analysis variables not be used in the
analysis. The default is to estimate the model under missing data theory
using all available data. To turn on listwise deletion, specify:
LISTWISE = ON;
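Listwise deletion keeps only the observations that are complete on the analysis variables. A minimal Python sketch in which `None` stands for a missing value and the variable names are hypothetical:

```python
def listwise(rows, analysis_vars):
    """Keep observations with no missing value on any analysis variable."""
    return [row for row in rows
            if all(row.get(v) is not None for v in analysis_vars)]

data = [
    {"y1": 2.0, "y2": 3.1, "x": 1},
    {"y1": None, "y2": 2.4, "x": 0},
    {"y1": 1.5, "y2": 2.2, "x": None},
]
complete = listwise(data, ["y1", "y2"])
```

The third observation is kept because x is not one of the analysis variables here. Missing values on variables outside the analysis set do not remove an observation.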
SWMATRIX
The SWMATRIX option is used with TYPE=TWOLEVEL and
weighted least squares estimation to specify the name and location of the
file that contains the within- and between-level sample statistics and
their corresponding estimated asymptotic covariance matrix. The
univariate and bivariate sample statistics are estimated using one- and
two-dimensional numerical integration with a default of 7 integration
points. The INTEGRATION option of the ANALYSIS command can be
used to change the default. It is recommended to save this information
and use it in subsequent analyses along with the raw data to reduce
computational time during model estimation. Analyses using this
information must have the same set of observed dependent and
independent variables, the same DEFINE command, the same
USEOBSERVATIONS statement, and the same USEVARIABLES
statement as the analysis which was used to save the information. It is
specified as follows:
SWMATRIX = swmatrix.dat;
where swmatrix.dat is the file that contains the within- and between-
level sample statistics and their corresponding estimated asymptotic
covariance matrix.
VARIANCES
The VARIANCES option is used to check that the analysis variables do
not have variances of zero in the sample used for the analysis. Checking
for variances of zero is the default. To turn off this check, specify:
VARIANCES = NOCHECK;
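The check amounts to computing each analysis variable's sample variance and flagging those equal to zero. A minimal sketch with a hypothetical function name. Ignoring missing values (`None`) when computing the variance is an assumption about how the check treats them.

```python
from statistics import pvariance

def zero_variance_vars(rows, analysis_vars):
    """Return analysis variables whose observed values have variance zero.

    Missing values (None) are skipped; that treatment is assumed here.
    """
    flagged = []
    for v in analysis_vars:
        values = [row[v] for row in rows if row.get(v) is not None]
        if values and pvariance(values) == 0:
            flagged.append(v)
    return flagged

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 1, "b": None}]
print(zero_variance_vars(rows, ["a", "b"]))
```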
[Figure: paths from the imputation model (H1, or H0 specified in the MODEL command), labeled TYPE = BASIC (Example 11.5) and not ESTIMATOR = BAYES (Example 11.6)]
The figure above shows three ways that data imputation can be done.
The first path in the figure uses an unrestricted H1 imputation model and
saves the imputed data sets for a subsequent analysis. In this case,
TYPE=BASIC is specified in the ANALYSIS command. See Example
11.5. To use the data sets in a subsequent analysis, specify
TYPE=IMPUTATION in the DATA command. See Example 13.13.
The second path in the figure uses an unrestricted H1 imputation model
with an estimator other than BAYES. In this case, the model is
estimated immediately after the data are imputed. See Example 11.6.
The third path in the figure uses an H0 imputation model and
ESTIMATOR=BAYES. The H0 model specified in the MODEL
command is used to impute the data. See Example 11.7.
IMPUTE
The IMPUTE option is used to specify the analysis variables for which
missing values will be imputed. Data can be imputed for all or a subset
of the analysis variables. These variables can be continuous or
categorical. If they are categorical, a letter c in parentheses must be
included after the variable name. If a variable is on the
CATEGORICAL list in the VARIABLE command, it must have a c in
parentheses following its name. A variable not on the CATEGORICAL
list can have a c in parentheses following its name. Following is an
example of how to specify the IMPUTE option:
IMPUTE = y1-y4 x1 x2 u1-u4 (c);
where values will be imputed for the continuous variables y1, y2, y3, y4,
x1, and x2 and the categorical variables u1, u2, u3, and u4.
The keyword ALL can be used to indicate that values are to be imputed
for all variables in the dataset. The ALL option can be used with the c
setting, for example,
IMPUTE = ALL (c);
indicates that all of the variables in the data set are categorical.
NDATASETS
NDATASETS = 20;
where 20 is the number of imputed data sets that will be created. The
default for the NDATASETS option is 5.
SAVE
The SAVE option is used to save the imputed data sets for subsequent
analysis using TYPE=IMPUTATION in the DATA command. It is
specified as follows:
SAVE = impute*.dat;
where the asterisk (*) is replaced by the number of the imputed data set.
A file is also produced that contains the names of all of the data sets. To
name this file, the asterisk (*) is replaced by the word list.
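The naming rule can be sketched as a small Python function. `imputed_file_names` is hypothetical. Numbering the data sets from 1 is assumed here, consistent with the imp1.dat to imp5.dat list shown earlier.

```python
def imputed_file_names(pattern, ndatasets):
    """Expand a SAVE pattern such as impute*.dat into the names of the
    imputed data sets and the name of the file that lists them."""
    if pattern.count("*") != 1:
        raise ValueError("the pattern must contain exactly one asterisk")
    data_sets = [pattern.replace("*", str(i)) for i in range(1, ndatasets + 1)]
    list_file = pattern.replace("*", "list")
    return data_sets, list_file

files, listing = imputed_file_names("impute*.dat", 5)
```

The list file is the file to name with the FILE option when the imputed data sets are analyzed using TYPE=IMPUTATION.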
MODEL
MODEL = SEQUENTIAL;
VALUES
The closest value to the imputed value is used. If the imputed value is
2.7, the value 3 will be used.
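Snapping an imputed value to the closest allowed value is a one-line minimum search. A sketch with a hypothetical function name. The text does not say how ties are broken, so resolving them toward the smaller allowed value is an assumption.

```python
def snap_to_allowed(value, allowed):
    """Replace an imputed value by the closest allowed value.

    Ties go to the smaller allowed value (an assumption): min returns the
    first minimum, and the candidates are sorted in increasing order.
    """
    return min(sorted(allowed), key=lambda a: abs(a - value))

print(snap_to_allowed(2.7, [1, 2, 3, 4]))
```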
ROUNDING
The value zero is used to specify no decimals, that is, integer values.
THIN
The THIN option is used to specify the interval at which draws from
the posterior distribution are used as imputed values. The default is to
use every 100th iteration. To request that every 200th iteration be used,
specify:
THIN = 200;
When the data are rearranged, the set of outcomes is given a new
variable name and ID and repetition variables are created. These new
variable names must be placed on the USEVARIABLES statement of
the VARIABLE command if they are used in the analysis. They must be
placed after any original variables. If the ID variable is used as a cluster
variable, this must be specified using the CLUSTER option of the
VARIABLE command.
WIDE
The WIDE option is used to identify sets of variables in the wide format
data set that will be converted into single variables in the long format
data set. These variables must be variables from the NAMES statement
of the VARIABLE command. The WIDE option is specified as follows:
WIDE = y1-y4 | x1-x4;
where y1, y2, y3, and y4 represent one variable measured at four time
points and x1, x2, x3, and x4 represent another variable measured at four
time points.
LONG
The LONG option is used to provide names for the new variables in the
long format data set. There should be the same number of names as
there are sets of variables in the WIDE statement. The LONG option is
specified as follows:
LONG = y | x;
where y is the name assigned to the set of variables y1-y4 on the WIDE
statement and x is the name assigned to the set of variables x1-x4.
IDVARIABLE
The IDVARIABLE option is used to provide a name for the variable that
provides information about the unit to which the record belongs. In
univariate growth modeling, this is the person identifier which is used as
a cluster variable. The IDVARIABLE option is specified as follows:
IDVARIABLE = subject;
where subject is the name of the variable that contains information about
the unit to which the record belongs. If an ID variable is specified using
the IDVARIABLE option of the VARIABLE command, the values of
this variable are used for the variable named using the IDVARIABLE
option of the DATA WIDETOLONG command. This option is not required.
REPETITION
The REPETITION option is used to provide a name for the variable that
contains information on the order in which the variables were measured.
The REPETITION option is specified as follows:
REPETITION = time;
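The rearrangement can be sketched in Python for the example above, with the sets y1-y4 and x1-x4, LONG = y | x, IDVARIABLE = subject, and REPETITION = time. The function is hypothetical. Numbering the repetitions 0, 1, 2, and so on is assumed, as is using the record's position as the ID when no ID variable is supplied.

```python
def wide_to_long(rows, wide_sets, long_names, id_name="subject",
                 rep_name="time", id_var=None):
    """Turn each wide record into one long record per time point.

    wide_sets: lists of wide variable names, one list per long variable
    long_names: new long variable names, in the same order as wide_sets
    id_var: optional existing ID variable whose values are carried over
    """
    n_times = len(wide_sets[0])
    if len(wide_sets) != len(long_names) or any(len(s) != n_times for s in wide_sets):
        raise ValueError("WIDE sets and LONG names do not line up")
    long_rows = []
    for position, row in enumerate(rows, start=1):
        unit = row[id_var] if id_var else position  # assumed default ID
        for t in range(n_times):
            record = {id_name: unit, rep_name: t}  # repetitions assumed 0, 1, ...
            for names, new in zip(wide_sets, long_names):
                record[new] = row[names[t]]
            long_rows.append(record)
    return long_rows

wide = [{"y1": 1, "y2": 2, "y3": 3, "y4": 4, "x1": 0, "x2": 1, "x3": 1, "x4": 0}]
long_data = wide_to_long(wide, [["y1", "y2", "y3", "y4"],
                                ["x1", "x2", "x3", "x4"]], ["y", "x"])
```

Each wide record becomes four long records, one per time point, and the ID variable ties them together so it can serve as the cluster variable.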
When the data are rearranged, the outcome is given a set of new variable
names. These new variable names must be placed on the
USEVARIABLES statement of the VARIABLE command if they are
used in the analysis. They must be placed after any original variables.
LONG
The LONG option is used to identify the variables in the long format
data set that will be used to create sets of variables in the wide format
data set. These variables must be variables from the NAMES statement
of the VARIABLE command. The LONG option is specified as follows:
LONG = y | x;
where y and x are two variables that have been measured at multiple
time points which are represented by multiple records.
WIDE
The WIDE option is used to provide sets of names for the new variables
in the wide format data set. There should be the same number of sets of
names as there are variables in the LONG statement. The number of
names in each set corresponds to the number of time points at which the
variables in the long data set were measured. The WIDE option is
specified as follows:
WIDE = y1-y4 | x1-x4;
where y1, y2, y3, and y4 are the names for the variable y in the wide data
set and x1, x2, x3, and x4 are the names for the variable x in the wide
data set.
IDVARIABLE
IDVARIABLE = subject;
where subject is the name of the variable that contains information about
the unit to which each record belongs. This variable becomes the
identifier for each observation in the wide data set. The IDVARIABLE
option of the VARIABLE command cannot be used to select a different
identifier.
REPETITION
REPETITION = time;
where time is the variable that contains information about the time at
which the variables in the long data set were measured. If the time
variable does not contain consecutive integer values starting at zero, the
time values must be given. For example,
REPETITION = time (4 8 16);
specifies that the values 4, 8, and 16 are the values of the variable time.
The number of values should be equal to the number of variables in the
WIDE option and the order of the values should correspond to the order
of the variables.
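The reverse rearrangement can be sketched the same way. In this hypothetical function, `rep_values` plays the role of the values in parentheses, such as 4, 8, and 16, listed in the order of the WIDE names. When they are omitted, 0, 1, 2, and so on are assumed. Leaving a time point without a long record as missing (`None`) is also an assumption.

```python
def long_to_wide(rows, long_names, wide_sets, id_name="subject",
                 rep_name="time", rep_values=None):
    """Collect long records into one wide record per unit.

    rep_values: time values in the order of the WIDE names, e.g. [4, 8, 16];
    by default 0, 1, 2, ... are assumed.
    """
    n_times = len(wide_sets[0])
    if rep_values is None:
        rep_values = list(range(n_times))
    if len(rep_values) != n_times:
        raise ValueError("number of time values must match the WIDE names")
    slot = {value: i for i, value in enumerate(rep_values)}
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_name], {id_name: row[id_name]})
        i = slot[row[rep_name]]
        for name, names in zip(long_names, wide_sets):
            record[names[i]] = row[name]
    # Time points with no long record are left missing (None), an assumption
    for record in wide.values():
        for names in wide_sets:
            for n in names:
                record.setdefault(n, None)
    return list(wide.values())

long_data = [
    {"subject": 7, "time": 4, "y": 1.2},
    {"subject": 7, "time": 16, "y": 2.0},
    {"subject": 9, "time": 8, "y": 0.5},
]
wide_rows = long_to_wide(long_data, ["y"], [["y1", "y2", "y3"]],
                         rep_values=[4, 8, 16])
```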
A set of binary and continuous variables is created using the value
specified in the CUTPOINT option of the DATA TWOPART command
or zero, which is the default. The two variables are created using the
following rules:
1. If the value of the original variable is missing, both the new binary
and the new continuous variable values are missing.
2. If the value of the original variable is greater than the cutpoint value,
the new binary variable value is one and the new continuous variable
value is the log of the original variable as the default.
3. If the value of the original variable is less than or equal to the
cutpoint value, the new binary variable value is zero and the new
continuous variable value is missing.
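The three rules can be sketched as one Python function applied to each original value. `two_part` is hypothetical, `None` marks a missing value, and passing `transform=None` to keep the untransformed value is an analogue of TRANSFORM = NONE, not Mplus syntax.

```python
import math

def two_part(value, cutpoint=0.0, transform=math.log):
    """Split one original value into (binary, continuous) parts.

    None marks a missing value. transform=None keeps the continuous
    part untransformed.
    """
    if value is None:  # rule 1: both new values missing
        return None, None
    if value > cutpoint:  # rule 2: binary 1, continuous = log(value) by default
        return 1, transform(value) if transform else value
    return 0, None  # rule 3: binary 0, continuous missing

smoke = [0, 3, None, 10]
parts = [two_part(v) for v in smoke]
```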
NAMES
The NAMES option identifies the variables that are used to create a set
of binary and continuous variables. These variables must be variables
from the NAMES statement of the VARIABLE command. The NAMES
option is specified as follows:
NAMES = smoke1-smoke4;
CUTPOINT
The CUTPOINT option is used to provide the value that is used to divide
the original variables into a set of binary and continuous variables. The
default value for the CUTPOINT option is zero. The CUTPOINT
option is specified as follows:
CUTPOINT = 1;
where variables are created based on values being less than or equal to
one or greater than one.
BINARY
The BINARY option is used to assign names to the new set of binary
variables. The BINARY option is specified as follows:
BINARY = u1-u4;
where u1, u2, u3, and u4 are the names of the new set of binary
variables.
CONTINUOUS
CONTINUOUS = y1-y4;
where y1, y2, y3, and y4 are the names of the new set of continuous
variables.
TRANSFORM
TRANSFORM = NONE;
NAMES
The NAMES option identifies the set of variables that are used to create
a set of binary variables that are indicators of missing data. These
variables must be variables from the NAMES statement of the
VARIABLE command. The NAMES option is specified as follows:
NAMES = drink1-drink4;
where drink1, drink2, drink3, and drink4 are the set of variables for
which a set of binary indicators of missing data are created.
BINARY
The BINARY option is used to assign names to the new set of binary
variables. The BINARY option is specified as follows:
BINARY = u1-u4;
where u1, u2, u3, and u4 are the names of the new set of binary
variables.
TYPE
TYPE = SDROPOUT;
Following are the rules for creating the set of binary variables for the
MISSING setting:
Following are the rules for creating the set of binary variables for the
SDROPOUT setting:
1. The value one is assigned to the time point after the last time point
an individual is observed.
2. The value missing is assigned to all time points after the value of
one.
3. The value zero is assigned to all time points before the value of one.
Following are the rules for creating the set of binary variables for the
DDROPOUT setting:
1. The value one is assigned to the time point after the last time point
an individual is observed.
2. The value zero is assigned to all other time points.
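The SDROPOUT and DDROPOUT rules can be sketched for one individual's repeated measures. The function is hypothetical, and `None` marks a missing value in both input and output. An individual observed at the last time point gets all zeros under both rules. The rules do not cover an individual who is never observed, so the sketch rejects that case.

```python
def dropout_indicators(values, kind="SDROPOUT"):
    """Binary dropout indicators for one individual's repeated measures.

    values: one entry per time point, None for missing.
    """
    observed = [t for t, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("never observed; the rules above do not cover this case")
    last = observed[-1]
    n = len(values)
    out = [0] * n  # zero before (SDROPOUT) or at all other points (DDROPOUT)
    if last + 1 < n:
        out[last + 1] = 1  # time point after the last observed time point
        if kind == "SDROPOUT":
            for t in range(last + 2, n):
                out[t] = None  # missing after the value of one
        elif kind != "DDROPOUT":
            raise ValueError("kind must be SDROPOUT or DDROPOUT")
    return out

print(dropout_indicators([5, None, None, None], "SDROPOUT"))
```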
DESCRIPTIVE
Dropouts after each time point – Individuals who drop out before the
next time point and do not return to the study
Non-dropouts after each time point – Individuals who do not drop out
before the next time point
Total Dropouts – Individuals who are missing at the last time point
Dropouts no intermittent missing – Individuals who do not return to
the study once they have dropped out
Dropouts intermittent missing – Individuals who drop out and return
to the study
Total Non-dropouts – Individuals who are present at the last time point
Non-dropouts complete data – Individuals with complete data
Non-dropouts intermittent missing – Individuals who have missing
data but are present at the last time point
Total sample
The first set of variables, y0-y5, defines the number of time points as six.
The last set of variables has only five measures. An asterisk (*) is used
as a placeholder for the first time point.
2. If the value of the original variable is greater than the cutpoint value,
the new binary variable value is one which represents that the event
has occurred.
3. If the value of the original variable is less than or equal to the
cutpoint value, the new binary variable value is zero which
represents that the event has not occurred.
4. After a discrete-time survival variable for an observation is assigned
the value one, subsequent discrete-time survival variables for that
observation are assigned the value of the missing value flag.
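Rules 2 to 4 can be sketched as a single pass over an observation's original values. The function is hypothetical, and `None` stands for the missing value flag. The first rule is not shown in this section, so treating a missing original value as a missing event indicator is an assumption.

```python
def event_history(values, cutpoint=0.0):
    """Binary event-history indicators (1 = event has occurred).

    None marks a missing value. A missing original value is assumed to
    give a missing indicator; that rule is not shown in this section.
    """
    out = []
    event_seen = False
    for v in values:
        if event_seen:
            out.append(None)  # rule 4: missing after the event
        elif v is None:
            out.append(None)
        elif v > cutpoint:
            out.append(1)  # rule 2: the event has occurred
            event_seen = True
        else:
            out.append(0)  # rule 3: the event has not occurred
    return out

print(event_history([0, 0, 2, 0]))
```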
NAMES
The NAMES option identifies the variables that are used to create a set
of binary event-history variables. These variables must be variables
from the NAMES statement of the VARIABLE command. The NAMES
option is specified as follows:
NAMES = dropout1-dropout4;
where dropout1, dropout2, dropout3, and dropout4 are the variables that
are used to create a set of binary event-history variables.
CUTPOINT
The CUTPOINT option is used to provide the value used to create a set of
binary event-history variables from a set of original variables. The
default value for the CUTPOINT option is zero. The CUTPOINT
option is specified as follows:
CUTPOINT = 1;
where variables are created based on values being less than or equal to
one or greater than one.
BINARY
The BINARY option is used to assign names to the new set of binary
event-history variables. The BINARY option is specified as follows:
BINARY = u1-u4;
where u1, u2, u3, and u4 are the names of the new set of binary event-
history variables.
These variables must come after any original variables. The creation of
the new variables in the DATA COHORT command occurs after any
transformations in the DEFINE command. Following is a description of
the options used in the DATA COHORT command.
COHORT
The COHORT option is used when data have been collected using a
multiple cohort design. The COHORT option is used in conjunction
with the TIMEMEASURES and TNAMES options that are described
below. Variables used with the COHORT option must be variables from
the NAMES statement of the VARIABLE command. Following is an
example of how the COHORT option is specified:
COPATTERN
The COPATTERN option is used when data are both missing by design
and have been collected using a multiple cohort design. Variables used
with the COPATTERN option must be variables from the NAMES
statement of the VARIABLE command. Following is an example of
how the COPATTERN option is specified:
COHRECODE
TIMEMEASURES
where y1, y2, y3, y4, and y5 are original variables that are to be used in
the analysis, and the numbers in parentheses following each of these
variables represent the years in which they were measured. In this
situation, y1, y2, y3, y4, and y5 are the same measure, for example,
frequency of heavy drinking measured on multiple occasions.
TNAMES
The TNAMES option is used to generate variable names for the new
multiple cohort analysis variables. A root name is specified for each set
TNAMES = hd;
There is no hd variable for ages 23 and 35, no dep variable for ages 23,
31, and 34, and no marstat variable for ages 24, 29, 34, and 36 because
these ages are not represented by the combination of cohort values and
years of measurement.
VARIABLE:
The NAMES option is used to assign names to the variables in the data
set named using the FILE option of the DATA command. This option is
required. The variable names can be separated by blanks or commas and
can be up to 8 characters in length. Variable names must begin with a
letter. They can contain only letters, numbers, and the underscore
symbol. The program makes no distinction between upper and lower
case letters. Following is an example of how the NAMES option is
specified:
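The naming rules can be checked with a regular expression. `is_valid_name` and `parse_names` are hypothetical helpers. They split a list on blanks and/or commas and compare names without regard to case. The hyphenated list shorthand (for example, y1-y4) is not expanded in this sketch.

```python
import re

# A letter followed by up to 7 letters, digits, or underscores (8 in total)
NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")

def is_valid_name(name: str) -> bool:
    return NAME.fullmatch(name) is not None

def parse_names(statement):
    """Split a NAMES list on blanks and/or commas and validate each name."""
    names = [n for n in re.split(r"[\s,]+", statement.strip()) if n]
    bad = [n for n in names if not is_valid_name(n)]
    if bad:
        raise ValueError(f"invalid variable names: {bad}")
    return [n.lower() for n in names]  # upper and lower case are not distinguished

print(parse_names("Gender, age94 y_1"))
```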
USEOBSERVATIONS
USEVARIABLES
If all of the original variables plus some of the new variables are used in
the analysis, the keyword ALL can be used as the first entry in the
USEVARIABLES statement. This indicates that all of the original
variables from the NAMES statement of the VARIABLE command are
used in the analysis. The keyword ALL is followed by the names of the
new variables created using the DEFINE command or the DATA
transformation commands that will be used in the analysis. Following is
an example of how to specify the USEVARIABLES option for this
situation:
USEVARIABLES = ALL hd1 hd2 hd3;
where ALL refers to the total set of original variables and hd1, hd2, and
hd3 are new variables created using the DEFINE command or the DATA
transformation commands.
MISSING VALUES
MISSING
The period (.), the asterisk (*), or the blank can be used as non-numeric
missing value flags. Only one non-numeric missing value flag can be
used for a particular data set. This missing value flag applies to all
variables in the data set. The blank cannot be used with free format data.
With fixed format data, blanks in the data not declared as missing value
flags are treated as zeroes.
The following command indicates that the period is the missing value
flag for all variables in the data set:
MISSING ARE . ;
The blank can be a missing value flag only in fixed format data sets.
The following command indicates that blanks are to be considered as
missing value flags:
MISSING = BLANK;
The following statement specifies that the number 9 is the missing value
flag for all variables in the data set:
MISSING ARE ALL (9);
The following example specifies that for the variable ethnic, the
numbers 9 and 99 are missing value flags, while for the variable y1, the
number 1 is the missing value flag:
MISSING ARE ethnic (9 99) y1 (1);
The list function can be used with the MISSING option to specify a list
of missing value flags and/or a set of variables. The order of variables in
the list is determined by the order of variables in the NAMES statement
of the VARIABLE command. Values of 9, 99, 100, 101, and 102 can be
declared as missing value flags for all variables in a data set by the
following specification:
The above statement specifies that the values of 9, 30, 98, 99, 100, 101,
and 102 are missing value flags for the list of variables beginning with
gender and ending with income.
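Applying numeric missing value flags amounts to replacing flagged values with a missing marker. In this sketch the flags are a plain Python dictionary, not Mplus syntax, and the key "ALL" borrows the keyword for every variable. Combining "ALL" flags with per-variable flags as a union is an assumption. The usage line expands the list 9, 99, 100, 101, and 102 with a range.

```python
def apply_missing(rows, flags):
    """Replace numeric missing value flags with None.

    flags maps a variable name, or "ALL" for every variable, to the set of
    values that mark a missing value for it (the two are combined here).
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            codes = set(flags.get("ALL", ())) | set(flags.get(name, ()))
            new_row[name] = None if value in codes else value
        cleaned.append(new_row)
    return cleaned

data = [{"ethnic": 9, "y1": 1, "y2": 4}, {"ethnic": 2, "y1": 3, "y2": 99}]
by_variable = apply_missing(data, {"ethnic": {9, 99}, "y1": {1}})
all_vars = apply_missing(data, {"ALL": {9, *range(99, 103)}})
```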
CENSORED
where y1, y2, y3, y4 are censored dependent variables in the analysis.
The letter a in parentheses following the variable name indicates that the
variable is censored from above. The letter b in parentheses following
the variable name indicates that the variable is censored from below.
The lower and upper censoring limits are determined from the data.
where y1, y2, y3, y4 are censored dependent variables in the analysis.
The letters ai in parentheses following the variable name indicate that
the variable is censored from above and that a censored-inflated model
will be estimated. The letters bi in parentheses following the variable
name indicate that the variable is censored from below and that a
censored-inflated model will be estimated. The lower and upper
censoring limits are determined from the data.
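One reading of "determined from the data" is that the censoring limit is the smallest observed value for a variable censored from below and the largest for one censored from above, with observations at that limit counted as censored. The sketch below works under that reading, which is an interpretation rather than a statement of Mplus internals. The function name is hypothetical and `None` marks a missing value.

```python
def censoring_summary(values, direction):
    """Return the censoring limit and the share of observations at it.

    direction: "a" (censored from above) or "b" (censored from below).
    None marks a missing value and is ignored.
    """
    observed = [v for v in values if v is not None]
    if direction == "b":
        limit = min(observed)
    elif direction == "a":
        limit = max(observed)
    else:
        raise ValueError("direction must be 'a' or 'b'")
    share = sum(v == limit for v in observed) / len(observed)
    return limit, share

print(censoring_summary([0, 0, 1.5, 3, None], "b"))
```

A large share of observations at the limit is the situation in which the censored-inflated versions, (ai) and (bi), may be considered.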
CATEGORICAL
where u2, u3, u7, u8, u9, u10, u11, u12, and u13 are binary or ordered
categorical dependent variables in the analysis.
Original categories    Recoded categories
1 2 3 4                0 1 2 3
2 3 4 5                0 1 2 3
2 5 8 9                0 1 2 3
0 1                    no recode needed
1 2                    0 1
where u1, u2, and u3 are a set of ordered categorical variables and the
asterisk (*) in parentheses indicates that the categories of each variable
are to be recoded using the categories found in the data for the set of
variables, not for each variable separately. Based on the original data
shown in the table below, where the rows represent observations and the
columns represent variables, the set of variables is found to have four possible
categories: 1, 2, 3, and 4. The variable u1 has observed categories 1
and 2; u2 has observed categories 1, 2, and 3; and u3 has observed
categories 2, 3, and 4. The recoded values are shown in the table below.
Categories in the Original Data Set     Categories in the Recoded Data Set
u1   u2   u3                            u1   u2   u3
1    2    3                             0    1    2
1    1    2                             0    0    1
2    2    2                             1    1    1
2    3    4                             1    2    3
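The recoding rule can be illustrated with a short Python sketch that is not part of Mplus. It assumes that the observed categories are sorted and numbered 0, 1, 2, and so on. With the asterisk, the categories are pooled over the set of variables. Otherwise each variable uses its own categories. The pooled version reproduces the recoded table above. Missing values are not handled.

```python
def recode(columns, per_set=False):
    """Map observed category values to 0, 1, 2, ... in sorted order.

    columns: dict of variable name -> list of observed values.
    per_set=True pools the categories of all variables (the (*) behavior);
    per_set=False uses each variable's own categories (the default).
    """
    if per_set:
        pooled = sorted({v for col in columns.values() for v in col})
        mapping = {v: i for i, v in enumerate(pooled)}
        return {name: [mapping[v] for v in col] for name, col in columns.items()}
    recoded = {}
    for name, col in columns.items():
        mapping = {v: i for i, v in enumerate(sorted(set(col)))}
        recoded[name] = [mapping[v] for v in col]
    return recoded


# Columns of the original data set shown above
data = {"u1": [1, 1, 2, 2], "u2": [2, 1, 2, 3], "u3": [3, 2, 2, 4]}
print(recode(data, per_set=True))
# {'u1': [0, 0, 1, 1], 'u2': [1, 0, 1, 2], 'u3': [2, 1, 1, 3]}
```

With per-variable recoding, u3 would instead be coded 1, 0, 0, 2, because its lowest observed category, 2, becomes 0.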
where the set of variables u1, u2, and u3 can have the categories of 1, 2,
3, 4, 5, and 6. In this example, 1 will be recoded as 0, 2 as 1, 3 as 2, 4 as
3, 5 as 4, and 6 as 5.
where the set of variables u1, u2, and u3 can have the categories 2, 4,
and 6. In this example, 2 will be recoded as 0, 4 as 1, and 6 as 2.
specifies that for the variables u1, u2, and u3, the possible categories are
taken from the data for the set of variables; for the variables u4, u5, and
u6, the possible categories are 2, 3, 4, and 5; and for the variables u7, u8,
and u9, the possible categories are the default, that is, the possible
categories are taken from the data for each variable.
NOMINAL
For nominal dependent variables, all categories but the last category can
be referred to. The last category is the reference category. The
categories are referred to in the MODEL command by adding to the
variable name the number sign (#) followed by a number. For example, the
first three categories of a four-category nominal variable u1 are referred
to as u1#1, u1#2, and u1#3.
Original categories     Recoded categories
1 2 3 4                 0 1 2 3
2 3 4 5                 0 1 2 3
2 5 8 9                 0 1 2 3
0 1                     no recode needed
1 2                     0 1
COUNT
The COUNT option can be specified in two ways for a Poisson model:
COUNT = u1 u2 u3 u4;
or
or
where u1, u2, u3, and u4 are count dependent variables in the analysis.
The letter i or pi in parentheses following the variable name indicates
that a zero-inflated Poisson model will be estimated.
DSURVIVAL
DSURVIVAL = u1-u4;
GROUPING
The GROUPING option is used to identify the variable in the data set
that contains information on group membership when the data for all
groups are stored in a single data set. Multiple group analysis is
discussed in Chapter 14. A grouping variable must contain only integer
values. Only one grouping variable can be used. If the groups to be
analyzed are a combination of more than one variable, a single grouping
variable can be created using the DEFINE command. Following is an
example of how to specify the GROUPING option:
where country is the grouping variable and 101 through 200, 225, and
350 through 360 are the values of country that will be used as groups.
The values of the variable country are used as labels in group-specific
MODEL commands.
IDVARIABLE
IDVARIABLE = id;
FREQWEIGHT
FREQWEIGHT IS casewgt;
TSCORES
where a1, a2, a3, and a4 are observed variables in the analysis data set
that contain the individually-varying times of observation for an
outcome at four time points.
AUXILIARY
Auxiliary variables are variables that are not part of the analysis model.
The AUXILIARY option has seven uses. One is to identify a set of
variables that is not used in the analysis but is saved for use in a
subsequent analysis. A second is to identify a set of variables that will
be used as missing data correlates in addition to the analysis variables.
The last five are used with TYPE=MIXTURE. Two are used to identify
a set of variables not used in the analysis that are possible covariates in a
multinomial logistic regression for a categorical latent variable. Three
are used to identify a set of variables not used in the analysis for which
the equality of means across latent classes will be tested. Only one of
these five can be used in an analysis at a time.
where gender, race, and educ are variables that are not used in the
analysis but that are saved in conjunction with the SAVEDATA and/or
the PLOT commands.
where z1, z2, z3, and z4 are variables that will be used as missing data
correlates in addition to the analysis variables.
where race, ses, x1, x2, x3, x4, and x5 will be used as covariates in a
multinomial logistic regression in a mixture model.
where race, ses, x1, x2, x3, x4, and x5 will be used as covariates in a
multinomial logistic regression in a mixture model.
where the equality of means for race, ses, and gender will be tested
across classes.
where the equality of means for drinks and depress will be separately
tested across classes.
where the equality of means for drinks and depress will be separately
tested across classes.
CONSTRAINT
CONSTRAINT = y1 u1;
PATTERN
The PATTERN option is used when data are missing by design. The
typical use is in situations where, because of the design of the study, not
all variables are measured on all individuals in the analysis. This can
where design is a variable in the data set that has integer values of 1, 2,
and 3. The variable names listed after each number and the equal sign
are variables used in the analysis that should have no missing values
for observations with that value on the pattern variable. For example,
observations with the value of one on the variable design should have
information for variables y1, y3, and y5 and have missing values for y2
and y4. Observations with the value of three on the variable design
should have information for variables y1, y4, and y5 and have missing
values for variables y2 and y3. The pattern variable must contain only
integer values. Observations that have a value for the pattern variable
that is not specified using the PATTERN option are not included in the
analysis.
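The inclusion rule for the PATTERN option can be sketched in Python as follows. This is an illustration, not Mplus code. The pattern map is hypothetical. Patterns 1 and 3 follow the text, and pattern 2 is left out only because its variable list is not shown here. The sketch checks that the required variables are observed. It does not check that the other variables are missing.

```python
# Hypothetical PATTERN map: value of the pattern variable -> variables that
# must be observed. Patterns 1 and 3 follow the text; pattern 2 is omitted.
PATTERNS = {
    1: {"y1", "y3", "y5"},
    3: {"y1", "y4", "y5"},
}


def check_pattern(row, pattern_var="design"):
    """Return (included, missing_required) for one observation.

    row maps variable names to values, with None marking a missing value.
    An observation whose pattern value is not in PATTERNS is excluded.
    """
    required = PATTERNS.get(row.get(pattern_var))
    if required is None:
        return False, []
    missing_required = [v for v in sorted(required) if row.get(v) is None]
    return True, missing_required


row = {"design": 1, "y1": 2.0, "y2": None, "y3": 1.5, "y4": None, "y5": 0.3}
print(check_pattern(row))  # (True, [])
```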
STRATIFICATION
STRATIFICATION IS region;
CLUSTER
CLUSTER IS school;
where school and class are the variables that contain clustering
information. The cluster variable for the highest level must come first,
that is, classrooms are nested in schools.
where neighbor and school are the variables that contain clustering
information. Students are nested in schools crossed with neighborhoods.
where school and class are the variables that contain clustering
information. The clusters for TYPE=TWOLEVEL are classrooms. The
standard error and chi-square computations for TYPE=COMPLEX are
based on schools.
STRATIFICATION = region;
CLUSTER = school;
where the clusters for TYPE=TWOLEVEL are schools and the standard
error and chi-square computations for TYPE=COMPLEX are based on
region.
where psu, school, and class are the variables that contain clustering
information. The cluster variable for the highest level must come first,
that is, classrooms are nested in schools and schools are nested in psu’s.
The clusters for TYPE=THREELEVEL are classrooms and schools. The
standard error and chi-square computations for TYPE=COMPLEX are
based on psu’s.
WEIGHT
WEIGHT IS sampwgt;
WTSCALE
The UNSCALED setting uses the within weights from the data set with
no adjustment. The CLUSTER setting scales the within weights from
the data set so that they sum to the sample size in each cluster. The
ECLUSTER setting scales the within weights from the data so that they
sum to the effective sample size (Potthoff, Woodbury, & Manton, 1992).
WTSCALE = ECLUSTER;
where scaling the within weights so that they sum to the effective sample
size is chosen.
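The three WTSCALE settings can be compared with a small Python sketch. It is an illustration, not Mplus's implementation. For ECLUSTER, the sketch assumes the common definition of a cluster's effective sample size, (sum of w)^2 / (sum of w^2). Mplus may compute it differently.

```python
def scale_within_weights(weights, clusters, method="CLUSTER"):
    """Rescale within-level weights cluster by cluster.

    UNSCALED: weights are returned unchanged.
    CLUSTER:  weights in each cluster sum to that cluster's sample size.
    ECLUSTER: weights in each cluster sum to the effective sample size,
              assumed here to be (sum w)^2 / sum(w^2).
    """
    if method == "UNSCALED":
        return list(weights)
    totals, squares, counts = {}, {}, {}
    for w, c in zip(weights, clusters):
        totals[c] = totals.get(c, 0.0) + w
        squares[c] = squares.get(c, 0.0) + w * w
        counts[c] = counts.get(c, 0) + 1
    scaled = []
    for w, c in zip(weights, clusters):
        if method == "CLUSTER":
            target = counts[c]
        elif method == "ECLUSTER":
            target = totals[c] ** 2 / squares[c]
        else:
            raise ValueError(f"unknown method: {method}")
        scaled.append(w * target / totals[c])
    return scaled


print(scale_within_weights([1, 2, 3, 4], ["a", "a", "b", "b"], "ECLUSTER"))
```

Unequal weights give an effective sample size below the actual cluster size. In this example, cluster a (weights 1 and 2) has an effective size of 1.8 rather than 2.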
BWEIGHT
BWEIGHT = bweight;
B2WEIGHT
B2WEIGHT = b2weight;
B3WEIGHT
B3WEIGHT = b3weight;
BWTSCALE
The UNSCALED setting uses the between weights from the data set
with no adjustment. The SAMPLE setting adjusts the between weights
so that the product of the between and the within weights sums to the
total sample size.
BWTSCALE = UNSCALED;
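The condition that the SAMPLE setting enforces can be illustrated in Python. The sketch shows one simple adjustment that meets it: it multiplies every between weight by a single common factor. This adjustment is an assumption for illustration, and the exact scaling Mplus uses may differ.

```python
def scale_between_weights_sample(between, within, clusters):
    """Rescale between weights by one common factor so that the products
    between[cluster] * within weight sum to the total number of individuals.

    between: dict of cluster -> between weight
    within, clusters: parallel lists, one entry per individual
    """
    total = sum(between[c] * w for w, c in zip(within, clusters))
    factor = len(within) / total
    return {c: b * factor for c, b in between.items()}


print(scale_between_weights_sample({"a": 2.0, "b": 1.0}, [1, 1, 1, 1], ["a", "a", "b", "b"]))
```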
REPWEIGHTS
they can be used in the analysis and/or saved (Asparouhov & Muthén,
2009b).
REPWEIGHTS = rweight1-rweight80;
SUBPOPULATION
weights of zero (see Korn & Graubard, 1999, pp. 207-211). The
SUBPOPULATION option is not available for multiple group analysis.
SUBPOPULATION = gender EQ 2;
FINITE
where sampfrac is the variable that contains the sampling fraction for
each stratum.
MIXTURE MODELS
There are three options that are used specifically for mixture models.
They are CLASSES, KNOWNCLASS, and TRAINING.
CLASSES
where c1, c2, and c3 are the names of the three categorical latent
variables in the model. The numbers in parentheses specify the number
of classes for each categorical latent variable in the model. The
categorical latent variable c1 has two classes, c2 has two classes, and c3
has three classes.
When there is more than one categorical latent variable in the model,
there are rules related to the order of the categorical latent variables.
The order is taken from the order of the categorical latent variables in
the CLASSES statement. Because of the order in the CLASSES
statement above, c1 is not allowed to be regressed on c2 in the model. It
is only possible to regress c2 on c1 and c3 on c2 or c1. This order
restriction does not apply to PARAMETERIZATION=LOGLINEAR.
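The ordering rule can be expressed as a simple check. In this Python sketch (an illustration, not Mplus code), a categorical latent variable may be regressed only on variables listed before it in the CLASSES statement. As stated above, the rule does not apply with PARAMETERIZATION=LOGLINEAR.

```python
def order_violations(classes, regressions):
    """Return the regressions that break the CLASSES ordering rule.

    classes: categorical latent variable names in CLASSES order.
    regressions: (dependent, predictor) pairs, e.g. ("c2", "c1") for c2 ON c1.
    """
    position = {name: i for i, name in enumerate(classes)}
    return [(dep, pred) for dep, pred in regressions
            if position[pred] >= position[dep]]


classes = ["c1", "c2", "c3"]
print(order_violations(classes, [("c2", "c1"), ("c3", "c2"), ("c1", "c2")]))
# [('c1', 'c2')]
```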
KNOWNCLASS
Following is an example with many known classes that uses the list
function:
last class consists of individuals with the value 115 on the variable
country. There are a total of 14 classes.
TRAINING
TRAINING = t1 t2 t3;
where t1, t2, and t3 are variables that contain information about latent
class membership. The variable t1 provides information about
membership in class 1, t2 provides information about membership in
class 2, and t3 provides information about membership in class 3. An
individual is allowed to be in any class for which they have a value of
one on a training variable. An individual who is known to be in class 2
would have values of 0, 1, and 0 on t1, t2, and t3, respectively. An
individual with unknown class membership would have the value of 1 on
t1, t2, and t3. An alternative specification is:
TRAINING = t1 t2 t3 (MEMBERSHIP);
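The way 0/1 training values restrict class membership can be sketched in Python. This is an illustration only. Treating a row with no value of one as an error is an assumption of the sketch, not a documented Mplus rule.

```python
def allowed_classes(training_row):
    """Return the classes (numbered from 1) an individual may belong to,
    given 0/1 values on the training variables t1, t2, ..."""
    allowed = [k + 1 for k, t in enumerate(training_row) if t == 1]
    if not allowed:
        # Assumption of this sketch: a row with no 1 is treated as an error.
        raise ValueError("no training variable equals 1")
    return allowed


print(allowed_classes([0, 1, 0]))  # known to be in class 2 -> [2]
print(allowed_classes([1, 1, 1]))  # unknown membership -> [1, 2, 3]
```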
TRAINING = t1 t2 t3 (PROBABILITIES);
where t1, t2, and t3 are variables that contain information about the
probability of latent class membership. The variable t1 provides
information about the probability of membership in class 1, t2 provides
information about the probability of membership in class 2, and t3
provides information about the probability of membership in class 3.
Priors can be used when individual class membership is not known but
when information is available on the probability of an individual being
in a certain class. For example, an individual who has a probability of .9
for being in class 1, .05 for being in class 2, and .05 for being in class 3
would have t1=.9, t2=.05, and t3=.05. Prior values must sum to one for
each individual.
TRAINING = t1 t2 t3 (PRIORS);
where t1, t2, and t3 are variables that contain information about the
probability of being in a certain class. The variable t1 provides
information about the probability of membership in class 1, t2 provides
information about the probability of membership in class 2, and t3
provides information about the probability of membership in class 3.
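Priors must sum to one for each individual, and that requirement is easy to check before the analysis. The Python sketch below is an illustration and not part of Mplus. It flags individuals whose prior values fall outside 0 to 1 or do not sum to one within a small tolerance.

```python
def invalid_priors(rows, tol=1e-6):
    """Return the indices of individuals whose prior probabilities are not
    between 0 and 1 or do not sum to one (within tol)."""
    bad = []
    for i, row in enumerate(rows):
        out_of_range = any(p < 0 or p > 1 for p in row)
        if out_of_range or abs(sum(row) - 1.0) > tol:
            bad.append(i)
    return bad


priors = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3], [0.5, 0.4, 0.0]]
print(invalid_priors(priors))  # [2]
```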
MULTILEVEL MODELS
There are two options specific to multilevel models. They are WITHIN
and BETWEEN. Variables identified using the WITHIN and
BETWEEN options can be variables from the NAMES statement of the
VARIABLE command and variables created using the DEFINE
command and the DATA transformation commands.
WITHIN
WITHIN = y1 y2 x1;
where y1, y2, and x1 are variables measured on the individual level and
modeled on only the within level.
In the example, y1, y2, and y3 are variables measured on the individual
level and modeled on only level 1, student. Variables modeled on only
level 1 must precede variables modeled on the other levels. Y4, y5, and
y6 are variables measured on the individual level and modeled on level
1, student, and level 2, class, where class is the level 2 cluster variable.
Y7, y8, and y9 are variables measured on the individual level and
modeled on level 1, student, and level 3, school, where school is the
level 3 cluster variable.
WITHIN = y1-y3;
WITHIN = (class) y4-y6;
WITHIN = (school) y7-y9;
In the example, y1, y2, and y3 are variables measured on the individual
level and modeled on only level 1, student. Variables modeled on only
level 1 must precede variables modeled on the other levels. Y4, y5, and
y6 are variables measured on the individual level and modeled on level
1, student, and level 2a, school, where school is the level 2a cluster
variable. Y7, y8, and y9 are variables measured on the individual level
and modeled on level 1, student, and level 2b, neighborhood, where
neighborhood is the level 2b cluster variable.
BETWEEN
BETWEEN = z1 z2 x1;
where z1, z2, and x1 are variables measured on the cluster level and
modeled on the between level. The BETWEEN option is also used to
identify between-level categorical latent variables with
TYPE=TWOLEVEL MIXTURE.
BETWEEN = y1-y3;
BETWEEN = (class) y4-y6;
BETWEEN = (school) y7-y9;
SURVIVAL
SURVIVAL = t;
The keyword ALL can be used if the time intervals are taken from the
data. Following is an example of how this is specified:
SURVIVAL = t (ALL);
[t];
TIMECENSORED
when an individual has not experienced the event before the study ends.
There must be the same number and order of variables in the
TIMECENSORED option as there are in the SURVIVAL option. The
variables that contain information about right censoring must be coded
so that zero is not censored and one is right censored. If they are not,
this can be specified as part of the TIMECENSORED option. The
TIMECENSORED option is specified as follows when the variable is
coded zero for not censored and one for right censored:
TIMECENSORED = tc;
The value one is automatically recoded to zero and the value 999 is
automatically recoded to one.
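The recoding of a censoring indicator can be illustrated in Python. The default codes in the sketch, 1 for not censored and 999 for right censored, mirror the example in the text. The function itself is hypothetical and is not Mplus code.

```python
def recode_censoring(values, not_censored=1, right_censored=999):
    """Recode a censoring indicator to 0 = not censored, 1 = right censored.

    None (missing) is kept as missing; any other code raises an error.
    """
    mapping = {not_censored: 0, right_censored: 1}
    recoded = []
    for v in values:
        if v is None:
            recoded.append(None)
        elif v in mapping:
            recoded.append(mapping[v])
        else:
            raise ValueError(f"unexpected censoring code: {v}")
    return recoded


print(recode_censoring([1, 999, 1, None]))  # [0, 1, 0, None]
```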
DEFINE:
_MISSING
variable = MEAN (list of variables);
variable = SUM (list of variables);
CUT variable or list of variables (cutpoints);
variable = CLUSTER_MEAN (variable);
CENTER variable or list of variables (GRANDMEAN);
CENTER variable or list of variables (GROUPMEAN);
CENTER variable or list of variables (GROUPMEAN
label);
STANDARDIZE variable or list of variables;
DO (#, #) expression;
+ addition y + x;
- subtraction y - x;
* multiplication y * x;
/ division y / x;
** exponentiation y**2;
% remainder remainder of y/x;
NON-CONDITIONAL STATEMENTS
When a non-conditional statement is used to transform existing variables
or create new variables, the variable on the left-hand side of the equal
sign is assigned the value of the expression on the right-hand side of the
equal sign, for example,
y = y/100;
CONDITIONAL STATEMENTS
Conditional statements can also be used to transform existing variables
and to create new variables. Conditional statements take the following
form:
IF (y EQ 0) THEN u = _MISSING;
IF (y EQ _MISSING) THEN u = 1;
The third option categorizes one or several variables using the same set
of cutpoints. The fourth option creates a variable that is the average for
each cluster of an individual-level variable. The fifth option centers a
variable by subtracting the grand mean or group mean from each value.
The sixth option standardizes a variable to have a mean of zero and a
standard deviation of one.
MEAN
The MEAN option is used to create a variable that is the average of a set
of variables. It is specified as follows:
where the variable mean is the average of variables y1, y3, and y5.
Averages are based on the variables that do not have missing values for
that observation. Any observation that has a missing value on all of the
variables being averaged is assigned a missing value on the mean variable.
The list function can be used with the MEAN option as follows:
Variables used with the MEAN option must be original variables from
the NAMES statement of the VARIABLE command or temporary
variables created using the DEFINE command. The order of the
variables for the list function is taken from the NAMES statement, not
the USEVARIABLES statement.
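The missing-data rule for MEAN can be illustrated with a short Python sketch, which is not Mplus code. None stands for a missing value. The average uses only the observed values, and it is missing only when every value is missing.

```python
def define_mean(row, variables):
    """Average of the non-missing values (None = missing); None if all are missing."""
    present = [row[v] for v in variables if row[v] is not None]
    return sum(present) / len(present) if present else None


print(define_mean({"y1": 2, "y3": None, "y5": 4}, ["y1", "y3", "y5"]))  # 3.0
```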
SUM
The SUM option is used to create a variable that is the sum of a set of
variables. It is specified as follows:
where the variable sum is the sum of variables y1, y3, and y5. Any
observation that has a missing value on one or more of the variables
being summed is assigned a missing value on the sum variable.
The list function can be used with the SUM option as follows:
Variables used with the SUM option must be original variables from the
NAMES statement of the VARIABLE command or temporary variables
created using the DEFINE command. The order of the variables for the
list function is taken from the NAMES statement, not the
USEVARIABLES statement.
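Unlike MEAN, SUM returns a missing value as soon as any of the summed variables is missing. A Python sketch of that rule, with None standing for a missing value, is shown below as an illustration only.

```python
def define_sum(row, variables):
    """Sum of the values, or None if any of them is missing (None)."""
    values = [row[v] for v in variables]
    if any(v is None for v in values):
        return None
    return sum(values)


print(define_sum({"y1": 2, "y3": None, "y5": 4}, ["y1", "y3", "y5"]))  # None
print(define_sum({"y1": 2, "y3": 1, "y5": 4}, ["y1", "y3", "y5"]))     # 7
```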
CUT
The CUT option categorizes a variable or list of variables using the same
set of cutpoints. More than one CUT statement can be included in the
DEFINE command. Following is an example of how the CUT option is
used:
This statement results in the variables y1, y5, y6, and y7 having three
categories: less than or equal to 30, greater than 30 and less than or
equal to 40, and greater than 40, with values of 0, 1, and 2, respectively.
Any observation that has a missing value on a variable that is being cut
is assigned a missing value on the cut variable.
Variables used with the CUT option must be original variables from the
NAMES statement of the VARIABLE command or temporary variables
created using the DEFINE command. The order of the variables for the
list function is taken from the NAMES statement, not the
USEVARIABLES statement.
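The categories produced by the cutpoints 30 and 40 can be reproduced in Python as a quick check. This is an illustration, not Mplus code. bisect_left returns 0 for values up to and including the first cutpoint, 1 for values up to and including the second, and so on. A missing value (None) stays missing.

```python
from bisect import bisect_left


def cut(value, cutpoints):
    """Return 0 for value <= first cutpoint, 1 for the next interval, and so
    on; None (missing) stays missing."""
    if value is None:
        return None
    return bisect_left(sorted(cutpoints), value)


print([cut(v, [30, 40]) for v in (25, 30, 35, 40, 41, None)])
# [0, 0, 1, 1, 2, None]
```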
CLUSTER_MEAN
where the variable clusmean is the average of the values of x for each
cluster. Averages are based on the set of non-missing values for the
observations in each cluster. Any cluster for which all observations have
missing values is assigned a missing value on the cluster mean variable.
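The cluster mean computation can be sketched in Python as follows. This is an illustration, not Mplus code. It assumes that every member of a cluster receives the cluster mean, including members who are themselves missing on x. None stands for a missing value.

```python
def cluster_mean(values, clusters):
    """Return, for each individual, the mean of the non-missing values in
    that individual's cluster, or None if the whole cluster is missing."""
    sums, counts = {}, {}
    for v, c in zip(values, clusters):
        if v is not None:
            sums[c] = sums.get(c, 0.0) + v
            counts[c] = counts.get(c, 0) + 1
    return [sums[c] / counts[c] if c in counts else None for c in clusters]


print(cluster_mean([1, 3, None, None, 5], ["a", "a", "b", "b", "c"]))
# [2.0, 2.0, None, None, 5.0]
```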
CENTER
GRANDMEAN
where x1, x2, x3, and x4 are the variables to be centered using the
overall means for these variables.
where x1, x2, x3, and x4 are the variables to be centered using the
overall means for variables modeled on level 1 and higher, the cluster 1
means for variables modeled on level 2 and higher, and the cluster 2
means for variables modeled on level 3 and higher.
where x1, x2, x3, and x4 are the variables to be centered using the
overall means for variables modeled on level 1 and higher, the cluster 2a
means for variables modeled on level 2a, and the cluster 2b means for
variables modeled on level 2b.
GROUPMEAN
where x1, x2, x3, and x4 are the variables to be centered using the
cluster means for these variables.
where school is the cluster 2 variable and x1, x2, x3, and x4 are the
variables to be centered using cluster 2 means for these variables.
In this example, class is the cluster 1 variable and x1 and x2 are the
variables to be centered using cluster 1 means for these variables.
School is the cluster 2 variable and x3 and x4 are the variables to be
centered using cluster 2 means for these variables. A variable cannot be
centered using both cluster 1 and cluster 2 means.
In this example, school is the cluster 2a variable and x1, x2, x3, and x4
are the variables to be centered using cluster 2a means for these
variables. Neighbor is the cluster 2b variable and x5, x6, x7, and x8 are
the variables to be centered using the cluster 2b means for these
variables.
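Grand-mean and group-mean centering can be contrasted in a Python sketch with a single clustering variable. This is an illustration only. It does not cover the three-level or cross-classified variants described above. None stands for a missing value and stays missing.

```python
def center(values, clusters=None):
    """Center values (None = missing) by the grand mean, or by cluster means
    when a parallel list of cluster identifiers is given."""
    if clusters is None:
        present = [v for v in values if v is not None]
        if not present:
            return [None] * len(values)
        grand = sum(present) / len(present)
        return [None if v is None else v - grand for v in values]
    sums, counts = {}, {}
    for v, c in zip(values, clusters):
        if v is not None:
            sums[c] = sums.get(c, 0.0) + v
            counts[c] = counts.get(c, 0) + 1
    return [None if v is None else v - sums[c] / counts[c]
            for v, c in zip(values, clusters)]


print(center([1, 2, 3, 6]))                        # GRANDMEAN-style
print(center([1, 3, 10, 20], ["a", "a", "b", "b"]))  # GROUPMEAN-style
```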
STANDARDIZE
where the variables y1, y5 through y10, and y14 will be standardized.
Any observation that has a missing value on the variable to be
standardized is assigned a missing value on the standardized variable.
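Standardizing to a mean of zero and a standard deviation of one can be sketched in Python as follows. This is an illustration, not Mplus code. The sketch assumes the n - 1 divisor for the standard deviation, and Mplus may use a different divisor. Missing values (None) stay missing.

```python
def standardize(values):
    """Return (v - mean) / sd for each value; None (missing) stays None.

    Assumption: the standard deviation uses the n - 1 divisor.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    mean = sum(present) / n
    sd = (sum((v - mean) ** 2 for v in present) / (n - 1)) ** 0.5
    if sd == 0:
        raise ValueError("variable has no variation")
    return [None if v is None else (v - mean) / sd for v in values]


print(standardize([1, 2, 3, None]))  # [-1.0, 0.0, 1.0, None]
```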
DO
where the numbers in parentheses give the range of values the do loop
will use. The number sign (#) is replaced by these values during the
execution of the do loop. Following are the transformations that are
executed based on the DO option specified above:
diff1 = y1 - x1;
diff2 = y2 - x2;
diff3 = y3 - x3;
diff4 = y4 - x4;
diff5 = y5 - x5;
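The substitution that a do loop performs can be illustrated in Python. The template string below is hypothetical. It was chosen to reproduce the five statements listed above and is not taken from the original DO specification.

```python
def expand_do(start, end, template):
    """Expand a DO-loop statement by replacing every # with start..end."""
    return [template.replace("#", str(i)) for i in range(start, end + 1)]


for statement in expand_do(1, 5, "diff# = y# - x#;"):
    print(statement)
```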
ANALYSIS Command
CHAPTER 16
ANALYSIS COMMAND
ANALYSIS:
TYPE
The TYPE option is used to describe the type of analysis. There are six
major analysis types in Mplus: GENERAL, MIXTURE, TWOLEVEL,
THREELEVEL, CROSSCLASSIFIED, and EFA. GENERAL is the
default.
The default is to estimate the model under missing data theory using all
available data; to include means, thresholds, and intercepts in the model;
to compute standard errors; and to compute chi-square when available.
These defaults can be overridden. The LISTWISE option of the DATA
command can be used to delete all observations from the analysis that
have one or more missing values on the set of analysis variables. For
GENERAL
• Regression analysis
• Path analysis
• Confirmatory factor analysis
• Structural equation modeling
• Growth modeling
• Discrete-time survival analysis
• Continuous-time survival analysis
Special features available with the above models for all observed
outcome variable types are:
or simply,
TYPE = RANDOM;
MIXTURE
For models that include both continuous and categorical latent variables,
observed outcome variables can be continuous, censored, binary, ordered
categorical (ordinal), counts, or combinations of these variable types. In
addition, for regression analysis and path analysis for non-mediating
outcomes, observed outcome variables can also be unordered categorical
(nominal). Following are models that can be estimated using
TYPE=MIXTURE with both continuous and categorical latent variables:
Special features available with the above models for all observed
outcome variable types are:
TWOLEVEL
Special features available for two-level models for all observed outcome
variable types are:
THREELEVEL
CROSSCLASSIFIED
• Missing data
• Random slopes
EFA
Special features available for EFA for all observed outcome variable
types are:
• Missing data
• Complex survey data
Following are the other TYPE settings that can be used in conjunction
with TYPE=EFA along with a brief description of their functions:
TYPE = EFA 1 3;
where the two numbers following EFA are the lower and upper limits of
the number of factors to be extracted. In the example above, factor
solutions are given for one, two, and three factors.
where the first two numbers, 3 and 4, are the lower and upper limits of
the number of factors to be extracted on the within level, UW* specifies
that an unrestricted within-level model is estimated, the second two
numbers, 1 and 2, are the lower and upper limits of the number of factors
to be extracted on the between level, and UB* specifies that an
unrestricted between-level model is estimated. The within- and
between-level specifications are crossed. In the example shown above,
the three- and four-factor models and the unrestricted model on the
within level are estimated in combination with the one- and two-factor
models and the unrestricted model on the between level, resulting in nine
solutions.
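The crossing of within-level and between-level specifications can be enumerated in Python to confirm the count of nine solutions. This is an illustration only. "UW" and "UB" stand for the unrestricted within-level and between-level models.

```python
from itertools import product

within = [3, 4, "UW"]   # 3 and 4 factors, plus the unrestricted model (UW*)
between = [1, 2, "UB"]  # 1 and 2 factors, plus the unrestricted model (UB*)

solutions = list(product(within, between))
for w, b in solutions:
    print(f"within: {w}, between: {b}")
print(len(solutions))  # 9
```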
specifies that three- and four-factor models on the within level are
estimated in combination with an unrestricted model on the between
level.
ESTIMATOR
The ESTIMATOR option is used to specify the estimator to be used in
the analysis. The default estimator differs depending on the type of
analysis and the measurement scale of the dependent variable(s). Not all
estimators are available for all models. Following is a table that shows
which estimators are available for specific models and variable types.
The information is broken down by models with all continuous
dependent variables, those with at least one binary or ordered categorical
dependent variable, and those with at least one censored, unordered
categorical, or count dependent variable. All of the estimators require
individual-level data except ML with TYPE=GENERAL and TYPE=EFA, GLS,
and ULS, which can use summary data. The default settings are
indicated by bold type.
The first column of the table shows the combinations of TYPE settings
that are allowed. The second column shows the set of estimators
available for the analysis types in the first column for a model with all
BAYESIAN ESTIMATION
ESTIMATOR=BAYES;
PARAMETERIZATION
The PARAMETERIZATION option is used for two purposes. The first
purpose is to change from the default Delta parameterization to the
alternative Theta parameterization when TYPE=GENERAL is used, at
least one observed dependent variable is categorical, and weighted least
squares estimation is used in the analysis. The second purpose is to
change from the default logit regression parameterization to either the
loglinear or probability parameterization when TYPE=MIXTURE and
more than one categorical latent variable is used in the analysis.
PARAMETERIZATION = THETA;
LINK
The LINK option is used with maximum likelihood estimation to select a
logit or probit link for models with categorical outcomes. The default is
a logit link. Following is an example of how to request a probit link:
LINK = PROBIT;
ROTATION
The ROTATION option is used with TYPE=EFA to specify the type of
rotation of the factor loading matrix to be used in exploratory factor
analysis. The default is the GEOMIN oblique rotation (Yates, 1987;
Standard errors are available as the default for all rotations except
PROMAX and VARIMAX. The NOSERROR option of the OUTPUT
command can be used to request that standard errors not be computed.
The following rotations are available:
GEOMIN
QUARTIMIN
CF-VARIMAX
CF-QUARTIMAX
CF-EQUAMAX
CF-PARSIMAX
CF-FACPARSIM
CRAWFER
OBLIMIN
PROMAX
VARIMAX
TARGET
BI-GEOMIN
BI-CF-QUARTIMAX
The Geomin rotation algorithm often finds several local minima of the
rotation function (Browne, 2001). To find a global minimum, 30
random rotation starts are used as the default. The RSTARTS option of
the ANALYSIS command can be used to change the default.
or
CF-QUARTIMAX (OBLIQUE)
CRAWFER (OBLIQUE 0)
OBLIMIN (OBLIQUE 0)
Rotation          Kappa
CF-VARIMAX        1/p
CF-QUARTIMAX      0
CF-EQUAMAX        m/(2p)
CF-PARSIMAX       (m-1)/(p+m-2)
CF-FACPARSIM      1
where p is the number of observed variables and m is the number of
factors.
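The kappa values in the table above can be computed for a given number of variables and factors with a short Python sketch. The sketch uses exact fractions so that the values match the table's formulas. For example, with p = 10 variables and m = 3 factors, CF-EQUAMAX uses kappa = 3/20.

```python
from fractions import Fraction


def cf_kappa(name, p, m):
    """Crawford-Ferguson kappa for p observed variables and m factors,
    following the table above."""
    table = {
        "CF-VARIMAX": Fraction(1, p),
        "CF-QUARTIMAX": Fraction(0),
        "CF-EQUAMAX": Fraction(m, 2 * p),
        "CF-PARSIMAX": Fraction(m - 1, p + m - 2),
        "CF-FACPARSIM": Fraction(1),
    }
    return table[name]


for rotation in ["CF-VARIMAX", "CF-QUARTIMAX", "CF-EQUAMAX", "CF-PARSIMAX", "CF-FACPARSIM"]:
    print(rotation, cf_kappa(rotation, p=10, m=3))
```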
where .5 is the value of kappa. The kappa value can also be changed for
an oblique rotation as follows:
or
The default for the OBLIMIN rotation is oblique with a gamma value of
0. Gamma can take on any value. Following is an example of how to
specify an orthogonal rotation for the OBLIMIN rotation and to specify
a gamma value different from 0:
where 1 is the value of gamma. The gamma value can also be changed
for an oblique rotation as follows:
or
The VARIMAX and PROMAX rotations are the same rotations as those
available in earlier versions of Mplus. The VARIMAX rotation is the
same as the CF-VARIMAX orthogonal rotation except that VARIMAX
row standardizes the factor loading matrix before rotation.
ROWSTANDARDIZATION
The ROWSTANDARDIZATION option is used with exploratory factor
analysis (EFA) and when a set of EFA factors is part of the MODEL
command to request row standardization of the factor loading matrix
before rotation. The ROWSTANDARDIZATION option has three
settings: CORRELATION, KAISER, and COVARIANCE. The
CORRELATION setting rotates a factor loading matrix derived from a
correlation matrix with no row standardization. The KAISER setting
ROWSTANDARDIZATION = KAISER;
PARALLEL
The PARALLEL option is used with TYPE=EFA to determine the
optimum number of factors in an exploratory factor analysis. It is
available for continuous outcomes using maximum likelihood
estimation. Parallel analysis (see, for example, Fabrigar, Wegener,
MacCallum, & Strahan, 1999; Hayton, Allen, & Scarpello, 2004) is a
method that uses random data with the same number of observations and
variables as the original data. The correlation matrix of the random data
is used to compute eigenvalues. These eigenvalues are compared to the
eigenvalues of the original data. The optimum number of factors is the
number of the original data eigenvalues that are larger than the random
data eigenvalues. TYPE=PLOT2 of the PLOT command gives a plot of
the sample eigenvalues, the parallel analysis eigenvalues, and the
parallel analysis eigenvalues for the 95th percentile. The PARALLEL
option is specified as follows:
PARALLEL = 50;
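The decision rule of parallel analysis can be sketched in Python. The sketch assumes that the random-data eigenvalues are averaged over the random data sets, position by position. It then counts the sample eigenvalues that exceed the corresponding average. The eigenvalues in the example are made-up illustrative numbers, and generating the random data is not shown.

```python
def parallel_analysis(sample_eigs, random_eigs_reps):
    """Count sample eigenvalues larger than the mean random-data eigenvalue
    in the same position.

    sample_eigs: eigenvalues of the sample correlation matrix, largest first.
    random_eigs_reps: one list of eigenvalues (largest first) per random data set.
    """
    reps = len(random_eigs_reps)
    mean_random = [sum(r[k] for r in random_eigs_reps) / reps
                   for k in range(len(sample_eigs))]
    return sum(1 for s, r in zip(sample_eigs, mean_random) if s > r)


# Made-up eigenvalues for six variables
sample = [3.1, 1.4, 0.8, 0.4, 0.2, 0.1]
random_reps = [[1.30, 1.15, 1.05, 0.95, 0.85, 0.70],
               [1.25, 1.10, 1.00, 0.95, 0.90, 0.80]]
print(parallel_analysis(sample, random_reps))  # 2
```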
MODEL
The MODEL option is used to make changes to the defaults of the
MODEL command. The MODEL option has three settings:
NOMEANSTRUCTURE, NOCOVARIANCES, and ALLFREE. The
MODEL = NOCOVARIANCES;
MODEL: %OVERALL%
f1 BY y1-y3* (lam#_1-lam#_3);
f2 BY y4-y6* (lam#_4-lam#_6);
[y1-y6] (nu#_1-nu#_6);
MODEL PRIORS:
DO(1,6) DIFF(lam1_#-lam10_#)~N(0,0.01);
DO(1,6) DIFF(nu1_#-nu10_#)~N(0,0.01);
In the overall part of the model, labels are assigned to the factor loadings
and the intercepts using automatic labeling for groups. The labels must
include the number sign (#) followed by the underscore (_) symbol
followed by a number. The number sign (#) refers to a group and the
number refers to a parameter. The label lam#_1 is assigned to the factor
loading for y1; the label lam#_2 is assigned to the factor loading for y2;
and the label lam#_3 is assigned to the factor loading for y3. These
labels are expanded to include group information. For example, the
label for parameter 1 is expanded across the ten groups to give labels
lam1_1, lam2_1 through lam10_1. In MODEL PRIORS, these expanded
labels are used to assign zero-mean and small-variance priors to the
differences across groups of the factor loadings and intercepts using the
DO and DIFF options. They can be used together to simplify the
assignment of priors to a large set of difference parameters for models
with multiple groups and multiple time points. For the DO option, the
numbers in parentheses give the range of values for the do loop. The
number sign (#) is replaced by these values during the execution of the
do loop. The numbers refer to the six factor indicators.
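The label expansion described above can be mimicked in Python to show which group-specific labels each DO/DIFF statement refers to. This is an illustration of the naming pattern only, not Mplus's implementation. The ten groups and six parameters follow the example.

```python
def expand_group_label(label, n_groups):
    """Expand a label such as 'lam#_1' into one label per group."""
    return [label.replace("#", str(g), 1) for g in range(1, n_groups + 1)]


def expand_do_diff(prefix, n_groups, params):
    """For each parameter number, list the group-specific labels covered by
    a DO(...) DIFF(prefix1_#-prefix10_#) statement."""
    return {p: [f"{prefix}{g}_{p}" for g in range(1, n_groups + 1)] for p in params}


print(expand_group_label("lam#_1", 10))
print(expand_do_diff("nu", 10, range(1, 7))[6])
```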
REPSE
The REPSE option is used to specify the resampling method that was
used to create existing replicate weights or will be used to generate
replicate weights (Fay, 1989; Korn & Graubard, 1999; Lohr, 1999;
Asparouhov, 2009). Replicate weights are used in the estimation of
standard errors of parameter estimates. The REPSE option has six
settings: BOOTSTRAP, JACKKNIFE, JACKKNIFE1, JACKKNIFE2,
BRR, and FAY. There is no default. The REPSE option must be
specified when replicate weights are used or generated.
REPSE = BRR;
For the FAY resampling method, a constant can be given that is used to
modify the sample weights. The constant must be between zero and
one. The default is .3. The REPSE option for the FAY setting is
specified as follows:
BASEHAZARD
The BASEHAZARD option is used in continuous-time survival analysis
to specify whether a non-parametric or a parametric baseline hazard
function is used in the estimation of the model. The default is OFF,
which uses the non-parametric baseline hazard function. Following is an
example of how to request a parametric baseline hazard function:
BASEHAZARD = ON;
With TYPE=MIXTURE, the OFF setting has two options: EQUAL and
UNEQUAL. The EQUAL setting is the default. With the EQUAL
setting, the baseline hazard parameters are held equal across classes. To
relax this equality, specify:
CHOLESKY
The CHOLESKY option is used in conjunction with
ALGORITHM=INTEGRATION to decompose the continuous latent
variable covariance matrix and the observed variable residual covariance
matrix into orthogonal components in order to improve the optimization.
The optimization algorithm starts out with Fisher Scoring used in
combination with EM. The CHOLESKY option has two settings: ON
and OFF. The default when all dependent variables are censored,
categorical, and counts is ON except for categorical dependent variables
CHOLESKY = ON;
ALGORITHM
The ALGORITHM option is used in conjunction with
TYPE=MIXTURE, TYPE=RANDOM, and TYPE=TWOLEVEL with
maximum likelihood estimation to indicate the optimization method to
use to obtain maximum likelihood estimates and to specify whether the
computations require numerical integration. The ALGORITHM option
is used with TYPE=TWOLEVEL and weighted least squares estimation
to indicate the optimization method to use to obtain sample statistics for
model estimation. There are four settings related to the optimization
method: EM, EMA, FS, and ODLL. The default depends on the analysis
type.
ALGORITHM = EM;
ALGORITHM = INTEGRATION;
INTEGRATION
INTEGRATION = 10;
INTEGRATION = MONTECARLO;
MCSEED
MCSEED = 23456;
ADAPTIVE
ADAPTIVE = OFF;
INFORMATION
The INFORMATION option is used to select the estimator of the
information matrix to be used in computing standard errors when the ML
or MLR estimators are used for analysis. The INFORMATION option
has three settings: OBSERVED, EXPECTED, and COMBINATION.
OBSERVED estimates the information matrix using observed second-
order derivatives; EXPECTED estimates the information matrix using
expected second-order derivatives; and COMBINATION estimates the
information matrix using a combination of observed and expected
second-order derivatives. For MLR, OBSERVED, EXPECTED, and
COMBINATION refer to the outside matrices of the sandwich estimator
used to compute standard errors. The INFORMATION option is
specified as follows:
INFORMATION = COMBINATION;
The default is to estimate models under missing data theory using all
available data. In this case, the observed information matrix is used.
For models with all continuous outcomes that are estimated without
numerical integration, the expected information matrix is also available.
For other outcome types and models that are estimated with numerical
integration, the combination information matrix is also available.
The defaults for the information matrix when the LISTWISE option of
the DATA command is used are summarized in the tables below. The
information matrix defaults vary depending on the analysis type. The
bolded entry is the default. Only the ML and MLR estimators have
choices beyond the default. Following is the information matrix table
for models with all continuous dependent variables that are estimated
without numerical integration:
Type of Analysis
TYPE=
GENERAL ML Observed
MLR Expected
------------------- -------------------
MLM Expected
MLMV
------------------- -------------------
MLF Observed
THREELEVEL ML Observed
THREELEVEL RANDOM MLR
COMPLEX THREELEVEL MLR Observed
COMPLEX THREELEVEL RANDOM
EFA ML Observed
Expected
-------------------- --------------------
MLR Observed
Combination
-------------------- --------------------
MLF Observed
-------------------- --------------------
MLM Expected
MLMV
Following is the information matrix table for models with at least one
binary, ordered categorical (ordinal), censored, unordered categorical
(nominal), or count dependent variable and for models estimated using
numerical integration:
Type of Analysis
TYPE=
GENERAL ML Observed
GENERAL RANDOM MLR Combination
MIXTURE ------------------- -------------------
MIXTURE RANDOM MLF Observed
TWOLEVEL
TWOLEVEL RANDOM
TWOLEVEL MIXTURE
TWOLEVEL MIXTURE RANDOM
GENERAL COMPLEX MLR Observed
GENERAL COMPLEX RANDOM Combination
MIXTURE COMPLEX
MIXTURE COMPLEX RANDOM
COMPLEX TWOLEVEL
COMPLEX TWOLEVEL RANDOM
COMPLEX TWOLEVEL MIXTURE
COMPLEX TWOLEVEL MIXTURE RANDOM
EFA COMPLEX
EFA ML Observed
EFA MIXTURE MLR Combination
-------------------- --------------------
MLF Observed
BOOTSTRAP
The BOOTSTRAP option is used to request bootstrapping and to specify
the type of bootstrapping and the number of bootstrap draws to be used
in the computation. Two types of bootstrapping are available: standard
and residual (Bollen & Stine, 1992; Efron & Tibshirani, 1993; Enders,
2002). The residual bootstrap is the Bollen-Stine bootstrap. The
BOOTSTRAP option requires individual data.
BOOTSTRAP = 500;
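To make the idea of the standard bootstrap concrete, the following Python sketch resamples cases with replacement and uses the standard deviation of the replicated estimates as a standard error. It illustrates the general case-resampling technique only. It is not Mplus's implementation, it does not cover the residual (Bollen-Stine) bootstrap, and the data and the statistic (a sample mean) are made up for the example.

```python
import random
import statistics


def bootstrap_se(data, statistic, draws=500, seed=0):
    """Standard (case-resampling) bootstrap standard error of `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(draws):
        # Draw n cases with replacement from the observed cases.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the replicated estimates estimates the sampling variability.
    return statistics.stdev(estimates)


# Made-up data for illustration.
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
se = bootstrap_se(sample, statistics.fmean, draws=500)
analytic = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE = {se:.3f}, analytic SE = {analytic:.3f}")
```

For a mean, the bootstrap standard error should be close to the usual analytic value; the bootstrap is useful precisely when no such formula is available.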
LRTBOOTSTRAP
The LRTBOOTSTRAP option is used in conjunction with the TECH14
option of the OUTPUT command to specify the number of bootstrap
draws to be used in estimating the p-value of the parametric
bootstrapped likelihood ratio test (McLachlan & Peel, 2000). The
default number of bootstrap draws is determined by the program using a
sequential method in which the number of draws varies from 2 to 100.
The LRTBOOTSTRAP option is used to override this default.
LRTBOOTSTRAP = 100;
STARTS
STARTS = 0;
STARTS = 100 20;
specifies that 100 random sets of starting values are generated in the
initial stage and 20 optimizations are carried out in the final stage using
the default optimization settings for TYPE=MIXTURE.
or
STARTS = 10;
which specifies that 10 random sets of starting values are generated and
ten optimizations are carried out.
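The two-stage logic of random starts can be illustrated on a toy problem. The Python sketch below is an analogue built on assumptions, not Mplus's optimization algorithm. The "loglikelihood" is a made-up one-parameter function with two local maxima, a fixed number of gradient-ascent steps stands in for the iterations of each stage, and the names two_stage_starts, n_initial, and n_final are hypothetical stand-ins for the two numbers given in the STARTS option.

```python
import random


def loglik(x):
    # Made-up function: a local maximum near x = -1 and a higher one near x = +1.
    return -(x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    return -4.0 * x * (x * x - 1.0) + 0.3


def climb(x, steps, rate=0.01):
    """Plain gradient ascent for a fixed number of steps."""
    for _ in range(steps):
        x += rate * gradient(x)
    return x


def two_stage_starts(n_initial, n_final, initial_steps=10, final_steps=2000, seed=0):
    rng = random.Random(seed)
    # Initial stage: many random starting values, each iterated only briefly.
    partial = [climb(rng.uniform(-2.0, 2.0), initial_steps) for _ in range(n_initial)]
    # Keep the n_final starts with the highest loglikelihood after the initial stage.
    best = sorted(partial, key=loglik, reverse=True)[:n_final]
    # Final stage: optimize the retained starts to convergence and report the best.
    finals = [climb(x, final_steps) for x in best]
    return max(finals, key=loglik)


x_hat = two_stage_starts(n_initial=100, n_final=20)
print(f"best solution x = {x_hat:.3f}, loglik = {loglik(x_hat):.3f}")
```

A single start beginning near x = -1 would stop at the lower local maximum; screening many starts briefly and then finishing only the most promising ones is what makes the global maximum likely to be found at modest cost.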
STITERATIONS
STITERATIONS = 20;
STCONVERGENCE
STSCALE
STSEED
The STSEED option is used to specify the random seed for generating
the random starts. The default value is zero.
OPTSEED
The OPTSEED option is used to specify the random seed that has been
found to result in the highest loglikelihood in a previous analysis. The
OPTSEED option results in no random starts being used.
K-1STARTS
The K-1STARTS option is used in conjunction with the TECH11 and
TECH14 options of the OUTPUT command to specify the number of
starting values to use in the initial stage and the number of optimizations
to use in the final stage for the k-1 class analysis model. When the
OPTSEED option is used, the default is 20 random sets of starting
values in the initial stage and 4 optimizations in the final stage. When
the OPTSEED option is not used, the default is the same as what is used
for the STARTS option. Following is an example of how to specify the
K-1STARTS option:
K-1STARTS = 80 16;
LRTSTARTS
The LRTSTARTS option is used in conjunction with the TECH14
option of the OUTPUT command to specify the number of starting
values to use in the initial stage and the number of optimizations to use
in the final stage for the k-1 and k class models when the data generated
by bootstrap draws are analyzed. The default for the k-1 class model is 0
random sets of starting values in the initial stage and 0 optimizations in
the final stage. One optimization is carried out for the unperturbed set of
starting values. The default for the k class model is 40 random sets of
starting values in the initial stage and 8 optimizations in the final stage.
LRTSTARTS = 2 1 80 16;
which specifies that for the k-1 class model 2 random sets of starting
values are used in the initial stage and 1 optimization is carried out in the
final stage and for the k class model 80 random sets of starting values
are used in the initial stage and 16 optimizations are carried out in the
final stage.
RSTARTS
The RSTARTS option is used to specify the number of random sets of
starting values to use for the GPA rotation algorithm and the number of
rotated factor solutions with the best unique rotation function values to
print for exploratory factor analysis. The default is 30 random sets of
starting values and printing of the best solution. Following is an
example of how to use the RSTARTS option:
RSTARTS = 10 2;
which specifies that 10 random sets of starting values are used for the
rotations and that the rotated factor solutions with the two best rotation
function values will be printed.
DIFFTEST
The DIFFTEST option is used to obtain a correct chi-square difference
test when the MLMV or WLSMV estimator is used, because the
difference between the MLMV or WLSMV chi-square values for two
nested models is not distributed as chi-square. The chi-square
difference test compares the H0 analysis model to a less restrictive H1
alternative model in which the H0 model is nested. To obtain a correct
chi-square difference test for MLMV or WLSMV, a two-step procedure
is needed. In the first step, the H1 model is estimated. In
the H1 analysis, the DIFFTEST option of the SAVEDATA command is
used to save the derivatives needed for the chi-square difference test. In
the second step, the H0 model is estimated and the chi-square difference
test is computed using the derivatives from the H0 and H1 analyses. The
DIFFTEST option of the ANALYSIS command is used as follows to
specify the name of the data set that contains the derivatives from the H1
analysis:
DIFFTEST = deriv.dat;
where deriv.dat is the name of the data set that contains the derivatives
from the H1 analysis that were saved using the DIFFTEST option of the
SAVEDATA command when the H1 model was estimated.
MULTIPLIER
The MULTIPLIER option is used with the JACKKNIFE setting of the
REPSE option when replicate weights are used in the analysis to provide
multiplier values needed for the computation of standard errors. The
MULTIPLIER option is specified as follows:
MULTIPLIER = multiplier.dat;
where multiplier.dat is the name of the data set that contains the
multiplier values needed for the computation of standard errors.
COVERAGE
The COVERAGE option is used with missing data to specify the
minimum acceptable covariance coverage value for the unrestricted H1
model. The default value is .10, which means that if all variables and
pairs of variables have data for at least ten percent of the sample, the
model will be estimated. Following is an example of how to use the
COVERAGE option:
COVERAGE = .05;
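Covariance coverage for a pair of variables is the proportion of cases with data on both variables; for a single variable it is the proportion of cases with data on that variable. The Python sketch below computes these proportions and compares the smallest one with a minimum such as the default .10. The data are made up, the function names are hypothetical, and representing a missing value as None is an assumption of the sketch.

```python
from itertools import combinations_with_replacement


def coverage_matrix(rows, names):
    """Proportion of cases observed on each variable and on each pair of variables."""
    n = len(rows)
    cov = {}
    for a, b in combinations_with_replacement(range(len(names)), 2):
        both = sum(1 for r in rows if r[a] is not None and r[b] is not None)
        cov[(names[a], names[b])] = both / n
    return cov


def meets_coverage(cov, minimum=0.10):
    """True if every variable and pair of variables reaches the minimum coverage."""
    return min(cov.values()) >= minimum


# Made-up data: 10 cases, three variables, None marks a missing value.
data = [
    (1.2, 3.4, None), (0.5, None, 2.2), (1.1, 2.9, 1.8), (None, 3.1, 2.0),
    (0.9, 2.7, None), (1.4, None, None), (0.7, 3.3, 2.5), (1.0, 3.0, 2.1),
    (None, 2.8, 1.9), (1.3, 3.2, None),
]
cov = coverage_matrix(data, ["y1", "y2", "y3"])
print(cov[("y1", "y3")])  # 0.4
print(meets_coverage(cov), meets_coverage(cov, minimum=0.5))  # True False
```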
ADDFREQUENCY
The ADDFREQUENCY option is used to specify a value that is divided
by the sample size and added to each cell with zero frequency in the two-
way tables that are used in categorical data analysis. As the default, 0.5
divided by the sample size is added to each cell with zero frequency.
The ADDFREQUENCY option is specified as follows:
ADDFREQUENCY = 0;
where the value 0 specifies that nothing is added to each cell with zero
frequency. Any non-negative value can be used with this option.
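The adjustment is simple arithmetic: with the default value of 0.5 and a sample size of 200, each empty cell receives 0.5/200 = 0.0025. The Python sketch below applies that rule to a made-up two-way table of two binary items. The function name is hypothetical, and taking the sample size as the total count of the table is an assumption of the example.

```python
def add_to_empty_cells(table, n, addfrequency=0.5):
    """Add addfrequency / n to each zero-frequency cell of a two-way table."""
    increment = addfrequency / n
    return [[cell + increment if cell == 0 else cell for cell in row] for row in table]


# Made-up 2 x 2 table of two binary items; the counts sum to n = 200.
table = [[120, 0], [35, 45]]
print(add_to_empty_cells(table, n=200))                  # [[120, 0.0025], [35, 45]]
print(add_to_empty_cells(table, n=200, addfrequency=0))  # nothing is added
```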
ITERATIONS
SDITERATIONS
H1ITERATIONS
MITERATIONS
MCITERATIONS
MUITERATIONS
RITERATIONS
CONVERGENCE
H1CONVERGENCE
LOGCRITERION
RLOGCRITERION
MCONVERGENCE
MCCONVERGENCE
MUCONVERGENCE
RCONVERGENCE
MIXC
MIXC = CONVERGENCE;
MIXU
MIXU = CONVERGENCE;
LOGHIGH
The LOGHIGH option is used to specify the maximum value allowed for
the logit thresholds of the latent class indicators. The default is +15.
LOGLOW
The LOGLOW option is used to specify the minimum value allowed for
the logit thresholds of the latent class indicators. The default is -15.
UCELLSIZE
The UCELLSIZE option is used to specify the minimum expected cell
size allowed for computing chi-square from the frequency table of the
latent class indicators when the corresponding observed cell size is not
zero. The default value is .01.
VARIANCE
The VARIANCE option is used in conjunction with TYPE=RANDOM
and TYPE=TWOLEVEL when ESTIMATOR=ML, MLR, or MLF to
specify the minimum value that is allowed in the estimation of the
variance of the random effect variables and the variances of the between-
level outcome variables. The default value is .0001.
MATRIX
The MATRIX option identifies the matrix to be analyzed. The default
for continuous outcomes is to analyze the covariance matrix. The
following statement requests that a correlation matrix be analyzed:
MATRIX = CORRELATION;
POINT
The POINT option is used to specify the type of Bayes point estimate to
compute. The POINT option has three settings: MEDIAN, MEAN, and
MODE. The default is MEDIAN. With the MODE setting, the mode
reported refers to the multivariate mode of the posterior distribution.
This mode is different from the univariate mode reported in the plot of
the Bayesian posterior parameter distribution. To request that the mean
be computed, specify:
POINT = MEAN;
CHAINS
CHAINS = 4;
With multiple chains, parallel computing uses one chain per processor.
To benefit from this speed advantage, it is important to specify the
number of processors using the PROCESSORS option.
BSEED
The BSEED option is used to specify the seed to use for random number
generation in the Markov chain Monte Carlo (MCMC) chains. The
default is zero. If one chain is used, the seed is used for this chain. If
more than one chain is used, the seed is used for the first chain and is the
basis for generating seeds for the other chains. The randomly generated
seeds for the other chains can be found in TECH8. If the same seed is
used in a subsequent analysis, the other chains will have the same seeds
as in the previous analysis. To request that a seed other than zero be used,
specify:
BSEED = 5437;
STVALUES
STVALUES = ML;
MEDIATOR
MEDIATOR = OBSERVED;
ALGORITHM
ALGORITHM = MH;
BCONVERGENCE
a = 1 + BCONVERGENCE * factor,
such that convergence is obtained when PSR < a for each parameter.
The factor value ranges between one and two depending on the number
of parameters. With one parameter, the value of factor is one and the
value of a is 1.05 using the default value of BCONVERGENCE. With a
large number of parameters, the value of factor is 2 and the value of a is
1.1 using the default value of BCONVERGENCE.
With a single chain, PSR is defined using the third and the fourth
quarters of the chain. The first half of the chain is discarded as a burn-in
phase. To request a stricter convergence criterion, specify:
BCONVERGENCE = .01;
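The convergence rule above can be written as a small Python function. This is a sketch under stated assumptions: the factor is passed in directly because the text only says that it ranges from one to two depending on the number of parameters, the default BCONVERGENCE value of .05 is inferred from the worked values 1.05 and 1.1 given above, and the PSR values are supplied rather than computed from chains. The function names are hypothetical.

```python
def psr_threshold(bconvergence=0.05, factor=1.0):
    """a = 1 + BCONVERGENCE * factor, where factor lies between 1 and 2."""
    if not 1.0 <= factor <= 2.0:
        raise ValueError("factor must lie between 1 and 2")
    return 1.0 + bconvergence * factor


def converged(psr_values, bconvergence=0.05, factor=1.0):
    """Convergence requires PSR < a for every parameter."""
    a = psr_threshold(bconvergence, factor)
    return all(psr < a for psr in psr_values)


print(psr_threshold(0.05, 1.0))  # 1.05
print(psr_threshold(0.05, 2.0))  # 1.1
print(converged([1.01, 1.03, 1.04]))                                 # True
print(converged([1.01, 1.03, 1.04], bconvergence=0.01, factor=2.0))  # False
```

With BCONVERGENCE = .01 the threshold for a large model drops from 1.1 to about 1.02, so the same PSR values no longer count as converged.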
BITERATIONS
BITERATIONS = (2000);
FBITERATIONS
FBITERATIONS = 30000;
THIN
The THIN option is used to specify which iterations from the posterior
distribution to use in the parameter estimation. When a chain is mixing
poorly with high autocorrelations, the estimation can be based on every
k-th iteration rather than every iteration. This is referred to as thinning.
The default is 1 in which case every iteration is used. To request that
every 20th iteration be used, specify:
THIN = 20;
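Thinning amounts to keeping every k-th draw of the chain. The Python sketch below shows the idea on a stand-in list of iteration numbers. Which draw within each block of k is retained (here the k-th, 2k-th, and so on) is an assumption of the sketch, not a statement about Mplus internals, and the function name is hypothetical.

```python
def thin(draws, k=1):
    """Keep every k-th draw (the k-th, 2k-th, ...); k = 1 keeps every draw."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return draws[k - 1::k]


iterations = list(range(1, 101))  # stand-in for 100 posterior draws
print(thin(iterations, 20))       # [20, 40, 60, 80, 100]
print(len(thin(iterations)))      # 100
```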
DISTRIBUTION
The DISTRIBUTION option is used with the MODE setting of the POINT
option to specify the maximum number of iterations to use to compute
the multivariate mode of the posterior distribution.
DISTRIBUTION = 15000;
KOLMOGOROV
KOLMOGOROV = 1000;
PRIOR
The PRIOR option is used to request a plot of the prior distribution for
each parameter that has a proper prior. The plot of the prior
distributions can be viewed by choosing Bayesian prior distributions
from the Plot menu of the Mplus Editor. The default is 1,000 draws
from the prior distribution. To request more draws, specify:
PRIOR = 5000;
INTERACTIVE
The INTERACTIVE option is used to allow changes in technical
specifications during the iterations of an analysis when TECH8 is used.
This is useful in analyses that are computationally demanding. If a
starting value set has computational difficulties, it can be skipped. If too
many random starts have been chosen, the STARTS option can be
changed. If too strict a convergence criterion has been chosen, the
MCONVERGENCE option can be changed. Following is an example of
how to use the INTERACTIVE option:
INTERACTIVE = control.dat;
where control.dat is the name of the file that contains the technical
specifications that can be changed during an analysis. This file is
created automatically and resides in the same directory as the input file.
The following options of the ANALYSIS command are contained in this
file: STARTS, MITERATIONS, MCONVERGENCE,
LOGCRITERION, and RLOGCRITERION. No other options can be
used in this file except the INTERRUPT statement, which is used to skip
the current starting value set and go to the next starting value set. It has
settings of 0 and 1. A setting of 0 specifies that a starting value set is not
skipped. A setting of 1 specifies that the starting value set is skipped.
As the default, the INTERRUPT statement is set to 0 and the other
options are set to either the program default values or the values
specified in the input file.
The following file is automatically created and given the name specified
using the INTERACTIVE option.
INTERRUPT = 0
STARTS = 200 50
MITERATIONS = 500
MCONVERGENCE = 1.0E-06
LOGCRITERION = 1.0E-003
RLOGCRITERION = 1.0E-006
When the file is modified and saved, the new settings go into effect
immediately and are applied at each iteration. Following is an example
of a modified control.dat file where INTERRUPT and STARTS are
changed:
INTERRUPT = 1
STARTS = 150 50
MITERATIONS = 500
MCONVERGENCE = 1.0E-06
LOGCRITERION = 1.0E-003
RLOGCRITERION = 1.0E-006
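Because the control file consists of plain KEY = value lines, it can also be edited from a script rather than by hand. The Python sketch below is a hypothetical helper, not an Mplus utility. It parses the file format shown above into a dictionary and writes it back with INTERRUPT set to 1, leaving the other settings unchanged. It assumes one setting per line and does not validate the values.

```python
import tempfile
from pathlib import Path


def read_control(path):
    """Parse KEY = value lines into a dictionary (values kept as strings)."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().upper()] = value.strip()
    return settings


def write_control(path, settings):
    """Write the settings back, one KEY = value line each."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    Path(path).write_text("\n".join(lines) + "\n")


def skip_current_start(path):
    """Set INTERRUPT = 1 so that the current starting value set is skipped."""
    settings = read_control(path)
    settings["INTERRUPT"] = "1"
    write_control(path, settings)


# Demonstration on a temporary copy of the automatically created file shown above.
demo = Path(tempfile.mkdtemp()) / "control.dat"
demo.write_text(
    "INTERRUPT = 0\nSTARTS = 200 50\nMITERATIONS = 500\n"
    "MCONVERGENCE = 1.0E-06\nLOGCRITERION = 1.0E-003\nRLOGCRITERION = 1.0E-006\n"
)
skip_current_start(demo)
print(read_control(demo)["INTERRUPT"])  # 1
```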
PROCESSORS
The PROCESSORS option is used to specify the number of processors
to be used for parallel computing to increase computational speed.
When random starts are used, the PROCESSORS option is used in
MULTIPLE PROCESSORS
PROCESSORS = 8;
PROCESSORS = 2;
When processors and threads are used together, the threads are distributed
across the processors and the memory used is a multiple of the number
of threads. For large models that require a lot of memory, it is important
to have fewer threads than processors because computations are slower
or impossible when the memory used by all processors exceeds the
memory limit.
The use of multiple processors and multiple threads with random starts
as the default is available for TYPE=MIXTURE; Bayesian analysis with
more than one chain if STVALUES=ML; and models that require
numerical integration. They are also available for TYPE=RANDOM
and TYPE=TWOLEVEL and THREELEVEL with continuous outcomes
using ESTIMATOR=ML, MLR, and MLF without numerical integration
if the STARTS option is used. Without random starts only one
processor is used in these cases.
PROCESSORS = 8 4;
STARTS = 400 40;
PROCESSORS = 4;
MODEL Command
CHAPTER 17
MODEL COMMAND
VARIABLES
There are three important distinctions that need to be made about the
variables in an analysis in order to be able to specify a model. The
distinctions are whether variables are observed or latent, whether
variables are dependent or independent, and the scale of the observed
dependent variables.
Observed and latent variables can play the role of a dependent variable
or an independent variable in the model. The distinction between
dependent and independent variables is that of a regression analysis for y
regressed on x where y is a dependent variable and x is an independent
variable. An independent variable is one that is not influenced by any
other variable. Dependent variables are those that are influenced by
other variables. Other terms used for dependent variables are outcome
variable, response variable, indicator variable, y variable, and
endogenous variable. Other terms used for independent variables are
covariate, background variable, explanatory variable, predictor, x
variable, and exogenous variable.
MODEL:
BY short for measured by -- defines latent variables
example: f1 BY y1-y5;
ON short for regressed on -- defines regression relationships
example: f1 ON x1-x9;
PON short for regressed on -- defines paired regression relationships
example: f2 f3 PON f1 f2;
WITH short for correlated with -- defines correlational relationships
example: f1 WITH f2;
PWITH short for correlated with -- defines paired correlational
relationships
example: f1 f2 f3 PWITH f4 f5 f6;
list of variables; refers to variances and residual variances
example: f1 y1-y9;
[list of variables]; refers to means, intercepts, thresholds
example: [f1, y1-y9];
* frees a parameter at a default value or a specific starting value
example: y1* y2*.5;
@ fixes a parameter at a default value or a specific value
example: y1@ y2@0;
(number) constrains parameters to be equal
example: f1 ON x1 (1);
f2 ON x2 (1);
variable$number label for the threshold of a variable
variable#number label for nominal observed or categorical latent variable
variable#1 label for censored or count inflation variable
variable#number label for baseline hazard parameters
variable#number label for a latent class
(name) label for a parameter
{list of variables}; refers to scale factors
example: {y1-y9};
MODEL POPULATION:
%WITHIN% describes the individual-level data generation model for a
multilevel model
%BETWEEN% describes the cluster-level data generation model for a two-level
model
%BETWEEN label% describes the cluster-level data generation model for a three-
level or cross-classified model
MODEL COVERAGE: describes the population parameter values for a Monte Carlo
study
MODEL COVERAGE-label: describes the group-specific population parameter values in
multiple group analysis and the population parameter values for
each categorical latent variable and combinations of categorical
latent variables in mixture modeling for a Monte Carlo study
MODEL COVERAGE:
%OVERALL% describes the overall population parameter values of a mixture
model for a Monte Carlo study
%class label% describes the class-specific population parameter values of a
mixture model
MODEL COVERAGE:
%WITHIN% describes the individual-level population parameter values for
coverage
%BETWEEN% describes the cluster-level population parameter values for a
two-level model for coverage
%BETWEEN label% describes the cluster-level population parameter values for a
three-level or cross-classified model for coverage
MODEL MISSING: describes the missing data generation model for a Monte Carlo
study
MODEL MISSING-label: describes the group-specific missing data generation model for
a Monte Carlo study
MODEL MISSING:
%OVERALL% describes the overall data generation model of a mixture model
%class label% describes the class-specific data generation model of a
mixture model
• BY
• ON
• WITH
The model in the following figure is used to illustrate the use of the BY,
ON, and WITH options. The squares represent observed variables and
the circles represent latent variables. Regression relationships are
represented by arrows from independent variables to dependent
variables. The variables f1 and f2 are continuous latent variables. The
observed dependent variables are y1, y2, y3, y4, y5, y6, y7, y8, and y9.
The measurement part of the model consists of the two continuous latent
variables and their indicators. The continuous latent variable f1 is
measured by y1, y2, y3, y4, and y5. The continuous latent variable f2 is
measured by y6, y7, y8, and y9. The structural part of the model
consists of the regression of the two continuous latent variables on nine
observed independent variables. The observed independent variables are
x1, x2, x3, x4, x5, x6, x7, x8, and x9. Following is the MODEL
command for the figure below:
MODEL: f1 BY y1-y5;
f2 BY y6-y9;
f1 f2 ON x1-x9;
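[Figure: the continuous latent variables f1 and f2 regressed on the observed independent variables x1-x9; f1 is measured by y1-y5 and f2 is measured by y6-y9.]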
BY
The BY option is used to name and define the continuous latent
variables in the model. BY is short for measured by. The parameters
that are estimated are sometimes referred to as factor loadings or
lambdas. These are the coefficients for the regressions of the observed
dependent variables on the continuous latent variables. These observed
dependent variables are sometimes referred to as factor indicators. Each
BY statement can be thought of as a set of ON statements that describes
the regressions of a set of observed variables on a continuous latent
variable or factor. However, continuous latent variables in the
measurement model cannot be specified using a set of ON statements
In this section the use of the BY option for confirmatory factor analysis
(CFA) models is described. Following are the two BY statements that
describe how the continuous latent variables in the figure above are
measured:
f1 BY y1-y5;
f2 BY y6-y9;
where the asterisk (*) after y1 and y6 frees the factor loadings of y1 and
y6, and the @1 after f1 and f2 fixes the variances of f1 and f2 to one.
The use of the asterisk (*); @ symbol; and the specification of means,
thresholds, variances, and covariances are discussed later in the chapter.
f1 BY y1 y2 y3 y4 y5;
f2 BY y6 y7 y8 y9;
f3 BY f1 f2;
f3 BY f1 f2;
f1 BY y1 y2 y3 y4 y5;
f2 BY y6 y7 y8 y9;
The BY option has three special features that are used with sets of EFA
factors in the MODEL command. One feature is used to define sets of
EFA factors. The second feature is a special way of specifying factor
loading matrix equality for sets of EFA factors. The third feature is used
in conjunction with the TARGET setting of the ROTATION option of
the ANALYSIS command to provide target factor loading values to
guide the rotation of the factor loading matrix for sets of EFA factors.
where the asterisk (*) followed by a label specifies that factors f1 and f2
are a set of EFA factors with factor indicators y1 through y5.
f1 BY y1-y5 (*1);
f2 BY y1-y5 (*1);
where the label 1 specifies that factors f1 and f2 are part of the same set
of EFA factors. Rotation is carried out on the five by two factor loading
matrix. Labels for EFA factors must follow an asterisk (*). EFA factors
with the same label must have the same factor indicators.
More than one set of EFA factors may appear in the MODEL command.
For example,
specifies that factors f1 and f2 are one set of EFA factors with the label
1 and factors f3 and f4 are another set of EFA factors with the label 2.
The two sets of EFA factors are rotated separately.
Factors in a set of EFA factors can be regressed on covariates, but the set
of covariates must be the same, for example,
f1-f2 ON x1-x3;
or
f1 ON x1-x3;
f2 ON x1-x3;
y ON f1-f2;
The number 1 following the labels 1 and 2 that define the EFA factors
specifies that the factor loadings matrices for the two sets of EFA factors
are held equal.
The BY option has a special feature that is used with the TARGET
setting of the ROTATION option of the ANALYSIS command to
specify target factor loading values for a set of EFA factors (Browne,
2001). The target factor loading values are used to guide the rotation of
the factor loading matrix. Typically these values are zero. For the
TARGET rotation, a minimum number of target values must be given for
purposes of model identification. For the default oblique TARGET
rotation, the minimum is m(m-1), where m is the number of factors.
For the orthogonal TARGET rotation, the minimum is m(m-1)/2. The
target values are given in the MODEL command in a BY statement using
the tilde (~) symbol as follows:
where the target factor loading values for the factor indicator y1 for
factor f1 and y5 for factor f2 are zero.
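The identification minimums above are easy to tabulate. In the two-factor case, m(m-1) = 2, so the two zero targets (y1 for f1 and y5 for f2) are exactly the minimum for the default oblique rotation. The Python function below simply encodes the two formulas from the text; its name is hypothetical.

```python
def min_target_values(m, rotation="oblique"):
    """Minimum number of target values for TARGET rotation with m factors."""
    if rotation == "oblique":
        return m * (m - 1)
    if rotation == "orthogonal":
        return m * (m - 1) // 2
    raise ValueError("rotation must be 'oblique' or 'orthogonal'")


print(min_target_values(2))                # 2
print(min_target_values(2, "orthogonal"))  # 1
print(min_target_values(4))                # 12
```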
ON
The ON option is used to describe the regression relationships in the
model and is short for regressed on. The general form of the ON
statement is:
y ON x;
f1 ON x1-x9;
f2 ON x1-x9;
y9 ON x9;
c ON x1-x3;
c2 ON c1;
or
u ON x1-x3;
PON
A second form of the ON option is PON. PON is used to describe the
paired regression relationships in the model and is short for regressed
on. PON pairs the variables on the left-hand side of the PON statement
with the variables on the right-hand side of the PON statement. For
PON, the number of variables on the left-hand side of the PON statement
must equal the number of variables on the right-hand side of the PON
statement. For example,
y2 y3 y4 PON y1 y2 y3;
implies
y2 ON y1;
y3 ON y2;
y4 ON y3;
The PON option cannot be used with the simplified language for
categorical latent variables or unordered categorical (nominal) observed
variables.
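The pairing rule of PON is positional: the first variable on the left is regressed on the first variable on the right, and so on. The Python sketch below generates the implied ON statements as text to illustrate that rule. It is not how Mplus processes input, and the function name is hypothetical.

```python
def expand_pon(left, right):
    """Expand 'left PON right' into one ON statement per position-matched pair."""
    if len(left) != len(right):
        raise ValueError("PON needs the same number of variables on each side")
    return [f"{y} ON {x};" for y, x in zip(left, right)]


for statement in expand_pon(["y2", "y3", "y4"], ["y1", "y2", "y3"]):
    print(statement)
# y2 ON y1;
# y3 ON y2;
# y4 ON y3;
```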
WITH
The WITH option is used to describe correlational relationships in a
model and is short for correlated with. Correlational relationships
include covariances among continuous observed variables and
continuous latent variables and among categorical latent variables. With
the weighted least squares estimator, correlational relationships are also
allowed for binary, ordered categorical, and censored observed
variables. For all other variable types, the WITH option cannot be used
to specify correlational relationships. Special modeling needs to be used
in these situations, for example, using a latent variable that influences
both variables.
f1 WITH f2;
This statement frees the covariance parameter for the continuous latent
variables f1 and f2.
c1 WITH c2;
The association coefficient for the last class of each categorical latent
variable is fixed at zero as the default, as in loglinear modeling.
PWITH
A second form of the WITH option is PWITH. PWITH pairs the
variables on the left-hand side of the PWITH statement with those on the
right-hand side of the PWITH statement. For PWITH, the number of
variables on the left-hand side of the PWITH statement must equal the
number of variables on the right-hand side of the PWITH statement. For
example,
y1 y2 y3 PWITH y4 y5 y6;
implies
y1 WITH y4;
y2 WITH y5;
y3 WITH y6;
whereas,
y1 y2 y3 WITH y4 y5 y6;
implies
y1 WITH y4;
y1 WITH y5;
y1 WITH y6;
y2 WITH y4;
y2 WITH y5;
y2 WITH y6;
y3 WITH y4;
y3 WITH y5;
y3 WITH y6;
The PWITH option cannot be used with the simplified language for
categorical latent variables.
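The difference between PWITH and WITH with lists is the difference between pairing by position and forming every combination. The Python sketch below generates both expansions as text for the y1-y6 example above. The function names are hypothetical, and the sketch only mirrors the expansions stated in the text.

```python
from itertools import product


def expand_pwith(left, right):
    """PWITH: pair the variables by position."""
    if len(left) != len(right):
        raise ValueError("PWITH needs the same number of variables on each side")
    return [f"{a} WITH {b};" for a, b in zip(left, right)]


def expand_with(left, right):
    """WITH with lists: every left-hand variable with every right-hand variable."""
    return [f"{a} WITH {b};" for a, b in product(left, right)]


lhs, rhs = ["y1", "y2", "y3"], ["y4", "y5", "y6"]
print(expand_pwith(lhs, rhs))     # ['y1 WITH y4;', 'y2 WITH y5;', 'y3 WITH y6;']
print(len(expand_with(lhs, rhs)))  # 9
```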
VARIANCES/RESIDUAL VARIANCES
For convenience, no distinction is made in how variances and residual
variances are referred to in the MODEL command. The model defines
whether the parameter to be estimated is a variance or a residual
variance. Variances are estimated for independent variables and residual
variances are estimated for dependent variables. Variances of
continuous and censored observed variables and continuous latent
variables are free to be estimated as the default. Variances of categorical
observed variables are not estimated. When the Theta parameterization
is used in either a growth model or a multiple group model, variances for
continuous latent response variables for the categorical observed
variables are estimated. Unordered categorical (nominal) observed
variables, observed count variables, and categorical latent variables have
no variance parameters.
y1 y2 y3;
refers to the variances of y1, y2, and y3 if they are independent variables
and refers to the residual variances of y1, y2, and y3 if they are
dependent variables. The statement means that the variances or residual
variances are free parameters to be estimated using default starting
values.
MEANS/INTERCEPTS/THRESHOLDS
Means, intercepts, and thresholds are included in the analysis model as
the default. The NOMEANSTRUCTURE setting of the MODEL option
of the ANALYSIS command is used with TYPE=GENERAL to specify
that means, intercepts, and thresholds are not included in the analysis
model.
For example,
[y1 y2 y3];
refers to the means of variables y1, y2, and y3 if they are independent
variables and refers to the intercepts if they are continuous dependent
variables. This statement indicates that the means or intercepts are free
parameters to be estimated using the default starting values.
For models with a mean structure, all means, intercepts, and thresholds
of observed variables are free to be estimated at the default starting
values. The means and intercepts of continuous latent variables are
fixed at zero in a single group analysis. In a multiple group analysis, the
means and intercepts of the continuous latent variables are fixed at zero
in the first group and are free to be estimated in the other groups. In a
mixture model, the means and intercepts of the continuous latent
variables are fixed at zero in the last class and are free to be estimated in
the other classes. The means and intercepts of categorical latent
variables are fixed at zero in the last class and are free to be estimated in
the other classes.
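The default status of the means and intercepts of continuous latent variables described above can be summarized as a small decision function. This is a sketch only: `latent_mean_default` and the analysis-type strings are hypothetical, and the function encodes nothing beyond the three rules stated in the paragraph.

```python
def latent_mean_default(analysis, index=None, n=None):
    """Default for means/intercepts of continuous latent variables.

    analysis: "single", "multigroup", or "mixture" (hypothetical labels)
    index:    1-based group or class number
    n:        number of classes (needed for mixture models)
    """
    if analysis == "single":
        return "fixed at zero"
    if analysis == "multigroup":
        # Fixed in the first group, free in the other groups.
        return "fixed at zero" if index == 1 else "free"
    if analysis == "mixture":
        # Fixed in the last class, free in the other classes.
        return "fixed at zero" if index == n else "free"
    raise ValueError(f"unknown analysis type: {analysis}")


print(latent_mean_default("multigroup", index=2))
print(latent_mean_default("mixture", index=3, n=3))
```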
If all of the variables in the NAMES statement are used in the analysis,
the order is taken from the NAMES statement. If the variables for the
analysis are a subset of the variables in the NAMES statement, the order
is taken from the USEVARIABLES statement.
The list function can be used on the left- and right-hand sides of ON and
WITH statements and on the right-hand side of BY statements. A list on
the left-hand side implies multiple statements. A list on the right-hand
side implies a list of variables.
f1 BY y1-y4;
f2 BY y5-y9;
f3 BY y10-y12;
is equivalent to
f1 BY y1 y2 y3 y4;
f2 BY y5 y6 y7 y8 y9;
f3 BY y10 y11 y12;
In this example, the order of the latent variables used by the list
function is f1, f2, f3 because of the order of the BY statements in the
MODEL command.
Following is an example of using the list function on both the left- and
the right-hand sides of the ON statement:
f1-f3 ON x1-x3;
which implies
f1 ON x1 x2 x3;
f2 ON x1 x2 x3;
f3 ON x1 x2 x3;
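The two rules above (a hyphenated list is resolved by position in an ordered variable list, and a list on the left-hand side of ON produces one statement per variable) can be sketched as follows. The helpers `expand_range` and `expand_on` are hypothetical, and the variable orders passed in stand in for the NAMES/USEVARIABLES order and the order of the BY statements.

```python
def expand_range(token, order):
    """Expand 'a-b' to every variable from a to b by POSITION in `order`."""
    if "-" not in token:
        return [token]
    first, last = token.split("-")
    i, j = order.index(first), order.index(last)
    if i > j:
        raise ValueError(f"{first} comes after {last} in the variable order")
    return order[i:j + 1]


def expand_on(lhs, rhs, lhs_order, rhs_order):
    """Left-hand list -> one ON statement per variable;
    right-hand list -> a single list of covariates."""
    covariates = " ".join(expand_range(rhs, rhs_order))
    return [f"{v} ON {covariates};" for v in expand_range(lhs, lhs_order)]


factors = ["f1", "f2", "f3"]           # order of the BY statements
observed = ["x1", "x2", "x3", "y1"]    # hypothetical USEVARIABLES order
for stmt in expand_on("f1-f3", "x1-x3", factors, observed):
    print(stmt)
```

Because expansion is positional, a list such as y2-y3 would include any variable listed between y2 and y3 in the variable order, whatever its name.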
The list function can also be used with the WITH option. For example,
y1-y3 WITH y4-y6;
implies
y1 WITH y4;
y1 WITH y5;
y1 WITH y6;
y2 WITH y4;
y2 WITH y5;
y2 WITH y6;
y3 WITH y4;
y3 WITH y5;
y3 WITH y6;
y1*.5;
f1 BY y1 y2 y3 y4 y5;
f2 BY y6 y7 y8 y9;
These same features can be used with the ON and WITH options and for
assigning starting values to variances, means, thresholds, and scales.
f1 ON x1-x3*1.5;
f1 WITH f2*.8;
y1-y12*.75;
[f1-f3*.5];
{y1-y12*5.0};
By placing an asterisk (*) after y1, the factor loading for y1 is estimated
using the starting value of one. By placing @1 after y2, the factor
loading for y2 is fixed at one. Likewise, by placing an asterisk (*) after
y6, the factor loading for y6 is estimated using the starting value of one.
By placing @1 after y7, the factor loading for y7 is fixed at one.
f1 WITH f2@0;
y1 ON x1 (1);
y2 ON x2 (1);
y3 ON x3 (1);
y1 y2 y3 (2);
y1 WITH y2-y3 (3);
For example,
specifies that the factor loadings of y13, y14, and y15 are constrained to
be equal because (1) refers to only the information on the line on which
it is located.
f1 BY y1 y2 y3 y4 y5 (1)
y6 y7 y8 y9 y10 (2)
y11 y12 y13 y14 y15 (3);
specifies that the factor loading of y1 is fixed at one and that the factor
loadings of y2, y3, y4, and y5 are held equal, that the factor loadings of
y6, y7, y8, y9, and y10 are held equal, and that the factor loadings of
y11, y12, y13, y14, and y15 are held equal.
For example,
[u1$1 u2$1 u3$1] (1);
[u1$2 u2$2 u3$2] (2);
[u1$3 u2$3 u3$3] (3);
indicate that the first thresholds of variables u1, u2, and u3 are
constrained to be equal; that the second thresholds of u1, u2, and u3
are constrained to be equal; and that the third thresholds of u1, u2, and
u3 are constrained to be equal. Out of nine
possible thresholds, three parameters are estimated. Only one set of
parentheses can be included on each line of the input file.
f1 BY y1 y2-y4*0;
f2 BY y5 y6-y9*.5;
f3 BY y10 y11-y12*.75;
where the starting value of 0 is assigned to the factor loadings for y2, y3,
and y4; the starting value of .5 is assigned to the factor loadings for y6,
y7, y8, and y9; and the starting value of .75 is assigned to the factor
loadings for y11 and y12. The factor loading for the first factor
indicator of each factor is fixed at one as the default to set the metric of
the factor.
f1-f3@1;
The statement above fixes the variances/residual variances of f1, f2, and
f3 at one.
f1 BY y1-y5 (1)
y6-y10 (2);
The statement above specifies that the factor loadings of y2, y3, y4, and
y5 are held equal and that the factor loadings of y6, y7, y8, y9, and y10
are held equal. The factor loading of y1 is fixed at one as the default to
set the metric of the factor.
f1 BY y1
y2-y5 (2-5);
f2 BY y6
y7-y10 (2-5);
The statements above specify that the factor loadings for y2 and y7 are
held equal, the factor loadings for y3 and y8 are held equal, the factor
loadings for y4 and y9 are held equal, and the factor loadings for y5 and
y10 are held equal. This can also be specified as shown below for
convenience.
f1 BY y1-y5 (1-5);
f2 BY y6-y10 (1-5);
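The way a list of equality numbers such as (2-5) is matched to a list of parameters can be sketched as a position-by-position pairing: parameters that receive the same number are held equal. The helpers `expand_numbers` and `equality_groups` are hypothetical and only mimic the rule described in the text.

```python
def expand_numbers(spec):
    """'2-5' -> [2, 3, 4, 5]; '1 2 3' -> [1, 2, 3]."""
    out = []
    for part in spec.split():
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            out.extend(range(lo, hi + 1))
        else:
            out.append(int(part))
    return out


def equality_groups(statements):
    """statements: list of (parameters, equality spec) pairs.
    Returns {equality number: [parameters held equal]}."""
    groups = {}
    for params, spec in statements:
        numbers = expand_numbers(spec)
        if len(numbers) != len(params):
            raise ValueError("need one equality number per parameter")
        for param, number in zip(params, numbers):
            groups.setdefault(number, []).append(param)
    return groups


groups = equality_groups([
    (["f1 BY y2", "f1 BY y3", "f1 BY y4", "f1 BY y5"], "2-5"),
    (["f2 BY y7", "f2 BY y8", "f2 BY y9", "f2 BY y10"], "2-5"),
])
for number, params in sorted(groups.items()):
    print(number, params)
```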
y1-y3 ON x (1 2 3);
y4-y6 ON x (1 2 3);
Each variable on the left-hand side of the ON option must have a list of
equalities for use with the variables on the right-hand side of the ON
option. Because there are three variables on the left-hand side of the ON
statement and two variables on the right-hand side of the ON statement,
three lists of two equalities are needed. A single list cannot be used.
y1 ON x1 (1);
y1 ON x2 (2);
y2 ON x1 (3);
y2 ON x2 (4);
y3 ON x1 (5);
y3 ON x2 (6);
y4 ON x1 (1);
y4 ON x2 (2);
y5 ON x1 (3);
y5 ON x2 (4);
y6 ON x1 (5);
y6 ON x2 (6);
The list function can be used with the simplified language for categorical
latent variables and unordered categorical (nominal) observed variables.
c2 ON c1;
or
c2 ON c1 (1-2);
[y1-y5] (p1-p5);
The list function can be used to assign labels to a list of parameters using
a list of labels. A list of labels cannot be used with a set of individual
parameters. Following is an example of how to use the list function with
a list of parameters on the right-hand side of the BY option:
f1 BY y1
y2-y5 (p2-p5);
The statement above assigns the label p2 to the factor loading for y2, p3
to the factor loading for y3, p4 to the factor loading for y4, and p5 to the
factor loading for y5.
Each variable on the left-hand side of the ON option must have a list of
labels for use with the variables on the right-hand side of the ON option.
Because there are three variables on the left-hand side of the ON
statement and two variables on the right-hand side of the ON statement,
three lists of two labels are needed. A single list cannot be used.
y1 ON x1 (p1);
y1 ON x2 (p2);
y2 ON x1 (p3);
y2 ON x2 (p4);
y3 ON x1 (p5);
y3 ON x2 (p6);
The list function can be used with the simplified language for categorical
latent variables and unordered categorical (nominal) observed variables.
c2 ON c1;
The list function has a special feature that can make model specification
easier. This feature allows a parameter to be mentioned in the MODEL
command more than once. The last specification is used in the analysis.
For example,
f1 BY y1-y6*0 y5*.5;
f1-f4@1 f3@2;
fixes the variances/residual variances of f1, f2, and f4 at one and fixes
the variance/residual variance of f3 at 2.
This feature can also be used with equalities; however, the variable from
the list that is not to be constrained to be equal must appear on a separate
line in the input file. In a line with an equality constraint, anything after
the equality constraint is ignored. For example,
f1 BY y1-y5 (1)
y4
y6-y10 (2);
indicates that the factor loadings for y2, y3, and y5 are held equal, the
factor loading for y4 is free and not equal to any other factor loading,
and the factor loadings for y6, y7, y8, y9, and y10 are held equal. The
factor loading for y1 is fixed at one as the default.
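The "last specification is used" rule behaves like assigning settings into a dictionary from left to right, so a later mention of a parameter overwrites an earlier one. This minimal sketch uses a hypothetical helper, `apply_in_order`, and shows the f1-f4@1 f3@2 example after its list has been expanded; it does not model the separate-line rule for equalities.

```python
def apply_in_order(specs):
    """Apply (parameter, value) settings in order; the last one wins."""
    settings = {}
    for param, value in specs:
        settings[param] = value
    return settings


# f1-f4@1 f3@2;  after the list f1-f4 is expanded
specs = [("f1", 1), ("f2", 1), ("f3", 1), ("f4", 1), ("f3", 2)]
print(apply_in_order(specs))  # {'f1': 1, 'f2': 1, 'f3': 2, 'f4': 1}
```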
LABELING THRESHOLDS
For binary and ordered categorical dependent variables, thresholds are
referred to by using the convention of adding to a variable name a dollar
sign ($) followed by a number. The number of thresholds is equal to the
number of categories minus one. For example, if u1 is an ordered
categorical variable with four categories it has three thresholds. These
thresholds are referred to as u1$1, u1$2, and u1$3.
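The threshold naming convention can be written as a one-line rule: a variable with k categories has k - 1 thresholds, labeled with a dollar sign and the threshold number. `threshold_labels` is a hypothetical helper used only for illustration.

```python
def threshold_labels(variable, n_categories):
    """Threshold names for a binary or ordered categorical variable."""
    if n_categories < 2:
        raise ValueError("a categorical variable needs at least two categories")
    return [f"{variable}${k}" for k in range(1, n_categories)]


print(threshold_labels("u1", 4))  # ['u1$1', 'u1$2', 'u1$3']
```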
LABELING PARAMETERS
Labels can be assigned to parameters by placing a name in parentheses
following the parameter in the MODEL command. These labels are
used in three ways. First, they are used in conjunction with the MODEL
CONSTRAINT command to define linear and non-linear constraints on
the parameters in the model. Second, they are used with the MODEL
TEST command to test linear restrictions on the model defined in the
MODEL and MODEL CONSTRAINT commands. Third, they are used
with ESTIMATOR=BAYES and the MODEL PRIORS command to
specify the prior distribution for parameters in the MODEL command.
The parameter labels follow the same rules as variable names. They can
be up to 8 characters in length; must begin with a letter; can contain only
letters, numbers, and the underscore symbol; and are not case sensitive.
Only one label can appear on a line. Following is an example of how to
label parameters:
MODEL: y ON x1 (p1)
x2 (p2)
x3 (p3);
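The label rules listed above can be checked with a regular expression. This is a sketch of the rules as stated in this section (at most 8 characters, first character a letter, then letters, digits, or underscores; not case sensitive), not a description of how Mplus itself validates input; `is_valid_label` and `same_label` are hypothetical names.

```python
import re

# At most 8 characters, starting with a letter; letters, digits, underscore.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_label(label):
    return LABEL_PATTERN.fullmatch(label) is not None


def same_label(a, b):
    """Labels are not case sensitive, so P1 and p1 name the same parameter."""
    return a.lower() == b.lower()


for candidate in ["p1", "slope_1", "1p", "toolong12", "p-1"]:
    print(candidate, is_valid_label(candidate))
```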
command. For example, the list p4-q2 includes p4, p5, p6, p7, p8, p9,
p10, q1, and q2.
The list function can be used with the ON and WITH options when there
are lists of variable names on both the right- and left-hand sides of these
options. Following is an example of how to use the list function to
assign labels when there are lists of variables on both the right- and
left-hand sides of ON:
y1-y3 ON x1-x2 (p1-p6);
The first variable on the left-hand side of ON is paired with all variables
on the right-hand side. Then the second variable on the left-hand side of
ON is paired with all variables on the right-hand side, and so on. The
label p1 is assigned to the regression slope for y1 on x1. The label p2 is
assigned to the regression slope for y1 on x2. The label p3 is assigned to
the regression slope for y2 on x1. The label p6 is assigned to the
regression slope for y3 on x2.
The labels are assigned to the upper triangle of a symmetric matrix read
row-wise. The label p1 is assigned to the covariance between y1 and y2.
The label p2 is assigned to the covariance between y1 and y3. The label
p3 is assigned to the covariance between y2 and y3.
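Both label-assignment orders described above can be sketched in a few lines: for ON, each left-hand variable is paired with all right-hand variables in turn; for WITH, labels go to the upper triangle read row-wise, which is the order `itertools.combinations` produces. The helpers `on_labels` and `with_labels` are hypothetical.

```python
from itertools import combinations


def on_labels(lhs, rhs, labels):
    """Left-major pairing: y1 with x1, x2, ..., then y2 with x1, x2, ..."""
    pairs = [(y, x) for y in lhs for x in rhs]
    if len(pairs) != len(labels):
        raise ValueError("need exactly one label per regression slope")
    return {label: f"{y} ON {x}" for label, (y, x) in zip(labels, pairs)}


def with_labels(variables, labels):
    """Upper triangle of a symmetric matrix read row-wise."""
    pairs = list(combinations(variables, 2))
    if len(pairs) != len(labels):
        raise ValueError("need exactly one label per covariance")
    return {label: f"{a} WITH {b}" for label, (a, b) in zip(labels, pairs)}


slopes = on_labels(["y1", "y2", "y3"], ["x1", "x2"], [f"p{i}" for i in range(1, 7)])
covs = with_labels(["y1", "y2", "y3"], ["p1", "p2", "p3"])
print(slopes["p3"], "|", covs["p2"])  # y2 ON x1 | y1 WITH y3
```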
SCALE FACTORS
In models that use TYPE=GENERAL, it may be useful to multiply each
observed variable or latent response variable by a scale factor that can be
estimated. For example, with categorical observed variables, a scale
factor refers to the underlying latent response variables and facilitates
growth modeling and multiple group analysis because the latent response
variables are not restricted to have across-time or across-group equalities
of variances. With continuous observed variables, using scale factors
containing standard deviations makes it possible to analyze a sample
covariance matrix by a correlation structure model.
{u1 u2 u3};
refers to scale factors for variables u1, u2, and u3. This statement means
that the scale factors are free parameters to be estimated using the
default starting values of one.
The | SYMBOL
The | symbol is used to name and define random effect variables in the
model. It can be used with all analysis types to specify growth models.
It can be used with TYPE=RANDOM to name and define random effect
variables that are slopes, to specify growth models with individually-
varying times of observation, and to specify latent variable interactions.
GROWTH MODELS
MODEL: i BY y1-y4@1;
s BY y1@0 y2@1 y3@2 y4@3;
q BY y1@0 y2@1 y3@4 y4@9;
[y1-y4@0 i s q];
If the | symbol is used to specify the same growth model for a continuous
outcome, the MODEL command is:
MODEL: i s q | y1@0 y2@1 y3@2 y4@3;
All of the other specifications shown above are done as the default. The
defaults can be overridden by mentioning the parameters in the MODEL
command after the | statement. For example,
i s q | y1@0 y2@1 y3@2 y4@3;
[y1-y4] (1);
[i@0];
changes the parameterization of the growth model from one with the
intercepts of the outcome variable fixed at zero and the growth factor
means free to be estimated to a parameterization with the intercepts of
the outcome variable held equal, the intercept growth factor mean fixed
at zero, and the slope growth factor means free to be estimated.
Many other types of growth models can be specified using the | symbol.
Following is a table that shows how to specify some of these growth
models using the | symbol and also how to specify the same growth
models using the BY option and other options. All examples are for
continuous outcomes unless specified otherwise.
RANDOM SLOPES
s | y ON x;
s* | y ON x;
where the asterisk (*) indicates that the random slope variable s has
variation on both the within and between levels.
A random slope variable can refer to more than one slope by being used
on the left-hand side of more than one | statement. In this case, the
random slope variables are the same. For example,
s1 | y1 ON x1;
s1 | y2 ON x2;
s2 | y1 ON x1 x2;
which defines the random slope, s2, to be the same in the regressions of
y1 on x1 and y1 on x2.
s3 | y1 y2 ON x1 x2 x3;
specifies that the six slopes in the regressions of y1 on x1, x2, and x3
and y2 on x1, x2, and x3 are the same.
s1-s10 | f BY y1-y10;
f@1;
where s1 through s10 are random factor loadings for the factor f. All
factor loadings are free. The metric of the factor is set by fixing the
factor variance to one.
AT
The AT option is used with TYPE=RANDOM to define a growth model
with individually-varying times of observation for the outcome variable.
AT is short for measured at. It is used in conjunction with the | symbol
to name and define the random effect variables in a growth model which
are referred to as growth factors.
Four types of growth models can be defined using AT and the | symbol:
an intercept only model, a model with two growth factors, a model with
three growth factors, and a model with four growth factors. The names
of the random effect variables are specified on the left-hand side of the |
symbol. The number of names determines which of the four models
will be estimated. One name is needed for an intercept only
model and it refers to the intercept growth factor. Two names are
needed for a model with two growth factors: the first one is for the
intercept growth factor and the second one is for the slope growth factor
that uses the time scores to the power of one. Three names are needed
for a model with three growth factors: the first one is for the intercept
growth factor; the second one is for the slope growth factor that uses the
time scores to the power of one; and the third one is for the slope growth
factor that uses the time scores to the power of two. Four names are
needed for a model with four growth factors: the first one is for the
intercept growth factor; the second one is for the slope growth factor that
uses the time scores to the power of one; the third one is for the slope
growth factor that uses the time scores to the power of two; and the
fourth one is for the slope growth factor that uses the time scores to the
power of three. Following are examples of how to specify these growth
models:
intercpt | y1 y2 y3 y4 AT t1 t2 t3 t4;
intercpt slope1 | y1 y2 y3 y4 AT t1 t2 t3 t4;
intercpt slope1 slope2 | y1 y2 y3 y4 AT t1 t2 t3 t4;
intercpt slope1 slope2 slope3 | y1 y2 y3 y4 AT t1 t2 t3 t4;
where intercpt, slope1, slope2, and slope3 are the names of the intercept
and slope growth factors; y1, y2, y3, and y4 are the outcome variables in
the growth model; and t1, t2, t3, and t4 are observed variables in the data
set that contain information on times of measurement. The TSCORES
option of the VARIABLE command is used to identify the variables that
contain information about individually-varying times of observation for
the outcome in a growth model. The variables on the left-hand side of
AT are paired with the variables on the right-hand side of AT.
The intercepts of the outcome variables are fixed at zero as the default.
The residual variances of the outcome variables are free to be estimated
as the default. The residual covariances of the outcome variables are
fixed at zero as the default. The means, variances, and covariances of
the intercept and slope growth factors are free as the default.
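The role of the time scores can be sketched numerically: with n growth factor names, growth factor k (k = 0 for the intercept) has the time score raised to the power k as its loading. With fixed time scores 0, 1, 2, 3 this reproduces the BY specification shown earlier for i, s, and q; with AT, each person's own times t1 to t4 are used instead. The helpers `growth_loadings` and `implied_outcomes` and the numbers in the usage lines are hypothetical.

```python
def growth_loadings(n_factors, times):
    """Rows are time points, columns are growth factors: loading = t ** k."""
    if not 1 <= n_factors <= 4:
        raise ValueError("AT allows one to four growth factors")
    return [[t ** k for k in range(n_factors)] for t in times]


def implied_outcomes(factor_values, times):
    """Outcome values implied for one person at that person's own times,
    leaving out residuals (outcome intercepts are fixed at zero by default)."""
    rows = growth_loadings(len(factor_values), times)
    return [sum(l * f for l, f in zip(row, factor_values)) for row in rows]


# Fixed times 0, 1, 2, 3 give the loadings 1; 0 1 2 3; 0 1 4 9.
print(growth_loadings(3, [0, 1, 2, 3]))
# Hypothetical person with intercept 2.0 and linear slope 0.5,
# measured at individually-varying times 0.0, 1.0, 2.5, 3.5.
print(implied_outcomes([2.0, 0.5], [0.0, 1.0, 2.5, 3.5]))
```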
XWITH
The XWITH option is used with TYPE=RANDOM to define
interactions between continuous latent variables or between a continuous
latent variable and an observed variable. XWITH is short for multiplied
with. It is used in conjunction with the | symbol to name and define
interaction variables in a model. Following is an example of how to use
XWITH and the | symbol to name and define an interaction:
int | f1 XWITH f2;
where int is the name of the interaction between f1 and f2. Interaction
variables can be used only on the right-hand side of ON statements.
fsquare | f XWITH f;
Delta method standard errors for the indirect effects are computed as the
default. Bootstrap standard errors for the indirect effects can be
obtained by using the MODEL INDIRECT command in conjunction
with the BOOTSTRAP option of the ANALYSIS command.
Total, total indirect, specific indirect, and direct effects are obtained
using the IND and VIA options of the MODEL INDIRECT command.
The IND option is used to request a specific indirect effect or a set of
indirect effects. The VIA option is used to request a set of indirect
effects that includes specific mediators.
IND
The variable on the left-hand side of IND is the dependent variable in the
indirect effect. The last variable on the right-hand side of IND is the
independent variable in the indirect effect. Other variables on the right-
hand side of IND are mediating variables. If there are no mediating
variables included in the IND option, all indirect effects between the
independent variable and dependent variable are computed. The total
indirect effect is the sum of all indirect effects. The total effect is the
sum of all indirect effects and the direct effect.
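The arithmetic behind these definitions can be sketched for a linear path model, assuming (as is standard in path analysis) that a specific indirect effect is the product of the regression slopes along its path. The helper `effects` and all slope values are hypothetical; Mplus also computes standard errors, which this point-estimate sketch does not.

```python
from math import prod


def effects(indirect_paths, direct):
    """indirect_paths: one list of regression slopes per indirect path.
    Returns (specific indirect effects, total indirect effect, total effect)."""
    specific = [prod(path) for path in indirect_paths]
    total_indirect = sum(specific)
    return specific, total_indirect, total_indirect + direct


# Hypothetical slopes for x -> m1 -> y and x -> m2 -> y, plus a direct x -> y slope.
specific, total_indirect, total = effects([[0.5, 0.4], [0.2, 0.3]], direct=0.1)
print(specific, total_indirect, total)
```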
VIA
The variable on the left-hand side of VIA is the dependent variable in
the indirect effect. The last variable on the right-hand side of VIA is the
independent variable in the indirect effect. Other variables on the right-
hand side of VIA are mediating variables. All indirect effects that go
from the independent variable to the dependent variable and include the
mediating variables are computed. The total indirect effect is the sum of
all indirect effects.
[Figure: path diagram of the following model, with variables x1, x2, y1, y2, and y3]
MODEL: y3 ON y1 y2;
y2 ON y1 x1 x2;
y1 ON x1 x2;
MODEL INDIRECT:
y3 IND y1 x1;
y3 IND y2 x1;
y3 IND x2;
The first IND statement requests the specific indirect effect from x1 to
y1 to y3. The second IND statement requests the specific indirect effect
from x1 to y2 to y3. The third IND statement requests all indirect effects
from x2 to y3. These include x2 to y1 to y3, x2 to y2 to y3, and x2 to y1
to y2 to y3.
MODEL INDIRECT:
y3 VIA y1 x1;
The VIA statement requests all indirect effects from x1 to y3 that are
mediated by y1. These include x1 to y1 to y3 and x1 to y1 to y2 to y3.
MODEL INDIRECT:
y3 IND x1;
The IND statement requests all indirect effects from x1 to y3. These
include x1 to y1 to y3, x1 to y2 to y3, and x1 to y1 to y2 to y3. The
total effect and the total indirect effect are also obtained.
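Which paths IND and VIA select can be reproduced by enumerating the directed paths of the example model. This sketch assumes a recursive (acyclic) model given as a dictionary of ON statements. Because a path in an acyclic model visits its nodes in a fixed order, a set of mediators identifies at most one path, so `ind` matches mediators as a set. The helpers `all_paths`, `ind`, and `via` are hypothetical and follow the IND/VIA descriptions above, not Mplus internals.

```python
def all_paths(regressions, source, target):
    """All directed paths source -> ... -> target.
    regressions maps each dependent variable to its predictors (ON)."""
    children = {}
    for dep, preds in regressions.items():
        for p in preds:
            children.setdefault(p, []).append(dep)
    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in children.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


def ind(regressions, dep, *rhs):
    """dep IND m ... x: the path through exactly the listed mediators,
    or every indirect path when no mediators are listed."""
    *mediators, indep = rhs
    paths = [p for p in all_paths(regressions, indep, dep) if len(p) > 2]
    if not mediators:
        return paths
    return [p for p in paths if set(p[1:-1]) == set(mediators)]


def via(regressions, dep, *rhs):
    """dep VIA m ... x: every indirect path that includes the mediators."""
    *mediators, indep = rhs
    paths = [p for p in all_paths(regressions, indep, dep) if len(p) > 2]
    return [p for p in paths if set(mediators) <= set(p[1:-1])]


model = {"y3": ["y1", "y2"], "y2": ["y1", "x1", "x2"], "y1": ["x1", "x2"]}
print(ind(model, "y3", "y1", "x1"))
print(ind(model, "y3", "x2"))
print(via(model, "y3", "y1", "x1"))
```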
NEW
MODEL:
[y1-y3] (p1-p3);
MODEL CONSTRAINT:
NEW (c*.6);
p2 = p1 + c;
p3 = p1 + 2*c;
CONSTRAINT
CONSTRAINT = y1 u1;
MODEL:
[y1-y3] (p1-p3);
MODEL CONSTRAINT:
p1 = p2**2 + p3**2;
MODEL:
[y1-y5] (m1-m5);
MODEL CONSTRAINT:
0 = - m4 + m1*m3 - m2;
0 = exp(m3) - 1 - m2;
0 = m4 - m5;
MODEL:
[y1-y4] (p1-p4);
MODEL CONSTRAINT:
p1 = p2**2 + p3**2;
p2 = p4;
DO
The DO option provides a do loop to facilitate specifying the same
expression for a set of parameters. Following is an example of how to
specify a do loop:
MODEL:
y1 ON x1 (p1);
y1 ON x2 (p2);
y1 ON x3 (p3);
y2 ON x1 (q1);
y2 ON x2 (q2);
y2 ON x3 (q3);
MODEL CONSTRAINT:
NEW (ratio1-ratio3);
DO (1, 3) ratio# = p#/q#;
where the numbers in parentheses give the range of values for the do
loop. The number sign (#) is replaced by these values during the
execution of the do loop. The DO statement above is equivalent to the
following three statements:
ratio1 = p1/q1;
ratio2 = p2/q2;
ratio3 = p3/q3;
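The do loop behaves like a textual template expansion over an inclusive integer range. The hypothetical helper `expand_do` below sketches that behavior for the two examples in this section; it is an illustration of the substitution rule, not the way Mplus evaluates constraints.

```python
def expand_do(first, last, template):
    """Replace every # in the template with each value first..last."""
    return [template.replace("#", str(k)) for k in range(first, last + 1)]


for stmt in expand_do(1, 3, "ratio# = p#/q#;"):
    print(stmt)
print(expand_do(1, 4, "p# > 0;"))
```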
MODEL:
f BY y1-y4;
y1-y4 (p1-p4);
MODEL CONSTRAINT:
DO (1, 4) p# > 0;
where the numbers in parentheses give the range of values for the do
loop. The number sign (#) is replaced by these values during the
execution of the do loop. In this example the residual variances of y1
through y4 are constrained to be greater than zero.
PLOT
The PLOT option is used to name the variables that will be plotted on
the y-axis in the plots created using the LOOP option. Following is an
example of how to specify the PLOT option:
PLOT (ind1 ind2);
where ind1 and ind2 are the variables that will be plotted on the y-axis in
the plots created using the LOOP option.
LOOP
The LOOP option is used in conjunction with the PLOT option to create
plots of variables. For example, it is useful for plotting indirect effects
with moderation and mediation (Preacher, Rucker, & Hayes, 2007),
cross-level interactions in multilevel regression (Bauer & Curran, 2005),
and sensitivity graphs for causal effect mediation modeling (Imai, Keele,
& Tingley, 2010; Muthén, 2011). Following is an example of how to
specify the LOOP option:
LOOP (mod, -1, 1, 0.01);
where mod is a variable that will be used on the x-axis, the numbers -1
and 1 are the lower and upper values of mod, and 0.01 is the incremental
value of mod to use in the computations. When mod appears in a
MODEL CONSTRAINT statement involving a new parameter, that
statement is evaluated for each value of mod specified by the LOOP
option. For example, the first value of mod is -1; the second value of
mod is -1 plus 0.01, or -0.99; the third value of mod is -0.99 plus 0.01,
or -0.98; and so on until the last value of mod, which is 1. Plots are
created with mod on the x-axis
and the names in the PLOT option on the y-axis.
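The sequence of values visited by LOOP can be sketched as an inclusive grid. The hypothetical helper `loop_values` counts steps with an integer so that accumulated floating-point error cannot drop the upper value; this is a design choice of the sketch, not a statement about how Mplus generates the grid.

```python
def loop_values(lower, upper, step):
    """Values visited by LOOP (name, lower, upper, step), inclusive of upper."""
    n_steps = round((upper - lower) / step)
    return [round(lower + k * step, 10) for k in range(n_steps + 1)]


mod_values = loop_values(-1, 1, 0.01)
print(len(mod_values), mod_values[:3], mod_values[-1])
print(len(loop_values(10, 50, 1)))  # ages 10, 11, ..., 50
```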
MODEL:
y ON x (p1);
MODEL CONSTRAINT:
PLOT (ypred);
LOOP (age, 10, 50, 1);
ypred = p1*age;
Using TYPE=PLOT2 in the PLOT command, the plot of ypred and age
can be viewed by choosing Loop plots from the Plot menu of the Mplus
Editor. The plot presents the computed values along with a 95%
confidence interval. For frequentist estimation, the default confidence
interval uses plus and minus 1.96 times the standard error. The
CINTERVAL option of the OUTPUT command can be used in
conjunction with the BOOTSTRAP option of the ANALYSIS command
to obtain bootstrapped or bias-corrected bootstrap confidence intervals.
For Bayesian estimation, the default is credibility intervals of the
posterior distribution with equal tail percentages. The CINTERVAL
option of the OUTPUT command can be used to obtain credibility
intervals of the posterior distribution that give the highest posterior
density.
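For the ypred = p1*age example, the default frequentist band is simple to reproduce by hand: ypred is a linear function of a single parameter, so its standard error at a given age is |age| times the standard error of p1, and the interval is ypred plus and minus 1.96 times that. The estimate and standard error below are hypothetical, `ypred_band` is a hypothetical helper, and the bootstrap and Bayesian intervals are not sketched.

```python
def ypred_band(p1, se_p1, ages, z=1.96):
    """(age, estimate, lower, upper) for ypred = p1*age at each age."""
    band = []
    for age in ages:
        estimate = p1 * age
        half_width = z * abs(age) * se_p1
        band.append((age, estimate, estimate - half_width, estimate + half_width))
    return band


# Hypothetical estimate p1 = 0.2 with standard error 0.05, ages 10 to 50.
band = ypred_band(0.2, 0.05, range(10, 51))
print(band[0])
```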
MODEL:
f1 BY y1
y2-y6 (p2-p6);
MODEL CONSTRAINT:
p4 = 2*p2;
MODEL TEST:
0 = p3;
p6 = .5*p5;
where in the MODEL command, p2 represents the factor loading for y2,
p3 represents the factor loading for y3, p4 represents the factor loading
for y4, p5 represents the factor loading for y5, and p6 represents the
factor loading for y6. In the MODEL CONSTRAINT command, the
factor loading for y4 is constrained to be two times the factor loading for
y2. In the MODEL TEST command, the restrictions to be tested are that the
factor loading for y3 equals zero and that the factor loading for y6
equals one half the factor loading for y5. The model described in the
MODEL and MODEL
CONSTRAINT commands is tested against the same model except for
the restrictions described in MODEL TEST. A Wald chi-square test
with two degrees of freedom is computed.
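The Wald statistic for the two MODEL TEST restrictions can be sketched with the textbook formula W = (Rb)'(RVR')^(-1)(Rb), where b holds the estimates of p3, p5, and p6, V is their covariance matrix, and each row of R encodes one restriction (0 = p3 and p6 - .5*p5 = 0). The estimates and covariance matrix are hypothetical, and `wald_test` handles only two restrictions so that the 2 x 2 inverse can be written out by hand; with two degrees of freedom, the chi-square upper-tail probability is exp(-W/2).

```python
import math


def wald_test(estimates, cov, restrictions):
    """Wald chi-square for two linear restrictions R b = 0.
    Returns (W, degrees of freedom, p-value)."""
    n = len(estimates)
    rb = [sum(r[j] * estimates[j] for j in range(n)) for r in restrictions]
    # R V R' is 2 x 2 because there are two restrictions.
    rv = [[sum(r[k] * cov[k][j] for k in range(n)) for j in range(n)]
          for r in restrictions]
    m = [[sum(rv[i][k] * restrictions[j][k] for k in range(n)) for j in range(2)]
         for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    w = sum(rb[i] * inv[i][j] * rb[j] for i in range(2) for j in range(2))
    return w, 2, math.exp(-w / 2)  # chi-square upper tail for 2 df


b = [0.1, 0.8, 0.5]                     # hypothetical p3, p5, p6
V = [[0.01, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.04]]
R = [[1.0, 0.0, 0.0],                   # 0 = p3
     [0.0, -0.5, 1.0]]                  # p6 - .5*p5 = 0
w, df, p = wald_test(b, V, R)
print(round(w, 3), df, round(p, 3))
```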
* Continuous variables
** Categorical variables
For the normal distribution default, infinity is ten to the power of ten.
For the inverse Gamma default, the settings imply a uniform prior
ranging from minus infinity to plus infinity. For the inverse Wishart
default, p is the dimension of the multivariate block of latent variables.
For the Dirichlet default, the first number gives the number of
observations to add to the class referred to and the second number gives
the number of observations to add to the last class. For a discussion of
priors, see Gelman et al. (2004), Browne and Draper (2006), and Gelman
(2006).
Normal – N
Lognormal – LN
Uniform – U
Inverse Gamma – IG
Gamma – G
Inverse Wishart – IW
Dirichlet – D
Each setting has two numbers in parentheses following the setting. For
the normal and lognormal distributions, the first number is the mean and
the second number is the variance. For the uniform distribution, the first
number is the lower limit and the second number is the upper limit. For the
inverse Gamma distribution, the first number is the shape parameter and
the second number is the scale parameter. For the Gamma distribution,
the first number is the shape parameter and the second number is the
inverse scale parameter. For the inverse Wishart distribution, the first
number is used to form a covariance matrix and the second number is the
degrees of freedom. For the Dirichlet distribution, the first number gives
the number of observations to add to the class referred to and the second
number gives the number of observations to add to the last class. For a
technical description of the implementation of priors, see Asparouhov
and Muthén (2010b).
MODEL:
f BY y1-y10* (p1-p10);
f@1;
MODEL PRIORS:
p1-p10 ~ N (1, 0.5);
where parameters p1 through p10 have normal priors with mean one and
variance 0.5.
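Because the second number in N(1, 0.5) is a variance, care is needed when simulating such a prior with tools that expect a standard deviation. Python's `random.Random.gauss(mu, sigma)` is one such case, so the sketch below passes the square root of 0.5. The helper `draw_normal_prior` is hypothetical and only illustrates the parameterization; it is not part of Mplus.

```python
import math
import random


def draw_normal_prior(mean, variance, n, seed=0):
    """Draws from N(mean, variance) written in the MODEL PRIORS convention."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)  # gauss() expects a standard deviation
    return [rng.gauss(mean, sd) for _ in range(n)]


draws = draw_normal_prior(1.0, 0.5, 100_000)
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
print(round(m, 2), round(v, 2))
```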
COVARIANCE
The COVARIANCE option is used to assign a prior to the covariance
between two parameters. Only normal priors are available. Covariance
priors can be assigned to only factor loadings, regression coefficients,
intercepts, and thresholds for binary variables. Following is an example
of how to specify the COVARIANCE option:
MODEL:
y ON x1 (p1)
x2 (p2);
MODEL PRIORS:
p1 ~ N (10, 4);
p2 ~ N (6, 1);
COVARIANCE (p1, p2) = 0.5;
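In the example above, the prior variances are 4 and 1 and the prior covariance is 0.5, so the implied prior correlation between p1 and p2 is 0.5 / sqrt(4 * 1) = 0.25. The hypothetical helper below computes this and rejects covariances that would make the 2 x 2 prior covariance matrix not positive definite, which a proper joint normal prior requires.

```python
import math


def implied_prior_correlation(var1, var2, cov):
    """Correlation implied by two normal prior variances and a covariance."""
    if var1 <= 0 or var2 <= 0:
        raise ValueError("prior variances must be positive")
    if cov * cov >= var1 * var2:
        raise ValueError("2 x 2 prior covariance matrix is not positive definite")
    return cov / math.sqrt(var1 * var2)


print(implied_prior_correlation(4, 1, 0.5))  # 0.25
```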
DO
The DO option provides a do loop to facilitate specifying the same
expression for a set of parameters. With MODEL PRIORS it can be
used with the DIFFERENCE option to assign priors to differences
among a set of parameters.
DIFFERENCE
The DIFFERENCE option is used to assign priors to the difference
between two parameters. Only normal priors are available. Difference
priors can be assigned to only factor loadings, regression coefficients,
intercepts, and thresholds for binary variables. Following is an example
of how to specify the DIFFERENCE option:
MODEL:
y ON x1 (p1)
x2 (p2);
MODEL PRIORS:
DIFFERENCE (p1, p2) ~ N (0, 0.01);
where the difference between p1 and p2 has a normal prior with mean
zero and variance 0.01.
MODEL:
%OVERALL%
f BY y1-y5;
%c#1%
f BY y2-y5 (p12-p15);
%c#2%
f BY y2-y5 (p22-p25);
%c#3%
f BY y2-y5 (p32-p35);
MODEL PRIORS:
DO (2, 5) DIFFERENCE (p1#-p3#) ~ N (0, 0.01);
where the numbers in parentheses give the range of values for the do
loop. The number sign (#) is replaced by these values during the
execution of the do loop. Following are the differences that were
assigned normal priors with mean zero and variance 0.01:
p12 - p22
p12 - p32
p22 - p32
p13 - p23
p13 - p33
p23 - p33
p14 - p24
p14 - p34
p24 - p34
p15 - p25
p15 - p35
p25 - p35
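The list of differences above can be reproduced by forming, for each value of # in the DO range, every pair among the class-specific labels p1#, p2#, and p3#. The hypothetical helper `difference_pairs` assumes the label pattern of this example (first digit = class, second digit = indicator) and is an illustration of the resulting pairs, not of Mplus internals.

```python
from itertools import combinations


def difference_pairs(first, last, classes=(1, 2, 3)):
    """All pairwise label differences among classes for each # value."""
    pairs = []
    for k in range(first, last + 1):
        labels = [f"p{c}{k}" for c in classes]
        pairs.extend(combinations(labels, 2))
    return pairs


pairs = difference_pairs(2, 5)
for a, b in pairs:
    print(f"{a} - {b}")
```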
MODEL:
The MODEL command is used to describe the analysis model for a
single group analysis and the overall analysis model for multiple group
analysis.
MODEL label:
MODEL followed by a label is used to describe the group-specific
analysis models in multiple group analysis and the analysis model for
each categorical latent variable in mixture modeling when there is more
than one categorical latent variable in the analysis.
When there is more than one categorical latent variable in the model,
the class-specific parts of the model for each categorical latent variable
must be specified within a MODEL command for that categorical latent
variable. The %OVERALL% specification is not included in the
MODEL commands for each categorical latent variable. Following is an
example of how to specify the MODEL command when there is more
than one categorical latent variable in the model:
MODEL c1:
%c1#1%
%c1#2%
When there are more than two categorical latent variables in the model,
MODEL commands for pairs of categorical latent variables are allowed.
These are used to specify parameters that are specific to the
combinations of classes for those two categorical latent variables.
Categorical latent variables can be combined in sets involving all but one
categorical latent variable. For example, with three categorical latent
variables c1, c2, and c3, combinations of up to two categorical latent
variables are allowed. Following is an example of how this is specified:
MODEL c1.c2:
%c1#1.c2#1%
MODEL:
%OVERALL%
%class label%
The MODEL command used in conjunction with %OVERALL% and
%class label% is used to describe the overall and class-specific models
for mixture models. Statements following %OVERALL% refer to the
model common to all latent classes. Statements following %class
label% refer to class-specific model statements.
Class labels are created by adding to the name of the categorical latent
variable a number sign (#) followed by the class number. For example,
if c is a categorical latent variable with two latent classes, the class
labels are c#1 and c#2.
MODEL:
%WITHIN%
%BETWEEN%
%BETWEEN label%
The MODEL command used in conjunction with %WITHIN%,
%BETWEEN%, and %BETWEEN label% is used to describe the
individual-level and cluster-level models for multilevel modeling. For
TYPE=TWOLEVEL, the statements following %WITHIN% describe
the individual-level model and the statements following %BETWEEN%
describe the cluster-level model. With multilevel mixture models, the
%OVERALL% and %class label% specifications are used with the
%WITHIN% and %BETWEEN% specifications to describe the mixture
part of the model.
Parameter estimates can be saved from a real data analysis using the
ESTIMATES option of the SAVEDATA command and used in a
subsequent Monte Carlo analysis as population parameter values. This
is done by using the POPULATION option of the MONTECARLO
command.
MODEL POPULATION:
The MODEL POPULATION command is used to provide the
population parameter values to be used in data generation for single
group analysis and the overall analysis model for multiple group
analysis.
MODEL POPULATION-label:
MODEL POPULATION followed by a dash and a label is used to
provide parameter values to be used in the generation of data for the
group-specific analysis models in multiple group analysis and the
analysis model for each categorical latent variable in mixture modeling
when there is more than one categorical latent variable in the analysis.
In multiple group analysis, the label following the dash refers to the
group. The first group is referred to by g1, the second group by g2, and
so on. In mixture modeling, the label following the dash is the name of
each categorical latent variable when there is more than one categorical
latent variable in the generation of the data.
MODEL POPULATION:
%OVERALL%
%class label%
MODEL POPULATION used in conjunction with %OVERALL% and
%class label% is used to provide the population parameter values to be
used in the generation of data for mixture models. Statements following
%OVERALL% refer to the model common to all latent classes.
Statements following %class label% refer to class-specific model
statements. In addition, the GENCLASSES option of the
MONTECARLO command is used for the generation of data for mixture
models.
MODEL POPULATION:
%WITHIN%
%BETWEEN%
%BETWEEN label%
MODEL POPULATION used in conjunction with %WITHIN%,
%BETWEEN%, and %BETWEEN label% is used to provide the
population parameter values to be used in the generation of clustered
data. For TYPE=TWOLEVEL, %WITHIN% is used to provide
population parameter values for the individual-level model parameters.
%BETWEEN% is used to provide population parameter values for the
cluster-level model parameters. With multilevel mixture models, the
Parameter estimates can be saved from a real data analysis using the
ESTIMATES option of the SAVEDATA command and used in a
subsequent Monte Carlo analysis as population parameter values. This
is done by using the COVERAGE option of the MONTECARLO
command.
MODEL COVERAGE:
The MODEL COVERAGE command is used to provide the population
parameter values to be used for computing coverage for single group
analysis and the overall analysis model for multiple group analysis.
MODEL COVERAGE-label:
MODEL COVERAGE followed by a dash and a label is used in multiple
group analysis to provide group-specific parameter values to be used in
computing coverage. The label following the dash refers to the group.
The first group is referred to by g1, the second group by g2, and so on.
In mixture modeling, the label following the dash is the name of each
categorical latent variable when there is more than one categorical
latent variable.
MODEL COVERAGE:
%OVERALL%
%class label%
MODEL COVERAGE used in conjunction with %OVERALL% and
%class label% is used to provide the population parameter values to be
used in computing coverage. Statements following %OVERALL% refer
to the model common to all latent classes. Statements following %class
label% refer to class-specific model statements.
Class labels are created by adding to the name of the categorical latent
variable a number sign (#) followed by the class number. For example,
if c1 is a categorical latent variable with two latent classes, the class
labels are c1#1 and c1#2.
MODEL COVERAGE:
%WITHIN%
%BETWEEN%
%BETWEEN label%
MODEL COVERAGE used in conjunction with %WITHIN% and
%BETWEEN% is used to provide the population parameter values to be
used in computing coverage. For TYPE=TWOLEVEL, %WITHIN% is
used to provide the population parameter values for the individual-level
model parameters. %BETWEEN% is used to provide the population
parameter values for the cluster-level model parameters.
MODEL MISSING:
y1 ON x;
y2 ON y1 x;
MODEL MISSING:
The MODEL MISSING command is used to provide information about
the population parameter values for the missing data model to be used in
the generation of data for single group analysis and the overall analysis
model for multiple group analysis.
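The sketch below combines MODEL MISSING with the MISSING option of the MONTECARLO command, which names the variables that receive missing values. It assumes, for this sketch, that in MODEL MISSING the variable names stand for binary missingness indicators that are modeled by logistic regression. All names, values and sample sizes are hypothetical. A logit intercept of -1 gives a missingness probability of about 1/(1 + e), or about .27, when x = 0.

```mplus
MONTECARLO:  NAMES = y1-y4 x;            ! hypothetical variable names
             NOBSERVATIONS = 500;
             NREPS = 100;
             MISSING = y1-y4;            ! y1-y4 receive missing values
MODEL POPULATION:
             [x@0]; x@1;                 ! covariate mean and variance
             f BY y1-y4*.8;
             f@1;
             y1-y4*.5;
             f ON x*.5;
MODEL MISSING:
             [y1-y4@-1];                 ! about 27% missing when x = 0
             y1-y4 ON x*.5;              ! missingness depends on x
MODEL:       f BY y1-y4*.8;
             f@1;
             y1-y4*.5;
             f ON x*.5;
```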
MODEL MISSING-label:
MODEL MISSING followed by a dash and a label is used in multiple
group analysis to provide group-specific population parameter values for
the missing data model to be used in the generation of data. The label
following the dash refers to the group. The first group is referred to by
g1, the second group by g2, and so on.
MODEL MISSING:
%OVERALL%
%class label%
MODEL MISSING used in conjunction with %OVERALL% and %class
label% is used to provide the population parameter values for the
missing data model to be used in the generation of data for mixture
models. Statements following %OVERALL% refer to the model
common to all latent classes. Statements following %class label% refer
to class-specific model statements.
Class labels are created by adding to the name of the categorical latent
variable a number sign (#) followed by the class number. For example,
if c1 is a categorical latent variable with two latent classes, the class
labels are c1#1 and c1#2.
CHAPTER 18
OUTPUT, SAVEDATA, AND PLOT
COMMANDS
OUTPUT:                                      Default
  SAMPSTAT;
  CROSSTABS;                                 ALL
  CROSSTABS (ALL);
  CROSSTABS (COUNT);
  CROSSTABS (%ROW);
  CROSSTABS (%COLUMN);
  CROSSTABS (%TOTAL);
  STANDARDIZED;                              ALL
  STANDARDIZED (ALL);
  STANDARDIZED (STDYX);
  STANDARDIZED (STDY);
  STANDARDIZED (STD);
  RESIDUAL;
  MODINDICES (minimum chi-square);           10
  MODINDICES (ALL);
  MODINDICES (ALL minimum chi-square);       10
  CINTERVAL;                                 SYMMETRIC
  CINTERVAL (SYMMETRIC);
  CINTERVAL (BOOTSTRAP);
  CINTERVAL (BCBOOTSTRAP);
  CINTERVAL (EQTAIL);                        EQTAIL
  CINTERVAL (HPD);
  SVALUES;
  NOCHISQUARE;
  NOSERROR;
  H1SE;
  H1TECH3;
  PATTERNS;
  FSCOEFFICIENT;
  FSDETERMINACY;
  BASEHAZARD;
  LOGRANK;
  D(#);                                      1
  TECH1;
  TECH2;
  TECH3;
  TECH4;
  TECH5;
  TECH6;
  TECH7;
  TECH8;
  TECH9;
  TECH10;
  TECH11;
  TECH12;
  TECH13;
  TECH14;
  TECH15;
  TECH16;
The default output for all analyses includes a listing of the input setup, a
summary of the analysis specifications, and a summary of the analysis
results. Analysis results include a set of fit statistics, parameter
estimates, standard errors of the parameter estimates, the ratio of each
parameter estimate to its standard error, and a two-tailed p-value for the
ratio. Analysis results for TYPE=EFA include eigenvalues for the
sample correlation matrix, a set of fit statistics, estimated rotated factor
loadings and correlations and their standard errors, estimated residual
variances and their standard errors, the factor structure matrix, and factor
determinacies. Output for TYPE=BASIC includes sample statistics for
the analysis data set and other descriptive information appropriate for
the particular analysis.
Mplus OUTPUT
Following is a description of the information that is provided in the
output as the default. Information about optional output is described in
the next section.
INPUT SETUP
Number of groups 1
Number of observations 500
Continuous
Y1 Y2 Y3 Y4
Estimator ML
Information matrix EXPECTED
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
The third part of the output consists of a summary of the analysis results.
Fit statistics, parameter estimates, and standard errors can be saved in an
external data set by using the RESULTS option of the SAVEDATA
command. Following is a description of what is included in the output.
Tests of model fit are printed first. For most analyses, these consist of
the chi-square test statistic, degrees of freedom, and p-value for the
analysis model; the chi-square test statistic, degrees of freedom, and p-
value for the baseline model of uncorrelated dependent variables; CFI
and TLI; the loglikelihood for the analysis model; the loglikelihood for
the unrestricted model; the number of free parameters in the estimated
model; AIC, BIC, and sample-size adjusted BIC; RMSEA; and SRMR.
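Loglikelihood
  H0 Value                                   -3329.929
  H1 Value                                   -3326.522
Information Criteria
Chi-Square Test of Model Fit
  Value                                          6.815
  Degrees of Freedom                                 5
  P-Value                                       0.2348
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                                       0.027
  90 Percent C.I.                          0.000 0.072
  Probability RMSEA <= .05                       0.755
CFI/TLI
  CFI                                            0.999
  TLI                                            0.997
Chi-Square Test of Model Fit for the Baseline Model
  Value                                       1236.962
  Degrees of Freedom                                10
  P-Value                                       0.0000
SRMR (Standardized Root Mean Square Residual)
  Value                                          0.012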
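As a rough check on how these statistics relate, the chi-square value and the RMSEA, CFI and TLI values printed above can be reproduced, up to rounding, from the two loglikelihoods and the two chi-square tests. The check uses the common textbook definitions below and assumes N = 500, the number of observations in the analysis summary. Mplus's internal computations may differ in details, for example N versus N - 1.

```latex
\chi^2 = 2\,(\ell_{H1} - \ell_{H0}) = 2\,(-3326.522 + 3329.929) = 6.814 \approx 6.815
\mathrm{RMSEA} = \sqrt{\max\!\left(\frac{\chi^2 - df}{df\,N},\ 0\right)} = \sqrt{\frac{6.815 - 5}{5 \times 500}} \approx 0.027
\mathrm{CFI} = 1 - \frac{\max(\chi^2 - df,\ 0)}{\max(\chi^2_B - df_B,\ \chi^2 - df,\ 0)} = 1 - \frac{1.815}{1226.962} \approx 0.999
\mathrm{TLI} = \frac{\chi^2_B/df_B - \chi^2/df}{\chi^2_B/df_B - 1} = \frac{123.696 - 1.363}{123.696 - 1} \approx 0.997
```

Here $\chi^2_B$ and $df_B$ are the baseline-model values (1236.962 and 10). The small gap between 6.814 and 6.815 comes from the rounding of the printed loglikelihoods.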
The results of the model estimation are printed after the tests of model
fit. The first column of the output labeled Estimates contains the model
estimated value for each parameter. The parameters are identified using
the conventions of the MODEL command. For example, factor loadings
are found in the BY statements. Other regression coefficients are found
in the ON statements. Covariances and residual covariances are found in
the WITH statements.
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
F BY
Y1 1.000 0.000 999.000 999.000
Y2 0.907 0.046 19.908 0.000
Y3 0.921 0.045 20.509 0.000
Y4 0.949 0.046 20.480 0.000
F ON
X 0.606 0.049 12.445 0.000
Intercepts
Y1 0.132 0.051 2.608 0.009
Y2 0.118 0.049 2.393 0.017
Y3 0.061 0.048 1.268 0.205
Y4 0.076 0.050 1.529 0.126
Residual Variances
Y1 0.479 0.043 11.061 0.000
Y2 0.558 0.045 12.538 0.000
Y3 0.492 0.041 11.923 0.000
Y4 0.534 0.044 12.034 0.000
F 0.794 0.073 10.837 0.000
The second column of the output labeled S.E. contains the standard
errors of the parameter estimates. The type of standard errors produced
during model estimation is determined by the estimator that is used. The
estimator being used is printed in the summary of the analysis. Each
analysis type has a default estimator. For several analysis types, the
default estimator can be changed using the ESTIMATOR option of the
ANALYSIS command. A table of estimators that are available for each
analysis type can be found in Chapter 16.
The third column of the output labeled Est./S.E. contains the value of the
parameter estimate divided by the standard error (column 1 divided by
column 2). In large samples, this ratio is approximately normally
distributed and can be treated as a z-score. For a two-tailed test at the
.05 level, a parameter is significant when the absolute value of the ratio
is greater than 1.96. The fourth column of the output labeled Two-Tailed
P-Value gives the p-value for the z-score in the third column. For
parameters that are fixed, such as the factor loading of Y1, the value
999.000 is printed in the third and fourth columns.
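A worked check of the third and fourth columns uses the intercept of Y3 from the MODEL RESULTS output (estimate 0.061, S.E. 0.048). Here $\Phi$ denotes the standard normal cumulative distribution function. The hand-computed ratio differs slightly from the printed 1.268 because the printed estimate and standard error are rounded.

```latex
z = \frac{\hat{\theta}}{\mathrm{SE}(\hat{\theta})} = \frac{0.061}{0.048} \approx 1.27, \qquad
p = 2\,\bigl[1 - \Phi(|z|)\bigr] = 2\,\bigl[1 - \Phi(1.268)\bigr] \approx 0.205
```

Because $|z| < 1.96$, this intercept is not significant at the .05 level, which agrees with the printed p-value of 0.205.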
printed is too large to fit in the space provided. This happens when
variables are measured on a large scale. To reduce the risk of
computational difficulties, it is recommended to keep variables on a
scale such that their variances do not deviate too far from the range of
one to ten. Variables can be rescaled using the DEFINE command.
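For example, a variable recorded in dollars could be rescaled to thousands of dollars. The sketch below assumes a hypothetical variable named income with a very large variance. Dividing a variable by 1000 divides its variance by 1000², that is, by 1,000,000.

```mplus
VARIABLE:  NAMES = y1-y4 income;        ! hypothetical variable names
           USEVARIABLES = y1-y4 income;
DEFINE:    income = income/1000;        ! rescale from dollars to thousands of dollars
```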
OUTPUT OPTIONS
SAMPSTAT
The SAMPSTAT option is used to request sample statistics for the data
being analyzed. For continuous variables, these include sample means,
sample variances, sample covariances, and sample correlations. For
binary and ordered categorical (ordinal) variables using weighted least
squares estimation, these include sample thresholds; first- and second-
order sample proportions if the model has all binary dependent variables;
sample tetrachoric, polychoric and polyserial correlations for models
without covariates; and sample probit regression coefficients and sample
probit residual correlations for models with covariates. The
SAMPSTAT option is not available for censored variables using
maximum likelihood estimation, unordered categorical (nominal)
variables, count variables, binary and ordered categorical (ordinal)
variables using maximum likelihood estimation, and time-to-event
variables. The sample correlation and covariance matrices can be saved
in an ASCII file using the SAMPLE option of the SAVEDATA
command.
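A minimal sketch that requests the sample statistics and also saves the sample correlation or covariance matrix. The file name sample.dat is illustrative.

```mplus
OUTPUT:    SAMPSTAT;                ! print sample statistics
SAVEDATA:  SAMPLE IS sample.dat;    ! save the sample matrix to an ASCII file
```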
SAMPLE STATISTICS
Means
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
0.104 0.092 0.035 0.050 -0.046
Covariances
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
Y1 1.608
Y2 1.028 1.487
Y3 1.027 0.957 1.451
Correlations
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
Y1 1.000
Y2 0.665 1.000
Y3 0.672 0.652 1.000
Y4 0.675 0.641 0.661 1.000
X 0.489 0.389 0.440 0.441 1.000
CROSSTABS
STANDARDIZED
bStdYX = b*SD(x)/SD(y),
bStdY = b/SD(y).
The third type of standardization is shown under the heading Std in the
output. Std uses the variances of the continuous latent variables for
standardization.
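As a worked check, several STDYX values in the output that follows can be reproduced, up to rounding, from the unstandardized estimates in MODEL RESULTS. The check makes two assumptions:

- The sample variance of x is 0.912, the value shown for X in the PSI starting values of the TECH1 output later in this chapter.
- In the regression f ON x, the "y" in the STDYX formula is the latent variable f, so its model-implied variance must be computed first. In the formulas, b is the slope of f on x, psi is the residual variance of f, and lambda and theta are the loading and residual variance of Y1.

```latex
\operatorname{Var}(f) = b^2 \operatorname{Var}(x) + \psi = 0.606^2 \times 0.912 + 0.794 \approx 1.129
b^{StdYX} = b\,\frac{SD(x)}{SD(f)} = 0.606 \times \frac{\sqrt{0.912}}{\sqrt{1.129}} \approx 0.545
\psi^{StdYX} = \frac{\psi}{\operatorname{Var}(f)} = \frac{0.794}{1.129} \approx 0.703
\operatorname{Var}(y_1) = \lambda^2 \operatorname{Var}(f) + \theta = 1.129 + 0.479 = 1.608, \qquad
\lambda^{StdYX} = \lambda\,\frac{SD(f)}{SD(y_1)} = \frac{\sqrt{1.129}}{\sqrt{1.608}} \approx 0.838
```

The model-implied variance of Y1 (1.608) also matches the sample variance of Y1 in the SAMPSTAT output.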
STANDARDIZED (STDYX);
STDYX Standardization
Two-Tailed
Estimates S.E. Est./S.E. P-Value
F BY
Y1 0.838 0.018 47.679 0.000
Y2 0.790 0.020 38.807 0.000
Y3 0.813 0.019 42.691 0.000
Y4 0.810 0.019 42.142 0.000
F ON
X 0.545 0.034 15.873 0.000
Intercepts
Y1 0.104 0.040 2.605 0.009
Y2 0.097 0.040 2.391 0.017
Y3 0.051 0.040 1.269 0.205
Y4 0.061 0.040 1.529 0.126
Residual Variances
Y1 0.298 0.029 10.106 0.000
Y2 0.375 0.032 11.657 0.000
Y3 0.339 0.031 10.964 0.000
Y4 0.345 0.031 11.078 0.000
F 0.703 0.037 18.822 0.000
where the first column of the output labeled Estimates contains the
parameter estimate that has been standardized using the variances of the
continuous latent variables as well as the variances of the background
and outcome variables for standardization, the second column of the
output labeled S.E. contains the standard error of the standardized
parameter estimate, the third column of the output labeled Est./S.E.
contains the value of the parameter estimate divided by the standard
error (column 1 divided by column 2), and the fourth column of the
output labeled Two-Tailed P-Value gives the p-value for the z-score in
the third column.
RESIDUAL
MODINDICES
By default, modification indices are printed only for parameters whose
modification index is greater than or equal to 10. The following
statement requests modification indices
greater than zero:
MODINDICES (0);
Model modification indices are provided for the matrices that are opened
as part of the analysis. To request modification indices for all matrices,
specify:
MODINDICES (ALL);
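or, to combine this with a minimum value, use the form
MODINDICES (ALL minimum chi-square); shown in the summary of
OUTPUT options at the beginning of this chapter.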
The first column of the output labeled M.I. contains the modification
index for each parameter that is fixed or constrained to be equal to
another parameter. A modification index gives the expected drop in
chi-square if the parameter in question is freely estimated. The
parameters are labeled using the conventions of the MODEL command.
For example, factor loadings are found in the BY statements. Other
regression coefficients are found in the ON statements. Covariances and
residual covariances are found in the WITH statements. Variances,
residual variances, means, intercepts, and thresholds are found under
these headings. The scale factors used in the estimation of models with
categorical outcomes are found under the heading Scales.
WITH Statements
The second column of the output labeled E.P.C. contains the expected
parameter change index for each parameter that is fixed or constrained to
be equal to another parameter. An E.P.C. index provides the expected
change in the value of the parameter in question if it is freely
estimated. The third and
fourth columns of the output labeled Std E.P.C. and StdYX E.P.C.
contain the two standardized expected parameter change indices. These
indices are useful because the standardized values provide relative
comparisons. The Std E.P.C. indices are standardized using the
variances of the continuous latent variables. The StdYX E.P.C. indices
are standardized using the variances of the continuous latent variables as
well as the variances of the background and/or outcome variables.
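A sketch that requests indices for all matrices with a lower cutoff than the default of 10. The value 3.84 is an illustrative choice, and this sketch assumes a non-integer minimum is accepted. It is the .05 critical value of a chi-square distribution with one degree of freedom, which matches the one-parameter change that a modification index describes.

```mplus
OUTPUT:  MODINDICES (ALL 3.84);   ! all matrices; print indices of at least 3.84
```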
CINTERVAL
The Bayesian settings are EQTAIL and HPD. EQTAIL is the default for
Bayesian estimation. EQTAIL produces 90%, 95%, and 99% credibility
intervals of the posterior distribution with equal tail percentages. HPD
produces 90%, 95%, and 99% credibility intervals of the posterior
distribution that give the highest posterior density (Gelman et al., 2004).
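Two alternative sketches of how CINTERVAL settings pair with the estimation method. The number of bootstrap draws is illustrative. This sketch also assumes that the bootstrap settings require the BOOTSTRAP option of the ANALYSIS command. Use one setup or the other in a given input.

```mplus
! Setup A: Bayesian estimation with highest posterior density intervals
ANALYSIS:  ESTIMATOR = BAYES;
OUTPUT:    CINTERVAL (HPD);

! Setup B: bias-corrected bootstrap confidence intervals
ANALYSIS:  BOOTSTRAP = 1000;          ! illustrative number of bootstrap draws
OUTPUT:    CINTERVAL (BCBOOTSTRAP);
```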
CINTERVAL (BOOTSTRAP);
In the output, the parameters are labeled using the conventions of the
MODEL command. For example, factor loadings are found in the BY
statements. Other regression coefficients are found in the ON
statements. Covariances and residual covariances are found in the
WITH statements. Variances, residual variances, means, intercepts, and
thresholds are found under these headings. The scale factors used in
the estimation of models with categorical outcomes are found under the
heading Scales. The CINTERVAL option is not available for
TYPE=EFA.
Lower .5% Lower 2.5% Lower 5% Estimate Upper 5% Upper 2.5% Upper .5%
F BY
Y1 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Y2 0.790 0.818 0.832 0.907 0.982 0.996 1.024
Y3 0.806 0.833 0.847 0.921 0.995 1.009 1.037
Y4 0.829 0.858 0.872 0.949 1.025 1.039 1.068
F ON
X 0.481 0.511 0.526 0.606 0.686 0.702 0.732
Intercepts
Y1 0.002 0.033 0.049 0.132 0.215 0.231 0.262
Y2 -0.009 0.021 0.037 0.118 0.199 0.214 0.245
Y3 -0.063 -0.033 -0.018 0.061 0.141 0.156 0.186
Y4 -0.052 -0.022 -0.006 0.077 0.159 0.175 0.205
Residual Variances
Y1 0.367 0.394 0.408 0.479 0.550 0.564 0.590
Y2 0.443 0.471 0.485 0.558 0.631 0.645 0.673
Y3 0.386 0.411 0.424 0.492 0.560 0.573 0.599
Y4 0.420 0.447 0.461 0.534 0.607 0.621 0.649
F 0.606 0.651 0.674 0.794 0.915 0.938 0.983
The fourth column of the output labeled Estimate contains the parameter
estimates. The third and fifth columns of the output labeled Lower 5%
and Upper 5%, respectively, contain the lower and upper bounds of the
90% confidence interval. The second and sixth columns of the output
labeled Lower 2.5% and Upper 2.5%, respectively, contain the lower
and upper bounds of the 95% confidence interval. The first and seventh
columns of the output labeled Lower .5% and Upper .5%, respectively,
contain the lower and upper bounds of the 99% confidence interval.
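The intervals in this table are consistent with the default SYMMETRIC setting. Up to rounding, each pair of bounds equals the estimate plus or minus a standard normal critical value times the standard error from MODEL RESULTS. The check below uses the regression of F on X (estimate 0.606, S.E. 0.049). It reproduces the printed bounds to within .001, with the gap due to rounding of the printed estimate and standard error. Bootstrap and Bayesian intervals are not constructed this way.

```latex
90\%:\ 0.606 \pm 1.645 \times 0.049 \approx (0.525,\ 0.687)
95\%:\ 0.606 \pm 1.960 \times 0.049 \approx (0.510,\ 0.702)
99\%:\ 0.606 \pm 2.576 \times 0.049 \approx (0.480,\ 0.732)
```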
SVALUES
SVALUES;
NOCHISQUARE
NOCHISQUARE;
NOSERROR
NOSERROR;
H1SE
The H1SE option is used with the ML, MLR, and MLF estimators to
request standard errors for the unrestricted H1 model. It must be used in
conjunction with TYPE=BASIC or the SAMPSTAT option of the
OUTPUT command. It is not available for any other analysis type and it
cannot be used in conjunction with the BOOTSTRAP option of the
ANALYSIS command.
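A minimal sketch of a valid H1SE request with one of the supported estimators, combined with SAMPSTAT as required above. The choice of MLR is illustrative.

```mplus
ANALYSIS:  ESTIMATOR = MLR;         ! ML, MLR, or MLF
OUTPUT:    SAMPSTAT H1SE;           ! H1SE needs SAMPSTAT or TYPE=BASIC
```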
H1TECH3
PATTERNS
1 2 3 4 5 6 7 8 9 10 11 12 13
Y1 x x x x x x x x
Y2 x x x x x x x
Y3 x x x x x x x
Y4 x x x x x x
The second part of the output shows the frequency with which each
pattern is observed in the data. For example, 984 individuals have
pattern 1 whereas 14 have pattern 7.
FSCOEFFICIENT
FSDETERMINACY
FACTOR DETERMINACIES
F 0.945
BASEHAZARD
LOGRANK
The D option is used with one-factor models with all categorical factor
indicators to provide a number to be used in the translation from the
factor model parameterization to the IRT parameterization. The default
is one. Any number can be provided. The number 1.7 is commonly
used in this translation. The D option is specified as follows:
D(1.7);
where 1.7 is the number to be used in the translation from the factor
model parameterization to the IRT parameterization.
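Some general IRT background explains why 1.7 is common. This is not a description of how Mplus performs the translation internally. A logistic curve whose argument is multiplied by about 1.7 stays within roughly 0.01 of the standard normal ogive everywhere. A normal-ogive discrimination a therefore corresponds approximately to a logistic discrimination of 1.7a.

```latex
\Phi(z) \approx \frac{1}{1 + \exp(-1.7\,z)} \quad \text{for all } z
P(u = 1 \mid \theta) = \Phi\bigl(a(\theta - b)\bigr) \approx \frac{1}{1 + \exp\bigl(-1.7\,a(\theta - b)\bigr)}
```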
TECH1
PARAMETER SPECIFICATION
NU
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
1 2 3 4 0
LAMBDA
F X
________ ________
Y1 0 0
Y2 5 0
Y3 6 0
Y4 7 0
X 0 0
THETA
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
Y1 8
Y2 0 9
Y3 0 0 10
Y4 0 0 0 11
X 0 0 0 0 0
ALPHA
F X
________ ________
0 0
BETA
F X
________ ________
F 0 12
X 0 0
PSI
F X
________ ________
F 13
X 0 0
STARTING VALUES
NU
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
0.104 0.092 0.035 0.050 0.000
LAMBDA
F X
________ ________
Y1 1.000 0.000
Y2 1.000 0.000
Y3 1.000 0.000
Y4 1.000 0.000
X 0.000 1.000
THETA
Y1 Y2 Y3 Y4 X
________ ________ ________ ________ ________
Y1 0.806
Y2 0.000 0.745
Y3 0.000 0.000 0.727
Y4 0.000 0.000 0.000 0.777
X 0.000 0.000 0.000 0.000 0.000
ALPHA
F X
________ ________
0.000 -0.046
BETA
F X
________ ________
F 0.000 0.000
X 0.000 0.000
PSI
F X
________ ________
F 0.050
X 0.000 0.912
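A minimal MODEL command consistent with the TECH1 matrices above: a factor f measured by y1 through y4 and regressed on the covariate x. The variable names follow the output, and the DATA and VARIABLE commands are omitted. Counting the numbered entries gives the 13 free parameters:

- four intercepts (NU)
- three loadings (LAMBDA; the loading of y1 is fixed at 1)
- four residual variances (THETA)
- one regression slope (BETA)
- one residual factor variance (PSI)

```mplus
MODEL:   f BY y1-y4;     ! loading of y1 fixed at 1 by default (0 in LAMBDA)
         f ON x;         ! regression slope, parameter 12 in BETA
OUTPUT:  TECH1;          ! print parameter specification and starting values
```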
TECH2
TECHNICAL 2 OUTPUT
DERIVATIVES
TECH3
TECH4
TECHNICAL 4 OUTPUT
TECH5
TECH6
TECH7
TECH8
TECH9
TECH10
TECH11
TECH12
TECH13
TECH14
draws, are then generated using the parameter estimates from the k-1
class model. These data are analyzed for both the k and k-1 class models
to obtain loglikelihood values which are used to compute a likelihood
ratio test statistic for each bootstrap draw. The likelihood ratio test
statistic from the initial analysis is compared to the distribution of
likelihood ratio test statistics obtained from the bootstrap draws to
compute a p-value which is used to decide if the k-1 class model fits the
data as well as the k class model.
In the TECH14 output, the H0 loglikelihood value given is for the k-1
class model. It is important to check that the H0 loglikelihood value in
the TECH14 output is the same as the loglikelihood value for the H0
model obtained in a previous k-1 class analysis. If it is not the same, the
K-1STARTS option of the ANALYSIS command can be used to
increase the number of random starts for the estimation of the k-1 class
model for TECH14.
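A sketch of a three-class analysis that requests TECH14 and increases the random starts for the two-class (k-1) model. The number of classes and the K-1STARTS values are illustrative. This sketch assumes the two numbers follow the same initial-stage and final-stage convention as the STARTS option.

```mplus
VARIABLE:  CLASSES = c (3);          ! k = 3, so the H0 model has k-1 = 2 classes
ANALYSIS:  TYPE = MIXTURE;
           K-1STARTS = 100 20;       ! more random starts for the k-1 class model
OUTPUT:    TECH14;                   ! bootstrapped likelihood ratio test
```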
TECH15
TECH16
TAU
NU
LAMBDA
THETA
The theta matrix contains the residual variances and covariances of the
observed dependent variables or the latent response variables. The rows
and columns both represent the observed dependent variables.
ALPHA
The alpha vector contains the means and/or intercepts of the continuous
latent variables.
BETA
The beta matrix contains the regression coefficients for the regressions
of continuous latent variables on continuous latent variables. Both the
rows and columns represent continuous latent variables.
GAMMA
PSI
The psi matrix contains the variances and covariances of the continuous
latent variables. Both the rows and columns represent the continuous
latent variables in the model.
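The NU, LAMBDA, THETA, ALPHA, BETA and PSI matrices fit together in the standard structural equation model that their names suggest. The sketch below assumes the case without the categorical-latent-variable and growth matrices described below. In it, y holds the observed dependent variables and eta the continuous latent variables. As the TECH1 output shows, covariates such as x can also appear in eta.

```latex
y = \nu + \Lambda\eta + \varepsilon, \qquad \operatorname{Var}(\varepsilon) = \Theta
\eta = \alpha + B\eta + \zeta, \qquad \operatorname{Var}(\zeta) = \Psi
E(y) = \nu + \Lambda (I - B)^{-1} \alpha
\operatorname{Var}(y) = \Lambda (I - B)^{-1} \Psi \bigl[(I - B)^{-1}\bigr]^{\top} \Lambda^{\top} + \Theta
```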
DELTA
ALPHA (C)
The alpha (c) vector contains the means or intercepts of the categorical
latent variables.
LAMBDA (U)
The lambda (u) matrix contains the intercepts of the binary observed
variables that are influenced by the categorical latent variables. The
rows of lambda (u) represent the binary observed variables in the model.
The columns of lambda (u) represent the classes of the categorical latent
variables in the model.
TAU (U)
The tau (u) vector contains the thresholds of the categorical observed
variables that are influenced by the categorical latent variables. The
elements are in the order of thresholds within variables.
GAMMA (C)
The gamma (c) matrix contains the regression coefficients for the
regressions of the categorical latent variables on observed independent
variables. The rows represent the latent classes. The columns represent
the observed independent variables in the model.
KAPPA (U)
The kappa (u) matrix contains the regression coefficients for the
regressions of the binary observed variables on the observed independent
variables. The rows represent the binary observed variables. The
columns represent the observed independent variables in the model.
ALPHA (F)
The alpha (f) vector contains the means and/or intercepts of the growth
factors for the categorical observed variables that are influenced by the
categorical latent variables.
LAMBDA (F)
The lambda (f) matrix contains the fixed loadings that describe the
growth of the categorical observed variables that are influenced by the
categorical latent variables. The rows represent the categorical observed
variables. The columns represent the growth factors.
GAMMA (F)
The gamma (f) matrix contains the regression coefficients for the
regressions of the growth factors on the observed independent variables
and the regression coefficients for the regressions of the categorical
observed variables on the observed independent variables.
• Analysis data
• Sample correlation or covariance matrix
• Estimated sigma between matrix from TYPE=TWOLEVEL
• Within- and between-level sample statistics for weighted least
squares estimation
• Analysis results
• Parameter estimates for use in the MONTECARLO command
• Derivatives from an H1 model
• TECH3
• TECH4
• Kaplan-Meier survival curve values for continuous-time survival
• Baseline hazard values for continuous-time survival
• Estimated baseline hazard curve values for continuous-time survival
• Factor scores, posterior probabilities, and most likely class
membership for each response pattern
• Factor scores
• Latent response variables
• Posterior probabilities for each class and most likely class
membership
• Outliers
• Bayesian posterior parameter values
SAVEDATA:
FILE
The FILE option is used to specify the name of the ASCII file in which
the individual-level data used in the analysis will be saved. Following is
an example of how to specify the FILE option:
FILE IS newdata.dat;
FORMAT
The FORMAT option is used to specify the format in which the analysis
data will be saved. This option cannot be used for saving other types of
data. All dependent and independent variables used in the analysis are
saved. In addition, all other variables that are used in conjunction with
the analysis are saved. The name of the data set along with the names of
the variables saved and the format are printed in the output. The default
is to save the analysis variables using a fixed format. To save them in
free format instead, specify:
FORMAT IS FREE;
Individual data can also be saved in a fixed format specified by the user,
who can choose the F or E format in which the analysis variables are
saved. For example:
FORMAT IS F2.0;
which indicates that all analysis variables will be saved with an F2.0
format.
MISSFLAG
The MISSFLAG option is used to specify the missing value flag to use
in the data set named in the FILE option of the SAVEDATA command.
The default is the asterisk (*). Alternatively, the period (.) or any
number can be used. All variables must have the same missing value flag.
RECORDLENGTH
The RECORDLENGTH option is used to specify the number of
characters per record in the file to which the analysis data are saved. It
cannot be used for saving other types of data. The default and maximum
record length is 5000. Following is an example of how the
RECORDLENGTH option is specified:
RECORDLENGTH = 220;
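A sketch that combines the four options above in one SAVEDATA command. The file name, format, missing value flag and record length are illustrative values, not defaults.

```mplus
SAVEDATA:  FILE IS newdata.dat;     ! individual-level analysis data
           FORMAT IS F10.3;         ! user-specified fixed format
           MISSFLAG = 999;          ! instead of the default asterisk (*)
           RECORDLENGTH = 220;      ! characters per record; default and maximum 5000
```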
SAMPLE
The SAMPLE option is used to specify the name of the ASCII file in
which the sample statistics such as the correlation or covariance matrix
will be saved. Following is an example of how to specify the SAMPLE
option:
SAMPLE IS sample.dat;
where sample.dat is the name of the file in which the sample statistics
will be saved. If the working directory contains a file of the same name,
it will be overwritten. The data are saved using free format delimited by
a space.
COVARIANCE
The COVARIANCE option is used to specify the name of the ASCII file
in which the model estimated covariance matrix for continuous analysis
variables is saved. Following is an example of how this option is
specified:
COVARIANCE = cov.dat;
where cov.dat is the name of the file in which the covariance matrix for
continuous analysis variables will be saved. If the working directory
contains a file of the same name, it will be overwritten. The data are
saved using free format delimited by a space.
SIGBETWEEN
The SIGBETWEEN option is used to specify the name of the ASCII file
in which the estimated sigma between covariance matrix or the estimated
sigma between correlation matrix will be saved. For maximum
likelihood estimation, it is the consistent maximum likelihood estimate
of sigma between. For weighted least squares estimation, it is the
pairwise maximum likelihood estimated sigma between covariance and
correlation matrices. For ESTIMATOR=MUML, it is the unbiased
estimate of sigma between. Following is an example of how to specify
the SIGBETWEEN option:
SIGBETWEEN IS sigma.dat;
where sigma.dat is the name of the file in which the estimated sigma
between matrix will be saved. If the working directory contains a file of
the same name, it will be overwritten. The data are saved using free
format delimited by a space.
SWMATRIX
The SWMATRIX option is used with TYPE=TWOLEVEL and
weighted least squares estimation to specify the name of the ASCII file
in which the within- and between-level sample statistics and their
corresponding estimated asymptotic covariance matrix will be saved.
The univariate and bivariate sample statistics are estimated using one-
and two-dimensional numerical integration with a default of 7
integration points. The INTEGRATION option of the ANALYSIS
command can be used to change the default. It is recommended to save
this information and use it in subsequent analyses along with the raw
data to reduce computational time during model estimation. Analyses
using this information must have the same set of observed dependent and
independent variables, the same DEFINE command, the same
USEOBSERVATIONS statement, and the same USEVARIABLES
statement as the analysis which was used to save the information.
SWMATRIX IS swmatrix.dat;
where swmatrix.dat is the name of the file in which the analysis results
will be saved. If the working directory contains a file of the same name,
it will be overwritten.
SWMATRIX IS sw*.dat;
where the asterisk (*) is replaced by the number of the imputed data set.
A file is also produced that contains the names of all of the imputed data
sets. To name this file, the asterisk (*) is replaced by the word list. The
file, in this case swlist.dat, contains the names of the imputed data sets.
This file is used with the SWMATRIX option of the DATA command in
subsequent analyses.
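A sketch of the two-run workflow for a single (non-imputed) data set. It assumes hypothetical binary items u1-u4, a cluster variable school and a data file mydata.dat, and the two-level factor model is illustrative. Both runs keep the same variables, and the DEFINE, USEOBSERVATIONS and USEVARIABLES statements (none are used here) must also match.

```mplus
! Run 1: estimate the model and save the within- and between-level statistics
DATA:      FILE IS mydata.dat;
VARIABLE:  NAMES = u1-u4 school;
           CATEGORICAL = u1-u4;
           CLUSTER = school;
ANALYSIS:  TYPE = TWOLEVEL;
           ESTIMATOR = WLSMV;        ! weighted least squares estimation
MODEL:     %WITHIN%
           fw BY u1-u4;
           %BETWEEN%
           fb BY u1-u4;
SAVEDATA:  SWMATRIX IS swmatrix.dat;

! Run 2: same setup, reusing the saved statistics to reduce computing time
DATA:      FILE IS mydata.dat;
           SWMATRIX IS swmatrix.dat;
VARIABLE:  NAMES = u1-u4 school;
           CATEGORICAL = u1-u4;
           CLUSTER = school;
ANALYSIS:  TYPE = TWOLEVEL;
           ESTIMATOR = WLSMV;
MODEL:     %WITHIN%
           fw BY u1-u4;
           %BETWEEN%
           fb BY u1-u4;
```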
RESULTS
The RESULTS option is used to specify the name of the ASCII file in
which the results of an analysis will be saved. The results saved include
parameter estimates, standard errors of the parameter estimates, and fit
statistics. If the STANDARDIZED option of the OUTPUT command is
used, standardized parameter estimates and their standard errors will
also be saved. Following is an example of how to specify the RESULTS
option:
RESULTS IS results.dat;
where results.dat is the name of the file in which the analysis results will
be saved. If the working directory contains a file of the same name, it
will be overwritten. The data are saved using free format delimited by a
space.
ESTIMATES
The ESTIMATES option is used to specify the name of the ASCII file in
which the parameter estimates of an analysis will be saved. The saved
parameter estimates can be used in a subsequent Monte Carlo simulation
study as population values for data generation and/or coverage values
using the POPULATION and/or COVERAGE options of the
MONTECARLO command. The SVALUES option is an alternative to
the ESTIMATES option. The SVALUES option creates input
statements that contain parameter estimates from the analysis as starting
values.
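A sketch of the two-step workflow described above. A real-data analysis saves its estimates, and a Monte Carlo study then uses them both for data generation and for coverage. The file names, sample size and number of replications are hypothetical. A model without covariates is used so that every generating value comes from the saved estimates. This sketch assumes that MODEL POPULATION and MODEL restate the model structure of the first run.

```mplus
! Run 1: real data analysis; save the parameter estimates
DATA:        FILE IS mydata.dat;
VARIABLE:    NAMES = y1-y4;
MODEL:       f BY y1-y4;
SAVEDATA:    ESTIMATES = est.dat;

! Run 2: Monte Carlo study using the saved estimates
MONTECARLO:  NAMES = y1-y4;
             NOBSERVATIONS = 500;
             NREPS = 500;
             POPULATION = est.dat;   ! population values for data generation
             COVERAGE = est.dat;     ! population values for computing coverage
MODEL POPULATION:
             f BY y1-y4;             ! same model as run 1; values read from est.dat
MODEL:       f BY y1-y4;
```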
DIFFTEST
The DIFFTEST option is used in conjunction with the MLMV and
WLSMV estimators to specify the name of the ASCII file in which the
derivatives from an H1 model will be saved. These derivatives are used
in the subsequent estimation of an H0 model to compute a chi-square
difference test using the DIFFTEST option of the ANALYSIS
command. The H1 model is the less restrictive model. The H0 model is
the more restrictive model nested within H1. Following is an example of
how to specify the DIFFTEST option:
DIFFTEST IS deriv.dat;
where deriv.dat is the name of the file in which the derivatives from the
H1 model will be saved. If the working directory contains a file of the
same name, it will be overwritten. The data are saved using free format
delimited by a space.
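A sketch of the two runs needed for a chi-square difference test with WLSMV. It assumes hypothetical binary items u1-u6 and two factors. The H0 model, in which the factor covariance is fixed at zero, is nested within the H1 model, in which the covariance is free by default. Both runs use the same DATA and VARIABLE commands.

```mplus
! Run 1: less restrictive H1 model; save the derivatives
VARIABLE:  NAMES = u1-u6;
           CATEGORICAL = u1-u6;
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     f1 BY u1-u3;
           f2 BY u4-u6;
SAVEDATA:  DIFFTEST IS deriv.dat;

! Run 2: more restrictive H0 model, tested against H1
VARIABLE:  NAMES = u1-u6;
           CATEGORICAL = u1-u6;
ANALYSIS:  ESTIMATOR = WLSMV;
           DIFFTEST IS deriv.dat;
MODEL:     f1 BY u1-u3;
           f2 BY u4-u6;
           f1 WITH f2@0;             ! H0 restriction: factor covariance fixed at zero
```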
TECH3
The TECH3 option is used to specify the name of the ASCII file in
which the covariance matrix of parameter estimates will be saved.
Following is an example of how to specify the TECH3 option:
TECH3 IS tech3.dat;
where tech3.dat is the name of the file in which the covariance matrix of
parameter estimates will be saved. If the working directory contains a
file of the same name, it will be overwritten. The data are saved using
free format delimited by a space.
TECH4
The TECH4 option is used to specify the name of the ASCII file in
which the estimated means and covariance matrix for the latent variables
will be saved. Following is an example of how to specify the TECH4
option:
TECH4 IS tech4.dat;
where tech4.dat is the name of the file in which the estimated means and
covariance matrix for the latent variables will be saved. If the working
directory contains a file of the same name, it will be overwritten. The
data are saved using free format delimited by a space.
KAPLANMEIER
The KAPLANMEIER option is used to specify the name of the ASCII
file in which the y- and x-axis values for the Kaplan-Meier survival
curve for continuous-time survival analysis will be saved. This option is
available only with the SURVIVAL option. Following is an example of
how this option is specified:
KAPLANMEIER IS kapmeier.dat;
where kapmeier.dat is the name of the file in which the survival curve
values will be saved. If the working directory contains a file of the same
name, it will be overwritten. The data are saved using free format
delimited by a space.
BASEHAZARD
The BASEHAZARD option is used to specify the name of the ASCII file
in which the estimated baseline hazard values for continuous-time
survival analysis will be saved. This option is available only with the
SURVIVAL option. Following is an example of how this option is
specified:
BASEHAZARD IS base.dat;
where base.dat is the name of the file in which the estimated baseline
hazard values will be saved. If the working directory contains a file of
the same name, it will be overwritten. The data are saved using free
format delimited by a space.
ESTBASELINE
The ESTBASELINE option is used to specify the name of the ASCII file
in which the y- and x-axis values for the estimated baseline hazard curve
of the continuous-time survival analysis will be saved. This option is
available only with the SURVIVAL option. Following is an example of
how this option is specified:
ESTBASELINE IS estbase.dat;
where estbase.dat is the name of the file in which the estimated baseline
hazard curve values will be saved. If the working directory contains a
file of the same name, it will be overwritten. The data are saved using
free format delimited by a space.
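Because KAPLANMEIER, BASEHAZARD, and ESTBASELINE all require the SURVIVAL option, they can be requested together in one continuous-time survival run. A sketch, assuming a hypothetical time-to-event variable t and covariates x1 and x2. The censoring specification and the MODEL command are omitted.

```
VARIABLE:
  NAMES = t x1 x2;
  SURVIVAL = t;                  ! continuous-time survival variable
SAVEDATA:
  KAPLANMEIER = kapmeier.dat;    ! axis values of the Kaplan-Meier curve
  BASEHAZARD  = base.dat;        ! estimated baseline hazard values
  ESTBASELINE = estbase.dat;     ! axis values of the estimated baseline hazard curve
```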
RESPONSE
The RESPONSE option is used to specify the name of the ASCII file in
which information about each response pattern is saved. It is available
for single-level models estimated with the ML, MLR, or MLF estimator
when all dependent variables are categorical. It is not available for
models with covariates. It is available for TYPE=EFA and TYPE=MIXTURE
EFA when the lower and upper limits of the number of factors to be
extracted are the same. If the model has continuous latent variables,
factor scores and the standard errors of the factor scores are saved. For
TYPE=MIXTURE, the factor scores based on most likely class membership
are saved in addition to the posterior probabilities for each class and
most likely class membership for each response pattern. The RESPONSE
option is not available when the KNOWNCLASS or TRAINING options of
the VARIABLE command are used. Following is an example of how to
specify the RESPONSE option:
RESPONSE IS response.dat;
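A minimal fragment that meets the conditions listed above (single-level model, all dependent variables categorical, an ML-type estimator, no covariates). The six binary items u1-u6 and the single factor f are hypothetical.

```
VARIABLE:  NAMES = u1-u6;
           CATEGORICAL = u1-u6;       ! all dependent variables categorical
ANALYSIS:  ESTIMATOR = MLR;
MODEL:     f BY u1-u6;                ! no covariates in the model
SAVEDATA:  RESPONSE = response.dat;   ! one record per response pattern,
                                      ! with factor scores and their standard errors
```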
MULTIPLIER
The MULTIPLIER option is used with the JACKKNIFE setting of the
REPSE option to specify the name of the ASCII file in which the
multiplier values are saved. Following is an example of how to specify
the MULTIPLIER option:
MULTIPLIER IS multiplier.dat;
where multiplier.dat is the name of the file in which the multiplier values
are saved. If the working directory contains a file of the same name, it
will be overwritten. The values are saved as E15.8.
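A hedged fragment showing where the MULTIPLIER option sits relative to the JACKKNIFE setting of REPSE. It assumes a complex survey analysis whose weight and replicate-weight or cluster information is declared in the VARIABLE command, which is omitted here.

```
ANALYSIS:  TYPE = COMPLEX;
           REPSE = JACKKNIFE;             ! replicate-based standard errors
SAVEDATA:  MULTIPLIER = multiplier.dat;   ! multiplier values, written as E15.8
```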
BPARAMETERS
The BPARAMETERS option is used in Bayesian analysis to specify the
name of the ASCII file in which the Bayesian posterior parameter values
for all iterations are saved. Following is an example of how this option
is specified:
BPARAMETERS = bayes.dat;
where bayes.dat is the name of the file in which the Bayesian posterior
parameter values for all iterations will be saved. If the working directory
contains a file of the same name, it will be overwritten. The data are
saved using free format delimited by a space.
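A minimal fragment, assuming a Bayesian analysis with two chains; the model itself is omitted.

```
ANALYSIS:  ESTIMATOR = BAYES;
           CHAINS = 2;               ! number of MCMC chains
SAVEDATA:  BPARAMETERS = bayes.dat;  ! posterior parameter values for all iterations
```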
TYPE
The TYPE option is used to specify the type of matrix to be saved. It
can be used in conjunction with the SAMPLE and SIGB options to
override the default. The default matrix for the SAMPLE option is the
covariance matrix for continuous outcomes, the correlation matrix for
categorical outcomes, and the correlation matrix for combinations of
continuous and categorical outcomes. The default matrix for the SIGB
option is the covariance matrix. If the default matrix is the covariance
matrix, a correlation matrix can be requested by the following statement:
TYPE = CORRELATION;
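For example, to save a correlation matrix instead of the default covariance matrix for continuous outcomes, the TYPE option is combined with the SAMPLE option. The file name sample.dat is hypothetical.

```
SAVEDATA:  SAMPLE = sample.dat;   ! sample statistics file
           TYPE = CORRELATION;    ! overrides the default covariance matrix
```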
SAVE
The SAVE option is used to save factor scores, posterior probabilities
for each class, and outliers along with the analysis and/or auxiliary
variables.
FSCORES
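SAVE = FSCORES;
With ESTIMATOR=BAYES, the number of imputations or draws is given in
parentheses, for example:
SAVE = FSCORES (50);
where 50 is the number of imputations or draws that are used from the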
Bayesian posterior distribution to compute the plausible value
distribution for each observation. The number of imputations or draws
must be specified. There is no default.
The FACTORS option is used to specify the names of the factors for
which the plausible value distributions will be saved. Following is an
example of how this option is specified:
FACTORS = f1 f2 f3;
where f1, f2, and f3 are the factors for which the plausible value
distributions will be saved. If the PLOT command is used, these
plausible values will be saved for plotting.
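A sketch of a Bayesian run that saves plausible values for three factors. It assumes continuous indicators y1-y9, a hypothetical output file fscores.dat, and that the number of draws is written in parentheses after FSCORES.

```
ANALYSIS:  ESTIMATOR = BAYES;
MODEL:     f1 BY y1-y3;
           f2 BY y4-y6;
           f3 BY y7-y9;
SAVEDATA:  FILE = fscores.dat;     ! hypothetical file name
           SAVE = FSCORES (50);    ! 50 posterior draws per observation
           FACTORS = f1 f2 f3;     ! factors whose plausible values are saved
```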
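LRESPONSES
SAVE = LRESPONSES (50);
where 50 is the number of imputations or draws that are used from the
Bayesian posterior distribution to compute the latent response variable
distribution for each observation. The number of imputations or draws
must be specified. There is no default.
The LRESPONSES option is used to specify the names of the latent
response variables for which the distributions will be saved. Following
is an example of how this option is specified:
LRESPONSES = u1 u2 u3;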
where u1, u2, and u3 are the latent response variables underlying
categorical outcomes for which the latent response variable distributions
will be saved.
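A sketch that parallels the factor score case, assuming categorical outcomes u1-u3, a hypothetical output file lresp.dat, and that the number of draws is written in parentheses after LRESPONSES.

```
VARIABLE:  CATEGORICAL = u1-u3;
ANALYSIS:  ESTIMATOR = BAYES;
SAVEDATA:  FILE = lresp.dat;          ! hypothetical file name
           SAVE = LRESPONSES (50);    ! 50 posterior draws per observation
           LRESPONSES = u1 u2 u3;     ! latent responses underlying u1-u3
```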
CPROBABILITIES
SAVE = CPROBABILITIES;
REPWEIGHTS
SAVE = REPWEIGHTS;
MAHALANOBIS
SAVE = MAHALANOBIS;
LOGLIKELIHOOD
SAVE = LOGLIKELIHOOD;
INFLUENCE
SAVE = INFLUENCE;
COOKS
SAVE = COOKS;
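Several SAVE settings can be listed in one statement. A sketch that saves all four outlier and influence measures along with the analysis variables, using a hypothetical output file name:

```
SAVEDATA:  FILE = outliers.dat;                              ! hypothetical file name
           SAVE = MAHALANOBIS LOGLIKELIHOOD INFLUENCE COOKS;  ! several settings at once
```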
FACTORS
The FACTORS option is used in conjunction with
ESTIMATOR=BAYES to specify the names of the factors for which the
distribution of factor scores, called plausible values, will be saved.
Following is an example of how to specify the FACTORS option:
FACTORS = f1 f2 f3;
where f1, f2, and f3 are the factors for which the plausible value
distributions will be saved.
LRESPONSES
The LRESPONSES option is used in conjunction with ESTIMATOR =
BAYES to specify the names of the latent response variables underlying
the categorical outcomes for which the plausible value distributions will
be saved. Following is an example of how to specify the LRESPONSES
option:
LRESPONSES = u1 u2 u3;
where u1, u2, and u3 are the latent response variables underlying the
categorical outcomes for which the latent response variable distributions
will be saved.
MFILE
The MFILE option is used to specify the name and location of the ASCII
file that is merged with the file named in the FILE option of the DATA
command. It is specified as
MFILE IS c:\merge\merge.dat;
where merge.dat is the name of the ASCII file containing the data to be
merged with the data set named using the FILE option of the DATA
command. In this example, the file merge.dat is located in the directory
c:\merge. If the full path name of the data set contains any blanks, the
full path name must have quotes around it.
If the name of the data set is specified with a path, the directory
specified by the path is checked. If the name of the data set is specified
without a path, the local directory is checked. If the data set is not found
in the local directory, the directory where the input file is located is
checked.
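When the path contains a blank, the quotes described above are required. A sketch with a hypothetical directory name:

```
SAVEDATA:  MFILE = "c:\merge files\merge.dat";   ! quotes needed: the path contains a blank
```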
MNAMES
MFORMAT
The MFORMAT option is used to describe the format of the data set to
be merged with the analysis data set. Individual data can be in fixed or
free format. Free format is the default. Fixed format is recommended
for large data sets because it is faster to read data using a fixed format.
not allowed. The number of variables in the data set is determined from
information provided in the MNAMES option of the SAVEDATA
command. Data are read until the number of pieces of information equal
to the number of variables is found. The program then goes to the next
record to begin reading information for the next observation.
For data in fixed format, each observation must have the same number of
records. Information for a given variable must occupy the same position
on the same record for each observation. A FORTRAN-like format
statement describing the position of the variables in the data set is
required. See the FORMAT option of the DATA command for a
description of how to specify a format statement.
MMISSING
MSELECT
The MSELECT option is used to select the variables from the data set to
be merged with the analysis data set. Variables included on the
MSELECT list must come from the MNAMES statement. The
MSELECT option is specified as follows:
PLOT:
TYPE IS PLOT1;
PLOT2;
PLOT3;
SERIES IS list of variables in a series plus x-axis
values;
FACTORS ARE names of factors (#);
LRESPONSES ARE names of latent response variables (#);
OUTLIERS ARE MAHALANOBIS;
LOGLIKELIHOOD;
INFLUENCE;
COOKS;
MONITOR IS ON; OFF
OFF;
TYPE
The TYPE option is used to specify the types of plots that are requested.
The TYPE option has three settings: PLOT1, PLOT2, and PLOT3.
• Estimated means
• Sample proportions and estimated probabilities
  • Plot estimated probabilities only
  • Plot sample proportions only
  • Plot estimated probabilities and sample proportions
  • Plot estimated probabilities, conditional on a set of covariates
  • Plot conditional estimated probabilities as a function of one
    covariate
• Sample and estimated means
• Adjusted estimated means
• IRT plots
  • Item characteristic curves
  • Information curves
• Eigenvalues for EFA
• Mixture distributions
• Survival curves
  • Kaplan-Meier curve
  • Sample log cumulative hazard curve
  • Estimated baseline hazard curve
  • Estimated baseline survival curve
  • Estimated log cumulative baseline curve
  • Kaplan-Meier curve with estimated baseline survival curve
  • Sample log cumulative hazard curve with estimated log
    cumulative baseline curve
  • Estimated survival curve
  • Estimated log cumulative curve
• Missing data plots
  • Dropout means
  • Sample means
• Bayesian plots
  • Posterior parameter distributions
  • Posterior parameter trace plots
  • Autocorrelation plots
  • Prior parameter distributions
  • Posterior predictive checking scatterplots
  • Posterior predictive checking distribution plots
• Loop plots
• Histograms of outliers
• Histograms of estimated values
• Scatterplots (sample values, estimated factor scores, outliers,
  estimated values)
• Observed individual values
• Estimated individual values
• Estimated individual probability values
• Estimated means and observed individual values
• Estimated means and estimated individual values
• Adjusted estimated means and observed individual values
• Adjusted estimated means and estimated individual values
• Estimated probabilities for a categorical latent variable as a function
  of its covariates
• Latent variable distribution plots
Plots can be generated for the total sample, by group, by class, and
adjusted for covariates.
SERIES
The SERIES option is used to list the names of the set of variables to be
used in plots where the values are connected by a line. The x-axis values
for each variable must also be given. For growth models, the set of
variables is the repeated measures of the outcome over time, and the x-
axis values are the time scores in the growth model. For other models,
the set of variables reflects an ordering of the observed variables in the
plot. Non-series plots such as histograms and scatterplots are available
for all analyses.
Values for the x axis can be given in three ways: by putting the x-axis
values in parentheses following each variable in the series; by using an
asterisk (*) in parentheses to request integer values starting with 0 and
increasing by 1; and for growth models, by putting the name of the slope
growth factor in parentheses following each outcome or a list of the
outcomes to request time score values.
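Following are examples of the three ways:
SERIES = y1 (0) y2 (1) y3 (2) y4 (3);
or
SERIES = y1 y2 y3 y4 (*);
or
SERIES = y1 (slope) y2 (slope) y3 (slope) y4 (slope);
where slope is the name of the slope growth factor. The list function can
also be used with the SERIES option. It is specified as follows:
SERIES = y1-y4 (slope);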
This results in the time scores for the slope growth factor being used as
the x-axis values.
The SERIES option can be used to give variables and x-axis values for
more than one series. The list of variables for each series is separated by
the | symbol. Following is an example for two growth processes:
SERIES = y1 (0) y2 (1) y3 (2) y4 (3) | y5 (0) y6 (1) y7 (4) y8 (5);
where for the first growth process, the time score for y1 is 0, the time
score for y2 is 1, the time score for y3 is 2, and the time score for y4 is 3;
and for the second growth process, the time score for y5 is 0, the time
score for y6 is 1, the time score for y7 is 4, and the time score for y8 is 5.
Using the list function and the name of the slope growth factor, the
SERIES option is specified as:
SERIES = y1-y4 (s1) | y5-y8 (s2);
where s1 is the name of the slope growth factor for the first growth
process and s2 is the name of the slope growth factor for the second
growth process. The names of the slope growth factors are defined in
the MODEL command.
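A sketch of a linear growth model whose plots use the time scores as x-axis values. The outcomes y1-y4 and the growth factor names i and s are hypothetical.

```
MODEL:  i s | y1@0 y2@1 y3@2 y4@3;   ! linear growth with time scores 0, 1, 2, 3
PLOT:   TYPE = PLOT3;
        SERIES = y1-y4 (s);           ! x-axis values taken from the time scores of s
```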
FACTORS
The FACTORS option is used in conjunction with
ESTIMATOR=BAYES to specify the names of the factors for which the
distributions of factor scores, called plausible values, will be saved for
plotting. Following is an example of how to specify the FACTORS
option:
FACTORS = f1 f2 f3 (50);
where 50 is the number of imputations or draws that are used from the
Bayesian posterior distribution to compute the plausible value
distribution for each observation. F1, f2, and f3 are the factors for which
the plausible value distributions will be saved for plotting.
LRESPONSES
The LRESPONSES option is used in conjunction with
ESTIMATOR=BAYES to specify the names of the latent response
variables underlying the categorical outcomes for which the plausible
value distributions will be saved for plotting. Following is an example
of how to specify the LRESPONSES option:
LRESPONSES = u1 u2 u3 (50);
where 50 is the number of imputations or draws that are used from the
Bayesian posterior distribution to compute the latent response variable
distributions for each observation. U1, u2, and u3 are the latent response
variables for which the plausible value distributions will be saved for
plotting.
OUTLIERS
The OUTLIERS option is used to select the outliers that will be saved
for use in graphical displays. The OUTLIERS option has the following
settings: MAHALANOBIS, LOGLIKELIHOOD, INFLUENCE, and COOKS. The
settings can be used in combination, for example:
OUTLIERS = MAHALANOBIS COOKS;
With this specification, the Mahalanobis distance and its p-value and
Cook’s D will be saved for use in graphical displays.
MONITOR
The MONITOR option is used to request that certain plots be shown on
the monitor during model estimation. The default is OFF. To request
that the plots be shown specify:
MONITOR = ON;
For Bayesian analysis, trace plots are shown when one chain is used.
For all models except TYPE=GENERAL and TYPE=EFA,
loglikelihoods are shown.
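A minimal fragment for watching trace plots during a Bayesian estimation. A single chain is requested because trace plots are shown only in that case; the model is omitted.

```
ANALYSIS:  ESTIMATOR = BAYES;
           CHAINS = 1;        ! trace plots are shown when one chain is used
PLOT:      MONITOR = ON;      ! default is OFF
```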
Plots can be viewed by selecting the View plots item under the Plot
menu or by clicking on the V button on the toolbar. A window listing the
available plots appears.
After a plot is selected, a window appears showing ways that the plot
can be customized. For example, if observed individual curves is
selected, a window appears with the customization options for that plot.
The plots can be exported as a DIB, EMF, or JPEG file using the Export
plot to item of the Plot menu. In addition, the data for each plot can be
saved in an external file using the Save plot data item of the Plot menu
for subsequent use by another program.
MONTECARLO Command
CHAPTER 19
MONTECARLO COMMAND
MONTECARLO:
GENERAL SPECIFICATIONS
The NAMES, NOBSERVATIONS, NGROUPS, NREPS, and SEED
options are used to give the basic specifications for a Monte Carlo
simulation study. These options are described below.
NAMES
NOBSERVATIONS
NOBSERVATIONS = 500;
where 500 is the sample size to be used for data generation and in the
analysis.
If the data being generated are for a multiple group analysis, a sample
size must be specified for each group. In multiple group analysis, the
NOBSERVATIONS option is specified as follows:
NOBSERVATIONS = 500 1000;
where a sample size of 500 is used for data generation and in the
analysis in the first group and a sample size of 1000 is used for data
generation and in the analysis for the second group.
NGROUPS
NGROUPS = 3;
where 3 is the number of groups to be used for data generation and in the
analysis. The default for the NGROUPS option is 1. The NGROUPS
option is not available for TYPE=MIXTURE.
For Monte Carlo studies, the program automatically assigns the label g1
to the first group, g2 to the second group, etc. These labels are used with
the MODEL POPULATION and MODEL commands to describe the
data generation and analysis models for each group.
NREPS
NREPS = 100;
where 100 is the number of samples that are drawn and the number of
analyses that are carried out. The default for the NREPS option is 1.
SEED
The SEED option is used to specify the seed to be used for the random
draws. The SEED option is specified as follows:
SEED = 23458256;
where 23458256 is the random seed to be used for the random draws.
The default for the SEED option is zero.
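The general specifications above can be combined in a single MONTECARLO command. A sketch for a hypothetical two-group study with four generated variables, using the values from the examples above. The MODEL POPULATION and MODEL commands, including their group-specific parts, are still required and are omitted.

```
MONTECARLO:
  NAMES = y1-y4;              ! variables to be generated
  NGROUPS = 2;                ! groups are labeled g1 and g2 automatically
  NOBSERVATIONS = 500 1000;   ! one sample size per group
  NREPS = 100;                ! 100 samples drawn and analyzed
  SEED = 23458256;            ! fixed seed so the draws can be reproduced
```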
DATA GENERATION
The GENERATE, CUTPOINTS, GENCLASSES, NCSIZES, CSIZES,
and HAZARDC options are used in conjunction with the MODEL
POPULATION command to specify how data are to be generated for a
Monte Carlo simulation study. These options are described below.
GENERATE
Count variables can be generated for the following six models: Poisson,
zero-inflated Poisson, negative binomial, zero-inflated negative
binomial, zero-truncated negative binomial, and negative binomial
hurdle (Long, 1997; Hilbe, 2011). The letter c or p in parentheses
following the variable name indicates that the variable is generated using
a Poisson model. The letters ci or pi in parentheses following the
variable name indicate that the variable is generated using a zero-inflated
Poisson model. The letters nb in parentheses following the variable
name indicate that the variable is generated using a negative binomial
model. The letters nbi in parentheses following the variable name
indicate that the variable is generated using a zero-inflated negative
binomial model. The letters nbt in parentheses following the variable
name indicate that the variable is generated using a zero-truncated
negative binomial model. The letters nbh in parentheses following the
variable name indicate that the variable is generated using a negative
binomial hurdle model.
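A sketch of the GENERATE statement alone, assuming four hypothetical count variables u1-u4, each generated from a different model using the letters described above. How the variables are analyzed is declared separately with the COUNT option.

```
MONTECARLO:
  NAMES = u1-u4;
  GENERATE = u1 (p)      ! Poisson
             u2 (pi)     ! zero-inflated Poisson
             u3 (nb)     ! negative binomial
             u4 (nbh);   ! negative binomial hurdle
```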
CUTPOINTS
where the cutpoints before the | symbol are the cutpoints for group 1 and
the cutpoints after the | symbol are the cutpoints for group 2.
GENCLASSES
GENCLASSES = c1 (3) c2 (2) c3 (3);
where c1, c2, and c3 are the names of the three categorical latent
variables in the data generation model. The numbers in parentheses are
the number of classes that will be used for each categorical latent
variable for data generation. Three classes will be used for data
generation for c1, two classes for c2, and three classes for c3.
The letter b following the number of classes specifies that the categorical
latent variable is a between-level variable. Following is an example of
how to specify that a categorical latent variable being generated is a
between-level variable:
GENCLASSES = cb (2 b);
NCSIZES
NCSIZES = 3;
NCSIZES = 3 | 2;
NCSIZES = 3 [2];
where the numbers 3 and 2 are the numbers of unique cluster sizes to be
used for data generation. In this example, 3 is the number of unique
cluster sizes for level 3, school, and 2 is the number of unique cluster
sizes for level 2, classroom.
NCSIZES = 3 [2] | 4 [3];
where the numbers 3 and 2 are the numbers of unique cluster sizes to be
used for data generation in group 1 and the numbers 4 and 3 are the
numbers of unique cluster sizes to be used for data generation in
group 2. In this example, 3 is the number of unique cluster sizes for
level 3, school, and 2 is the number of unique cluster sizes for level 2,
classroom, for group 1, and 4 is the number of unique cluster sizes for
level 3, school, and 3 is the number of unique cluster sizes for level 2,
classroom, for group 2.
NCSIZES = 3 [2];
where the numbers 3 and 2 are the numbers of unique cluster sizes to be
used for data generation. In this example, 3 is the number of unique
cluster sizes for level 2a, school, and 2 is the number of unique cluster
sizes for level 2b, neighborhood.
CSIZES
where the numbers 40, 30, and 7 are the numbers of level 3, school,
clusters. There are a total of 77 level 3, school, clusters. Each of the 40
level 3, school, clusters is made up of 15 level 2, class, clusters of size
2 and 10 level 2, class, clusters of size 5 for a total of 3200
observations. Each of the 30 level 3, school, clusters is made up of 6
level 2, class, clusters of size 8 for a total of 1440 observations. Each
of the 7 level 3, school, clusters is made up of 20 level 2, class,
clusters of size 2 for a total of 280
observations. The total sample size for data generation is 4920.
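As a quick check, the total follows from multiplying each number of level 3 clusters by the number of observations in one such cluster and summing:

```latex
\begin{aligned}
40\,(15 \times 2 + 10 \times 5) &= 40 \times 80 = 3200 \\
30\,(6 \times 8) &= 30 \times 48 = 1440 \\
7\,(20 \times 2) &= 7 \times 40 = 280 \\
3200 + 1440 + 280 &= 4920
\end{aligned}
```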
where the numbers 30 and 7 are the numbers of level 3, school, clusters
for group 1 and the numbers 40 and 20 are the numbers of level 3, school,
clusters for group 2. There are a total of 37 level 3, school, clusters for
group 1 and 60 level 3, school, clusters for group 2. For group 1, each of
the 30 level 3, school, clusters is made up of 6 level 2, class, clusters of
size 8 for a total of 1440 observations, and each of the 7 level 3, school,
clusters is made up of 20 level 2, class, clusters of size 2 for a total of
280 observations. For group 1, the total sample size for data generation
is 1720. For group 2, each of the 40 level 3, school, clusters is made up
of 5 level 2, class, clusters of size 6 for a total of 1200 observations,
and each of the 20 level 3, school, clusters is made up of 4 level 2,
class, clusters of size 2 for a total of 160 observations. For group 2, the
total sample size for data generation
is 1360.
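The per-group totals can be checked the same way, one line per group:

```latex
\begin{aligned}
\text{Group 1:}\quad 30\,(6 \times 8) + 7\,(20 \times 2) &= 1440 + 280 = 1720 \\
\text{Group 2:}\quad 40\,(5 \times 6) + 20\,(4 \times 2) &= 1200 + 160 = 1360
\end{aligned}
```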
where the numbers 40, 30, and 7 are the numbers of level 2b,
neighborhood, clusters. There are a total of 77 level 2b, neighborhood,
clusters. The numbers 15, 10, 6, and 20 are the numbers of level 2a,
school, clusters. There are a total of 51 level 2a, school, clusters. The
40 level 2b, neighborhood, clusters are crossed with the 15 level 2a,
school, clusters and the 10 level 2a, school, clusters. Each cell of the 40
by 15 cross-classification contains 2 students for a total of 1200
observations. Each cell of the 40 by 10 cross-classification contains 5
students for a total of 2000 observations. The 30 level 2b,
neighborhood, clusters are crossed with 6 level 2a, school, clusters.
HAZARDC
The HAZARDC option is used to specify the hazard for the censoring
process in continuous-time survival analysis when time-to-event
variables are generated. This information is used to create a censoring
indicator variable where zero is not censored and one is right censored.
The HAZARDC option is specified as follows:
HAZARDC = t1 (.5);
PATMISS
The PATMISS option is used to specify the missing data patterns and
the proportion of data that are missing to be used in missing data
generation for each dependent variable in the model. Any variable in the
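NAMES statement that is not listed in a missing data pattern is assumed
to have no missing data in that pattern. The PATMISS option is specified
as follows:
PATMISS = y1 (.2) y2 (.3) y3 (.1) | y2 (.2) y3 (.1) y4 (.3) | y3 (.1) y4 (.3);
The statement above specifies that there are three missing data patterns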
which are separated by the | symbol. The number in parentheses
following each variable is the probability of missingness to be used for
that variable in data generation. In the first pattern, y1, y2, and y3
have missingness probabilities of .2, .3, and .1, respectively. In the
second pattern, y2, y3, and y4 have missingness probabilities of .2, .1,
and .3, respectively. In the third pattern, y3 and y4 have missingness
probabilities of .1 and .3, respectively.
Assuming that the NAMES statement includes variables y1, y2, y3, and
y4, individuals in the first pattern have no missing data on variable y4;
individuals in the second pattern have no missing data on variable y1;
and individuals in the third pattern have no missing data on variables y1
and y2.
PATPROBS
PATPROBS = .4 | .3 | .3;
where missing data pattern one has probability .40 of being observed in
the data being generated, missing data pattern two has probability .30 of
being observed in the data being generated, and missing data pattern
three has probability .30 of being observed in the data being generated.
The missing data pattern probabilities must sum to one.
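PATMISS and PATPROBS are used together: PATMISS defines the patterns and PATPROBS gives the probability of each pattern in the same order. A sketch using the values from the examples above, assuming the generated variables are y1-y4:

```
MONTECARLO:
  NAMES = y1-y4;
  PATMISS = y1 (.2) y2 (.3) y3 (.1) |   ! pattern 1: y4 has no missing data
            y2 (.2) y3 (.1) y4 (.3) |   ! pattern 2: y1 has no missing data
            y3 (.1) y4 (.3);            ! pattern 3: y1 and y2 have no missing data
  PATPROBS = .4 | .3 | .3;              ! one probability per pattern; sums to one
```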
MISSING
MISSING = y1 y2 u1;
which indicates that missing data will be generated for variables y1, y2,
and u1. The probabilities of missingness are described using the
MODEL MISSING command which is described in Chapter 17.
CENSORED
where y1, y2, y3, y4 are censored dependent variables in the analysis.
The letter a in parentheses following the variable name indicates that the
variable is censored from above. The letter b in parentheses following
the variable name indicates that the variable is censored from below.
The lower and upper censoring limits are determined from the data
generation.
where y1, y2, y3, y4 are censored dependent variables in the analysis.
The letters ai in parentheses following the variable name indicate that
the variable is censored from above and that a censored-inflated model
will be estimated. The letters bi in parentheses following the variable
name indicate that the variable is censored from below and that a
censored-inflated model will be estimated. The lower and upper
censoring limits are determined from the data generation.
CATEGORICAL
where u2, u3, u7, u8, u9, u10, u11, u12, and u13 are binary or ordered
categorical dependent variables in the analysis.
NOMINAL
For nominal dependent variables, all categories but the last category can
be referred to. The last category is the reference category. The
categories are referred to in the MODEL command by adding to the
variable name the number sign (#) followed by a number. For example, the
first three categories of a four-category nominal variable u1 are referred
to as u1#1, u1#2, and u1#3.
COUNT
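The COUNT option can be specified in two ways for a Poisson model:
COUNT = u1 u2 u3 u4;
or
COUNT = u1 (p) u2 (p) u3 (p) u4 (p);
For a zero-inflated Poisson model, the COUNT option can also be
specified in two ways:
COUNT = u1 (i) u2 (i) u3 (i) u4 (i);
or
COUNT = u1 (pi) u2 (pi) u3 (pi) u4 (pi);
where u1, u2, u3, and u4 are count dependent variables in the analysis.
The letter i or the letters pi in parentheses following the variable name
indicate that a zero-inflated Poisson model will be estimated.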
CLASSES
CLASSES = c1 (2) c2 (2) c3 (3);
where c1, c2, and c3 are the names of the three categorical latent
variables in the analysis model. The numbers in parentheses specify the
number of classes that will be used for each categorical latent variable in
the analysis. The categorical latent variable c1 has two classes, c2 has
two classes, and c3 has three classes.
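Because GENCLASSES and CLASSES are specified separately, the number of classes used to generate the data can differ from the number used in the analysis, for example to study what happens when too few classes are fitted. A sketch with a hypothetical single categorical latent variable c. The remaining MONTECARLO options and the MODEL POPULATION and MODEL commands are omitted.

```
MONTECARLO:
  GENCLASSES = c (3);   ! three classes in the data generation model
  CLASSES = c (2);      ! two classes in the analysis model
ANALYSIS:
  TYPE = MIXTURE;
```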
AUXILIARY
Auxiliary variables are variables that are not part of the analysis model.
The AUXILIARY option has five uses with TYPE=MIXTURE. The
first two uses are to identify a set of variables not used in the analysis
that are possible covariates in a multinomial logistic regression for a
categorical latent variable. The last three uses are to identify a set of
variables not used in the analysis for which the equality of means across
latent classes will be tested. Only one of these five can be used in an
analysis at a time.
where race, ses, x1, x2, x3, x4, and x5 will be used as covariates in a
multinomial logistic regression in a mixture model.
where race, ses, x1, x2, x3, x4, and x5 will be used as covariates in a
multinomial logistic regression in a mixture model.
where the equality of means for race, ses, and gender will be tested
across classes.
where the equality of means for drinks and depress will be separately
tested across classes.
where the equality of means for drinks and depress will be separately
tested across classes.
that the equality of means for drinks and depress will be separately
tested across classes:
SURVIVAL
SURVIVAL = t;
The keyword ALL can be used if the time intervals are taken from the
data. Following is an example of how this is specified:
SURVIVAL = t (ALL);
TSCORES
where a1, a2, a3, and a4 are variables to be generated that contain the
individually-varying times of observation for an outcome at four time
points. The first number in parentheses is the mean of the variable. The
second number in parentheses is the standard deviation of the variable.
Each variable is generated using a univariate normal distribution using
the mean and standard deviation specified in the TSCORES statement.
WITHIN
is modeled on both the within and between levels. The WITHIN option
is specified as follows:
WITHIN = y1 y2 x1;
where y1, y2, and x1 are variables measured on the individual level and
modeled on only the within level.
WITHIN = y1-y3;
WITHIN = (level2) y4-y6;
WITHIN = (level3) y7-y9;
In the example, y1, y2, and y3 are variables measured on the individual
level and modeled on only level 1. Variables modeled on only level 1
must precede variables modeled on the other levels. Y4, y5, and y6 are
variables measured on the individual level and modeled on levels 1 and
2. Y7, y8, and y9 are variables measured on the individual level and
modeled on levels 1 and 3.
WITHIN = y1-y3;
WITHIN = (level2a) y4-y6;
WITHIN = (level2b) y7-y9;
In the example, y1, y2, and y3 are variables measured on the individual
level and modeled on only level 1. Variables modeled on only level 1
must precede variables modeled on the other levels. Y4, y5, and y6 are
variables measured on the individual level and modeled on levels 1 and
2a. Y7, y8, and y9 are variables measured on the individual level and
modeled on levels 1 and 2b.
BETWEEN
BETWEEN = z1 z2 x1;
where z1, z2, and x1 are variables measured on the cluster level and
modeled on the between level. The BETWEEN option is also used to
identify between-level categorical latent variables with
TYPE=TWOLEVEL MIXTURE.
BETWEEN = y1-y3;
POPULATION
The POPULATION option is used to name the data set that contains the
population parameter values to be used in data generation. Following is
an example of how the POPULATION option is specified:
POPULATION = estimates.dat;
COVERAGE
The COVERAGE option is used to name the data set that contains the
population parameter values to be used for computing parameter
coverage in the Monte Carlo summary. They are printed in the first
column of the output labeled Population. Following is an example of
how the COVERAGE option is specified:
COVERAGE = estimates.dat;
STARTING
The STARTING option is used to name the data set that contains the
values to be used as starting values for the analysis. Following is an
example of how the STARTING option is specified:
STARTING = estimates.dat;
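One way these three options are commonly used together, sketched under assumptions: the estimates from an earlier analysis are saved with the ESTIMATES option of the SAVEDATA command and then read back in the Monte Carlo run. The variable names, sample size, and number of replications are hypothetical, and the MODEL POPULATION and MODEL commands describing the model structure are still required and are omitted.

```
! Run 1: analysis of real data
SAVEDATA:  ESTIMATES = estimates.dat;   ! save the parameter estimates

! Run 2: Monte Carlo study based on those estimates (separate input file)
MONTECARLO:
  NAMES = y1-y6;
  NOBSERVATIONS = 500;
  NREPS = 100;
  POPULATION = estimates.dat;   ! population values for data generation
  COVERAGE = estimates.dat;     ! values used to compute coverage
  STARTING = estimates.dat;     ! starting values for each analysis
```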
REPSAVE
The REPSAVE option specifies the numbers of the replications for which the
data are saved. The keyword ALL can be used to save the data from all
of the replications. The list function is also available with REPSAVE.
To save the data from specific replications, REPSAVE is specified as
follows:
REPSAVE = 1 10-15 100;
which results in the data from replications 1, 10, 11, 12, 13, 14, 15, and
100 being saved. To save the data from all replications, REPSAVE is
specified as follows:
REPSAVE = ALL;
SAVE
The SAVE option is used to save data from the first replication for
future analysis. It is specified as follows:
SAVE = rep1.dat;
where rep1.dat is the name of the file in which data from the first
replication are saved. The data are saved in free format.
The SAVE option can be used in conjunction with the REPSAVE option
to save data from any or all replications. When the SAVE option is used
with the REPSAVE option, it is specified as follows:
SAVE = rep*.dat;
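Combining the two options, the following sketch saves the data from replications 1, 10 through 15, and 100, one file per replication. The asterisk in the file name is replaced by the replication number, so replication 10 is written to rep10.dat.

```
MONTECARLO:
  REPSAVE = 1 10-15 100;   ! replications whose data are saved
  SAVE = rep*.dat;         ! * is replaced by the replication number
```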
RESULTS
The RESULTS option is used to save the analysis results for each
replication of the Monte Carlo study in an ASCII file. The results saved
include the replication number, parameter estimates, standard errors, and
a set of fit statistics. The parameter estimates and standard errors are
saved in the order shown in the TECH1 output, in free format delimited
by a space. The values are saved in E15.8 format. The RESULTS option is
specified as follows:
RESULTS = results.sav;
where results.sav is the name of the file in which the analysis results for
each replication will be saved.
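BPARAMETERS
The BPARAMETERS option is used to name the file in which the Bayesian
posterior parameter values are saved. It is specified as follows:
BPARAMETERS = bayes.dat;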
where bayes.dat is the name of the file in which the Bayesian posterior
parameter values for all iterations will be saved.
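Posterior parameter values exist only under Bayesian estimation. A run that uses BPARAMETERS would therefore also request ESTIMATOR = BAYES in the ANALYSIS command. The following is a minimal sketch with hypothetical settings. It also requests RESULTS and again omits the MODEL POPULATION and MODEL commands.

```mplus
MONTECARLO:
  NAMES ARE y1-y4;
  NOBSERVATIONS = 300;
  NREPS = 100;
  SEED = 45335;
  RESULTS = results.sav;    ! estimates, standard errors, fit statistics
  BPARAMETERS = bayes.dat;  ! Bayesian posterior parameter values
ANALYSIS:
  ESTIMATOR = BAYES;
```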
CHAPTER 20
A SUMMARY OF THE Mplus LANGUAGE
This chapter contains a summary of the commands, options, and settings
of the Mplus language. For each command, default settings are found in
the last column. Commands and options can be shortened to four or
more letters. Option settings can be referred to by either the complete
word or the part of the word shown in bold type.
DATA:
FILE IS file name;
FORMAT IS format statement; FREE
FREE;
TYPE IS INDIVIDUAL; INDIVIDUAL
COVARIANCE;
CORRELATION;
FULLCOV;
FULLCORR;
MEANS;
STDEVIATIONS;
MONTECARLO;
IMPUTATION;
NOBSERVATIONS ARE number of observations;
NGROUPS = number of groups; 1
LISTWISE = ON; OFF
OFF;
SWMATRIX = file name;
VARIANCES = CHECK; CHECK
NOCHECK;
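To illustrate how these options and defaults combine, the following sketch shows a DATA command that reads a covariance matrix instead of individual data. The file name and sample size are hypothetical.

```mplus
DATA:
  FILE IS ex.dat;
  TYPE IS COVARIANCE;     ! summary data instead of the default INDIVIDUAL
  NOBSERVATIONS ARE 500;  ! sample size, needed because ex.dat holds no raw data
```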
DATA IMPUTATION:
IMPUTE = names of variables for which missing values
will be imputed;
NDATASETS = number of imputed data sets; 5
SAVE = names of files in which imputed data sets
are stored;
MODEL = COVARIANCE; COVARIANCE
SEQUENTIAL;
REGRESSION;
VALUES = values imputed data can take; no restrictions
ROUNDING = number of decimals for imputed continuous 3
variables;
THIN = k where every k-th imputation is saved; 100
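The following sketch of DATA IMPUTATION changes only the number of data sets and keeps the other defaults listed above (MODEL = COVARIANCE, THIN = 100). The variable names, missing-value flag, and file names are hypothetical. TYPE = BASIC is assumed here so that the run only produces the imputed data sets.

```mplus
DATA:
  FILE IS ex.dat;
VARIABLE:
  NAMES ARE y1-y4 x1 x2;
  MISSING ARE ALL (999);
DATA IMPUTATION:
  IMPUTE = y1-y4;     ! variables whose missing values are imputed
  NDATASETS = 20;     ! instead of the default of 5
  SAVE = imp*.dat;    ! one file per imputed data set
ANALYSIS:
  TYPE = BASIC;
```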
DATA WIDETOLONG:
WIDE = names of old wide format variables;
LONG = names of new long format variables;
IDVARIABLE = name of variable with ID information;
REPETITION = name of variable with repetition information;
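The following sketch applies DATA WIDETOLONG to four repeated measures. The names are hypothetical. The ID and repetition variables are assumed to be new variables created by the command. For that reason they appear in USEVARIABLES, together with the new long-format variable, in place of y1-y4.

```mplus
VARIABLE:
  NAMES ARE y1-y4 x;
  USEVARIABLES ARE y x person time;
DATA WIDETOLONG:
  WIDE = y1-y4;         ! four wide-format measures ...
  LONG = y;             ! ... become one long-format variable
  IDVARIABLE = person;  ! new variable holding the ID
  REPETITION = time;    ! new variable numbering the occasions
```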
DATA LONGTOWIDE:
LONG = names of old long format variables;
WIDE = names of new wide format variables;
IDVARIABLE = name of variable with ID information;
REPETITION = name of variable with repetition information
(values); 0, 1, 2, etc.
DATA TWOPART:
NAMES = names of variables used to create a set of
binary and continuous variables;
CUTPOINT = value used to divide the original variables 0
into a set of binary and continuous
variables;
BINARY = names of new binary variables;
CONTINUOUS = names of new continuous variables;
TRANSFORM = function to use to transform new continuous LOG
variables;
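The following sketch uses DATA TWOPART with the default CUTPOINT of 0 and the default LOG transformation. The names are hypothetical. The coding described in the comments is an assumption about how the split works: the binary part is 1 above the cutpoint and 0 otherwise, and the continuous part equals the original value above the cutpoint and is missing otherwise.

```mplus
VARIABLE:
  NAMES ARE y1-y4 x;
  USEVARIABLES ARE bin1-bin4 cont1-cont4 x;
  CATEGORICAL ARE bin1-bin4;  ! binary parts are analyzed as categorical
DATA TWOPART:
  NAMES = y1-y4;              ! original semicontinuous variables
  BINARY = bin1-bin4;         ! assumed: 1 if above CUTPOINT, 0 otherwise
  CONTINUOUS = cont1-cont4;   ! assumed: logged value above CUTPOINT, else missing
```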
DATA MISSING:
NAMES = names of variables used to create a set of
binary variables;
BINARY = names of new binary variables;
TYPE = MISSING;
SDROPOUT;
DDROPOUT;
DESCRIPTIVE = sets of variables for additional descriptive
statistics separated by the | symbol;
DATA SURVIVAL:
NAMES = names of variables used to create a set of
binary event-history variables;
CUTPOINT = value used to create a set of binary event-
history variables from a set of original
variables;
BINARY = names of new binary variables;
DATA COHORT:
COHORT IS name of cohort variable (values);
COPATTERN IS name of cohort/pattern variable (patterns);
COHRECODE = (old value = new value);
TIMEMEASURES = list of sets of variables separated by the |
symbol;
TNAMES = list of root names for the sets of variables in
TIMEMEASURES separated by the |
symbol;
VARIABLE:
DEFINE:
_MISSING
variable = MEAN (list of variables);
variable = SUM (list of variables);
CUT variable or list of variables (cutpoints);
variable = CLUSTER_MEAN (variable);
CENTER variable or list of variables (GRANDMEAN);
CENTER variable or list of variables (GROUPMEAN);
CENTER variable or list of variables (GROUPMEAN
label);
STANDARDIZE variable or list of variables;
DO (#, #) expression;
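The following sketch shows two DEFINE statements. The names are hypothetical. The sketch assumes that a variable created in DEFINE must be listed in USEVARIABLES, after the original variables.

```mplus
VARIABLE:
  NAMES ARE y1-y5 x1 x2;
  USEVARIABLES ARE x1 x2 score;
DEFINE:
  score = MEAN (y1-y5);       ! mean of y1 through y5 for each observation
  CENTER x1 x2 (GRANDMEAN);   ! subtract the overall sample mean
```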
ANALYSIS:
CROSSCLASSIFIED;
RANDOM;
EFA # #;
BASIC;
MIXTURE;
COMPLEX;
TWOLEVEL;
EFA # # UW* # # UB*;
EFA # # UW # # UB;
ESTIMATOR = ML; depends on
MLM; analysis type
MLMV;
MLR;
MLF;
MUML;
WLS;
WLSM;
WLSMV;
ULS;
ULSMV;
GLS;
BAYES;
PARAMETERIZATION = DELTA; depends on
THETA; analysis type
LOGIT;
LOGLINEAR;
PROBABILITY;
LINK = LOGIT; LOGIT
PROBIT;
ROTATION = GEOMIN; GEOMIN (OBLIQUE value)
GEOMIN (OBLIQUE value);
GEOMIN (ORTHOGONAL value);
QUARTIMIN; OBLIQUE
CF-VARIMAX; OBLIQUE
CF-VARIMAX (OBLIQUE);
CF-VARIMAX (ORTHOGONAL);
CF-QUARTIMAX; OBLIQUE
CF-QUARTIMAX (OBLIQUE);
CF-QUARTIMAX (ORTHOGONAL);
CF-EQUAMAX; OBLIQUE
CF-EQUAMAX (OBLIQUE);
CF-EQUAMAX (ORTHOGONAL);
CF-PARSIMAX; OBLIQUE
CF-PARSIMAX (OBLIQUE);
CF-PARSIMAX (ORTHOGONAL);
CF-FACPARSIM; OBLIQUE
CF-FACPARSIM (OBLIQUE);
CF-FACPARSIM (ORTHOGONAL);
CRAWFER; OBLIQUE 1/p
CRAWFER (OBLIQUE value);
CRAWFER (ORTHOGONAL value);
OBLIMIN; OBLIQUE 0
OBLIMIN (OBLIQUE value);
OBLIMIN (ORTHOGONAL value);
VARIMAX;
PROMAX;
TARGET;
BI-GEOMIN;
BI-GEOMIN (OBLIQUE);
BI-GEOMIN (ORTHOGONAL);
BI-CF-QUARTIMAX;
BI-CF-QUARTIMAX (OBLIQUE);
BI-CF-QUARTIMAX (ORTHOGONAL);
ROWSTANDARDIZATION = CORRELATION; CORRELATION
KAISER;
COVARIANCE;
PARALLEL = number; 0
MODEL = NOMEANSTRUCTURE; means
NOCOVARIANCES; covariances
ALLFREE; equal
REPSE = BOOTSTRAP;
JACKKNIFE;
JACKKNIFE1;
JACKKNIFE2;
BRR;
FAY (#); .3
BASEHAZARD = ON; OFF
OFF;
OFF (EQUAL);
OFF (UNEQUAL);
CHOLESKY = ON; depends on
OFF; analysis type
ALGORITHM = EM; depends on
EMA; analysis type
FS;
ODLL;
INTEGRATION;
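The following sketch requests an exploratory factor analysis using the ANALYSIS options above. The range of factors is hypothetical.

```mplus
ANALYSIS:
  TYPE = EFA 1 3;                      ! solutions with one, two, and three factors
  ESTIMATOR = ML;
  ROTATION = CF-VARIMAX (ORTHOGONAL);  ! instead of the default GEOMIN (oblique)
```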
MODEL:
BY short for measured by -- defines latent variables
example: f1 BY y1-y5;
ON short for regressed on -- defines regression relationships
example: f1 ON x1-x9;
PON short for regressed on -- defines paired regression relationships
example: f2 f3 PON f1 f2;
WITH short for correlated with -- defines correlational relationships
example: f1 WITH f2;
PWITH short for correlated with -- defines paired correlational
relationships
example: f1 f2 f3 PWITH f4 f5 f6;
list of variables; refers to variances and residual variances
example: f1 y1-y9;
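The following sketch uses the BY, ON, WITH, and PWITH keywords and the variance statement above in one MODEL command. All variable and factor names are hypothetical.

```mplus
MODEL:
  f1 BY y1-y5;          ! f1 is measured by y1-y5
  f2 BY y6-y10;         ! f2 is measured by y6-y10
  f2 ON f1 x1;          ! f2 is regressed on f1 and x1
  y1 WITH y6;           ! residual covariance of y1 and y6
  y2 y3 PWITH y7 y8;    ! pairs: y2 with y7, y3 with y8
  f1 y1-y5;             ! variance of f1, residual variances of y1-y5
```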
MODEL:
%WITHIN% describes the individual-level model
%BETWEEN% describes the cluster-level model for a two-level model
%BETWEEN label% describes the cluster-level model for a three-level or cross-classified model
MODEL POPULATION: describes the data generation model for a Monte Carlo study
MODEL POPULATION-label: describes the group-specific data generation model in multiple
group analysis and the data generation model for each
categorical latent variable and combinations of categorical
latent variables in mixture modeling for a Monte Carlo study
MODEL POPULATION:
%OVERALL% describes the overall data generation model of a mixture model
%class label% describes the class-specific data generation model of a mixture model
MODEL POPULATION:
%WITHIN% describes the individual-level data generation model for a multilevel model
%BETWEEN% describes the cluster-level data generation model for a two-level model
%BETWEEN label% describes the cluster-level data generation model for a three-level or cross-classified model
MODEL COVERAGE: describes the population parameter values for a Monte Carlo
study
MODEL COVERAGE-label: describes the group-specific population parameter values in
multiple group analysis and the population parameter values for
each categorical latent variable and combinations of categorical
latent variables in mixture modeling for a Monte Carlo study
MODEL COVERAGE:
%OVERALL% describes the overall population parameter values of a mixture
model for a Monte Carlo study
%class label% describes the class-specific population parameter values of a
mixture model
MODEL COVERAGE:
%WITHIN% describes the individual-level population parameter values for coverage
%BETWEEN% describes the cluster-level population parameter values for a two-level model for coverage
%BETWEEN label% describes the cluster-level population parameter values for a three-level or cross-classified model for coverage
MODEL MISSING: describes the missing data generation model for a Monte Carlo
study
MODEL MISSING-label: describes the group-specific missing data generation model for
a Monte Carlo study
MODEL MISSING:
%OVERALL% describes the overall data generation model of a mixture model
%class label% describes the class-specific data generation model of a mixture model
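The following sketch shows how MODEL POPULATION, MODEL COVERAGE, and MODEL relate in a single-group Monte Carlo study. The one-factor model and its values are hypothetical. Loadings of .8, a factor variance of 1, and residual variances of .36 give each indicator a variance of .64 + .36 = 1. Here MODEL COVERAGE simply repeats the generating values.

```mplus
MONTECARLO:
  NAMES ARE y1-y4;
  NOBSERVATIONS = 500;
  NREPS = 500;
  SEED = 45335;
MODEL POPULATION:    ! model used to generate the data
  f BY y1-y4*.8;
  f@1;
  y1-y4*.36;
MODEL COVERAGE:      ! values used to compute coverage
  f BY y1-y4*.8;
  f@1;
  y1-y4*.36;
MODEL:               ! model estimated in each replication
  f BY y1-y4*.8;     ! * gives starting values; all four loadings free
  f@1;               ! factor variance fixed for identification
  y1-y4*.36;
```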
OUTPUT:
SAMPSTAT;
CROSSTABS; ALL
CROSSTABS (ALL);
CROSSTABS (COUNT);
CROSSTABS (%ROW);
CROSSTABS (%COLUMN);
CROSSTABS (%TOTAL);
STANDARDIZED; ALL
STANDARDIZED (ALL);
STANDARDIZED (STDYX);
STANDARDIZED (STDY);
STANDARDIZED (STD);
RESIDUAL;
MODINDICES (minimum chi-square); 10
MODINDICES (ALL);
MODINDICES (ALL minimum chi-square); 10
CINTERVAL; SYMMETRIC
CINTERVAL (SYMMETRIC);
CINTERVAL (BOOTSTRAP);
CINTERVAL (BCBOOTSTRAP);
CINTERVAL (EQTAIL); EQTAIL
CINTERVAL (HPD);
SVALUES;
NOCHISQUARE;
NOSERROR;
H1SE;
H1TECH3;
PATTERNS;
FSCOEFFICIENT;
FSDETERMINACY;
BASEHAZARD;
LOGRANK;
D(#);
TECH1;
TECH2;
TECH3;
TECH4;
TECH5;
TECH6;
TECH7;
TECH8;
TECH9;
TECH10;
TECH11;
TECH12;
TECH13;
TECH14;
TECH15;
TECH16;
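The following sketch combines several of the OUTPUT options above. The MODINDICES minimum of 3.84, the .05 critical value of a chi-square with one degree of freedom, is a hypothetical choice.

```mplus
OUTPUT:
  SAMPSTAT;               ! sample statistics
  STANDARDIZED (STDYX);   ! only the STDYX standardization
  MODINDICES (3.84);      ! indices of at least 3.84
  CINTERVAL;              ! confidence intervals (SYMMETRIC unless Bayesian)
  TECH1;                  ! parameter specification and starting values
```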
SAVEDATA:
PLOT:
TYPE IS PLOT1;
PLOT2;
PLOT3;
SERIES IS list of variables in a series plus x-axis
values;
FACTORS ARE names of factors (#);
LRESPONSES ARE names of latent response variables (#);
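The following sketch shows a PLOT command for a series of four repeated measures. The variable names are hypothetical. The SERIES setting assumes a form in which each variable is followed by its x-axis value in parentheses.

```mplus
PLOT:
  TYPE IS PLOT3;
  SERIES IS y1 (0) y2 (1) y3 (2) y4 (3);  ! x-axis values 0 through 3
```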
MONTECARLO:
REFERENCES
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
Agresti, A. (2002). Categorical data analysis. Second Edition. New York: John Wiley & Sons.
Asparouhov, T. (2007). Wald test of mean equality for potential latent class predictors in mixture
modeling. Technical appendix. Los Angeles: Muthén & Muthén.
Asparouhov, T. & Muthén, B. (2005). Multivariate statistical modeling with survey data.
Proceedings of the FCMS 2005 Research Conference.
Asparouhov, T. & Muthén, B. (2008a). Multilevel mixture models. In G.R. Hancock, & K.M.
Samuelsen (eds.), Advances in latent variable mixture models. Charlotte, NC: Information Age
Publishing, Inc.
Asparouhov, T. & Muthén, B. (2008b). Auxiliary variables predicting missing data. Technical
appendix. Los Angeles: Muthén & Muthén.
Asparouhov, T. & Muthén, B. (2008c). Chi-square statistics with multiple imputation. Technical
appendix. Los Angeles: Muthén & Muthén.
Asparouhov, T. & Muthén, B. (2009a). Exploratory structural equation modeling. Structural
Equation Modeling, 16, 397-438.
Asparouhov, T. & Muthén, B. (2009b). Resampling methods in Mplus for complex survey data.
Technical appendix. Los Angeles: Muthén & Muthén.
Asparouhov, T. & Muthén, B. (2010a). Weighted least squares estimation with missing data.
Technical appendix. Los Angeles: Muthén & Muthén.
Asparouhov, T. & Muthén, B. (2012a). General random effect latent variable modeling:
Random subjects, items, contexts, and parameters. Technical Report. Los Angeles: Muthén &
Muthén.
Asparouhov, T. & Muthén, B. (2012c). Using Mplus TECH11 and TECH14 to test the number
of latent classes. Mplus Web Notes: No. 14. www.statmodel.com.
Asparouhov, T., Masyn, K. & Muthén, B. (2006). Continuous time survival in latent variable
models. Proceedings of the Joint Statistical Meeting in Seattle, August 2006. ASA section on
Biometrics, 180-187.
Baker, F.B. & Kim, S. (2004). Item response theory. Parameter estimation techniques. Second
edition. New York: Marcel Dekker, Inc.
Bauer, D.J. & Curran, P.J. (2005). Probing interactions in fixed and multilevel regression:
Inferential and graphical techniques. Multivariate Behavioral Research, 40, 373-400.
Bauer, D.J., Preacher, K.J., & Gil, K.M. (2006). Conceptualizing and testing random indirect
effects and moderated mediation in multilevel models: New procedures and recommendations.
Psychological Methods, 11, 142-163.
Bernaards, C.A. & Jennrich, R.I. (2005). Gradient projection algorithms and software for
arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement, 65,
676-696.
Bijmolt, T.H.A., Paas, L.J., & Vermunt, J.K. (2004). Country and consumer segmentation:
Multi-level latent class analysis of financial product ownership. International Journal of Research
in Marketing, 21, 323-340.
Bollen, K.A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.
Bollen, K.A. & Stein, R.A. (1992). Bootstrapping goodness-of-fit measures in structural equation
models. Sociological Methods & Research, 21, 205-229.
Boscardin, J., Zhang, X., & Belin, T. (2008). Modeling a mixture of ordinal and continuous
repeated measures. Journal of Statistical Computation and Simulation, 78, 873-886.
Browne, M.W. & Arminger, G. (1995). Specification and estimation of mean- and covariance-
structure models. In G. Arminger, C.C. Clogg & M.E. Sobel (eds.), Handbook of statistical
modeling for the social and behavioral sciences (pp. 311-359). New York: Plenum Press.
Browne, W.J. & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for
fitting multilevel models. Bayesian Analysis, 3, 473-514.
Browne, M.W., Cudeck, R., Tateneni, K., & Mels, G. (2004). CEFA: Comprehensive
Exploratory Factor Analysis, Version 2.00 [Computer software and manual]. Retrieved from
http://quantrm2.psy.ohio-state.edu/browne/.
Chib, S. & Greenberg, E. (1998). Bayesian analysis of multivariate probit models. Biometrika,
85, 347-361.
Collins, L.M. & Lanza, S.T. (2010). Latent class and latent transition analysis. Hoboken, N.J.:
John Wiley & Sons.
Collins, L.M., Schafer, J.L., & Kam, C-H (2001). A comparison of inclusive and restrictive
strategies in modern missing data procedures. Psychological Methods, 6, 330-351.
Cook, R.D. (1977). Detection of influential observations in linear regression. Technometrics, 19,
15-18.
Cook, R.D. & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman
& Hall.
Cudeck, R. & O'Dell, L.L. (1994). Applications of standard error estimates in unrestricted factor
analysis: Significance tests for factor loadings and correlations. Psychological Bulletin, 115,
475-487.
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society, ser. B, 39, 1-38.
Diggle, P.D. & Kenward, M.G. (1994). Informative drop-out in longitudinal data analysis (with
discussion). Applied Statistics, 43, 49-73.
du Toit, M. (ed.) (2003). IRT from SSI: BILOG-MG MULTILOG PARSCALE TESTFACT.
Lincolnwood, IL: Scientific Software International, Inc.
Efron, B. & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman &
Hall.
Enders, C.K. (2002). Applying the Bollen-Stine bootstrap for goodness-of-fit measures to
structural equation models with missing data. Multivariate Behavioral Research, 37, 359-377.
Enders, C.K. (2010). Applied missing data analysis. New York: Guilford Press.
Everitt, B.S. & Hand, D.J. (1981). Finite mixture distributions. London: Chapman & Hall.
Fabrigar, L.R., Wegener, D.T., MacCallum, R.C., & Strahan, E.J. (1999). Evaluating the use of
exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Fay, R.E. (1989). Theoretical application of weighting for variance calculation. Proceedings of
the Section on Survey Research Methods of the American Statistical Association, 212-217.
Fox, J. P. (2010). Bayesian item response modeling. Theory and applications. New York:
Springer.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian
Analysis, 3, 515-533.
Gelman, A. & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences
(with discussion). Statistical Science, 7, 457-511.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (2004). Bayesian data analysis. Second
edition. New York: Chapman & Hall.
Graham, J.W. (2003). Adding missing-data relevant variables to FIML-based structural equation
models. Structural Equation Modeling: A Multidisciplinary Journal, 10, 80-100.
Hagenaars, J.A. & McCutcheon, A.L. (2002). Applied latent class analysis. Cambridge, UK:
Cambridge University Press.
Hayton, J.C., Allen, D.G., & Scarpello, V. (2004). Factor retention decisions in exploratory
factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191-205.
Hedeker, D. & Gibbons, R.D. (1994). A random-effects ordinal regression model for multilevel
analysis. Biometrics, 50, 933-944.
Hedeker, D. & Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for
missing data in longitudinal studies. Psychological Methods, 2, 64-78.
Hilbe, J.M. (2011). Negative binomial regression. Second edition. New York: Cambridge
University Press.
Hildreth, C. & Houck, J.P. (1968). Some estimates for a linear model with random coefficients.
Journal of the American Statistical Association, 63, 584-595.
Hosmer, D.W. & Lemeshow, S. (2000). Applied logistic regression. Second edition. New York:
John Wiley & Sons.
Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis.
Psychological Methods, 15, 309-334.
Jedidi, K., Jagpal, H.S., & DeSarbo, W.S. (1997). Finite-mixture structural equation models for
response-based segmentation and unobserved heterogeneity. Marketing Science, 16, 39-59.
Jeffries, N.O. (2003). A note on ‘testing the number of components in a normal mixture’.
Biometrika, 90, 991-994.
Jennrich, R.I. (1973). Standard errors for obliquely rotated factor loadings. Psychometrika, 38,
593-604.
Jennrich, R.I. (1974). Simplified formulae for standard errors in maximum-likelihood factor
analysis. The British Journal of Mathematical and Statistical Psychology, 27, 122-131.
Jennrich, R.I. (2007). Rotation methods, algorithms, and standard errors. In R. Cudeck & R.C.
MacCallum (eds.). Factor analysis at 100. Historical developments and future directions (pp.
315-335). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.
Jennrich, R.I. & Bentler, P.M. (2011). Exploratory bi-factor analysis. Psychometrika, 76, 537-
549.
Jennrich, R.I. & Bentler, P.M. (2012). Exploratory bi-factor analysis: The oblique case.
Psychometrika, 77, 442-454.
Jennrich, R.I. & Sampson, P.F. (1966). Rotation for simple loadings. Psychometrika, 31, 313-
323.
Joreskog, K.G. & Sorbom, D. (1979). Advances in factor analysis and structural equation
models. Cambridge, MA: Abt Books.
Kaplan, D. (2008). An overview of Markov chain methods for the study of stage-sequential
developmental processes. Developmental Psychology, 44, 457-467.
Kenward, M.G. & Molenberghs, G. (1998). Likelihood based frequentist inference when data are
missing at random. Statistical Science, 13, 236-247.
Klein, J.P. & Moeschberger, M.L. (1997). Survival analysis: Techniques for censored and
truncated data. New York: Springer.
Korn, E.L. & Graubard, B.I. (1999). Analysis of health surveys. New York: John Wiley & Sons.
Kreuter, F. & Muthén, B. (2008). Analyzing criminal trajectory profiles: Bridging multilevel and
group-based approaches using growth mixture modeling. Journal of Quantitative Criminology,
24, 1-31.
Langeheine, R. & van de Pol, F. (2002). Latent Markov chains. In J.A. Hagenaars & A.L.
McCutcheon (eds.), Applied latent class analysis (pp. 304-341). Cambridge, UK: Cambridge
University Press.
Larsen, K. (2004). Joint analysis of time-to-event and multiple binary indicators of latent classes.
Biometrics, 60, 85-92.
Larsen, K. (2005). The Cox proportional hazards model with a continuous latent variable
measured by multiple binary indicators. Biometrics, 61, 1049-1055.
Lee, S.Y. (2007). Structural equation modeling. A Bayesian approach. New York: John Wiley
& Sons.
Little, R.J. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of
the American Statistical Association, 90, 1112-1121.
Little, R.J. & Rubin, D.B. (2002). Statistical analysis with missing data. Second edition. New
York: John Wiley & Sons.
Little, R.J. & Yau, L.H.Y. (1998). Statistical techniques for analyzing data from prevention
trials: Treatment of no-shows using Rubin’s causal model. Psychological Methods, 3, 147-159.
Lo, Y., Mendell, N.R. & Rubin, D.B. (2001). Testing the number of components in a normal
mixture. Biometrika, 88, 767-778.
Lohr, S.L. (1999). Sampling: Design and analysis. Pacific Grove, CA: Brooks/Cole Publishing
Company.
Long, J.S. (1997). Regression models for categorical and limited dependent variables. Thousand
Oaks, CA: Sage Publications, Inc.
Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008).
The multilevel latent covariate model: A new, more reliable approach to group-level effects in
contextual studies. Psychological Methods, 13, 203-229.
MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. New York: Lawrence
Erlbaum Associates.
MacKinnon, D.P., Lockwood, C.M., & Williams, J. (2004). Confidence limits for the indirect
effect: Distribution of the product and resampling methods. Multivariate Behavioral Research,
39, 99-128.
MacKinnon, D. P., Lockwood, C. M., Brown, C. H., Wang, W., & Hoffman, J. M. (2007). The
intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its
consideration. Cancer Chemotherapy Reports, 50, 163-170.
Marlow, A.J., Fisher, S.E., Francks, C., MacPhie, I.L., Cherny, S.S., Richardson, A.J., Talcott,
J.B., Stein, J.F., Monaco, A.P., & Cardon, L.R. (2003). Use of multivariate linkage analysis for
dissection of a complex cognitive trait. American Journal of Human Genetics, 72, 561-570.
McCutcheon, A.L. (2002). Basic concepts and procedures in single- and multiple-group latent
class analysis. In J.A. Hagenaars & A.L. McCutcheon (eds.), Applied latent class analysis (pp.
56-85). Cambridge, UK: Cambridge University Press.
McDonald, R.P. (1967). Nonlinear factor analysis. Psychometric Monograph Number 15.
University of Chicago. Richmond, VA: The William Byrd Press.
McLachlan, G. & Peel, D. (2000). Finite mixture models. New York: John Wiley & Sons.
McLachlan, G.J., Do, K.A., & Ambroise, C. (2004). Analyzing microarray gene expression data.
New York: John Wiley & Sons.
Mislevy, R.J., Johnson, E.G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of
Educational Statistics, 17, 131-154.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical,
and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthén, B. (1990). Mean and covariance structure analysis of hierarchical data. Paper presented
at the Psychometric Society meeting in Princeton, NJ, June 1990. UCLA Statistics Series 62.
Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.),
Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398.
Muthén, B. (1997). Latent variable modeling with longitudinal and multilevel data. In A.
Raftery (ed.), Sociological Methodology (pp. 453-480). Boston: Blackwell Publishers.
Muthén, B. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81-
117.
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for
longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social
sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
Muthén, B. (2008). Latent variable hybrids: Overview of old and new models. In Hancock, G.
R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 1-24. Charlotte,
NC: Information Age Publishing, Inc.
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation
analysis using SEM in Mplus. Technical Report. Los Angeles: Muthén & Muthén.
Muthén, B. & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes:
Multiple-group and growth modeling in Mplus. Mplus Web Notes: No. 4. www.statmodel.com.
Muthén, B. & Asparouhov, T. (2006). Item response mixture modeling: Application to tobacco
dependence criteria. Addictive Behaviors, 31, 1050-1066.
Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian
random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.),
Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
Muthén, B. & Masyn, K. (2005). Discrete-time survival mixture analysis. Journal of Educational
and Behavioral Statistics, 30, 27-58.
Muthén, L.K. & Muthén, B. (2002). How to use a Monte Carlo study to decide on sample size
and determine power. Structural Equation Modeling, 9, 599-620.
Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. In P.
Marsden (ed.), Sociological Methodology 1995, 267-316.
Muthén, B. & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM
algorithm. Biometrics, 55, 463-469.
Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares
and quadratic estimating equations in latent variable modeling with categorical and continuous
outcomes. Unpublished manuscript.
Muthén, B., Jo, B., & Brown, H. (2003). Comment on the Barnard, Frangakis, Hill, & Rubin
article, Principal stratification approach to broken randomized experiments: A case study of
school choice vouchers in New York City. Journal of the American Statistical Association, 98,
311-314.
Muthén, B., Asparouhov, T. & Rebollo, I. (2006). Advances in behavioral genetics modeling
using Mplus: Applications of factor mixture modeling to twin data. Twin Research and Human
Genetics, 9, 313-324.
Muthén, B., Asparouhov, T., Boye, M.E., Hackshaw, M.D., & Naegeli, A.N. (2009).
Applications of continuous-time survival in latent variable models for the analysis of oncology
randomized clinical trial data using Mplus. Technical Report. www.statmodel.com.
Muthén, B., Asparouhov, T., Hunter, A., & Leuchter, A. (2011). Growth modeling with non-
ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological
Methods, 16, 17-33.
Muthén, B., Brown, C.H., Masyn, K., Jo, B., Khoo, S.T., Yang, C.C., Wang, C.P., Kellam, S.,
Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive
interventions. Biostatistics, 3, 459-475.
Neale, M.C. & Cardon, L.R. (1992). Methodology for genetic studies of twins and families. The
Netherlands: Kluwer Academic Publishers.
Nylund, K. (2007). Latent transition analysis: Modeling extensions and an application to peer
victimization. Doctoral dissertation, University of California, Los Angeles.
www.statmodel.com.
Nylund, K.L., Asparouhov, T., & Muthén, B.O. (2007). Deciding on the number of classes in
latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural
Equation Modeling, 14, 535-569.
Olsen, M.K. & Schafer, J.L. (2001). A two-part random-effects model for semicontinuous
longitudinal data. Journal of the American Statistical Association, 96, 730-745.
Posthuma, D., de Geus, E.J.C., Boomsma, D.I., & Neale, M.C. (2004). Combined linkage and
association tests in Mx. Behavior Genetics, 34, 179-196.
Potthoff, R.F., Woodbury, M.A., & Manton, K.G. (1992). “Equivalent sample size” and
“equivalent degrees of freedom” refinements for inference using survey weights under
superpopulation models. Journal of the American Statistical Association, 87, 383-396.
Preacher, K.J., Rucker, D.D., & Hayes, A.F. (2007). Addressing moderated mediation
hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42, 185-227.
Prescott, C.A. (2004). Using the Mplus computer program to estimate models for continuous and
categorical data from twins. Behavior Genetics, 34, 17-40.
Qu, Y., Tan, M., & Kutner, M.H. (1996). Random effects models in latent class analysis for
evaluating accuracy of diagnostic tests. Biometrics, 52, 797-810.
Raghunathan, T.E., Lepkowski, J.M., Van Hoewyk, J., & Solenberger, P. (2001). A multivariate
technique for multiply imputing missing values using a sequence of regression models. Survey
Methodology, 27, 85-95.
Raudenbush, S.W. & Bryk, A.S. (2002). Hierarchical linear models: Applications and data
analysis methods. Second edition. Newbury Park, CA: Sage Publications.
Reboussin, B.A., Reboussin, D.M., Liang, K.Y., & Anthony, J.C. (1998). Latent transition
modeling of progression of health-risk behavior. Multivariate Behavioral Research, 33, 457-478.
Roeder, K., Lynch, K.G., & Nagin, D.S. (1999). Modeling uncertainty in latent class
membership: A case study in criminology. Journal of the American Statistical Association, 94,
766-776.
Rousseeuw, P.J. & Van Zomeren, B.C. (1990). Unmasking multivariate outliers and leverage
points. Journal of the American Statistical Association, 85, 633-651.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley &
Sons.
Schafer, J.L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Singer, J.D. & Willett, J.B. (2003). Applied longitudinal data analysis: Modeling change and
event occurrence. New York: Oxford University Press.
van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional
specification. Statistical Methods in Medical Research, 16, 219-242.
Verhagen, J. & Fox, J.P. (2012). Bayesian tests of measurement invariance. British Journal of
Mathematical and Statistical Psychology. Accepted for publication.
Vermunt, J.K. (2003). Multilevel latent class models. In R.M. Stolzenberg (ed.), Sociological
Methodology 2003 (pp. 213-239). Washington, D.C.: ASA.
Vermunt, J.K. (2010). Latent class modeling with covariates: Two improved three-step
approaches. Political Analysis, 18, 450-469.
von Davier, M., Gonzalez, E., & Mislevy, R.J. (2009). What are plausible values and why are
they useful? IERI Monograph Series, 2, 9-36.
Yuan, K.H. & Bentler, P.M. (2000). Three likelihood-based methods for mean and covariance
structure analysis with nonnormal missing data. In M.E. Sobel & M.P. Becker (eds.),
Sociological Methodology 2000 (pp. 165-200). Washington, D.C.: ASA.
INDEX
%BETWEEN%, 704–5
%class label%, 704
%COLUMN, 721
%OVERALL%, 704
%ROW, 721
%TOTAL, 721
%WITHIN%, 704–5
(#), 664–65
*, 662–63
@, 663–64
[ ], 659–60
_missing, 577
{ }, 675–76
| symbol
  growth models, 676–83
  individually-varying times of observation, 686–87
  latent variable interactions, 687–88
  random slopes, 683–85
accelerated cohort, 143–47
ACE, 83–84, 85–86
ADAPTIVE, 617
ADDFREQUENCY, 626–27
ALGORITHM
  Bayesian, 633–34
  frequentist, 615
ALL
  CROSSTABS, 721
  MISSING, 540–41
  MODINDICES, 725–27
  REPSAVE, 803
  USEVARIABLES, 538–39
ALLFREE, 108–10, 611–13
alpha, 742
alpha (c), 743
alpha (f), 744
ANALYSIS command, 587–639
arithmetic operators, 576
AT, 686–87
auto-correlated residuals, 142–43
automatic labeling, 611–13
AUXILIARY, 551–55, 793–96
B2WEIGHT, 560
B3WEIGHT, 560–61
balanced repeated replication, 613–14
BASEHAZARD
  ANALYSIS, 614
  OUTPUT, 732
  SAVEDATA, 753
baseline hazard function, 149–50, 247–49
BASIC, 594
Bayes, 604–5, 631–36
BAYES, 604–5
Bayes factor, 741
Bayesian estimation, 604, 631–36
Bayesian plots, 763–64
Bayesian posterior parameter values, 755, 804
Bayesian structural equation modeling (BSEM)
  bi-factor CFA with two items loading on only the general factor and cross-loadings with zero-mean and small-variance priors, 105–6
  MIMIC model with cross-loadings and direct effects with zero-mean and small-variance priors, 107–8
  multiple group model with approximate measurement invariance using zero-mean and small-variance priors, 108–10
BCONVERGENCE, 634
beta, 742
BETWEEN
  Monte Carlo, 799–801
  real data, 570–72
between-level categorical latent variable, 348–51, 352–54, 358–60, 364–66, 367–70, 375–77, 383–85
BI-CF-QUARTIMAX, 606–10
bi-factor EFA, 52–53
bi-factor rotations, 606–10
BI-GEOMIN, 606–10
BINARY
  DATA MISSING, 527
  DATA SURVIVAL, 531
  DATA TWOPART, 525
birth cohort, 143–47
BITERATIONS, 634–35
bivariate frequency tables, 721
BLANK, 540
BOOTSTRAP
  ANALYSIS, 620–21
  REPSE, 613–14
bootstrap standard errors, 37–38, 620–21
Box data, 607–8
BPARAMETERS
  Monte Carlo, 804
  real data, 755
BRR, 613–14
BSEED, 632
BSEM, 105–6, 107–8, 108–10
burnin, 604–5
BWEIGHT, 560
BWTSCALE, 561
BY, 647–52
  confirmatory factor analysis (CFA), 648–49
  exploratory structural equation modeling (ESEM), 649–52
CATEGORICAL
  Monte Carlo, 789
  real data, 543–45
categorical latent variables, 4–5, 653–54
categorical mediating variables, 491
CENSORED
  Monte Carlo, 788–89
  real data, 542–43
CENTER, 580–84
CF-EQUAMAX, 606–10
CF-FACPARSIM, 606–10
CF-PARSIMAX, 606–10
CF-QUARTIMAX, 606–10
CF-VARIMAX, 606–10
CHAINS, 632
CHECK, 514–15
chi-square difference test for WLSMV and MLMV, 451–52, 625–26, 752
CHOLESKY, 614–15
CINTERVAL, 727–29
CINTERVAL (BCBOOTSTRAP), 727
CINTERVAL (BOOTSTRAP), 727
CINTERVAL (EQTAIL), 727
CINTERVAL (HPD), 727
CINTERVAL (SYMMETRIC), 727
class probabilities, 757–58
CLASSES
  Monte Carlo, 793
  real data, 564–65
CLUSTER, 557–58
cluster size, 784–86
CLUSTER_MEAN, 579–80
cluster-level factor indicators, 280–84
cohort, 143–47
COHORT, 531–32
COHRECODE, 532–33
COMBINATION, 617
COMPLEX, 594
complex survey data, 556–63
complier-average causal effect estimation (CACE), 191–93, 193–95
conditional independence (relaxed), 180–81, 198–99
conditional probabilities, 741
confidence intervals, 37–38, 727–29
confirmatory analysis, 60–61, 176–78
confirmatory factor analysis (CFA)
  categorical factor indicators, 62
  censored and count factor indicators, 64–65
  continuous and categorical factor indicators, 63
  continuous factor indicators, 60–61
  mixture, 182
  two-level mixture, 355–57
CONSTRAINT, 555
constraints, 31–32, 87–88, 89–90, 90–91, 91–92, 175–76, 200–202, 203–5, 691–96
contextual effect, 263–67
CONTINUOUS, 525
continuous latent variables, 3–4 CROSSCLASSIFIED, 598
continuous-time survival analysis, 148–49, cross-classified modeling
149–50, 151–52, 205–7, 247–49, 308–9, IRT with random binary items, 333–34
437–38 multiple indicator growth model with random
convenience features, 660–62 intercepts and factor loadings, 334–37
CONVERGENCE, 628 path analysis with continuous dependent
variables, 330–32
convergence problems, 467–69
regression for a continuous dependent
COOKS variable, 326–30
PLOT, 768
cross-loadings, 107–8
SAVEDATA, 759
CROSSTABS, 721
Cook's distance, 759
CSIZES, 784–86
COPATTERN, 532
CUT, 579
CORRELATION
ANALYSIS, 631 CUTPOINT
DATA, 510 DATA SURVIVAL, 530–31
ROWSTANDARDIZATION, 610–11 DATA TWOPART, 525
SAVEDATA, 755 CUTPOINTS, 781
COUNT D, 732
CROSSTABS, 721 DATA COHORT, 531–34
Monte Carlo, 790–92 DATA command, 504–34
real data, 546–49 data generation, 779–81
count variable models data imputation, 515–19
negative binomial, 547–48, 791–92 DATA IMPUTATION, 515–19
negative binomial hurdle, 548–49, 792 DATA LONGTOWIDE, 521–23
Poisson, 546–47, 791 DATA MISSING, 526–29
zero-inflated negative binomial, 548, 792 data reading
zero-inflated Poisson, 547, 791 covariance matrix, 444
zero-truncated negative binomial, 548, 792 fixed format, 445
COVARIANCE means and covariance matrix, 444–45
ANALYSIS, 631 DATA SURVIVAL, 529–31
DATA, 510
data transformation, 519–34, 574–85
DATA IMPUTATION, 518
MODEL PRIORS, 700 DATA TWOPART, 523–26
ROWSTANDARDIZATION, 610–11 DATA WIDETOLONG, 519–21
SAVEDATA, 749, 755 DDROPOUT, 526–29
COVERAGE DE3STEP
ANALYSIS, 626 Monte Carlo, 793–96
MONTECARLO, 802 real data, 551–55
Cox regression, 148–49, 247–49, 308–9, decomposition, 258–63
437–38 defaults, 460–64
CPROBABILITIES, 757–58 DEFINE command, 574–85
CRAWFER, 606–10 delta, 743
Crawfer family of rotations, 609 DELTA, 605–6
credibility interval, 727–29 derivatives of parameters, 734–35
DESCRIPTIVE, 528–29 estimated correlation matrix, 736
Deviance Information Criterion, 604–5 estimated covariance matrix
Diagrammer, 10–11 OUTPUT, 736
DIC, 604–5 SAVEDATA, 749
DIFFERENCE, 108–10, 700–702 estimated sigma between matrix, 749–50
difference testing, 486–87 estimated time scores, 126
DIFFTEST ESTIMATES, 751–52
ANALYSIS, 625–26 ESTIMATOR, 600–604
SAVEDATA, 752 event history indicators, 433–34
direct effects, 107–8 EXPECTED, 617–19
Dirichlet, 698–702 expected frequencies, 737–38
discrete-time survival analysis, 147–48, exploratory factor analysis (EFA)
433–34 bi-factor with continuous factor indicators,
discrete-time survival mixture analysis, 52–53
245–47 categorical factor indicators, 47
distal outcome, 223–24 continuous factor indicators, 45–46
continuous, censored, categorical, and count
DISTRIBUTION, 635–36
factor indicators, 48–49
DO factor mixture analysis, 49–50
DEFINE, 584–85 parallel analysis, 611
MODEL CONSTRAINT, 694–95 two-level with continuous factor indicators,
MODEL PRIORS, 700 50–51
do loop two-level with individual- and cluster-level
DEFINE, 584–85 factor indicators, 51–52
MODEL CONSTRAINT, 694–95 exploratory structural equation modeling
MODEL PRIORS, 700
(ESEM)
dropout, 392–93 bi-factor EFA, 103–4
DSURVIVAL, 549 bi-factor EFA with two items loading on only
DU3STEP the general factor, 104–5
Monte Carlo, 793–96 EFA at two timepoints, 97–98
real data, 551–55 EFA with covariates (MIMIC), 93–94
E EFA with residual variances constrained to be
Monte Carlo, 793–96 greater than zero, 102–3
real data, 551–55 multiple group EFA with continuous factor
ECLUSTER, 559–60 indicators, 99–101
EFA, 598–600 SEM with EFA and CFA factors, 95–96
eigenvalues, 611 external Monte Carlo simulation, 428–31
EM, 615 factor mixture analysis, 49–50, 198–99
EMA, 615 factor score coefficients, 731
EQUAL, 614 factor score determinacy, 731–32
equalities, 664–65 factor scores, 756–57, 759–60, 767
ESEM, 93–94, 95–96, 97–98, 99–101, 102– FACTORS
3, 103–4, 104–5 PLOT, 767
ESTBASELINE, 754 SAVEDATA, 759–60
FAY, 613–14
FBITERATIONS, 635 graded response model, 65–66
FILE GRANDMEAN, 580–84
DATA, 507 graphics module, 768–73
SAVEDATA, 746–48 GROUPING, 549–50
FINITE, 563–64 GROUPMEAN, 580–84
finite population correction factor, 563–64 growth mixture modeling
fixing parameter values, 663–64 between-level categorical latent variable, 375–
FORMAT 77
DATA, 507–9 categorical outcome, 219–20
SAVEDATA, 747–48 censored outcome, 218–19
FPC, 563–64 continuous outcome, 213–16
frailty, 1 distal outcome, 223–24
FREE known classes (multiple group analysis), 228–
DATA, 507–8 30
MFORMAT, 761–62 negative binomial model, 223
SAVEDATA, 747–48 sequential process, 225–28
freeing parameters, 662–63 two-level, 371–74
zero-inflated Poisson model, 220–23
frequency tables, 721
growth modeling, 676–83
frequency weights, 550–51
auto-correlated residuals, 142–43
FREQWEIGHT, 550–51 categorical outcome, 121–22
FS, 615 censored outcome, 117–18
FSCOEFFICIENT, 731 censored-inflated outcome, 119–21
FSCORES, 756–57 continuous outcome, 115–17
FSDETERMINACY, 731–32 count outcome, 123
FULLCORR, 510 estimated time scores, 126
FULLCOV, 510 individually-varying times of observation,
functions, 576 130–32
gamma, 742–43 multiple group multiple cohort, 143–47
multiple indicators, 135–37, 137–38
Gamma, 698–702
parallel processes, 133–34
gamma (c), 744 piecewise model, 129–30
gamma (f), 744 quadratic model, 126–27
GAUSSHERMITE, 616 time-invariant covariate, 128–29
Gelman-Rubin convergence, 634 time-varying covariate, 128–29
GENCLASSES, 781–82 two-part (semicontinuous), 138–41
GENERAL, 593–94 using the Theta parameterization, 122
GENERATE, 779–81 with covariates, 128–29
generating data, 779–81 zero-inflated count outcome, 124–25
generating missing data, 786–88 H1CONVERGENCE, 628
GEOMIN, 606–10 H1ITERATIONS, 627
GIBBS, 633–34 H1SE, 730
Gibbs sampler algorithm, 633–34 H1TECH3, 730
GLS, 603–4 HAZARDC, 786
GPA algorithm, 629 heritability, 89–90, 90–91, 200–202, 203–5
hidden Markov model, 233–35 ITERATIONS, 627
highest posterior density, 727–29 JACKKNIFE, 613–14
identification, 469–70 JACKKNIFE1, 613–14
identity by descent (IBD), 91–92 JACKKNIFE2, 613–14
IDVARIABLE K-1STARTS, 624
DATA LONGTOWIDE, 522–23 KAISER, 610–11
DATA WIDETOLONG, 520–21 KAPLANMEIER, 753
VARIABLE, 550 kappa (u), 744
imputation, 515–19 known class, 188–89, 228–30
IMPUTATION, 512–13 KNOWNCLASS, 108–10, 565–66
IMPUTE, 516–17 KOLMOGOROV, 636
IND, 690 Kolmogorov-Smirnov test, 636
indirect effect plot, 40–41 labeling
indirect effects, 37–38, 75, 689–91 baseline hazard parameters, 673
INDIVIDUAL, 510 categorical latent variables, 672–73
individually-varying times of observation, classes, 673
130–32, 686–87 inflation variables, 673
INFLUENCE nominal variables, 672–73
PLOT, 768 parameters, 674–75, 692
SAVEDATA, 758–59 thresholds, 672
influential observations, 768 Lagrange multiplier tests. See modification
INFORMATION, 617–19 indices
information curves, 763–64 lambda, 742
INTEGRATION, 616 lambda (f), 744
INTEGRATION setting for ALGORITHM, lambda (u), 743
615 LATENT, 633
interaction between latent variables, 76–77, latent class analysis (LCA)
687–88 binary latent class indicators, 163–65
interactions, 687–88 binary, censored, unordered, and count latent
class indicators, 172–73
INTERACTIVE, 636–37
confirmatory, 175–76, 176–78
intercepts, 659–60 continuous latent class indicators, 170–71
INTERRUPT, 637 three-category latent class indicators, 167–68
inverse Gamma, 698–702 two-level, 361–63
inverse Wishart, 698–702 two-level with a between-level categorical
IRT translation, 732 latent variable, 364–66
item characteristic curves, 763–64 unordered categorical latent class indicators,
item response theory (IRT) models 168–69
factor mixture, 198–99 with a covariate and a direct effect, 174–75
random binary items using cross-classified with a second-order factor (twin analysis),
data, 333–34 183–85
twin, 203–5 with partial conditional independence, 180–81
two-level mixture, 358–60 latent class growth analysis (LCGA)
two-parameter logistic, 65–66 binary outcome, 230–31
three-category outcome, 231–32 LOGRANK, 732
two-level, 378–80 logrank test, 732
zero-inflated count outcome, 232–33 Lo-Mendell-Rubin test, 738
latent response variables, 757, 760, 767–68 LONG
latent transition analysis (LTA) DATA LONGTOWIDE, 522
for two time points with a binary covariate DATA WIDETOLONG, 520
influencing the latent transition LOOP, 40–41, 695–96
probabilities, 235–38 loop plots, 40–41
for two time points with a continuous
LRESPONSES
covariate influencing the latent transition
PLOT, 767–68
probabilities, 238–41
SAVE, 757
mover-stayer for three time points using a
SAVEDATA, 760
probability parameterization, 241–45
two-level, 380–82 LRTBOOTSTRAP, 621
two-level with a between-level categorical LRTSTARTS, 624–25
latent variable, 383–85 LTA calculator, 11, 238–41
latent transition probabilities, 741 M, 551–55
latent variable covariate, 258–63, 263–67 MAHALANOBIS
latent variable interactions, 76–77, 687–88 PLOT, 768
liabilities, 1, 85–86, 90–91, 200–202 SAVEDATA, 758
likelihood ratio bootstrap draws, 739–41 Mantel-Cox test, 732
likelihood ratio test, 738, 739–41 marginal probabilities, 741
linear constraints, 175–76, 691–96 Markov chain Monte Carlo, 631–36
LINK, 606 MATRIX, 631
list function, 666–72 MCCONVERGENCE, 629
LISTWISE, 513–14 MCITERATIONS, 627
listwise deletion, 513–14 MCMC, 604–5
local maxima, 465–67 MCMC chain, 604–5
local solution, 465–67 MCONVERGENCE, 629
log odds, 493–97 MCSEED, 617
LOGCRITERION, 628 MEAN
DEFINE, 578
LOGHIGH, 630
POINT, 631
logical operators, 575–76
mean square error (MSE), 416
logistic regression, 493–97
mean structure, 70–72, 79–80
LOGIT
means, 659–60
LINK, 606
PARAMETERIZATION, 606 MEANS, 510
LOGLIKELIHOOD measurement invariance, 484–87
PLOT, 768 approximate, 108–10
SAVEDATA, 758 MEDIAN, 631
LOGLINEAR, 606 mediation
loglinear analysis, 179–80, 499–500 bootstrap, 37–38
categorical variable, 491
LOGLOW, 630
cluster-level latent variable, 270–71
lognormal, 698–702
continuous variable, 32–33 ML
missing data, 39–40 ESTIMATOR, 603
moderated, 40–41 STVALUES, 632
random slopes, 439–41 MLF, 603
MEDIATOR, 633 MLM, 603
MEMBERSHIP, 566–67 MLMV, 603
merging data sets, 456–57, 760–62 MLR, 603
Metropolis-Hastings algorithm, 633–34 MMISSING, 762
MFILE, 760–61 MNAMES, 761
MFORMAT, 761–62 MODE, 631
MH, 633–34 MODEL
MIMIC ANALYSIS, 611–13
continuous factor indicators, 69–70 DATA IMPUTATION, 518
multiple group analysis, 78–79, 79–80, 80–81 MODEL command, 643–711
MISSFLAG, 748 MODEL command variations, 702–5
MISSING MODEL CONSTRAINT, 691–96
DATA MISSING, 526–29 MODEL COVERAGE, 707–9
MONTECARLO, 787–88 model estimation, 459–73
VARIABLE, 540–41
MODEL INDIRECT, 689–91
missing data, 39–40, 389–91, 392–93, 393–
MODEL label, 702–4
95, 395–96, 397, 398–99, 399–402, 402–
MODEL MISSING, 709–11
4, 417–21, 421–22, 425–27, 487–91
MODEL POPULATION, 705–7
missing data correlate, 389–91, 551–55
MODEL PRIORS, 698–702
missing data generation, 786–88
MODEL TEST, 696–97
missing data patterns, 730–31
modeling framework, 1–6
missing data plots, 763–64
moderated mediation, 40–41, 695–96
missing value flags, 540–41
modification indices, 725–27
non-numeric, 446
numeric, 446–47 MODINDICES, 725–27
MITERATIONS, 627 MONITOR, 768
MIXC, 629–30 Monte Carlo simulation studies
discrete-time survival analysis, 433–34
MIXTURE, 594–96
EFA with continuous outcomes, 427–28
mixture modeling external Monte Carlo, 428–31
confirmatory factor analysis (CFA), 182 GMM for a continuous outcome, 423–25
multivariate normal, 189–91 growth with attrition under MAR, 421–22
randomized trials (CACE), 191–93, 193–95 mediation with random slopes, 439–41
regression analysis, 158–61 MIMIC with patterns of missing data, 417–21
structural equation modeling (SEM), 186–87 missing data, 417–21, 421–22
with known class, 188–89 multiple group EFA with measurement
zero-inflated Poisson regression analysis, invariance, 441–42
162–63 saved parameter estimates, 431–32
zero-inflated Poisson regression as a two-class two-level Cox regression, 437–38
model, 195–96 two-level growth model for a continuous
MIXU, 630 outcome (three-level analysis), 425–27
two-part (semicontinuous) model, 435–37 three-level MIMIC model with continuous
MONTECARLO factor indicators, two covariates on within,
DATA, 511–12 one covariate on between level 2, and one
INTEGRATION, 616 covariate on between level 3 with random
MONTECARLO command, 775–804 slopes on both within and between level 2,
mover-stayer model, 241–45 318–23
Mplus language, 13–14 three-level path analysis with a continuous
and a categorical dependent variable, 315–
Mplus program
18
base, 17
three-level regression for a continuous
combination add-on, 18
dependent variable, 312–15
mixture add-on, 17
two-level confirmatory factor analysis (CFA)
multilevel add-on, 18
with categorical factor indicators, 277–78
MSE, 416 two-level confirmatory factor analysis (CFA)
MSELECT, 762 with continuous factor indicators, 274–76,
MUCONVERGENCE, 629 278–80
MUITERATIONS, 627 two-level growth for a zero-inflated count
multilevel mixture modeling outcome (three-level analysis), 306–8
two-level confirmatory factor analysis (CFA), two-level growth model for a categorical
355–57 outcome (three-level analysis), 294–95
two-level growth mixture model (GMM), two-level growth model for a continuous
371–74 outcome (three-level analysis), 291–94
two-level growth mixture model (GMM) with two-level MIMIC model with continuous
a between-level categorical latent variable, factor indicators, random factor loadings,
375–77 two covariates on within, and one covariate
two-level growth model with a between-level on between with equal loadings across
categorical latent variable, 367–70 levels, 310–12
two-level item response theory (IRT), 358–60 two-level multiple group confirmatory factor
two-level latent class analysis (LCA), 361–63 analysis (CFA), 288–90
two-level latent class analysis (LCA) with a two-level multiple indicator growth model,
between-level categorical latent variable, 299–303
364–66 two-level path analysis with a continuous and
two-level latent class growth analysis a categorical dependent variable, 267–70
(LCGA), 378–80 two-level path analysis with a continuous, a
two-level latent transition analysis (LTA), categorical, and a cluster-level observed
380–82 dependent variable, 270–71
two-level latent transition analysis (LTA) with two-level path analysis with random slopes,
a between-level categorical latent variable, 272–74
383–85 two-level regression for a continuous
two-level mixture regression, 342–47, 348– dependent variable with a random
51, 352–54 intercept, 258–63
multilevel modeling two-level regression for a continuous
three-level growth model with a continuous dependent variable with a random slope,
outcome and one covariate on each of the 263–67
three levels, 323–26 two-level structural equation modeling
(SEM), 285–88
multinomial logistic regression, 493–97 non-convergence, 467–69
multiple categorical latent variables, 176–78 non-linear constraints, 31–32, 691–96
multiple cohort, 143–47 non-linear factor analysis, 68
multiple group analysis non-parametric, 197
known class, 188–89, 228–30 NOSERROR, 729–30
MIMIC with categorical factor indicators, 80– not missing at random (NMAR)
81 Diggle-Kenward selection model, 393–95
MIMIC with continuous factor indicators, 79– pattern-mixture model, 395–96
80 NREPS, 778
special issues, 473–84 nu, 742
multiple imputation, 453, 512–13, 515–19 numerical integration, 470–73
missing values, 397, 398–99, 402–4
OBLIMIN, 606–10
plausible values, 399–402
OBLIQUE, 606–10
multiple indicators, 135–37, 137–38
OBSERVED
multiple processors, 637–39
INFORMATION, 617–19
multiple solutions, 465–67 MEDIATOR, 633
MULTIPLIER odds, 493–97
ANALYSIS, 626
ODLL, 615
SAVEDATA, 755
OFF
multivariate normal mixture model, 189–91
ADAPTIVE, 617
MUML, 603 BASEHAZARD, 614
NAMES CHOLESKY, 614–15
DATA MISSING, 527 LISTWISE, 513–14
DATA SURVIVAL, 530 MONITOR, 768
DATA TWOPART, 524 ON, 652–55
MONTECARLO, 776–77 ADAPTIVE, 617
VARIABLE, 537 BASEHAZARD, 614
NCSIZES, 782–84 CHOLESKY, 614–15
NDATASETS, 517 LISTWISE, 513–14
negative binomial, 28–29, 546–49 MONITOR, 768
NEW, 692 optimization history, 736–37
NGROUPS OPTSEED, 624
DATA, 513 ORTHOGONAL, 606–10
MONTECARLO, 777–78 outliers, 768
NOBSERVATIONS OUTLIERS, 768
DATA, 513 OUTPUT command, 713–44
MONTECARLO, 777
PARALLEL, 611
NOCHECK, 514–15
parallel analysis, 611
NOCHISQUARE, 729
parallel computing, 637–39
NOCOVARIANCES, 611–13
parallel processes, 133–34
NOMEANSTRUCTURE, 611–13
parameter constraints. See constraints
NOMINAL
parameter derivatives, 736
Monte Carlo, 790
real data, 545–46 parameter extension, 633–34
parameterization posterior, 604–5
delta, 80–81 posterior predictive checks, 604–5
logistic, 498–99 potential scale reduction, 604–5, 634
loglinear, 176–78, 179–80, 499–500 PRIOR, 636
probability, 500–501 priors, 698–702
theta, 34, 82, 122
PRIORS, 566–67
PARAMETERIZATION, 605–6
PROBABILITIES, 566–67
parametric bootstrap, 739–41
PROBABILITY, 606
parametric proportional hazards, 149–50,
probability calculations
151–52 logistic regression, 493–97
path analysis multinomial logistic regression, 493–97
categorical dependent variables, 33–34 probit regression, 492–93
combination of censored, categorical, and PROBIT, 606
unordered categorical (nominal) dependent
probit link, 200–202, 606
variables, 36–37
combination of continuous and categorical PROCESSORS, 637–39
dependent variables, 35 profile likelihood, 149–50, 247–49, 308–9
continuous dependent variables, 32–33 PROMAX, 606–10
PATMISS, 786–87 proportional hazards model, 149–50, 151–52
PATPROBS, 787 pseudo-class
PATTERN, 555–56 Monte Carlo, 793–96
PATTERNS, 730–31 real data, 551–55
PERTURBED, 632 psi, 743
piecewise growth model, 129–30 PSR, 634
plausible values, 399–402, 756–57, 759–60, PWITH, 657
767 PX1, 633–34
PLOT, 695 PX2, 633–34
PLOT command, 762–73 PX3, 633–34
PLOT1, 763 quadratic growth model, 126–27
PLOT2, 763–64 quantitative trait locus (QTL), 91–92
PLOT3, 765 QUARTIMIN, 606–10
Plots R
Bayesian, 636, 763–64 Monte Carlo, 793–96
missing data, 763–64 real data, 551–55
survival, 763–64 R3STEP
POINT, 631 Monte Carlo, 793–96
real data, 551–55
Poisson. See zero-inflated Poisson
RANDOM, 594
PON, 655–56
random factor loadings, 310–12, 334–37,
pooled-within covariance matrix. See
685–86
sample covariance matrices
random items, 333–34
POPULATION
FINITE, 563–64 random slopes, 29–31, 130–32, 263–67,
MONTECARLO, 801–2 272–74, 278–80, 285–88, 296–98, 303–5,
population size, 563–64 683–85
random starts, 158–61, 167 RW, 633–34
RCONVERGENCE, 629 SAMPLE, 748–49
reading data sample covariance matrices
fixed format, 445 pooled-within, 750–51
RECORDLENGTH, 748 sample, 748–49
REGRESSION, 518 sigma between, 749–50
regression analysis sample statistics, 720–21
censored inflated regression, 24 sampling fraction, 563–64
censored regression, 23–24 sampling weights, 558–59
linear regression, 22–23 SAMPSTAT, 720–21
logistic regression, 25–26 SAVE
multinomial logistic regression, 26–27 DATA IMPUTATION, 517–18
negative binomial regression, 28–29 MONTECARLO, 803
Poisson regression, 27 SAVEDATA, 755–59
probit regression, 25 SAVEDATA command, 744–62
random coefficient regression, 29–31 saving data and results, 744–62
zero-inflated Poisson regression, 28
scale factors, 675–76
REPETITION
SDITERATIONS, 627
DATA LONGTOWIDE, 523
DATA WIDETOLONG, 521 SDROPOUT, 526–29
replicate weights, 457, 458, 561–62 second-order factor analysis, 66–67
REPSAVE, 802–3 SEED, 778
REPSE, 613–14 selection modeling, 393–95
REPWEIGHTS semicontinuous, 138–41, 435–37
SAVEDATA, 758 SEQUENTIAL, 518
VARIABLE, 561–62 sequential cohort, 143–47
RESIDUAL sequential regression, 518
BOOTSTRAP, 621 SERIES, 765–67
OUTPUT, 724–25 SFRACTION, 563–64
residual variances, 658 sibling modeling, 91–92
residuals, 724–25 SIGB, 749–50
RESPONSE, 754 sigma between covariance matrix. See
RESULTS sample covariance matrices
MONTECARLO, 803 STANDARD
SAVEDATA, 751 BOOTSTRAP, 620–21
right censoring, 149–50, 151–52, 308–9 INTEGRATION, 616
RITERATIONS, 628 STANDARDIZE, 584
RLOGCRITERION, 628–29 STANDARDIZED, 721–24
robust chi-square, 603 standardized parameter estimates, 721–24
robust standard errors, 603 STARTING, 802
ROTATION, 606–10 starting values
ROUNDING, 518–19 assigning, 662–63
ROWSTANDARDIZATION, 610–11 automatic, 158–61
RSTARTS, 625 saving, 729
user-specified, 165–66, 217, 662–63 TECH3
STARTS, 622–23 OUTPUT, 736
STCONVERGENCE, 623 SAVEDATA, 752
STD, 722 TECH4
STDEVIATIONS, 510 OUTPUT, 736
STDY, 722 SAVEDATA, 752–53
STDYX, 722 TECH5, 736–37
STITERATIONS, 623 TECH6, 737
STRATIFICATION, 556–57 TECH7, 737
structural equation modeling (SEM) TECH8, 737
categorical latent variable regressed on a TECH9, 737
continuous latent variable, 185–86 theta, 742
continuous factor indicators, 73–74 THETA, 605–6
with interaction between latent variables, 76– theta parameterization, 34, 82, 122, 605–6
77 THIN
STSCALE, 623 ANALYSIS, 635
STSEED, 623 DATA IMPUTATION, 519
STVALUES, 632 thinning, 519
SUBPOPULATION, 562–63 threads, 637–39
SUM, 578–79 THREELEVEL, 597–98
summary data, 510–12 three-level analysis, 291–94, 294–95, 367–
SURVIVAL 70
Monte Carlo, 796 three-step mixture analysis
real data, 572–73 Monte Carlo, 793–96
survival analysis. See continuous-time real data, 163–65, 551–55
survival analysis and discrete-time threshold structure, 72–73
survival analysis thresholds, 659–60
survival plots, 763–64 Thurstone's Box data, 607–8
SVALUES, 729 TIMECENSORED, 573–74
SWMATRIX time-invariant covariates, 128–29
DATA, 514 TIMEMEASURES, 533
SAVEDATA, 750–51 time-to-event variable, 148–49, 247–49,
tau, 742 308–9, 437–38
tau (u), 743 time-varying covariates, 128–29
TECH1, 732–34 TITLE command, 503
TECH10, 737–38 TNAMES, 533–34
TECH11, 738 total effect, 689–91
TECH12, 738 TRAINING, 566–67
TECH13, 739 training data, 191–93
TECH14, 739–41 TRANSFORM, 525–26
TECH15, 741 transformation
TECH16, 741 data, 519–34
TECH2, 734–35 variables, 574–85
TSCORES dependent, 642
Monte Carlo, 797 independent, 642
real data, 551 latent, 641
twin analysis, 83–84, 85–86, 89–90, 90–91, observed, 641
183–85, 200–202, 203–5 scale of measurement, 642
TWOLEVEL, 596–97 VARIANCE, 630–31
two-part (semicontinuous), 138–41, 435–37 variances, 658
TYPE VARIANCES, 514–15
ANALYSIS, 592–600 VARIMAX, 606–10
DATA, 510–13 VIA, 690
DATA MISSING, 527–28 Wald test, 696–97
PLOT, 763–65 WEIGHT, 558–59
SAVEDATA, 755 WIDE
UB, 599–600 DATA LONGTOWIDE, 522
UB*, 599–600 DATA WIDETOLONG, 520
UCELLSIZE, 630 WITH, 656–57
ULS, 603 WITHIN
ULSMV, 603 Monte Carlo, 797–99
UNEQUAL, 614 real data, 568–70
UNPERTURBED, 632 WLS, 603
UNSCALED WLSM, 603
BWTSCALE, 561 WLSMV, 603
WTSCALE, 559–60 WTSCALE, 559–60
USEOBSERVATIONS, 538 XWITH, 687–88
USEVARIABLES, 538–39 zero cells, 626–27
UW, 599–600 zero-inflated Poisson, 28–29, 124–25, 195–
UW*, 599–600 96, 220–23, 232–33, 546–49
VALUES, 518 zero-mean and small-variance priors, 105–6,
VARIABLE command, 534–74 107–8, 108–10
variables
MUTHÉN & MUTHÉN
Mplus SINGLE-USER LICENSE AGREEMENT
Carefully read the following terms and conditions before opening the sealed CD sleeve or downloading the
software. Opening the CD sleeve or downloading the software indicates your acceptance of the terms and
conditions listed below. The Mplus CD and download contain several versions of Mplus. The Mplus
Single-User License allows for the use of only one of these programs. Using more than one is a violation
of the Mplus Single-User License Agreement.
Muthén & Muthén grants you the non-exclusive right to use the copyrighted computer program Mplus and the
accompanying written materials. You assume responsibility for the selection of Mplus to achieve your intended
results, and for the installation, use, and results obtained from Mplus.
1. Copy and Use Restrictions. Mplus and the accompanying written materials are copyrighted. Unauthorized
copying of Mplus and the accompanying written materials is expressly forbidden. One copy of Mplus may be
made for backup purposes, and it may be copied as part of a normal system backup. Mplus may be transferred
from one computer to another but may only be used on one computer at a time.
2. Transfer Restrictions. The Mplus license may be transferred from one individual to another as long as all
copies of the program and documentation are transferred, registered, and the recipient agrees to the terms and
conditions of this agreement.
3. Termination. The license is effective until terminated. You may terminate it at any time by destroying the
written materials and all copies of Mplus, including modified copies, if any. The license will terminate
automatically without notice from Muthén & Muthén if you fail to comply with any provision of this agreement.
Upon termination, you shall destroy the written materials and all copies of Mplus, including modified copies, if
any, and shall notify Muthén & Muthén of same.
4. Limited Warranty. Muthén & Muthén warrants that for ninety (90) days after purchase, Mplus shall
reasonably perform in accordance with the accompanying documentation. Muthén & Muthén specifically does
not warrant that Mplus will operate uninterrupted and error free. If Mplus does not perform in accordance with
the accompanying documentation, you may notify Muthén & Muthén in writing of the non-performance within
ninety (90) days of purchase.
5. Customer Remedies. Muthén & Muthén and its suppliers’ entire liability and your exclusive remedy shall be,
at Muthén & Muthén’s option, either return of the price paid, or repair or replacement of the defective copy of
Mplus and/or written materials after they have been returned to Muthén & Muthén with a copy of your receipt.
6. Disclaimer of Other Warranties. Muthén & Muthén and its suppliers disclaim all other warranties, either
express or implied, including, but not limited to, any implied warranties of fitness for a particular purpose or
merchantability. Muthén & Muthén disclaims all other warranties including, but not limited to, those made by
distributors and retailers of Mplus. This license agreement gives you specific legal rights. You may have other
rights that vary from state to state.
7. Disclaimer. In no event shall Muthén & Muthén or its suppliers be liable for any damages, including any lost
profits, lost savings or other incidental or consequential damages arising out of the use or inability to use Mplus
even if Muthén & Muthén or its suppliers have been advised of the possibility of such damages. Some states do
not allow the limitation or exclusion of liability for incidental or consequential damages so the above limitation
or exclusion may not apply to you.
8. Return Policy. All sales are final. Software purchased online through our website is considered opened at
the time of purchase. This also applies to hard copy purchases because downloads are made available at the time
of purchase. In rare instances, and only within 30 days of purchase, if due to technical difficulties or platform
incompatibilities, the software will not function, we may, at our discretion, issue a refund. In such instances, an
LOD (Letter Of Destruction) on company letterhead will be required to process the refund.