Programming With SPSS Syntax and Macros
SPSS Inc.
233 S Wacker Drive, 11th Floor
Chicago, Illinois 60606
312.651.3000
Training Department
800.543.6607
General notice: Other product names mentioned herein are used for
identification purposes only and may be trademarks of their respective
companies.
SPSS Training
INTRODUCTION
This course has two major topical areas. First, we will review how to use
SPSS Syntax to perform complex data manipulations that are not
available under the SPSS menu system. This will be of interest to
those who need to read complex data files from legacy computer systems
(for example, legacy health care data, transaction-oriented sales systems)
and those who find they need to reorganize their data in order to perform
a desired analysis. Examples of the latter include marketing and
customer relationship studies in which a number of products (or services
of a company) are rated on each of many attributes. All information
from a respondent is typically stored in a single record, but needs to be
spread across multiple records in order for factor analysis and perceptual
mapping to be performed. When preparing data for churn (customer
retention) studies, for example for telecoms, credit card issuers, or
insurance companies, comparisons might need to be made across
transactional records sorted by customer ID and date. SPSS Syntax
permits a richer array of data manipulations in this context than would
the menu system. In short, we will examine uses of SPSS Syntax to
facilitate analysis of files with complex structures or files that must be
restructured for a desired analysis.
A DATA MANIPULATION EXAMPLE
To illustrate the type of data manipulation that can be performed with
SPSS Syntax, we will display the beginning and final form of a data file
recording SPSS training course purchases. Within the Training
department, there was interest in examining patterns of training courses
taken by SPSS customers, and an analysis was performed using SPSS
Clementine. However, a requirement of the analysis was a data set in
which all courses taken by a customer (an SPSS ID) were contained in a
single customer record.
The original data file, extracted from a transaction database,
contained one record per course taken, since an instance of a course being
taken by a customer constituted a sales transaction. We show this below.
The training data has been reorganized so there is a single record per
customer ID and a separate variable for each training course. These
course variables are coded 1 if a customer signed up for the course and 0
if not. This structure makes it easy to explore associations among
training courses taken by customers. The SPSS syntax to perform the
data reorganization involved two steps: creating a vector of variables in
which each variable represented a specific course, and aggregating this
file to the customer ID level. The logic behind these operations is
reviewed in Chapter 5.
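The two steps can be sketched as follows. This is a minimal illustration and
not the course's own program: the variable names ID and COURSE, the assumption
that COURSE holds integer codes 1 through 5, and the new names CRS1 to CRS5 and
TOOK1 to TOOK5 are all hypothetical.

```spss
* Assumed input is one record per course taken, with ID and COURSE (coded 1 to 5).
* Step 1 creates one flag variable per course and sets the flag for this record.
VECTOR CRS(5).
COMPUTE CRS(COURSE) = 1.
* Step 2 collapses to one record per customer, keeping each flag that was ever set.
AGGREGATE OUTFILE=*
  /BREAK=ID
  /TOOK1 TOOK2 TOOK3 TOOK4 TOOK5 = MAX(CRS1 TO CRS5).
* Courses never taken are system-missing after MAX, so recode them to 0.
RECODE TOOK1 TO TOOK5 (SYSMIS=0).
EXECUTE.
```

MAX is used here because a flag is either 1 or system-missing on each
transaction record, so the maximum within a customer is 1 whenever the course
appears at least once.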
The dialog above will create a clustered bar chart displaying attitude
toward government action on health for different marital status groups.
Note that only a single variable can be placed in horizontal and Color
boxes. (Note: actually multiple variables can be placed in a single box, but
this action will not produce multiple charts.) Thus creating a series of
charts, in which either the horizontal axis or Color variables change,
would require repeated visits to this dialog, substituting one variable at a
time. However, the macro below can build many clustered bar charts.
The six Interactive Graphs in the Outline pane were produced from
the macro. In this way, macros can automate the running of sets of
similar analyses. The second section of this course reviews SPSS macros
in detail.
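A macro of this kind might look like the sketch below. To keep it short it
uses the GRAPH command rather than Interactive Graphs syntax. The macro name
!barset, its argument xvars, and the variable names MARITAL, HELPSICK,
HELPPOOR and HELPNOT are assumptions made for illustration.

```spss
* Define a macro that draws one clustered bar chart per variable named in xvars.
DEFINE !barset (xvars = !CMDEND)
!DO !v !IN (!xvars)
GRAPH /BAR(GROUPED) = COUNT BY !v BY MARITAL.
!DOEND
!ENDDEFINE.

* Calling the macro generates three GRAPH commands, one per listed variable.
!barset xvars = HELPSICK HELPPOOR HELPNOT.
```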
RULES AND AIDS FOR SPSS SYNTAX
Since this course involves either the writing or generation of SPSS
Syntax, we begin by reviewing the rules of SPSS Syntax and how to
obtain syntax help.
The syntax rules for editing and writing SPSS commands are as
follows:
1. Each new command must begin on a new line and end with a
period (.) or a blank line.
2. *Each command must begin in the first column of a new line.
3. *Continuation lines of a command must be indented at least
one space.
4. Variable names must be spelled out fully.
5. Subcommands must be separated with a forward slash (/).
The slash before the first subcommand is usually optional.
6. *Each line of command syntax cannot exceed 80 characters.
*Not required when running from a Syntax window, but required when
using the INCLUDE command or the SPSS Production Facility
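As an illustration of these rules, the command below would be acceptable both
in a Syntax window and under INCLUDE. The variable names AGE and EDUC are
assumed for the example.

```spss
* The command starts in column 1 and ends with a period.
* Continuation lines are indented and each subcommand starts with a slash.
FREQUENCIES VARIABLES=AGE EDUC
  /STATISTICS=MEAN MEDIAN
  /FORMAT=NOTABLE.
```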
There are several useful sources of help when writing SPSS syntax. A
quick reminder of the keywords and requirements for an SPSS command
is only a tool-button click away. To demonstrate:
Click File..Open..Syntax
Move to the c:\Train\ProgSynMac directory
Double click on TransactionAgg
Scroll down to the Vector command
Click on the Vector command (so the insertion pointer touches it)
Note: The sequence below assumes the SPSS 10.0 Syntax Reference Guide has
been installed on your machine. If not, it can be installed from the SPSS
for Windows 10.0 CD-ROM.
Figure 1.7 SPSS Base 10.0 Syntax Reference for Vector Command
Click File..Exit to exit Adobe Acrobat and the SPSS 10.0 Syntax
Reference Guide
ADVICE FOR THOSE WORKING WITH SYNTAX
Finally, it is worth mentioning, at the risk of being obvious, several
recommendations for those working with SPSS Syntax.
Display Syntax commands as Log Items
By default, SPSS does not display syntax in the Viewer window, although
it is written to the SPSS journal file. If SPSS issues any error or warning
messages, it is useful to see which command they follow. For this reason,
while writing, editing and testing SPSS syntax, we recommend you turn on
the option to display syntax as a log item in the Viewer window. We will
do this explicitly in the next chapter, but view the Options dialog here.
Click Edit..Options
Click the Viewer tab
SUMMARY
In this chapter we introduced, with examples, the major focus areas of
this course: Syntax for complex data manipulation and SPSS Macros. We
also briefly reviewed the available help for SPSS Syntax and offered some
advice for those working with SPSS Syntax.
Topics Introduction
Command Types in SPSS
The Three Types of SPSS Programming
SPSS Data Definition
SPSS Programming Constructs
A Note About Program Execution
Analysis Tip: Reordering Variables
INTRODUCTION
All SPSS procedures are built upon a powerful programming
language that has been consistent, though greatly extended, since
SPSS was first developed as a mainframe statistical software
program. This course will teach you how to use this language and other
features for file and data input and manipulation, and for overall control
of SPSS execution.
The SPSS language, called syntax, is generated by the program every
time a user clicks on the OK button in a dialog box to execute a
procedure. Behind the scenes, SPSS builds syntax to send to the SPSS
central engine to execute a particular procedure or transformation. Using
the Paste button in a dialog box places a copy of that syntax in a Syntax
window so that it can be edited or saved and used again.
SPSS syntax is also often called a command or set of commands. The
grammar or rules associated with commands are fairly simple, and we
will review them as necessary throughout the chapters.
Note about Data File Access When Running SPSS from a Remote Server
SPSS for Windows 10.0 can run entirely on your desktop machine.
Alternatively, an SPSS Client, through which you request analyses and
view results, can run on your desktop, while the analyses are run by the
SPSS Server, possibly located on a different machine. In this course,
except for the directory you use to access the training data files, it makes
no difference whether the SPSS Server is located on your desktop or a
different computer. The SPSS Server Login dialog (click File..Switch
Server) allows you to connect to a remote SPSS Server (if installed on
your network).
If you are running SPSS from a Remote (not Local) server, then to
use the data files accompanying this course, they must be copied either to
the server running SPSS or to a directory that can be accessed by
(mapped from) the server. The directory references in this guide assume
you are running SPSS as a local server and can thus directly access files
stored on your hard drive.
THE THREE TYPES OF SPSS PROGRAMMING
Logically, there are three general methods of programming in SPSS. Two
of them involve syntax, while the third uses a version of the Basic
programming language (Sax Basic).
Standard Syntax: These programs are the most common and simply
involve writing a series of SPSS commands to accomplish a set of tasks.
An example of a simple program is shown in the box (this program uses
the FILE TYPE command to read a non-standard ASCII data file). In a
standard syntax program, each command does one thing, and it does not
refer to other SPSS syntax. Standard programs are executed either
through the Run button, the INCLUDE command, or the SPSS
Production Facility.
SPSS Macros are a bit different and not exactly parallel to the more
common definition of a macro in other programs. First, they are written
in SPSS syntax (plus a few special macro commands) and are essentially
executed like any other syntax file. Second, they generate customized
SPSS command syntax, i.e., standard syntax, to reduce the time and
effort needed by the program writer to perform complex and repetitive
tasks. There is no special macro editor in SPSS or macro facility to
execute a macro; again, a macro is simply a specialized syntax file. Below
is an example of a macro that automates the production of a bar graph
and the insertion of today’s date into the title of a graph (this program
actually defines two macros). The macro begins with DEFINE and ends
with !ENDDEFINE.
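The general shape of a macro is sketched below. It is simpler than the program
described above and does not insert the date into the title. The macro name
!bar, its argument var, and the variable MARITAL are assumptions made for
illustration.

```spss
* Define a macro that takes one variable name and builds a titled bar chart.
DEFINE !bar (var = !TOKENS(1))
GRAPH /BAR(SIMPLE) = COUNT BY !var
  /TITLE = !QUOTE(!CONCAT('Bar chart of ', !var)).
!ENDDEFINE.

* The call below expands into an ordinary GRAPH command for MARITAL.
!bar var = MARITAL.
```

Nothing is executed when the DEFINE block is run; the GRAPH command is
generated and executed only when the macro is called.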
SPSS DATA DEFINITION
A substantial portion of this course is devoted to the manipulation of files
and data with SPSS. As such it will be helpful to understand a bit about
SPSS data file definition. More information is available in the Commands
and Program States Appendix in the SPSS Base 10.0 Syntax Reference
Guide (this guide is available on the CD-ROM containing SPSS and can
be copied to your hard drive when SPSS is installed).
Often these decisions are straightforward for SPSS. When you click
on File...Open..Data, and name a file with an extension of SAV, SPSS
knows how to read the file, that each logical record in the file is to be
written to one row in the Data Editor, and that it should read the whole
file. Or in the simple program below, the execution of the DATA LIST
command (accessed by choosing File...Read Text Data; note that as of
SPSS 10.0 a GET DATA command is pasted instead) tells SPSS that
what follows is a normal file, where each line of data is to be written to a
new row, or case, in the Data Editor, and that the last case should be
written and the file created when the END DATA command is
encountered.
INPUT PROGRAM.
DATA LIST FREE / X.
END INPUT PROGRAM.
BEGIN DATA.
123
END DATA.
LIST.
DO IF (WORLD = 1).
COMPUTE BTHR = 10.872 + .0014 * GDP.
ELSE IF (WORLD = 3).
COMPUTE BTHR = 46.148 - .004 * GDP.
END IF.
DO REPEAT & END REPEAT
A DO REPEAT construct allows you to repeat the same group of
transformations on a set of variables, thereby reducing the number of
commands that you must enter. SPSS must still execute the same
number of commands; the efficiency comes for the user, not SPSS. To
illustrate its use, let’s access the 1994 General Social Survey file, stored
in the c:\Train\ProgSynMac directory.
Displaying Variable Names in Dialog Boxes
First, to simplify the instructions in this course, we will request that
variable names (and not the default variable labels) be displayed in
dialog boxes. From within SPSS:
Click Edit..Options
Click the Display Names option button in the Variable Lists
section of the General tab
Click the Alphabetical option button in the Variable Lists
section of the General tab
Click File...Open..Data
Move to the c:\Train\ProgSynMac directory (if necessary)
Double-click on GSS94
There are several questions in the file that ask about whether
national spending on various programs or areas should be increased, stay
the same, or be reduced. Imagine that we wish to compare pairs of
Although in this instance little if any work was saved by the use of
DO REPEAT, in many circumstances the savings can be substantial.
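A DO REPEAT structure of this general kind is sketched below. It is an
illustration rather than the course's own program: the spending variable names
(NATCITY, NATCRIME, NATDRUG, NATEDUC, NATRACE) and the new variable names
DIFF_1 to DIFF_4 are assumptions.

```spss
* OTHER and DIFF are stand-in names replaced in turn by each listed variable.
DO REPEAT OTHER = NATCRIME NATDRUG NATEDUC NATRACE
  /DIFF = DIFF_1 DIFF_2 DIFF_3 DIFF_4.
COMPUTE DIFF = NATCITY - OTHER.
END REPEAT.
EXECUTE.
```

SPSS expands this into four separate COMPUTE commands, which is why the
saving is in typing rather than in processing.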
LOOP & END LOOP
The DO REPEAT structure is an iterative construct because SPSS
iterates over sets of elements to carry out the user instructions. A more
generic form of iteration is provided by the looping facility in SPSS,
represented by the LOOP & END LOOP commands. They can be used to
perform repeated transformations on the same case until a specified
cutoff is reached, which can be defined by an index on the LOOP
command, an IF statement on the END LOOP command, or other
options. By default, the maximum number of loops is 40, as defined by the
MXLOOPS setting on the SET command. Almost any transformation can be
used within a loop.
On the LOOP command, we tell SPSS to loop five times with the
index clause of #I=1 to 5. This tells SPSS to repeat the COMPUTE
command five times for each person in the GSS file. Usually indices are
increased by one, as in this example, but that is not always the case. Nor
must they begin at 1.
The COMPUTE command itself tells SPSS to add one to the previous
value of Z, which has initially been set to 0 before the loop. The loop then
finishes with the required END LOOP command to tell SPSS the
construct has finished.
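Reconstructed from the description above (and so possibly differing in detail
from the course file), the program would look like this:

```spss
* Set Z to 0 for every case, then add 1 to it five times within each case.
COMPUTE Z = 0.
LOOP #I = 1 TO 5.
COMPUTE Z = Z + 1.
END LOOP.
EXECUTE.
```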
A NOTE ABOUT PROGRAM EXECUTION
Notice that the program ends with an EXECUTE command. When
running syntax from a Syntax window, SPSS does not immediately
process transformations by reading the data file. Instead, it stores
transformations in memory and waits until a command is encountered
which forces a pass of the data. This is in comparison to running SPSS
commands from a dialog box, where the command is executed
immediately after the OK button is clicked. The EXECUTE command
forces a pass of the data and executes any preceding transformations.
SPSS has added 1 to Z five times, and since Z was initially
zero, Z is now 5 for every case in the file. To reiterate, the LOOP
command works within a case rather than across cases. We will see many
uses of looping in programs, and the concept of looping will be repeated in
macros and scripts.
SCRATCH VARIABLES
The variable #I used to index the loop does not exist in the GSS file. If it
did, we would see it next to Z in the Data Editor. It hasn’t been created by
SPSS because it was declared a scratch variable. This is done by
specifying a variable name that begins with the # character. Scratch
variables are used in transformations or data definition when there is no
reason to retain them in the data file. They cannot be used in procedures.
The vector SAT is created from the five questions in the General
Social Survey that ask about a respondent’s satisfaction with various
aspects of his/her life. This vector is not visible in the Data Editor as a
separate variable or set of variables because it is a logical construct from
existing variables. These variables must be contiguous in the file; that is,
they must be located next to each other when viewed in the Data Editor.
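A vector over existing variables might be declared and used as in the sketch
below. The variable names SATCITY and SATHEALT (as the first and last of the
five contiguous satisfaction items), the new variable NHIGH, and the cutoff of
2 are assumptions made for illustration.

```spss
* Group the five contiguous satisfaction items under the vector name SAT.
VECTOR SAT = SATCITY TO SATHEALT.
* Count, within each case, how many items have a value of 2 or less.
COMPUTE NHIGH = 0.
LOOP #I = 1 TO 5.
IF (SAT(#I) <= 2) NHIGH = NHIGH + 1.
END LOOP.
EXECUTE.
```

The expression SAT(#I) refers to the first through fifth variable in the
vector as the loop index changes.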
VECTOR X(10).
The ten new variables all have system-missing values for each case.
Where a case has valid values for the spending variables, we can see
that SPSS created the four new DIFF_ variables measuring the
difference between NATCITY and the other four spending items. It would
be straightforward to create additional COMPUTE statements to
compare all other possible pairs.
ANALYSIS TIP: REORDERING VARIABLES
A set of variables must be contiguous when placing them into a vector.
What can you do if that is not true in an existing file? Perhaps the easiest
method to rearrange variables is to use the trick of matching a file to
itself. Figure 2.8 displays syntax from CHAPT2.SPS that illustrates this
technique.
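In outline, the technique looks like the sketch below, which is not a copy of
CHAPT2.SPS. The variable names are assumptions. Variables named on KEEP are
written in the order listed, and the keyword ALL appends every remaining
variable in its existing order.

```spss
* Match the working file to itself, listing the variables in the desired order.
MATCH FILES FILE=*
  /KEEP=ID NATCITY NATCRIME NATDRUG NATEDUC NATRACE ALL.
EXECUTE.
```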
SUMMARY
We reviewed the types of SPSS commands, the three types of SPSS
programs, and briefly reviewed data definition. We then discussed some
of the key programming techniques in SPSS, including the use of loops,
the creation of vectors, the processing of conditional statements (DO IF),
and the creation of repeating elements (DO REPEAT). These techniques
will be used repeatedly in SPSS programming. There are a few other
important programming techniques that you will see in later chapters
when the need arises. We turn in Chapter 3 to the handling of complex
data files.
Topics Introduction
ASCII Data and Records
File Types
Syntax Basics
Data File Structure
Reading a Mixed File
Errors in the Data
Grouped File Type Without Record Information
INTRODUCTION
Most SPSS users find that the standard DATA LIST command is
sufficient to read the great majority of the data files they
normally encounter. This is because most data files are
rectangular, i.e., they contain the same number of records per case, the
definition of a case is consistent throughout the file, and the variables to
be defined are identical for each case. There are, however, situations in
which the above conditions do not hold. One example is a file at a medical
center with two types of records, one for inpatients and one for
outpatients, with identical variables located in different column positions
on each type of record, and some variables unique to each type of patient.
A standard DATA LIST cannot correctly read such a file and create a
separate case for each patient type.
ASCII DATA AND RECORDS
SPSS assumes that complex files are in ASCII format so that they can be
read with a DATA LIST command (within the complex file types). Files
that are stored in a spreadsheet or database format cannot be read
directly by SPSS with these techniques. In that case, you have two
options. You can write out an ASCII file from the other software and then
read it into SPSS with a complex file definition. Or you can read the file
into SPSS as you normally would, temporarily creating a working file
with an incorrect format for analysis. You can then use various
programming techniques to restructure the file.
It is common to have several records for each case you plan to create
in the final SPSS data file or to have several cases on one physical record.
Understanding what constitutes a record and what the case definition
should be in the final SPSS file is part of the art of successful data input
programming.
FILE TYPES
The three available file types within the FILE TYPE command are
GROUPED, MIXED, and NESTED.
SYNTAX BASICS
Complex file type programs are begun by the command FILE TYPE and
closed with the command END FILE TYPE. These two commands enclose
all definitional statements. One of the three keywords GROUPED,
MIXED, or NESTED must be placed on the FILE TYPE command. The
commands that define the data must include at least one RECORD TYPE
and one DATA LIST command, though it is common to have several. One
set of RECORD TYPE and DATA LIST commands is used to define each
type of record in any data file. The definition of a case, again, depends
upon which FILE TYPE is specified.
All three file types have subcommands available that warn the user
when records and cases are encountered that don't meet the definitions of
the file type, record, and case. This warning can include situations when
records are missing.
DATA FILE STRUCTURE
To further illustrate the three types of files, we display samples of data
files that can be read as grouped, mixed, or nested files.
GROUPED DATA
A grouped data file often looks identical or very similar to a standard
rectangular data file. However, a grouped file often has one or more of the
following problems:
These situations all mean that SPSS will not read the file
successfully with a simple DATA LIST command. The structure of a
simple grouped data file is shown in Figure 3.1.
These data are from a hospital and contain information on tests and
procedures administered to each patient. Each patient’s data begins with
a record that lists identifying information. The second and subsequent
records include information on a test that was given, the date of the test,
and the cost. Each record after the first defines a test, but we would like
the case definition to be a patient. The problem is that a different number
of tests is given to each patient, so we cannot specify the same number of
records for each patient.
All defined variable names for a grouped file must be unique because
multiple records will be put together to form one case. By default all
MIXED DATA
A mixed raw data file looks quite different than a rectangular data file.
Again, a MIXED file type is used when each record type defines a
separate case (though not all record types need be defined). Figure 3.2
depicts a portion of the file MIXED.DAT that contains job information on
employees from a large company.
A standard DATA LIST cannot be used to read this file because some
of the same information is in a different location for each record type.
Salary is recorded in a different location for each record type, and other
variables are not recorded in each system. We will attempt to read this
file in the first example.
NESTED DATA
A nested data file also looks quite different than a rectangular data file.
A FILE TYPE NESTED command is used when the records in a file are
hierarchically related. One example is a file with two types of records,
one for each department in a company, and one for each of the employees
in that department. All the employee records for one department are
placed consecutively together, after the record for the department in
which they are located and before the next department record.
The variable names on all the records must be unique because one
record of each type will be grouped together to form a case. Since not all
record types need be mentioned on the RECORD TYPE command, it is
possible to define a case at a higher level in the hierarchy, e.g., a
department rather than an employee. In fact, a case can be defined at any
level in the hierarchy of record types. Figure 3.3 depicts a nested data file
for a school district.
There is only one variable in common for all three records, the record
identifying information in the first column.
READING A MIXED FILE
We will read the employee mixed data file from Figure 3.2 (named
MIXED.DAT) into SPSS and create a rectangular file with a case for each
employee.
The appropriate commands to read this file are included in Figure 3.4
and are in the file MIXED.SPS.
Click on File..Open..Syntax
If necessary, switch directories to c:\Train\ProgSynMac
Double-click on MIXED
The command FILE TYPE begins the file definition and puts SPSS
into an input program state. The MIXED subcommand tells SPSS that
this is a mixed data file. The data file is named here, not on the DATA
LIST commands that follow. The only other required subcommand is
RECORD to specify the record identification variable. The equal sign is
not required following RECORD or FILE. The record variable is in
column 6 and will be named SYSTEM. For the employee data it is
important to retain information that tells us under what record system
the data were created because of duplicate IDs under each system; often,
though, the record variable doesn’t need to be retained in the final file. In
that case, it can be declared a scratch variable by beginning its name
with “#”.
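A FILE TYPE MIXED program of this form is sketched below. The FILE TYPE line
follows the description above; the variable names ID and SALARY and all column
positions on the DATA LIST commands are hypothetical and will not match the
actual layout of MIXED.DAT.

```spss
* The data file and the record identifier (SYSTEM, column 6) are named here.
FILE TYPE MIXED FILE='c:\Train\ProgSynMac\MIXED.DAT' RECORD=SYSTEM 6.
* Each RECORD TYPE is followed by the DATA LIST that describes that record.
RECORD TYPE 1.
DATA LIST / ID 1-4 SALARY 10-15.
RECORD TYPE 2.
DATA LIST / ID 1-4 SALARY 20-25.
RECORD TYPE 3.
DATA LIST / ID 1-4 SALARY 10-15.
END FILE TYPE.
FREQUENCIES VARIABLES=SYSTEM.
```

Because SALARY has the same name on every DATA LIST, its values end up in a
single column of the working file even though they are read from different
positions.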
Click Run..All
SPSS displays the commands in the Viewer window (not shown) and
then the frequency table for SYSTEM. We can see that there are 212
employees in the file, created from 212 records, and that there are 52 of
record type 1, 122 of record type 2, and 38 of record type 3 in the data file.
All the information on, for example, salary, has now been placed in one
column despite its two different locations in the data (if you wish, switch
to the Data Editor to verify this).
ERRORS IN THE DATA
We will illustrate what occurs with undefined record types by once again
reading the file MIXED.DAT. Because warning messages are turned off
by default, we were unaware that there are in fact 213 records, or
employees, in the file. However, the 213th case has an error in its record
type, as shown in Figure 3.6.
Its record type should be a “3” but is instead a “4.”
Figure 3.7 Modified File Type Command to Add Warnings From SPSS
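A modification of this kind can be sketched as below, assuming the warning is
requested with the WILD subcommand of FILE TYPE. Only the FILE TYPE line is
shown; the RECORD TYPE and DATA LIST commands that follow it are unchanged.

```spss
* WILD=WARN asks SPSS to issue a warning for each undefined record type.
FILE TYPE MIXED FILE='c:\Train\ProgSynMac\MIXED.DAT'
  RECORD=SYSTEM 6 WILD=WARN.
```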
Click Run..All
When SPSS switches to the Viewer, you will now see a warning
message and a note in the log under the FREQUENCIES command, as
shown in Figure 3.8. The warning message is clear, telling us that the
record type (4) was ignored when building the file. You can verify this by
looking at the frequencies output, which lists only 212 cases.
The exact position of the problem is noted in the message that begins
with “Command line:”. The critical information is that it was on case 213
that SPSS encountered an unknown record ID, whose value is 4. SPSS
also conveniently lists the actual line of data from the file MIXED.DAT
for reference. Obviously, warnings can be helpful in finding and fixing
errors in data entry or definition.
It is now possible to use the file created via the FILE TYPE MIXED
command to report on either the total group of employees, or differences
between employees across record-keeping systems.
Analysis Tip: If you plan to read a file with known errors, you might think that you
want to be warned every time there is a problem defining an SPSS data
file. However, this is not always the case. If it is a large data file with
many errors, SPSS could possibly generate hundreds, even thousands, of
warning messages. It is unlikely that you will care to scroll through all
that output. In recognition of this, the maximum number of warnings
SPSS will display has been set relatively low, to a value of 10. When you
do want to see more warnings, use the SET command with this syntax:
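For example, assuming the MXWARNS subcommand of SET is the relevant setting,
the limit could be raised as follows (the value 100 is arbitrary):

```spss
* Allow up to 100 warnings before SPSS stops displaying them.
SET MXWARNS=100.
```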
GROUPED FILE TYPE WITHOUT RECORD INFORMATION
Reading either a grouped or nested file into SPSS isn’t much different
than reading a mixed file in terms of the syntax. However, one situation
that causes problems yet is still relatively common, and therefore worth
exploring, is when you wish to use FILE TYPE GROUPED but don’t have
a record identification variable. This is fairly common, especially because
any rectangular data file can be read with either a standard DATA LIST
or via a FILE TYPE GROUPED format. The advantage of the latter is
that SPSS will fix any problems with out-of-order records and read the
file correctly if there are missing records. However, a record type variable
is needed in each instance, and you may not have created one for what
you knew was a standard rectangular file format.
If this file had been created with one record for each student, and
each assignment in a separate column, it would be a straightforward task
to read it into SPSS.
Note: We should point out that the score type (quiz1, etc.) field can be used as a
record type identifier, although it is a string field. However, we will
ignore this in order to demonstrate another method of reading the file.
1) Read the data into SPSS, create a record identifier, write the
data back out as an ASCII file, then read it back in using
FILE TYPE GROUPED.
Analysis Tip: When creating programs that read or transform data, it is very helpful in
debugging your programs to list out the data after an action or series of
actions is executed. If you don’t do this, then it will be very difficult to
figure out where things went wrong. Following this advice, the LIST
command has been placed at four spots in the program. It is better to err
on the side of excess here.
Our problems are not yet solved, though. Student 2 didn’t complete
the homework, but the value of RECORD for his test1 score is 2, not 3.
The set of IF statements fixes this problem. They assign the correct value of
RECORD for each type of assignment, so that student 2’s test1 score will
now be listed as record type 3.
In the Viewer (not shown) we can now see that the value of RECORD
for the second line (or case) for student 2 has been changed to a 3.
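IF statements of this kind might look like the sketch below. The name of the
string variable holding the score type, here TYPE, is an assumption.

```spss
* Assign the record number from the assignment name, not from its position.
IF (TYPE = 'quiz1') RECORD = 1.
IF (TYPE = 'hmwk1') RECORD = 2.
IF (TYPE = 'test1') RECORD = 3.
EXECUTE.
```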
The values for each assignment or test have been placed in separate
variables, with the quiz1 score in SCORE_1, the hmwk1 score in
SCORE_2, and the test1 score in SCORE_3. However, there are still
three cases for each student, with lots of missing data, and all we need is
one case for each student to calculate appropriate statistics.
where you define new variable names on the left of the expression, an
aggregate function on the right, followed by the same number of existing
variables that will be used to create the new variables. In our example,
we use the MAX (maximum) function to create three variables called
QUIZ1, HMWK1, and TEST1, based on the maximum value of SCORE_1
to SCORE_3 for each ID value. Why does this accomplish our task? Could
we have used a different function?
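The spreading and aggregating steps can be sketched together as below. The
names SCORE_1 to SCORE_3, QUIZ1, HMWK1, TEST1, RECORD and ID come from the
text; the name SCORE for the original score variable is an assumption, and
the course program may differ in detail.

```spss
* Place each score in the variable that corresponds to its record type.
VECTOR SCORE_(3).
COMPUTE SCORE_(RECORD) = SCORE.
* Collapse to one case per student, one summary variable per assignment.
AGGREGATE OUTFILE=*
  /BREAK=ID
  /QUIZ1 HMWK1 TEST1 = MAX(SCORE_1 TO SCORE_3).
LIST.
```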
The output from LIST demonstrates that there are only three cases in
the new file, one for each student. Three new variables have been created,
one for each assignment. This makes it easy to calculate statistics for
each assignment. And student 2 has been correctly assigned a missing
score for HMWK1. Since AGGREGATE only creates the new summary
Analysis Tip: Unlike the definition of complex file types, every command in this
program could have been created from the dialog boxes except VECTOR.
As this course is generally concerned with SPSS programming, we
instead worked from a Syntax file. Either approach is acceptable,
although seeing the syntax often helps your understanding, and it
certainly lets you apply the same type of program in the future to another
data file.
SUMMARY
We reviewed the types of complex files, their structure, and the SPSS
syntax used to read these data files. We illustrated the use of complex file
types by reading a mixed data file, then discussed how data errors are
handled by SPSS. We then showed how to read a grouped file with an odd
structure and no numeric record type information.
Topics Introduction
Syntax Basics
Changing the Case Base of a File
End of Case Processing
End of File Processing
Checking Input Programs
Incomplete Input Programs
Reading Files with Missing Identifiers
When Things Go Wrong
INTRODUCTION
There are situations where you encounter non-rectangular raw
data files that cannot be read directly with the complex file types
provided by SPSS. For those situations, SPSS offers an input
program facility, as mentioned in Chapter 2, that has the capability to
read essentially any type of ASCII data file. The ability to read a file will
at times depend upon the cleverness of the user, as really odd files may
require creative solutions.
An input program can also be used to create data that match a target
distribution, often for purposes of teaching or illustration. In other words,
an input program can create data from nothing (this is the one time that
SPSS provides a free lunch, so to speak).
For very large files, input programs also offer great efficiencies, even
if the file is a standard rectangular data file. An input program can be
used to select only certain cases as the file is read, saving one pass of the
data. Or it can concatenate raw data files, saving on having to create
SPSS data files of each. And input programs can perform the equivalent
functions of a grouped, mixed, or nested file type, but with added
flexibility.
SYNTAX COMPONENTS
The commands INPUT PROGRAM and END INPUT PROGRAM enclose
data definition and transformation commands that build cases from input
data. At least one file definition command, such as a DATA LIST, must
be included in the structure. Essentially any transformation commands
can be placed within an input program structure, but no procedures. This
means that you can use COMPUTE, IF, DO IF, REPEATING DATA,
LOOP, or any of the other transformation commands that may help to
create a working data file.
When we are done, the SPSS data file should look like Figure 4.2. In
the restructured file, the rows have become columns and the columns
have become rows, separately for each person. That is our goal. When
using an input program, it is usually very important to sketch out the
equivalent of Figure 4.2 so that you have a clear goal in mind.
Note that typically such files would contain subject and characteristic
identifiers, and additional information. We drop these variables to better
focus on the data reorganization. However, without such identifiers you
cannot tell what is what within the file.
Click on File..Open..Syntax
If necessary, switch directories to c:\Train\ProgSynMac
Double-click on INPUT1
Figure 4.3 Input Program to Invert a File
The case basis can be changed as the file is read because we are still
in an input program structure. We will walk through each command of
this program to show how it operates.
LOOP: We loop from 1 to 5 (using the scratch variable #I) because
we want to create five new cases for each person.
When SPSS first encounters the END LOOP command, it increments the
value of #I to 2 and passes through the three COMPUTE statements and
the END CASE command again with that value. Recall from Chapter 3
that looping occurs within a case in SPSS, not across cases. Since the
initial DATA LIST statement essentially created a (temporary) case from
each set of three records, the five passes through the loop create five new
cases from these three records, or five cases per person.
When looping is done for the first person, SPSS goes on to read the
next set of three records and do the same for this person, and this will
continue until the end of the data file is encountered.
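The walk-through above can be sketched as follows. This is a reconstruction under stated assumptions, not the course file itself: we assume each person occupies three records holding five one-column values each, and the names X1 to Z5 and the vectors VX, VY, and VZ are hypothetical.

```spss
* Sketch only: the column layout of INPUT1.DAT is assumed.
INPUT PROGRAM.
DATA LIST FILE='c:\Train\ProgSynMac\INPUT1.DAT' RECORDS=3
  / X1 TO X5 1-5
  / Y1 TO Y5 1-5
  / Z1 TO Z5 1-5.
VECTOR VX = X1 TO X5.
VECTOR VY = Y1 TO Y5.
VECTOR VZ = Z1 TO Z5.
LOOP #I = 1 TO 5.
  COMPUTE A = VX(#I).
  COMPUTE B = VY(#I).
  COMPUTE C = VZ(#I).
  END CASE.
END LOOP.
END INPUT PROGRAM.
LIST VARIABLES = A B C.
```

Because END CASE sits inside the loop, each of the five passes writes out its own case.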
The commands are echoed in the Viewer, and the output from LIST
shows that the file has been inverted as desired. The variables A, B, and
C for the first case have values of 1, 9, and 0.
Figure 4.4 List Output Showing New File Structure
END OF CASE PROCESSING
A basic problem that many people have when writing input programs is
how to properly handle end of case and end of file processing. We can use
the program INPUT1.SPS to illustrate the complexities of these
decisions.
What type of data file will this structure create? How many cases will
be created? Why?
Figure 4.6 New SPSS Data File
What happens is that the new program still tells SPSS to loop five
times within a case, but the second loop replaces the values of A, B, and C
with new values, the third does the same, and so forth. After the fifth and
last loop, the values of A, B, and C for the first case are 5, 5, and 0—the
values for the last number in each of the three records. At this point, we
then tell SPSS to create a case. Since we effectively didn’t change the
case basis from what the DATA LIST command created, we end up with
only two cases, one for each person. This is not what we intended, and it
illustrates the importance of correctly placing the END CASE command.
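The effect of END CASE placement can be seen in a small data-free sketch. The variable A and the loop bounds are illustrative only, and this is not the course program.

```spss
* Sketch only: END CASE placed after the loop, not inside it.
NEW FILE.
INPUT PROGRAM.
LOOP #I = 1 TO 5.
  COMPUTE A = #I.
END LOOP.
END CASE.
END FILE.
END INPUT PROGRAM.
LIST.
```

Here each pass through the loop overwrites A, and only one case is written, holding the value from the final pass. Moving END CASE inside the loop would write one case per pass instead.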
END OF FILE PROCESSING
When you write an input program, you can control all aspects of case and
file creation. Often, though, file creation is determined by SPSS
automatically when it runs out of raw data after reading to the end of an
ASCII data file. At that point, it creates the working data file. However,
you can control this decision, and we illustrate this capability by
modifying INPUT1.SPS slightly.
The output from LIST says that SPSS created five cases, not ten.
Inspection of the output shows that only data from the first person in the
file was used to create the working data file. The program operated
correctly for the first person, creating five rows with three variables, but
it dropped the second person (and it would have dropped any subsequent
records).
The END FILE command was encountered before END INPUT
PROGRAM. This means that SPSS passed the decision of when to finish
reading the data file to the user. Our program operates as follows:
Figure 4.8 List Output Showing Data From Only One Person
Like END CASE, END FILE takes effect immediately, which is why
it produces this result. It also has this effect because SPSS processes files
by case. Logically, it is true that the end of file comes just after the end of
the looping, but in SPSS programming the entire input program is read
and processed for each case before the next case is read and defined.
Thus, the END FILE command is encountered for the first case, and
SPSS stops reading INPUT1.DAT.
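A hedged sketch of this behavior follows. We assume that END FILE was added just after END LOOP, and that each person occupies three records of five one-column values; the names X1 to Z5 and the vectors are hypothetical.

```spss
* Sketch only: layout assumed, END FILE placed just after the loop.
INPUT PROGRAM.
DATA LIST FILE='c:\Train\ProgSynMac\INPUT1.DAT' RECORDS=3
  / X1 TO X5 1-5
  / Y1 TO Y5 1-5
  / Z1 TO Z5 1-5.
VECTOR VX = X1 TO X5.
VECTOR VY = Y1 TO Y5.
VECTOR VZ = Z1 TO Z5.
LOOP #I = 1 TO 5.
  COMPUTE A = VX(#I).
  COMPUTE B = VY(#I).
  COMPUTE C = VZ(#I).
  END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
LIST VARIABLES = A B C.
```

On the first pass through the input program, the loop writes five cases and END FILE then ends the file, so the records for later persons are never read.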
CHECKING INPUT PROGRAMS
In Chapter 3 we strongly advised you to place LIST commands liberally
throughout your programs when creating and debugging them.
However, an input program is a transformation structure, and LIST is a
procedure, so it cannot be placed within that structure.
INCOMPLETE INPUT PROGRAMS
There are two types of mistakes you can make in writing input programs.
The first is a logical mistake, such as placing the END CASE command in
the wrong location. The second, just as common, is a typing error
when creating commands. In either instance, strange things can happen
that are directly related to the fact that an error occurred in the middle of
an input program structure.
Notice that the second error, on END INPUT PROGRAM, explicitly
states that the error occurred while a complex file structure was still
unclosed. If you take a look at the Data Editor (not shown), you will
see that it is empty, although it does have the three variables A, B, and C
created.
Now let’s fix the error and run the program again.
In Figure 4.10, you can see what occurred. After processing the first
command, INPUT PROGRAM, SPSS complained with an error that such
a command is not allowed in the set of file definition commands, i.e.,
those commands that normally follow an INPUT PROGRAM command.
Click Window...INPUT1 - SPSS Syntax Editor
Add the command NEW FILE. on its own line before INPUT
PROGRAM (Alternatively, click File..New..Data)
The program has now run with no errors. The output
from LIST shows the correct file structure of 10 cases. Figure 4.12
provides a look at the file in the Data Editor. The NEW FILE command is
so important that, until you become accomplished at writing input
programs, and whenever you first start writing such a program, we
recommend you include it as the first command and run it each time
along with the other commands.
Figure 4.12 Data Editor With Five Cases Per Person
Figure 4.13 INPUT2.DAT File
In this file, there are multiple records per customer, and an unequal
number per customer, since not every customer has made three orders.
The goal is to create an SPSS data file with one case per customer. In
addition, the three order types can come out of order—for the first
customer, the third order appears second—and we also want to record
these correctly in sequence in the SPSS data file. As always, it is
important to have a picture of the goal in mind, so Figure 4.14 displays
the SPSS Data Editor with the target file structure. Each customer,
identified by the billing number, represents one case, and the orders are
now in the correct sequence, so that order number 3 for the first customer
is placed in the variable VALUE3, even though it appeared on the second
record.
Would it be possible to read this file with a FILE TYPE GROUPED?
If not, what are we missing? What about FILE TYPE MIXED? Or FILE
TYPE NESTED?
The complications in the file are that the header record has no record
type information, and the order records have no ID variable. We must take
this into account when reading the file.
Let’s open the program that accomplishes the task of reading this
data file.
Click on File..Open..Syntax
If necessary, switch directories to c:\Train\ProgSynMac
Double-click on INPUT2
Figure 4.15 displays the program in the Syntax Editor window. The
general method that will read this file is to:
4) If the record is not a header record, read the order type and
cost as scratch variables so they are not retained in the final
file.
Figure 4.15 INPUT2.SPS File
The program begins with the INPUT PROGRAM command.
The END CASE command, for the last record in the file, tells
SPSS to create a case. We must do this check because the last
record is not followed by another header record, which is the
normal method used to tell SPSS when to build a case.
END FILE: The command tells SPSS to create the working data
file, again because this is the last record.
DO IF #TESTREC=0: This second DO IF is the heart of the
program. It checks to see whether #TESTREC is 0, which
means we are reading a header record, and so must create a
case.
DATA LIST: This second DATA LIST rereads the record with a
format appropriate for a header record.
DATA LIST: The third DATA LIST command reads the two
variables that are on order records. It uses scratch variables
because this information will either be passed to other
permanent variables or is unnecessary to retain.
COMPUTE: This command plugs the cost of each order into the
correct element of the vector ORDER. For example, if
#TYPE=2, then the second element of ORDER (VALUE2) is
set to the value of #COST for that record.
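The vector assignment described in the COMPUTE step can be tried in isolation. In this sketch, ordinary variables TYPE and COST stand in for the scratch variables #TYPE and #COST, and the two data lines are made up for illustration.

```spss
* Sketch only: TYPE and COST stand in for #TYPE and #COST.
DATA LIST FREE / TYPE COST.
BEGIN DATA
1 25
3 40
END DATA.
NUMERIC VALUE1 TO VALUE3 (F8.2).
VECTOR ORDER = VALUE1 TO VALUE3.
COMPUTE ORDER(TYPE) = COST.
LIST.
```

For each case, the value of TYPE selects which element of ORDER receives COST; the other VALUE variables remain system-missing.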
three cases as planned, retaining region, billing number, and product
type, and we have placed the costs in the correct VALUE variables. The
second customer has a missing value for VALUE2 because he made no
order of this type.
WHEN THINGS GO WRONG
Creating a program such as INPUT2.SPS is not easy, though, without
lots of experience at programming with SPSS or with other programming
languages. Expect to make several errors when trying to create this
program, which, as noted previously, can be difficult to check because
procedures can’t be placed in the middle of the program to check your
work. One important thing to do is to make sure you are working in an
empty Viewer window, and to empty it out on a regular basis. Otherwise,
as you execute the program, syntax errors from one run of the program
will blend with errors from subsequent runs, and it will be more difficult
to recognize what needs to be fixed.
One error that is sometimes made is to ignore the need to tell SPSS to
stop reading the data file. What makes the decision to use an END FILE
command tricky is that, in general, there is no requirement within an
input program to tell SPSS when to stop reading the file. So in our first
example, no END FILE command was necessary. This is because
normally SPSS reads the last line of data, inputs it as instructed, sees an
end–of–file marker in the raw data file, and writes out a working data file
automatically.
SUMMARY
This chapter provided a brief look at input programs via the use of a
simple example. We discussed the basics of writing such programs, and
the importance of controlling end of case and end of file processing. We
also discussed how to check and correct input programs. We then
attempted a complex input program that read a data file with missing
record and identification fields.
Topics
Introduction
Splitting a String Variable on a Delimiter
Reading Multiple Cases on the Same Record
An Existing SPSS Data File with Repeating Data
Print Command for Diagnostics
Practical Example: Consolidating Transactions
Appendix: Identify Missing Values by Case
INTRODUCTION
We have discussed how to read various non-rectangular files
in SPSS, using either input programs or complex file types. In
this chapter, we consider some examples of how to further
manipulate data in SPSS, and we discuss a few additional commands
often used in data transformation. The examples were chosen both to
illustrate potentially useful applications and to show the operation of
these commands.
SPLITTING A STRING VARIABLE ON A DELIMITER
Often users encounter ASCII data files that are comma delimited, rather
than in fixed or freefield format. Comma delimited means that the values
of variables are separated by commas. Many database software programs
create such files. An example of this type of file is displayed in Figure 5.1.
Figure 5.1 Comma–Delimited Data
Click on File..Open..Syntax
If necessary, switch directories to c:\Train\ProgSynMac
Double-click on COMMA
We are going to read the small dataset from Figure 5.1, which is
included as inline data in the program. This method will easily generalize
to any size file, and can be used to split an existing string variable into
new variables. We now discuss program execution, step by step.
INDEX(variable,test string)
The first COMPUTE uses the SUBSTR function. SUBSTR has this
format:
SUBSTR(variable,position,length)
COMPUTE: SLOTS(#I): Let’s see how this operates for the first
iteration of the loop. #COMMA is equal to 2, so this says to
take a substring of VAR, beginning in column 1, for one
column. This means to read the first column of information (a
“1”) and place it in the first element of SLOTS, because #I =1
(now you can see why we increment #I once each loop). So
now SLOTS1=1 and the first variable for the first case has
been created.
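The looping logic described above can be sketched as follows. This is not the COMMA program from the course files: the inline data, the width of VAR, and the use of four slots of width 5 are assumptions made to keep the example short.

```spss
* Sketch only: four slots of width 5 and the sample data are assumed.
DATA LIST / VAR 1-20 (A).
BEGIN DATA
1,23,,456
7,,89,10
END DATA.
STRING SLOTS1 TO SLOTS4 (A5).
VECTOR SLOTS = SLOTS1 TO SLOTS4.
COMPUTE #I = 0.
LOOP.
  COMPUTE #COMMA = INDEX(VAR, ',').
  COMPUTE #I = #I + 1.
  DO IF (#COMMA > 1).
    COMPUTE SLOTS(#I) = SUBSTR(VAR, 1, #COMMA - 1).
  ELSE IF (#COMMA = 0).
    COMPUTE SLOTS(#I) = VAR.
  END IF.
  COMPUTE VAR = SUBSTR(VAR, #COMMA + 1).
END LOOP IF (#COMMA = 0 OR #I = 4).
LIST.
```

On each pass, the text before the first comma goes into the next slot and VAR is shortened to what follows that comma. A comma in the first column leaves the slot blank, which is how missing values are preserved. #I is reset to 0 for each case because scratch variables keep their values from one case to the next.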
Highlight all the lines from DATA LIST to the first LIST
command
Click the Run tool button
The output from LIST shows that the values from the raw data have
been correctly placed in the appropriate variables, with missing data (a
blank for string variables) inserted in the right spots. The tenth variable
value is missing for both the first and second cases. To further illustrate
how the program is functioning, switch to the Data Editor window.
Note that the VAR column is narrowed in the Data Editor so both it
and the SLOT variables are visible.
This file is now a regular SPSS data file and ready to be used for
analysis, but all the variables are strings. To make the variables that are
numbers into numeric variables, we use the NUMBER function. The DO
REPEAT command simply affords an easy means to create a series of
eight COMPUTE statements. The NUMBER function requires two
arguments, the variable to be converted, and a format by which to read
the variable (here F5 for 5 characters, no decimal digits).
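A minimal sketch of this conversion, using two string variables rather than eight. The stand-in names STRVAR and NUMVAR, the new variable names NUM1 and NUM2, and the data are illustrative assumptions.

```spss
* Sketch only: two string slots and the NUM names are assumed.
DATA LIST FREE / SLOTS1 (A5) SLOTS2 (A5).
BEGIN DATA
12 345
7 89
END DATA.
DO REPEAT STRVAR = SLOTS1 SLOTS2 / NUMVAR = NUM1 NUM2.
  COMPUTE NUMVAR = NUMBER(STRVAR, F5).
END REPEAT PRINT.
LIST.
```

The PRINT keyword on END REPEAT asks SPSS to display the COMPUTE commands that the DO REPEAT structure generates.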
After these commands are executed, the Viewer window displays all
eight COMPUTE commands created from the DO REPEAT (not shown).
Figure 5.5 displays a portion of the output from LIST with the numeric
versions of eight of the SLOTS variables. Why do these variables all have
two decimal digits displayed?
Each case has identifying information in the first two fields. It then
has up to five fields of valid data, which could be customer orders,
procedures in a hospital, exam scores, and so forth. This type of structure
can be thought of as repeating data, because each case contains repeating
bits of information that are of the same type.
The user must tell SPSS how many repeating groups of data are in
each record, but often files of this type do not include such information, as
in Figure 5.6. This value can be created from the data, though, as we
demonstrate below. The sample program that reads these types of data is
named Repeat1.sps.
You must supply the number of repeating data groups for the
REPEATING DATA command, but there is no penalty in initially
estimating too many of these on the DATA LIST. We illustrate this by
having SPSS read up to six variables for the critical information even
though no more than five appear on any input record.
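A hedged sketch of this approach follows. The file name repeat1.dat, the column positions, and the variable NGROUPS are assumptions; only ID, GROUP, V1 to V6, and VARVALUE are names taken from the discussion.

```spss
* Sketch only: the file name and column positions are assumed.
INPUT PROGRAM.
DATA LIST FILE='repeat1.dat' / ID 1-2 GROUP 4 V1 TO V6 6-29.
COMPUTE NGROUPS = NVALID(V1 TO V6).
REPEATING DATA STARTS=6 / OCCURS=NGROUPS / DATA=VARVALUE 1-4.
END INPUT PROGRAM.
LIST VARIABLES = ID GROUP VARVALUE.
```

NVALID counts the non-missing values among V1 to V6 on each record, and that count tells REPEATING DATA how many groups to read. The column positions on the DATA subcommand are relative to the start of each repeating group.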
Twenty-one cases have been created by SPSS, for the 21 separate
repeating data values in the input file. There are four repeating groups
for the first record (117, 837, 947, and 594), so SPSS created four cases
from the first input record. Although no input record had six repeating
data groups, our specification of six input variables on the DATA LIST
command didn’t cause any difficulty. SPSS simply created a sixth
variable, V6, that is missing for every case. The critical information, in
addition to ID and GROUP, is the information stored in VARVALUE,
which came from the repeating data.
AN EXISTING SPSS DATA FILE WITH REPEATING DATA
What if an SPSS data file has already been created with the same format
as the ASCII data file in Figure 5.6? That is, what if you are faced with
an SPSS data file that has only six cases, with repeating data on each
case? Such a file is REPEAT.SAV.
Click on File..Open..Data (move to c:\Train\ProgSynMac)
Double-click on REPEAT
Click No when asked to save current data
You cannot use the REPEATING DATA command to restructure this file,
because that command, and input programs in general, operate on raw
ASCII data, not SPSS data files.
One option is to write the data out as an ASCII file and read it back
in as above, but this is cumbersome and time–consuming with large files.
A better alternative is to use SPSS programming to take this existing file
structure and modify it.
1 1 1.00 117.00
and waits for input from the next iteration of the loop.
In this manner, SPSS creates 21 cases from the original 6, one for
each valid repeating data group. The idea of looping through a record and
writing out the cases to an SPSS data file with XSAVE is the central idea
of this, and similar, SPSS programs.
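The idea can be sketched as below. We assume the repeating values in REPEAT.SAV are stored in variables named V1 to V5 alongside ID and GROUP; those names and the vector name VALS are hypothetical.

```spss
* Sketch only: variable names V1 TO V5 in REPEAT.SAV are assumed.
GET FILE='c:\Train\ProgSynMac\REPEAT.SAV'.
VECTOR VALS = V1 TO V5.
LOOP #I = 1 TO 5.
  DO IF (NOT MISSING(VALS(#I))).
    COMPUTE VARVALUE = VALS(#I).
    XSAVE OUTFILE='STACKDAT.SAV' / KEEP=ID GROUP VARVALUE.
  END IF.
END LOOP.
EXECUTE.
GET FILE='STACKDAT.SAV'.
LIST.
```

XSAVE is a transformation, so it can sit inside the loop and write one case per valid value. Nothing is written until the EXECUTE command forces a data pass.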
The program closes with END IF and END LOOP. We then need to
get the file STACKDAT.SAV to see the new structure.
Highlight all the lines from VECTOR to EXECUTE (do not run
the GET FILE command yet)
Click on the Run tool button
The output from LIST shows that we have created the desired file
structure. There are 21 cases in total, and we have only the variables we
need, unlike the program with the REPEATING DATA command, in
which variables were dropped to create the final file.
These examples have illustrated how there is often more than one
way to accomplish the same result in SPSS. If a file is small, it may make
little difference which method is used, but for larger files, efficiency is
often important (see Chapter 8). This example also demonstrates how an
PRINT COMMAND FOR DIAGNOSTICS
In the examples run thus far we frequently made use of the List
command to display values for specified variables in the active data file.
One disadvantage of the List command is that it is a procedure. As a
result, List cannot be placed within a loop or Do If structure, and would
be of limited value in determining what is happening within such
structures. Also, as a procedure the List command forces a data pass.
Thus if we insert three List commands in a program, three data passes
must be performed, which is inefficient.
To address these issues, SPSS has a transformation command called
Print. It provides the same general features as List, but is a
transformation. Print can be placed within Do If and loop structures, and
can provide detailed feedback about what occurs. However, be aware that
if a Print command is placed within a loop, then it will print a line of
output during each iteration of the loop for every case in the file. Such
output is extensive for large data files, so care must be taken. Below we
briefly demonstrate the use of the Print command by inserting one into
the program run earlier that read a comma-delimited file.
Click Run..All
Scroll up in the Viewer window to the lines beginning with
#comma=
The output shows the values for #comma (column position for the
next comma) and VAR (the string containing the data). In each iteration
of the loop we see how the first element (preceding the comma) of VAR is
stripped off, leaving the tail end (beyond the comma) to be processed in
the next step. Now we have access to just what is occurring within the
loop. Recall that the List command, although useful, can display only the
final result. Thus the Print command provides a glimpse into the
workings of loops, Do Ifs, and other transformations and input program
structures. Print is extremely useful when debugging programs
containing such structures or complex transformations.
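As a small self-contained sketch of Print inside a loop (the variable X, the scratch variables, and the data are illustrative and unrelated to the course files):

```spss
* Sketch only: X and the loop are illustrative.
DATA LIST FREE / X.
BEGIN DATA
2 3
END DATA.
LOOP #I = 1 TO 3.
  COMPUTE #POWER = X ** #I.
  PRINT / '#i=' #I ' #power=' #POWER.
END LOOP.
EXECUTE.
```

Print writes one line per loop iteration for every case, so this run produces six lines. Like other transformations, it does nothing until a data pass occurs, which is why the EXECUTE command follows.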
The first task is to obtain the 3-digit training course topic code from
the COURSE variable.
FORMAT: This changes the display and writing formats for the
course variables to fixed numbers one column wide. Recall
that the default would be a fixed number eight columns wide
with two decimals.
APPENDIX: IDENTIFYING MISSING VARIABLES BY CASE
SPSS via the Frequencies procedure can easily provide output to note the
categories and amount of missing information for each variable in a file.
What if, instead, you wished to know which variables were missing for
each case, and to automatically produce a report of this information? To
see an example of this type of analysis, examine Figure 5.23.
In this small file of 7 cases and 10 variables, we can see that
variables X7 and X9 are missing for case 1, X4 and X10 for case 2, and so
forth. This type of information can be very handy in searching for missing
data patterns and when doing multivariate analysis to understand what
is causing some cases to be dropped from an analysis.
To create the report in Figure 5.23, we need to have a file that has
one row for every case, which is already true in a standard SPSS data
file, so that certainly doesn’t sound difficult. But the tricky part is that
the best way to create the output in Figure 5.23 is to have a series of
variables—as many as in the original file—that are all strings, with the
values of these strings the variable names with missing data from the
original file.
To make this clear, we’ve displayed the target file in the Data Editor
in Figure 5.24. The fact that the values in the columns beginning with
the variable VNAMES1 are left justified indicates that these are string
variables. This file is what we need to create.
The first portion of the program reads in a small dataset with both
numeric and string variables. In this file, blanks represent missing data
in the lines between BEGIN DATA and END DATA. There are eight
numeric variables and two string variables. An ID variable is computed
by using $CASENUM, the SPSS system variable that records case
number.
This file is then saved by dropping the original X9 and X10 (the
strings), renaming NUM9 and NUM10 back to X9 and X10 so we can
retain the original variable names, and using the KEEP subcommand to
place all the variables in consecutive order in the file before ID. This is
done so that they can all be placed in one vector (because variables on a
vector must be contiguous in the data file).
With the original file saved in correct format, we now need to create a
file that stores information on which variables are missing for which
cases.
The critical part of the whole program is next. First we must get the
file DATA.SAV, which has the new file structure we need.
The Viewer echoes back the syntax from these commands. The
important thing is the information in MISSING.SAV. Figure 5.27 shows
this file. If you refer to Figure 5.24 you can see that there were two
missing variables for the first case, variables X7 and X9. Therefore,
MISSING.SAV has two rows for case 1, and the VARNUM for each
represents which variables have missing values, 7 and 9. The second case
has missing values for variables 4 and 10, and so forth.
We don’t need all the data in this file, just the labeling information,
so we will select out the first case and transpose the file.
We now have the file you see in Figure 5.29. This file has the variable
CASE_LBL that contains the original variable names. It also has a
column labeled VAR001 that was the first row of data, which is
unimportant. For purposes of matching this file back to the
MISSING.SAV file, we need to create a case ID (actually a variable ID).
Since the first ten values of CASE_LBL are X1 to X10, we can create such
a variable by using the SPSS system variable $CASENUM as before.
Compare the working data file in the Data Editor (not shown) to
Figure 5.27 that displays the MISSING.SAV file. We are going to match
the CASE_LBL information from the file VARNAMES.SAV by VARNUM
to each case in MISSING.SAV. In other words, for the first case in
MISSING.SAV, with VARNUM=7, the value of “X7” will be added to the
file. For the second case, the value of “X9” will be added, etc. This action
will place the correct variable names on MISSING.SAV. We do this
through a table match.
The hard part of our work is over. If you refer to Figure 5.24, you will
see a file with only seven cases, one for each of the original cases, and a
column for each of the ten original variables. To create such a file from
the current working data file, we need to use the AGGREGATE
command, breaking on ID, and spreading the CASE_LBL information
into separate variables.
The final step is to use AGGREGATE. The MAX function is one way
to take the values of VNAMES1 to VNAMES10 and place them into
variables with the same names (we could also have used the MIN
function). Although the VNAMES variables are strings, the MAX function
can be applied. The N function puts the number of cases (here the
number of missing variables) into the variable NMISS.
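A hedged sketch of the spreading and aggregation steps. The inline data stand in for the matched file and cover only two cases; the vector step that fills VNAMES1 to VNAMES10 is our assumption about how the names are spread before aggregation.

```spss
* Sketch only: inline data stand in for the matched MISSING.SAV file.
DATA LIST FREE / ID VARNUM * CASE_LBL (A8).
BEGIN DATA
1 7 X7
1 9 X9
2 4 X4
2 10 X10
END DATA.
STRING VNAMES1 TO VNAMES10 (A8).
VECTOR VNAMES = VNAMES1 TO VNAMES10.
COMPUTE VNAMES(VARNUM) = CASE_LBL.
AGGREGATE OUTFILE=* / BREAK=ID
  / VNAMES1 TO VNAMES10 = MAX(VNAMES1 TO VNAMES10)
  / NMISS = N.
LIST.
```

Within each ID, every VNAMES variable is blank on all rows except at most one, so MAX returns the one variable name that is present.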
We have created the target file structure, with one case for each case
in the original data file, and 10 string variables which store the names of
the missing variables for each case. A report can now be produced like
Figure 5.23.
Note: The SPSS Missing Values module produces a report similar to the output
from this program. It also contains algorithms for imputing (substituting
values) for missing data.
Topics
Introduction
Macro Basics
Macro Arguments
Macro Tokens
Viewing a Macro Expansion
Keyword Arguments
Using a Varying Number of Tokens
When Things Go Wrong
INTRODUCTION
Macros in most software programs are small routines of
commands that automate one or more tasks to make your work
more efficient and easier. Using a macro means not having to
construct commands each time you need to do a particular analysis or
data transformation task.
As noted in Chapter 2, SPSS macros are a bit different from those you
may have encountered in other software. Although the SPSS macro
facility has its own language, you don’t invoke a macro editor to create
macros. And SPSS macros don’t directly automate actions you take with a
mouse. Instead, SPSS macros are created like any other syntax file, then
executed in the same manner as other syntax. SPSS macros always
contain regular SPSS syntax in addition to the specialized macro
commands.
Introduction to Macros 6 - 1
MACRO BASICS
The structure of a macro is outlined below. Macros begin with the
DEFINE command and end with the !ENDDEFINE command. Macro
subcommands and keywords all begin with an exclamation point to
distinguish them from regular SPSS commands. The macro name (which
can also begin with an exclamation point) will be used in the macro call.
DEFINE macro name (arguments)
macro body
!ENDDEFINE.
INCLUDE 'PathToMacro\MACRO.SPS'.
MYMACRO arguments.
Highlighting these two commands and clicking on the Run button will
execute the macro. Again, no special macro editor needs to be invoked to
execute macros in SPSS. And a macro call does not have to immediately
follow a macro definition.
MACRO ARGUMENTS
There are many subcommands and specifications available in the macro
facility, but macro arguments are central to macro creation. Arguments
are input to the macro that will be supplied by the user, and they can be
of two types: keyword and positional.
MACRO TOKENS
Tokens are used in conjunction with arguments in macros. A token is a
character or group of characters that has a predefined function in a
specified context. That definition probably doesn’t seem too illuminating
because tokens are so varied in their operation. In essence, a token
declaration tells SPSS how to know or recognize which elements on the
macro call should be associated with this argument. The simplest token
definition is one that assigns the next n values (or tokens) to the
argument. It is better to see this in action where we can also discuss
other issues of macro syntax.
In the macro below, we’ll focus first on the argument and tokens. The
arguments to a macro must be enclosed in parentheses. The use of the
!POSITIONAL argument keyword tells SPSS that arguments, or input, to
the macro will be positional in the macro call. The !TOKENS (4)
specification declares that there will be four positional tokens.
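A sketch of such a macro, reconstructed from the description in this section rather than copied from the course file MACRO1. It assumes the GSS94 file is open so that the four variables exist.

```spss
* Sketch only: reconstructed from the description; the course file may differ.
DEFINE DESMACRO (!POSITIONAL !TOKENS(4))
DESCRIPTIVES VARIABLES = !1.
!ENDDEFINE.
DESMACRO EDUC SPEDUC AGE AGEWED.
```

When the macro is called, the four tokens that follow the macro name replace !1, so SPSS runs DESCRIPTIVES on EDUC, SPEDUC, AGE, and AGEWED.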
The diagram in Figure 6.1 may make the relationship between
arguments and tokens clearer. The keyword !POSITIONAL, an
argument, refers to where in the macro body the tokens will be placed.
Hence !1, the first positional argument, is placed after the equals sign.
The !TOKENS (4) specification on the argument is not referred to in the
macro body; instead, it tells SPSS how many elements in the positional
argument will be input by the user. In the macro call, the four variable
names are not just variable names, but also tokens.
We will use the 1994 General Social Survey file to demonstrate how
this macro functions. We open the GSS file and then run this macro.
Click File..Open..Data
Switch to the c:\Train\ProgSynMac directory if necessary
Double-click on GSS94
Then
Click File..Open..Syntax
Double-click on MACRO1 (not shown)
Figure 6.2 Descriptives Output from DESMACRO
What happens when you define the same macro twice? Let’s find out
by rerunning the commands.
Figure 6.3 Warning Message From Defining DESMACRO Twice
When you are writing macros you will undoubtedly see warning 6804
now and then. Warnings can be turned off with the SET command, but
we don’t recommend that approach, especially when debugging your
macro definitions. Clearly, this warning means that once defined in an
SPSS session, a macro need not be defined again.
In the Viewer, the lines beginning with "M>" appear due to the SET
command. The macro call appears just above these lines, and the
DESCRIPTIVES command constructed by the macro appears below. The
four positional tokens (EDUC, SPEDUC, AGE, and AGEWED) are
placed, in that order, on the DESCRIPTIVES command just after the
equals sign (which is where the !1 positional argument was in the macro
definition).
KEYWORD ARGUMENTS
Since arguments are central to macros, we next review the syntax for a
keyword argument. This type of argument is given a user-defined
keyword in the macro definition. In the macro body the argument name is
preceded by an exclamation point, so keyword arguments should be seven
characters or fewer in length. On the macro call, though, the keyword is
specified without the exclamation point (when and when not to use an
exclamation point is often a point of confusion for those new to macros).
Notice that !ARG1 appears after !ARG2 in the macro body. Keyword
arguments can appear in any order in the macro. When the macro is
called, the three variables are input as arguments, two for ARG1 and one
for ARG2. Two crosstabulations will be produced, PRES92 by SEX and
PRES92 by RACE.
When the output is produced, the first thing to observe is that the
macro expansion is displayed not just for the macro call, but also for the
macro definition itself. This is not a problem, but it can be annoying. You
can turn this off by placing a SET MPRINT OFF command before
DEFINE, and then a SET MPRINT ON before the macro call.
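Putting these pieces together, a keyword-argument macro with the MPRINT settings placed as suggested might look like the sketch below. It is reconstructed from the description and may differ from the course file; it assumes the GSS94 file is open.

```spss
* Sketch only: reconstructed from the description; the course file may differ.
SET MPRINT OFF.
DEFINE TABX (ARG1 = !TOKENS(2) / ARG2 = !TOKENS(1))
CROSSTABS TABLES = !ARG2 BY !ARG1.
!ENDDEFINE.
SET MPRINT ON.
TABX ARG1 = SEX RACE ARG2 = PRES92.
```

In the definition and on the call the keywords appear without an exclamation point; only in the macro body are they written as !ARG1 and !ARG2.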
If you scroll down a bit, you will see the table in Figure 6.8. The
macro facility created the CROSSTABS command shown in Figure 6.7,
which when executed by SPSS created two tables. In the first table, we
see that females were more likely to vote for Clinton in 1992, and males
more likely to vote for Perot.
Figure 6.8 Crosstabs of 1992 Presidential Vote and Sex
USING A VARYING NUMBER OF TOKENS
What if you wish to use a different number of inputs, or tokens, to the
macro arguments? You might want to create crosstab tables with four
row variables and three column variables the next time you call TABX.
This is easily accomplished with the !CHAREND or the !ENCLOSE
keywords, both of which we will briefly review.
Figure 6.9 CORMACRO Macro Definition
The first call of the macro assigns the variables AGE and EDUC to
the first positional argument. This is because the two variables are
followed by a colon, which tells the macro facility that input to the first
positional argument has ended. The second positional argument has only
PRESTG80, ended by a slash.
You can see the macro expansion in the Viewer and the
CORRELATIONS command created by the expansion. Below this is the
correlation output, where education is, reassuringly, highly correlated
with occupational prestige, while age is not. The key point is that SPSS
knew where to end each set of tokens by using ending characters. This
means that with this feature any number of variables can be specified.
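A sketch of a definition consistent with this description. The ending characters, the WITH keyword and the first call come from the text; the exact body is an assumption, since Figure 6.9 is not reproduced here.

```spss
* Each positional argument reads tokens until its ending character.
DEFINE CORMACRO (!POSITIONAL !CHAREND(':')
                /!POSITIONAL !CHAREND('/'))
CORRELATIONS VARIABLES = !1 WITH !2.
!ENDDEFINE.
* The colon ends the first list and the slash ends the second.
CORMACRO AGE EDUC : PRESTG80 /.
```

Because the lists are delimited by characters rather than counted, either list can hold any number of variable names.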
Figure 6.10 Output From Call of CORMACRO
The macro call does not end with a slash after TVHOURS, the second
variable for the second positional argument. However, the macro worked
perfectly, creating the two by two correlation matrix you see in Figure
6.11.
Figure 6.11 Output From Second Call of CORMACRO
Figure 6.12 CORMAC2 Macro Definition
The macro body is identical to the previous macro. The macro call of
CORMAC2 puts three variables between the first set of brackets. They
will be placed on the left of the WITH keyword in the CORRELATIONS
command. The second list of variables between brackets will be placed at
the second positional argument, to the right of the WITH keyword.
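A sketch of the !ENCLOSE form. The brackets and the macro name come from the text; the variable lists on the call are assumptions chosen for illustration, since Figure 6.12 is not reproduced here.

```spss
* Each positional argument is delimited by a pair of brackets.
DEFINE CORMAC2 (!POSITIONAL !ENCLOSE('[',']')
               /!POSITIONAL !ENCLOSE('[',']'))
CORRELATIONS VARIABLES = !1 WITH !2.
!ENDDEFINE.
* The first bracketed list goes left of WITH and the second goes right.
CORMAC2 [AGE EDUC TVHOURS] [PRESTG80 SPEDUC].
```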
Figure 6.13 Output From Call of CORMAC2
So what happens when the user calls a macro with the wrong type of
input? One possible mistake is to supply extra tokens in a list. In
Figure 6.14 you see the TABX macro with an error on its call. Instead of
one token on ARG2 we have two, PRES92 and VOTE92. If we execute
this call, what will SPSS do?
Figure 6.14 Misspecified Macro Call: Extra Token
Not surprisingly, the command doesn’t work, but perhaps not in the
way we might have expected. SPSS expands the macro and doesn’t
complain upon expansion. As is evident in Figure 6.14, PRES92 is placed
in the correct spot before the BY keyword, and VOTE92 is ignored while
the rest of the command is built. Then, rather than throwing away this
extra specification, SPSS simply adds it to the end of the command and
then puts in a period. This causes a problem because it now appears that
VOTE92 is a part of the CELLS subcommand, which it certainly is not.
SPSS then provides a warning message when it tries to run the
constructed command.
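The behavior just described can be sketched as follows. The TABX body repeats the assumed definition (the CELLS specification is an assumption), and the expansion shown in the closing comments is what the description above implies, not captured output.

```spss
DEFINE TABX (ARG1 = !TOKENS(2)
            /ARG2 = !TOKENS(1))
CROSSTABS TABLES = !ARG2 BY !ARG1
  /CELLS = COUNT COLUMN.
!ENDDEFINE.
* ARG2 is declared with one token, but two are supplied here.
TABX ARG1 = SEX RACE ARG2 = PRES92 VOTE92.
* Per the description above, the command that gets built would read as follows.
*   CROSSTABS TABLES = PRES92 BY SEX RACE /CELLS = COUNT COLUMN VOTE92.
```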
And if instead we invoke this same macro with no tokens for ARG2,
the result is another problem, as in Figure 6.16. The CROSSTABS
command has no variables before the BY keyword, but the macro
expansion didn’t complain about the missing argument. SPSS, of course,
did complain when it processed the command, with an accurate warning
message about the problem.
Figure 6.17 Error in Macro Definition
SUMMARY
This chapter has reviewed the basics of macro definition and calls,
focusing on arguments and tokens. The macros in this chapter were
simple so that we could focus on understanding macro construction. The
next chapter will introduce more complex macro features and more
typical uses of macros.
Topics
Introduction
Looping in Macros
Producing Several Clustered Bar Charts
Double Loops in Macros
String Manipulation Functions
Direct Assignment of Macro Variables
Conditional Processing
Creating Concatenated Stub and Banner Tables
Additional Recommendations
INTRODUCTION
The macro facility is very powerful and offers several additional
features that were not discussed in the previous chapter. For
example, since macros essentially manipulate and create strings,
the macro facility offers several functions that manipulate strings,
including some that are equivalent to the string functions available in
SPSS. Macros can also do looping to accomplish repetitive tasks and
conditional processing (IF-like statements). Also, macros permit
assignment of values to macro variables either as constants or through
the evaluation of an expression.
Advanced Macros 7 - 1
There are two types of loop constructs in macros: the index loop and
the list processing loop. These constructs allow the user to iterate over
just about anything, including variables, numbers, file names, or
procedures.
The list processing loop also begins with !DO and ends with !DOEND,
but has this syntax:
!DO !VAR !IN (list) ... !DOEND
Here, the loop iterates for as many elements as there are in the list,
which is usually a macro argument (specifically, the tokens for that
argument). On each iteration of the loop, the value of !VAR is set to one of
the values of the list, in order.
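The two loop constructs can be sketched side by side. LOOPDEMO, its VARS argument and the NEWVAR names are hypothetical and exist only to show the two forms of !DO.

```spss
* The first loop below is an index loop that counts from 1 to 3.
* The second is a list processing loop over the tokens passed in VARS.
DEFINE LOOPDEMO (VARS = !CMDEND)
!DO !N = 1 !TO 3
COMPUTE !CONCAT(NEWVAR, !N) = !N.
!DOEND
!DO !V !IN (!VARS)
FREQUENCIES VARIABLES = !V.
!DOEND
!ENDDEFINE.
LOOPDEMO VARS = AGE EDUC.
```

Under these assumptions the call would build three COMPUTE commands (NEWVAR1 to NEWVAR3) followed by one FREQUENCIES command for each variable in the list.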
PRODUCING SEVERAL CLUSTERED BAR CHARTS
The first macro illustrates the list processing looping construct.
Click File..Open..Data
Switch directories to c:\Train\ProgSynMac if necessary
Double click on GSS94
Now that we have the data file, we’ll open the syntax file.
Click File..Open..Syntax
Double click on MACRO2
can be frustrating when a user wants to produce several bar charts at
once. The first macro in MACRO2.SPS, displayed in Figure 7.1, solves
this problem by allowing several clustered bar charts to be specified at
once.
Note We illustrate using standard graphs because the basic Graph command is
simpler than an IGraph command. For those using Interactive Graphs,
equivalent macros using Interactive Graphs are included at the end of
the Macro2.sps Syntax file.
Only one token is named for CLUS on the macro call, so there will be
one clustering variable. But on each iteration of the loop, a token from
CAT will be placed in the position of !I. The macro call names two tokens
(here variable names) for CAT, so two GRAPH commands will be
constructed, one for each token.
The macro body ends with !DOEND first to close the loop, then
!ENDDEFINE to close the definition.
On the macro call, the variables NATHEAL and NATENVIR (asking
about the level of federal spending on these issues) are read as values
(tokens) of CAT, and the variable CLASS as the value of CLUS.
If you scroll to the second bar chart (not shown) you can see that the
macro call did indeed produce two graphs.
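A sketch of a macro matching this description. The argument names CAT and CLUS, the loop variable !I and the call variables come from the text; the macro name BARMAC and the way each argument is delimited are assumptions, since Figure 7.1 is not reproduced here.

```spss
* CAT reads tokens up to the slash and CLUS reads a single token.
DEFINE BARMAC (CAT = !CHAREND('/')
              /CLUS = !TOKENS(1))
!DO !I !IN (!CAT)
GRAPH /BAR(GROUPED) = COUNT BY !I BY !CLUS.
!DOEND
!ENDDEFINE.
* Two category variables give two clustered bar charts.
BARMAC CAT = NATHEAL NATENVIR / CLUS = CLASS.
```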
The ability to loop over variable names is very powerful and allows
you to create a huge amount of output very quickly. Below we rewrite the
macro to allow additional variables for the CLUS argument.
DOUBLE LOOPS IN MACROS
Our next example directly extends the macro we just examined to allow
multiple variables for both the Category Axis (Horizontal or X-Axis
variable for Interactive Graphs) and Define Clusters By (by default, Color
variable for Interactive Graphs) variables. Again we present the example
using a simple Graph command, and the equivalent macro using an
IGraph command appears at the end of the Macro2.sps file.
Both of the macro loop variables, !I and !J, appear in the GRAPH
command. As before !I is substituted in the position of the Category Axis
variable (after the first BY keyword), but now !J appears in the position
of the Define Clusters By spot (after the second BY). This position was
formerly occupied by the !CLUS argument itself.
Since there are two !DO structures, there must be two !DOENDs; the
inner !DOEND ends the inner (!J) loop and the outer !DOEND ends the
outer (!I) loop. If we reversed the order of the two loops, that is, if !DO !J
preceded !DO !I, the same set of Graph commands (and graphs) would be
produced, but in a different order. This is because the inner loop iterates
completely through its range or set of values for each successive value of
the outer loop. The macro definition is closed with an !ENDDEFINE.
The macro call names two tokens for CAT (NATHEAL and
NATENVIR) and two for CLUS (CLASS and MARITAL), so four GRAPH
commands will be constructed, one for each combination of a CAT and
CLUS token.
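The nested version can be sketched in the same way. BARMAC2 and the argument delimiters are assumptions; the loop variables, argument names and call variables come from the text.

```spss
* CAT reads tokens up to the slash and CLUS reads the rest of the call.
DEFINE BARMAC2 (CAT = !CHAREND('/')
               /CLUS = !CMDEND)
!DO !I !IN (!CAT)
!DO !J !IN (!CLUS)
GRAPH /BAR(GROUPED) = COUNT BY !I BY !J.
!DOEND
!DOEND
!ENDDEFINE.
BARMAC2 CAT = NATHEAL NATENVIR / CLUS = CLASS MARITAL.
```

The inner loop runs through CLASS and MARITAL for NATHEAL first, then again for NATENVIR, giving four GRAPH commands in that order.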
Figure 7.4 Bar Chart Produced from Macro with Double Loop
STRING MANIPULATION FUNCTIONS
Since macro expansion creates strings, it is only natural that the macro
facility supplies about a dozen string manipulation functions to create
exactly the command needed. Some of the most commonly used are
these.
!LET !X = 5
!LET !Y = !CONCAT(ABC,!1)
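To see the two assignments in context, here is a small sketch. LETDEMO is a hypothetical macro, and the COMPUTE command is added only so that the assigned values appear in a generated command.

```spss
* The two assignments create macro variables inside the definition.
DEFINE LETDEMO (!POSITIONAL !TOKENS(1))
!LET !X = 5
!LET !Y = !CONCAT(ABC, !1)
COMPUTE !Y = !X.
!ENDDEFINE.
LETDEMO 1.
```

Under these assumptions the call would build COMPUTE ABC1 = 5.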
CREATING CONCATENATED STUB AND BANNER TABLES
Using multiple categorical variables in both the rows and columns of a
table is very common. The simplest possible TABLES syntax to create
such a table, for a set of generic variables A through F with column
percents, is this:
TABLES
/TABLE= A + B + C BY D + E + F
/STATISTICS cpct( :D E F ).
The macro, called STB_BAN, will create a table with any number of
variables on the stub and banner. It also allows the user to specify a title
for the column percent statistic and an overall table title.
number of variables, there is no way to specify ahead of time how many
plus signs are needed. That is the chief problem the macro solves.
Two macro variables, !ST and !BA, which we have not defined yet, are
placed in the position of the stub and banner variables. This seems odd
since we have defined STUB and BANR as arguments to contain these
same variables. In fact, !BANR is used after the colon in parentheses on
the STATISTICS subcommand to tell SPSS to calculate column
percentages. We can use !BANR in the latter position because the
variables appear together with no intervening commas or plus signs, but
that is not true in the TABLE subcommand.
Four !LET commands follow the DEFINE command. The first two set the
macro variable !STCPY equal to STUB and !BACPY equal to BANR. In
other words, they make copies of the variables that the user inputs. The
third and fourth !LET commands set the new macro variables !ST and
!BA (the ones that will actually be used in the TABLES command) to
!NULL, which is an empty string of length 0.
The macro then has two similar sections, one for the stub variables
and one for the banner variables.
!DO: This is a list processing loop, with the variable !S taking on
successive values of the tokens in !STUB. To follow the macro
through, we will use the first token in !STUB, which is
DRINK. That is the first value of !S.
!LET !ST: !ST, which begins as a null string, is set equal to the
concatenation of itself, plus the first token in !STCPY, which
is DRINK, plus one blank. So !ST = “DRINK ”. So far, so good.
!LET !STCPY: The !TAIL function returns all tokens except the
first, so !STCPY now consists of SMOKE and VOTE92.
Exactly the same process is used to create the syntax for the banner.
Then in the TABLES command within the macro body, quotes are placed
around the title, which is input as the token for TTL, using the !QUOTE
function.
Switch to the Viewer window
The macro worked perfectly, placing three variables each on the stub
and banner and creating the command you see in Figure 7.6. The token
“Percent” was substituted for !COLSTAT, and the TITLE subcommand
was created with the title “Drinking, Smoking, and Voting” placed in
quotes. The !BANR argument after the colon placed the three banner
variables in that spot.
Scrolling down a bit will display the table created from this
command. There are interesting relationships between the three
demographic variables and the three behavioral variables in the stub.
Figure 7.7 Table Created From STB_BAN Call
One oddity you may have noticed is the large number of lines between
the macro definition in the Viewer, and the macro call and table. A
portion of these lines is shown in Figure 7.8. This occurs because of all
the periods ending the commands in the macro body, such as on the eight
!LET commands.
!LET !STCPY = !Stub !LET !BACPY=!BANR !LET !ST=!NULL
!LET !BA=!NULL
Not only are several commands placed on the same line, but also the
line doesn’t end with a period. You can use this feature if you wish, and it
will reduce the amount of output, but usually writing macros in this
fashion makes them harder to read and debug, so it may not be the best
practice except for the experienced user.
ADDITIONAL RECOMMENDATIONS
To ease production of macros we suggest the following steps for your first
few macros.
SUMMARY
We’ve discussed more advanced macro commands and functions, and
we’ve reviewed three programs that use many of these features to
accomplish common tasks.
INTRODUCTION
In previous chapters we have introduced the basic syntax of the SPSS
macro language and have demonstrated macro use with a series of
examples. In this chapter, we extend the discussion by presenting
several macro features that we feel may not be obvious, but have proven
to be very useful in practice. These examples are based on programs
written by the SPSS Consulting group for customer applications. The
macros presented here have been simplified so we can easily focus on the
salient points.
One of the macros uses an input program to generate a data file for
testing purposes. A second pairs together crosstabulation tables and
clustered bar charts based on the same variables. The final example
demonstrates how macro logic can be used to include or exclude SPSS
commands from a program. These serve to illustrate how macros are used
in practice.
Macro Tricks 8 - 1
The MAKEDAT macro contains two arguments, one for the number of
cases (NCASES) and one for the number of variables (NVARS). For each
argument a single token (TOKENS(1)) is expected, which makes sense,
since only a single value need be given for the number of cases or the
number of variables.
INPUT PROGRAM: If we focus first on the input program
(between the INPUT PROGRAM and END INPUT
PROGRAM commands), we see it is composed of two loops,
one to create cases and the other to create variables within a
case.
SET SEED: This is not required, but you can specify the starting
point of the pseudo-random number generator by providing a
large integer value for the SEED. This is mainly used to
permit you to reproduce the same data values later, if desired.
When the MakeDat macro is called, it will create a data file
containing 50 variables (!NVARS) and 1,000 cases (!NCASES).
Click Run..All
Switch to the Viewer window
The argument value for !NCASES (1000) appears in the first LOOP
(outer loop) command. Also, the argument value for !NVARS (50) has
been substituted into the VECTOR, second LOOP (inner loop) and
VARIABLE LEVEL commands.
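A sketch of a macro along these lines. The macro name, the two arguments and the commands involved come from the text; the vector name V, the seed value, the random-number expression and the ORDINAL level are assumptions, since the original figure is not reproduced here.

```spss
* Generates random integers from 1 to 5 for a test file.
* The vector name, seed and measurement level are assumptions.
DEFINE MAKEDAT (NCASES = !TOKENS(1)
               /NVARS = !TOKENS(1))
SET SEED = 2000000.
INPUT PROGRAM.
VECTOR V(!NVARS).
LOOP #I = 1 TO !NCASES.
LOOP #J = 1 TO !NVARS.
COMPUTE V(#J) = TRUNC(RV.UNIFORM(1, 6)).
END LOOP.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
VARIABLE LEVEL V1 TO !CONCAT(V, !NVARS) (ORDINAL).
EXECUTE.
!ENDDEFINE.
MAKEDAT NCASES = 1000 NVARS = 50.
```

The outer loop writes one case per pass through END CASE, and the inner loop fills each element of the vector before the case is written. The V1 TO Vn list in this sketch assumes that NVARS is at least 2.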
Figure 8.3 Random Data Produced by Macro
We see values (1 through 5) for the first few variables (of 50) and
cases (of 1,000).
Extensions
The example presented was simple in that no relationships were imposed
on the created variables. You might want to create a data set that
demonstrates specific relationships, for example, strong or weak
correlations. Those interested in examining a more complex variation, in
which the variables are first forced to be uncorrelated and then a
relationship is imposed, should examine the Syntax file AtSimdat.sps. It
generates a data file containing a user-specified number of categorical
and continuous predictor variables that are related to an outcome
variable. In addition, the number of categories present in the categorical
variables is passed as an argument.
The challenge here is to produce just the desired tables and charts in
the proper order. Specifically, if four variables were given, A, B, C and D,
then we want a crosstab followed by a bar chart displaying A by B, A by
C, A by D, then B by C, B by D, and finally C by D. Thus each unique
pairing of the variables on the list should produce a table and chart,
which should follow each other in the Viewer window.
As you, no doubt, suspect at this point, loops will be involved.
Although we have seen loops within macros earlier, here the same list of
variables must drive two separate loops in a coordinated fashion. In order
to accomplish this, we will make use of the !TAIL macro function
reviewed in Chapter 7.
!LET !TAILIST: The macro variable !TAILIST is set equal to the
list of variables passed as an argument to the macro. It will
be used later to store the tail (all tokens except the first) of
the list of variables.
!DO !COLVAR !IN (!VLIST): This is the outer loop within the
macro. !COLVAR is a macro variable that stores the variable
name used as the column variable in the crosstab table and the
cluster variable in the bar chart. At each iteration of the loop
the next variable from !VLIST is assigned to !COLVAR.
!DO !ROWVAR !IN (!TAILIST): The inner loop will assign, during
each iteration, a value from !TAILIST to !ROWVAR. The
variable name stored in !ROWVAR will be paired with the
variable name in !COLVAR to create the table and chart.
In this way, at each outer loop iteration the first variable name on the
list of remaining variables (!TAILIST) is dropped from this list and
becomes the !COLVAR.
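The coordinated loops described above can be sketched as follows. The macro variables !TAILIST, !COLVAR, !ROWVAR and the argument VLIST come from the text; the macro name PAIRTAB, the argument delimiter and the exact CROSSTABS and GRAPH specifications are assumptions, and A B C D are the placeholder names used earlier.

```spss
* The tail list is shortened once per outer iteration.
* Each variable is therefore paired only with the ones after it.
DEFINE PAIRTAB (VLIST = !CMDEND)
!LET !TAILIST = !VLIST
!DO !COLVAR !IN (!VLIST)
!LET !TAILIST = !TAIL(!TAILIST)
!DO !ROWVAR !IN (!TAILIST)
CROSSTABS TABLES = !ROWVAR BY !COLVAR.
GRAPH /BAR(GROUPED) = COUNT BY !ROWVAR BY !COLVAR.
!DOEND
!DOEND
!ENDDEFINE.
PAIRTAB VLIST = A B C D.
```

On the last outer iteration the tail list is empty, so the inner loop builds no commands for the final variable.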
Click Run..All
THE CASE OF THE DISAPPEARING COMMAND
So far we have used macros to build commands. However, macro logic can
be used to build or not build certain commands based on arguments
passed to the macro. This can be useful if you wish data selection or
modification done only when requested. We illustrate this in a macro that
runs a simple human resources report for employee data. If an age
argument is passed to the macro, then a data selection command
(SELECT IF) will be run based on the age value. If no age argument is
passed to the macro, then no data selection command is built. In addition,
if age selection is performed, then a note is added to the title of the
summary report (CASE SUMMARIES procedure).
The data file contains employee information. The macro will generate
a report containing summaries of education, current salary and time in
current job position for subgroups. Age is included in the file, but is not
visible in Figure 8.6.
Figure 8.7 MacroSelect Syntax File
age information. If !AGE equals ‘ALL’, then no TEMPORARY
and SELECT IF commands are created, and a standard title
string is used.
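A sketch of the idea, not the macro from the course file. The arguments GROUP and AGE come from the text, but this version tests for a missing argument with !NULL instead of a default of ALL, and the macro name HRREPORT, the variable names, the selection condition and the titles are all assumptions.

```spss
* The selection commands are built only when AGE is supplied.
* Variable names and title text are assumptions for illustration.
DEFINE HRREPORT (GROUP = !TOKENS(1)
                /AGE = !TOKENS(1))
!IF (!AGE !NE !NULL) !THEN
TEMPORARY.
SELECT IF (AGE >= !AGE).
!IFEND
SUMMARIZE
  /TABLES = EDUC SALARY JOBTIME BY !GROUP
  /FORMAT = NOLIST TOTAL
!IF (!AGE !NE !NULL) !THEN
  /TITLE = 'Employee Report (Age Selection Applied)'
!ELSE
  /TITLE = 'Employee Report'
!IFEND
  /CELLS = COUNT MEAN.
!ENDDEFINE.
HRREPORT GROUP = GENDER.
HRREPORT GROUP = JOBCAT AGE = 40.
```

The first call builds only the SUMMARIZE command with the standard title; the second also builds TEMPORARY and SELECT IF and uses the annotated title.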
Click Run..All
Scroll to the first Summarize pivot table
Note that the Case Summaries pivot table and Title items are hidden
(double-click on item in Outline pane to hide) in these screen shots.
Figure 8.9 Report with Gender Groups and No Age Selection
Figure 8.10 Report with Age Selection and Education Groups
Exercises
All exercise files for this class are located in the c:\Train\ProgSynMac
folder on your training machine. If you are not working in an SPSS
Training center, the training files can be copied from the floppy disk that
accompanies this course guide. If you are running SPSS Server (click
File..Switch Server to check), then you should copy these files to the
server or to a machine that can be accessed from (mapped from) the
computer running SPSS Server.
Field          Columns
Record Type    1
Household ID   2-5
Income group   8
# Members      11-13

Field          Columns
Record Type    1
Household ID   2-5
Age            6-8
Sex            13
Rating 1       18
Rating 2       23
Rating 3       28
b) Obtain means for age and the rating variables broken down by income
group.
Variable     Position
ID           1-5
SEX          6-10
Product1     11-15
P1Rating1    16-20
P1Rating2    21-25
P1Rating3    26-30
P1Rating4    31-35
P1Rating5    36-40
Product2     41-45
P2Rating1    46-50
P2Rating2    51-55
P2Rating3    56-60
P2Rating4    61-65
P2Rating5    66-70
Product3     71-75
P3Rating1    76-80
P3Rating2    81-85
P3Rating3    86-90
P3Rating4    91-95
P3Rating5    96-100
a) Read the data so that there is one case per product rated by a
customer, with variables ID, SEX, PRODUCT, and RATE1 to RATE5.
a) Open the TrainTransact.sav SPSS data file. Open the Syntax file
Rating.sps and execute the commands, which will compute a RATING
variable with values 1 to 100. Open the TransactionAgg.sps Syntax file.
Modify the program so that if a customer takes a course, the rating is
stored under the course name variable in the aggregated file. If a course
was not taken by the customer, that course variable’s value in the
aggregated file should be system missing.
c) Modify the final version of the macro created in the previous step, so it
accepts a second argument, named STATS, which will be a list of the
statistics to be displayed in the Descriptives output.
d) For those with extra time: Test the TestDes macro with the arguments
in different orders. If the macro fails due to argument order, modify the
macro so that argument order does not matter.
a) Write a macro that takes one argument: a variable list. It should run a
Frequencies command, using all the variables appearing on the variable
list.
c) Test to see if your macro will work with upper and lower case letters
(YES, Yes, yes). If not, use the !UPCASE function so it works
regardless of case.
a) Open the MacroSelect.sps Syntax file used in the chapter. Modify the
macro so that in place of the age argument it takes a YEAR (4-digit)
argument. If this argument is specified, only those born after the
specified year will be included in the report (hint: use the XDATE.YEAR
SPSS function to obtain the year from the bdate variable).
b) Modify this macro so the YEAR argument will take two values, a
beginning year and an ending year. If this argument is used, only those
born within the range of years given (including the years specified) will
be included in the report. Save the macro as ModMacroSelect.sps.
c) For those with extra time: Modify the macro so that GROUP will accept
a list of variables, and will use them to create additional subdivisions in
the report. (Hint: the BY keyword must separate each grouping variable
from the next in the SUMMARIZE command).