Analysing Quantitative Data Using SPSS 10 For Windows
1 Introduction
1.1 Aims:
· during the class, work your way through exercises 1 to 16 (excluding exercise 3), following the instructions as
requested. The symbol ý usually means you should undertake some work away from the computer or check
that you have already undertaken some tasks on the computer. The symbol þ usually means that you should
issue a command or series of commands to the computer; this usually means pointing and clicking with the
mouse's left button;
· Exercises 17 and 18 are designed to help you analyse your own data in SPSS¹ for Windows²;
· during the class if you get stuck ask for help;
· note: these notes assume the user is familiar with a Windows package such as Word for Windows or Excel³.
2 Background
2.1 SPSS for Windows
SPSS for Windows is a powerful computer package providing statistical analyses and data management. The
SPSS suite of programs is one of the most widely used statistical analysis packages in the world.
Before data are analysed in SPSS it is necessary to understand the type of data as this will affect the analysis used.
Categorical: categorical data consist of values that cannot be expressed numerically but can be grouped into
categories; for example, gender, which can be grouped into male and female.
Quantifiable: quantifiable data consists of values that can be expressed numerically as quantities; for example
year of birth.
Discrete, where individual items of numeric data can have one of a finite number of values within a
specified range; such as spinal column point for the variable salary scale. The value can usually be
counted and it changes in discrete units, in this case whole numbers. In some instances discrete data may
be rank data, for example the order a group of people finished in a race.
Continuous, where numeric data is not restricted to specific values and is usually measured on a
continuous scale; such as journey to work distance (in km).
With such data it is possible to tell the interval between the data values for different cases; for example,
the interval between a journey to work of 15 km and another of 22 km is 7 km.
¹ SPSS is a registered trademark of SPSS Inc.
² Windows is a registered trademark of Microsoft Corporation.
³ Word for Windows and Excel are registered trademarks of Microsoft Corporation.
N.B. Observed values of a continuous variable always appear discrete due to limitations of the equipment
used for measurement (e.g. a car odometer).
One potentially confusing aspect of SPSS is that all data are usually coded numerically (e.g. 1 = male). Although
it appears less meaningful to code such responses numerically, it is better from a data manipulation point of view,
since SPSS can only recode values automatically when the codes are numeric.
The data file used in these exercises (teach.sav) consists of data about 347 people recruited to work for a UK local
authority over a ten-year period, the vast majority of the data having been recorded at the time of their appointment.
The data refer predominantly to non-manual employees, although there are a few manual employees. The data have
been anonymised in a variety of ways and all locational data have been amended to preserve confidentiality.
The data file can best be thought of as a large spreadsheet with each column representing a variable for which data
are available and each row representing that data for an individual or case:
Thus for the table above row 1 represents a person who has gender code 2 (female), was born in 1967, has marital
status code 1 (single), was educated up to code 5 (O'level / GCSE grade C or above), and professional
membership code 3 (none). The data then continue to the right for further variables. The symbol "." is the SPSS
symbol for missing data; this is discussed in more detail in Help 17.4. A full list of variables and their codes is
given in Appendix 1.
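To see how numeric codes and their value labels fit together, the following Python sketch may help. It is purely illustrative and is not how SPSS stores a .sav file: it holds row 1 of the example as a dictionary of codes, uses a subset of the value labels in Appendix 1, and represents the SPSS missing-data symbol "." as None. The names VALUE_LABELS and describe are hypothetical.

```python
# Illustrative only: this is not how SPSS stores a .sav file.
# The value labels below are a subset of Appendix 1.
VALUE_LABELS = {
    "gender": {1: "Male", 2: "Female"},
    "marital": {1: "Single", 2: "Married", 3: "Widowed", 4: "Divorced"},
    "educate": {
        1: "Postgraduate Study",
        2: "Up to Degree Level",
        3: "Up to H.N.C. / D or Diploma",
        4: "Up to A level",
        5: "Up to O level",
        6: "Up to C.S.E.",
        7: "No educational qualifications stated",
    },
}


def describe(case):
    """Translate one case's numeric codes into value labels.

    None stands in for SPSS's missing-data symbol "."; codes without a
    value label (such as year of birth) are returned unchanged.
    """
    described = {}
    for variable, code in case.items():
        if code is None:
            described[variable] = "missing"
        else:
            described[variable] = VALUE_LABELS.get(variable, {}).get(code, code)
    return described


# Row 1 of the example: female, born 1967, single, educated up to O level.
row1 = {"gender": 2, "born": 67, "marital": 1, "educate": 5}
print(describe(row1))
```

Keeping the codes numeric and looking up the labels only when displaying them mirrors why SPSS stores 1 and 2 rather than "Male" and "Female".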
SPSS for Windows follows the conventions used in other Windows applications, making use of a variety of menus
and dialogue boxes. This means you rarely have to use the keyboard other than for entering data, or for naming
specific variables.
þ Power up your machine (switch it on!) and your normal screen will appear
After you click on the Start button in the bottom left of your screen, SPSS is usually found somewhere in the
Programs option.
þ Click on the option to load and run SPSS. This will take some time so be
patient!
þ If a dialogue box appears when SPSS opens, click on the button at the bottom of the dialogue box to remove it.
You should now have an Untitled SPSS Data Editor screen.
End of Exercise 1
3.1 The SPSS Windows
When you load and run the SPSS package it opens up a menu bar and two views. These are the Data View
(currently visible) and the Variable View.
Menu Bar: This provides a selection of options (File, Edit, View, Data ...) which allow you, for example, to open
files, edit data, generate graphs, create tables and perform statistical analyses. As in other Windows packages,
selecting from this menu bar provides further pull-down menus and dialogue boxes.
Data View: This sheet contains your data (once you have entered it!), each column representing a variable for
which data are available and each row representing that data for an individual or case. At present this sheet
should be blank. As this sheet is currently selected its name on the tab at the bottom is in bold.
Variable View: At present this sheet is not visible as it is not active; consequently its name is not in bold.
Don't click on the tab to look at this sheet yet; we will do that later.
File is used to access any files, whether you want to Open an existing SPSS file, read data in from another
application such as Excel or dBase, or start a New file. It is also the menu option you choose to Save files.
Edit can be used to alter data or text in the Data View or the Variable View.
View can be used to alter the way your screen looks. Please leave this on the default settings.
Data is used to define variables and make changes to the data file you are using.
Transform is used to make changes to selected variable(s) in the data file you are using. This can include
recoding (Recode) existing variables and computing (Compute) new variables.
Analyze is used to undertake a variety of analyses, such as producing Reports, calculating Descriptive Statistics
such as Frequencies and Crosstabs (crosstabulations) and associated summary statistics, as well as various
statistical procedures such as Regression and Correlation.
Graphs is used to create a variety of graphs and charts such as Bar, Line and Pie charts.
Utilities is for more general housekeeping, such as changing display options and fonts, and displaying information
on variables.
Help is a context sensitive help feature which operates the same way as other Windows packages.
SPSS for Windows saves data files using a filename of up to 8 characters and the file extension .SAV, for example
teach.sav
Notice that SPSS looks for data files in the most recently used sub directory (usually c:\spsswin). As you are
going to load a file which is on a floppy disc you need to use the a: drive.
You should now see the teach.sav filename displayed in the dialogue box.
þ Open the file by clicking on TEACH.SAV and then on the Open button (or double click on the file
name).
You will now see the file appear in the Data View and the filename above the menu bar change to TEACH.SAV.
This will take some time so be patient!
ý Do not undertake this exercise until you need to load your own data from an Excel Spreadsheet.
ý Make sure that your Excel spreadsheet file is set out with one column per variable and one row for each
individual (survey form). Note that the first row should contain the variable names. This is illustrated for the
Excel equivalent of an extract from the teach.sav data file below:
A B C D E
1 gender born marital educate profmemb
2 2 67 1 5 3
3 1 19 7 3
4 2 24 2 7 3
Notice that SPSS looks for data files in the current directory (usually c:\spsswin). As you are going to load an
Excel file which is on a floppy disc you need to use the a: drive.
þ click on the down arrow in the Files of type: box and use the scroll arrows on the right of the dialogue
box to find Excel
þ click on Excel (*.xls)
You should now see your Excel files displayed in the Open File dialogue box.
þ Select the filename you want by clicking on it and then click on the Open button. The following dialogue
box will appear:
þ Make sure there is a tick (✓) to the left of Read variable names and click on OK.
You will now see the file appear in the Data View and the filename above the menu bar change. This will take
some time so be patient!
Because you are loading the file from Excel you will still need to add variable labels and value labels within SPSS
and save your data as an SPSS data file (*.sav).
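The layout rule above (first row = variable names, one row per case, a blank cell for a missing value) is what makes a spreadsheet readable as a data file. The Python sketch below illustrates that rule under one stated assumption: the spreadsheet has been exported as comma-separated text, because reading .xls files directly needs third-party libraries. The first two data rows follow the extract above; the third row is invented to show a blank (missing) cell. read_cases is a hypothetical name.

```python
import csv
import io

# Assumption: the spreadsheet has been exported as comma-separated text.
# Row 1 holds the variable names; each later row is one case.
# The third data row is invented to show a blank (missing) cell.
CSV_TEXT = """gender,born,marital,educate,profmemb
2,67,1,5,3
2,24,2,7,3
1,,2,4,1
"""


def read_cases(text):
    """Return one dict per case, with codes as int and blank cells as None."""
    cases = []
    for record in csv.DictReader(io.StringIO(text)):
        cases.append(
            {name: int(value) if value.strip() else None for name, value in record.items()}
        )
    return cases


cases = read_cases(CSV_TEXT)
print(len(cases), cases[2]["born"])  # 3 None
```

Because the header row supplies the names, each value lands under the right variable; if the first row held data instead, the first case would be lost as "names".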
To check what each variable's column heading and codes refer to:
þ click on the Variable View sheet at the bottom of the screen. You will now see:
The first column contains the variable Name, in the case of the first row “gender”. This is the column heading
that appears in the Data View.
The second column refers to the Type of data. Although gender is categorical data, it is referred to as numeric
because numeric code values have been used! The key to these code values is given in the column headed
Values.
The fifth column contains the variable’s Label. At present this is partially obscured by the subsequent column.
To see the full variable label:
þ move your mouse pointer in between the Label and the Values column headings so that the pointer changes to a double-headed arrow.
þ click and drag the column width to the right until the variable’s label can be read.
(Note: if you wish to edit a variable’s label just retype the label in the appropriate cell)
The sixth column contains the key to the codes used for each variable. These are known as the Value Labels.
(Note: you can also use this option to change each value label for the codes or enter new value labels.)
þ click on the Cancel button in the Value Labels dialogue box to return to the Variable View.
þ Use the ideas in this exercise to explore at least five other variables in the data set.
ý Check the codes against those that appear in Appendix 1. Can you find any errors?
This will usually give the Frequencies dialogue box. However sometimes the variables in the left hand box are
arranged alphabetically.
þ If the variables are arranged alphabetically use the downward arrow on the left hand box to scroll down until
gender appears.
þ Highlight gender in the left hand box by clicking on it
þ Click on the right arrow button to move gender into the Variable(s) box
ý (Note how the arrow button changes direction and the cursor moves to the Variable(s) box. This is to allow
you to reverse your decision if you wish.)
þ Click on OK
You will now see a series of tables displayed in the SPSS Output Viewer. Note how SPSS first tells you if there
are any missing cases. For this variable there is one missing case.
þ Use the scroll arrows to scroll down and across to view the actual frequencies table. Note how
SPSS lets you know if there are any missing cases and calculates the valid percent appropriately.
þ Repeat this process using Analyze, Descriptive Statistics, Frequencies for at least five other variables of
your choice. You can do this by pointing and clicking on the menu commands which are visible at the top of
your screen.
Whilst you are doing this explore the effect of the button on your
output.
To remove the variables from the right Variable(s) box within the dialogue box either:
click on the Reset button
or highlight the variable in the right Variable(s) box and click on the left arrow button
þ To quit this analysis (for example if you make a mistake) click on the Cancel button
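The difference between the Percent and Valid Percent columns of a frequencies table comes from how missing cases are treated. The Python sketch below reproduces the idea, assuming None marks a missing case: Percent divides by all cases, while Valid Percent divides only by the non-missing ones. The data and the name frequency_table are hypothetical, not taken from teach.sav.

```python
from collections import Counter


def frequency_table(values):
    """Build (code, frequency, percent, valid percent) rows plus a missing count.

    None marks a missing case. Percent uses all cases; Valid Percent uses
    only the non-missing cases, so the two differ when data are missing.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    for code in sorted(counts):
        n = counts[code]
        rows.append((code, n, 100 * n / total, 100 * n / len(valid)))
    return rows, total - len(valid)


# Hypothetical gender codes (1 = male, 2 = female) with one missing case.
gender = [1, 2, 2, 1, 2, None, 2, 1, 2, 2]
rows, n_missing = frequency_table(gender)
for code, n, pct, valid_pct in rows:
    print(f"{code}  {n:3d}  {pct:6.1f}  {valid_pct:6.1f}")
print("missing:", n_missing)
```

Here the three males are 30.0% of all ten cases but 33.3% of the nine valid cases.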
You may (or may not!) have noticed that each of the tasks you have performed in SPSS has been automatically
appended to the SPSS Output Viewer. You can see this by scrolling through your output window using the up and
down arrows on the right of the window.
You can edit the SPSS Viewer and save it, or parts of it, to a file which can subsequently be read into a word
processor. Alternatively you can print it out directly.
Help 5.2: to save the contents of the SPSS Output Viewer to a file
Exercise 6: to calculate the arithmetic mean (average) and the standard deviation
We can therefore see that the mean journey to work is 11.45 km.
End of exercise 6
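As a check on what SPSS is calculating, here is a minimal Python sketch of the arithmetic mean and the standard deviation. It assumes the sample standard deviation (dividing by n - 1), which is what SPSS reports as Std. Deviation, and it skips missing values (None). The distances are invented for illustration and are not the teach.sav values; mean_and_sd is a hypothetical name.

```python
import math


def mean_and_sd(values):
    """Arithmetic mean and sample standard deviation (n - 1 denominator),
    ignoring missing values (None)."""
    data = [v for v in values if v is not None]
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical journey-to-work distances in km (not the teach.sav values).
distances = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]
m, sd = mean_and_sd(distances)
print(round(m, 2), round(sd, 2))  # 5.0 2.14
```

The eight valid distances sum to 40 km, so the mean is 5 km; the squared deviations sum to 32, giving a standard deviation of the square root of 32 / 7, about 2.14 km.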
Obviously calculating a mean makes sense here, as we are working out the average distance. However, we have to
be careful.
If we wanted we could calculate the mean gender in the same way! SPSS would take the codes for male (1) and
female (2), add them all up and divide by the number of observations. It is therefore important that you decide
what statistic makes sense for the type of data (section 2.2).
A different measure of average is more appropriate in this case: the mode (the value that occurs most often).
þ Click on the Statistics button at the bottom and select Mode in the Central Tendency section of the dialogue box by
clicking on it
A tick (✓) will appear in the box when it is selected.
þ Repeat this process by calculating the most appropriate average for the following variables:
educate
prevemp
seg
class
salary
three others of your choice
Your choices for the most appropriate average are:
· mean: normally known as the average of the data values
· median: the mid point once all the data values have been ranked
· mode: the data value that occurs most often
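The point about the "mean gender" can be demonstrated with Python's standard statistics module. In this sketch the codes are hypothetical: the mean of the gender codes is a number with no sensible meaning, while the mode picks out the most common category. For ranked codes such as educational attainment the median can also be meaningful, since it is the middle value once the codes are put in order.

```python
from statistics import mean, median, mode

# Hypothetical gender codes (1 = male, 2 = female).
gender = [1, 2, 2, 1, 2, 2, 2]
print(round(mean(gender), 2))  # 1.71: a "mean gender" has no sensible meaning
print(mode(gender))            # 2: the most common category (female)

# Hypothetical educational attainment codes; these are ranked categories.
educate = [1, 3, 4, 4, 5, 5, 5, 6, 7]
print(median(educate))         # 5: the middle code once the values are ranked
```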
þ Click on the Bar Chart(s) radio button and then on the Continue button
þ At the Frequencies dialogue box click on OK
ý The SPSS Output Viewer should now contain your bar chart.
ý Notice that missing data are automatically excluded from the chart. Notice also that you are presented with a
different menu bar which allows you to Edit the current chart and other options such as Delete.
ý To the left of your bar chart are a series of icons. These provide an index to your output that is in the SPSS
Output Viewer.
þ Now practise your charting skills by creating another bar chart for the variable educational attainment but
with the vertical axis displaying percentages rather than frequencies.
One of the most useful features of SPSS is its ability to create crosstabulations of one variable against another.
You will now create a table of the variable educational attainment by the variable gender. You want your table
to look like this:
male female
postgrad plus
up to degree
up to HNC/D
up to A'level
up to O'level (GCSE C+)
up to CSE (GCSE D-)
No quals.
To do this:
þ minimise the SPSS Output Viewer
þ click on Analyze, Descriptive Statistics, Crosstabs
ý this gives the dialogue box:
þ Select the Row(s) variable educational attainment and the Column(s) variable gender using the same
principles as when selecting frequencies (hint: click on the variable and use the appropriate right arrow)
þ Once a row and a column variable have been selected you will be able to click on OK
ý Your table will appear in the SPSS Output Viewer.
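A crosstabulation is simply a count of cases for every combination of a row category and a column category. The Python sketch below shows that counting step, assuming None marks a missing value and that, as in the SPSS table, cases missing on either variable are left out. The codes and the function name crosstab are hypothetical.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count cases in each (row category, column category) cell.

    Cases missing (None) on either variable are left out, and only categories
    that actually occur become rows or columns.
    """
    pairs = Counter(
        (r, c) for r, c in zip(row_values, col_values) if r is not None and c is not None
    )
    row_cats = sorted({r for r, _ in pairs})
    col_cats = sorted({c for _, c in pairs})
    table = [[pairs[(r, c)] for c in col_cats] for r in row_cats]
    return row_cats, col_cats, table


# Hypothetical codes: educate (rows) by gender (columns, 1 = male, 2 = female).
educate = [1, 2, 2, 4, 5, 5, None, 7]
gender = [1, 1, 2, 2, 2, 1, 2, None]
rows, cols, table = crosstab(educate, gender)
for code, counts in zip(rows, table):
    print(code, counts)
```

Note that the case with educate code 7 disappears from the table because its gender is missing.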
The key elements of your output are in the row titled Pearson Chi-Square and the associated footnote.
The chi-square statistic is the value for Pearson Chi-Square, in this case 52.529 with 6 degrees of freedom (df).
This is highly significant: SPSS reports the significance as .000, which means p < 0.0005. The footnote states that
no cells have an expected count of less than 5 and that the minimum expected count for any cell in the table is 7.86.
This means the assumptions of the chi-square test are satisfied.
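The figures SPSS reports (the Pearson chi-square value, the degrees of freedom and the minimum expected count) can be reproduced by hand. In the Python sketch below, each cell's expected count is (row total × column total) / grand total, the statistic adds up (observed − expected)² / expected over all cells, and df = (rows − 1) × (columns − 1), so a 7 × 2 table such as educational attainment by gender has (7 − 1) × (2 − 1) = 6 df. The function name and the 2 × 2 counts are hypothetical; the sketch does not compute the significance (p value), which would need the chi-square distribution.

```python
def pearson_chi_square(table):
    """Return (chi-square, df, smallest expected count, cells with expected < 5).

    `table` is a list of rows of observed counts; every row and column total
    is assumed to be above zero. Expected count = row total * column total /
    grand total, and the statistic sums (observed - expected)**2 / expected.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    min_expected = float("inf")
    cells_below_5 = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
            if expected < 5:
                cells_below_5 += 1
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, min_expected, cells_below_5


# Hypothetical 2 x 2 table of observed counts.
chi2, df, min_exp, small = pearson_chi_square([[20, 10], [10, 20]])
print(round(chi2, 3), df, min_exp, small)  # 6.667 1 15.0 0
```

The last two values correspond to the footnote SPSS prints: how many cells have an expected count below 5 and what the minimum expected count is.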
Exercise 11: to add row and column percents to a table using crosstabs
þ Use SPSS to create further new tables from pairs of variables of your choice.
(Hint: it would be sensible to use variables which contain categorical data rather than quantifiable data; see
section 2.2)
In this exercise you are going to create a new variable educnew from the variable educational attainment by
recoding the values as follows:
This will split educational attainment into those educated up to A level (code 4) and those educated above A level
(code 1).
þ minimise the SPSS Output Viewer
þ click on Transform, Recode, Into Different Variables
this gives the dialogue box:
þ Click on Educational attainment in the variable list on the left hand side, it will be highlighted
þ Click on the right arrow to transfer the variable into the Numeric Variable → Output Variable box
þ Click on the Name box to the right of change in the Output Variable box and type in the new variable name
educnew and a new variable label College Education? in the Label box below
þ Click on the Change button
ý Notice that the new variable label appears in the Numeric Variable → Output Variable box
þ Click on the Old and New Values button and a new dialogue box opens:
þ Click on the top Range radio button in the Old Value dialogue box and type 1 in the first box and 3 in the
second box
þ Click on the Value radio button in the New Value dialogue box and type 1 in the box to the right
þ Click on Add
ý Note that the recode has been added into the Old → New dialogue box:
þ Click on Continue, OK
The new variable will be created and you will be returned to the Data View.
þ Now use the procedures outlined in exercise 5 to produce a frequency distribution for your new variable
WARNING: It is possible to recode a variable into the same variable; however, doing this overwrites the original
values for the variable. If you decide to do this, make a security copy (save on to a different disc) of your data
first.
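The recode can be pictured as a list of range rules applied to each value, with the result written to a new variable so the original stays intact. The Python sketch below follows the exercise's rule (codes 1 to 3 become 1). The second rule (codes 4 to 7 become 2) is a hypothetical addition for illustration only. Values that match no rule, and missing values, become None, which is intended to mirror how Recode Into Different Variables, as we understand it, sets unspecified old values to system-missing.

```python
def recode(value, rules):
    """Recode one value using inclusive (low, high, new) range rules.

    Missing values (None) and values matching no rule become None,
    in the spirit of SPSS setting unspecified old values to system-missing.
    """
    if value is None:
        return None
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return None


# Codes 1-3 -> 1 as in the exercise; 4-7 -> 2 is a hypothetical extra rule.
EDUCNEW_RULES = [(1, 3, 1), (4, 7, 2)]
educate = [1, 2, 3, 4, 5, None, 7]
educnew = [recode(v, EDUCNEW_RULES) for v in educate]  # a new variable
print(educnew)  # [1, 1, 1, 2, 2, None, 2]
```

Building educnew as a separate list, rather than overwriting educate, is the code equivalent of the advice in the warning above.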
In this exercise you are going to create a new variable age from the variable born by subtracting the variable born
from the year, in this instance 95 (i.e. 1995). Remember that year of birth was coded only as the last two digits of
years in the last century, so we do not include the 19.
age = 95 - born
þ type the name of the variable you want to compute, age, in the Target Variable box
þ click on the Type & Label button and label the variable as appropriate, e.g. "age in years"
þ click on Continue
þ point to and click on the number 9 followed by 5 on the number pad in the dialogue box
þ point to and click on the minus (-) arithmetic operator in the dialogue box
þ click on the variable name Year of Birth [born] in the list of variable names and click on the right
arrow
ý Check that the expression 95 - born has appeared in the Numeric Expression box.
þ If you make a complete mess of it click on the Reset button at the bottom and start again
þ Click on OK
This is only a very simple compute and it is possible to do far more complex calculations. In some cases it is
better to write them down prior to typing them in!
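Writing the calculation down first, as suggested, might look like the Python sketch below. It applies age = 95 - born to each case and, as SPSS does when arithmetic involves a system-missing value, gives a missing age whenever year of birth is missing. The years of birth and the function name compute_age are hypothetical.

```python
def compute_age(born, year=95):
    """age = year - born, using two-digit years as in teach.sav (95 = 1995).

    A missing year of birth (None) gives a missing age.
    """
    if born is None:
        return None
    return year - born


# Hypothetical two-digit years of birth, one of them missing.
born = [67, 24, None, 59]
age = [compute_age(b) for b in born]
print(age)  # [28, 71, None, 36]
```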
In this exercise you are going to select a subset of your data: all female employees.
þ Click on the If button, the Select Cases: If dialogue box will appear
þ Click on gender in the variable list on the left hand side, it will be highlighted
þ Click on the right arrow to transfer the variable into the box on the right
þ point to and click on the operator = in the dialogue box
þ point to and click on the number 2 on the number pad in the dialogue box
ý Check that the expression gender = 2 has appeared in the box:
þ Click on Continue
ý Check that the Unselected Cases Are Filtered radio button is on (black):
Filtered means that you will not be deleting the rest of your data, in this case all the males!
þ Click on OK
þ Undertake an analysis of your choice using just the data for females
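Filtering, as opposed to deleting, can be sketched in Python as marking each case as selected or not and then analysing only the selected ones, while the full list of cases is left untouched. In this sketch the cases are hypothetical; a case with missing gender does not satisfy gender = 2 and so is not selected.

```python
# Hypothetical cases; gender 1 = male, 2 = female (Appendix 1).
cases = [
    {"id": 1, "gender": 2, "born": 67},
    {"id": 2, "gender": 1, "born": 59},
    {"id": 3, "gender": 2, "born": 72},
    {"id": 4, "gender": None, "born": 70},
]

# One True/False flag per case for the condition gender = 2.
selected = [case["gender"] == 2 for case in cases]

# Analyse only the selected cases; nothing is removed from `cases`.
females = [case for case, keep in zip(cases, selected) if keep]
print([c["id"] for c in females])  # [1, 3]
print(len(cases))                  # 4: the full data set is still there
```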
Exercise 15: to return to the full data set after using a selection (Select Cases)
þ click on OK
For the purposes of this class, in each case click on the button as this means the data file will
be intact for the next person and the floppy disk will not be filled with your output!
ý Make sure that you have loaded SPSS and that the data file has already been opened.
þ Click on the cell that contains the data value
þ Enter the new value (this will replace the old value)
þ Press Return (the new value appears in the cell)
Help 17.2: to delete all data values for a variable (or case)
Hint: before deleting an entire variable (or case) it is worth saving the data file (exercise 10) in case you
make a mistake.
þ Highlight all the values for that variable (or row) but not the variable name
þ Press the delete key (the variable (or case) will be deleted)
Hint: If you make a mistake you can rectify it immediately afterwards by clicking on Edit, Undo Clear
Help 18.1: to test for a significant relationship between two variables (correlation)
ý Make sure you have loaded SPSS and that the data file has already been opened.
þ Click on Analyze, Correlate, Bivariate
þ Point to and click on the first variable for which you wish to obtain a correlation coefficient with another variable
þ Click on the right arrow button to transfer the variable into the Variables box
þ Repeat this procedure for the other variable(s) you wish to correlate with the first variable
þ Choose the most appropriate Correlation Coefficient for your data and make sure a tick (✓) is in the box
þ Choose the most appropriate Test of Significance and click on the radio button (use a Two-tailed test when
the direction of the relationship, positive or negative, cannot be specified in advance; use a One-tailed test
when it can be specified in advance).
ý Make sure there is a tick (✓) in the box Flag significant correlations. This will ensure that significant
correlations are marked with asterisks in the output.
þ Click on OK
ý The results of the correlation will appear in the SPSS Output Viewer.
As an example, the variables salary and born from the teach.sav data set have been correlated.
SPSS has produced a correlation matrix. Obviously there is a perfect correlation (1.0000) between the variable
initial annual salary and itself. As the variable is correlated with itself it is impossible to calculate the
significance (p = . ). There is effectively no correlation (-0.011) between the variable salary and the variable born,
and this correlation is not statistically significant (p = 0.841).
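The correlation coefficient in the matrix is Pearson's r, and its significance comes from a t statistic, t = r × √((n − 2) / (1 − r²)), compared with a t distribution on n − 2 degrees of freedom. The Python sketch below computes r and t for hypothetical paired values, skipping pairs with a missing value. It does not compute the p value itself, as the standard library has no t distribution. The formula also breaks down when r is exactly 1, as for a variable correlated with itself; the sketch then returns an infinite t. Both function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, skipping pairs with a missing (None) value.

    Returns (r, n), where n is the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r**2)), referred to a t distribution on n - 2 df.

    When r is exactly +1 or -1 the formula divides by zero, so an infinite t
    of matching sign is returned instead.
    """
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired values (not taken from teach.sav).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, n = pearson_r(x, y)
print(round(r, 3), round(t_statistic(r, n), 3))  # 0.775 2.121
```

A small |t| such as the one behind r = -0.011 in the salary example corresponds to a large p value, which is why that correlation is not significant.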
Help 18.2: to test for a significant relationship in which one dependent variable is predicted by one or more
independent variables (linear regression)
ý Make sure you have loaded SPSS and that the data file has already been opened.
ý Check that your data are appropriate for regression analysis
þ Click on Analyze, Regression, Linear
þ Point to and click on the dependent variable which you wish to predict using another variable or variables
þ Click on the right arrow button to transfer the variable into the Dependent box
þ Repeat this procedure to transfer the independent variable(s) you wish to use to predict the dependent variable
into the Independent(s) box
þ Click on OK
WARNING: Interpreting the regression output is comparatively complicated. You need to understand the
coefficient of determination (r²), the regression coefficients and the regression equation (y = a + bx). The SPSS
manual (Norusis / SPSS 1992) explains these in some detail. A simpler explanation of regression with one
independent variable is provided in Section 2, Unit 18 of Saunders and Cooper (1993).
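For one independent variable, the regression equation y = a + bx and r² can be worked out directly by least squares. In the Python sketch below, b (the regression coefficient, or slope) is the sum of cross-products of deviations divided by the sum of squared deviations of x, a = ȳ − b·x̄, and r² is the proportion of the variation in y explained by the line. The data and the name simple_regression are hypothetical; the sketch covers only the simple one-predictor case, not the multiple regression SPSS can also run.

```python
def simple_regression(x, y):
    """Least-squares fit of y = a + b x for one independent variable.

    Returns (a, b, r_squared): the intercept, the regression coefficient
    (slope) and the coefficient of determination.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_residual / ss_total


# Hypothetical values: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r_squared = simple_regression(x, y)
print(round(a, 2), round(b, 2), round(r_squared, 2))  # 2.2 0.6 0.6
```

With one independent variable, r² is simply the square of Pearson's r for the same two variables.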
Given the introductory nature of this handout, and the need for a reasonable statistical knowledge to make
informed decisions about the use of statistical tests, the procedures for other statistical tests are not discussed.
5 Further Reading
The most useful book on SPSS in my opinion is:
Norusis MJ (1998) SPSS 8.0 Guide to Data Analysis, London: Prentice Hall
Unlike many computer manuals this is both readable and easy to use! It also contains advice regarding when to
use different statistical tests. However, at the time of writing, the update of the book for version 10 has yet to be
written.
Bryman A and Cramer D (1999) Quantitative data analysis with SPSS Release 8 for Windows: for Social
Scientists, London: Routledge.
This book offers a formula free non-technical approach to using SPSS. It assumes no familiarity with the data
analysis software and covers both inputting data and how to generate and interpret a wide range of tables,
diagrams and statistics.
A reasonably straightforward book on collecting your data and preparing it for quantitative analysis is:
Saunders MNK, Lewis P and Thornhill A (2000) Research Methods for Business Students (2nd edition), London:
Financial Times Prentice Hall, Chapters 10 and 11
Appendix 1: List of Variables and their codes for data set TEACH.SAV
Variable names are in capitals with the variable label on the same line. Codes and value labels appear on
subsequent lines.

GENDER Gender of Employee
1 Male
2 Female

BORN Year of Birth
19-- year

MARITAL Marital Status
1 Single
2 Married
3 Widowed
4 Divorced

EDUCATE Educational Attainment
1 Postgraduate Study
2 Up to Degree Level
3 Up to H.N.C. / D or Diploma
4 Up to A level
5 Up to O level
6 Up to C.S.E.
7 No educational qualifications stated

PROFMEMB Professional Body Membership
1 Member of Professional Body
2 Not a member of a Professional Body

PREVEMP Nature of Previous Employment
1 Local Government
3 Outside Local Government
4 Student
5 Unemployed
6 Self Employed
7 Youth Training Scheme
8 Retired

PREVEAST Town of previous employment Eastings

PREVNOR Town of previous employment Northings

APPLEAST Home Town when applied Eastings

APPLNOR Home Town when Applied Northings

OCCUPAT Occupation (O.P.C.S. 1980 Classification)
1.00 Solicitor
2.10 Auditor
2.20 Accountant
2.50 Valuer
3.10 Personnel Officer
3.20 Work Study Officer
4.20 Computer Programmer
5.20 Advertising Executive
6.10 E.H. Officer
6.20 Building Inspector
8.00 Admin Executive
9.50 Legal Executive
9.80 Curator (Museum)
13.10 Warden (O.A.P.)
13.20 Play Group Leader
13.30 Welfare Occupations n.e.c.
18.20 Vet
22.20 Projectionist
25.00 Municipal Engineer
31.10 Architect / Town Planner
31.20 Quantity Surveyor
31.30 Building Surveyor
33.10 Architect / Town Planner Technician
33.20 Building / Engineering Technician
33.40 Works Manager
35.10 Maintenance Supervisor
35.20 Clerk of Works
36.20 Transport Manager
36.30 Stores Controller
37.20 Office Manager
39.50 Entertainments / Sports Manager
44.10 Caravan Site Manager
44.40 Managers n.e.c.
45.20 Supervisor Stores Clerks
45.30 Supervisor Drawing Assistants
45.40 Supervisor Clerks
45.50 Supervisor Cashiers
46.10 Clerk Stores
46.20 Tracer Assistant
46.30 Clerk Non-retail
47.00 Cashier Retail
48.20 Supervisor Machine Operators
49.10 Receptionist
49.20 Typist
50.00 Punch Card Operator
51.10 Telephone Receptionist
51.20 Switch Board Operator
56.00 Meals on Wheels Operator
60.40 Estate Ranger
60.60 Supervisor Security
62.10 Park Keeper
62.30 Art Gallary Attendant
63.30 Supervisor Bar
63.40 Supervisor Catering
65.20 Bar person
66.10 Counter Hand
71.10 Supervisor Caretakers
71.40 Supervisor Car Parks
72.10 Caretaker
72.20 Cleaner