Introduction To SPSS (Version 18) For Windows: Practical Workbook
Practical workbook
Document information
Course files
This document and any associated practice files (if needed) are available on the web. To find these, go to www.bristol.ac.uk/is/learning/resources and in the Keyword box, type the document code given in brackets at the top of this page.
Related documentation
Other related documents are available from the web at: http://www.bristol.ac.uk/is/learning/resources
Information Services does not produce an introduction to STATA, but short courses on STATA are available within the University of Bristol, provided by the Department of Social Medicine.
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales Licence (http://creativecommons.org/licences/by-nc-sa/2.0/uk/). Its original author is the University of Bristol which should be acknowledged as such in any derivative work.
Introduction to SPSS (version 18) for Windows (October 2010) © 2010 University of Bristol. All rights reserved.
Introduction
SPSS provides facilities for analysing and displaying information using a variety of techniques. This document uses version 18 of SPSS for Windows.
Prerequisites
Basic familiarity with Windows and at least an elementary knowledge of simple statistics such as t-tests, chi squared tests, p values and confidence intervals would be useful (statistical theory is not taught on this course).
Contents
Document information
Task 1 Practical exercise in Data Preparation
Task 2 Starting SPSS
Task 3 Example dataset
Task 4 The Data Editor
Task 5 Naming and defining variables
    Variable names
    Variable types
    Variable width and decimal places
    Variable labels
    Value labels
    Missing values
    Data display
    Measurement scale of variables
    Role of variables
Task 6 Entering data
Task 7 Help system
Task 8 Frequency tables - the frequencies procedure
Task 9 Saving files in SPSS
    Saving an SPSS data file
    Saving an SPSS output file
Task 10 Leaving SPSS
Task 11 Opening a file
Task 12 Controlling your output
Task 13 Procedure commands - Frequencies
Task 14 Using summary statistics for continuous variables - the Descriptives procedure
Task 15 Producing a bar chart from frequencies
Task 16 Displaying histograms
Task 17 Crosstabulation
    Adding cell percents and the chi-square statistic
Task 18 Clustered bar chart
Task 19 Analysing data in subgroups - Split File
Task 20 Excluding observations - Select cases
Task 21 Modifying variables
    Recoding values into a new variable
    Computing new variables
    Using the IF Statement to Compute New Variables
Task 22 Working with Dates in SPSS
Task 23 Correlations
Task 24 Creating charts - drawing a scatter plot
Task 25 Saving an updated copy of the data
Task 26 Getting SPSS to read data from other spreadsheet formats e.g. Excel
Task 27 Saving output from SPSS into word processor documents e.g. Microsoft Word
Task 28 Analysis - Box plot
Task 29 Analysis - T-test
Task 30 Analysis - Non-parametric tests
Task 31 Analysis - Other
    Graphs
Appendix A References
Content
SPSS can read dates in Excel, but only if Excel knows they are dates - check this using the Format Cells feature in Excel.
If you are entering numbers such as weight, make sure all the values are numbers. Remove ?, n/a, <10, etc.
If a value is missing, either code it with a missing value such as 9 or 99, or leave the cell empty. Do not enter a space or a full stop.
If you are entering text such as yes or no, check that you have always spelt it in the same way, ie Y will be seen by SPSS as different from y, and different again from YES, yes and Yes.
Do not merge cells.
Do not sort data without taking steps to ensure that all the data is sorted together.
Avoid putting totals, means, counts or graphs on the same spreadsheet.
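The checks on numeric and text columns described above can also be run before the spreadsheet ever reaches SPSS. The following is a minimal Python sketch, not part of SPSS; the function names and the sample cells are hypothetical, and it assumes each column has been read in as a list of text cells. Note that Python's float() also accepts forms such as 1e5, which a stricter check might reject.

```python
def find_non_numeric(cells):
    """Return the cells that are neither empty nor a plain number."""
    bad = []
    for cell in cells:
        if cell == "":
            continue  # a truly empty cell is an acceptable missing value
        try:
            float(cell)
        except ValueError:
            bad.append(cell)  # includes "?", "n/a", "<10" and a lone space
    return bad


def spelling_variants(cells):
    """Group text entries that differ only by case or surrounding spaces."""
    groups = {}
    for cell in cells:
        key = cell.strip().lower()
        groups.setdefault(key, set()).add(cell)
    # Keep only the groups where more than one spelling was used.
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Hypothetical columns as they might be typed into a spreadsheet.
weights = ["72.5", "?", "n/a", "<10", "", "68", " "]
answers = ["Y", "y", "YES", "yes", "Yes", "N"]

print(find_non_numeric(weights))
print(spelling_variants(answers))
```

Any cell reported by the first function needs correcting or blanking, and any group reported by the second needs a single agreed spelling.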
Suppose this was a data set that your research assistant had prepared for you. Can you convert this into a usable SPSS data set?
Think about how you need to alter the structure and content of the data using the rules in section 1(3).
The data set that you produce should look something like this:
This has been achieved using Copy and Paste Special (Transpose), by removing text from the columns and by giving the columns easy to read, unique labels. The problem of two sections of data, one recorded on one day and the other on the next, has been solved by adding a column labelled day. One problem remains: a data point labelled less than 10. If you can find out the threshold value of the assay, you can sometimes substitute half the threshold value, or ask for the assay to be repeated so that the true value can be ascertained. Less than 10 is not a number that SPSS will analyse.
Select Cancel or Type in data in this query window. The query window then closes.
Tip: If you have used SPSS on your own PC in the recent past, it will offer to open the last few files that you accessed in SPSS, just like the recent files list in most Microsoft packages.
3.1
Imagine that you interviewed some people on their smoking habits using the questions shown below:
Smoking Questionnaire
Reference number
1. How old were you on your last birthday?
2. Indicate your sex
3. Do you smoke at all? (Yes / No)
4. Do you smoke cigarettes? (Yes / No)
5. On average how many cigarettes a day would you say that you smoked?
6. Do you smoke a pipe? (Yes / No)
7. Do you smoke cigars? (Yes / No)
9. Tell me what you think on each of the following three statements (strongly disagree / disagree / agree / strongly agree): Tax on tobacco is too high
From 10 completed questionnaires it is possible to create a dataset like the one shown below. This dataset has 10 observations and 14 variables. The data is in fixed column format; each measurement forms a column and the values in each column make up a variable. Note that blanks indicate missing values. Each of the items recorded - Age, Sex and so on - are data values. For Sex, M and F are used to denote male and female, rather than using numeric codes. All the information about a single person makes up one observation.
Ref Number | Age | Sex | Smoke | Smoke Cigs | How many | Pipe | Cigars | Give Up | Tax | Health | Cinema | Date survey | Date submit
1 2 3 4 5 6 7 8 9 10
27 31 35 58 56 25 41 38 43 29
F M M M M F F F F M
1 2 2 2 2 1 1 1 1 1
10
3 4 4 3 4
3 2 1 1 3 4 1 4 2 4
3 1 1 2 2 4 3 4 2 4
02/11/1995 16/11/1995 01/12/1995 17/11/1995 10/11/1995 02/12/1995 17/11/1995 02/12/1995 16/11/1995 02/12/1995
25/09/1995 04/09/1995 21/07/1995 05/09/1995 30/09/1995 10/08/1995 24/08/1995 30/08/1995 01/08/1995 26/09/1995
1 1 1 2 1
20 30 999
2 2 2 2
2 1 2 1 2
2 1 1 1 2
3 3 4 4 2
40
From this dataset you can see that the reference number of the first subject is 1, she is 27 and is female. She smokes, and smokes cigarettes, on average 10 per day. She doesn't smoke a pipe or cigars and has tried to give up. She agrees that tax on tobacco is too high and that smoking is dangerous to your health and that smoking should be banned from cinemas. Later you will be asked to enter these values but first you are asked to consider how variables are defined and what attributes they can have.
Each variable needs to be given a variable name that is used in describing the variable to SPSS. Table 2 lists the names that are to be used in the example and specifies the order in which they are to be given to SPSS. It also suggests suitable labels that can later be associated with the variables to clarify output.
Variable name    Variable label
ref_no           Reference number
age              Age last birthday
sex              Sex of respondent
smoker           Do you smoke?
cigs             Do you smoke cigarettes?
num_cigs         How many cigarettes per day?
pipe             Do you smoke a pipe?
cigars           Do you smoke cigars?
give_up          Have you tried to give up smoking?
tax              Do you think tax on tobacco is too high?
danger           Do you think smoking is dangerous to your health?
cinemas          Do you think smoking should be allowed in cinemas?
Date_survey      Date the survey was completed
Date_submit      Date the subject submitted their thesis
You should now have all the information necessary to start creating the example dataset in SPSS. First you will use the Variable View of the Data Editor window to specify the variable names and their attributes. You should note that this data was collected in 1995 before any of the present day legislation on smoking in public had been introduced.
Note
This window already has a defined structure. There are eleven columns headed: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, Measure and Role. Each of these headings is used to indicate some facet of the definition of each variable. Their use is described as we proceed to develop our sample dataset.
Variable names
The rules for names are:
the name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $;
variable names cannot end with a period;
variable names that end with an underscore should be avoided;
the length of the name cannot exceed 64 bytes. Sixty-four bytes typically means 64 characters in single-byte languages (eg, English, French, German) and 32 characters in double-byte languages (eg, Japanese, Chinese, Korean);
spaces and special characters (eg !, ?, ', and *) cannot be used;
each variable name must be unique; duplication is not allowed;
if SPSS comes across a duplication when reading a data set prepared in another package it will create a new variable name such as var0004;
the underscore character is frequently used where a space is desired in names;
SPSS may miss out spaces when reading in names from another package such as Excel, ie age group would become agegroup;
reserved keywords cannot be used as variable names - these are ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO and WITH;
names can be defined with a mixture of uppercase and lowercase characters, and case is preserved for display.
5.2
Point and click on the cell in row 1 and column 1. Type ref_no in this cell. Use the down arrow to move to row 2 column 1. Type age in this cell. Use the down arrow to move to row 3 column 1. Continue with this process until all 14 variable names from Table 2 are entered in this first column. Your screen should then look like Figure 5.
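If variable names are being prepared outside SPSS (for example as spreadsheet column headings), the naming rules listed above can be checked in advance. The sketch below is a rough Python illustration rather than anything SPSS provides; the function name check_name is hypothetical, only unaccented letters are accepted for simplicity, and it assumes the reserved keywords are matched regardless of case.

```python
import re

# Reserved keywords quoted in the rules above.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, period, @, #, _ or $.
# Assumption: only unaccented (ASCII) letters are allowed here.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def check_name(name):
    """Return a list of rule violations for a proposed name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only letters, digits, . @ # _ $")
    if name.endswith("."):
        problems.append("cannot end with a period")
    if len(name.encode("utf-8")) > 64:
        problems.append("longer than 64 bytes")
    if name.upper() in RESERVED:  # assumption: case-insensitive match
        problems.append("reserved keyword")
    return problems


for candidate in ["ref_no", "age group", "WITH", "total.", "9lives"]:
    print(candidate, check_name(candidate))
```

An empty list means the name passes every check in the sketch; names ending in an underscore are allowed but, as noted above, best avoided.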
Note
When the down arrow is pressed, the 10 columns to the right of the first column fill with values (default values). We can examine these ten columns and if necessary change values in specific cells to reflect properties of our example dataset.
Variable types
The Type column is showing Numeric for all rows. This means that numeric (number) values will be expected in the dataset relating to these variables. This is correct for all the variables except sex, where we have collected data in the form of F for female and M for male, and the two date variables, which we would like SPSS to record as dates.
5.3
To enter the data as M or F, change the variable type: Point and click on the grey box on the right of the cell giving Type for the variable sex.
A dropdown menu appears offering eight data types (Figure 6). The common items are Numeric, Date, Custom currency and String. For a full description of each of the variable types, click on the Help button.
5.4
To enter alpha characters into the variable sex, select data type String by clicking on its selection button. Notice that a string of 8 characters is the default. Whilst this would cause no problem, it is more efficient to reduce it to the actual number of characters you are going to input. Therefore change the default of 8 in the box to 1. Click OK to return to the Variable View window.
Click on the variable type options box for the 2 dates at the bottom of the list. Choose Date for these variables. There are a large number of different ways in which the date can be written. The top option dd-mmm-yyyy is probably the safest option, as it is least ambiguous, but if you are using SPSS to enter the data you might consider choosing dd.mm.yyyy as the easiest one for entering data. You can switch back to dd-mmm-yyyy after the data has been entered.
Variable labels
The next column is headed Label and is used to inform SPSS about the details associated with each variable name. The maximum length of any label is 255 characters and there are no restrictions on what may appear. Spaces are entered just as typed. If you want to specify where a new line appears in a label, type \n within the text and SPSS will wrap the label at this point.
5.6
Moving to the first row, fifth column, click on the cell and type in the words: Reference number. The width of the column can be expanded to allow for the number of characters in the label. To do this, place the cursor on the divide between Label and Values in the table headings, where it will change to a two headed horizontal arrow, and then drag to the right as required. Move down to row 2, column 5 and type Age last birthday. Continue entering the labels for all the other variables as given in Table 2. To correct any existing labels, double-click on the entry and edit as you would in a word processor. Alternatively use copy and paste to enter all the labels from a Word table in one go.
Value labels
5.7 The next task is to enter Value Labels for each variable if appropriate. These will appear in the Values column. For our first two variables (ref_no and age) there are no Value Labels, so the default entry of None can remain. For sex you can indicate that M is male and F is female.
Move to row 3 column 6 and click on the grey box on the right of the cell. A dropdown menu appears so you can provide Value Label information (Figure 8). In the box by the word Value type F. In the box by the word Label type Female. Click on Add and watch the value and its label move to the bottom box. In the box by the word Value now type M and the word Male in the Label box. Click on Add. Now that all the Value Labels for this variable are complete (your screen should look like Figure 9), click on OK to return to the Variable View window.
Note: The first part of the Value Label that you entered appears in the appropriate cell.
5.8
The next variable needing a Value Label is smoker: Click on row 4 column 6 and its grey box. Enter the value 1 and label Yes in the dropdown menu box and click Add. Enter the Value 2 and Label No and click on Add. Click on OK.
5.9
The variables cigs, pipe, cigars and give_up all need the same Value Labels as smoker. Either you can repeat the above instructions for each variable in turn or take advantage of a useful shortcut: Click in the cell containing the values for smoker. Copy this cell by using either Edit / Copy from the main menu, or by clicking the right mouse button and selecting Copy, or by pressing CTRL + C. Point at the Values cell for the variable cigs and paste the current clipboard using either Edit / Paste, or the right mouse button and Paste, or CTRL + V. Repeat this process for the pipe, cigars and give_up values.
5.10
Finally for the variables tax, danger and cinemas, you will need to provide four value labels for each. Use the basic method to enter this information for the variable tax: Value 1 has the label Strongly disagree. Value 2 has the label Disagree. Value 3 has the label Agree. Value 4 has the label Strongly agree. Now copy and paste for the other two variables danger and cinemas.
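Conceptually, a set of value labels is a lookup from each stored code to the text shown in output, and copying and pasting the Values cell simply reuses one lookup for several variables. The Python sketch below illustrates that idea only; it is not how SPSS stores labels, and the names VALUE_LABELS and label_for are hypothetical.

```python
# Value labels from steps 5.7 to 5.10, keyed by variable name.
VALUE_LABELS = {
    "sex": {"F": "Female", "M": "Male"},
    "smoker": {1: "Yes", 2: "No"},
    "tax": {1: "Strongly disagree", 2: "Disagree",
            3: "Agree", 4: "Strongly agree"},
}

# Reuse the same labels, mirroring the copy and paste shortcut.
for name in ("cigs", "pipe", "cigars", "give_up"):
    VALUE_LABELS[name] = VALUE_LABELS["smoker"]
for name in ("danger", "cinemas"):
    VALUE_LABELS[name] = VALUE_LABELS["tax"]


def label_for(variable, value):
    """Return the label for a coded value, or the value itself if unlabelled."""
    return VALUE_LABELS.get(variable, {}).get(value, value)


print(label_for("sex", "F"))
print(label_for("give_up", 1))
print(label_for("age", 27))  # age has no value labels, so 27 is returned as is
```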
Missing values
The next column of the Variable View sheet is Missing Values. In the statistical analysis of any dataset it is sometimes necessary to exclude cases where the information is not known or not appropriate. An example of this occurs in the variable num_cigs in this dataset. The information is missing in two situations: for non-smokers it is not appropriate, and in the data the relevant cell has been left blank; for one interview, ref_no 8, the respondent failed to give an answer to this question (no, she did not smoke 999 cigarettes per day!). A number that could not be expected as a genuine response is selected to represent this circumstance. However, it should not be included in any analysis, as it would seriously distort many statistical procedures. Within SPSS there are two types of missing value - system-missing values and user-defined missing values. By default, for non-string variables, an empty cell is defined as a system-missing value and does not need to be further declared. For user-defined missing values this column of the Variable View has to be used.
Click on row 6, column 7 and then the small grey box at the right of the cell to produce the Missing Values dropdown menu (Figure 10). Select the Discrete missing values button and enter 999 in the first cell. Click on OK and return to the Variable View window. Notice that the value 999 now appears in the missing column for the variable num_cigs. If entering missing values for string variables, the required discrete value should be entered as characters e.g. X to represent the letter X, a space to represent an empty cell.
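To see why the code 999 has to be declared as missing, compare the mean of num_cigs with and without it. The Python sketch below is an illustration only: None stands in for the system-missing blank cells, the list reflects the num_cigs column as it appears to be laid out in the example dataset, and the helper name valid_values is hypothetical.

```python
from statistics import mean

USER_MISSING = {999}  # the discrete missing value declared for num_cigs

# num_cigs for interviews 1 to 10; None marks a blank (system-missing) cell.
num_cigs = [10, None, None, None, None, 20, 30, 999, None, 40]


def valid_values(values, user_missing):
    """Drop blank cells and any user-defined missing codes."""
    return [v for v in values if v is not None and v not in user_missing]


entered = [v for v in num_cigs if v is not None]  # blanks dropped, 999 kept
valid = valid_values(num_cigs, USER_MISSING)      # blanks and 999 dropped

print("mean with 999 treated as a real answer:", mean(entered))
print("mean with 999 declared missing:", mean(valid))
```

Leaving 999 in place raises the mean from 25 to 219.8 cigarettes a day, which is the kind of distortion the Missing column is there to prevent.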
Data display
The next two columns (Columns and Align) are concerned with the display of data in the Data View window. For the purposes of this example dataset, the default values of a column 8 characters wide and the values right aligned for numeric variables and left aligned for string variables are fine. When you have entered your data as instructed below, return to the Variable View window and change one or more of these values. Then flip to the Data View window and see the effect your choice has made.
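Measurement scale of variables
The Measure column records the measurement scale of each variable:
Scale - to represent a numeric variable that can take discrete or continuous values along a range
Ordinal - to represent values that, although numeric, only represent an ordered listing of such values
Nominal - to represent values that are simply names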
You should be able to recognise that in the example dataset, there are:
6 nominal measures - sex, smoker, cigs, pipe, cigars and give_up.
3 ordinal measures - tax, danger and cinemas.
3 scale measures - ref_no, age and num_cigs (ref_no could be nominal, ordinal or scale!)
Plus 2 date variables, which are considered by SPSS to be scale variables and can be used as such.
5.11
Starting with ref_no in row 1, column 10, click on the cell and choose the appropriate measure. (You should not have to change this from the default.) Move down the column making the appropriate choice in each case. The first change is for the variable smoker, which will need to be changed to nominal. You have now defined all the information that SPSS needs to know about the characteristics of your specific dataset. Your variable view pane should look like Figure 11.
Role of variables
The final column is concerned with the role your variable is going to take in the analysis. This is a new column for version 18. In statistics certain procedures are only appropriate for certain types of variable. It is not unusual for a research study to use one variable as the outcome for one set of analyses and as the confounder in another set of analyses, which means that using this column may mean rechecking these classifications before each analysis. The roles recognised by SPSS are as follows:
Input - this variable can be used as an independent predictor.
Target - this is the outcome of the analysis.
Both - this can be either target or input.
None - no role assigned.
Partition - this variable can be used to partition the data, such as a variable which defines a test or training data set.
Split - this is included for compatibility with other PASW programmes.
There are very few procedures in version 18 which require the role to be defined. We will leave all variables with the default role of Input.
You may return to the Variable View window at any time if further changes are needed.
1 2 3 4 5 6 7 8 9 10
27 31 35 58 56 25 41 38 43 29
F M M M M F F F F M
1 2 2 2 2 1 1 1 1 1
10
3 4 4 3 4
3 2 1 1 3 4 1 4 2 4
3 1 1 2 2 4 3 4 2 4
02-Nov-1995 16-Nov-1995 01-Dec-1995 17-Nov-1995 10-Nov-1995 02-Dec-1995 17-Nov-1995 02-Dec-1995 16-Nov-1995 02-Dec-1995
25-Sep-1995 04-Sep-1995 21-Jul-1995 05-Sep-1995 30-Sep-1995 10-Aug-1995 24-Aug-1995 30-Aug-1995 01-Aug-1995 26-Sep-1995
1 1 1 2 1
20 30 999
2 2 2 2
2 1 2 1 2
2 1 1 1 2
3 3 4 4 2
40
6.1
Click on the Data View tab of the Data Editor window. To enter the first person's data, click the first cell of ref_no. Type 1. Press the TAB key or right arrow once and the heavy outline moves to the next column. Type in 27 and press the TAB key. Type in F and press the TAB key. Type in 1 and press the TAB key. Follow the same procedure along the first row until all fourteen data values are entered.
6.2
Move back to row 2, column 1 and start to enter the values for interview 2. Press the TAB key twice to skip over a column. Notice that a dot appears in the cell. This is the system-missing value (see Figure 12).
Figure 12 - Data Editor window with the first and part of the second interview entered
6.3
Continue the process until the data from all ten interviews are entered. Your Data Editor screen should now look like Figure 13 below. Some people find it easier to enter data by column rather than by row. The method is similar except that you use the down arrow key instead of the TAB key. The HOME and END keys take the cursor to the first or last column of a particular case. CTRL + HOME will take you to row 1, column 1, and CTRL + END to the last used cell.
Figure 13 - the data editor window with all interview data entered
There are also a series of 14 tutorials available from the Help menu. Some of these tutorials are directed at a more commercial or financial perspective rather than an academic use, but they can be used to familiarise yourself with the way SPSS works. They are not interactive tutorials.
Figure 15 - the window with drop down menus from the Analyze command
Select Sex of respondent. Click the right pointing arrow head to move sex into the Variables box (see Figure 16). Click OK. A frequency table is produced. Note that tables, statistics and charts are displayed in the SPSS Viewer window - a completely different window from the Data Editor (Figure 17).
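A frequency table is simply a count of how often each value occurs, with percentages and a running (cumulative) percentage. As a cross-check on what the Frequencies procedure reports for sex, the following Python sketch tallies the ten values from the example dataset. It is an illustration under stated assumptions, not SPSS output; the function name frequency_table is hypothetical.

```python
from collections import Counter

# The sex column of the example dataset, interviews 1 to 10.
sex = ["F", "M", "M", "M", "M", "F", "F", "F", "F", "M"]


def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows in sorted order."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


for value, count, percent, cumulative in frequency_table(sex):
    print(f"{value}  {count:3d}  {percent:5.1f}  {cumulative:5.1f}")
```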
The SPSS Viewer window has two panes, each with its own bottom and right scroll bars. The left pane contains an outline (index) of the results so far. The right pane contains the detailed tables, graphs and text output. Clicking on the + and - symbols in the left pane controls what output is displayed in the right pane. Clicking on the other section names in the left pane moves the focus of the right pane display so it starts the display at the selected item. You can use the Window command to select which window you want to be in at any particular time. Check that you can see the command lines above the output:
8.2
Click Window. Note that there are currently 2 types of window:
Untitled [Dataset0] - SPSS Data Editor
Output1 [Document 1] - SPSS Viewer
Click SPSS Data Editor.
SPSS saves the file as smoking.sav in the specified directory. Normal Windows rules apply to file names.
Click Save.
The saved output from SPSS appears in the file smoking.spv. This is the output produced by the various procedures, eg tables, means, plots. It does NOT contain all the values of the dataset variables nor all the label information, so it cannot be used like an SPSS data file for future analysis purposes. It is known as the spool file.
Note: This file is not a text file - you must inspect such files by opening them via the File option in the SPSS Viewer window. To get other formats, eg text, you must select the desired items in the Viewer outline and use the File / Export option (see Task 27 at the end of this document).
Select the General tab (it may be displayed automatically). At the top left is the Variable Lists section. These options specify how the list of variables offered in all SPSS procedures is composed. If you prefer your variables to be listed in alphabetical order, as opposed to entry (file) order, press the appropriate button. Also some users prefer to have the variable name and not its label displayed. In the Notification panel of the Output section, you will see a Scroll to new output option. You are advised to always have this ticked. If you make any changes to this General tab, click the Apply button before moving on. You may be advised that certain changes cannot be applied until you start SPSS again. Click on OK but remember when you have made all your changes you may need to leave SPSS and restart it from the SPSS 18.0 command.
12.2
Select the Viewer tab (Figure 22).
At the foot of the first column (Initial Output State) is a tick box labelled Display commands in the log. It is very helpful in any discussion of your output to have this command log available. If necessary tick the box by clicking on it. When you now run any SPSS command, your output file will include a representation of the choices you made from any drop-down menu. This takes the form of the commands generated in SPSS's own language. You can see this in Figure 23 on the following page where the first two lines in the right hand frame are the log generated by the particular request for a Frequencies count. This SPSS language is referred to as syntax and is a very powerful tool in the advanced use of the program. The right hand frames of this screen allow you to change the default fonts used by SPSS in the production of output. If you make any changes to this Viewer tab, click the Apply button before moving on. This practical does not intend to investigate any further tabs in the Options menu, so press OK to exit this section. (Remember to restart SPSS if you have been told it is necessary.)
Task 14 Using summary statistics for continuous variables - the Descriptives procedure
With continuous variables, frequency tables are not always the best method of summarising. A better option would be to use a selection of summary statistics in place of a frequency table. Frequencies and crosstabulations are useful mainly for categorical variables, ie where the values represent categories such as male/female, nationality, class of university degree. However, variables like age and num_cigs have many values and for these continuous variables, statistics like the mean and standard deviation are sometimes useful.
14.1
In the Viewer or Data Editor window select Analyze. From the Analyze menu, select Descriptive Statistics. From the Descriptive Statistics submenu, select Descriptives. Select the variable Age last birthday [age] (see Figure 24).
Click the Options button in the Descriptives window. A number of statistics options which are appropriate for continuous variables are displayed (Figure 25).
Select Mean, Standard deviation, Minimum, Maximum and Range. Click Continue. Click OK. The required statistics are displayed in Figure 26. You may have to scroll down in the SPSS Viewer window to see this.
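The statistics requested here are straightforward to reproduce by hand for a small dataset, which can be a useful check on the SPSS output. The Python sketch below computes them for the ten ages in the example dataset. It is an illustration only: the function name describe is hypothetical, and it assumes the standard deviation wanted is the sample standard deviation (n - 1 divisor).

```python
from statistics import mean, stdev

# The age column of the example dataset, interviews 1 to 10.
age = [27, 31, 35, 58, 56, 25, 41, 38, 43, 29]


def describe(values):
    """Return N, mean, sample standard deviation, minimum, maximum and range."""
    return {
        "n": len(values),
        "mean": mean(values),
        "std_dev": stdev(values),  # sample standard deviation, n - 1 divisor
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


summary = describe(age)
for name, value in summary.items():
    print(f"{name}: {value}")
```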
Click the Bar chart(s) option and click on Continue. Click Display Frequency Tables to clear the tick and suppress the display of the frequency table. Click OK. SPSS draws the chart and shows it in the Output Viewer window (Figure 29). (You may need to scroll down to see the complete chart.)
Figure 29 - SPSS viewer output produced using the Bar Chart option from the Frequencies procedure
To display the chart in a full screen window, double-click on the chart in the Viewer pane (this takes you into the SPSS Chart Editor window where the chart may be amended to suit personal preferences e.g. colour choice, labels, etc) (Figure 30).
To leave the Chart Editor window click File/Close. If the chart editor is left open, then the image of the chart in the Viewer window is displayed as crossed-out and cannot be displayed properly until the chart editor is closed.
Figure 31 - SPSS viewer output produced using the histogram option from the Frequencies procedure
Task 17 Crosstabulation
The data may be broken down further with crosstabulation (multi-way) tables, which show the joint distribution of two variables values. If you want to know how many women are smokers, the two numerical variables, sex and smoker are used in Crosstabs. 17.1 To get a crosstabulation: Select Analyze. Select Descriptive Statistics. Select Crosstabs. Select Do you smoke? [smoker] from the source variable list. Click adjacent to the Row(s) text box. From the source variable list select Sex of respondent [sex]. Click adjacent to the Column(s) text box. To see the crosstabulation click OK. SPSS produces a crosstabulation of smoker by sex. The cells of the table show the Counts. You should find that all five women are smokers and only one man is a smoker.
Modify the Crosstabs table to request statistics and include the options Row, Column, Total and Expected as follows: Click the Cells button. Select the additional options Expected, Row Percentages, Column Percentages and Total Percentages. Click Continue. The Statistics button in the Crosstabs window requests statistics. Click the Statistics button. Chi-square requests a Chi-Square (χ²) test of independence and a Fisher's Exact test when there are fewer than 20 cases in a 2 x 2 table. Select the Chi-square option. Click Continue.
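The Count and Expected cells can be checked by hand. The following is a minimal Python sketch, not SPSS output. It assumes ten respondents, five women and five men, which is consistent with the counts quoted in this workbook; the labels "female", "male", "yes" and "no" are illustrative and are not the numeric codes stored in the data file.

```python
from collections import Counter

# Respondents rebuilt from the counts quoted in the text (assumed: 5 women, 5 men).
# All five women smoke; one of the five men smokes.
respondents = [("female", "yes")] * 5 + [("male", "yes")] + [("male", "no")] * 4

counts = Counter(respondents)                          # observed cell counts
sex_totals = Counter(sex for sex, _ in respondents)    # column totals
smoker_totals = Counter(smoker for _, smoker in respondents)  # row totals
n = len(respondents)


def expected(sex, smoker):
    """Expected count under independence: row total x column total / N."""
    return sex_totals[sex] * smoker_totals[smoker] / n


for smoker in ("yes", "no"):
    for sex in ("female", "male"):
        print(smoker, sex, counts[(sex, smoker)], expected(sex, smoker))
```

Under these assumptions every expected count is below 5, which is why the validity of the Chi-Square test needs to be checked for a table this small.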
Figure 32 - SPSS viewer output from Crosstabs procedure (with Chi-squared tests)
Suppose that you wish to test the null hypothesis 'Men and women are equally likely to smoke' against the alternative 'Men and women are not equally likely to smoke'. Since 100% of the cells have expected counts of less than 5, the Chi-Square test is not valid. However, Fisher's Exact test (p=0.048, usually quoted as p<0.05), which is also given, lends support to the alternative hypothesis that women and men in this sample are not equally likely to smoke.
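For readers who want to see where the Fisher's Exact p value comes from, here is a small Python sketch of the two-sided test for a 2 x 2 table. It is an illustration using the standard hypergeometric definition, not a description of how SPSS computes the value internally. The table layout (five women who all smoke, five men of whom one smokes) is an assumption based on the counts quoted in this workbook.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is no more likely than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Rows: women, men. Columns: smoker, non-smoker.
p = fisher_exact_two_sided(5, 0, 1, 4)
print(round(p, 3))  # prints 0.048
```

Only two of the possible tables are as extreme as the observed one, each with probability 5/210, which gives 10/210, or about 0.048.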
Figure 33 - the Define options window in the Clustered Bar Chart procedure
Click Cum. N in the Bars Represent panel. From the source variable list select Sex of respondent [sex]. Click the arrow button to place sex underneath Category Axis. From the source variable list select Do you smoke? [smoker]. Click the arrow button adjacent to Define Clusters by. Click OK.
18.2 Is the graph what you expected? Look carefully at the right-hand cluster of bars. (It should NOT look like the graph in Figure 34.) Try re-running this procedure, but this time in the Define section select N of cases in the Bars Represent panel. (You should now get a graph like Figure 34.) Make sure that you understand the difference between the two graphs you have produced using SPSS.
18.3
New for version 18: use Options to include error bars on the bar chart. There are three different options for error bars. All are valid, so if you are not sure which to use, please ask a statistician.
19.1
Select Data. Select Split File (see Figure 35). Select Organise output by groups. Move Sex of respondent [sex] to the Groups Based on box. Click OK.
Note
Split File sorts your data. In the Data Editor, observations are no longer in Ref_no order.
19.2
To see how Split File processing works, from the Analyze menu, select the Descriptive Statistics option to run Descriptives for Age last birthday [age]. Click OK. In the Viewer window, you should find that the mean age for women is 34.8 years and the mean age for men is 41.8 years. All subsequent analyses are done separately for each group.
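Conceptually, Split File partitions the cases by the grouping variable and then runs each procedure once per group. The Python sketch below illustrates that idea for a mean. The records are hypothetical and are NOT the workbook's data, and the function name mean_by_group is invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sex, age) records for illustration only.
records = [("female", 25), ("male", 40), ("female", 35), ("male", 50), ("female", 30)]


def mean_by_group(rows):
    """Collect the values for each group, then summarise each group separately."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


print(mean_by_group(records))
```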
19.3
To turn off the Split File option: Select Data. From the Data menu, click Split File. Click Analyze all cases.
20.1
Select Data. Select Select Cases (see Figure 36). Click on If condition is satisfied. Choose Filter out unselected cases. (Choosing the option Delete unselected cases will permanently delete the other cases and is not recommended!) Click on the If button under the words If condition is satisfied. Choose Do you smoke? from the list of variables. Click the arrow to put the variable in the top right box. The word smoker should appear in the window. Type =1 after the word smoker (see Figure 37). Click on Continue. Click on OK.
20.2
To see how Select Cases processing works, from the Analyze menu, select the Descriptive Statistics option to run Frequencies for the two smoking variables, Do you smoke cigarettes and Have you tried to give up smoking. Click OK. In the Viewer window, you should find that 83% of the smokers smoke cigarettes and two thirds of them have tried to give up smoking. All subsequent analyses use only the selected cases until Select Cases is turned off.
20.3
To turn off the Select Cases option: Select Data. From the Data menu, click Select Cases. Choose All cases. Click on OK.
Once you have decided which variables to recode you need to specify the old and new values. 21.3 Select the Old and New Values box (Figure 39). To create a young (to be called 1) and an old category (to be called 2), recode the values 0 through to 30 into the new value 1:
Click on the Range, LOWEST through button. Type 30 in the text box.
Click Value in the New Value area. Type 1 in the New Value text box. Click Add to place this specification in the Old-> New text box. Repeat to recode values (ages) 31 and above to new value 2, using Range, value through HIGHEST. Remember to click Add. Click Continue to return to the Recode into Different Variables window. Click OK to perform the recoding operation. SPSS adds the new variable agecat to the next empty slot in the Data Editor window. Look at the right-most column in the Data View pane of the window (Figure 40). Also look at the Variable View pane to see that agecat has been added as variable number 13. Whilst in the variable view panel you might like to add the value labels for this new variable agecat (see Task 5 if you have forgotten how).
Figure 40 - SPSS Data Editor window with extra variable agecat added
21.4
Use Frequencies to tabulate agecat and check the results. You should find that 3 subjects are aged 30 or less and 7 subjects are aged 31 or more.
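The two recode rules can be written out as a small Python sketch to make the logic explicit. This illustrates the rules only and is not SPSS syntax. The ages are hypothetical, the function name recode_age is invented, and None is used here to stand in for a missing value.

```python
from collections import Counter


def recode_age(age):
    """Apply the two rules: LOWEST through 30 becomes 1, 31 through HIGHEST becomes 2."""
    if age is None:
        return None          # a missing age stays missing
    if age <= 30:
        return 1             # young
    if age >= 31:
        return 2             # old
    return None              # values covered by neither rule (e.g. 30.5) get no new value


ages = [19, 30, 31, 45, None, 62]   # hypothetical ages, not the workbook's data
agecat = [recode_age(a) for a in ages]
print(Counter(agecat))
```

Because age last birthday is a whole number, every non-missing age falls under one of the two rules. The final branch only matters for variables that can take values strictly between 30 and 31.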
Next, in the Numeric Expression text box you need to give the instructions for calculating the variable, in this case taking the natural log of age.
21.6 Initially you need to find the appropriate function. Log is an arithmetic function: Click on Arithmetic in the Function group window. Scroll down and click on Ln in the Functions and Special Variables window. Click the up arrow next to the Function group box.
Note
A Function group must be highlighted as well as a Function; otherwise the up arrow alongside the Function group box does not operate.
Select Age last birthday (age). Click to move age to replace the ? in the function definition. Now you have a compute statement: logage=LN(age). Click OK. SPSS creates the new variable and places it in the next free column in the Data Editor window. The newly created variable logage may now be used in SPSS procedures.
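As a quick check on what LN(age) returns, the Python sketch below computes the same natural logarithm. It assumes that a missing or non-positive age should yield no value. None is used to represent missing, and the function name log_age is invented for this example.

```python
import math


def log_age(age):
    """Natural logarithm of age; None stands in for a missing value."""
    if age is None or age <= 0:
        return None   # the logarithm is undefined for zero or negative values
    return math.log(age)


# Hypothetical ages for illustration
for age in (20, 40, 80):
    print(age, round(log_age(age), 3))
```

Each doubling of age adds the same amount (about 0.693) to logage, which is the usual reason for applying a log transformation to a skewed variable.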
two or more variables. For example, you might want to identify the group of people who are smokers and who strongly agree with the statement that smoking is dangerous to their health. The SPSS Data Editor window contains the Transform menu with the Compute option and the If box.
21.7 To create a new variable, first choose a name for the variable, e.g. dangersmok. We are going to give those who smoke and think it dangerous the value 1 for dangersmok: Select Transform. Select Compute. Type dangersmok in the Target Variable box (Figure 42). Next, in the Numeric Expression text box you need to give the instructions for calculating the variable; in this case just type 1.
21.8 Now you need to go to the If box so that you can say who is going to have the value of 1: Click on If. Choose the lower option, Include if case satisfies condition. Put the variable smoker into the top window and type =1 followed by the word and. Then add the variable danger by clicking on Do you think smoking is dangerous? and type =4. The result should look like the figure below.
Note
This will only compute 1 for those people who have both the features smoker=1 and danger=4. This means that all those who do not have both these features will have a missing value for this new variable. If you would like to give the other people a value such as zero, you have to go through this procedure a second time, this time choosing 0 as the value you are calculating and using the condition smoker=2 or danger ne 4. Here ne is understood by SPSS to mean not equal to; you can also use the button provided, which has the symbol ~=. Don't forget to use or instead of and to join the conditions together in the second statement.
Click OK. SPSS creates the new variable and places it in the next free column in the Data Editor window. To retain the syntax, open the Compute window again and press Paste.
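The effect of the two Compute passes, including who is left with a missing value, can be sketched in Python. This is an illustration of the logic rather than SPSS syntax. It assumes the codes used in the text (smoker 1 = smokes, 2 = does not smoke; danger 4 = strongly agrees) and uses None to stand in for a missing answer.

```python
def dangersmok(smoker, danger):
    """Return 1, 0 or None (missing) for one respondent."""
    if smoker == 1 and danger == 4:
        return 1      # first pass: smokes AND strongly agrees smoking is dangerous
    if smoker == 2 or (danger is not None and danger != 4):
        return 0      # second pass: does not smoke OR gave a different answer to danger
    return None       # neither condition can be confirmed, so the value stays missing


# Hypothetical respondents: (smoker, danger)
for case in [(1, 4), (1, 3), (2, 4), (1, None)]:
    print(case, dangersmok(*case))
```

The last case shows why missing values can remain even after the second pass: a smoker with no answer to the danger question satisfies neither condition.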
Important
Save this syntax using Paste from the Figure 37 window; you will need it for future task 26. Check the variable days by using the Frequencies procedure and asking for a histogram from the Charts options.
Task 23 Correlations
The correlations procedure calculates the (Pearson parametric) correlation between variables and is used to measure the strength of linear association between 2 variables. 23.1 To obtain the Pearson correlation coefficients of tax, danger and cinemas: Select Analyze. Select Correlate. Select Bivariate (see Figure 44).
Move the following to the Variables box: Do you think tax on tobacco too high? (tax), Do you think smoking is dangerous to your health? (danger) and Do you think smoking should be allowed in cinemas? (cinemas). Click Flag significant correlations to put a tick in the box. (It may already be ticked.) Click OK. There is a significant correlation between danger and cinemas (0.746, p<0.05), which means that people who feel that smoking is dangerous also tend to think that smoking should not be allowed in cinemas.
Since the three variables in this correlation analysis are ordinal in measurement scale, it would have been statistically more proper to use Spearman's Rank correlation coefficients to measure the relationship. 23.2 Repeat the above analysis but this time select the Spearman correlation coefficient. Note that the same relationship remains significant, although its value is slightly lower than the Pearson correlation (Figure 45).
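The difference between the two coefficients is easiest to see in code: Spearman's coefficient is simply the Pearson coefficient calculated on the ranks of the data, with tied values sharing the average of their ranks. The Python sketch below uses hypothetical ordinal scores, not the workbook's data, and the function names are invented for this example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank from 1 upwards; tied values share the average of their positions."""
    first, count = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        count[value] = count.get(value, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal answers for eight respondents
danger = [4, 4, 3, 2, 4, 1, 3, 4]
cinemas = [1, 1, 2, 3, 1, 4, 2, 2]
print(round(pearson(danger, cinemas), 3), round(spearman(danger, cinemas), 3))
```

Because ranks ignore the spacing between categories, Spearman's coefficient does not depend on treating the gap between, say, codes 1 and 2 as equal to the gap between codes 3 and 4.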
Click on Do you think smoking should be allowed in cinemas? [cinemas] in the source variable list. Click to place cinemas in the Y Axis text box.
Click on Do you think smoking is dangerous to health? [danger] in the source variable list. Click to place danger in the X Axis text box. Using a variable such as sex, we can also see if there is a difference between the men and women. Click Sex of respondent [sex] in the source variable list. Click to place sex in the Set Markers by box. Click OK. This produces a plot of cinemas versus danger; the variable sex is used to mark each observation on the plot as male or female. Where duplicate values occur, this may not be an accurate representation. Have a look at the data points for the data in Figure 47.
Task 26 Getting SPSS to read data from other spreadsheet formats e.g. Excel
When working in the Computer Centre training rooms you will notice that in the directory C:\Training\Stats there is a file called Large Smoking Data.xls. This is an Excel spreadsheet that has the same 12 variables as you have used so far in this workbook, but with many more cases than you have entered.
26.1 Browse to C:\Training\Stats. Double click on the file name to enable Excel to open the file (Figure 48). You will notice that the first row of the Excel spreadsheet contains the names of the twelve variables exactly as you used them before. You will analyse this large file using SPSS. Close the Excel file at this point.
Figure 48 - the first part of the Excel data file Large Smoking Data.xls
26.2
To input the Excel file: Go back into SPSS 18.0 for Windows. Select Cancel in the query window. From the File menu select Open and then select Data. Ensure the directory in the Look in box is correct. If you are in one of the Computer Centre training rooms, change the directory to C:\Training\Stats. The Files of type box will be showing SPSS. This needs to be changed to Excel by clicking on the down arrow at the right-hand end of the Files of type box and selecting Excel (*.xls, *.xlsx, *.xlsm). The file name Large Smoking Data should now be visible (see Figure 49). Double click the file name.
Figure 49 - the SPSS Open File menu window showing an Excel file for input
The next menu confirms that the Excel file has been recognised. You should check that the box Read variable names from first row of data is ticked (Figure 50). Click Continue
You now have a data file in SPSS with all the additional cases. Unfortunately all the information you entered into the Variable View part of your small original SPSS file is missing. You could re-type this information as you did earlier, but you can get SPSS to import the original.
26.3
In the Data Editor window click on Data. Click on Copy Data Properties (see Figure 51).
You wish to use the properties of the file smoking.sav, which you saved earlier in C:\Training\Stats. Select An external SPSS data file and, either by direct entry or by browsing, insert C:\Training\Stats\smoking.sav into the file name box. Click Finish. (If you were to click Next, you would be offered a number of alternatives that allow finer control over which properties are copied.)
26.4 Using procedures you mastered earlier, answer the following (keep your SPSS output for Task 24):
How many females answered the questionnaire? (Clue: Task 12.)
Show a histogram of respondents' ages. (Clue: Task 15.)
Are men and women equally likely to smoke? (Clue: Task 16.)
What is Spearman's rank correlation between danger and cinemas? (Clue: Task 20.)
Task 27 Saving output from SPSS into word processor documents e.g. Microsoft Word
The output file produced by SPSS can only be read by the SPSS program. This is complicated further by the fact that versions from 15 onwards are unable to read the output from previous versions of SPSS. Most users would like to share their results and discuss them with other members of their research team, many of whom may not have access to SPSS or, if they do, may not have quite the same version. Sharing results usually means transferring all or part of the SPSS output into a Microsoft package. We shall learn how this can be achieved by moving output to a Microsoft Word document. There are two ways of doing this. Let's first consider allowing SPSS to do most of the work.
27.1 Make sure you are in the SPSS Viewer window. From the File menu select Export.
In the Objects to Export box, ensure that All Visible objects is selected. Under Document Type select Word/RTF file (*.doc) from the drop-down menu. In the File Name box type C:\Training\Stats\wordoutput. Click OK.
27.2 Minimise the SPSS Viewer and Data Editor windows and then browse to and open the document wordoutput.doc. You will see that all your original output has been transferred.
27.3
There is an alternative way of achieving the movement of small amounts of output where you have much more control. Keep your wordoutput document open but position your cursor at the end of the present file. Minimise the Word document and maximise SPSS viewer file. Point and click on any single table or chart in your output. You will notice that a rectangular box appears around the table. Click on Edit and select Copy. Maximise your Word file and click on Edit and Paste. This process will work for transferring all text, tabular or chart output from SPSS.
Note
Never copy and paste SPSS graphs. Always export graphs before placing them in your PowerPoint presentations. Failure to do this can make them invisible in the presentation if you are using a computer which does not have SPSS.
Click on Define. Select days and put it into the Variable box. Select Do you smoke? (smoker) and put it into the Category Axis box (Figure 53).
Select OK. You should have a box plot like the one below.
Click on this graph with the left mouse button to select it. Click on the graph with the right mouse button to be given the option of editing it or exporting it. Go back to the Define boxplot window, where you can paste this syntax into your syntax window so that you can redo the analysis at any time.
The independent samples T test box should now look like this:
The value labelled Sig. here is not the p value of the t-test; it is the p value of Levene's Test for Equality of Variances. The p value for the t-test is labelled Sig. (2-tailed). Can you find the mean difference and its 95% confidence interval? These are -5.58, with a 95% confidence interval of (-8.72, -2.42). Can you tell which group has the lower mean number of days? What is the mean number of days for each group?
Click on OK. The output should appear as in Figure 59. Note that the p value is referred to as Asymp. Sig. (2-tailed).
Graphs
If you can't find the graph you are looking for under Legacy Dialogs, you can create your own using Chart Builder. Doing this properly is often quite time-consuming and is outside the scope of this course.
Appendix A References
One of the reasons for using SPSS in the University is that it has a very comprehensive on-line documentation facility. When the software is installed you should take a copy of the Manuals disc. On this you will find a series of pdf files that give full documentation in each area of SPSS. Listed below are the titles of the important files.
Manuals relating to statistical issues:
SPSS Brief Guide 18.0
SPSS Base User's Guide 18.0
SPSS Tables 18.0
SPSS Data Preparation 18.0
SPSS Advanced Models 18.0
SPSS Regression Models 18.0
SPSS Trends 18.0
SPSS Categories 18.0
SPSS Classification Trees 18.0
SPSS Complex Samples 18.0
SPSS Conjoint 18.0
SPSS Exact Tests
SPSS Missing Value Analysis 18.0
Manuals relating to computational issues:
SPSS 18.0 Algorithms
SPSS 18.0 Command Syntax Reference
SPSS Programming and Data Management, 4th Edition