
SPSS Practical File

Sri Guru Gobind Singh College Of Commerce

Submitted to-
Dr Meenu Chopra

Submitted by-
ISHANJALI MADAAN 219005
VRIDHHI JAAIN 219020
PRANWAT SINGH ARORA 219059
INDEX

S.No   Topic                                             Page No.   Sign
1      Data Entry                                        3
2      Generating new variable                           4
3      Replacing missing value                           5
4      Computing variable                                6
5      Recoding variable                                 7
6      Selecting cases                                   9
7      Sorting and merging                               10
8      Graphs                                            13
9      Descriptive statistics procedure                  17
10     T-test                                            22
11     ANOVA                                             26
12     2-way ANOVA                                       29
13     Parametric vs Non-parametric tests                30
14     Chi-square                                        31
15     Bi-variate correlation and simple scatter plot    33
16     Regression Analysis                               40
17     Factor Analysis                                   47
18     Cluster Analysis                                  54
19     Discriminant Analysis                             55
1: Data Entry

We can navigate to an existing file, run the tutorial, type in data, and so on. The spreadsheet
that is initially displayed is the default view, which IBM SPSS calls the Data View because it
is, quite literally, where we enter and view our data. We will need to work in both the Data
View and the Variable View screens. Although we can deal with these screens in any order, it
is best to begin with the Variable View screen when entering a new data set.

Selecting the Variable View at the bottom of the new (blank) spreadsheet gives rise to the
screen shown below. This display is editable and allows us to specify the various properties
of the variables in the data file.
We use value labels to assign numeric codes to categorical data, so that we can easily
perform statistical tests on it.

2: Generating new variable

If the SPSS file is new and there is no data in the sheet, we start by generating variables.
● To define a variable in SPSS, you can use the "Variable View" tab of the SPSS Data
Editor.
Here are the steps:
○ Open your dataset in SPSS.
○ Click on the "Variable View" tab at the bottom of the screen.
○ In the first row of the table, enter a name for your variable in the "Name" column.
○ Select the appropriate "Type" for your variable (e.g. "Numeric" for a variable with
numerical values, "String" for a variable with text values).
○ If your variable is categorical, you can define its categories by clicking on the "Values"
button and entering the category labels and corresponding values.
○ If your variable has missing values, you can define them by clicking on the "Missing"
button and specifying the type of missingness (e.g. system-missing for values that were not
collected, user-missing for values that were not provided by the participant).
○ Once you have defined all of the properties of your variable, click on the "Data View" tab
to enter data for your variable.

● Nominal data can be converted into categorical data using value labels.
○ In the above Variable View we can see that Gender has been changed into categorical
data using value labels.
○ Steps: In Variable View, go to the Values cell of whichever variable we want to specify,
put in the numeric value to be assigned, add a label, click Add, and after adding all the
categories click OK. (An equivalent syntax sketch is given below.)
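
These dialog settings can also be written as syntax. A minimal sketch, assuming a
hypothetical numeric variable gender coded 1/2 with 9 as a user-missing code (the names and
codes are illustrative, not taken from the data file):

* Hedged sketch: defining variable properties in syntax (variable name and codes assumed).
VARIABLE LABELS gender 'Gender of respondent'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES gender (9).
VARIABLE LEVEL gender (NOMINAL).
EXECUTE.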

3: Replacing Missing Value


There may be many cases where the respondent has not answered some of the questions in a
questionnaire. In those cases, it may not be necessary to remove the entire case. Instead,
SPSS allows us to fill in the missing values using several methods, such as the series mean,
the mean or median of nearby points, or linear interpolation. This makes the data easier to
analyse.

Steps for replacing missing values in SPSS-

● Step 1: Click Transform - Replace Missing Values.


● Step 2: Move the variable (resale value) from the left box to the right.
● Step 3: Choose the best method for replacing missing values from the drop-down
menu. Usually, Series mean or median is used. We have used the series mean.
● Step 4: Click “OK.”
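
The same replacement can be pasted as syntax. A minimal sketch, assuming the variable is
named resale (the actual name in the data file may differ); SMEAN replaces missing values
with the series mean and writes the result to a new variable:

* Hedged sketch: replace missing values with the series mean (variable name assumed).
RMV /resale_1=SMEAN(resale).
EXECUTE.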

4: Compute Variable
If we want to make a new variable after the survey has been done, we can do so using this
command. It computes a new variable based on an equation involving the existing variables,
and we can name the target variable as we wish. For example, here we need to make a new
variable called "Ratio", which is sales value divided by resale value. We can do so by using
the Compute Variable command and entering the existing variables and the relationship
between them.

Steps for Computing Variables in SPSS

Step 1: Click “Transform”, then click “Compute Variable”.


Step 2: Give your new (target) variable a name. We have used Ratio as the new variable name.
Step 3: Type in your request.

1. Move one variable over to the “Numeric Expression” box.


2. Press the divide (/) button on the calculator pad.
3. Move the second variable over to the “Numeric Expression” box.
Step 4: Click OK.
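
In syntax the same computation is a one-liner. A minimal sketch, assuming the existing
variables are named sales and resale:

* Hedged sketch: compute a ratio of two existing variables (variable names assumed).
COMPUTE Ratio = sales / resale.
EXECUTE.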

5: Recoding variables
This command helps us to combine values into categories. We frequently use this function
when we must eliminate outliers and/or extreme values. For example, we have taken the
numeric variable "Engines" as the input and "Engine Size" as the output variable. We must
assign a new value to each range of numbers existing in the Engines variable.

Steps for Recoding Variables in SPSS

1. Click on Transform - Recode into Different Variables


2. Drag and drop the variable you wish to recode (engines) over into the Input
Variable - Output Variable box
3. Create a new name for your output variable in the Output Variable (Engine
Size) text box, and click the Change button
4. Click the Old and New Values… button
5. Type the first value of your input variable into the Old Value (Value) text box,
and the value you want to replace it with into the New Value (Value) text box.
Then click Add to confirm the recoding
6. Repeat this process for all the existing values of your input variable
7. Press Continue, and then OK to do the recoding
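
The equivalent recode can be written as syntax. A minimal sketch, assuming the input
variable is named engines, with illustrative cut-points and labels (adjust these to the data):

* Hedged sketch: recode a numeric variable into categories (name, ranges and labels assumed).
RECODE engines (LOWEST THRU 2.5=1) (2.5 THRU 4.5=2) (4.5 THRU HIGHEST=3) INTO engine_size.
VALUE LABELS engine_size 1 'Small' 2 'Medium' 3 'Large'.
EXECUTE.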

6: Selecting cases
This function helps us to select only those cases that are relevant to us. SPSS would run the
tests on only the selected cases and the other deselected cases would have a cross mark on
their serial number. For example- We have selected only those cases in which the value of the
sales is more than 9. Therefore, we can see that all the other cases have been deselected.
Steps for selecting cases-
1. Click Data - Select Cases and click on ‘If condition is satisfied’.
2. Then click on the ‘IF’ push button and highlight your variable.
3. Click on the middle arrow to bring it over to the Expression box, then specify the
condition. Note that a condition such as ‘var=1’ AND ‘var=2’ deselects every case, since
no case can take both values at once; use OR to select cases matching either value.

4. To select cases based on missing values, use MISSING(variable), which returns true (1) if
the value is system-missing or user-missing.
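
When 'Filter out unselected cases' is chosen, SPSS pastes syntax along these lines. A
minimal sketch for the sales > 9 example above (the variable name is assumed):

* Hedged sketch: filter cases where sales exceed 9 (variable name assumed).
USE ALL.
COMPUTE filter_$=(sales > 9).
FILTER BY filter_$.
EXECUTE.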

7: Sorting and merging cases


Sorting cases

We can sort the cases in either ascending or descending order on the basis of a particular
variable. For example, we have sorted the cases on the basis of resale value in ascending
order.

Cases are arranged in the data file in the order in which they were entered. To sort the data
file based on another variable, from the IBM SPSS main menu select Data ➔ Sort Cases. This
produces the Sort Cases dialog window. Double-click a variable (here, age) to move it from
the Variable List to the Sort by panel. Ascending is the default option checked under Sort
Order, and we retain it. Click OK. IBM SPSS acknowledges in an output window that it has
executed the sorting by age; this output file can be deleted without saving.

1. Click Data - Sort Cases.


2. Double-click on the variable you want to sort your data by to move it to
the Sort by box. If you are sorting by two or more variables, then the order in which the
variables appear in the "Sort by" list matters; you can click and drag the variables
to reorder them within the Sort by box. Here we have sorted by the number of people who
prefer a certain case.
3. In the Sort Order area, you can choose an “Ascending” or “Descending” sort order
for each variable in the "Sort by" list. Click on the variable in the Sort by box to
highlight it, then click the radio button that corresponds to your sort order choice.
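
Sorting is a single command in syntax. A minimal sketch, assuming we sort by resale
ascending and then sales descending (variable names assumed; A = ascending, D = descending):

* Hedged sketch: sort cases by one or more variables (variable names assumed).
SORT CASES BY resale (A) sales (D).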

Merging files

We have two options for merging files: we can either merge cases or merge variables. We use
the merge cases option if we have the same variables (same questionnaire) but the cases were
collected by different people and need to be combined. We use merge variables if we have
different variables (the questionnaire is broken into different parts) but the same cases
(the same people have filled in all the different parts of the questionnaire).

1. From the menus choose Data - Merge Files.

2. Select Add cases or Add variables.

Add cases: Select this option when the datasets contain the same variables (columns)
but different cases (rows). Adding cases pairs similar variables.

Add variables: Select this option when the datasets contain the same cases (rows) but
different variables (columns). Adding variables sets common variables as keys.

3. Select a dataset to merge with the active dataset. You can drag an existing file to
the Drag-and-drop dataset file (.sav) here area in the user interface, browse for an
existing dataset, or select an open dataset.
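
The two merges correspond to the ADD FILES and MATCH FILES commands. A minimal sketch,
assuming a second file named 'part2.sav' and a key variable id (both assumed; for MATCH
FILES both files must already be sorted by the key):

* Hedged sketch: merge cases (same variables, new rows); file name assumed.
ADD FILES /FILE=* /FILE='part2.sav'.
EXECUTE.
* Hedged sketch: merge variables (same cases, new columns), keyed by id.
SORT CASES BY id.
MATCH FILES /FILE=* /FILE='part2.sav' /BY id.
EXECUTE.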

8: Graphs

Graphs are used to depict data that is too numerical and/or complicated to absorb directly,
so that it can be read and analysed easily. Different graphs depict different things: bar
graphs and line charts depict frequencies, pie charts help us to know percentages, box plots
and histograms help us to judge the normality of the data, and scatter plots tell us about
the correlation and regression of the data.

Steps for Line Chart

1. Graphs - Line - Multiple (summaries of separate variables) - Define - select the
variables (4-year resale value) - OK.
Double-click on the graph to open the SPSS Chart Editor

● change line style: highlight one line - format - line style - make selection -apply -
close

● change line color: highlight one line - format - color - make selection - apply -close

● change labelling on x-axis: highlight x-axis -chart - axis - change axis title to
"Gender", center it - labels -make changes - continue – OK

● change text size: highlight text - format - text - make selections - apply -close

● change labelling on y-axis: highlight y-axis - chart - axis - make selections – OK

Similar steps are to be followed for each type of graph and chart in SPSS.
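
The common chart types can also be produced directly with the GRAPH command. A minimal
sketch, with variable names assumed from the car data used in the examples:

* Hedged sketch: common chart types in syntax (variable names assumed).
GRAPH /LINE(SIMPLE)=MEAN(resale) BY type.
GRAPH /PIE=COUNT BY type.
GRAPH /BAR(SIMPLE)=COUNT BY type.
GRAPH /HISTOGRAM=sales.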

8.1 Line Chart

8.2 Pie Chart

8.3 Scatter Plot

8.4 Histogram

8.5 Box Plot

9: Descriptives

9.1 Frequency

Steps to perform frequencies in SPSS:


1. Open your data file in SPSS.
2. Select the "Analyze" menu and choose "Descriptive Statistics" and then "Frequencies."
3. In the "Frequencies" dialog box, select the variable you want to examine from the list
of variables on the left.
4. Click the arrow button to move the selected variable to the "Variable(s)" box on the
right.
5. Click the "Statistics" button to specify the statistics you want to calculate. You can
select options such as mean, median, mode, range, standard deviation, variance, and more.
6. Click "OK" to create the frequencies table.
7. The frequencies table will appear in the output viewer. You can use this table to
examine the distribution of values for the variable.
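
A minimal syntax sketch of the same procedure, assuming a variable named type:

* Hedged sketch: frequency table with summary statistics (variable name assumed).
FREQUENCIES VARIABLES=type
  /STATISTICS=MEAN MEDIAN MODE STDDEV RANGE
  /ORDER=ANALYSIS.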

9.2 Descriptives

Steps to perform descriptive statistics in SPSS:

● Open your data file in SPSS.


● Select the "Analyze" menu and choose "Descriptive Statistics" and then
"Descriptives."
● In the "Descriptives" dialog box, select the variable(s) you want to summarize from
the "Variables" list on the left.
● Click the "Options" button and select the descriptive statistics you want to include,
such as "Mean," "Standard Deviation," "Minimum," and "Maximum."

● Click "OK" to run the descriptive statistics.
● The descriptive statistics will appear in the output viewer. Look for the "Mean,"
"Standard Deviation," and other statistics you selected to describe the dataset.

9.3 Explore

Steps for running explore:

● Open your data file in SPSS.


● Select the "Analyze" menu and choose "Descriptive Statistics" and then "Explore."
● In the "Explore" dialog box, select the variable(s) you want to analyse from the
"Variables" list on the left.
● Click the "Plots" button and select the graphs you want to include, such as "Boxplot,"
"Histogram," and "Normality Plot."
● Click the "Statistics" button and select the statistics you want to include, such as
"Descriptives," "Outliers," and "Percentiles."
● Click "OK" to run the "Explore" analysis.
● The results of the "Explore" analysis will appear in the output viewer. Look for the
descriptive statistics, graphs, and normality tests to gain insights into the
characteristics of the dataset.
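
A minimal syntax sketch of the same procedure, assuming a variable named sales:

* Hedged sketch: explore with box plot, histogram and normality plots (variable name assumed).
EXAMINE VARIABLES=sales
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.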

9.4 Cross Tabs

To describe the relationship between two categorical variables, we use a special type of table
called a cross-tabulation (or "crosstab" for short). In a cross-tabulation, the categories of one
variable determine the rows of the table, and the categories of the other variable determine
the columns. The cells of the table contain the number of times that a particular combination
of categories occurred. The "edges" (or "margins") of the table typically contain the total
number of observations for that category.
This type of table is also known as:
Crosstab.
Two-way table.
Contingency table.

Steps for running Crosstabs:

Step 1: Type your data into an SPSS worksheet. Contingency tables require at least two
variables (columns) of data. For this example question, type ages into the first column and
then type healthcare type into the second column. Change the variable names (the column
headers) by clicking the “Variable View” tab at the bottom of the sheet and typing over the
variable name.

Step 2: Click “Analyze,” then hover over “Descriptive Statistics” and then click “Crosstabs.”
The Crosstabs dialog window will open.

Step 3: Select one variable in the left window and then click the top arrow to populate the
“Row(s)” box. Select a variable to populate the “Column(s)” box and then click the center
arrow. For this sample problem, “Age” was selected for “Row(s)” and “Healthcare Type”
was selected for “Column(s).” Once you have made your selection, click “Cells.”

Step 4: Check which percentages you want to see (rows or columns). What you select will
depend upon what variables you put in rows and what you put in columns. For this sample
problem, “Healthcare Type” was placed in the columns, so check “Column” under
percentages.

Step 5: Click “Continue” and then click “OK.” The Crosstabs output will appear in the output viewer.
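
A minimal syntax sketch of the same table, assuming the variables are named age and
healthcare:

* Hedged sketch: cross-tabulation with column percentages (variable names assumed).
CROSSTABS /TABLES=age BY healthcare
  /CELLS=COUNT COLUMN.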

10: T-test

10.1 One sample T-test

The One Sample T-test examines whether the mean of a population is statistically different
from a known or hypothesized value. The One Sample t Test is a parametric test.
The variable used in this test is known as:
Test variable
In a One Sample t Test, the test variable's mean is compared against a "test value", which is a
known or hypothesized value of the mean in the population.

Steps:
○ Open SPSS and select the variable that you want to test.
○ Click on Analyze, then Compare Means, and then One-Sample T Test.
○ In the One-Sample T Test dialog box, select the variable you want to test
in the Test Variable(s) box.
○ In the Test Value box, enter the value you want to test the mean against. For
example, if you want to test whether the mean is significantly different from 5,
enter 5 in this box.
○ Select any other options you want to include in the analysis, such as
confidence intervals (95%), and click OK.
○ SPSS will generate output that includes the mean, standard deviation, and
standard error of the mean for the variable you selected, as well as the t-
statistic, degrees of freedom, and p-value for the one-sample t-test.
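
A minimal syntax sketch, assuming a test variable named score and a test value of 5:

* Hedged sketch: one-sample t-test against a hypothesized mean of 5 (variable name assumed).
T-TEST /TESTVAL=5
  /VARIABLES=score
  /CRITERIA=CI(.95).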

10.2 Independent sample T- test

Steps of Independent sample T- test

● Click on Analyze\Compare means\Independent Samples T test.


● Move the dependent continuous variable into the Test Variable box.
● Move the independent categorical variable into the Grouping Variable section.
● Click on Define groups and type in the numbers used in the data set to code each
group.
● Click on Continue and then OK.
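
A minimal syntax sketch, assuming the grouping variable is named gender with codes 1 and 2
and the test variable is named sales:

* Hedged sketch: independent-samples t-test (variable names and group codes assumed).
T-TEST GROUPS=gender(1 2)
  /VARIABLES=sales
  /CRITERIA=CI(.95).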

10.3 Paired sample T-test
The Paired Samples t Test compares the means of two measurements taken from the same
individual, object, or related units.

Steps of paired sample T-test:

○ Open your data in SPSS and select the two variables that you want to test. These
should be continuous numerical data that are paired, such as before and after
measurements on the same individuals.
○ Click on Analyze, then Compare Means, and then Paired-Samples T Test.
○ In the Paired-Samples T Test dialog box, select the two variables you want to test in
the Paired Variables box.
○ Select any other options you want to include in the analysis, such as confidence
intervals, and click OK.
○ SPSS will generate output that includes the mean, standard deviation, and standard
error of the mean for the difference between the paired observations, as well as the t-
statistic, degrees of freedom, and p-value for the paired t-test.
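
A minimal syntax sketch, assuming the paired measurements are named before and after:

* Hedged sketch: paired-samples t-test (variable names assumed).
T-TEST PAIRS=before WITH after (PAIRED)
  /CRITERIA=CI(.95).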

11: ANOVA

Assumptions:
● Dependent variable that is continuous (i.e., interval or ratio level)
● Independent variable that is categorical (i.e., two or more groups)
● Cases that have values on both the dependent and independent variables
● Independent samples/groups (i.e., independence of observations)
a. There is no relationship between the subjects in each sample. This means
that:
i. subjects in the first group cannot also be in the second group
ii. no subject in either group can influence subjects in the other
group
iii. no group can influence the other group

● Random sample of data from the population


● Normal distribution (approximately) of the dependent variable for each group (i.e.,
for each level of the factor)
● Homogeneity of variances (i.e., variances approximately equal across groups)
● No outliers

Variables: sales, engine revised (engine size recoded into three categories: 0–2.5, 2.6–4.5,
4.5 and above)

Levene's Test shows that the assumption of homogeneity of variances is violated, as Sig.
0.014 < 0.05; therefore the null hypothesis, which states that the variance of sales is the
same for each of the engine sizes, is rejected.
Therefore, the Welch test is done.

H0: The mean sales across the three groups of engine sizes are equal.
H1: The mean sales of at least one of the three groups of engine sizes are not equal.
Since 0.526 > 0.05 in the Welch test, we fail to reject the null hypothesis.
In the ANOVA table also, 0.337 > 0.05; therefore the mean sales values across all three
groups of engine sizes are the same.
In the multiple comparisons tables for both Tukey and Games-Howell, the significance value
across the different combinations of the three groups is more than 0.05; therefore the mean
sales across the three groups of engine sizes are equal.

Post Hoc Tests

Steps:
● Run an ANOVA to determine if there are significant differences between groups.

● In the output viewer, look for the "Sig." value for the ANOVA. If the value is less
than 0.05, it indicates that there are significant differences between groups.
● Select the "Analyze" menu and choose "Compare Means" and then "One-Way
ANOVA."
● In the "One-Way ANOVA" dialog box, select the variable you used in the ANOVA
test as the "Dependent Variable."
● Click the "Post Hoc" button and select the post hoc test you want to run. Popular
options include Bonferroni, Tukey HSD, and LSD.
● Click "OK" to run the post hoc test.
● The post hoc test results will appear in the output viewer. Look for the "Sig." value
to determine if there are significant differences between specific groups.
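
The whole one-way analysis above, including the homogeneity test, the Welch test and the
post hoc comparisons, can be requested in one command. A minimal sketch, with variable names
assumed from the example:

* Hedged sketch: one-way ANOVA with Welch test and Tukey/Games-Howell post hocs (names assumed).
ONEWAY sales BY engine_size
  /STATISTICS DESCRIPTIVES HOMOGENEITY WELCH
  /POSTHOC=TUKEY GH ALPHA(0.05).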

12: Two-Way ANOVA


Steps
● Click Analyze, select General Linear Model, and then click Univariate.
● Click your dependent variable (in this example, that’s Cleanliness), and then click the
arrow by the “Dependent Variable” box.
● Click your main independent variable, the main focus of your experiment (for this
example, that’s Brand of Cleaner), and then click the arrow for the Fixed Factor(s)
box.

● Click your interaction variable (Water temperature in this example), and then click the
arrow for Fixed Factor(s).
● Click EM Means box.
● Click your independent variable (“Brands”) in the Univariate Estimated Marginal
Means window. Hold down Ctrl on your keyboard and then click Water and
Brand*Water. Click the arrow to move all three to the Display Means for: box.
● Click Compare main effects.
● Click Continue.
● Click Options.
● Click Descriptive statistics and Homogeneity tests. You can change the alpha level
here, but in this example we leave it at the 5% level.
● Click Continue and then click OK.
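
These steps correspond to UNIANOVA syntax along the following lines. A minimal sketch, with
the example's factor names assumed:

* Hedged sketch: two-way ANOVA with estimated marginal means (variable names assumed).
UNIANOVA cleanliness BY brand water
  /EMMEANS=TABLES(brand) COMPARE
  /EMMEANS=TABLES(water) COMPARE
  /EMMEANS=TABLES(brand*water)
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=brand water brand*water.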

13: Parametric Vs Non-Parametric tests

Parametric tests                               Non-parametric tests

Assumptions: the normality assumption          The normality assumption is not required.
is required.

Use metric (scale) data.                       Nominal or ordinal data is used.

Can be applied for both small and              Can be applied for small samples.
large samples.

Applications: one sample, using Z or t         One sample, using the sign test.
statistics.

Two independent samples, using a t or          Two independent samples, using the
Z test.                                        Mann-Whitney U statistic.

Two paired samples, using a t or Z             Two paired samples, using the sign test
test.                                          and the Wilcoxon matched-pairs rank test.

Randomness: no parametric test is              Randomness: using the Runs test.
available.

Several independent samples, using the         Several independent samples, using the
F test in ANOVA.                               Kruskal-Wallis test.
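
The non-parametric column maps onto the NPAR TESTS command. A minimal sketch, with variable
names and group codes assumed:

* Hedged sketch: common non-parametric tests (variable names and group codes assumed).
* Mann-Whitney U for two independent samples.
NPAR TESTS /M-W=score BY group(1 2).
* Wilcoxon matched-pairs test for two paired samples.
NPAR TESTS /WILCOXON=before WITH after (PAIRED).
* Kruskal-Wallis test for several independent samples.
NPAR TESTS /K-W=score BY group(1 3).
* Runs test for randomness.
NPAR TESTS /RUNS(MEDIAN)=score.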

14: Chi-Square Test
The chi-square test is a statistical test for categorical data. It is used to determine
whether your data are significantly different from what you expected. For this test, the
data are required in the form of frequencies.

Properties of Chi-square are:


● Chi-square distribution is non-symmetric.
● The value of chi-square is greater than or equal to 0.
● The shape of chi-square distribution depends upon the degrees of freedom.

There are two types of chi-square tests:


1. Goodness of fit - It is a statistical test of how well the observed data supports the
assumption about distribution of a problem. It examines how well an assumed
distribution fits the data.
2. Independence of variables - This test makes use of contingency tables. This test is
used to test the independence of two variables and define whether the two variables
are related or not.

Steps of Chi-Square Test:

1. Open your data file in SPSS.


2. Select the "Analyze" menu and choose "Descriptive Statistics" and then "Crosstabs."
3. In the "Crosstabs" dialog box, select the variable you want to cross-tabulate as the
"Row" variable and the variable you want to cross-tabulate against as the "Column"
variable.
4. Click the "Statistics" button and select the "Chi-square" checkbox.
5. Click the "Cells" button and select the "Expected" checkbox to show the expected
values for each cell.
6. Click "OK" to create the cross-tabulation table and run the chi-square test.
7. The chi-square test results will appear in the output viewer. Look for the "Chi-Square"
value and the "p-value" to determine the significance of the relationship between the
two variables.
8. You can also look at the "Expected" and "Observed" tables in the output viewer to see
the expected and observed frequencies for each cell in the cross-tabulation table.
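
A minimal syntax sketch of the test in the steps above, with variable names assumed from the
interpretation below:

* Hedged sketch: chi-square test of independence with expected counts (variable names assumed).
CROSSTABS /TABLES=manufact BY type
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.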

Interpretation-
H0: There is no relation between Manufacturer and Vehicle Type.
H1: There is a relation between Manufacturer and Vehicle Type.

The asymptotic significance of 0.021 < 0.05; therefore the null hypothesis is rejected and
there is a relationship between Manufacturer and Vehicle Type.

15: Bi-variate correlation and simple scatter plots

Correlation measures the degree of association between two or more variables.


There are three types of correlation:
1. Positive correlation - When two variables X and Y move in the same direction, the
correlation between the two is positive. It implies that if one variable increases, the
other variable also increases and if one variable decreases, the other variable also
decreases.
2. Negative correlation - When two variables X and Y move in the opposite direction,
the correlation is negative. It implies that if one variable increases, the other variable
decreases and if one variable decreases, the other variable increases.
3. Zero correlation - The two variables X and Y are said to have zero correlation if the
variables move with no connection to each other.
Simple scatter plot -
Scatter plots are graphs that present the relationship between two variables in a data set
on a Cartesian system, i.e. a two-dimensional plane. The independent variable is plotted on
the X-axis, while the dependent variable is plotted on the Y-axis.
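
A minimal syntax sketch for a bivariate (Pearson) correlation and the matching scatter plot,
with variable names assumed:

* Hedged sketch: Pearson correlation and a simple scatter plot (variable names assumed).
CORRELATIONS /VARIABLES=sales resale
  /PRINT=TWOTAIL NOSIG.
GRAPH /SCATTERPLOT(BIVAR)=resale WITH sales.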

Two-Way ANOVA

Two-way ANOVA (analysis of variance) is a statistical method used to analyze the effects of
two independent variables (factors) on a continuous dependent variable. The two-way ANOVA
involves examining the effects of two factors simultaneously, and it allows for the assessment
of the main effects of each factor and the interaction between the two factors. The main effects
of each factor represent the independent contribution of each factor to the dependent variable,
while the interaction effect reflects the combined effect of the two factors on the dependent
variable.

Several assumptions that must be met for two-way ANOVA:

1. Normality: The data should be normally distributed within each group


combination of the two independent variables.
2. Homogeneity of variances: The variance of the dependent variable should be
approximately equal across all group combinations of the two independent
variables.
3. Independence: The observations within each group combination should be
independent of each other.
4. Random sampling: The samples used in the study should be randomly selected
from the population.

Steps to perform a two-way ANOVA in SPSS:

1. Open the data file in SPSS and select Analyze from the menu bar, then select General
Linear Model and Univariate.
2. Move the dependent variable (the variable you want to analyze) to the Dependent
Variable box, and the independent variables (the two factors you want to analyze) to
the Fixed Factors box.
3. Click the Options button, and select Descriptive statistics and Homogeneity tests.
4. Click OK to run the analysis.
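
The output below was produced by a design of this form. A minimal syntax sketch (the factor
names jobcat and gender are taken from the output footnote; salary is assumed):

* Hedged sketch: two-way ANOVA matching the design shown in the output below.
UNIANOVA salary BY jobcat gender
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /POSTHOC=jobcat(TUKEY QREGW)
  /DESIGN=jobcat gender jobcat*gender.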

Between-Subjects Factors

                           Value Label     N
Employment Category    1   Clerical      363
                       2   Custodial      27
                       3   Manager        84
Gender                 F   Female        216
                       M   Male          258

Descriptive Statistics
Dependent Variable: Current Salary

Employment Category   Gender   Mean         Std. Deviation   N
Clerical              Female   $25,003.69   $5,812.838       206
                      Male     $31,558.15   $7,997.978       157
                      Total    $27,838.54   $7,567.995       363
Custodial             Male     $30,938.89   $2,114.616        27
                      Total    $30,938.89   $2,114.616        27
Manager               Female   $47,213.50   $8,501.253        10
                      Male     $66,243.24   $18,051.570       74
                      Total    $63,977.80   $18,244.776       84
Total                 Female   $26,031.92   $7,558.021       216
                      Male     $41,441.78   $19,499.214      258
                      Total    $34,419.57   $17,075.661      474

Levene's Test of Equality of Error Variances (a,b)

Current Salary                       Levene Statistic   df1   df2       Sig.
  Based on Mean                      33.383             4     469       .000
  Based on Median                    30.319             4     469       .000
  Based on Median and with
  adjusted df                        30.319             4     218.439   .000
  Based on trimmed mean              31.727             4     469       .000

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Dependent variable: Current Salary
b. Design: Intercept + jobcat + gender + jobcat * gender

Tests of Between-Subjects Effects
Dependent Variable: Current Salary

Source            Type III Sum of Squares   df    Mean Square        F          Sig.
Corrected Model   96456357285.104 (a)       4     24114089321.276    272.780    .000
Intercept         177271943071.927          1     177271943071.927   2005.313   .000
jobcat            32316332041.298           2     16158166020.649    182.782    .000
gender            5247440731.568            1     5247440731.568     59.359     .000
jobcat * gender   1247682866.737            1     1247682866.737     14.114     .000
Error             41460138151.236           469   88401147.444
Total             699467436925.000          474
Corrected Total   137916495436.340          473

a. R Squared = .699 (Adjusted R Squared = .697)

Current Salary

                                                          Subset
                            Employment Category   N       1            2
Tukey HSD (a,b,c)           Clerical              363     $27,838.54
                            Custodial              27     $30,938.89
                            Manager                84                  $63,977.80
                            Sig.                          .179         1.000
Ryan-Einot-Gabriel-Welsch   Clerical              363     $27,838.54
Range (c)                   Custodial              27     $30,938.89
                            Manager                84                  $63,977.80
                            Sig.                          .226         1.000

Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = 88401147.444.
a. Uses Harmonic Mean Sample Size = 58.031.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error
levels are not guaranteed.
c. Alpha = .05.

16: Regression Analysis

16.1. Simple Linear Regression

Simple linear regression is a statistical method used to study the relationship between two
continuous variables by finding a linear equation that best predicts the value of one
variable based on the value of the other variable.

Assumptions for Simple Linear Regression are as follows:


1. Linearity: There is a linear relationship between the independent variable and the
dependent variable.
2. Normality: The errors are normally distributed. This means that the distribution of the
residuals follows a normal or bell-shaped curve.
3. Homoscedasticity: The variance of the errors is constant across all levels of the predictor
variable. This means that the variability of the residuals (the differences between the
predicted values and the actual values) is the same for all levels of the independent
variable.

Steps to run simple linear regression are as follows:


1. Open SPSS and load your dataset.
2. Click on "Analyze" in the top menu and select "Regression" -> "Linear".
3. Select the dependent variable and the independent variable.
4. Click on "Statistics" and select the options you want to include, such as "Descriptive
statistics" and "Coefficients".
5. Click on "Plots" and select the options you want to include, such as "Scatter Plot" and
"Normal probability plot".
6. Click on "OK" to run the analysis.
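
A minimal syntax sketch of the regression reported below, assuming the variables are named
horsepow and engine_s (the actual names in the data file may differ):

* Hedged sketch: simple linear regression with residual plots (variable names assumed).
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT horsepow
  /METHOD=ENTER engine_s
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS NORMPROB(ZRESID).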

16.1 Linear Regression

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Engine size (b)     .                   Enter

a. Dependent Variable: Horsepower
b. All requested variables entered.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .837 (a)   .701       .699                31.096

a. Predictors: (Constant), Engine size

ANOVA (a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   349403.231       1     349403.231    361.346   .000 (b)
  Residual     148910.358       154   966.950
  Total        498313.590       155

a. Dependent Variable: Horsepower
b. Predictors: (Constant), Engine size

Coefficients (a)

                Unstandardized Coefficients   Standardized Coefficients
Model           B        Std. Error           Beta                        t        Sig.
1 (Constant)    46.834   7.730                                            6.058    .000
  Engine size   45.449   2.391                .837                        19.009   .000

a. Dependent Variable: Horsepower

16.2. Multiple Linear Regression


Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the value of two or more other variables. The
variable we want to predict is called the dependent variable (or sometimes the outcome,
target or criterion variable). The variables we are using to predict the value of the
dependent variable are called the independent variables (or sometimes the predictor,
explanatory or regressor variables).

Assumptions for Multiple Regression: All the assumptions for simple regression (with one
independent variable) also apply to multiple regression, with one addition. If two of the
independent variables are highly related, this leads to a problem called multicollinearity,
which causes problems with the analysis and interpretation. To investigate possible
multicollinearity, first look at the correlation coefficients for each pair of continuous
(scale) variables; correlations of 0.8 or above suggest a strong relationship, and only one
of the two variables is needed in the regression analysis. SPSS also provides collinearity
diagnostics within the Statistics menu of regression, which assess the relationships between
each independent variable and all the other variables.

Steps to run multiple linear regression are as follows:

1. Open SPSS and load your dataset.


2. Click on "Analyze" in the top menu and select "Regression" -> "Linear".
3. Select the dependent variable and the independent variables.
4. Click on "Statistics" and select the options you want to include, such as "Descriptive
statistics" and "Coefficients".
5. Click on "Plots" and select the options you want to include, such as "Scatter Plot" and
"Normal probability plot".
6. Click on "OK" to run the analysis.

It can be seen that the independent variables explain 72.6% of the variation in the
dependent variable, which is our R square.

As our ANOVA is significant (the p-value is less than 0.05), we can say that our model is
statistically significant; therefore there is a relation between our dependent and
independent variables.

Our regression equation is as follows-

Dependent Variable (Y)- Price in thousands

Independent Variable (X1)- Horsepower

Independent Variable (X2)- Sales in thousands

Equation- Y= -9.041 + 0.205X1 – 0.031X2
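
A minimal syntax sketch of this model, with the collinearity diagnostics mentioned above
requested (variable names assumed):

* Hedged sketch: multiple linear regression with collinearity diagnostics (names assumed).
REGRESSION
  /STATISTICS COEFF R ANOVA COLLIN TOL
  /DEPENDENT price
  /METHOD=ENTER horsepow sales.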

16.3 Binary Logistic Regression

Binary logistic regression is a statistical method used to analyze the relationship between a
binary dependent variable (also known as the response variable) and one or more independent
variables (also known as predictor variables). It is a type of regression analysis used when the
dependent variable is binary, which means it takes one of two values, such as "yes" or "no",
"true" or "false", or "success" or "failure". In logistic regression, the model estimates the
coefficients (also called beta coefficients or regression coefficients) of the independent
variables that best predict the probability of the dependent variable taking a certain value
(usually 0 or 1). These coefficients represent the change in the log odds of the dependent
variable for a one-unit increase in the corresponding independent variable, holding all other
variables constant.

Some of the key assumptions of binary logistic regression include:


1. Linearity: The relationship between the independent variables and the log odds of the
dependent variable should be linear. This means that the effect of the independent
variables on the dependent variable should be constant across all levels of the
independent variables.
2. Independence of observations: The observations should be independent of each other. In
other words, the outcome of one observation should not be influenced by the outcome of
another observation.

3. No multicollinearity: The independent variables should not be highly correlated with
each other. Multicollinearity can lead to unstable estimates of the coefficients and reduce
the power of the analysis.
4. Binary dependent variable: The dependent variable should be binary, meaning it should
only have two possible outcomes.

Here are the steps for performing binary logistic regression in SPSS:
1. Open SPSS and navigate to the "Data Editor" window.
2. Enter the data into the spreadsheet, making sure that the dependent variable (i.e., the binary
outcome variable) is coded as 0 and 1.
3. Go to the "Analyze" menu and select "Regression" > "Binary Logistic."
4. In the "Binary Logistic Regression" dialog box, move the dependent variable to the
"Dependent" box and the independent variables to the "Covariates" box.
5. Specify any additional options, such as interactions or categorical variables, by
clicking on the "Categorical" or "Interaction" buttons.
6. Click on the "Options" button to specify any output options, such as including odds ratios
or goodness-of-fit statistics.
7. Click "OK" to run the analysis.
8. Interpret the output, which will include information on the coefficients, odds ratios,
standard errors, confidence intervals, and significance tests for each variable. Also, check the
assumptions of the logistic regression, such as linearity, independence of observations,
multicollinearity, and absence of perfect separation.
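
A minimal syntax sketch matching the output below, assuming the variables are named diabetes
and agecat:

* Hedged sketch: binary logistic regression with the Hosmer-Lemeshow test (names assumed).
LOGISTIC REGRESSION VARIABLES diabetes
  /METHOD=ENTER agecat
  /PRINT=GOODFIT CI(95).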

Case Processing Summary

Unweighted Cases (a)                      N      Percent
Selected Cases   Included in Analysis     2421   100.0
                 Missing Cases            0      .0
                 Total                    2421   100.0
Unselected Cases                          0      .0
Total                                     2421   100.0

a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value   Internal Value
No               0
Yes              1

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step    9.872        1    .002
         Block   9.872        1    .002
         Model   9.872        1    .002

The value of the chi-square test is 9.872 and the significance level is 0.002, i.e. less
than 0.05; therefore this model is a good fit for our data.

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1      2.335        2    .311

The significance level is 0.311, which is more than 0.05; therefore we fail to reject the
null hypothesis. We can conclude that there is no significant difference between the
observed and the predicted model.

Contingency Table for Hosmer and Lemeshow Test

           History of diabetes = No    History of diabetes = Yes
           Observed   Expected         Observed   Expected          Total
Step 1  1  562        564.504          32         29.496            594
        2  852        851.886          58         58.114            910
        3  628        620.715          48         55.285            676
        4  211        215.895          30         25.105            241

Classification Table (a)

                                       Predicted
                                       History of diabetes   Percentage
Observed                               No      Yes           Correct
Step 1   History of diabetes    No     2253    0             100.0
                                Yes    168     0             .0
         Overall Percentage                                  93.1

a. The cut value is .500

As the overall percentage is 93.1%, our model predicts the correct category of the dependent
variable 93.1% of the time (although, as the table shows, it achieves this by classifying
every case as "No").

Variables in the Equation

                           B        S.E.   Wald      df   Sig.   Exp(B)
Step 1 (a)  Age category   .267     .085   9.906     1    .002   1.306
            Constant       -3.218   .221   211.457   1    .000   .040

a. Variable(s) entered on step 1: Age category.

Exp(B), the odds ratio, is 1.306; this means that for each one-unit increase in age
category, the odds of a case falling in the assumed or desired category (in this case “yes”,
or 1) are multiplied by 1.306.

17 : Factor Analysis
Factor analysis is a data reduction technique that is commonly used to identify a smaller set
of variables from a larger set of variables. It is a statistical method that is used to identify
underlying variables, known as factors, that explain patterns of correlation within a set of
observed variables.

Here are the steps to perform factor analysis in SPSS:


1. Click on "Analyze" from the menu bar and then select "Dimension Reduction"
and then "Factor" from the dropdown menu.
2. In the "Factor" dialog box, select the variables you want to include in the
analysis and move them to the "Variables" box.
3. In the “Descriptives” tab, check KMO and Bartlett’s test of sphericity.
4. Specify the extraction method that you want to use. The most commonly used
extraction method is "Principal Component Analysis (PCA)".
5. Specify the number of factors you want to extract in the "Extract" section. Also
change the maximum number of iterations to 100.
6. Choose the rotation method you want to use. The most commonly used rotation
method is "Varimax". Also check the option to display the rotated solution.
7. In the "Options" tab, you can choose to display the factor scores, which represent
the scores for each individual on each factor. You can also choose to suppress
small factor loadings, which will remove any loadings below a certain threshold.
8. Click on "OK" to run the factor analysis.
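
A minimal syntax sketch of the extraction described above (the variable list is assumed):

* Hedged sketch: principal component extraction with varimax rotation (variable list assumed).
FACTOR
  /VARIABLES q1 q2 q3 q4 q5
  /PRINT=INITIAL KMO EXTRACTION ROTATION
  /FORMAT=SORT BLANK(.30)
  /CRITERIA=MINEIGEN(1) ITERATE(100)
  /EXTRACTION=PC
  /ROTATION=VARIMAX.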

We can see that only 8 components have an eigenvalue of more than 1; therefore only these
components are taken into consideration. It can also be noted that these components
collectively explain 76.662% of the variance in the original variables.

18: Cluster Analysis
Cluster analysis is a statistical method used in research methodology to classify the data
into groups or clusters based on the similarities or dissimilarities between the
observations. In other words, it is a method used to group similar objects into respective
categories; it can also be referred to as segmentation analysis. The goal of the analysis is
to sort different objects into groups in such a manner that the degree of association
between two objects is high if they belong to the same group, and low if they belong to
different groups.

The assumptions of cluster analysis are:

1. The data variables are independent of each other.
2. The variables are homogeneous within each cluster formed.
3. The variables are heterogeneous if belonging to different clusters.
4. The number of clusters is not known in advance.

The applications of cluster analysis include:

1) It can be used in market segmentation to identify different market segments among
customers or products.
2) It can be used in social sciences and research to group individuals based on their
behaviour, attitude, or personality traits.
3) Insurance companies often use cluster analysis to form groups of various claims under
different categories of causes and areas, to learn what is giving rise to these claims.
4) It is used by geologists to evaluate seismic risk and the weaknesses of earthquake-prone
regions.

Overall, it is a powerful and helpful technique that can help researchers to better
understand the structure of their data and to identify meaningful relationships among their
observations.
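
SPSS offers both hierarchical and k-means clustering. A minimal syntax sketch, with variable
names and the number of clusters assumed:

* Hedged sketch: hierarchical clustering with a dendrogram (variable names assumed).
CLUSTER price sales resale
  /METHOD WARD
  /MEASURE=SEUCLID
  /PLOT DENDROGRAM.
* Hedged sketch: k-means clustering with three clusters (cluster count assumed).
QUICK CLUSTER price sales resale
  /CRITERIA=CLUSTER(3) MXITER(10)
  /PRINT INITIAL ANOVA.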

19: Discriminant Analysis

Discriminant analysis is a statistical method often used by market researchers to classify
observations into two or more groups or categories. Its primary goal is to predict the group
membership of a given individual based on a set of predictor variables.

It is carried out under the following assumptions:

1. Samples ought to be free from one another and independent.
2. The groups being compared are normally distributed.
3. The variance-covariance matrices for each group should be the same (homoscedasticity).
4. The sample size is sufficient relative to the number of predictor variables.

Applications of Discriminant Analysis:

1. It can be used to determine which features of a given product are most important to
different consumer groups/segments.
2. In medical research, it can be used to determine the symptoms which are strongly
associated with a particular disease.
3. In educational research, it can be used to identify the factors that distinguish
high-achieving and low-achieving students.
4. In law and order, it can help in identifying the variables which are strongly
associated with fraudulent behavior.
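
A minimal syntax sketch, with the group variable, its codes and the predictors all assumed:

* Hedged sketch: discriminant analysis with a classification table (names and codes assumed).
DISCRIMINANT
  /GROUPS=group(1,2)
  /VARIABLES=x1 x2 x3
  /ANALYSIS ALL
  /STATISTICS=MEAN STDDEV BOXM TABLE.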

