Practical File
Submitted to:
Dr Meenu Chopra
Submitted by:
ISHANJALI MADAAN 219005
VRIDHHI JAAIN 219020
PRANWAT SINGH ARORA 219059
INDEX
1 Data Entry
2 Generating New Variable
4 Compute Variable
5 Recoding Variables
6 Selecting Cases
8 Graphs
9 Descriptives
10 T-test
11 ANOVA
12 Two-Way ANOVA
14 Chi-Square Test
15 Bivariate Correlation and Simple Scatter Plots
16 Regression Analysis
17 Factor Analysis
18 Cluster Analysis
19 Discriminant Analysis
1: Data Entry
We can navigate to an existing file, run the tutorial, type in data, and so on. The spreadsheet
that is initially displayed is the default view, which IBM SPSS calls the Data View because it
is, quite literally, where we enter and view our data. We will need to work in both the Data
View and the Variable View screens. Although we can deal with these screens in any order, it
is best to begin with the Variable View screen when entering a new data set.
Selecting the Variable View at the bottom of the new (blank) spreadsheet gives rise to the
screen shown below. This display is editable and allows us to specify the various properties
of the variables in the data file.
We use value labels to assign numerical values to categorical data, so that we can easily
perform statistical tests on it.
2: Generating new variable
If the SPSS file is new and there is no data in the sheet, we start by generating variables.
● To record a variable in SPSS, you can use the "Variable View" tab of the SPSS Data
Editor.
Here are the steps:
○ Open your dataset in SPSS.
○ Click on the "Variable View" tab at the bottom of the screen.
○ In the first row of the table, enter a name for your variable in the "Name" column.
○ Select the appropriate "Type" for your variable (e.g. "Numeric" for a variable with
numerical values, "String" for a variable with text values).
○ If your variable is categorical, you can define its categories by clicking on the "Values"
button and entering the category labels and corresponding values.
○ If your variable has missing values, you can define them by clicking on the "Missing"
button and specifying the type of missingness (e.g. "System missing" for values that were not
collected, "User missing" for values that were not provided by the participant).
○ Once you have defined all of the properties of your variable, click on the "Data View" tab
to enter data for your variable.
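These dialog steps can also be written as SPSS Syntax. A minimal sketch, assuming a hypothetical numeric variable age and a hypothetical string variable name:

* Declare a numeric and a string variable (hypothetical names).
NUMERIC age (F3.0).
STRING name (A20).
* Attach a descriptive label and declare 999 as a user-missing code.
VARIABLE LABELS age 'Age of respondent'.
MISSING VALUES age (999).
EXECUTE.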
● Nominal data can be converted into categorical data using value labels.
○ In the above Variable View we can see that Gender has been changed into categorical data
using value labels.
○ Steps: In Variable View, go to the "Values" cell of whichever variable we want to specify,
enter the numeric value to be assigned, add its label, click Add, and after adding all
categories click OK.
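The same value specification can be pasted as syntax. A sketch, assuming Gender is coded 1 and 2 (the actual coding depends on the data file):

* Assign value labels to the numeric codes of Gender.
VALUE LABELS gender 1 'Male' 2 'Female'.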
4: Compute Variable
If we want to create a new variable after the survey has been conducted, we can do so using
this command. It computes a new variable based on an equation of the existing variables. We
can name the target variable as we wish. For example, here we need to make a new variable
called "Ratio", which is the sales value divided by the resale value. We can do so by using
the Compute Variable command and entering the existing variables and the relationship
between them.
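The dialog pastes a single COMPUTE command. A sketch, assuming the variables are named sales and resale:

* Create Ratio as sales value divided by resale value.
COMPUTE ratio = sales / resale.
EXECUTE.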
5: Recoding variables
This command helps us to combine values into categories. We frequently use this function
when we must eliminate outliers and/or extreme values. For example, we have taken the
numerical variable "Engines" and the output variable "Engine". We must assign a new value
to each range of numbers existing in the Engines variable.
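In syntax this is a RECODE ... INTO command. A sketch, assuming three hypothetical cut-offs for engine size:

* Recode the Engines variable into three categories in a new variable Engine.
RECODE engines (LOWEST THRU 2.5=1) (2.5 THRU 4.5=2) (4.5 THRU HIGHEST=3) INTO engine.
VALUE LABELS engine 1 'Small' 2 'Medium' 3 'Large'.
EXECUTE.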
6: Selecting cases
This function helps us to select only those cases that are relevant to us. SPSS then runs the
tests on only the selected cases, and the deselected cases are marked with a cross over their
serial number. For example, we have selected only those cases in which the value of sales is
more than 9; we can therefore see that all the other cases have been deselected.
Steps for selecting cases:
1. Data ➔ Select Cases, and click on ‘If condition is satisfied’.
2. Then click on the ‘IF’ push button and highlight the variable.
3. Click on the middle arrow to bring it over to the Expression box, then specify the
condition. Note that specifying ‘var=1’ AND ‘var=2’ leaves all cases unselected, since no
case can take both values at once; use OR to select either value.
4. To select cases based on missing values, use MISSING(variable), which returns true (1) if
the value is system-missing or user-missing.
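The Select Cases dialog pastes a filter like the sketch below, assuming the variable is named sales:

* Keep only cases where sales exceeds 9; other cases are filtered out, not deleted.
USE ALL.
COMPUTE filter_$ = (sales > 9).
FILTER BY filter_$.
EXECUTE.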
We can sort the cases in either ascending or descending order on the basis of a particular
variable. For example, we have sorted the cases on the basis of resale value in ascending
order.
Cases are arranged in the data file in the order that they were entered. To sort the data file
based on another variable, from the IBM SPSS main menu select Data ➔ Sort Cases. This
produces the Sort Cases dialog window. Double-click age to move it from the Variable List
to the Sort by panel. Ascending is the default option checked under Sort Order, and we
retain it. Click OK. IBM SPSS acknowledges in an output window that it has executed the
sorting by age; this output file can be deleted without saving.
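The equivalent syntax is a single command. A sketch, assuming we sort by a variable named resale:

* Sort cases in ascending (A) order of resale value; use (D) for descending.
SORT CASES BY resale (A).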
Merging files
We have two options for merging files: we can either merge cases or merge variables. We use
the merge cases option if we have the same variables (same questionnaire) but the cases were
collected by different people and need to be combined. We use merge variables if we have
different variables (the questionnaire is broken into different parts) but the same cases (the
same people have filled in all the different parts of the questionnaire).
Add cases: Select this option when the datasets contain the same variables (columns)
but different cases (rows). Adding cases pairs similar variables.
Add variables: Select this option when the datasets contain the same cases (rows) but
different variables (columns). Adding variables sets common variables as keys.
Select a dataset to merge with the active dataset. You can drag an existing file to
the Drag-and-drop dataset file (.sav) here area in the user interface, browse for an
existing dataset, or select an open dataset.
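Both merges can also be scripted. A sketch, assuming a hypothetical second file 'part2.sav' and a key variable id:

* Add cases: same variables, new rows, from a second file.
ADD FILES /FILE=* /FILE='part2.sav'.
EXECUTE.
* Add variables: same cases, new columns; both files must first be sorted by the key.
SORT CASES BY id.
MATCH FILES /FILE=* /FILE='part2.sav' /BY id.
EXECUTE.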
8: Graphs
Graphs are used to depict data that is too numerical and/or complicated to read directly, so
that it can be read and analysed easily. Different graphs depict different things: bar graphs
and line charts depict frequency, pie charts help us to know the percentage, box plots and
histograms help us to know the normality of the data, and scatter plots tell us about the
correlation and regression of the data.
1. Graphs - Line - Multiple and separate variables - Define - select the variables (4-year
resale value) - OK
Double-click on the graph to open the SPSS Chart Editor
● change line style: highlight one line - format - line style - make selection -apply -
close
● change line color: highlight one line - format - color - make selection - apply -close
● change labelling on x-axis: highlight x-axis -chart - axis - change axis title to
"Gender", center it - labels -make changes - continue – OK
● change text size: highlight text - format - text - make selections - apply -close
Similar steps are to be followed for each type of graph and chart in SPSS.
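Charts can be pasted as syntax too. A minimal sketch, assuming variables named resale and type:

* Simple line chart of mean 4-year resale value across vehicle types.
GRAPH /LINE(SIMPLE)=MEAN(resale) BY type.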
8.1 Line Chart
8.3 Scatter Plot
8.4 Histogram
9: Descriptives
9.1 Frequency
9.2 Descriptives
● Click "OK" to run the descriptive statistics.
● The descriptive statistics will appear in the output viewer. Look for the "Mean,"
"Standard Deviation," and other statistics you selected to describe the dataset.
9.3 Explore
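The Explore dialog (Analyze ➔ Descriptive Statistics ➔ Explore) corresponds to the EXAMINE command. A sketch, assuming sales is examined within each vehicle type:

* Explore sales by vehicle type, with box plots, histograms, and normality plots.
EXAMINE VARIABLES=sales BY type /PLOT BOXPLOT HISTOGRAM NPPLOT.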
9.4 Cross Tabs
To describe the relationship between two categorical variables, we use a special type of table
called a cross-tabulation (or "crosstab" for short). In a cross-tabulation, the categories of one
variable determine the rows of the table, and the categories of the other variable determine
the columns. The cells of the table contain the number of times that a particular combination
of categories occurred. The "edges" (or "margins") of the table typically contain the total
number of observations for that category.
This type of table is also known as:
Crosstab.
Two-way table.
Contingency table.
Step 1: Type your data into an SPSS worksheet. Contingency tables require at least two
variables (columns) of data. For this example question, type ages into the first column and
then type Healthcare into the second column. Change the variable names (the column
headers) by clicking the “Variable View” tab at the bottom of the sheet and typing over the
variable name.
Step 2: Click “Analyze,” then hover over “Descriptive Statistics” and then click “Crosstabs.”
The Crosstabs dialog window will open.
Step 3: Select one variable in the left window and then click the top arrow to populate the
“Row(s)” box. Select a variable to populate the “Column(s)” box and then click the center
arrow. For this sample problem, “Age” was selected for “Row(s)” and “Healthcare Type”
was selected for “Column(s).” Once you have made your selection, click “Cells.”
Step 4: Check which percentages you want to see (rows or columns). What you select will
depend upon what variables you put in rows and what you put in columns. For this sample
problem, “Healthcare Type” was placed in the columns, so check “Column” under
percentages.
Step 5: Click “Continue” and then click “OK.” The crosstabs table will appear in the output viewer.
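The pasted syntax for these steps looks like the sketch below, assuming the variables are named age and healthcare:

* Cross-tabulation of Age by Healthcare Type with column percentages.
CROSSTABS /TABLES=age BY healthcare /CELLS=COUNT COLUMN.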
10: T-test
The One Sample T-test examines whether the mean of a population is statistically different
from a known or hypothesized value. The One Sample t Test is a parametric test.
The variable used in this test is known as:
Test variable
In a One Sample t Test, the test variable's mean is compared against a "test value", which is a
known or hypothesized value of the mean in the population.
Steps:
○ Open SPSS and select the variable that you want to test.
○ Click on Analyze, then Compare Means, and then One-Sample T Test.
○ In the One-Sample T Test dialog box, select the variable you want to test
in the Test Variable(s) box.
○ In the Test Value box, enter the value you want to test the mean against. For
example, if you want to test whether the mean is significantly different from 5,
enter 5 in this box.
○ Select any other options you want to include in the analysis, such as
confidence intervals (95%), and click OK.
○ SPSS will generate output that includes the mean, standard deviation, and
standard error of the mean for the variable you selected, as well as the t-
statistic, degrees of freedom, and p-value for the one-sample t-test.
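The pasted syntax is a single command. A sketch, assuming a test value of 5 and a variable named sales:

* One-sample t-test of sales against a hypothesized mean of 5.
T-TEST /TESTVAL=5 /VARIABLES=sales /CRITERIA=CI(.95).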
10.2 Independent Sample T-test
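The Independent Samples t Test compares the means of two separate groups on the same variable. A syntax sketch, assuming sales is compared between the two codes of a grouping variable gender:

* Independent-samples t-test of sales between groups coded 1 and 2.
T-TEST GROUPS=gender(1 2) /VARIABLES=sales /CRITERIA=CI(.95).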
10.3 Paired sample T-test
The Paired Samples t Test compares the means of two measurements taken from the same
individual, object, or related units.
○ Open your data in SPSS and select the two variables that you want to test. These
should be continuous numerical data that are paired, such as before and after
measurements on the same individuals.
○ Click on Analyze, then Compare Means, and then Paired-Samples T Test.
○ In the Paired-Samples T Test dialog box, select the two variables you want to test in
the Paired Variables box.
○ Select any other options you want to include in the analysis, such as confidence
intervals, and click OK.
○ SPSS will generate output that includes the mean, standard deviation, and standard
error of the mean for the difference between the paired observations, as well as the t-
statistic, degrees of freedom, and p-value for the paired t-test.
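A syntax sketch, assuming hypothetical paired variables before and after:

* Paired-samples t-test of before vs. after measurements.
T-TEST PAIRS=before WITH after (PAIRED) /CRITERIA=CI(.95).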
11: ANOVA
Assumptions:
● Dependent variable that is continuous (i.e., interval or ratio level)
● Independent variable that is categorical (i.e., two or more groups)
● Cases that have values on both the dependent and independent variables
● Independent samples/groups (i.e., independence of observations)
a. There is no relationship between the subjects in each sample. This means
that:
i. subjects in the first group cannot also be in the second group
ii. no subject in either group can influence subjects in the other
group
iii. no group can influence the other group
Variables: sales, engine revised (engine size recoded into three categories: 0-2.5, 2.6-4.5,
4.5 and above)
Levene's test shows that the assumption of homogeneity of variances is violated, as Sig.
0.014 < 0.05; therefore the null hypothesis, which states that the variance of sales is the same
for each of the engine sizes, is rejected.
Therefore, the Welch test is used.
H0: The mean sales across the three groups of engine sizes are equal.
H1: The mean sales of at least one of the three groups of engine sizes differ.
Since 0.526 > 0.05, we fail to reject the null hypothesis.
In the ANOVA table also, 0.337 > 0.05; therefore the mean sales values across all three
groups of engine sizes are the same.
In the multiple comparisons tables for both Tukey and Games-Howell, the significance value
across the different combinations of the three groups is more than 0.05; therefore the mean
sales across the three groups of engine sizes are equal.
Steps:
Run an ANOVA or chi-square test to determine if there are significant differences between
groups.
● In the output viewer, look for the "Sig." value for the ANOVA or chi-square test. If
the value is less than 0.05, it indicates that there are significant differences between
groups.
● Select the "Analyze" menu and choose "Compare Means" and then "One-Way
ANOVA."
● In the "One-Way ANOVA" dialog box, select the variable you used in the ANOVA
test as the "Dependent Variable."
● Click the "Post Hoc" button and select the post hoc test you want to run. Popular
options include Bonferroni, Tukey HSD, and LSD.
● Click "OK" to run the post hoc test.
● The post hoc test results will appear in the output viewer. Look for the "Sig." value
to determine if there are significant differences between specific groups.
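The whole analysis above can be pasted as one ONEWAY command. A sketch, assuming the variables are named sales and engine_revised:

* One-way ANOVA with Levene's test, Welch test, and Tukey/Games-Howell post hocs.
ONEWAY sales BY engine_revised
  /STATISTICS DESCRIPTIVES HOMOGENEITY WELCH
  /POSTHOC=TUKEY GH ALPHA(0.05).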
● Click your interaction variable (Water temperature in this example), and then click the
arrow for Fixed Factor(s).
● Click EM Means box.
● Click your independent variable (“Brands”) in the Univariate Estimated Marginal
Means window. Hold down Ctrl on your keyboard and then click Water and
Brand*Water. Click the arrow to move all three to the Display Means for: box.
● Click Compare main effects.
● Click Continue.
● Click Options.
● Click Descriptive statistics and Homogeneity tests. You can change the alpha level
here, but in this example, I’m going to leave it at the 5% level.
● Click Continue and then click OK.
Parametric tests | Non-parametric tests
Uses metric data, that is, scale data. | Ordinal or interval scale data is used.
Can be applied for both small and large samples. | Can be applied for small samples.
Applications: one sample using Z or t statistics. | One sample using the sign test.
Two paired samples using a t or Z test. | Two paired samples using the sign test and the Wilcoxon matched-pair rank test.
14: Chi-Square Test
The chi-square test is a statistical test for categorical data. It is used to determine whether the
observed data differ significantly from what was expected. For this test, the data are required
in the form of frequencies.
Interpretation:
H0: There is no relation between Manufacturer and Vehicle Type.
H1: There is a relation between Manufacturer and Vehicle Type.
The asymptotic significance of .021 < 0.05; therefore the null hypothesis is rejected and there
is a relation between Manufacturer and Vehicle Type.
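The test is requested from the Crosstabs dialog; a syntax sketch, assuming the variables are named manufact and type:

* Chi-square test of independence between manufacturer and vehicle type.
CROSSTABS /TABLES=manufact BY type /STATISTICS=CHISQ /CELLS=COUNT EXPECTED.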
15: Bivariate Correlation and Simple Scatter Plots
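A syntax sketch for the Pearson correlation and the accompanying scatter plot, assuming variables named sales and resale:

* Pearson correlation with two-tailed significance.
CORRELATIONS /VARIABLES=sales resale /PRINT=TWOTAIL.
* Simple scatter plot of resale against sales.
GRAPH /SCATTERPLOT(BIVAR)=sales WITH resale.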
12: Two-Way ANOVA
Two-way ANOVA (analysis of variance) is a statistical method used to analyze the effects of
two independent variables (factors) on a continuous dependent variable. The two-way ANOVA
involves examining the effects of two factors simultaneously, and it allows for the assessment
of the main effects of each factor and the interaction between the two factors. The main effects
of each factor represent the independent contribution of each factor to the dependent variable,
while the interaction effect reflects the combined effect of the two factors on the dependent
variable.
1. Open the data file in SPSS and select Analyze from the menu bar, then select General
Linear Model and Univariate.
2. Move the dependent variable (the variable you want to analyze) to the Dependent
Variable box, and the independent variables (the two factors you want to analyze) to
the Fixed Factors box.
3. Click the Options button, and select Descriptive statistics and Fixed and factor
interactions.
4. Click OK to run the analysis.
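A syntax sketch for the salary example below, assuming the variables are named salary, jobcat, and gender:

* Two-way ANOVA: current salary by employment category, gender, and their interaction.
UNIANOVA salary BY jobcat gender
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /POSTHOC=jobcat(TUKEY)
  /DESIGN=jobcat gender jobcat*gender.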
Between-Subjects Factors
Factor | Value | Value Label | N
Employment Category | 1 | Clerical | 363
Employment Category | 2 | Custodial | 27
Employment Category | 3 | Manager | 84
Gender | F | Female | 216
Gender | M | Male | 258
Descriptive Statistics (Dependent Variable: Current Salary)
Employment Category | Gender | Mean | Std. Deviation | N
Clerical | Female | $25,003.69 | $5,812.838 | 206
Clerical | Male | $31,558.15 | $7,997.978 | 157
Clerical | Total | $27,838.54 | $7,567.995 | 363
Custodial | Male | $30,938.89 | $2,114.616 | 27
Custodial | Total | $30,938.89 | $2,114.616 | 27
Manager | Female | $47,213.50 | $8,501.253 | 10
Manager | Male | $66,243.24 | $18,051.570 | 74
Manager | Total | $63,977.80 | $18,244.776 | 84
Total | Female | $26,031.92 | $7,558.021 | 216
Total | Male | $41,441.78 | $19,499.214 | 258
Total | Total | $34,419.57 | $17,075.661 | 474
Current Salary (homogeneous subsets)
Tukey HSD(a,b,c)
Employment Category | N | Subset 1 | Subset 2
Clerical | 363 | $27,838.54 |
Custodial | 27 | $30,938.89 |
Manager | 84 | | $63,977.80
Sig. | | .179 | 1.000
Ryan-Einot-Gabriel-Welsch Range(c)
Clerical | 363 | $27,838.54 |
Custodial | 27 | $30,938.89 |
Manager | 84 | | $63,977.80
Sig. | | .226 | 1.000
Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = 88401147.444.
a. Uses Harmonic Mean Sample Size = 58.031.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels
are not guaranteed.
c. Alpha = .05.
16: Regression Analysis
Simple Linear Regression
Simple linear regression is a statistical method used to study the relationship between two
continuous variables by finding a linear equation that best predicts the value of one variable
based on the value of the other variable.
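A syntax sketch for the model summarized below, assuming the variables are named horsepower and engine_size:

* Simple linear regression of horsepower on engine size.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT horsepower
  /METHOD=ENTER engine_size.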
Variables Entered/Removed(a)
Model | Variables Entered | Variables Removed | Method
1 | Engine size(b) | . | Enter
a. Dependent Variable: Horsepower
b. All requested variables entered.

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .837(a) | .701 | .699 | 31.096
a. Predictors: (Constant), Engine size
ANOVA(a)
Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 349403.231 | 1 | 349403.231 | 361.346 | .000(b)
1 Residual | 148910.358 | 154 | 966.950 | |
1 Total | 498313.590 | 155 | | |
a. Dependent Variable: Horsepower
b. Predictors: (Constant), Engine size

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 46.834 | 7.730 | | 6.058 | .000
1 Engine size | 45.449 | 2.391 | .837 | 19.009 | .000
a. Dependent Variable: Horsepower
It can be seen that the independent variables explain 72.6% of the variation in the dependent
variable, which is our R Square.
As our ANOVA is significant (the p-value is less than 0.05), we can say that our model is
statistically significant; therefore there is a relation between our dependent and independent
variables.
Binary logistic regression is a statistical method used to analyze the relationship between a
binary dependent variable (also known as the response variable) and one or more independent
variables (also known as predictor variables). It is a type of regression analysis used when the
dependent variable is binary, which means it takes one of two values, such as "yes" or "no",
"true" or "false", or "success" or "failure". In logistic regression, the model estimates the
coefficients (also called beta coefficients or regression coefficients) of the independent
variables that best predict the probability of the dependent variable taking a certain value
(usually 0 or 1). These coefficients represent the change in the log odds of the dependent
variable for a one-unit increase in the corresponding independent variable, holding all other
variables constant.
3. No multicollinearity: The independent variables should not be highly correlated with
each other. Multicollinearity can lead to unstable estimates of the coefficients and reduce
the power of the analysis.
4. Binary dependent variable: The dependent variable should be binary, meaning it should
only have two possible outcomes.
Here are the steps for performing binary logistic regression in SPSS:
1. Open SPSS and navigate to the "Data Editor" window.
2. Enter the data into the spreadsheet, making sure that the dependent variable (i.e., the binary
outcome variable) is coded as 0 and 1.
3. Go to the "Analyze" menu and select "Regression" > "Binary Logistic."
4. In the "Binary Logistic Regression" dialog box, move the dependent variable to the
"Dependent" box and the independent variables to the "Covariates" box.Specify any
additional options, such as interactions or categorical variables, by clicking on the
"Categorical" or "Interaction" buttons.
6. Click on the "Options" button to specify any output options, such as including odds ratios
or goodness-of-fit statistics.
7. Click "OK" to run the analysis.
8. Interpret the output, which will include information on the coefficients, odds ratios,
standard errors, confidence intervals, and significance tests for each variable. Also, check the
assumptions of the logistic regression, such as linearity, independence of observations,
multicollinearity, and absence of perfect separation.
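A syntax sketch, assuming the binary outcome is named diabetes and the predictor is a hypothetical age variable:

* Binary logistic regression with goodness-of-fit output and CIs for Exp(B).
LOGISTIC REGRESSION VARIABLES diabetes
  /METHOD=ENTER age
  /PRINT=GOODFIT CI(95).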
Dependent Variable Encoding
Original Value | Internal Value
No | 0
Yes | 1
Omnibus Tests of Model Coefficients
Step 1 | Chi-square | df | Sig.
Step | 9.872 | 1 | .002
Block | 9.872 | 1 | .002
Model | 9.872 | 1 | .002
The value of the chi-square test is 9.872 and the significance level is 0.002, i.e. less than 0.05;
therefore this model is a good fit for our data.
Hosmer and Lemeshow Test
Step | Chi-square | df | Sig.
1 | 2.335 | 2 | .311
The significance level is 0.311, which is more than 0.05; therefore we fail to reject the
null hypothesis.
We can conclude that there is no significant difference between the observed and predicted
values of the model.
Classification Table(a)
Observed | Predicted: No | Predicted: Yes | Percentage Correct
Step 1 History of diabetes: No | 2253 | 0 | 100.0
Step 1 History of diabetes: Yes | 168 | 0 | .0
Step 1 Overall Percentage | | | 93.1
a. The cut value is .500
As the overall percentage is 93.1%, this means that our model is able to predict the correct
category of the dependent variable 93.1% of the time.
Exp(B), or the odds ratio, is 1.306; this means that the odds of a case falling in the assumed
or desired category (in this case "yes", coded 1) are multiplied by 1.306 for each one-unit
increase in the predictor.
17: Factor Analysis
Factor analysis is a data reduction technique that is commonly used to identify a smaller set
of variables from a larger set of variables. It is a statistical method that is used to identify
underlying variables, known as factors, that explain patterns of correlation within a set of
observed variables.
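A syntax sketch, assuming a hypothetical battery of items v1 to v20:

* Principal components extraction, keeping components with eigenvalue > 1, varimax rotation.
FACTOR /VARIABLES=v1 TO v20
  /EXTRACTION=PC
  /CRITERIA=MINEIGEN(1)
  /ROTATION=VARIMAX.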
We can see that only 8 components have an eigenvalue of more than 1; therefore only these
components are taken into consideration. It can also be noted that these components
collectively explain 76.662% of the total variance.
18: Cluster Analysis
Cluster analysis is a statistical method used in research methodology to classify the data into
groups or clusters based on the similarities or dissimilarities between the observations. In
other words, it is a method used to group similar objects into respective categories; it can
also be referred to as segmentation analysis. The goal of the analysis is to sort different
objects into groups in such a manner that the degree of association between two objects is
high if they belong to the same group, and low if they belong to different groups.
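A k-means syntax sketch, assuming hypothetical clustering variables and a three-cluster solution:

* K-means cluster analysis with three clusters.
QUICK CLUSTER sales resale engine_size
  /CRITERIA=CLUSTER(3)
  /PRINT=CLUSTER DISTANCE.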
19: Discriminant Analysis
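Discriminant analysis predicts membership in a categorical group from a set of continuous predictors. A syntax sketch, assuming a hypothetical grouping variable type coded 0 and 1:

* Discriminant analysis of vehicle type from continuous predictors.
DISCRIMINANT /GROUPS=type(0 1)
  /VARIABLES=sales resale engine_size
  /ANALYSIS ALL
  /STATISTICS=MEAN.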