RM Project Devansh
ON
“RESEARCH
METHODOLOGY”
GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
BACHELOR OF BUSINESS
ADMINISTRATION
BATCH 2020-2023
SUBMITTED BY : SAKSHI GARG
SUBMITTED TO:-
DR. MADHU ARORA
CERTIFICATE
The assistance rendered during the study has been duly acknowledged.
No part of this work has been submitted for any other degree.
Any accomplishment requires the effort of many people, and this work is no different.
Regardless of the source, I wish to express my gratitude to those who may have
contributed to this work, even though anonymously.
My final thanks go out to my parents, family members, teachers and friends, who
encouraged me countless times to persevere through this entire process.
TABLE OF CONTENTS
S.No. CONTENTS Page No.
2.3 Graphs
Annexure Questionnaire
EXERCISE 1
INTRODUCTION TO SPSS
SPSS - What Is It?
SPSS stands for “Statistical Package for the Social Sciences” and
was first launched in 1968. Since SPSS was acquired by IBM in
2009, it has officially been known as IBM SPSS Statistics, but most users
still just refer to it as “SPSS”.
SPSS is software for editing and analysing all sorts of data.
These data may come from basically any source: scientific
research, a customer database, Google Analytics or even the
server log files of a website. SPSS can open all file formats that
are commonly used for structured data, such as
• spreadsheets from MS Excel or OpenOffice;
• plain text files (.txt or .csv);
• relational (SQL) databases;
• Stata and SAS.
SPSS Data View
After opening data, SPSS displays them in a spreadsheet-like
fashion, as shown in the screenshot below from freelancers.sav.
This sheet, called Data View, always displays our data values. For
instance, our first record seems to contain a male respondent
from 1979 and so on. A more detailed explanation of the exact
meaning of our variables and data values is found in a second
sheet shown below.
Inferential Statistics
SPSS contains all basic statistical tests and multivariate analyses,
such as
• t-tests;
• chi-square tests;
• ANOVA;
• correlations and other association measures;
• regression;
• nonparametric tests;
• factor analysis;
• cluster analysis.
Some analyses are available only after purchasing additional
SPSS options on top of the main program. An overview of all
commands and the options to which they belong is presented in
Overview All SPSS Commands.
SPSS One Sample T-Test Output Example
Saving Data and Output
SPSS data can be saved in a variety of file formats, including
• MS Excel;
• plain text (.txt or .csv);
• Stata;
• SAS.
The options for output are even more elaborate: charts are often
copy-pasted as images in .png format. For tables, rich text format
is often used because it retains the tables' layout, fonts and
borders. Besides copy-pasting individual output items, all output
items can be exported in one go to PDF, HTML, MS Word and
many other file formats. A terrific strategy for writing a report is
to create an SPSS output file with nicely styled tables and charts,
then export the entire document to Word and insert explanatory
text and titles between the output items.
Right, I hope that gives
at least a basic idea of what SPSS is and what it does. Let's now
explore SPSS in some more detail, starting off with the Data
Editor window. We'll present many more examples in the next
couple of tutorials as well.
APPLICATIONS OF SPSS
Statistical Package for the Social Sciences (SPSS) is a window-based
program first launched in 1968. In 2009, SPSS was acquired
by IBM. Hence, it is officially known as IBM SPSS Statistics.
SPSS is widely used in the social and behavioural sciences. It is
also used by health researchers, market researchers, survey
companies, education researchers, government, etc.
Various windows can be opened when using SPSS, such as the data editor,
output navigator, pivot table editor, chart editor, text output
editor, and syntax editor. The data editor is a spreadsheet in
which variables can be defined and data can be entered. Each
row corresponds to a case, while each column represents a
variable. This window opens automatically when SPSS is
started. The output navigator window displays the statistical
results, tables, and charts from the analysis. Output displayed in
pivot tables can be modified in many ways with the pivot table
editor. It is possible to modify and save high-resolution charts
and plots by invoking the chart editor for a certain chart in an
output navigator window. Text output not displayed in pivot
tables can be modified with the text output editor.
SPSS contains all basic statistical tests and multivariate analyses, such as t-tests,
chi-square tests, ANOVA, correlations and regressions,
non-parametric tests, cluster analysis, etc. IBM SPSS Statistics 26
continues to increase accessibility to advanced analytics through
improved tools, integration, output, and ease-of-use features.
This release mainly focuses on increasing the analytic
capabilities of the software through quantile regression, ROC
analysis, Bayesian statistics, one-sample binomial and Poisson
enhancements, reliability analysis, and command enhancements.
SPSS software is used for editing and analysing all sorts of data
available from scientific research, clinical studies, customer
databases, Google Analytics, etc. SPSS can open all file formats
that are commonly used for structured data, such as spreadsheets
from MS Excel, plain text files, relational databases, Stata, SAS,
etc. SPSS Statistics can read and write data from ASCII text
files, other statistics packages, spreadsheets, and databases.
Statistical output is saved in a proprietary file format, which
can be exported to text, Microsoft Word, PDF, Excel, and
other formats. The typical workflow of SPSS software is as
follows:
• Opening data files in SPSS file format or others.
• Editing data such as computing sums and means over
columns or rows of data.
• Creating tables and charts containing frequency counts or
summary statistics over cases and variables.
• Running inferential statistics such as one-way ANOVA,
two-way ANOVA, regression, correlation, factor analysis, etc.
• Saving data and output in different file formats.
OPEN SPSS
The SPSS program has three main types of windows: the data editor, output window and syntax
window. The data editor window is open by default, and contains the data set. It consists of two
views, the Data View and the Variable View. This window is described in more detail in the
section on Working With Data and Variables. Data files are saved with a file type of .sav.
The output window holds the results of analyses. This window will open automatically once an
analysis is requested. The tables of the Output Viewer are saved (click File, Save or Save As)
with a file type of .spv, which can only be opened with SPSS software.
The syntax window contains written commands corresponding to each menu command and
its options. Syntax can be created by clicking {Paste} instead of {OK} in the main window of each
procedure. Using {Paste} will not cause the procedure to be performed. To run procedures from
the syntax window, use the Run menu or the Run button on the toolbar.
The syntax window will only open if a syntax file is opened by the user, or if the Paste option is
used when setting up a command. Output and syntax files can be saved and opened using the File
menu. Multiple output and syntax files can be open at the same time. Syntax files are saved as
plain text with a file extension of .sps, so almost any text editor can open them.
Dialog Boxes
Although each dialog box is unique, they have many common features. A fairly typical example
is the dialog box for producing frequency tables (tables with counts and percents). To bring up
this dialog box from the menus in the data window, click on Analyze > Descriptive Statistics >
Frequencies.
Working with Data and Variables
Data in SPSS can be viewed in two different ways: data view and variable view. The data view
allows the user to look at the entire data set, with each row showing a different observation, and
each column representing a different variable. Another way to view the data is to use the variable
view. This shows the variable names and general properties for each variable. The user can
alternate between these views using the tabs at the bottom left-hand side of the SPSS data editor
window. Figure 8 below shows the data view.
Define variable properties
To define or change the attributes of variables, change to “Variable View” to see a list of all the
variables with their properties from the current data file. Click or double-click the variable you
would like to specify or change. Descriptions of each attribute are given below.
Name is the name of the variable. Rules for establishing variable names can be found in the IBM
SPSS help under Command Syntax Reference > Universals > Variables > Variable Names.
Type is the type of a variable. Common options are Numeric for numbers, Date for dates, and
String for character strings. The string option allows the user to type in any set of characters
including punctuation marks and blank spaces. It is ideal for entering answers to open-ended questions
which are not coded.
Width is the maximal number of characters or digits allowed for a variable. Generally a width
large enough to accommodate all the possible values of the variable should be chosen; otherwise
any values with length greater than the specified value will be truncated.
Decimals is valid for numeric variables only. It specifies the number of decimal places displayed
for a variable. Extra decimals are rounded for display only; the complete stored value is still used in
calculations, so choose the number of decimals to fit the precision you want to see.
Label is the descriptive label for a variable. One can assign descriptive variable labels up to
256 characters long, and variable labels can contain spaces and reserved characters not allowed in
variable names.
Values is the descriptive value labels for each value of a variable. This is particularly useful if
the data file uses numeric codes to represent non-numeric categories (for example, codes of 1 and
2 for male and female).
Missing specifies some data values as user-missing values. Refer to the Missing Values section
for more detail.
Columns is the column width for a variable. Column formats affect only the display of values
in the Data Editor. Changing the column width does not change the defined width of a variable. If
the defined and actual width of a value are wider than the column, asterisks (*) are displayed in
the Data view. Column widths can also be changed in the Data view by clicking and dragging
the column borders.
Align controls the display of data values and/or value labels in the Data view. The default
alignment is right for numeric variables and left for string variables. This setting affects only the
display in the Data view.
Measure is the level of measurement as scale (numeric data on an interval or ratio scale),
ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric.
Nominal and ordinal are both treated as categorical. The variable origin (Country of Origin) is
measured on a nominal scale, as the cars are distinguished on the basis of a name or label, i.e.
American, European, and Japanese, whilst the variable gallon (miles per gallon) is measured on a
scale, specifically a ratio scale, because both differences and ratios between measurements
are meaningful and the variable has a true zero value.
The Cars data file used in these examples is available as an SPSS file (i.e. with the .sav extension and all variable
attributes edited as in the example).
Missing Values
Missing values are a topic that deserves special attention. This section explains why they arise
and how to define them. In SPSS there are two types of missing values: user defined missing
values and system missing values. By default in SPSS, both types of missing values will be
disregarded in all statistical procedures, except for analyses devoted specifically to missing
values, for example, replacing missing values. In frequency tables, missing values will be shown,
but they will be marked as such and will not be used in computation.
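The difference between the Percent and Valid Percent columns of a frequency table can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SPSS's own implementation; the function name frequency_table and the sample answers are hypothetical, with None standing in for a missing value.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Return (rows, n_missing); each row is
    (value, frequency, percent, valid_percent, cumulative_percent)."""
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100.0 * freq / total             # base: all cases
        valid_percent = 100.0 * freq / len(valid)  # base: non-missing cases only
        cumulative += valid_percent
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, total - len(valid)

# Hypothetical answers: 9 valid cases and 1 missing case.
answers = [1, 2, 2, 3, 1, 2, None, 3, 3, 2]
rows, n_missing = frequency_table(answers)
for row in rows:
    print(row)
print("Missing:", n_missing)
```

Missing cases still count towards the Percent column's base, but they are left out of Valid Percent and Cumulative Percent.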
User Defined Missing Values
User-defined missing values indicate data values that are either missing, due to reasons like
nonresponse, or are not desired to be used in most analyses (e.g. “Not Applicable”). By default
SPSS uses “.” to represent missing values. In some cases, one might need to distinguish
between data missing because a respondent refused to answer and data missing because the
question did not apply to that respondent, and would thus like more than one code for
missing values. One can achieve this by setting up the “Missing” property of the corresponding
variable to specify some data values as missing values. These options allow one to enter up to
three discrete missing values, a range of missing values, or a range plus one discrete missing
value. All string values, including null or blank values, are considered valid values unless they
are explicitly defined as missing. To define null or blank values as missing for a string variable,
enter a single space in one of the fields for discrete missing values. You will notice missing
values, denoted by “.”, for observations 11-15 of the variable mpg. The example below shows
how to specify user-defined missing values for the variable mpg by setting up its “Missing” property.
System Missing Values
System missing values occur when no value can be obtained for a variable during data
transformations. For example, if there are two variables, one indicating a person’s gender and the
other whether she or he is married and you create a new variable that tells whether (a) a person is
male and married, (b) female and married, (c) male and not married, all females that are not
married will have a system missing value (“.”) instead of a real value.
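The gender and marital status example above can be mimicked in Python as a rough sketch. The function name marital_group and the codes 1 to 3 are hypothetical; None plays the role of the system-missing value that SPSS would show as “.”.

```python
def marital_group(gender, married):
    """Assign a category code; combinations with no rule get None."""
    if gender == "male" and married:
        return 1  # (a) male and married
    if gender == "female" and married:
        return 2  # (b) female and married
    if gender == "male" and not married:
        return 3  # (c) male and not married
    return None   # no rule applies: stands in for system-missing "."

# Hypothetical cases covering all four combinations.
people = [("male", True), ("female", True), ("male", False), ("female", False)]
groups = [marital_group(gender, married) for gender, married in people]
print(groups)
```

Only the unmarried female case falls through every rule, so it is the only one left without a real value.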
Insert
The easiest way to manually input a new variable is to scroll through the data view spreadsheet
horizontally until the first empty column is encountered, and enter the data there. The new
variable can be named appropriately in the variable view spreadsheet. Alternatively, selecting the
“Insert Variable” option under the “Data” menu allows insertion of a new variable at other
locations in the table. By default, this inserts the new variable in the first column of the
spreadsheet, but this can be changed by highlighting the column to the right of the desired
location.
Recode
The recode function can be used to collapse ranges of data into categorical variables and
to reassign existing values to other values. To create a new variable as a function of another (log,
sin, etc.), use “Compute” (described in the next section).
1. Select Recode into Different Variables under the Transform menu. Recoding into Same
Variables is not recommended, since it will change existing variables and you will lose the
original values.
2. Select each variable to be transformed, and move it into the section on the right-hand side
using the arrow button. Note that the same transformation will be applied to all of these variables. If
different types of transformations are required, each transformation needs to be done separately.
3. If new variables are being created, define a name for the output variable on the right-hand side. If
desired, a label can be entered as well, though it is not required. Once the desired name and label
are entered, you must click the Change button.
4. Select the Old and New Values button and the window below will appear. In the Old Value
side of the window, select the appropriate original values to be recoded.
a) By selecting Value one can specify a value to replace (e.g. “male” or “1”). It is case sensitive,
so “A” and “a” are considered to be different values.
c) The three range options partition numeric variables into categories. The above figure
demonstrates how a range of continuous values can be condensed into a category. Rather than
running any procedures to find out the range of a variable, the range options LOWEST
through value and value through HIGHEST can be used to catch every point in the data set.
d) All other values can be used to pick up values not specifically referenced elsewhere.
5. On the New Value side, type in the new value. Then click the Add button to add it to the
Old --> New list. When recoding into different variables, one has the option of changing numbers to
strings, or converting numbers saved as strings to numbers. Unless otherwise specified, the new
variable will be saved in the same format as the original variable. Click Continue to close the
window. On the main screen, click OK or Paste to finish.
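The idea behind the range options can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure itself: the function name recode, the cut-offs 20 and 30, and the category names are all hypothetical, and the sketch assumes that the first rule matching a value wins.

```python
import math

LOWEST, HIGHEST = -math.inf, math.inf  # stand-ins for the LOWEST/HIGHEST keywords

def recode(value, rules, else_value=None):
    """Return the new value of the first rule whose inclusive range holds value."""
    if value is None:              # missing stays missing
        return None
    for low, high, new_value in rules:
        if low <= value <= high:
            return new_value
    return else_value              # plays the role of "All other values"

# Hypothetical cut-offs for miles per gallon; earlier rules take precedence,
# so a value of exactly 20 is recoded as "low".
rules = [
    (LOWEST, 20, "low"),
    (20, 30, "medium"),
    (30, HIGHEST, "high"),
]
mpg = [18.0, 20.0, 25.5, 36.1, None]
mpg_group = [recode(v, rules) for v in mpg]
print(mpg_group)
```

Because the result goes into a new list, the original values are kept, which mirrors the advice to recode into different variables.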
Compute
Suppose you want to create a new variable measuring the ratio of a vehicle’s weight to its
horsepower, and you name the new variable weihorse, for the weight per unit horsepower. To
create a new variable as a function of one or more existing variables, select Compute from the
Transform menu. Enter the name of the new variable, weihorse, in the Target Variable box. In the
Numeric Expression box, use the keypad, function list, and the variable list to write out the
expression used to compute the new variable (in this example: weight/horse). Click OK or Paste to
close the window.
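The same computation can be sketched in Python. The two cars and their values are hypothetical, and the choice to return None when an input is missing or the divisor is zero is an assumption meant to resemble a system-missing result.

```python
def compute_ratio(weight, horse):
    """Weight per unit horsepower, or None when it cannot be computed."""
    if weight is None or horse is None or horse == 0:
        return None  # stands in for a system-missing value
    return weight / horse

# Hypothetical cases; the second has no horsepower recorded.
cars = [
    {"weight": 3504, "horse": 130},
    {"weight": 2800, "horse": None},
]
for car in cars:
    car["weihorse"] = compute_ratio(car["weight"], car["horse"])
    print(car)
```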
CREATING GRAPHS
Graphs in SPSS may be generated using one of two options. The first option is the Legacy
Dialogs, which allows one to create basic charts and graphs. The second option is to use the Chart
Builder which allows one to generate charts either from a predefined gallery or by specifying
individual parts (for example, axes and bars). The steps to create a few common graphs are
shown below. However, SPSS has the ability to produce many other graphs such as population
pyramid, error bar, and 3-D bar chart. The Chart Builder allows more flexibility in creating
graphs. For any graph generated in SPSS, one can double click on the graph to invoke a Chart
Editor window, inside which one can double click any part of the graph to edit it.
Scatterplot
Suppose we seek to investigate the linear relationship between miles per gallon and vehicle
weight. We first draw a scatterplot to see the direction in which they are related. We will introduce
the Simple Scatterplot. In the “Graphs” menu, choose Legacy Dialogs > Scatter/Dot. Select
Simple Scatter, then click on the Define button to get the window shown below. Select a variable for
the Y-axis and a variable for the X-axis. These variables must be numeric and not in date format.
One can also select a categorical variable to define rows of panels and another categorical
variable to define columns of panels. Using the “Title” button one may specify the title, subtitles
and the footnotes for the plot. In the following example we are plotting mpg against vehicle
weight, using model year to define rows of panels.
Histogram
A histogram shows the distribution of a single numeric variable. By selecting Legacy Dialogs >
Histogram in the Graphs menu, one can generate a histogram. One can check the Display normal
curve option to have an estimated normal curve displayed over the histogram. Suppose you
want to draw histograms of miles per gallon by the origin of the vehicle: in the Panel by
area of the dialog box, you can put the variable origin in either the rows or the columns, as shown below.
Q-Q Plot
The Q-Q Plot (quantile-quantile plot) procedure plots the quantiles of a variable's distribution
against the quantiles of a variable from a test distribution. Q-Q plots are generally used to
investigate whether the distribution of a variable is consistent with a proposed distribution.
Specifically, Q-Q plots can be used to investigate whether a variable (e.g. residuals in a
regression model) follows a Normal distribution. If the distribution of the variable and the
proposed distribution are the same, points in the Q-Q plot follow a straight line. If the
distributions are not similar, points in the Q-Q plot deviate from the straight line. Suppose you
want to generate a Q-Q plot with a Normal distribution as the test distribution. Select Descriptive
Statistics > Q-Q Plots in the Analyze menu. Enter the variables you want to plot into the
Variables box, and select Normal under Test Distribution. Click OK to generate the plot.
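The points of a Normal Q-Q plot can be computed by hand, as in the Python sketch below. It assumes Blom's plotting position, (rank - 3/8) / (n + 1/4), which is one common choice, and a Normal test distribution fitted with the sample mean and standard deviation. The function name normal_qq_points and the sample are hypothetical, and ties are not treated specially.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (observed, expected) pairs for a Normal Q-Q plot."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    points = []
    for rank, observed in enumerate(sorted(sample), start=1):
        p = (rank - 0.375) / (n + 0.25)  # Blom's plotting position (assumed)
        points.append((observed, fitted.inv_cdf(p)))
    return points

# Hypothetical sample; roughly Normal data should give points near a straight line.
points = normal_qq_points([1, 2, 3, 4, 5])
for observed, expected in points:
    print(observed, round(expected, 3))
```

Plotting the observed values against the expected ones gives the Q-Q plot; the closer the points lie to the line observed = expected, the more consistent the data are with the Normal distribution.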
Syntax File
Here is an example of the syntax for the Q-Q plot in Figure 18. After selecting Descriptive
Statistics > Q-Q Plots in the Analyze menu, enter the variable you want to plot, mpg (Miles per Gallon),
into the Variables box, and select Normal under Test Distribution. Then
click Paste to generate the syntax.
You can save the syntax as a .sps file for later access when running the analysis.
DATA VIEW of SPSS
VARIABLE VIEW of SPSS
EXERCISE 1
DESCRIPTIVE STATISTICS
In the Analyze menu, the option Descriptive Statistics produces a submenu with the
choices Frequencies, Descriptives, Explore, Crosstabs, and Ratio. Of these,
Crosstabs and Descriptives have some particularly useful features which this
manual will cover. More information on the other three can
be found in the SPSS help menu, which is discussed in the section “Help in SPSS” of
this manual.
Descriptives
[DataSet1]
Statistics
N: Valid 9, Missing 1 (for each of the nine variables)
Frequency Table
[Frequency tables, including those for Age and martial status; columns: Frequency, Percent, Valid Percent, Cumulative Percent]
EXERCISE 2
Histogram
Crosstabs
The Crosstabs procedure forms two-way and multi-way tables and
provides a variety of tests and measures of association for two-way
tables. Multi-way tables are formed using the ‘Layer’ button. Note that
tests are not made across layers. When layers are used, comparisons are
made for the row and column variables at each value of the layer
variable. The Statistics button at the bottom allows various statistics to
be computed, including correlations and Chi-square tests. To help
uncover patterns in the data that contribute to a significant chi-square
test, the Cells button provides options for displaying expected
frequencies and three types of residuals (deviates) that measure the
difference between observed and expected frequencies. Each cell of the
table can contain any combination of the counts, percentages, and residuals
selected.
Crosstab
[Crosstab output for martial status; one surviving row: 2.0, Count 4, 0, Total 4]
Chi-Square Tests
a. 4 cells (100.0%) have expected count less than 5. The minimum expected count is .89.
b. Computed only for a 2x2 table
EXERCISE 4
CHI – SQUARE
This test utilizes a contingency table to analyze the data. A contingency table (also
known as a cross-tabulation, crosstab, or two-way table) is an arrangement in
which data is classified according to two categorical variables. The categories for
one variable appear in the rows, and the categories for the other variable appear in
columns.
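The arithmetic behind the test can be sketched in Python: build the contingency table, derive each expected count as (row total x column total) / N, and sum (observed - expected)^2 / expected over the cells. The function name chi_square and the eight cases are hypothetical, and the sketch does not compute a p-value.

```python
from collections import Counter

def chi_square(pairs):
    """Return (statistic, df, expected) for a list of (row, column) categories."""
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    statistic = 0.0
    expected = {}
    for row in row_totals:
        for col in col_totals:
            e = row_totals[row] * col_totals[col] / n  # expected count
            expected[(row, col)] = e
            statistic += (observed[(row, col)] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical data: gender in the rows, a yes/no answer in the columns.
pairs = ([("male", "yes")] * 3 + [("male", "no")] * 1
         + [("female", "yes")] * 1 + [("female", "no")] * 3)
statistic, df, expected = chi_square(pairs)
print(statistic, df, min(expected.values()))
```

Printing the smallest expected count is useful because SPSS adds a footnote when cells have an expected count below 5, as in the output of this report.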
Chi-Square Tests
A t-test looks at the t-statistic, the t-distribution values, and the degrees of
freedom to determine the statistical significance. To conduct a test with three or
more means, one must use an analysis of variance.
T-Test
Group Statistics
The Independent Samples t Test compares the means of two independent groups in
order to determine whether there is statistical evidence that the associated
population means are significantly different. The Independent Samples t Test is a
parametric test.
The test is also known by several other names:
• Independent t Test
• Independent Measures t Test
• Independent Two-sample t Test
• Student t Test
• Two-Sample t Test
• Uncorrelated Scores t Test
• Unpaired t Test
• Unrelated t Test
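SPSS reports the test twice, once with equal variances assumed and once without. The Python sketch below computes both versions of the t statistic and their degrees of freedom, using the pooled-variance formula for the first and the Welch-Satterthwaite approximation for the second. The function name independent_t and the two groups are hypothetical, and no p-values are calculated.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Return t and df with equal variances assumed and not assumed."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch t with Satterthwaite df.
    se2 = v1 / n1 + v2 / n2
    t_welch = diff / sqrt(se2)
    df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return {"t_pooled": t_pooled, "df_pooled": df_pooled,
            "t_welch": t_welch, "df_welch": df_welch}

# Hypothetical ratings from two independent groups (9 cases in total).
group_a = [3, 4, 5, 4, 4]
group_b = [2, 3, 4, 3]
result = independent_t(group_a, group_b)
print(result)
```

With nine cases the pooled version has 7 degrees of freedom, while the Welch version usually has a smaller, fractional value.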
Independent Samples Test
(F and Sig. are from Levene's Test for Equality of Variances; the remaining columns are from the t-test for Equality of Means, with the 95% confidence interval of the difference.)

                                  F      Sig.   t      df     Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  Lower    Upper
[variable label not recovered]
  Equal variances not assumed                   -.281  6.727  .787             -.2000      .7118             -1.8971  1.4971
how was the purchasing experience ?
  Equal variances assumed         1.896  .211   .266   7      .798             .2000       .7521             -1.5785  1.9785
  Equal variances not assumed                   .281   6.727  .787             .2000       .7118             -1.4971  1.8971
how was the after purchasing services(warranty, customer services, etc.)
  Equal variances assumed         .007   .934   .220   7      .832             .2000       .9071             -1.9450  2.3450
  Equal variances not assumed                   .218   6.287  .834             .2000       .9165             -2.0181  2.4181
how was our brand is better than other brands ?
  Equal variances assumed         1.922  .208   .702   7      .505             .5500       .7835             -1.3028  2.4028
  Equal variances not assumed                   .776   5.080  .472             .5500       .7089             -1.2636  2.3636
EXERCISE 7
Oneway ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any
statistically significant differences between the means of two or more independent (unrelated)
groups (although you tend to only see it used when there are a minimum of three, rather than two
groups). For example, you could use a one-way ANOVA to understand whether exam
performance differed based on test anxiety levels amongst students, dividing students into three
independent groups (e.g., low, medium and high-stressed students). Also, it is important to realize
that the one-way ANOVA is an omnibus test statistic and cannot tell you which specific groups
were statistically significantly different from each other; it only tells you that at least two groups
were different. Since you may have three, four, five or more groups in your study design,
determining which of these groups differ from each other is important. You can do this using a
post hoc test (N.B., we discuss post hoc tests later in this guide).
This "quick start" guide shows you how to carry out a one-way ANOVA using SPSS Statistics, as
well as interpret and report the results from this test. Since the one-way ANOVA is often
followed up with a post hoc test, we also show you how to carry out a post hoc test using SPSS
Statistics. However, before we introduce you to this procedure, you need to understand the
different assumptions that your data must meet in order for a one-way ANOVA to give you a
valid result. We discuss these assumptions next.
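The F statistic of a one-way ANOVA can be worked out by hand, as the Python sketch below shows: the between-groups and within-groups sums of squares are each divided by their degrees of freedom, and F is the ratio of the two mean squares. The ANOVA table in this report follows the same arithmetic (for example, 1.594 / .950 gives the reported F of 1.678). The function name one_way_anova and the three groups are hypothetical, and the sketch stops short of the p-value.

```python
from statistics import mean

def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within,
            "ms_between": ms_between, "ms_within": ms_within,
            "F": ms_between / ms_within}

# Hypothetical scores for three independent groups of three cases each.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
anova = one_way_anova(groups)
print(anova)
```

With three groups and nine cases the degrees of freedom are 2 and 6, the same layout as the tables in this exercise.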
ANOVA
                    Sum of Squares   df   Mean Square   F       Sig.
how did the product perform ?
  Between Groups    3.189            2    1.594         1.678   .264
  Within Groups     5.700            6    .950
  Total             8.889            8
how was the purchasing experience ?
  Between Groups    6.689            2    3.344         9.121   .015
  Within Groups     2.200            6    .367
  Total             8.889            8
how was the after purchasing services(warranty, customer services, etc.)
  Between Groups    3.189            2    1.594         .986    .426
  Within Groups     9.700            6    1.617
  Total             12.889           8
how was our brand is better than other brands ?
  Between Groups    2.422            2    1.211         .932    .444
  Within Groups     7.800            6    1.300
  Total             10.222           8
EXERCISE 8
Correlations
The Bivariate Correlations procedure computes Pearson's correlation coefficient, Spearman's rho,
and Kendall's tau-b with their significance levels. Correlations measure how variables or rank
orders are related. Before calculating a correlation coefficient, screen your data for outliers
(which can cause misleading results) and evidence of a linear relationship. Pearson's correlation
coefficient is a measure of linear association. Two variables can be perfectly related, but if the
relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for
measuring their association.
The procedure reports, for each variable, the number of cases with nonmissing values, the mean,
and the standard deviation. For each pair of variables, it reports Pearson's correlation coefficient,
Spearman's rho, Kendall's tau-b, the cross-product of deviations, and the covariance.
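Pearson's coefficient can be computed directly from the cross-product of deviations, as in the Python sketch below. The two small data sets are hypothetical: the first is perfectly linear, while the second is perfectly related through y = x squared and still gives a coefficient of zero, which illustrates why Pearson's coefficient only measures linear association.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross_product = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_product / sqrt(ss_x * ss_y)

# Hypothetical data: a perfect straight line.
linear_x = [1, 2, 3, 4, 5]
linear_y = [2, 4, 6, 8, 10]
r_linear = pearson_r(linear_x, linear_y)

# Hypothetical data: a perfect but non-linear relationship (y = x squared).
curve_x = [-2, -1, 0, 1, 2]
curve_y = [v ** 2 for v in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(r_linear, r_curve)
```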
Correlations
Columns (1) to (4) follow the row order of the variables.

                                        (1)      (2)     (3)     (4)
(1) [variable label and first two rows not recovered]
  N                                     9        9       9       9
(2) how was the purchasing experience ?
  Pearson Correlation                   .800**   1       .644    .466
  Sig. (2-tailed)                       .010             .061    .206
  N                                     9        9       9       9
(3) how was the after purchasing services(warranty, customer services, etc.)
  Pearson Correlation                   .571     .644    1       .474
  Sig. (2-tailed)                       .108     .061            .197
  N                                     9        9       9       9
(4) how was our brand is better than other brands ?
  Pearson Correlation                   .478     .466    .474    1
  Sig. (2-tailed)                       .193     .206    .197
  N                                     9        9       9       9
**. Correlation is significant at the 0.01 level (2-tailed).
EXERCISE 9
3D BAR GRAPH
Three-dimensional graphs are rarely used in practice except for didactic purposes.
They are kind of cool though and especially helpful for visualizing the idea of the
regression plane in a two-predictor multiple regression. Good luck visualizing a
four-dimensional graph for three predictors, however! The 3d orientation of the
plots for the various plotting methods below seems to vary considerably. I like the
orientation used on the R scatterplot3d package the best.
SPSS
The GGRAPH command is used, and there are a number of options for appearance that I did not
employ. The order of the dimensions under the GUIDE statements is dimension 1 (x, width),
dimension 2 (y, depth), and dimension 3 (z, height). The dependent variable is typically put on
the vertical axis (z dimension). The name "graphdataset" appearing on the NAME keyword is an
arbitrary name, and it can be any name you choose. It names the data set that is read out and used in the
later SOURCE command, so these two names must match exactly. Note that the Years Since PhD
axis values are descending rather than ascending.
EXERCISE 10
PIE CHART
A Pie Chart is a type of graph that displays data in a circular graph. The pieces of the graph are
proportional to the fraction of the whole in each category. In other words, each slice of the pie is
relative to the size of that category in the group as a whole. The entire “pie” represents 100
percent of a whole, while the pie “slices” represent portions of the whole.
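The proportions behind a pie chart are simple to compute: each slice's share of the total gives its percentage, and the same share of 360 degrees gives its angle. The Python sketch below uses a hypothetical function name, pie_slices, and made-up category counts.

```python
def pie_slices(counts):
    """Map each category to (percent of whole, angle in degrees)."""
    total = sum(counts.values())
    return {label: (100.0 * count / total, 360.0 * count / total)
            for label, count in counts.items()}

# Hypothetical category counts.
counts = {"A": 1, "B": 1, "C": 2}
slices = pie_slices(counts)
for label, (percent, angle) in slices.items():
    print(label, percent, angle)
```

The percentages always add up to 100 and the angles to 360, which is the sense in which the whole pie represents 100 percent.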
Pie charts give you a snapshot of how a group is broken down into smaller pieces. The
following chart shows what New Yorkers throw in their trash cans. You could read that New
Yorkers (perhaps surprisingly) throw a lot of recyclables into their trash, but a pie graph gives a
clear picture of the large percentage of recyclables that find their way into the trash.
IBM SPSS Statistics is software specifically designed for statistics, especially in the social sciences.
The software is capable of creating a large number of graph types with a huge variety of options.
Unlike simpler programs like Excel, SPSS gives you a lot of options for creating pie charts.
Questionnaire