SPSS Session
Objectives
Selecting Appropriate Statistical Technique
Research Objectives: Group Comparison vs. Relationship Exploration
Type/Nature of IV & DV: Categorical vs. Continuous
[Conceptual model diagram: Pricing (independent variable) acting through a mediator on the outcome/dependent variable, VBJ Consumption, with hypothesised positive (H+) and negative (H-) paths]
Research Objectives
[Illustrations: Group Comparison (size choice by male vs. female participants) and Relationship Exploration (large size → consumption, H+)]
QUESTIONNAIRE DESIGN & DEVELOPMENT/SELECTION
Development of Research Instrument/Questionnaire
Important tips for developing/adapting a scale
1. It is better to use already developed scales that have demonstrated high reliability and validity.
2. Use a focus group study to adapt existing scales to your context if your research context is new.
3. Don't mention the name of the construct that you intend to measure through a selected scale.
4. Don't number your questions, especially when you have a large number of questions.
5. Use appropriate response options for the selected questions and, where possible, include a Not Applicable/Don't Know category.
(Response formats: nominal, ordinal/categorical variable, or scale/continuous variable.)
6. Attach a cover letter highlighting the purpose of the research and ensuring participants' anonymity.
7. Try to avoid:
long, complex questions
double negatives
jargon or abbreviations
culture-specific terms
words with double meanings
Quantitative Data Analysis with SPSS
Adapted from Figure 1.4 The logic of research process (Vaus, 2009)
Preparing a Code Book & Setting Up an SPSS File
The following steps must be taken in the order presented below:
“For additional tips on developing a coding scheme, read Part 1 of the SPSS Survival Manual, 4th edition.”
[Side-by-side example: the original survey and the coded survey]
2. Create your new SPSS file and define the attributes of your variables.
Make sure to follow the ground rules you have established in your coding scheme. That is, designate:
A. Variable Names
B. Variable Labels
C. Value Labels (e.g., for the variable gender, 0 = Male, 1 = Female; this is especially important for categorical variables)
D. User-Missing Values
6. Examine the data and summary statistics (e.g., frequencies, min. &
max. values) to spot and correct coding errors.
7. Calculate reliability for each of your summated multi-item scales only, and (a) identify items detracting from reliability, and (b) refine the scale accordingly.
8. Create a composite/summated variable for each summated multi-item scale
(Menu Bar: Transform, Compute; Target Variable = a mathematical expression). A rough code sketch of steps 7-9 appears after step 11 below.
9. Compute Descriptive Statistics for different subgroups
separately
(Menu Bar: Data, Split File, Organize Output by Groups--specify the grouping variable) OR
(Menu Bar: Analyze, Compare Means, Means, Dep. List, Indep. List, Select Options, OK)
10. Save and print your SPSS data file (Menu Bar: File, Save As…)
11. Analyze the data (Menu Bar: Analyze--select analysis options)
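As a rough illustration of steps 7-9 outside SPSS, the Python sketch below computes Cronbach's alpha for a hypothetical three-item scale, builds the composite mean score, and summarises it by group. The variable names (sat1-sat3, gender) and data are invented for the example; this is not the lecture's data file.

```python
import pandas as pd

# Hypothetical survey responses; variable names are invented for illustration only.
df = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "female", "male", "female", "male"],
    "sat1":   [4, 5, 3, 4, 2, 5, 4, 3],
    "sat2":   [4, 4, 3, 5, 2, 5, 4, 2],
    "sat3":   [5, 5, 2, 4, 3, 4, 4, 3],
})

# Step 7: Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
items = df[["sat1", "sat2", "sat3"]]
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(f"Cronbach's alpha = {alpha:.3f}")

# Step 8: composite/summated variable (like Transform > Compute with MEAN(sat1, sat2, sat3))
df["satisfaction"] = items.mean(axis=1)

# Step 9: descriptive statistics for each subgroup (like Analyze > Compare Means > Means)
print(df.groupby("gender")["satisfaction"].agg(["count", "mean", "std", "min", "max"]))
```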
“For additional help, read Parts 2 & 3 of the SPSS Survival Manual, 4th edition.”
SCREENING AND CLEANING THE DATA FILE
DATA SCREENING
Data screening (jokingly known to us as “data screaming”) ensures your data are “clean” and ready to go before you conduct your planned statistical analyses.
Data must always be screened to ensure they are reliable and valid for testing the type of causal theory you have planned for.
Screening and cooking are not the same thing: screening is like preparing the best ingredients for your gourmet meal!
STATISTICAL PROBLEMS WITH MISSING DATA
If you are missing much of your data, this can cause several problems; e.g., you may not be able to estimate the model.
EFA, CFA, and path models require a certain minimum number of data points in order to compute estimates; each missing data point reduces your valid n by 1.
Greater model complexity (number of items, number of paths) and
improved power require larger samples.
LOGICAL PROBLEM WITH MISSING DATA
Missing data can indicate systematic bias: respondents may not have answered particular questions in your survey because of a common cause (poor formulation, sensitivity, etc.).
For example, if you ask about gender, and females are less likely to report their gender than males, then you will have “male-biased” data.
Perhaps only 50% of the females reported their gender, but 95% of the males reported their gender.
If you use gender as a moderator in your causal models, your results will be heavily biased toward males, because you will not end up using the unreported responses from females.
You may also end up with a biased sample of female respondents.
CASE SCREENING
Cases are the rows in your data set. Screen them for:
Missing data in rows
Unengaged responses
Outliers on continuous variables
Go to Transform > Replace Missing Values to replace the missing values:
use the median for ordinal variables and the mean for continuous/interval-scale variables (e.g., experience);
specifically, replace with the median of nearby points for the ordinal variables and with the series mean for experience.
(A rough code sketch of this kind of imputation follows below.)
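A minimal sketch of this kind of replacement outside SPSS. SPSS's "median of nearby points" and "series mean" options are simplified here to a plain median and mean, and the column names are invented.

```python
import pandas as pd

# Toy data with missing values; column names are invented for illustration.
df = pd.DataFrame({
    "experience":   [2.0, 5.0, None, 10.0, 4.0],   # continuous -> replace with the (series) mean
    "satisfaction": [4.0, None, 3.0, 5.0, 4.0],    # ordinal -> replace with the median
})

df["experience"] = df["experience"].fillna(df["experience"].mean())
df["satisfaction"] = df["satisfaction"].fillna(df["satisfaction"].median())
print(df)
```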
WHAT DO WE REPORT?
Go to Analyze > Descriptive Statistics > Frequencies.
Select all variables except ID.
Request frequencies, skewness and kurtosis.
We don't really need to worry much about skewness for a 5-point Likert scale.
If your skewness value is greater than +1 you are positively (right) skewed; if it is less than -1 you are negatively (left) skewed; if it is in between, you are fine.
Some published thresholds are a bit more liberal and allow for up to +/-2.2 instead of +/-1.
For kurtosis, values over 3 are problematic.
If the absolute value of the skewness/kurtosis is less than three times its standard error, then you are fine; otherwise, you are skewed.
You can delete a problematic item if you have many items for the construct, but if you have only a few, you had better watch it during factor analysis.
(A short code sketch of these checks follows below.)
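A small sketch of the same screening check outside SPSS, using invented item responses; the thresholds in the comments are the rules of thumb quoted above.

```python
import pandas as pd
from scipy import stats

# Invented 5-point Likert responses for one item.
x = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5])

skew = stats.skew(x, bias=False)       # flag if beyond +/-1 (or the more liberal +/-2.2)
kurt = stats.kurtosis(x, bias=False)   # excess kurtosis; values over 3 are problematic
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```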
CORRELATION AND REGRESSION WITH SPSS
WHAT IS A CORRELATION?
[Figure: example scatterplots of a second variable plotted against Age]
MEASURING RELATIONSHIPS
We need to see whether as one variable increases, the other increases, decreases or
stays the same.
This can be done by calculating the Covariance.
We look at how much each score deviates from the mean.
If both variables deviate from the mean by the same amount, they are likely to be related.
Run this data in your SPSS file to check for correlation and regression.
The variance tells us by how much scores deviate from the mean for a
single variable.
It is closely linked to the sum of squares.
Covariance is similar: it tells us by how much scores on two variables differ from their respective means.
COVARIANCE
Calculate the error between the mean and each subject’s score for the first variable
(x).
Calculate the error between the mean and their score for the second variable (y).
Multiply these error values.
Add these values and you get the cross product deviations.
The covariance is the average of the cross-product deviations:

$$\mathrm{Cov}(x,y)=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{N-1}$$
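A quick numerical check of the formula, using made-up paired scores:

```python
import numpy as np

# Made-up paired scores (x, y) to illustrate the cross-product-deviation formula.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)  # sum of cross-products / (N - 1)
cov_numpy = np.cov(x, y)[0, 1]                                       # numpy's sample covariance (same value)
print(cov_manual, cov_numpy)
```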
PROBLEMS WITH COVARIANCE
Covariance depends on the units in which the variables are measured, so we standardise it by dividing by the product of the standard deviations. The result is the correlation coefficient:

$$r=\frac{\mathrm{Cov}_{xy}}{s_x s_y}=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{(N-1)\,s_x s_y}$$
THINGS TO KNOW ABOUT THE CORRELATION
It is an effect size
±.1 = small effect
±.3 = medium effect
±.5 = large effect
Coefficient of determination, r²
By squaring the value of r you get the proportion of variance in the DV shared with the variance in the IV.
It is also an effect size.
Coefficient of alienation, 1 - r²
It is the proportion of variance in the dependent variable (Y) unexplained by variance in the independent variable (X).
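A short sketch, on made-up data, showing r, the coefficient of determination and the coefficient of alienation side by side:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # made-up predictor scores
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])    # made-up outcome scores

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f} (p = {p:.3f})")
print(f"coefficient of determination r^2 = {r**2:.3f}")       # shared variance
print(f"coefficient of alienation 1 - r^2 = {1 - r**2:.3f}")  # unexplained variance
```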
CORRELATION AND CAUSALITY
SCATTERPLOT
TYPES OF CORRELATION
ASSUMPTIONS OF PEARSON’S R
CORRELATION: EXAMPLE
QRM file
INTERPRETATION
USING R² FOR INTERPRETATION
Although we cannot make direct conclusions about causality from a correlation, we can take the correlation
coefficient a step further by squaring it.
The correlation coefficient squared (known as the coefficient of determination, R2) is a measure of the
amount of variability in one variable that is shared by the other.
(.288)² = .083. This value tells us how much of the variability in BL is shared by PQ.
Although PQ was correlated with BL, it can account for only 8.3% of the variability in BL.
To put this value into perspective, this leaves 91.7% of the variability still to be accounted for by other
variables.
Note that although R2 is an extremely useful measure of the substantive
importance of an effect, it cannot be used to infer causal relationships.
Although we usually talk in terms of ‘the variance in y accounted for by x’, or even the variation in one variable explained by the other,
this still says nothing about which way causality runs. So, although PQ can
account for 8.3% of the variation in BL, it does not necessarily cause this
variation.
REPORTING THE RESULTS: EXAMPLE
(POINT-)BISERIAL CORRELATION
R² = (0.378)² = .143. Hence, we can conclude that gender accounts for 14.3% of the variability in time spent away from home.
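An illustrative sketch with invented data (gender coded 0/1 and a continuous "time away from home" score); the slide's actual values come from the lecture data set, not from this code.

```python
import numpy as np
from scipy import stats

gender = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])                                    # invented binary variable
time_away = np.array([12.0, 10.0, 15.0, 22.0, 18.0, 25.0, 20.0, 11.0, 24.0, 14.0])   # invented continuous variable

r_pb, p = stats.pointbiserialr(gender, time_away)
print(f"point-biserial r = {r_pb:.3f}, r^2 = {r_pb**2:.3f}, p = {p:.3f}")
```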
NONPARAMETRIC CORRELATION
Spearman’s Rho
Pearson’s correlation on the ranked data
Kendall’s Tau
Better than Spearman’s for small samples
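A minimal sketch of both coefficients on made-up, ordinal-style data:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up rankings
y = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p_rho = stats.spearmanr(x, y)    # Pearson's correlation applied to the ranks
tau, p_tau = stats.kendalltau(x, y)   # often preferred over rho for small samples
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f}); Kendall tau = {tau:.3f} (p = {p_tau:.3f})")
```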
• We need to find a “model” that has the least error variance and best fits the data.
• If the value of SSM is small, then using the regression model is little better than using the mean as the model.
• How are they related? Mathematically, SST = SSM + SSR: the total sum of squares equals the model (regression) sum of squares plus the residual (error) sum of squares.
• Advertising expenditure can account for 33.5% of the variation in
record sales.
• 66.5% of the variation in record sales cannot be explained by advertising alone.
• Therefore, there must be other variables that have an influence also.
• F is 99.59, which is significant at p < .001
• This result tells us that there is less than a 0.1% chance that an F-ratio this large would
happen if the null hypothesis were true.
• Therefore, we can conclude that our regression model results in significantly better
prediction of record sales than if we used the mean value of record sales.
• In short, the regression model overall predicts record sales significantly well
• The ANOVA tells us whether the model, overall, results in a significantly good degree of
prediction of the outcome variable.
• However, the ANOVA doesn’t tell us about the individual contribution of variables in the
model
• Intercept is b0 = 134.14, and this can be interpreted as meaning that when no money is spent on advertising (when X
= 0), the model predicts that 134,140 records will be sold
• b1 is 0.096. Although this value is the slope of the regression line, it is more useful to think of it as
• the change in the outcome associated with a unit change in the predictor.
• If our predictor variable is increased by one unit (if the advertising budget is increased by 1), then our model predicts that 0.096
extra records will be sold.
• For an increase in advertising of £1000 the model predicts 96 (0.096 × 1000 = 96) extra record sales.
• Is this good for the company?
• As you might imagine, this investment is pretty bad for the record company: it invests £1000 and gets only
96 extra sales.
• Fortunately, as we already know, advertising accounts for only one-third of record sales
• R square = .335, which tells us that advertising expenditure can account for 33.5% of the
variation in record sales.
• Beta = the change in the outcome associated with a unit change in the predictor = if our
independent variable is increased by 1 unit, our model predicts that 0.096 extra records will be sold.
• t-test = tests the null hypothesis that the value of beta is 0: therefore, if it is significant,
• we reject the null hypothesis and conclude that the beta value is significantly different from zero, and
• the predictor variable contributes significantly to our ability to estimate values of the outcome.
• Like F, the t-statistic is also based on the ratio of explained variance against
unexplained variance or error.
Using the model
$$\text{record sales}_i = b_0 + b_1\,\text{advertising budget}_i = 134.14 + (0.096 \times \text{advertising budget}_i)$$

The general multiple regression equation is:

$$y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i$$
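A sketch of where these numbers come from and how the fitted model is used. The data below are simulated to resemble the advertising/record-sales example (they are not the original data file), so the estimates will only roughly match the slide values.

```python
import numpy as np
import statsmodels.api as sm

# Simulated advertising budgets (thousands of pounds) and record sales (thousands of units).
rng = np.random.default_rng(1)
adverts = rng.uniform(0, 2000, 200)
sales = 134 + 0.1 * adverts + rng.normal(0, 60, 200)

X = sm.add_constant(adverts)          # adds the intercept column (b0)
model = sm.OLS(sales, X).fit()
print(model.params)                   # b0 (intercept) and b1 (slope)
print(model.rsquared)                 # proportion of variance in sales explained
print(model.fvalue, model.f_pvalue)   # the ANOVA F-test for the overall model

# Using the model: predicted sales for an advertising budget of 100 (i.e., £100,000)
b0, b1 = model.params
print(b0 + b1 * 100)
```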
b0
• b0 is the intercept.
• The intercept is the value of the Y variable when
all Xs = 0.
Beta Values
Doing Multiple Regression
Methods of Regression
• Hierarchical (Blockwise Entry): based on past research findings, the experimenter decides the order in which variables are entered into the model.
• Forced Entry:
• All predictors are entered simultaneously.
• Stepwise:
• Predictors are selected using their semi-partial correlation with the outcome.
• An exploratory approach.
Regression Statistics
Regression Diagnostics
Output: Model Summary
Output: ANOVA
Analysis of Variance: ANOVA
• The F-test
• looks at whether the variance explained by the model (SSM)
is significantly greater than the error within the model
(SSR).
• It tells us whether using the regression model is
significantly better at predicting values of the outcome
than using the mean.
• Regression is used to predict a continuous outcome on the
basis of one or more continuous predictor variables.
• Whereas ANOVA is used to predict a continuous outcome on
the basis of one or more categorical predictor variables.
Output: betas
How to Interpret Beta Values
• Beta values:
• the change in the outcome associated with a unit
change in the predictor.
• Standardised beta values:
• tell us the same but expressed as standard
deviations.
Constructing a Model
$$y = b_0 + b_1 X_1 + b_2 X_2$$

$$\text{Sales} = 41124 + 0.087\,\text{Adverts} + 3589\,\text{Plays}$$
Standardised Beta Values
• They tell us the number of standard deviations that the outcome will
change as a result of one standard deviation change in the predictor.
• The standardized beta values are all measured in standard deviation
units and so are directly comparable: therefore, they provide a better
insight into the ‘importance’ of a predictor in the model
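A small sketch, on simulated two-predictor data (not the original sales file), showing that a standardised beta is just the unstandardised b rescaled by sd(predictor)/sd(outcome):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
adverts = rng.normal(500, 200, 300)                                       # simulated predictor 1
plays = rng.normal(30, 10, 300)                                           # simulated predictor 2
sales = 40000 + 0.09 * adverts + 3500 * plays + rng.normal(0, 5000, 300)  # simulated outcome

fit = sm.OLS(sales, sm.add_constant(np.column_stack([adverts, plays]))).fit()
b_adverts, b_plays = fit.params[1], fit.params[2]

# Standardised beta = b * sd(X) / sd(Y): expressed in standard-deviation units, so directly comparable
beta_adverts = b_adverts * adverts.std(ddof=1) / sales.std(ddof=1)
beta_plays = b_plays * plays.std(ddof=1) / sales.std(ddof=1)
print(beta_adverts, beta_plays)
```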
Reporting the Model
Logistic Regression
SIMPLE REGRESSION VS. LOGISTIC REGRESSION
• We can use regression to predict future outcomes based on past data, when
the outcome is a continuous variable
• Logistic regression is an extension of regression that allows us to predict categorical outcomes based on predictor variables.
• We can predict which of two categories a person is likely to belong to given certain
other information.
When and Why
• To predict an outcome variable that is categorical
from one or more categorical or continuous
predictor variables.
• Used because having a categorical outcome
variable violates the assumption of linearity in
normal regression.
Assessing the Model: the log-likelihood statistic
• As in linear regression, we want to know not only how well the model overall fits the
data, but also the individual contribution of predictors.
• In linear regression, we used the estimated regression coefficients (b) and their standard
errors to compute a t-statistic.
• In logistic regression there is an analogous statistic known as the Wald statistic, which
has a special distribution known as the chi-square distribution.
• Wald statistic tells us whether the b coefficient for that predictor is significantly different
from zero. If the coefficient is significantly different from zero then we can assume that
the predictor is making a significant contribution to the prediction of the outcome (Y):
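A rough sketch with simulated data (a 0/1 price condition predicting a binary size choice; these are not the study's data) showing the b coefficient, its Wald statistic and the odds ratio:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
price = rng.integers(0, 2, 300)                        # 0 = linear pricing, 1 = supersized pricing
p_large = 1 / (1 + np.exp(-(-1.0 + 1.2 * price)))      # true model used only to simulate choices
choice = rng.binomial(1, p_large)                      # 1 = chose the large size

fit = sm.Logit(choice, sm.add_constant(price)).fit(disp=False)
print(fit.params)              # b coefficients (log-odds)
print(fit.tvalues ** 2)        # Wald statistics (z squared, chi-square distributed with 1 df)
print(np.exp(fit.params))      # odds ratios, exp(b)
```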
EXAMPLE
We do need to decide whether to use the first category or the last category as our baseline.
• Choose Forward: LR because the study is the first in the field, so we have no past research to tell us which variables to expect to be reliable predictors, and this method follows a stepwise procedure.
• By default, SPSS uses Indicator coding, which is standard dummy coding (0 and 1).
• 236.992 represents the fit of the most basic
model to the data. When including
only the constant, the computer bases the model
on assigning every participant to a single
category of the outcome variable.
The price condition was significant (b = 1.178, Wald = 6.694, p < .05) with an OR = 3.248, which suggests that participants in the supersized pricing condition were more than three times as likely to choose the larger bottle, supporting H1. More than 56% of participants chose the large bottle size in the supersized pricing condition, compared with 26.7% in the linear pricing condition, demonstrating the influence of supersized pricing on consumers' size-choice decisions.
The nutritional label condition was not significant (b = -.110, Wald = .052, p > .82) but, more importantly, the two-way interaction between the price and nutritional label conditions was significant (b = -1.318, Wald = 3.851, p = .05). This means that the nutritional label alone does not predict whether a person will choose the large size or the small size, but its interaction with price does.
Interestingly, in the control condition (no nutritional label manipulation), the effect of the price condition was significant (OR = 3.536, p < .01). In contrast, in the nutritional label condition, there was no effect of the price condition (OR = .856, ns) on bottle size selection. Therefore, we can conclude that the presence of a nutritional label is responsible for the enhanced health salience that caused the influence of supersized pricing on size choice to decrease: the choice of the larger bottle size was significantly reduced from more than 56% to 21.7% when participants in the supersized pricing condition were exposed to the nutritional label.
Moreover, liking for the product (b = .296, ns), brand preference (b = .339, ns), attitude towards the product (b = -.565, ns) and level of thirst (b = -.014, ns) were not significant. This suggests that supersized pricing is such a substantial influence on consumers' package-size decisions that it overrides many other factors often considered important in food-choice decisions, such as whether the consumer likes the product, has a special preference for the brand, is thirsty, or holds a particular attitude towards the brand.
Split-file check for individual results (nutritional label):
In the control condition (no nutritional label manipulation), the effect of the price condition was significant (OR = 3.536, p < .01). In contrast, in the nutritional label condition, there was no effect of the price condition (OR = .856, ns) on bottle size selection.
The choice of the larger bottle size was significantly reduced from 56.3% to 21.7% when participants in the supersized pricing condition were exposed to the nutritional label.
Chi-square test
(Table note: bold values signify conditions with significant differences (p < .05) in size choice based on pricing method.)
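A rough sketch of the chi-square test behind such a comparison, using an invented 2×2 crosstab of pricing condition by size choice (the counts are hypothetical, not the study's):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: pricing condition (linear, supersized); columns: size chosen (large, regular). Counts are invented.
table = np.array([[16, 44],
                  [34, 26]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```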
STATISTICAL TECHNIQUES TO COMPARE GROUPS
COMPARING MEANS: Independent-samples t-test
• Tests the difference between two means
• Independent means: each person has been measured only once
• The DV has been measured on a continuous scale
• Levene's test assesses the homogeneity of variance assumption
• The t-test assumes that the variance (or SD) in both groups/samples is the same
• They don't have to be exactly the same, but they have to be similar enough that they are not significantly different from each other
• The t-test is relatively robust to violations of this assumption, but you should still test and report it in the results
• If F is significant, the variances of the two groups are not equal
• If the F value is not significant, the homogeneity of variance assumption holds
(A short code sketch of this check follows below.)
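A short sketch of the same two-step check (Levene's test, then the independent-samples t-test) on invented scores for two groups; center='mean' mimics SPSS's mean-based Levene statistic.

```python
import numpy as np
from scipy import stats

group1 = np.array([3.2, 3.8, 4.1, 3.5, 3.9, 4.4, 3.7])   # invented scores, group 1
group2 = np.array([2.9, 3.1, 3.6, 2.8, 3.3, 3.0, 3.4])   # invented scores, group 2

lev_F, lev_p = stats.levene(group1, group2, center='mean')   # homogeneity of variance
equal_var = lev_p > .05                                      # not significant -> assume equal variances
t_stat, t_p = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene F = {lev_F:.2f} (p = {lev_p:.3f}); t = {t_stat:.2f} (p = {t_p:.3f})")
```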
[Output note: there is a significant difference between the two scores (FS at t1 and t2).]
ONE-WAY Analysis of Variance (ANOVA)
• An asterisk (*) means that the two groups being compared are significantly different from one another
• Here, g1 and g3 are statistically significantly different from each other (a code sketch of the omnibus test and post-hoc comparisons follows below)
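A rough sketch of the omnibus F-test and Tukey post-hoc comparisons on invented data for three groups (the starred pairs in SPSS correspond to "reject = True" rows in the Tukey table):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

g1 = np.array([3.1, 3.5, 3.8, 3.2, 3.6])   # invented group scores
g2 = np.array([3.4, 3.7, 3.9, 3.5, 3.8])
g3 = np.array([4.2, 4.6, 4.4, 4.8, 4.5])

F, p = stats.f_oneway(g1, g2, g3)           # omnibus one-way ANOVA
print(f"F = {F:.2f}, p = {p:.4f}")

scores = np.concatenate([g1, g2, g3])
groups = ["g1"] * 5 + ["g2"] * 5 + ["g3"] * 5
print(pairwise_tukeyhsd(scores, groups))    # post-hoc pairwise comparisons
```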
One-way Repeated Measures ANOVA
• Each subject is exposed to two or more different conditions,
• or measured on the same continuous scale on three or more occasions
• Can also be used to compare respondents' responses to two or more different questions
• Questions should be measured using the same scale, e.g., 1 = Strongly Disagree to 5 = Strongly Agree
(A short code sketch follows below.)
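A minimal sketch of a one-way repeated measures ANOVA on invented long-format data (one confidence score per participant at each of three time points):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],          # participant identifier
    "time":  ["t1", "t2", "t3"] * 4,                         # within-subjects factor
    "score": [2.0, 3.0, 4.0, 2.5, 3.5, 4.5, 3.0, 3.5, 4.0, 2.0, 2.5, 3.5],
})

print(AnovaRM(data, depvar="score", subject="id", within=["time"]).fit())
```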
Multivariate tests
• All of these tests yield the same conclusion, but the most commonly reported is Wilks' Lambda, which here shows a statistically significant effect of time
• There was a change in confidence scores across the three time periods
• Options > Pairwise comparisons