
QUANTITATIVE DATA ANALYSIS with SPSS

Objectives
 Research questionnaire: design & development/selection
 Selection & search of scales
 Scale adaptation
 Data analysis with SPSS
Selecting an Appropriate Statistical Technique
 Research Objectives
 Group Comparison
 Relationship Exploration

 Type/Nature of IV & DV
 Categorical
 Continuous

 Number of IVs & DVs (level of analysis)
 Univariate statistics summarize only one variable at a time.
 They do not examine relationships or causes.
 Example: count the number of boys and girls in the class (describe data and find patterns).
 Mean, median, mode, dispersion, etc.
 Bivariate statistics compare two variables (e.g., gender and test grades).
 Correlation, regression
 Multivariate statistics compare more than two variables,
 to check how all kinds of effects occur together.
Introduction to Research
Sample Research Model

[Diagram: a sample research model in which an independent variable (Pricing) is linked through mediators (HI SC and VBJ) to outcome/dependent variables (e.g., Consumption), with hypothesized positive (H+) and negative (H-) paths.]
Research Objectives

[Diagram: group comparison (e.g., male vs. female differences in size choice) versus relationship exploration (e.g., hypothesized positive paths involving FW, large-size choice, and consumption).]
QUESTIONNAIRE DESIGN &
DEVELOPMENT/SELECTION

Development of Research Instrument/Questionnaire
Important tips for developing/adapting a scale
1. It is better to use already developed scales that have demonstrated high reliability and validity.
2. Use a focus group study to adapt existing scales to your context, if your research context is new.
3. Don't mention the name of the construct that you intend to measure through a selected scale.
4. Don't number your questions, especially when you have a large number of questions.
5. Use appropriate response options for the selected questions and introduce a Not Applicable/Don't Know category if possible.
 Nominal
 Ordinal/categorical variable
 Scale/continuous variable
6. Attach a cover letter highlighting the purpose of the research and ensuring participants' anonymity.
7. Try to avoid:
 long, complex questions
 double negatives
 jargon or abbreviations
 culture-specific terms
 words with double meanings
Questionnaire Design & Development/Selection

Quantitative Data Analysis with SPSS

Flow chart of the data analysis process:
 Prepare a code book
 Set up the structure of the data file
 Data entry
 Data screening
 Exploration of data
 Explore relationships: factor analysis, correlation, regression, logistic regression
 Compare groups: non-parametric tests, t-test, ANOVA, MANOVA

Adapted from Figure 1.4, the logic of the research process (Vaus, 2009)
Preparing a Code Book & Setting Up an SPSS File
The following steps must be taken in the order presented below:

1. Develop a coding scheme. Use a blank survey to:
   A. Specify any additional variables that may need to be added to the data set, e.g., a participant ID variable.
   B. Identify your reverse items (items that will have to be reverse coded before they are used in any computation and/or analysis).
   C. Assign variable names to all your variables.
   D. Specify how you wish to code the values of each variable.
   E. Specify how you will code missing values (e.g., non-responses).
      A true non-response is ordinarily coded as a blank.

"For additional tips on developing a coding scheme, read Part 1 of the SPSS Survival Manual, 4th edition."
[Slide: an example original survey shown alongside the same survey after coding.]
2. Create your new SPSS file and define the attributes of your variables.

Make sure to follow the ground rules you have established in your coding scheme. That is, designate:
A. Variable names
B. Variable labels
C. Value labels (e.g., for the variable gender, 0 = Male, 1 = Female; this is especially important for categorical variables)
D. User-missing values
E. Variable types (i.e., numeric, or string for text information)
F. Variable formats (maximum number of digits and decimals required for coding the values)
G. Column formats (column width used to display a variable on the screen)
3. Code the data into your SPSS file following the ground rules established in your coding scheme and observed when creating your SPSS file.
4. Reverse code your reverse items -- only after all the coding is completed (Menu Bar: Transform, Recode into a new variable).
5. Compute and print summary/descriptive statistics -- e.g., mean, std. dev., min., and max. for metric variables; and frequencies, min., and max. for categorical variables.
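The menu steps above have direct syntax equivalents. A minimal sketch in SPSS syntax, with hypothetical variable names (a 5-point reverse item q5, a continuous variable age, and a categorical variable gender):

  * Step 4: reverse code a 5-point reverse item into a new variable.
  RECODE q5 (1=5) (2=4) (3=3) (4=2) (5=1) INTO q5_r.
  EXECUTE.

  * Step 5: descriptives for metric variables, frequencies for categorical variables.
  DESCRIPTIVES VARIABLES=age /STATISTICS=MEAN STDDEV MIN MAX.
  FREQUENCIES VARIABLES=gender /STATISTICS=MINIMUM MAXIMUM.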

6. Examine the data and summary statistics (e.g., frequencies, min. & max. values) to spot and correct coding errors.
7. Calculate reliability for each of your summated multi-item scales and (a) identify items detracting from reliability, and (b) refine the scale accordingly.
8. Create a composite/summated variable for each summated multi-item scale (Menu Bar: Transform, Compute -- Target Variable = a mathematical expression).
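A syntax sketch of steps 7 and 8, assuming a hypothetical three-item scale bl1-bl3:

  * Step 7: Cronbach's alpha with item-total statistics.
  RELIABILITY
    /VARIABLES=bl1 bl2 bl3
    /SCALE('Brand Loyalty') ALL
    /MODEL=ALPHA
    /SUMMARY=TOTAL.

  * Step 8: composite (mean) score for the summated scale.
  COMPUTE bl = MEAN(bl1, bl2, bl3).
  EXECUTE.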
9. Compute descriptive statistics for different subgroups separately
(Menu Bar: Data, Split File, Organize Output by Groups -- specify the grouping variable) OR
(Menu Bar: Analyze, Compare Means, Means, Dep. List, Indep. List, Select Options, OK)
10. Save and print your SPSS data file (Menu Bar: File, Save As…)
11. Analyze the data (Menu Bar: Analyze -- select analysis options)

"For additional help read Parts 2 & 3 of the SPSS Survival Manual, 4th edition."
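A syntax sketch of step 9, assuming a hypothetical grouping variable gender and a composite scale bl:

  * Option A: split the file and run descriptives per group.
  SORT CASES BY gender.
  SPLIT FILE LAYERED BY gender.
  DESCRIPTIVES VARIABLES=bl /STATISTICS=MEAN STDDEV MIN MAX.
  SPLIT FILE OFF.

  * Option B: the Compare Means procedure.
  MEANS TABLES=bl BY gender /CELLS=MEAN COUNT STDDEV.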
SCREENING AND CLEANING THE DATA FILE
DATA SCREENING

 Data screening (jokingly, "data screaming") ensures your data is "clean" and ready to go before you conduct your planned statistical analyses.
 Data must always be screened to ensure it is reliable and valid for testing the type of causal theory you have planned for.
 Screening and cooking are not synonymous – screening is like preparing the best ingredients for your gourmet food!
STATISTICAL PROBLEMS WITH MISSING DATA

 If you are missing much of your data, this can cause several problems; e.g., the model may not be estimable.
 EFA, CFA, and path models require a certain minimum number of data
points in order to compute estimates – each missing data point reduces
your valid n by 1.
 Greater model complexity (number of items, number of paths) and
improved power require larger samples.
LOGICAL PROBLEM WITH MISSING DATA

 Missing data may indicate systematic bias, because respondents may not have answered particular questions in your survey for a common reason (poor formulation, sensitivity, etc.).
 For example, if you ask about gender, and if females are less likely to report their gender than
males, then you will have “male-biased” data.
 Perhaps only 50% of the females reported their gender, but 95% of the males reported gender.

 If you use gender as a moderator in your causal models, then you will be heavily biased toward males,
 because you will not end up using the unreported responses from females.
 You may also have a biased sample of female respondents.
CASE SCREENING
 Cases refers to the rows in your data set. Screen for:
 Missing data in rows
 Unengaged responses
 Outliers on continuous variables
 To handle missing values, go to Transform > Replace Missing Values:
 Median of nearby points for ordinal variables
 Series mean for continuous/interval-scale variables (e.g., experience)
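A minimal syntax sketch of this replacement (variable names are hypothetical; RMV is the syntax behind Transform > Replace Missing Values):

  * Median of nearby points (here: all points) for an ordinal item q7.
  RMV /q7_f=MEDIAN(q7, ALL).
  * Series mean for the continuous variable experience.
  RMV /experience_f=SMEAN(experience).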
WHAT DO WE REPORT?

 We had 7 variables with missing values, all <5% missing, which we replaced with the median for ordinal scales and the mean for continuous scales. Moreover, we deleted two rows as incomplete responses, because more than 20% of their responses were missing.
WHAT IF YOU HAVE MORE THAN 5 OR 10% MISSING VALUES IN A SINGLE COLUMN?

 All the effects of the regression or correlation will be diluted or dampened, because you start bringing everything towards the mean, so that is not ideal.
UNENGAGED RESPONSES

 This is tricky, and not everyone looks into it.
 It happens when somebody is taking your survey and not really paying attention.
 How do we detect this?
 You can't visually track the data of more than about 300 respondents.
 We can insert an attention trap, e.g., "If you are still paying attention, please answer Strongly Disagree"; failing it justifies removing that respondent's data.
 Another way is to record the time taken to complete the survey (e.g., a survey of 60 questions).
 Use reverse-coded items.
VARIABLE SCREENING - NORMALITY: SKEWNESS AND KURTOSIS

 Go to Analyze > Descriptive Statistics > Frequencies
 Select all variables except ID
 Request skewness and kurtosis statistics
 Skewness is less of a concern for a 5-point Likert scale.
 If your skewness value is greater than 1 you are positively (right) skewed; if it is less than -1 you are negatively (left) skewed; if it is in between, you are fine.
 Some published thresholds are a bit more liberal and allow up to +/-2.2 instead of +/-1.
 Kurtosis values over 3 are problematic.
 Alternatively, if the absolute value of the skewness/kurtosis statistic is less than three times its standard error, you are fine; otherwise, the variable is skewed.
 You can delete a problematic item if the construct has many items; if it has few, watch the item during factor analysis.
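A syntax sketch of this check, assuming hypothetical items q1 to q20 stored consecutively in the file:

  * Skewness and kurtosis with their standard errors, suppressing the frequency tables.
  FREQUENCIES VARIABLES=q1 TO q20
    /FORMAT=NOTABLE
    /STATISTICS=SKEWNESS SESKEW KURTOSIS SEKURT.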
CORRELATION AND REGRESSION WITH SPSS
WHAT IS A CORRELATION?

 It is a way of measuring the extent to which two


variables are related.
 It measures the pattern of responses across
variables.
[Scatterplots: Appreciation of Dimmu Borgir plotted against Age, illustrating different patterns of relationship between the two variables.]
MEASURING RELATIONSHIPS

 We need to see whether as one variable increases, the other increases, decreases or
stays the same.
 This can be done by calculating the Covariance.
 We look at how much each score deviates from the mean.
 If both variables deviate from the mean by the same amount, they are likely to be related.
Run this data in your SPSS to check for correlation and regression.

Are the findings significant?


REVISION OF VARIANCE

 The variance tells us by how much scores deviate from the mean for a
single variable.
 It is closely linked to the sum of squares.
 Covariance is similar – it tells us by how much scores on two variables differ from their respective means.
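For reference, the sample variance is the sum of squared deviations from the mean, averaged over N − 1:

\mathrm{Var}(x) = s_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{N - 1}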
COVARIANCE

 Calculate the error between the mean and each subject’s score for the first variable
(x).
 Calculate the error between the mean and their score for the second variable (y).
 Multiply these error values.
 Add these values and you get the cross product deviations.
 The covariance is the average of the cross-product deviations:

\mathrm{Cov}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
PROBLEMS WITH COVARIANCE

 It depends upon the units of measurement.


 E.g. The Covariance of two variables measured in Miles might be 4.25, but if the same scores are
converted to Km, the Covariance is 11.
 One solution: standardise it!
 Divide by the standard deviations of both variables.

 The standardised version of Covariance is known as the Correlation coefficient.


 It is relatively unaffected by units of measurement.
THE CORRELATION COEFFICIENT

r = \frac{\mathrm{Cov}_{xy}}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}
THINGS TO KNOW ABOUT THE CORRELATION

 It varies between -1 and +1


 0 = no relationship

 It is an effect size
 ±.1 = small effect
 ±.3 = medium effect
 ±.5 = large effect

 Coefficient of determination, r2
 By squaring the value of r you get the proportion of variance in one variable that is shared with the other variable.
 It is also an effect size.

 Coefficient of Alienation

 1-r2
 It is the proportion of variance in the dependent variable (Y) unexplained by variance in the independent variable (X)
CORRELATION AND CAUSALITY

 The third-variable problem:


 In any correlation, causality between two variables cannot be assumed because there may
be other measured or unmeasured variables affecting the results.
 Direction of causality:
 Correlation coefficients say nothing about which variable causes the other to change
 Although it is intuitively appealing to conclude that watching adverts causes us to buy
packets of toffees, there is no statistical reason why buying packets of toffees cannot cause
us to watch more adverts.
CAN YOU CALCULATE CORRELATION WITH SPSS?

Enter the advert data and use the chart editor to produce a scatterplot of the data (number of packets bought on the y-axis, and adverts watched on the x-axis).
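A syntax sketch, assuming hypothetical variable names adverts and packets:

  * Scatterplot of packets bought against adverts watched.
  GRAPH /SCATTERPLOT(BIVAR)=adverts WITH packets.
  * Pearson correlation between the two variables.
  CORRELATIONS /VARIABLES=adverts packets /PRINT=TWOTAIL NOSIG.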
SCATTERPLOT

[Slide: scatterplot of the advert data.]
TYPES OF CORRELATION

 There are two types of correlation: bivariate and partial.


 A bivariate correlation is a correlation between two variables
 A partial correlation looks at the relationship between two variables while
‘controlling’ the effect of one or more additional variables.

ASSUMPTIONS OF PEARSON’S R

 Sampling distribution has to be normally distributed for both variables


 There is one exception to this rule: one of the variables can be a categorical variable provided there are
only two categories
 If your data are non-normal or are not measured at the interval level then you should deselect the Pearson
tick-box.

CORRELATION: EXAMPLE

 QRM file

INTERPRETATION

 Each variable is perfectly correlated with itself (obviously), so r = 1.
 PQ is positively related to BL with a Pearson correlation coefficient of r = .288, and the significance value is less than .01 (as indicated by the double asterisk after the coefficient).
 Our criterion for significance is usually .05, so SPSS marks any correlation coefficient significant at this level with an asterisk.
 As PQ increases, BL also increases.
USING R² FOR INTERPRETATION

 Although we cannot make direct conclusions about causality from a correlation, we can take the correlation coefficient a step further by squaring it.
 The correlation coefficient squared (known as the coefficient of determination, R²) is a measure of the amount of variability in one variable that is shared by the other.
 (.288)² = .083. This value tells us how much of the variability in BL is shared by PQ.
 Although PQ was correlated with BL, it can account for only 8.3% of the variability in BL.
 To put this value into perspective, this leaves 91.7% of the variability still to be accounted for by other variables.
 Note that although R2 is an extremely useful measure of the substantive
importance of an effect, it cannot be used to infer causal relationships.
 Although we usually talk in terms of ‘the variance in y accounted for by x’, or
even the variation in one variable explained by the other
 This still says nothing about which way causality runs. So, although PQ can
account for 8.3% of the variation in BL, it does not necessarily cause this
variation.

REPORTING THE RESULTS- EX

Exam performance was significantly correlated with exam


anxiety, r = -.44, and time spent revising, r = .40; the time
spent revising was also correlated with exam anxiety, r =
-.71 (all ps < .001).
FIND THE R² OF THE REST OF THE VARIABLES IN THE TABLE AND REPORT THEM.

56
(POINT-)BISERIAL CORRELATION

 Point-biserial correlation, rpb: relationship between a continuous variable and a


variable that is a discrete dichotomy
 Biserial correlation, rb: relationship between a continuous variable and a variable that is a continuous dichotomy (i.e., there is a continuum underlying the two categories, such as passing or failing an exam)
 A point–biserial correlation is simply a Pearson correlation when the dichotomous
variable is coded with 0 for one category and 1 for the other
USE FILE ROAMING CATS TO CALCULATE POINT–BISERIAL
CORRELATION

 R2 = (0.378)2 = .143. Hence, we can conclude that gender accounts for 14.3% of
the variability in time spent away from home.
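A syntax sketch for the point-biserial case: because it is simply a Pearson r with a 0/1 predictor, the ordinary CORRELATIONS command is used (the variable names time and sex are hypothetical):

  CORRELATIONS /VARIABLES=time sex /PRINT=TWOTAIL NOSIG.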
Nonparametric Correlation

NONPARAMETRIC CORRELATION

 Spearman’s Rho
 Pearson’s correlation on the ranked data

 Kendall’s Tau
 Better than Spearman’s for small samples

 World’s best Liar Competition


 68 contestants
 Measures
 Where they were placed in the competition (first,
second, third, etc.)
 Creativity questionnaire (maximum score 60)
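A syntax sketch for these nonparametric coefficients, assuming hypothetical variables position and creativity from the liar-competition data:

  * Spearman's rho and Kendall's tau for ranked data.
  NONPAR CORR
    /VARIABLES=position creativity
    /PRINT=BOTH TWOTAIL NOSIG.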
CORRELATION OUTPUT
SPEARMAN’S RHO
CORRELATION OUTPUT
KENDALL’S TAU
REGRESSION WITH SPSS
Moving beyond Correlation

• Correlation is useful for telling us about the relationship between two variables, but it tells us nothing about prediction. Regression fits a predictive model to our data and uses that model to predict values of the dependent variable from one or more independent variables.
A straight line can be described by two things:
1. The slope (or gradient) of the line; and
2. The point at which the line crosses the vertical axis of the graph (known as the intercept of the line).
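Putting the two together (using the notation that appears later for multiple regression), the simple regression model is:

Y_i = b_0 + b_1 X_i + \varepsilon_i

where b_0 is the intercept, b_1 the slope, and \varepsilon_i the error for case i.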
The method of least squares

• We need to find a "model" that has the smallest residual variance and best fits the data.
• This means we need to find a straight line that "describes" our data.
• A regression line is the line that minimizes the sum of squared differences between the data and the line.

Goodness-of-fit: how well does the line fit?

• SSM = SST − SSE: the model sum of squares (the improvement over the mean model; called SSR in the slides below) is the total sum of squares minus the error sum of squares.


Looking at SST
• The sum of squares total, denoted SST, is the sum of the squared differences between the observed dependent variable and its mean. You can think of this as the dispersion of the observed variables around the mean – much like the variance in descriptive statistics.
• The line representing the mean is flat, which means that as the predictor variable changes, the value of the
outcome does not change (because for each level of the predictor variable, we predict that the outcome
will equal the mean value).
• The important point here is that a bad model (such as the mean) will have regression coefficients of 0 for
predictors.
• A regression coefficient of 0 means:
• (1) A unit change in the predictor variable results in no change in the predicted value of the outcome (the
predicted value of the outcome does not change at all)
• (2) the gradient of the regression line is 0, meaning the regression line is flat

• It is a measure of the total variability of the dataset.
SSR
• The sum of squares due to regression, or SSR.
• It is the sum of the differences between the predicted value and the mean of the dependent
variable. Think of it as a measure that describes how well our line fits the data
• If this value of SSR is equal to the sum of squares total, it means our regression model captures
all the observed variability and is perfect
SSE and SSM
• The error is the difference between the observed value and the predicted value.
• The sum of squared errors is SSE. We usually want to minimize the error: the smaller the error, the better the estimation power of the regression.
• SSM is the improvement of the regression model over the mean model.
• If the value of SSM is large, then the regression model is very different from using the mean to predict the dependent variable.
• If the value of SSM is small, then using the regression model is little better than using the mean as the model.
How they are related?
• Mathematically, SST = SSR + SSE.

• The rationale is the following: the total


variability of the data set is equal to the
variability explained by the regression
line plus the unexplained variability, known as
error.
• Given a constant total variability
• Lower error will cause a better regression.
• Higher error will cause a less powerful regression.
• And that’s what you must remember, no
matter the notation.
R square
• It represents the amount of variance in the outcome explained by the model (SSR) relative to how much total variation there was to explain (SST, the variation around the mean).
• Thus, R² = SSR/SST.

F ratio
• It is a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model.
• A good model should have a large F-ratio.
Regression: An Example
• A record company boss was interested in predicting record sales from
advertising.
• Data
• 200 different album releases
• Outcome variable:
• Sales (CDs and Downloads) in the week after release
• Predictor variables
• The amount (in £s) spent promoting the record before release (see last
lecture)
• Number of plays on the radio (new variable)
The Model with One Predictor

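A syntax sketch for this simple regression, assuming hypothetical variable names sales and adverts:

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /DEPENDENT sales
    /METHOD=ENTER adverts.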
• Advertising expenditure can account for 33.5% of the variation in
record sales.
• 66% of the variation in record sales cannot be explained by advertising
alone.
• Therefore, there must be other variables that have an influence also.
• F is 99.59, which is significant at p < .001
• This result tells us that there is less than a 0.1% chance that an F-ratio this large would
happen if the null hypothesis were true.
• Therefore, we can conclude that our regression model results in significantly better
prediction of record sales than if we used the mean value of record sales.
• In short, the regression model overall predicts record sales significantly well
• The ANOVA tells us whether the model, overall, results in a significantly good degree of
prediction of the outcome variable.
• However, the ANOVA doesn’t tell us about the individual contribution of variables in the
model
• Intercept is b0 = 134.14, and this can be interpreted as meaning that when no money is spent on advertising (when X
= 0), the model predicts that 134,140 records will be sold
• b1 is 0.096 . Although this value is the slope of the regression line
• The change in the outcome associated with a unit change in the predictor.
• If our predictor variable is increased by one unit (if the advertising budget is increased by 1), then our model predicts that 0.096
extra records will be sold.
• For an increase in advertising of £1000 the model predicts 96 (0.096 × 1000 = 96) extra record sales.
• Is this good for the company?
• As you might imagine, this investment is pretty bad for the record company: it invests £1000 and gets only 96 extra sales.
• Fortunately, as we already know, advertising accounts for only one-third of record sales
• R square = .335, which tells us that advertising expenditure can account for 33.5% of the
variation in record sales.

• ANOVA test = F ratio = 99.587

• Beta = the change in the outcome associated with a unit change in the predictor = if our
independent variable is increased by 1 unit, our model predicts that 0.096 extra records will be sold.

• T-test = tests the null hypothesis that the value of beta is 0: therefore, if it is significant,
• we accept the hypothesis that the beta value is significantly different from zero
• the predictor variable contributes significantly to our ability to estimate values of the outcome.
• Like F, the t-statistic is also based on the ratio of explained variance against
unexplained variance or error.
Using the model
• \text{record sales}_i = b_0 + b_1 \times \text{advertising budget}_i = 134.14 + (0.096 \times \text{advertising budget}_i)

• Imagine a record executive wanted to spend £100,000 on advertising a new


record.
• Record sales should be around 144,000 for the first week of sales
MULTIPLE
REGRESSION
• A logical extension of the simple regression model to situations in
which there are several independent variables.

• We talked about regression LINE in a simple regression model, now


we are talking about a regression PLANE
• Linear Regression is a model to predict the value of one variable from
another.
• Multiple Regression is a natural extension of this model:
• We use it to predict values of an outcome from several predictors.
• It is a hypothetical model of the relationship between several variables.
Multiple Regression as an Equation

• With multiple regression the relationship is


described using a variation of the equation of a
straight line.

Y_i = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon_i
b0

• b0 is the intercept.
• The intercept is the value of the Y variable when
all Xs = 0.

• This is the point at which the regression


plane crosses the Y-axis (vertical).

Beta Values

• b1 is the regression coefficient for


variable 1.
• b2 is the regression coefficient for
variable 2.
• bn is the regression coefficient for nth
variable.

Doing Multiple Regression

Methods of Regression

• Hierarchical (Blockwise Entry): based on past research findings, the experimenter decides the order in which variables are entered into the model.
• Forced Entry: all predictors are entered simultaneously.
• Stepwise: predictors are selected using their semi-partial correlation with the outcome; an exploratory method.
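A syntax sketch of forced entry and hierarchical entry, assuming hypothetical predictors adverts and airplay:

  * Forced entry: both predictors in a single block.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA CHANGE
    /DEPENDENT sales
    /METHOD=ENTER adverts airplay.

  * Hierarchical entry: the known predictor first, then the new one.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA CHANGE
    /DEPENDENT sales
    /METHOD=ENTER adverts
    /METHOD=ENTER airplay.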
Regression Statistics
Regression
Diagnostics
Output: Model Summary

Output: ANOVA

Analysis of Variance: ANOVA

• The F-test
• looks at whether the variance explained by the model (SSM) is significantly greater than the error within the model (SSE).
• It tells us whether using the regression model is
significantly better at predicting values of the outcome
than using the mean.
• Regression is used to predict a continuous outcome on the
basis of one or more continuous predictor variables.
• Whereas ANOVA is used to predict a continuous outcome on
the basis of one or more categorical predictor variables.
Output: betas

How to Interpret Beta Values

• Beta values:
• the change in the outcome associated with a unit
change in the predictor.
• Standardised beta values:
• tell us the same but expressed as standard
deviations.

Constructing a Model

Y = b_0 + b_1 X_1 + b_2 X_2
\text{Sales} = 41{,}124 + (0.087 \times \text{Adverts}) + (3{,}589 \times \text{Plays})

£1 million advertising, 15 plays:

\text{Sales} = 41{,}124 + (0.087 \times 1{,}000{,}000) + (3{,}589 \times 15)
             = 41{,}124 + 87{,}000 + 53{,}835
             = 181{,}959
Standardised Beta Values

• They tell us the number of standard deviations that the outcome will
change as a result of one standard deviation change in the predictor.
• The standardized beta values are all measured in standard deviation
units and so are directly comparable: therefore, they provide a better
insight into the ‘importance’ of a predictor in the model

Reporting the Model
Logistic
Regression
SIMPLE REGRESSION VS LOGISTIC
• We can use regression to predict future outcomes based on past data, when
the outcome is a continuous variable
• Logistic regression, an extension of regression that allows us to predict
categorical outcomes based on predictor variables.
• We can predict which of two categories a person is likely to belong to given certain
other information.
When and Why
• To predict an outcome variable that is categorical
from one or more categorical or continuous
predictor variables.
• Used because having a categorical outcome
variable violates the assumption of linearity in
normal regression.
Assessing the Model: the log-likelihood
statistic

• It is an indicator of how much unexplained information there is after the model


has been fitted.
• Large values indicate poorly fitting statistical models.
• The larger the value of the log-likelihood, the more unexplained observations
there are.
Assessing the model: R and RL2
• This R-statistic
• is the partial correlation between the outcome variable
and each of the predictor variables.
• It can vary between −1 and 1.
• A positive value indicates, that as the predictor variable
increases, so does the likelihood of the event occurring.
• A negative value implies that as the predictor variable
increases, the likelihood of the outcome occurring decreases.
• If a variable has a small value of R, then it contributes only a
small amount to the model.
Assessing Predictors: The Wald Statistic
\text{Wald} = \frac{b}{SE_b}

• As in linear regression, we want to know not only how well the model overall fits the
data, but also the individual contribution of predictors.
• In linear regression, we used the estimated regression coefficients (b) and their standard
errors to compute a t-statistic.
• In logistic regression there is an analogous statistic known as the Wald statistic, which
has a special distribution known as the chi-square distribution.
• Wald statistic tells us whether the b coefficient for that predictor is significantly different
from zero. If the coefficient is significantly different from zero then we can assume that
the predictor is making a significant contribution to the prediction of the outcome (Y):
EXAMPLE
We need to decide whether to use the first category or the last category as our baseline.

• Choose Forward: LR because this study is the first in the field, so we have no past research to tell us which variables to expect to be reliable predictors, and it follows a stepwise procedure.
• By default, SPSS uses Indicator coding, which is the standard 0/1 dummy coding.
• 236.992 represents the fit of the most basic
model to the data. When including
only the constant, the computer bases the model
on assigning every participant to a single
category of the outcome variable.

• 61 chose large whereas 127 chose small.


Therefore, if SPSS predicts that every
consumer chose large then this prediction will
be correct
61 times out of 188 (i.e. 32% approx.).
• However, if SPSS predicted that every consumer did not choose large (i.e., chose small), then this prediction would be correct 127 times out of 188 (68% approx.).
• Overall, the model correctly classifies 67.6% of consumers.
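A syntax sketch of the set-up described above (forward LR entry, indicator-coded condition), with hypothetical variable names choice (0 = small, 1 = large) and price_cond (1 = linear, 2 = supersized):

  LOGISTIC REGRESSION VARIABLES choice
    /METHOD=FSTEP(LR) price_cond
    /CONTRAST (price_cond)=Indicator
    /PRINT=CI(95).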
Classification table-SP
Effect size measures for the model.

• The model predicts that all of the consumers who were in the supersized pricing condition chose large.
• There were 48 consumers in the supersized pricing condition, so the model predicts that these 48 chose large; it is correct for 27 of these consumers, but misclassifies 21 consumers as having chosen large who did not.
• This model also predicts that all of the 140 consumers who were not in the supersized pricing condition did not choose large; for these consumers the model is correct 106 times but misclassifies the 34 consumers who did choose large.
• In linear regression b represents the change in the outcome resulting from a unit change in the predictor
variable.
• The interpretation of this coefficient in logistic regression is very similar in that it represents the change in
the logit of the outcome variable associated with a one-unit change in the predictor variable.
• Wald statistic tells us whether the b coefficient for that predictor is significantly different from zero.
• If the coefficient is significantly different from zero then we can assume that the predictor is making a
significant contribution to the prediction of the outcome (Y)
• Having the supersized pricing (or not) is a significant predictor of whether the consumer chose large (note that the significance of the Wald statistic is less than .05).
• The odds of a consumer in the supersized pricing condition choosing large are 1.893 times the odds of a consumer in the linear price condition.
• We can be fairly confident that the population value of the odds ratio lies between 1.017 and 3.524.
Price condition was significant (b = .638, Wald = 4.053, p < .05) with an OR = 1.893, which suggests that participants in the supersized pricing condition were almost twice as likely to choose the larger bottle.
Logistic Regression Predicting Likelihood of Reporting Size Choice under Supersized vs. Linear Pricing Condition

Price condition was significant (b = 1.178, Wald = 6.694, p < .05) with an OR = 3.248, which suggests that participants in the supersized pricing condition were more than three times as likely to choose the larger bottle, supporting H1. More than 56% of participants chose the large bottle in the supersized pricing condition compared to 26.7% in the linear price condition, demonstrating the influence of supersized pricing on consumers' size choice decisions.

Nutritional label condition was not significant (b = -.110, Wald = .052, p > .82) but, more importantly, the two-way interaction between price and nutritional label condition was significant (b = -1.318, Wald = 3.851, p = .05). This means that the nutritional label alone does not predict whether a person will choose the large size or the small size, but its interaction with price does. Interestingly, in the control condition (no nutritional label manipulation), the effect of the price condition was significant (OR = 3.536, p < .01). In contrast, in the nutritional label condition, there was no effect of price condition (OR = .856, ns) on bottle size selection. Therefore, we can conclude that the presence of a nutritional label is responsible for the enhanced health salience, which caused the influence of supersized pricing on size choice to decrease: the choice of the larger bottle size significantly reduced from more than 56% to 21.7% when participants in the supersized pricing condition were exposed to a nutritional label.

Moreover, liking for the product (b = .296, ns), brand preference (b = .339, ns), attitude towards the product (b = -.565, ns), and level of thirst (b = -.014, ns) were not significant. This suggests that supersized pricing is such a substantial influence on consumers' package-size decisions that it overrides many other factors often considered important in food choice decisions, such as whether the consumer likes the product, any special preference for the brand, level of thirst, or even the attitude the individual has towards the brand.
Split file check for individual results (NL)

In the control condition (no Nutritional label manipulation), the effect of the price condition
was significant (OR = 3.536, p < .01). In contrast, in the Nutritional label condition, there was
no effect of price condition (OR = .856, ns) on bottle size selection
The choice of larger bottle size
significantly reduced from 56.3% to
21.7% when participants who were in
supersized pricing condition were
exposed to nutritional label.
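A sketch of this split-file check, using the hypothetical variables above plus a nutritional-label condition nl_cond:

  SORT CASES BY nl_cond.
  SPLIT FILE SEPARATE BY nl_cond.
  LOGISTIC REGRESSION VARIABLES choice
    /METHOD=ENTER price_cond
    /CONTRAST (price_cond)=Indicator
    /PRINT=CI(95).
  SPLIT FILE OFF.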
Chi square test

Bold values signify conditions with significant differences (p < .05) in size
choice based on pricing method.
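A syntax sketch of this chi-square test, again with the hypothetical variables price_cond and choice:

  CROSSTABS
    /TABLES=price_cond BY choice
    /STATISTICS=CHISQ
    /CELLS=COUNT COLUMN.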
STATISTICAL TECHNIQUES TO COMPARE GROUPS

COMPARING MEANS: Independent samples t-test
• Tests the difference between two means
• Independent means: each person has been measured only once
• The DV has been measured on a continuous scale
• Levene's test assesses the homogeneity of variance assumption
• The t-test assumes that the variance (or SD) in both groups/samples is the same
• They don't have to be exactly the same, but they must not be significantly different from each other
• The t-test is relatively robust to violation of this assumption, but you should still test and report it in the results
• If Levene's F is significant, the variances of the two groups are not equal
• If the F value is not significant, the homogeneity of variance assumption holds

An independent samples t-test was conducted to compare the pricing manipulation check scores for the linear vs. supersized pricing conditions. There was a significant difference between scores (linear: M = 2.95, SD = 1.009; supersized: M = 4.02, SD = 1.261), t(186) = -6.449, p < .01.
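A syntax sketch, assuming hypothetical variables manip_check and price_cond (1 = linear, 2 = supersized):

  T-TEST GROUPS=price_cond(1 2)
    /VARIABLES=manip_check.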
Paired samples t-test
FS decreased after the intervention from M = 40.17 (SD = 5.16) at time 1 to M = 37.5 (SD = 5.15) at time 2.

There is a significant difference between the two scores (FS at t1 and t2).
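A syntax sketch, assuming hypothetical variables fs_t1 and fs_t2 for the two measurement occasions:

  T-TEST PAIRS=fs_t1 WITH fs_t2 (PAIRED).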
ONE-WAY Analysis of Variance (ANOVA)

ONE-WAY BETWEEN-GROUPS ANOVA WITH POST-HOC TESTS
• The assumption of homogeneity of variance is not violated.
• A significant F indicates a difference somewhere among the mean scores on the DV for the 3 groups.
• But which group is different from which group?
• The statistical significance of the difference between each pair of groups is provided in the post-hoc (multiple comparisons) table:
• An * means that the two groups being compared are significantly different from one another.
• Group 1 and group 3 are statistically significantly different from each other.
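A syntax sketch, assuming a hypothetical continuous DV score and a three-level grouping variable group:

  ONEWAY score BY group
    /STATISTICS DESCRIPTIVES HOMOGENEITY
    /POSTHOC=TUKEY ALPHA(0.05).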
One-way Repeated Measures ANOVA
• Each subject is exposed to two or more different conditions,
• or measured on the same continuous scale on three or more occasions.
• Can also be used to compare respondents' responses to two or more different questions.
• The questions should be measured using the same scale (e.g., 1 = strongly disagree to 5 = strongly agree).

Multivariate tests
• All these tests yield the same conclusion, but the most commonly reported is Wilks' Lambda, which here shows a statistically significant effect of time.
• There was a change in confidence scores across the 3 time periods.

Options - Pairwise comparisons
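A syntax sketch, assuming hypothetical confidence scores conf1 to conf3 measured on three occasions:

  GLM conf1 conf2 conf3
    /WSFACTOR=time 3 Polynomial
    /PRINT=DESCRIPTIVE
    /EMMEANS=TABLES(time) COMPARE ADJ(BONFERRONI)
    /WSDESIGN=time.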
