JOURNAL OF THE ACADEMY OF MARKETING SCIENCE
SUMMER 1999
Winer / EXPERIMENTATION IN THE 21ST CENTURY
Experimentation in the
21st Century: The Importance
of External Validity
Russell S. Winer
University of California at Berkeley
Much of the consumer behavior literature is devoted to
what has been referred to as theory application (TA) research, in which the main focus is on laboratory experiments with student subjects and high internal validity. In
this article, the author argues that external validity concerns should be given more attention, particularly in TA
research. Three recommendations are made for implementing these concerns: (1) consumer behavior articles
should be required to have a section indicating how increased levels of external validity can be obtained with
other studies, (2) “joint ventures” between consumer behavior and marketing science researchers can be profitable and should be encouraged, and (3) analyses of
electronic scanner panel data or other secondary data can
be used to generate higher levels of external validity. Three
examples are given from the marketing literature of how
findings from experiments and scanner data can be combined to advance a stream of research.
Journal of the Academy of Marketing Science.
Volume 27, No. 3, pages 349-358.
Copyright © 1999 by Academy of Marketing Science.
One of the most nagging issues facing researchers
interested in consumer behavior is that of external validity.
External validity, of course, deals with the issue of generalizability of the results found to other populations, settings, and so forth (Campbell and Stanley 1963). One
manifestation of this concern about external validity is
that, periodically, an article appears in a leading marketing
journal excoriating consumer behavior researchers for
using student subjects. Since students are not “real” people, how can the lab results be generalized to the greater
population? Such articles are invariably followed by
responses that emphasize the fact that some researchers
are interested in “theory,” not generalization, and for their
purposes, student subjects are just fine. For these researchers, what is critical is internal not external validity. As long
as we have confidence that the results are truly due to what
has been manipulated, what difference does it make if the
subjects are “real” men, women, or, as one marketing academic has put it, “little green men from Mars”?
Internal validity is, of course, a necessary condition for
any experimental study. External validity is not of much
concern in experimental work if the researcher cannot adequately show that the results found from an experiment are
truly due to the manipulation(s). However, as Wells (1993)
has put it, “We have inherited a myth that says that internal
validity is adequate and external validity, if any, is up to
someone else” (p. 492). This attitude toward external validity is manifested in articles published in the Journal of Consumer Research, the Journal of Consumer Psychology, the
Journal of Marketing Research, this journal, and other
major marketing journals publishing empirical work.
Rarely does an author of an experimental study either worry
about how to establish external validity for the results or
actually perform additional studies that go a long way
toward establishing some degree of external validity.
There are several theses in this article. First, I argue that
as we move into the twenty-first century, the current state
of business schools in which marketing academics reside
will require more and more that our research is not only of
high quality but relevant. By relevant, I do not mean that
research has to directly inform practicing managers. However, I believe that it is incumbent on us to be concerned
about the generalizability of research results beyond the
lab into other contexts. This gives practitioners who are
interested in applying our work (and there is much more
even today that can be applied than is) to their problems
confidence that empirical results apply to more than 18- to
22-year-olds at “large midwestern universities.”
Second, I propose that experimental articles focusing
on internal validity in controlled, laboratory environments
have a mandatory section at the end of each article indicating what kind of studies are necessary to establish external
validity. This will put pressure on consumer behavior
researchers to think about external validity as an integral
part of their work and to not leave it totally “up to someone
else.” As part of this proposal, I strongly encourage consumer behavior researchers and marketing scientists, the
main group of marketing academics interested in scanner
panel data, to work together more often rather than going their separate ways, as is currently the situation.
Third, I argue that there are readily available sources of
information that can be used to extend the generalizability
of many consumer behavior studies—scanner panel data.
These data have been around for about 40 years and are a
very rich source of information that has been rarely
exploited for this purpose. Scanner data represent observations of purchasing behavior of individuals in a real environment. While the environment itself is not the critical
factor (we are interested in external, not ecological, validity), scanner data studies that corroborate results found in the
lab provide strong evidence of external
validity.
THE EXTERNAL VALIDITY CONTROVERSY
Researchers have developed three perspectives on
external validity (Lynch 1982). One perspective is statistical generalizability, in which the main issue is whether the
results from a study using a particular sampling approach
can be generalized to the larger population of interest.
Robustness is whether a relationship found in an experiment could be replicated with different subjects, research
settings, and time intervals. Realism (or ecological validity) is whether the research study (tasks, stimuli, settings)
was realistic and, therefore, the results likely to be generalizable to a more natural environment. For the purposes of
this article, I am particularly interested in the latter two
perspectives, although I acknowledge the fact that any one
study may not address either or both of them. Rather than
worrying about every single study satisfying both internal
and external validity criteria, I am more interested in a
stream of research that ultimately incorporates both perspectives leading to the more general notion of external
validity.
A number of articles appearing in the consumer behavior literature have shaped the discussion about external
validity in experimental contexts. I review these briefly
below.
Ferber (1977)
In this editorial, Ferber argued that convenience samples, mainly students, should be avoided for two reasons.
First and more important for this article, students in the
sample may not actually be consumers for the products or
services being tested. For example, I have read articles that
use televisions and other expensive durable goods as stimuli with student subjects. My guess is that 18-year-old students are infrequently in the market for televisions. The
second point is that convenience samples are obviously not
probability samples randomly drawn. This criticism
applies to a wide variety of marketing studies, not just
laboratory studies with student subjects.
Calder, Phillips, and Tybout (1981)
This article really kicked off the controversy in the
1980s. Calder et al. (1981) drew a distinction between two
kinds of research: effects application (EA) research, in
which the researcher is interested in generalizing the
results to other settings and populations beyond the current
research setting, and theory application (TA), in which the
theory itself is expected to generalize and not the particular
effects or empirical results. An example of EA research is
the relationship between price and perceived quality. Marketing managers are interested in whether such a relationship exists and its boundary conditions (e.g., product categories, price ranges). To have confidence that such a
relationship exists, empirical results showing a positive relationship between price and perceived quality must
be generalizable to the real world, that is, beyond the
experimental research setting. An example of TA is Petty
and Cacioppo’s (1981) elaboration likelihood model
(ELM). In this case, it is not important that the results of a
particular study are replicable but only that other studies
replicate the theoretical underpinnings of the model, that
is, that the forces of persuasion are different for
subjects who elaborate on a stimulus than for those who do not.
The authors argue that this distinction has implications for
the selection of respondents, the operationalization of the independent and dependent variables, the research
setting, and the research design. In particular, EA studies
need respondents that represent their real-world counterparts, while TA can use any respondent population, preferably as homogeneous as possible for a strong
test of the theory. Variables used in EA studies need to correspond as closely as possible to the real world, while TA
variables must correspond to the needs of the theory. The
research setting used for EA research also needs to correspond as much as possible to the contexts in which generalizability is desired, whereas for TA research, the setting
can be artificial, as the goal is to create an environment
free of extraneous sources of variation that
could negatively affect internal validity. Finally, true
experimental designs are preferred for TA research, while
EA research can use any design that is again appropriate for the real-world context including “natural”
experiments.
This distinction between EA and TA research is important.1
Calder et al. (1981) would feel that it is important for
price-perceived quality research to be done in a way that generalizes to the real-world contexts of interest, that is, that
there are some product categories in, say, a supermarket,
where we would expect to see that kind of relationship.
However, they would feel that it is not important to show
that consumers actually use the ELM when making
purchase decisions, only that the ELM is not
rejected as a theoretical explanation for observed laboratory behavior. In general, EA studies show some concern
for external validity, while TA studies show none at all.
Lynch (1982)
Lynch disagreed with the Calder et al. (1981) position
on external validity.2 One of his major points was that if
any research findings lack external validity, the theory
lacks construct validity. In particular, he noted that a
researcher must distinguish effects of the independent,
manipulated variables from interactions of the independent variables with background factors that are supposed
to be irrelevant. These background factors include subject
and setting factors. Thus, if a theory has been tested using
only student subjects from one particular geographic area,
the researcher does not know if the results would be
affected by older subjects or even students from another
part of the country or world. Such interactions are threats
to external validity. In addition, authors should consider
the boundary conditions for their findings, which, besides
including omitted background factors, could involve
stretching the limits of the levels of manipulated variables.
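Lynch's distinction between a treatment effect and its interaction with a background factor can be illustrated with a small numerical sketch. The cell labels and response values below are made up for illustration and do not come from any study cited here; the function name `treatment_effects` is likewise hypothetical.

```python
from statistics import mean


def treatment_effects(cells):
    """Return the treatment-minus-control mean difference for each background level.

    `cells` maps (condition, background) to a list of responses, where
    condition is "control" or "treatment".
    """
    backgrounds = sorted({background for _, background in cells})
    return {
        background: mean(cells[("treatment", background)])
        - mean(cells[("control", background)])
        for background in backgrounds
    }


# Hypothetical responses: the manipulation works for students but barely for adults.
cells = {
    ("control", "students"): [3, 4, 3, 4],
    ("treatment", "students"): [6, 7, 6, 7],
    ("control", "adults"): [3, 4, 4, 3],
    ("treatment", "adults"): [4, 4, 5, 3],
}

effects = treatment_effects(cells)
# A nonzero difference between the two effects is a treatment-by-background
# interaction, which is the kind of threat to external validity Lynch describes.
interaction = effects["students"] - effects["adults"]
```

A study run only on the student cells would report the student effect and could not detect that it shrinks among adults.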
Lynch (1982) developed three different approaches for
designing experimental studies that maximize external
validity. The first approach is to allow the different background factors to vary in an experimental design sense and
then control for these interactions when analyzing the data
(for more detail, see Lynch 1982). The problem with such
an approach, of course, is that there might be a very large
number of such background factors resulting in an
unwieldy and very expensive experiment. A second
approach is to be more selective and develop an a priori
notion of which background factors are most likely to
interact with the treatment variables. The third approach is
to deliberately attempt to maximize heterogeneity in terms
of respondents and study settings that fall within the
domain of the theory through replication. Thus, the
researcher might run one experiment using tightly controlled conditions in a laboratory, student subjects, and
running shoes and another using adults, automobiles, and
a mall-intercept setting. Assuming the results replicate,
confidence in the findings is significantly higher.
Lynch (1982) also makes a comment about how research in an area should progress:
For a given experiment to contribute to progress,
someone—whether the original researcher or others
in the field—must attempt to replicate it conceptually at some later point in time. Given the low incidence of conceptual replication in our discipline, we
undoubtedly would benefit if published research always included some small attempt to test the generality of the findings reported. (p. 237)
Later Contributions
The controversy about the need for external validity in
consumer research continued in the pages of the Journal of
Consumer Research. Calder, Phillips, and Tybout (1982)
developed a rejoinder to Lynch’s (1982) article indicating
that it was impossible to run enough experiments or to sufficiently maximize heterogeneity in experimental designs
to ensure that TA findings had external validity. In addition, they felt that research progress did not require any
attempt at external validity in any one study. McGrath and
Brinberg (1983) attempted to find common ground
between the parties while putting forth their own conceptions of how research progress can be made. They develop
a three-stage research process in which the final stage is
where the researcher or colleagues look for robustness of
the findings by systematically varying one or more of the
domains of the study (much like Lynch’s suggestion to
maximize heterogeneity). They also make an important
point: it is impossible to increase the external validity of a
given study within that study. The external validity of a
study can only be assessed in terms of results of another
study or series of studies.
Finally, Wells (1993) criticized the progress that has
been made by consumer behavior researchers toward
achieving the goals set for the discipline in the early 1970s
by pioneers such as Jagdish Sheth, Ronald Frank, and others. He noted, for example, that Sheth called for research
to be done in “naturalistic and realistic” settings as far back
as 1972. Wells suggested five guidelines for breathing new
life into the field. One of these, “forsake mythodology,” is
particularly appropriate for this article, especially two
“myths.” The first is that “students represent consumers.”
This is, of course, old ground. The second is that “the laboratory represents the environment.” Lab studies do not represent the environment due to the control available to the
researcher that is unavailable in the real world, the fact that
experimental choices have no short- or long-term consequences for the subjects, the possible existence of demand
effects, and the fact that experiments, unlike the real world,
have a sudden beginning and a sudden end.
RECOMMENDATIONS
I feel that Lynch (1982) is correct in urging consumer behavior researchers to be more concerned about
external validity, however you wish to define it. In addition, I agree in principle with his recommendation that
even TA studies should seek some generalization,
although not necessarily in the same study. I therefore
make three recommendations for furthering consumer
behavior research in the twenty-first century.3
My first recommendation is that every consumer
behavior article, both EA and TA, have a section at the end
discussing external validity concerns and suspected
boundary conditions that, of course, limit external validity.
Obviously, EA articles have a greater degree of external
validity due to their research objectives and design. In this
case, the authors should discuss the limitations of the
external validity generated from the study. However, the
key point is that researchers more interested in TA research
should not be absolved from external validity concerns.
These authors should either (1) combine their TA-focused
experiments with some other experimental or quasiexperimental design in the same article or (2) develop a
detailed description of the kind of study or studies necessary to develop a greater confidence that it is not only the
theory that can be replicated and generalized but the
empirical results as well.
Why is this focus on external validity more important in
the future than today? Given the amount of attention this
topic has received, some would argue, of course, that it
always has been, and always should be, important. Researchers primarily
interested in TA research without concerns for external
validity are, of course, some of the best academics in marketing and highly valued colleagues to all of us. However,
there is a big distinction between being a social or cognitive psychologist in a psychology department versus being
a member of a marketing department or group in a business school.
As we all know, business schools have been under
increasing pressure to be “relevant” to the business world.
This need to be relevant has resulted in a number of
changes in business school faculty hiring practices, for
example. It is no longer enough to be only a promising
scholar to get a job at a top school. The most sought-after
candidates in the late 1990s also must have the potential to
be excellent teachers so that deans can mollify impatient
MBA students. This is sometimes referred to as the Business Week effect, a result of the magazine’s biennial
survey of alumni and recruiters, which produces rankings of the
top U.S. schools. In addition, business school communications offices are developing newsletters and other
materials for distribution to their constituents “translating” faculty research into terms they can understand and,
perhaps, even use.
While I deplore many of the Business Week effects (for
example, new business school deans stating their goal to
be a “top 10” school), I believe that the pressure on faculty
to ultimately develop more relevant research is well
placed. Marketing academics should be as well trained in
their basic discipline (usually psychology or economics)
as possible, and this training should be demonstrated in
their research. However, most marketing academics chose
marketing doctoral programs and, concomitantly,
faculty positions in business schools rather than social science departments. This choice not only implies that we
have to teach students who are more interested in the real
world than the laboratory world but also that we have to
think and should be interested in thinking about our
research in the same way.
Note that this does not mean that I think marketing academics should become consultants. In addition, I am not
calling for a ban on TA research. Our outstanding consumer behavior scholars primarily interested in TA
research should be encouraged to continue to do it. My
point is that given that we have chosen to be business
school faculty and the increased pressure on business
schools to produce students and research that inform
practitioners, our research should at least point the way
toward more generalization of empirical findings than is
currently the case.
Even with some thought given to the problem, it is, of
course, cheap and easy to pursue my second alternative,
that is, simply to recommend to others how external validity might be obtained. This is, as Wells (1993) put it, really
only making it someone else’s problem. However, I also
recognize the limits that training puts on a researcher’s
skills, that is, one’s “comparative advantage.” Scholars
trained to do TA research are not necessarily familiar with
other data or research methods that could be applied to the
same research problem and provide a considerable amount
of external validity to the work. In addition, scholars from
every discipline normally develop a research “routine” in
which they tend not to stray too far from what has made
them successful. Another way to say this is that there is a
considerable amount of inertia in research programs pursued by academics.
How can we break out of these routines? My second
recommendation is that more “joint ventures” should be
sought between consumer behavior researchers and people with other disciplinary approaches in marketing.
Excellent candidates for the latter are marketing scientists.
There are a large number of marketing scientists who are
interested in consumer behavior but who attack problems
from the perspective of another tradition. I consider myself
to be of this group. Rather than running tightly controlled
experiments, marketing scientists are more likely to use
scanner panel or other secondary data to test consumer
behavior hypotheses. Often, the tests involve specifying
alternative models of consumer decision making, estimating the models, and then choosing the model with the best
fit or out-of-sample predictions as being most consistent
with the specified behavior.4 Alternatively, estimated values and statistical significance of the parameters of the
models are interpreted as providing evidence of the underlying consumer behavior.
Secondary data sources such as scanner panel data are
particularly appropriate for assessing external validity for
a wide variety of consumer behaviors. Scanner data and its
predecessor, diary panel data, have been analyzed by marketing academics for 40 years, beginning with Kuehn’s
studies in the late 1950s of the brand choice process
(Kuehn 1962). Scanner data present the researcher with
actual consumers making purchases in their real environment, the supermarket. The data are collected in an unobtrusive way simply by scanning in a bar-coded panel membership card that identifies the household and subsequently
the purchases made on that shopping trip through the bar
codes on the products. Researchers obtain measures of
brand choice, quantity, price paid, promotions used, time
of day, day of week, store choice, and several in-store
“causal” variables such as whether a particular brand was
being featured in the store. Some scanner panel data sets
also provide measures of television advertising exposure
(“single source” scanner data) and other measures such as
whether a brand was featured in a newspaper during a particular week.
Scanner data are not perfect. We do not know which
person in the household is making the purchases (except,
of course, for single-person households). This is important: for food items, multiple brand or flavor purchases can
represent different household preferences that are
unknown. We also do not have any consumption data. In
addition, while the samples are much more representative
than they were using the old diary technology and the data
are collected easily with no effort on the part of the panel
members, there are always questions about the kind of
people who agree to be on these panels as well as the “mortality” issue of panel dropouts. Finally, and important for
consumer behavior researchers, there are no process measures (e.g., attitudes) taken, as only purchasing behavior is
measured.
Despite these problems, scanner panel data represent
real people making real decisions in a real environment.
These three characteristics uniquely distinguish panel data
work from laboratory experiments. It is obvious what the
trade-offs are: internal validity (experiments) for external
validity (scanner data). However, what scanner panel data
offer is more than a realistic setting. If laboratory results
hold in an analysis of one or more scanner panel data sets,
this gives confidence that the lab results are not likely to
change by varying what Lynch (1982) referred to as background factors. That is, what the realistic setting provides
is not just the supermarket but the fact that most of the
background factors that a researcher cannot hold constant
are at work in the real world. The background factors that
cannot be controlled in the lab naturally vary in a supermarket. Different people in the household shop (students,
parents, retired people, etc.), products are sold at different
shelf heights, babies may or may not be screaming, brands
may or may not be on sale, and so forth. If results from the
lab can hold in this kind of “dirty” environment, what we
have is a strong form of external validity.
Thus, my third recommendation is an “ideal” form of the research process: a lab
experiment in conjunction with a “natural” experiment
such as a scanner panel data study. Note
that the process does not have to work in the direction
experiment → scanner data. It is also possible for results
found in scanner data studies to be given internal validity
using lab experiments. I do not expect the same person to
do both kinds of work; however, partnerships between
scholars trained from different perspectives can bring, and have
brought, complementary insights and skills to bear on a
research problem, yielding results that generalize more widely than either
approach would have produced alone.
ILLUSTRATIONS
I offer three examples of such partnerships that show
how laboratory experiments and scanner panel studies can
be used in a complementary fashion for the purposes of generating
external validity.
Simonson (1990)/Simonson and Winer (1992)
The research question posed by Simonson (1990) was,
How do purchase quantity and timing affect variety? In
particular, one of the main hypotheses was the following:
Consumers who simultaneously choose multiple
items in a category for sequential consumption are
more likely to choose different items than consumers who sequentially make the same number of
choices.
There were several motivations behind this hypothesis.
The desire for a “varied experience” has been widely
found to be an important driver of purchase and consumption. This is likely to be true when consumers are uncertain
about future preferences, particularly when multiple purchases are made simultaneously. Choosing variety also reduces risk and is an efficient strategy when someone is
having difficulty making a decision.
Several studies were done to test this hypothesis. The
studies all used undergraduates in either business or psychology courses. The settings were classrooms or similar
environments. There was some variation in the realism of
the setting, as one of the studies involved real choices (the
subjects did not have to spend their own money) over a
period of 3 weeks versus simulated choices based on the
subjects’ imaginations. In addition, there was some variety
over product categories across the studies. The experiments all confirmed the above hypothesis in that, for
example, subjects who simultaneously chose three snacks
for three future consumption occasions were more likely
to select a variety of items than were those who chose three
snacks sequentially, one on each consumption occasion.
As a result, we have an interesting finding with high
internal validity but little external validity. Student subjects in laboratory environments did indeed exhibit behavior supporting increased preference for variety when making simultaneous versus sequential choices. Although
some managerial implications of the research were discussed (as usual), no attempt was made to generalize the
findings to other settings or populations.
To extend these laboratory results, Simonson and
Winer (1992) analyzed a scanner panel data set from the
yogurt product category. Clearly, the kind of control available in the lab is not available with such secondary data.
With scanner data, the researcher has to do the best job
possible replicating the lab environment in an ex post fashion. A key issue was how to define variety within a product
category rather than between categories. If the key
dependent variable cannot be replicated, then the scanner
panel study may be producing interesting results, but it
is not producing results useful for the purposes of external validity.
In this study, variety was defined based on the frequency with which a particular flavor was purchased over
the entire purchasing history available. Each household
purchase of a unit of yogurt was assigned a variety index
based on that flavor’s share of all purchases. As a result,
smaller variety indexes were associated with flavors
purchased less often. The “amended” hypothesis was
the following:
As the number of items purchased in a category on a
given occasion increases, consumers are more likely
to choose flavors that they do not usually buy when
making fewer purchases.
In other words, the study used a within-household design
in which the hypothesis would be supported if more purchases on an occasion led to flavors with lower variety indexes being purchased.
Using households with more than 10 purchases of the
category, regressions were run for each household with
each purchase of an item counted as one observation. The
model was as follows:
Variety index = f(number of items purchased on that occasion, price, promotion).
Price and promotion were included to control for their effects. The hypothesis would be supported if the coefficients across households on the number of items
purchased were significantly negative. Of the 1,694 coefficients (one for each household across purchase occasions),
63 percent were significantly negative and 37 percent were
either insignificant or positive. To control for intrahousehold taste differences, the results were replicated on
single-person households. Thus, the scanner panel data results supported the lab studies: greater purchase quantity
on an occasion leads to greater variety chosen.
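The per-household analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the estimation code used by Simonson and Winer (1992): the functional form f is assumed to be linear and fitted by ordinary least squares, the record layout (`flavors`, `price`, `promotion`) and the function names are hypothetical, and the purchase history is made up.

```python
from collections import Counter


def flavor_shares(occasions):
    """Variety index per flavor: its share of all units the household bought."""
    counts = Counter(f for occ in occasions for f in occ["flavors"])
    total = sum(counts.values())
    return {flavor: n / total for flavor, n in counts.items()}


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def household_regression(occasions):
    """OLS of the variety index on [intercept, items bought, price, promotion].

    Each unit purchased counts as one observation, as described in the text.
    A linear f is an assumption made here for illustration.
    """
    shares = flavor_shares(occasions)
    rows, y = [], []
    for occ in occasions:
        for flavor in occ["flavors"]:
            rows.append([1.0, len(occ["flavors"]), occ["price"], occ["promotion"]])
            y.append(shares[flavor])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up history for one hypothetical household: larger baskets add rarer flavors.
history = [
    {"flavors": ["vanilla"], "price": 1.00, "promotion": 0},
    {"flavors": ["vanilla"], "price": 0.90, "promotion": 1},
    {"flavors": ["vanilla", "peach", "lemon"], "price": 1.00, "promotion": 0},
    {"flavors": ["vanilla"], "price": 1.10, "promotion": 0},
    {"flavors": ["vanilla", "berry"], "price": 0.95, "promotion": 1},
    {"flavors": ["vanilla"], "price": 1.00, "promotion": 1},
]

intercept, b_items, b_price, b_promo = household_regression(history)
# The amended hypothesis is supported for this household when b_items is negative.
```

Repeating the regression for every qualifying household and tallying the sign and significance of `b_items` would give the kind of summary reported above; the significance test itself is omitted from this sketch.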
Kahn and Raju (1991)
In this article, the authors draw a distinction between
variety-seeking consumers, for whom the previous purchase
decreases the probability of the same brand being purchased on the next occasion, and reinforcement consumers,
for whom the previous purchase increases that same probability. The issues examined are how the frequency of
price discounts affects the choice behavior of these two
groups and how the effects of discounts are mediated by
whether the brand in question is a major (large share, base
probability of purchasing > .5) or minor (base probability < .5) brand.
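The distinction between the two consumer types can be made concrete with a simple classification rule. The sketch below is an assumption made for illustration and is not Kahn and Raju's (1991) mathematical formulation: it compares how often a brand is bought right after it was bought with how often it is bought right after some other brand was bought. The function name `classify_consumer` and the purchase sequences are hypothetical.

```python
def classify_consumer(purchases, brand):
    """Label a purchase sequence by how the last purchase shifts repeat buying.

    Compares P(buy brand | bought brand last time) with
    P(buy brand | bought another brand last time).
    """
    pairs = list(zip(purchases, purchases[1:]))
    after_same = [cur == brand for prev, cur in pairs if prev == brand]
    after_other = [cur == brand for prev, cur in pairs if prev != brand]
    if not after_same or not after_other:
        return "undetermined"  # not enough transitions of one kind to compare
    p_same = sum(after_same) / len(after_same)
    p_other = sum(after_other) / len(after_other)
    if p_same > p_other:
        return "reinforcement"
    if p_same < p_other:
        return "variety seeking"
    return "undetermined"


# Hypothetical two-brand histories.
alternator = list("ABABABAB")     # switches brand on every occasion
loyalist = list("AAAABBBBAAAA")   # tends to repeat the previous brand
```

With these sequences, `classify_consumer(alternator, "A")` labels the first shopper as variety seeking and `classify_consumer(loyalist, "A")` labels the second as reinforcement.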
Based on a mathematical formulation and assuming
that the market consists of two brands with only one brand
promoted at a point in time, the authors develop two
propositions:
1. For all promotional frequencies (including no
promotion), the probability of choosing a major
brand is higher for reinforcement consumers
than for variety-seeking consumers. However,
an increase in the frequency of discounts has a
greater impact on purchase probabilities for
variety-seeking consumers.
2. For minor brands, when promotions are infrequent (including no promotion), the long-run
choice probability for a minor brand is higher for
the variety-seeking segment than for the reinforcement segment. However, when promotions
are frequent, the long-run choice probability of
choosing the minor brand is the reverse, that is, it
is lower for the variety-seeking segment than for
the reinforcement segment. In addition, an increase in the frequency of discounts has a greater
impact on purchase probabilities for reinforcement consumers.
An implication of the second proposition is that, for minor
brands, there is an interaction effect between the frequency
of promotion and the type of brand. Taking the first and
second proposition together implies a three-way interaction between (a) purchasing behavior (reinforcement or
variety seeking), (b) brand (major or minor), and (c)
whether the brand of interest is promoted.
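The no-promotion part of the two propositions can be illustrated with a toy two-brand, first-order Markov chain. This is only a sketch under stated assumptions and is not Kahn and Raju's formulation: the function name, the additive shift `delta`, and the parameter values are hypothetical. A reinforcement consumer's probability of repeating the last brand is the base probability plus `delta`; a variety seeker's is the base probability minus `delta`.

```python
def long_run_share_major(base_p, delta, behavior):
    """Stationary share of the major brand in a two-brand first-order Markov chain.

    Hypothetical parameterization (not the model in Kahn and Raju 1991):
    base_p is the major brand's base choice probability (> .5), and delta is
    added to (reinforcement) or subtracted from (variety seeking) the
    probability of repeating whichever brand was bought last.
    """
    if behavior == "reinforcement":
        shift = delta
    elif behavior == "variety_seeking":
        shift = -delta
    else:
        raise ValueError("behavior must be 'reinforcement' or 'variety_seeking'")

    p_stay_major = base_p + shift          # P(major next | major last)
    p_stay_minor = (1.0 - base_p) + shift  # P(minor next | minor last)
    for p in (p_stay_major, p_stay_minor):
        if not 0.0 < p < 1.0:
            raise ValueError("shifted probabilities must lie strictly between 0 and 1")

    p_leave_major = 1.0 - p_stay_major     # P(minor next | major last)
    p_leave_minor = 1.0 - p_stay_minor     # P(major next | minor last)
    # Stationary distribution of a two-state chain.
    return p_leave_minor / (p_leave_minor + p_leave_major)


reinforcement_share = long_run_share_major(0.7, 0.1, "reinforcement")
variety_share = long_run_share_major(0.7, 0.1, "variety_seeking")
print(round(reinforcement_share, 3), round(variety_share, 3))
```

With these illustrative values, the major brand's long-run share is higher in the reinforcement segment, and the minor brand's share (one minus the value returned) is therefore higher in the variety-seeking segment, which is the direction the propositions state for the no-promotion case. The sketch does not model promotions.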
These implications of the model were tested in a laboratory experiment run using 25 undergraduate business students. Using a within-subject design, subjects were asked
to make a series of choices between two hypothetical
brands of furniture polish. Each brand was described by its
attributes, brand name, and price. Subjects had to make 6
sets of 20 brand choices representing two brand choice
conditions (variety seeking and reinforcement) by three
promotion conditions (no promotion, promotion only on
one brand designated as the major brand, promotion only
on the minor brand). To simulate the brand choice conditions, subjects were told either that repeated polishings of a
desk by the same brand were necessary for maximum
benefit or that switching was necessary to avoid wax
buildup.
The findings generally supported the propositions. In
the no-promotion condition, the market share of the major
brand was larger under the reinforcement condition than
under the variety-seeking condition; the results were the
opposite for the minor brand. For the minor brand, increasing the frequency of promotion (from no discount to a discount) had a relatively larger effect in the reinforcement
condition than in the variety-seeking condition; the opposite effect for the major brand was insignificant. Finally,
the hypothesized three-way interaction was significant.
These results were extended using scanner panel data
from the cracker product category. Two brands were
selected from the saltine subcategory, one with a significantly larger share than the other. In the flavored cracker
subcategory, five brands were analyzed excluding private
labels. In both subcategories, panelists were assigned to
either the variety-seeking or reinforcement behavior segments based on their purchase patterns in periods of low
promotion. Thus, as in the previous illustration, the lack of
control is manifested by the inability of the researchers to
randomly assign subjects to the different treatment groups;
that is, the experiment is “natural.” Again, while the effects
for the minor brands were somewhat weaker for the flavored
cracker subcategory, the propositions developed from the
authors’ theory were supported by the scanner panel data
analysis. These results clearly provide external validity for
those found in the laboratory experiment.
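The segment assignment step described above can be sketched as follows. The rule is an assumption made for illustration, since the text says only that panelists were assigned based on their purchase patterns in periods of low promotion. The function name and the comparison against a zero-order benchmark are hypothetical.

```python
from collections import Counter


def classify_panelist(purchases):
    """Label a brand-purchase sequence as 'reinforcement' or 'variety-seeking'.

    Hypothetical rule (not the authors' procedure): compare the observed
    repeat-purchase rate with the rate expected if each purchase were
    independent of the previous one, i.e., the sum of squared brand shares.
    """
    if len(purchases) < 2:
        raise ValueError("need at least two purchases to observe a transition")

    n = len(purchases)
    shares = [count / n for count in Counter(purchases).values()]
    expected_repeat = sum(share * share for share in shares)

    repeats = sum(1 for prev, cur in zip(purchases, purchases[1:]) if prev == cur)
    observed_repeat = repeats / (n - 1)

    # Ties are assigned to the variety-seeking label in this sketch.
    if observed_repeat > expected_repeat:
        return "reinforcement"
    return "variety-seeking"


print(classify_panelist(list("AAAABBBB")))
print(classify_panelist(list("ABABABAB")))
```

Because panelists sort themselves into segments through their own behavior, a rule of this kind makes the lack of random assignment noted above concrete.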
Leclerc and Little (1997)
In this study, the authors examine the role of advertising
copy in free-standing insert (FSI) promotions. In particular, the authors are interested in how the creativity of the
advertising copy accompanying the clip-out portion of the
promotion affects attitudes toward the promoted brands
and how these effects might vary over two segments
targeted by such promotions: customers loyal to competitors’ brands and brand switchers.
The theoretical arguments behind the authors’ propositions are based on work that has found that a person’s motivation to process arguments moderates persuasion. When
consumers examine an FSI, their motivation to process the
information will depend on their brand loyalty or commitment and the level of involvement in the product category.
Because of their preexisting attachment to a brand, loyal
customers have little reason to process the information in
an FSI. On the other hand, brand switchers have not made
a decision about which brand to purchase and are therefore
more likely to be motivated to examine the information in
an FSI, particularly in high-involvement product categories. Neither group, loyals or switchers, is likely to be
motivated to process FSI information in low-involvement
categories.
The authors first conducted an experiment with three
kinds of FSIs, each with a basic product display and a
headline: (a) the product display and headline only, (b) the
first ad plus brand information, and (c) the first ad plus an
attractive picture. The category was a high-involvement
beverage (cranberry juice). Based on the theory, brand-loyal consumers should not be affected by advertisement
(b), as they have low motivation to process the information. However, because advertisement (c) did not involve
any processing but did contain an attractive peripheral cue,
loyal customers should be influenced positively toward
the advertised brand. Brand switchers should find advertisement (b) more appealing, as they are more motivated to
process information. However, they should not find (c)
appealing, as they are motivated to process information
and do not find any in that advertisement associated with
the FSI. Finally, advertisement (a) should not have any
impact on brand attitudes.
More formally, the first experiment tested the following
two hypotheses:
Hypothesis 1A: Brand loyalty interacts with executional
cues: for customers with high loyalty to a competitive brand, an advertisement featuring an attractive
picture and no target brand information will generate a more positive attitude (and higher propensity to
clip) than an advertisement providing brand information. For customers with low loyalty (switchers),
an advertisement featuring brand information will
generate a more positive attitude (and a higher propensity to clip) than an advertisement featuring an
attractive picture and no brand information.
Hypothesis 1B: Compared to an advertisement featuring
product display only, advertisements featuring executional cues will generate a more positive attitude
toward the brand (and a higher propensity to clip).
A laboratory experiment was conducted to test these
two hypotheses. The subjects, who averaged 37 years of
age, were staff members of a university. The experimental
brand was real, and the subjects were asked to evaluate
FSIs similar to those actually found in Sunday newspapers. Subjects were screened for perceptions of the category as high involvement and for being nonusers of the
focal brand (since the focus of the study is on brand
switchers and consumers loyal to a competing brand). The
subjects were assigned randomly to one of three experimental conditions: (1) information-oriented advertisements, (2) a pleasant picture with no brand information,
and (3) a picture of the package only (no product information and no pleasant picture).
The experimental results supported the hypotheses.
When loyalty to the competing brand was high, the advertisement with the attractive picture generated more positive brand attitude than the advertisement with brand
information. In addition, as loyalty decreased (switching
propensity increased), the information-oriented ad
worked increasingly well. The result for Hypothesis 1B (executional cues are superior to product display only)
was in the predicted direction but not significant. With respect
to actual clipping behavior, all the results were in the
appropriate direction but statistically insignificant. Overall, there was reasonably strong support for the hypothesized interaction between copy execution cues and loyalty
in the one category studied.
The authors attempted to support the general conceptual framework (motivation to process information moderates persuasion) using cross-sectional data measuring FSI
behavior in a scanner panel. With these data, loyalty measures are obviously not available, but there were multiple
categories and opportunities to measure degree of involvement with those category purchases. Thus, the authors
attempted to measure the interaction effects of involvement and executional cues on FSI redemption behavior.
The latter was measured by what is termed coupon efficiency, the proportion of coupon redemptions representing
incremental sales.
Although the previous studies focused on high-involvement products, the framework also predicts that advertisements with brand information should not affect attitudes for switchers in low-involvement product categories
since switchers are not motivated to process information.
Thus, a hypothesis (numbered H3 in the article) is the following:
Hypothesis 3: Brand information will have no effect on
coupon efficiency for product categories generating
low levels of consumer involvement. As involvement increases, the effect of brand information on
efficiency will increase.
The authors studied 387 coupons evaluated by IRI’s
CouponScan service along with their measured efficiencies for six product categories: cookies, crackers, margarine, fruit juice, bar soap, and breakfast cereals. Each
coupon was coded by independent raters for two executional cues: the degree of brand information and the degree
of visual elements (7-point scales). Each product category
was also evaluated by independent raters for involvement.
The model estimated was the following:
Efficiency = f(information, visual elements,
involvement, brand share, coupon face value,
Information × Involvement, Visual Elements × Involvement).
The two variables, brand share and coupon face value,
were included as control variables, and the Visual Elements × Involvement interaction tested an additional hypothesis not covered in this article.
As predicted, the Information × Involvement interaction was significant and positive. Also consistent with
Hypothesis 3, the main effect of information was insignificant. Thus, the scanner panel analysis adds a strong degree
of external validity to the laboratory experiments and, concomitantly, to the behavioral framework developed by the
authors.
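The coupon efficiency model and the Hypothesis 3 result can be written out under a linear specification. The linear form is an assumption here, since the functional form f is not stated, and the coefficients and error term are illustrative symbols rather than the authors' notation.

```latex
\begin{align*}
\text{Efficiency} &= \beta_0 + \beta_1\,\text{Info} + \beta_2\,\text{Visual} + \beta_3\,\text{Inv}
  + \beta_4\,\text{Share} + \beta_5\,\text{FaceValue} \\
 &\quad + \beta_6\,(\text{Info}\times\text{Inv}) + \beta_7\,(\text{Visual}\times\text{Inv}) + \varepsilon, \\
\frac{\partial\,\text{Efficiency}}{\partial\,\text{Info}} &= \beta_1 + \beta_6\,\text{Inv}.
\end{align*}
```

In this form, and if involvement is coded so that its lowest level is zero, Hypothesis 3 corresponds to a coefficient on information near zero together with a positive coefficient on the Information × Involvement term. That matches the reported pattern of an insignificant information main effect and a significant positive interaction.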
CONCLUSION
These three illustrations show that there are considerable benefits from collaborations between scholars interested in studying consumer behavior from the psychological perspective and those interested in modeling behavior
using marketing science approaches. In all three studies,
research hypotheses were first developed using relevant
prior literature from psychology and/or marketing, controlled experiments were run, and then the experimental
results were generalized using scanner panel data. Interestingly, there was some variation in the experimental
work in terms of EA versus TA research. The Leclerc and
Little (1997) experimental study was clearly an example
of EA, as the design was intended to mimic as much as
possible the real-world environment in which consumers
clip FSI coupons. A careful reading of the Kahn and Raju
(1991) article also indicates that, despite using student
subjects and more artificial stimuli, the intent of the
experimental work is more in line with Calder, Phillips, and Tybout’s
(1981) notion of EA than of TA. Simonson’s (1990)
work, however, is more clearly in the TA camp, as the
focus is more on the theoretical explanation for the finding
rather than whether the particular effect found would replicate in the real world.
Thus, scanner data were found to be useful for external
validity purposes for both EA and TA experimental studies. While the former may have less need than the latter for
empirical evidence favoring generalization, both kinds
benefit from such work. In addition, it can be seen that it is
not critical for both experimental and scanner studies to
appear in the same article. What is interesting is that the
EA experimental studies appeared in the same articles as
the scanner studies. A conjecture is that the researchers
had real-world effects in mind from the beginning that
affected the experimental designs.
It is clear that since the early 1980s when the external
validity debate appeared in the Journal of Consumer Research, some movement in the direction of the recommendations of this article has already taken place. Many consumer behavior articles include the following:
• multiple studies in which previous results are replicated and different manipulations, subjects, or procedures are used;
• main-effects experiments followed by studies with
interaction effects looking for boundary conditions
and reversals;
• measuring subject motivation and involvement with
the experimental task and removing subjects who
fail these screens;
• increasing the realism of tasks; and
• using covariates to control for individual and situational interactions.
A good example of a study using some of these experimental improvements is Inman, Peter, and Raghubir
(1997). In this article, the authors examine the use of purchase restrictions as information used by consumers in
evaluating promotions. The authors used three different
research methods (grocery sales data, a simulated grocery
store experiment, and a survey), different samples (West
Coast, Midwest, and Hong Kong), and three different operationalizations of restrictions (purchase quantity limit,
purchase precondition, and time limit) to demonstrate that
imposing such purchase restrictions consistently increases
the choice probability of the restricted brand.
I should emphasize the fact that, although it was not the
topic of this article, much can be learned by reversing the
order of studies. In other words, following up scanner studies purporting to find some behavioral phenomenon with a
lab experiment brings much-needed internal validity to the
research stream. This is to be encouraged as well. For
example, in Simonson and Winer (1992), the scanner
study that followed the lab study was itself followed by an
experiment to test some interesting questions raised by the
scanner data results concerning the display of products in
terms of groupings by flavor or brand. Thus, my argument
is not completely one-sided: marketing scientists have an
obligation to consider the experimental implications of
their work as well.
Thus, I strongly encourage journal editors to consider
the policy recommended in this article: behavioral
researchers focusing on TA should be asked to give specific recommendations concerning increasing the external
validity of that stream of research. Modelers should also be
encouraged to think of how experiments could be designed
to support their empirical analyses. In addition, research
joint ventures should be strongly supported. One way for
this to happen is through doctoral seminars: discussions of
TA work could include what kind of secondary data would
be useful for the purposes of external validity. Likewise,
marketing model seminars can assign students to think
about experimental work that can support empirical studies using secondary data such as scanner data.
Marketing academics should and usually do think
about a research area as a stream of research. Recent questions have been raised about exactly what should constitute the “next” study in a research stream from a meta-analysis perspective (Farley, Lehmann, and Mann 1998).
In this article, I take the view that the next study, if necessary, should focus on external validity issues. These issues
will be central to our roles as marketing academics in business schools as we move into the twenty-first century.
ACKNOWLEDGMENTS
The author appreciates comments made on an earlier
draft by Priya Raghubir, Itamar Simonson, and Joydeep
Srivastava.
NOTES
1. In practice, it may be difficult to classify a given study as belonging
to one group or the other. An effects application study may share many of
the characteristics of a theory application study, the difference lying in
the intent to find the experimental results in the real world.
2. I should be clear that Lynch (1982) did not completely disagree
with all of the arguments set forth in the Calder, Phillips, and Tybout
(1981) article.
3. My major focus is on consumer behavior research using classical
scientific methods rather than other approaches such as interpretive
methods.
4. See, for example, Stiving and Winer (1997).
REFERENCES
Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout. 1981. “Designing Research for Application.” Journal of Consumer Research 8
(September): 197-207.
Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout. 1982. “The Concept of External Validity.”
Journal of Consumer Research 9 (December): 240-244.
Campbell, Donald T. and Julian C. Stanley. 1963. Experimental and
Quasi-Experimental Designs for Research. Boston: Houghton
Mifflin.
Farley, John U., Donald R. Lehmann, and Lane H. Mann. 1998. “Designing the Next Study for Maximum Impact.” Journal of Marketing Research 35 (November): 496-501.
Ferber, Robert. 1977. “Research by Convenience.” Journal of Consumer
Research 4:57-58.
Inman, J. Jeffrey, Anil C. Peter, and Priya Raghubir. 1997. “Framing the
Deal: The Role of Restrictions in Accentuating Deal Value.” Journal
of Consumer Research 24 (June): 68-79.
Kahn, Barbara E. and Jagmohan S. Raju. 1991. “Effects of Price Promotions on Variety-Seeking and Reinforcement Behavior.” Marketing
Science 10 (Fall): 316-337.
Kuehn, Alfred A. 1962. “Consumer Brand Choice—A Learning Process?” Journal of Advertising Research 2 (December): 10-17.
Leclerc, France and John D. C. Little. 1997. “Can Advertising Copy
Make FSI Coupons More Effective?” Journal of Marketing Research
34 (November): 473-484.
Lynch, John G., Jr. 1982. “On the External Validity of Experiments in
Consumer Research.” Journal of Consumer Research 9 (December):
225-239.
McGrath, Joseph E. and David Brinberg. 1983. “External Validity and
the Research Process: A Comment on the Calder/Lynch Dialogue.”
Journal of Consumer Research 10 (June): 115-124.
Petty, Richard E. and John T. Cacioppo. 1981. Attitudes and Persuasion:
Classic and Contemporary Approaches. Dubuque, IA: William C.
Brown.
Simonson, Itamar. 1990. “The Effect of Purchase Quantity and Timing
on Variety Seeking Behavior.” Journal of Marketing Research 27
(May): 150-162.
Simonson, Itamar and Russell S. Winer. 1992. “The Influence of Purchase Quantity
and Display Format on Consumer Preference for Variety.” Journal of
Consumer Research 19 (June): 133-138.
Stiving, Mark and Russell S. Winer. 1997. “An Empirical Analysis of
Price Endings with Scanner Data.” Journal of Consumer Research 24
(June): 57-67.
Wells, William D. 1993. “Discovery-oriented Consumer Research.”
Journal of Consumer Research 19 (March): 489-504.
ABOUT THE AUTHOR
Russell S. Winer is the J. Gary Shansby Professor of Marketing
Strategy, the associate dean for academic affairs, and the chair of
the marketing group at the Haas School of Business, University
of California at Berkeley. He received a B.A. in economics from
Union College (New York) and an M.S. and Ph.D. in industrial
administration from Carnegie-Mellon University. He has been
on the faculties of Columbia and Vanderbilt universities and has
been a visiting faculty member at M.I.T., the Helsinki School of
Economics, the University of Tokyo, and École Nationale des
Ponts et Chaussées. He has written three books, Marketing Management, Analysis for Marketing Planning, and Product Management, and has authored more than 50 papers in marketing on a
variety of topics, including consumer choice, marketing research
methodology, marketing planning, advertising, and pricing. He
is the editor of the Journal of Marketing Research and is on the
editorial boards of the Journal of Marketing, Journal of Consumer Research, and the Journal of Interactive Marketing. He is
the academic director of the Fisher Center for the Strategic Use of
Information Technology. He has participated in executive education programs around the world and is an academic trustee of the
Marketing Science Institute.