Measuring Information Systems Service Quality: Concerns on the Use of the SERVQUAL Questionnaire¹
By: Thomas P. Van Dyke
Department of Information Systems and Technologies
College of Business and Economics
Weber State University
3804 University Circle
Ogden, Utah 84408-3804
U.S.A.
tvandyke@weber.edu
Leon A. Kappelman
Business Computer Information Systems Department
College of Business Administration
University of North Texas
Denton, Texas 76203-3677
U.S.A.
kapp@unt.edu
Victor R. Prybutok
Business Computer Information Systems Department
College of Business Administration
University of North Texas
Denton, Texas 76203-3677
U.S.A.
¹Robert Zmud was the accepting senior editor for this paper.
Abstract
A recent MIS Quarterly article rightfully points out that service is an important part of the role of the information systems (IS) department and that most IS assessment measures have a product orientation (Pitt et al. 1995). The article went on to suggest the use of an IS-context-modified version of the SERVQUAL instrument to assess the quality of the services supplied by an information services provider (Parasuraman et al. 1985, 1988, 1991).² However, a number of problems with the SERVQUAL instrument have been discussed in the literature (e.g., Babakus and Boller 1992; Carman 1990; Cronin and Taylor 1992, 1994; Teas 1993). This article reviews that literature and discusses some of the implications for measuring service quality in the information systems context.
Findings indicate that SERVQUAL suffers from a number of conceptual and empirical difficulties. Conceptual difficulties include the operationalization of perceived service quality as a difference or gap score, the ambiguity of the expectations construct, and the unsuitability of using a single measure of service quality across different industries. Empirical problems, which may be linked to the use of difference scores, include reduced reliability, poor convergent validity, and poor predictive validity. This suggests that (1) some alternative to difference scores is preferable and should be utilized; (2) if difference scores are used, caution should be exercised in the interpretation of IS-SERVQUAL scores; and (3) further work is needed in the development of measures for assessing the quality of IS services.
Keywords: IS management, evaluation, measurement, service quality, user attitudes, user expectations
ISRL Categories: AI0104, AI0109, AI04, EI0206.03, GB02, GB07
The terms "information services provider" or "IS services
provider* are used to refer to any provider of information
systems services. This inciudes the information systems
function within an organization as weii as extemai vendors
of information systems products and/or services.
Introduction
Due to the growth of outsourcing, end-user-controlled information assets, joint ventures, and other alternative mechanisms by which organizations are meeting their need for information systems services, IS managers are increasingly concerned about improving the perceived (as well as actual) service quality of the IS function (Kettinger and Lee 1994). In recent years, the use of SERVQUAL-based instruments has become increasingly popular with information systems (IS) researchers. However, a review of the literature suggests that the use of such instruments may result in a number of measurement problems.
A recent article makes several important contributions to the assessment and evaluation of the effectiveness of information systems (IS) departments in organizations (Pitt et al. 1995). The article:

1. Points out that, although service is an important part of the role of the IS department, most IS assessment measures have a product orientation.

2. Proposes an extension of the categorization of IS success measures (DeLone and McLean 1992) to include service quality.

3. Proposes the use of the SERVQUAL instrument from marketing (Parasuraman et al. 1985, 1988, 1991) to operationalize the IS service quality construct and modifies the wording of the instrument to better accommodate its use in an IS context.

4. Adapts and augments a theory regarding the determinants of service quality expectations to an IS context and offers ideas for future research.
A number of studies, however, identify potential difficulties with the SERVQUAL instrument (e.g., Babakus and Boller 1992; Carman 1990; Cronin and Taylor 1992, 1994; Teas 1993, 1994). This research note reviews some of the literature regarding difficulties with the SERVQUAL instrument in general and examines the implications of these difficulties for the use of the instrument in an IS context.
The SERVQUAL Instrument: Problems Identified in the Literature
The difficulties with the SERVQUAL instrument identified in the literature can be grouped into two main categories, (1) conceptual and (2) empirical, although the boundary between them blurs because they are closely interrelated. The conceptual problems center around (1) the use of two separate instruments, one for each of two constructs (i.e., perceptions and expectations), to operationalize a third, conceptually distinct construct (i.e., perceived service quality) that is itself the result of a complex psychological process; (2) the ambiguity of the expectations construct; and (3) the suitability of using a single instrument to measure service quality across different industries (i.e., content validity). The empirical problems are, by and large, the result of these conceptual difficulties, most notably the use of difference scores, in conjunction with the atheoretical nature of the process used in the construction of the original five dimensions of service quality. The empirical difficulties most often attributed to the SERVQUAL instrument include low reliability, unstable dimensionality, and poor convergent validity. A review of these conceptual and empirical problems should serve to caution those who wish to use SERVQUAL to measure the service quality of an information systems provider.
Conceptual difficulties with SERVQUAL

Subtraction as a "Simulation" of a Psychological Process
Many of the difficulties associated with the SERVQUAL instrument stem from the operationalization of a service quality construct that is theoretically grounded in a discrepancy or gap model. In conceptualizing service quality, Parasuraman et al. (1985, 1988, 1991, 1994b) use the "service quality model," which posits that one's perception of service quality is the result of an evaluation process whereby "the customer compares . . . the perceived service against the expected service" (Gronroos 1984, p. 37).
Rather than develop an instrument to directly measure the perception of service quality that is the outcome of this cognitive evaluation process, the SERVQUAL instrument (Parasuraman et al. 1988, 1991) separately measures the expected level of service and the experienced level of service. Service quality scores are then calculated as the difference between these two measures. These three sets of scores are commonly referred to as expectation (E), perception (P), and SERVQUAL (whereby SERVQUAL = P - E).
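To make this scoring rule concrete, here is a minimal sketch (ours, not part of any published SERVQUAL protocol) that applies the P - E subtraction to hypothetical paired ratings on a seven-point scale:

```python
# Minimal sketch of SERVQUAL gap scoring. Each item is rated twice,
# once for expectations (E) and once for perceptions (P), on a 1-7 scale.
# The four paired ratings below are hypothetical.

expectations = [7, 6, 7, 5]   # E ratings for four paired items
perceptions = [5, 6, 4, 6]    # P ratings for the same four items

# The SERVQUAL score for each item is the simple difference P - E.
gap_scores = [p - e for p, e in zip(perceptions, expectations)]
print(gap_scores)                         # [-2, 0, -3, 1]

# An overall (unweighted) score is the mean gap across items.
print(sum(gap_scores) / len(gap_scores))  # -1.0
```

The subtraction itself is trivial; the critique that follows concerns whether this arithmetic is a defensible stand-in for the underlying psychological comparison.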
Although not without precedent, and certainly worthy of fair empirical evaluation, the implicit assumption that subtraction accurately portrays this cognitive process seems overly simplistic. Even if one fully accepts this discrepancy model of experiences vis-à-vis expectations as indicative of the general process whereby one arrives at an evaluation of a service experience, the notion that the specific mechanism is merely subtraction does not logically follow. The use of differences was, and remains, an operational decision. Regrettably, it does not appear to have been a particularly good one. The direct measurement of one's perception of service quality, the outcome of this cognitive evaluation process, seems more likely to yield a valid and reliable result. If the discrepancy is what one wants to measure, then one should measure it directly.
Ambiguity of the "Expectations" Construct

Teas (1994) notes that SERVQUAL expectations are variously defined as desires, wants, what a service provider should possess, normative expectations, ideal standards, desired service, and the level of service a customer hopes to receive (e.g., Parasuraman et al. 1985, 1988, 1991; Zeithaml et al. 1993). These multiple definitions and corresponding operationalizations of "expectations" in the SERVQUAL literature result in a concept that is loosely defined and open to multiple interpretations (Teas 1994). Yet even when concise definitions are provided, various interpretations of the expectations construct can result in potentially serious measurement validity problems.

Teas (1993) found three different interpretations of "expectations" derived from an analysis of follow-up questions to an administration of the SERVQUAL questionnaire. One interpretation of expectations is as a forecast or prediction. The forecast interpretation of expectations cannot be discriminated from the disconfirmed expectations model of consumer satisfaction (Oliver 1980). This interpretation is inconsistent with the definition of service quality put forth by Parasuraman et al. (1988) and results in a discriminant validity problem with respect to consumer satisfaction. A second interpretation of expectations is as a measure of attribute importance. When respondents use this interpretation, the resulting perception-minus-expectation scores exhibit an inverse relationship between attribute importance and perceived service quality, all other things being equal.

The third interpretation identified by Teas (1993) is the "classic ideal point" concept. Parasuraman et al. (1994b) describe this when they note that "the P-E [i.e., perceptions-minus-expectations] specification could be problematic when a service attribute is a classic ideal point attribute—that is one on which a customer's ideal point is at a finite level and therefore, performance beyond which will displease the customer (e.g., friendliness of a salesperson in a retail store)" (p. 116). This interpretation of expectations results in an inverse relationship between the SERVQUAL score, calculated as perception minus expectation (P - E), and actual service quality whenever perception scores are greater than expectation scores (i.e., P > E). This interpretation is consistent with the finding that user satisfaction scores were highest when actual user participation was congruent with the user's need for participation, rather than merely maximized (Doll and Torkzadeh 1989).

The conceptualization of "expectations" consistent with the SERVQUAL model is the vector attribute interpretation—"that is one on which a customer's ideal point is at an infinite level" (Parasuraman et al. 1994b, p. 116). Unfortunately, as the proportion of extreme responses (e.g., seven on a seven-point scale) increases, the expectation scores become less useful because an increasing proportion of the variation in difference-based SERVQUAL scores is due only to changes in perception scores.

These various interpretations of the "expectations" construct lead to a number of measurement problems. The findings suggest that a considerable portion of the variance in the SERVQUAL instrument is the result of measurement error induced by respondents' varying interpretations of the "expectations" construct (Teas 1993).
Three separate types of expectations have been described (Boulding et al. 1993): (1) the will expectation, what the customer believes will happen in the next service encounter; (2) the should expectation, what the customer believes should happen in the next service encounter; and (3) the ideal expectation, what a customer wants in an ideal sense. The ideal interpretation of expectation is often used in the SERVQUAL literature (Boulding et al. 1993). Boulding et al. (1993) differentiate between should and ideal expectations by stating that what customers think should happen may change as a result of what they have been told to expect by the service provider, as well as what the consumer views as reasonable and feasible based on what they have been told and their experience with the firm or a competitor's service. In contrast, an ideal expectation may "be unrelated to what is reasonable/feasible and/or what the service provider tells the customer to expect" (Boulding et al. 1993, p. 9).

A series of experiments demonstrated results that were incompatible with the gap model of service quality (Boulding et al. 1993). Instead, the results demonstrate that service quality is influenced only by perceptions. Moreover, the results indicate that perceptions are influenced by both will and should expectations, but in opposite directions. Increasing will expectations leads to a higher perception of service quality, whereas an increasing expectation of what should be delivered during a service encounter actually decreases the ultimate perception of the quality of the service provided (Boulding et al. 1993). Not only do these findings fail to support the gap model of service quality, but they also demonstrate the wildly varying impact of different interpretations of the expectations construct.
Pitt et al. (1995) and Kettinger and Lee (1994) used different methods to operationalize "expectations" in developing their IS versions of SERVQUAL. One study used the instructions to the survey to urge respondents to "think about the kind of IS unit that would deliver excellent quality of service" (Pitt et al. 1995). The items then take a form such as: "E1: They will have up-to-date hardware and software." The second study (Kettinger and Lee 1994) used the form: "E1: Excellent college computing services will have up-to-date equipment."
Recall that some respondents to SERVQUAL were found to interpret expectations as forecasts or predictions (Teas 1993). This interpretation corresponds closely with the will expectation (Boulding et al. 1993). It is easy to see how this interpretation might be formed, especially with the "They will" phrasing (Pitt et al. 1995). Unfortunately, the impact of the will expectation on perceptions of service quality is opposite from that intended by the SERVQUAL authors and the (P - E) or gap model of service quality (Boulding et al. 1993).
In summary, a review of the literature indicates that respondents to SERVQUAL may hold numerous interpretations of the expectations construct and that these various interpretations have different, and even opposite, impacts on perceptions of service quality. Moreover, some of the findings demonstrate that expectations influence only perceptions and that perceptions alone directly influence overall service quality (Boulding et al. 1993). These findings fail to support the (P - E) gap model of service quality and indicate that the use of the expectations construct as operationalized by SERVQUAL-based instruments is problematic.
Applicability of SERVQUAL Across Industries
Another often mentioned conceptual problem with SERVQUAL concerns the applicability of a single instrument for measuring service quality across different industries, and several researchers have articulated concerns about this issue. A study of SERVQUAL across four different industries found it necessary to add as many as 13 items to the instrument in order to adequately capture the service quality construct in various settings, while at the same time dropping as many as 14 items from the original instrument based on the results of factor analysis (Carman 1990). The conclusion was that considerable customization is required to accommodate differences in service settings. Another study attempted to utilize SERVQUAL in the banking industry (Brown et al. 1993). The authors were struck by the omission of items that they thought a priori would be critical to subjects' evaluations of service quality. They concluded that it takes more than simple adaptation of the SERVQUAL items to effectively address service quality across diverse settings. A study of service quality in the retail sector also concluded that utilizing a single measure of service quality across industries is not feasible (Dabholkar et al. 1996).
Researchers of service quality in the information systems context appear to lack consensus on this issue. Pitt et al. (1995) state that they could not discern any unique features of IS that make the standard SERVQUAL dimensions inappropriate, nor could they discern any dimensions of service quality meaningful in the IS domain that had been excluded from SERVQUAL. Kettinger and Lee (1994), however, found that SERVQUAL should be used as a supplement to the UIS (Baroudi and Orlikowski 1988) because that instrument also contains items that are important determinants of IS service quality. Their findings suggest that neither the UIS nor SERVQUAL alone can capture all of the factors that contribute to perceived service quality in the IS domain. For example, items contained in the UIS include the degree of training provided to users by the IS staff, the level of communication between the users and the IS staff, and the time required for new systems development and implementation, all of which possess strong face validity as determinants of IS service quality. In addition, Kettinger and Lee dropped the entire tangibles dimension from their IS version of SERVQUAL based on the results of confirmatory factor analysis. These findings contradict the belief that all dimensions of SERVQUAL are relevant and that there are no unique features of the IS domain excluded from the standard SERVQUAL instrument (Pitt et al. 1995). It is difficult to argue that items concerning the manner of dress of IS employees and the visual attractiveness of IS facilities (i.e., tangibles) should be retained as important factors in the IS domain while issues such as training, communication, and time to complete new systems development are excluded. We agree that using a single measure of service quality across industries is not feasible (Dabholkar et al. 1996) and therefore that future research should involve the development of industry-specific measures of service quality.
Empirical difficulties with the SERVQUAL instrument

A difference score is created by subtracting the measure of one construct from the measure of another in an attempt to create a measure of a third, distinct construct. For example, in scoring the SERVQUAL instrument, an expectation score is subtracted from a perception score to create such a "gap" measure of service quality. Even if one assumes that the discrepancy theory is correct and that these are the only (or at least the last) two inputs into this cognitive process, a question remains: can calculated difference scores operationalize the outcome of a cognitive discrepancy? It appears that several problems with the use of difference scores make them a poor measure of psychological constructs (e.g., Edwards 1995; Johns 1981; Lord 1958; Peter et al. 1993; Wall and Payne 1973). Among the difficulties related to the use of difference measures discussed in the literature are low reliability, unstable dimensionality, and poor predictive and convergent validity.
Reliability Problems with Difference Scores
Many studies demonstrate that Cronbach's (1951) alpha, a widely used method of estimating instrument reliability, is inappropriate for difference scores (e.g., Cronbach and Furby 1970; Edwards 1995; Johns 1981; Lord 1958; Peter et al. 1993; Prakash and Lounsbury 1983; Wall and Payne 1973). This is because the reliability of a difference score depends on the reliability of the component scores and the correlation between them. The correct formula for calculating the reliability of a difference score ($r_D$) is:

$$ r_D = \frac{\sigma_1^2 r_{11} + \sigma_2^2 r_{22} - 2 r_{12} \sigma_1 \sigma_2}{\sigma_1^2 + \sigma_2^2 - 2 r_{12} \sigma_1 \sigma_2} $$

where $r_{11}$ and $r_{22}$ are the reliabilities of the two component scores, $\sigma_1^2$ and $\sigma_2^2$ are the variances of the component scores, and $r_{12}$ is the correlation between the component scores (Johns 1981).
This formula shows that as the correlation between the component scores increases, the reliability of the difference score decreases. Johns (1981) provided an example in which a difference score is formed by subtracting one component from another, where the two components have an average reliability of .70 and a correlation of .40; the reliability of the resulting difference score is only .50. Thus, while the average reliability of the two components is .70, which is considered acceptable (Pitt et al. 1995; cf. Nunnally 1978), the correlation between the components reduces the reliability of the difference score to a level that most researchers would consider unacceptable (Peter et al. 1993).
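The sketch below (our illustration, assuming equal component variances) implements Johns' formula and reproduces this example:

```python
def difference_score_reliability(r11, r22, r12, var1=1.0, var2=1.0):
    """Reliability of a difference score D = X1 - X2 (Johns 1981).

    r11, r22   -- reliabilities of the two component scores
    r12        -- correlation between the component scores
    var1, var2 -- variances of the component scores
    """
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5
    numerator = var1 * r11 + var2 * r22 - 2 * r12 * sd1 * sd2
    denominator = var1 + var2 - 2 * r12 * sd1 * sd2
    return numerator / denominator

# Johns' example: component reliabilities of .70 and a correlation of .40
# (with equal variances) yield a difference-score reliability of only .50.
print(difference_score_reliability(0.70, 0.70, 0.40))   # 0.5
```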
An example of the overestimation of reliability caused by the misuse of Cronbach's alpha can be found in the analysis of service quality for a computer manufacturer (Parasuraman et al. 1994a; see Table 1). Note that Cronbach's alpha consistently overestimates the actual reliability of the difference scores for each dimension (compare the two difference-score columns). Note also that applying the correct formula for the reliability of a difference score shows that the actual reliabilities of the SERVQUAL dimensions may be as much as .10 lower than reported by researchers incorrectly using Cronbach's alpha. In addition, these findings show that the non-difference, direct response method results in consistently higher reliability scores than the (P - E) difference method of scoring.

These results have important implications for the IS-SERVQUAL instrument (Pitt et al. 1995).
Table 1. Reliability of SERVQUAL: The Misuse of Cronbach's Alpha

A Priori Dimension | Cronbach's α (Non-Difference) | Cronbach's α (Difference) | Johns' α for Differences (Difference)
Tangibles | .83 | .75 | .65
Reliability | .91 | .87 | .83
Responsiveness | .87 | .84 | .81
Assurance | .86 | .81 | .71
Empathy | .90 | .85 | .81

Note: Difference scores calculated as perception minus expectation (P - E).
Cronbach's alpha, which consistently overestimates the reliability of difference scores, was used incorrectly. Even when using the inflated alpha scores, Pitt et al. note that two of three reliability measures for the tangibles dimension fall below the 0.70 level required for commercial applications. Had they utilized the appropriate modified alpha, they might have concluded that the tangibles dimension is not reliable in the IS context, a finding that would have been consistent with the results of Kettinger and Lee (1994).
A review of the literature clearly indicates that, by utilizing Cronbach's alpha, researchers tend to overestimate the reliabilities of difference scores, especially when the component scores are highly correlated; such is the case with the SERVQUAL instrument (Peter et al. 1993).
Predictive and Convergent Validity Issues with Difference Scores
Another problem with the SERVQUAL instrument concerns the poor predictive and convergent validity of the measure. Convergent validity is concerned with the extent to which multiple measures of the same construct agree with each other (Campbell and Fiske 1959). Predictive validity refers to the extent to which scores on one construct are empirically related to scores on other conceptually related constructs (Bagozzi et al. 1992; Kappelman 1995; Parasuraman et al. 1991).
One study reported that perceptions-only
SERVOUAL scores had higher correlations
with an overall service quality measure (i.e.,
convergent measure) and with complaint resolution scores (i.e., the predictive measure)
than did the perception-minus-expectation difference scores used with SERVOUAL
(Babakus and Boiler 1992). A different study ,
performed regression analyses in which an
overall single-question service quality rating
was regressed separately on both difference
scores (i.e., perception minus expectation)
and perception-only scores (Parasuraman et
al. 1991). The perception-only SERVOUAL
scores produced higher adjusted r-squared
values (ranging from .72 to .81) compared to
the SERVOUAL difference scores (ranging
from .51 to .71).
The predictive validity of difference scores, a non-difference direct response score, and perceptions-only scores for SERVQUAL in the context of a financial institution have been compared (Brown et al. 1993). Correlation analysis was performed between the various scores and a three-item behavioral intentions scale. Behavioral intentions include such concepts as whether the customer would recommend the financial institution to a friend or would consider the financial institution first when seeking new services. The results of the study show that both the perceptions-only (.31) and direct response (.32) formats demonstrated higher correlations with the behavioral intentions scale than did the traditional difference score (.26).
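The comparisons above are essentially zero-order correlations between alternative scoring formats and a criterion. The sketch below reproduces that logic on synthetic data; the generated numbers are illustrative only and are not the Brown et al. (1993) data:

```python
# Correlate perception-only and P - E difference scores with a criterion
# (e.g., a behavioral intentions scale) on synthetic data.
import random

random.seed(1)
n = 200
perceptions = [random.gauss(5.0, 1.0) for _ in range(n)]
expectations = [random.gauss(6.0, 0.5) for _ in range(n)]
# Let the criterion be driven by perceptions plus noise, as the
# Boulding et al. (1993) results would suggest.
intentions = [p + random.gauss(0.0, 1.0) for p in perceptions]
differences = [p - e for p, e in zip(perceptions, expectations)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print("P-only vs. intentions:", round(pearson(perceptions, intentions), 2))
print("P - E  vs. intentions:", round(pearson(differences, intentions), 2))
```

Because the expectation component contributes variance unrelated to the criterion, the difference score's correlation comes out lower than the perception-only correlation, mirroring the published pattern.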
The superior predictive and convergent validity of perception-only scores has also been confirmed (Cronin and Taylor 1992). Those results indicated higher adjusted R-squared values for perception-only scores across four different industries. The perception component of the perception-minus-expectation score consistently performs better as a predictor of overall service quality than the difference score itself (Babakus and Boller 1992; Boulding et al. 1993; Cronin and Taylor 1992; Parasuraman et al. 1991).
Unstable Dimensionality of the SERVQUAL Instrument
The unstable nature of the factor structure of the SERVQUAL instrument may be related to the atheoretical process by which the original dimensions were defined. The SERVQUAL questionnaire is based on a multi-dimensional model (i.e., theory) of service quality. A 10-dimensional model of service quality was developed based on a review of the service quality literature and the extensive use of both executive and focus group interviews (Parasuraman et al. 1985).
Table 2. Unstable Dimensionality of SERVQUAL

Study | Instrument | Analysis | Factor Structure
Carman (1990) | Four modified SERVQUALs using 12-21 of the original items | Principal axis factor analysis with oblique rotation | Five to nine factors
Brensinger and Lambert (1990) | Original 22 items | Principal axis factor analysis with oblique rotation | Four factors with eigenvalues > 1
Parasuraman, Zeithaml, and Berry (1991) | Original 22 items | Principal axis factor analysis with oblique rotation | Five factors, but different from the a priori model; tangibles split into two factors, while responsiveness and assurance loaded on a single factor
Finn and Lamb (1991) | Original 22 items | LISREL confirmatory factor analysis | Five-factor model had poor fit
Babakus and Boller (1992) | Original 22 items | (1) Principal axis factor analysis with oblique rotation; (2) confirmatory factor analysis | (1) Five-factor model not supported; (2) two factors
Cronin and Taylor (1992) | Original 22 items | Principal axis factor analysis with oblique rotation | Unidimensional structure
*Van Dyke and Popelka (1993) | 19 of original 22 items | Principal axis factor analysis with oblique rotation | Unidimensional structure
*Pitt, Watson, and Kavan (1995) | Original 22 items | Principal components and maximum likelihood with varimax rotation | (1) Financial institution: seven-factor model with tangibles and empathy split into two; (2) consulting firm: five factors, none matching the original; (3) information systems service firm: three-factor model
*Kettinger and Lee (1994) | Original 22 items | LISREL confirmatory factor analysis | Four-factor model; tangibles dimension dropped
*Kettinger, Lee, and Lee (1995) | Original 22 items | Principal axis factor analysis with oblique rotation | (1) Korea: three-factor model, tangibles retained; (2) Hong Kong: four-factor model, tangibles retained

*Measured information systems service quality.
During instrument development, Parasuraman et al. (1988) began with 97 paired questions (i.e., one for expectation and one for perception). Items (i.e., question pairs) were first dropped on the basis of within-dimension Cronbach coefficient alphas, reducing the pool to 54 question pairs. More items were then dropped or reassigned based on oblique-rotation factor loadings and within-dimension Cronbach coefficient alphas, resulting in a 34 paired-item instrument with a proposed seven-dimensional structure. A second data collection and analysis with this "revised" definition and operationalization of service quality resulted in the 22 paired-item SERVQUAL instrument with a proposed five-dimensional structure. Two of these five dimensions contained items representing seven of the original 10 dimensions. We are cautioned, however, that those who wish to interpret factors as real dimensions shoulder a substantial burden of proof (Cronbach and Meehl 1955). Moreover, such proof must rely on more than just empirical evidence (e.g., Bynner 1988; Galletta and Lederer 1989).
The results of several studies demonstrate that the five dimensions claimed for the SERVQUAL instrument are unstable (see Table 2). SERVQUAL studies in the information systems domain have also demonstrated the unstable dimensionality of the instrument. The service quality of IS services was measured in three different industries: a financial institution, a consulting firm, and an information systems service business (Pitt et al. 1995). Factor analysis was conducted using principal components and maximum likelihood methods with varimax rotation for a range of models, and the analysis indicated differing factor structures for each type of firm. Analysis of the results for the financial institution indicated a seven-factor model with both the tangibles and empathy dimensions split into two. These breakdowns should not be surprising. Pitt et al. note that "up-to-date hardware and software" are quite distinct from physical appearances in the IS domain. The empathy dimension was created by the original SERVQUAL authors from two distinctly different constructs, namely understanding and access, which were combined on the basis of factor loadings alone, without regard to underlying theory. Not only did IS-SERVQUAL fail to match the proposed model, but its factor structure also varied across settings. Analysis of the data from the consulting firm resulted in a five-factor model, although none of the factors matched the original a priori factors. The factor analysis of the information systems business data resulted in the extraction of only three factors.
LISREL confirmatory factor analysis was used on SERVQUAL data collected from users (i.e., students) of a college computing services department (Kettinger and Lee 1994). Analysis of these data resulted in a four-factor solution, with the entire tangibles dimension dropped. An IS version of SERVQUAL was also used in a cross-national study (Kettinger et al. 1995). Results of exploratory common factor analysis with oblique rotation indicated a three-factor model for a Korean sample, while a four-factor model was extracted from a Hong Kong data set. The tangibles dimension was retained in the analysis of both Asian samples.
The unstable dimensionality of SERVQUAL, demonstrated in many domains including information services, is not just a statistical curiosity. The scoring procedure for SERVQUAL calls for averaging the P - E gap scores within each dimension (Parasuraman et al. 1988). Thus a high expectation coupled with a low perception on one item would be canceled by a low expectation and high perception on another item within the same dimension. This scoring method is appropriate only if all of the items in that dimension are interchangeable, which would be justified if SERVQUAL demonstrated a clear and consistent dimensional structure. However, given the unstable number and pattern of the factor structures, averaging groups of items to calculate separate scores for each dimension cannot be justified. Therefore, for scoring purposes, each item should be treated individually and not as part of some a priori dimension.
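A two-item sketch (both items and ratings hypothetical) makes the cancellation explicit:

```python
# Two hypothetical items assigned to the same SERVQUAL dimension.
items = [
    {"E": 7, "P": 3},   # gap = -4: a serious service shortfall
    {"E": 3, "P": 7},   # gap = +4: a large service surplus
]
gaps = [item["P"] - item["E"] for item in items]
dimension_score = sum(gaps) / len(gaps)
print(gaps)              # [-4, 4]
print(dimension_score)   # 0.0 -- averaging hides both offsetting signals
```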
In summary, numerous problems with the original SERVQUAL instrument are described in the literature (e.g., Babakus and Boller 1992; Carman 1990; Cronin and Taylor 1992, 1994; Teas 1993, 1994). The evidence suggests that difference scores, like the SERVQUAL perception-minus-expectation calculation, tend to exhibit reduced reliability, poor discriminant validity, spurious correlations, and restricted variance (Edwards 1995; Peter et al. 1993). The fact that the perception component of the difference score exhibits better reliability, convergent validity, and predictive validity than the perception-minus-expectation difference score itself calls into question the empirical and practical usefulness of both the expectation scores and the difference scores (Babakus and Boller 1992; Cronin and Taylor 1992; Parasuraman et al. 1994a). Moreover, inconsistent definitions and/or interpretations of the "expectations" construct lead to a number of problems. The findings of Teas (1993) suggest that a considerable portion of the variance in SERVQUAL scores is the result of measurement error induced by respondents' varying interpretations of the expectations construct. In addition, since expectations, as well as perceptions, are subject to revision based on experience, concerns arise regarding the temporal reliability of SERVQUAL difference scores. Furthermore, the dimensionality of the SERVQUAL instrument is problematic.
It was reported that an analysis of IS-SERVQUAL difference scores resulted in three, five, or seven factors, depending on the industry (Pitt et al. 1995). A portion of the instability in the dimensionality of SERVQUAL can be traced to the development of the original instrument (i.e., Parasuraman et al. 1988). Given these problems, users of the SERVQUAL instrument should assess the dimensionality implicit in their specific data set in order to determine whether the five-factor structure that has been proposed (Parasuraman et al. 1988, 1991) is supported in their particular domain. Moreover, if the item elimination and dimensional collapsing utilized in the development of SERVQUAL has resulted in a 22 paired-item instrument that in fact does not measure all of the theoretical dimensions of the service quality construct (i.e., content validity), then the use of linear sums of those items for purposes of measuring overall service quality is problematic as well (Galletta and Lederer 1989).
Many of the difficulties identified with the SERVQUAL instrument also apply to the IS-modified versions of the instrument used by Pitt et al. (1995) and by Kettinger and Lee (1994). It appears that the IS-SERVQUAL instrument, utilizing difference scores, is neither a reliable nor a valid measure for operationalizing the service quality construct for an information systems services provider. The IS versions of the SERVQUAL instrument, much like the original instrument (Parasuraman et al. 1988), suffer from unstable dimensionality and are likely to exhibit relatively poor predictive and convergent validity, as well as reduced reliability, when compared to non-difference scoring methods. The existing literature provides impressive evidence that the use of perception-minus-expectation difference scores is problematic.
This critique of the perceptions-minus-expectations gap score should not be interpreted as a claim that expectations are not important or that they should not be measured. On the contrary, evidence indicates that both should and will expectations are precursors to perceptions, but that perceptions alone directly influence overall perceived service quality (Boulding et al. 1993). Our criticism is not with the concept of expectations per se, but rather with the operationalization of service quality as a simple subtraction of an ambiguously defined expectations construct from the perceptions of the service actually delivered.
IS professionals have been known to raise expectations to an unrealistically high level in order to gain user commitment to new systems and technologies. This can make it much more difficult to deliver systems and services that will be perceived as successful. According to the model developed by Boulding et al. (1993), perceived service quality can be increased either by improving actual performance or by managing expectations, specifically by reducing should expectations and/or increasing will expectations. These two different types of expectations are not differentiated by the traditional SERVQUAL gap scoring method. A better approach to understanding the impact of expectations on perceived service quality may be to measure will and should expectations separately and then compare them to a service quality measure that utilizes either a direct response or perceptions-only method of scoring.
Prescriptions for the use of SERVQUAL
The numerous problems associated with the use of difference scores suggest the need for an alternative response format. One alternative is to use the perceptions-only method of scoring. A review of the literature (Babakus and Boller 1992; Boulding et al. 1993; Cronin and Taylor 1992; Parasuraman et al. 1991, 1994a) indicates that perceptions-only scores are superior to perception-minus-expectation difference scores in terms of reliability, convergent validity, and predictive validity. In addition, the use of perceptions-only scores reduces by 50% the number of items that must be answered and measured (from 44 items to 22). Moreover, the findings of Boulding et al. (1993) suggest that expectations are a precursor to perceptions and that perceptions alone directly influence service quality.
A second alternative, suggested by Carman (1990) and Babakus and Boller (1992), is to revise the wording of the SERVQUAL items into a format combining both expectations and perceptions into a single question. Such an approach would maintain the theoretical value of expectations and perceptions in assessing service quality, as well as reduce the number of questions to be answered by 50%. This direct response format holds promise for overcoming the inherent problems with calculated difference scores. Items in this format could be presented with anchors such as "falls far short of expectations" and "greatly exceeds expectations." One study indicates that such direct measures possess higher reliability and improved convergent and predictive validity when compared to difference scores (Parasuraman et al. 1994a).
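As an illustration only, such a direct-response item might be represented and scored as follows; the item wording and the seven-point coding are our assumptions, built around the anchors quoted above:

```python
# Hypothetical direct-response item combining expectations and perceptions
# in a single question, coded on a seven-point scale.
item = {
    "text": "Compared with your expectations, the IS unit's responsiveness...",
    "anchors": {1: "falls far short of expectations",
                7: "greatly exceeds expectations"},
}
responses = [4, 6, 3, 5, 5]                 # one direct rating per respondent
mean_score = sum(responses) / len(responses)
print(f"Item mean = {mean_score:.2f}")      # Item mean = 4.60
```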
Conclusion
Recognizing that we cannot manage what we cannot measure, the increasingly competitive market for IS services has emphasized the need to develop valid and reliable measures of the service quality of information systems services providers, both internal and external to the organization. An important contribution to this effort was made with the suggestion of an IS-modified version of the SERVQUAL instrument (Pitt et al. 1995). However, earlier studies raised several important questions concerning the SERVQUAL instrument (e.g., Babakus and Boller 1992; Carman 1990; Cronin and Taylor 1992, 1994; Peter et al. 1993; Teas 1993, 1994). A review of the literature suggests that the use of difference scores with the IS-SERVQUAL instrument results in neither a valid nor a reliable measure of perceived IS service quality. Those choosing to use any version of the IS-SERVQUAL instrument are therefore cautioned. Scoring problems aside, the consistently unstable dimensionality of the SERVQUAL and IS-SERVQUAL instruments intimates that further research is needed to determine the dimensions underlying the service quality construct. Given the importance of the service quality concept in IS theory and practice, the development of improved measures of service quality for information systems services providers deserves further theoretical and empirical research.
References

Babakus, E., and Boller, G. W. "An Empirical Assessment of the SERVQUAL Scale," Journal of Business Research (24:3), 1992, pp. 253-268.

Bagozzi, R., Davis, F., and Warshaw, P. "Development and Test of a Theory of Technological Learning and Usage," Human Relations (45:7), 1992, pp. 659-686.

Baroudi, J., and Orlikowski, W. "A Short-Form Measure of User Information Satisfaction: A Psychometric Evaluation and Notes on Use," Journal of Management Information Systems (4:4), 1988, pp. 44-59.

Boulding, W., Kalra, A., Staelin, R., and Zeithaml, V. A. "A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions," Journal of Marketing Research (30:1), 1993, pp. 7-27.

Brensinger, R. P., and Lambert, D. M. "Can the SERVQUAL Scale be Generalized to Business-to-Business Services?" in Knowledge Development in Marketing, AMA's Summer Educators' Conference Proceedings, Boston, MA, 1990, p. 289.

Brown, T. J., Churchill, G. A., and Peter, J. P. "Improving the Measurement of Service Quality," Journal of Retailing (69:1), 1993, pp. 127-139.

Bynner, J. "Factor Analysis and the Construct Indicator Relationship," Human Relations (41:5), 1988, pp. 389-405.

Campbell, D. T., and Fiske, D. W. "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix," Psychological Bulletin (56), 1959, pp. 81-105.

Carman, J. M. "Consumer Perceptions of Service Quality: An Assessment of the SERVQUAL Dimensions," Journal of Retailing (66:1), 1990, pp. 33-55.

Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika (16), 1951, pp. 297-334.

Cronbach, L. J., and Furby, L. "How We Should Measure 'Change'-or Should We?" Psychological Bulletin (74), July 1970, pp. 68-80.

Cronbach, L. J., and Meehl, P. "Construct Validity in Psychological Tests," Psychological Bulletin (52), 1955, pp. 281-302.

Cronin, J. J., and Taylor, S. A. "Measuring Service Quality: A Reexamination and Extension," Journal of Marketing (56:3), 1992, pp. 55-68.

Cronin, J. J., and Taylor, S. A. "SERVPERF versus SERVQUAL: Reconciling Performance-Based and Perceptions-Minus-Expectations Measurements of Service Quality," Journal of Marketing (58:1), 1994, pp. 125-131.

Dabholkar, P. A., Thorpe, D. I., and Rentz, J. O. "A Measure of Service Quality for Retail Stores: Scale Development and Validation," Journal of the Academy of Marketing Science (24:1), 1996, pp. 3-16.

DeLone, W., and McLean, E. "Information Systems Success: The Quest for the Dependent Variable," Information Systems Research (3:1), March 1992, pp. 60-95.

Doll, W. J., and Torkzadeh, G. "A Discrepancy Model of End-User Computing Involvement," Management Science (35:10), 1989, pp. 1151-1171.

Edwards, J. R. "Alternatives to Difference Scores as Dependent Variables in the Study of Congruence in Organizational Research," Organizational Behavior and Human Decision Processes (64:3), 1995, pp. 307-324.

Finn, D. W., and Lamb, C. W. "An Evaluation of the SERVQUAL Scales in a Retailing Setting," Advances in Consumer Research (18), 1991, pp. 338-357.

Galletta, D. F., and Lederer, A. L. "Some Cautions on the Measurement of User Information Satisfaction," Decision Sciences (20:3), 1989, pp. 419-439.

Gronroos, C. "A Service Quality Model and its Marketing Implications," European Journal of Marketing (18:4), 1984, pp. 36-55.

Johns, G. "Difference Score Measures of Organizational Behavior Variables: A Critique," Organizational Behavior and Human Performance (27), 1981, pp. 443-463.

Kappelman, L. "Measuring User Involvement: A Diffusion of Innovation Approach," The DATA BASE for Advances in Information Systems (26:2,3), 1995, pp. 65-83.

Kettinger, W. J., and Lee, C. C. "Perceived Service Quality and User Satisfaction with the Information Services Function," Decision Sciences (25:5), 1994, pp. 737-766.

Kettinger, W., Lee, C., and Lee, S. "Global Measurements of Information Service Quality: A Cross-National Study," Decision Sciences (26:5), 1995, pp. 569-585.

Lord, F. M. "The Utilization of Unreliable Difference Scores," Journal of Educational Psychology (49:3), 1958, pp. 150-152.

Nunnally, J. Psychometric Theory, 2nd ed., McGraw-Hill, New York, 1978.

Oliver, R. L. "A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions," Journal of Marketing Research (17:3), 1980, pp. 460-469.

Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "A Conceptual Model of Service Quality and its Implications for Future Research," Journal of Marketing (49:4), 1985, pp. 41-50.

Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality," Journal of Retailing (64:1), 1988, pp. 12-40.

Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "Refinement and Reassessment of the SERVQUAL Scale," Journal of Retailing (67:4), 1991, pp. 420-450.

Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "Alternative Scales for Measuring Service Quality: A Comparative Assessment Based on Psychometric and Diagnostic Criteria," Journal of Retailing (70:3), 1994a, pp. 201-229.

Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "Reassessment of Expectations as a Comparison Standard in Measuring Service Quality: Implications for Further Research," Journal of Marketing (58:1), 1994b, pp. 111-124.

Peter, J. P., Churchill, G. A., and Brown, T. J. "Caution in the Use of Difference Scores in Consumer Research," Journal of Consumer Research (19:1), 1993, pp. 655-662.

Pitt, L. F., Watson, R. T., and Kavan, C. B. "Service Quality: A Measure of Information Systems Effectiveness," MIS Quarterly (19:2), June 1995, pp. 173-187.

Prakash, V., and Lounsbury, J. W. "A Reliability Problem in the Measurement of Disconfirmation of Expectations," in Advances in Consumer Research, Vol. 10, Tybout, A. M., and Bagozzi, R. P. (eds.), Association for Consumer Research, Ann Arbor, MI, 1983, pp. 244-249.

Teas, R. K. "Expectations, Performance Evaluation and Consumers' Perceptions of Quality," Journal of Marketing (57:4), 1993, pp. 18-34.

Teas, R. K. "Expectations as a Comparison Standard in Measuring Service Quality: An Assessment of a Reassessment," Journal of Marketing (58:1), 1994, pp. 132-139.

Van Dyke, T. P., and Popelka, M. E. "Development of a Quality Measure for an Information Systems Provider," in Proceedings of the Decision Sciences Institute (3), 1993, pp. 1910-1912.

Wall, T. D., and Payne, R. "Are Deficiency Scores Deficient?" Journal of Applied Psychology (56:3), 1973, pp. 322-326.

Zeithaml, V. A., Berry, L., and Parasuraman, A. "The Nature and Determinants of Customer Expectations of Service," Journal of the Academy of Marketing Science (21:1), 1993, pp. 1-12.
About the Authors

Thomas P. Van Dyke is an assistant professor of information systems and technologies at Weber State University and recently completed his doctoral dissertation in business computer information systems at the University of North Texas. His current research interests include the effects of alternative presentation formats on biases and heuristics in human decision making, and MIS evaluation and assessment. He has published articles in The Journal of Computer Information Systems and the Proceedings of the Decision Sciences Institute.
Leon A. Kappelman is an associate professor of business computer information systems in the College of Business Administration at the University of North Texas and associate director of the Center for Quality and Productivity. After a successful career in industry, he received his Ph.D. (1990) in management information systems from Georgia State University. He has published over two dozen journal articles. His work has appeared in Communications of the ACM, Journal of Management Information Systems, DATA BASE for Advances in Information Systems, Journal of Systems Management, Journal of Computer Information Systems, InformationWeek, National Productivity Review, Project Management Journal, Journal of Information Technology Management, and Industrial Management, as well as other journals and conference proceedings. He authored Information Systems for Managers, McGraw-Hill (1993). His current research interests include the management of information assets, information systems (IS) development and implementation, IS project management, and MIS evaluation and assessment. He is co-chair of the Society for Information Management's (SIM) Year 2000 Working Group.
Victor R. Prybutok is the director of the University of North Texas Center for Quality and Productivity and an associate professor of management science in the Business Computer Information Systems Department. He has published articles in Technometrics, Operations Research, Economic Quality Control, and Quality Progress, as well as other journals and conference proceedings. Dr. Prybutok is a senior member of the American Society for Quality Control (ASQC), an ASQC certified quality engineer, a certified quality auditor, and a 1993 Texas quality award examiner. His current research interests include project management, assessment of quality programs, neural networks, and MIS evaluation and assessment.