Measuring Hotel Service Quality From Online Consumer Reviews: A Proposed Method
All content following this page was uploaded by Michelle Bonera on 11 December 2015.
Abstract This paper proposes a new method to measure hotel service quality
from online consumer reviews and ratings. In essence, a word frequency analysis
is performed on comments collected from a website such as TripAdvisor, and
these frequencies are used to obtain a score for each of the following dimensions:
Room, Facilities, Surroundings, Employees and Reliability. Scores can be compared
across the ratings that consumers give and/or tracked over time. The method offers
researchers and hotel managers a useful new tool, which can guide quality
improvement efforts and help focus marketing communication. In this paper the
development of the approach is described, and a short example is presented in
which the method is applied to a single hotel.
1 Introduction
Imagine that you are a hotel manager, and you care about your customers. You
actively collect and read feedback forms, and each year you perform a service
quality survey. But you wonder: with so many people leaving reviews of my hotel
2 Theory
Although service quality no longer gets as much attention from researchers as it did
one or two decades ago, its importance cannot be overstated. It is strongly linked with
word-of-mouth and customer satisfaction, and indirectly with purchase intention
and customer loyalty (Cronin and Taylor 1992; Harrison-Walker 2001; Sureshchandar
et al. 2002). Research by Chang and Chen (1998) indicates that service
quality is an important antecedent to business profitability. Most research on
service quality was conducted in the 1990s and early 2000s,
and resulted in a large number of conceptual models and measurement instruments
(see Seth et al. 2005).
Service quality is often defined as an attitude: the perceived difference (or gap)
between expectation and performance (Parasuraman et al. 1985; Seth et al. 2005).
Its most popular measurement scale is SERVQUAL (Parasuraman et al. 1988,
2004), which is based on the premise that service quality consists of five dimensions:
tangibles, reliability, responsiveness, assurance and empathy. Although
SERVQUAL has been criticized on both conceptual and methodological grounds
(Carman 1990; Babakus and Boller 1992; Buttle 1996), it is the
370 E. Boon et al.
most commonly used scale for service quality and it therefore seems appropriate to
consider it as one of our options to divide TripAdvisor comments into different
service quality dimensions.
SERVQUAL has also been used in research that focused on the hospitality
industry, with varying results. Saleh and Ryan (1991) applied SERVQUAL successfully
in a study of consumers’ perception of service quality in hotels, and
Akbaba (2006) confirmed that there were five dimensions but found a different
factor structure. A number of researchers developed adapted measurement scales,
including HOLSERV (Mei et al. 1999) and LODGSERV (Knutson et al. 1990) for
hotels and DINESERV (Stevens et al. 1995) for restaurants.
3 Methodology
To develop the method, three studies were performed in sequence. The purpose of
the first study was to select the most appropriate service quality scale, i.e. the one
whose dimensions are most distinctive and meaningful when applied to
consumer reviews. To test the scales, three researchers separately tried to assign the
dimensions to 48 TripAdvisor comments that had been selected randomly (from
12 different hotels in 4 different English-speaking locations). Afterwards the
results from the different researchers were compared and discussed to select the
scale to continue with. The second study was very similar to the first:
the three researchers again categorized the same 48 comments along the
dimensions of the chosen scale. This time, however, the purpose was to explore
which words in the reviews were representative of each dimension.
The third study was different: it used a large database of over 10,000
TripAdvisor comments for 90 hotels in Northern Italy, which was available from
another (unpublished) study. The comments were analyzed using RapidMiner, an
open-source data mining program (Rapid-I 2013). First, RapidMiner was used to rank
the words of all comments by their overall frequency (i.e. total word
count). Next, each of the three researchers assigned each of the 1,000 most frequent
words to a dimension. Irrelevant words were ignored, and ambiguous words
were set aside for later discussion.
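RapidMiner performed this ranking in the original study. Purely as an illustration, the same kind of frequency ranking can be sketched in a few lines of Python. The regular-expression tokenizer and the tiny stop-word set below are assumptions made for the sketch; they are not RapidMiner's tokenizer or stop-word list, and the three comments are invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (RapidMiner uses its own list).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "of", "to", "in", "we", "it", "very"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_words(comments: list[str], top_n: int = 1000) -> list[tuple[str, int]]:
    """Return the top_n (word, total count) pairs across all comments, stop words removed."""
    counts = Counter()
    for comment in comments:
        counts.update(w for w in tokenize(comment) if w not in STOP_WORDS)
    # most_common sorts by count; ties keep the order in which words were first seen.
    return counts.most_common(top_n)


# Invented example comments.
comments = [
    "The breakfast was great and the staff very friendly.",
    "Friendly staff, small room, great view of the lake.",
    "Room was small but clean; breakfast was fine.",
]
print(rank_words(comments, top_n=3))
```

With real data, `comments` would hold the review texts and `top_n=1000` would produce a list of the size the researchers coded by hand.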
4 Results
The first study made it apparent that HOLSERV is a much better instrument for our
purpose than SERVQUAL, because its three dimensions (employees, tangibles and
reliability) are much more distinctive. The ambiguity between employees and
reliability was resolved as follows: comments that referred to the staff and the
service in a general way were assigned to employees, while reliability covered
specific problems or actions where the staff made an extra effort to help their
guests. This distinction was again tested on the 48 comments, and led to
consistent scores between researchers.
Another conclusion was drawn from this study: the range of subjects covered by
tangibles is very wide, possibly too wide. Nearly all comments (47 out of 48)
mentioned tangibles. The dimension includes things that are within a hotel manager’s
immediate control, such as breakfast and cleanliness, but it can also refer to the style of
the building and the hotel’s location. Therefore the decision was made to test an
adapted version of the HOLSERV scale in which the dimension tangibles was
broken down into three others: room, facilities and surroundings. The researchers
found that scoring along these dimensions was simple because they are very
distinctive, and that it made the results more meaningful. This adapted
scale was therefore used for subsequent studies, and it will be referred to as HOLSERV
Plus, with the dimensions room, facilities, surroundings, employees and reliability.
Table 3 shows a definition of each dimension.
After the scale was selected, the next step was to test whether it is possible to score
comments based on the chosen dimensions, and to identify which words in the
comments are important when dimensions are assigned. Table 4 shows the results
for the same sample comments that were used earlier. In comment 1 the word
‘breakfast’ was the only one that was specific to a particular dimension, namely
facilities. In comment 2 several words were found that allowed dimensions to be
assigned to it: ‘food’ related to facilities, ‘surroundings’ to surroundings, ‘service’
to employees and ‘special treatment’ to reliability.
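The manual scoring above can be approximated by a keyword lookup. The following Python sketch is hypothetical: its mini word list contains only the example words mentioned above, the sample comment is invented, and multi-word entries such as ‘special treatment’ are matched as whole phrases on word boundaries.

```python
import re

# Hypothetical mini word list containing only the example words from the text;
# full per-dimension word lists would take its place.
KEYWORDS = {
    "breakfast": "facilities",
    "food": "facilities",
    "surroundings": "surroundings",
    "service": "employees",
    "special treatment": "reliability",
}


def dimensions_in(comment: str, keywords: dict[str, str] = KEYWORDS) -> set[str]:
    """Return the dimensions whose keywords or phrases occur in the comment."""
    text = comment.lower()
    found = set()
    for phrase, dimension in keywords.items():
        # \b word boundaries stop 'food' from matching inside, e.g., 'seafood'.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(dimension)
    return found


# Invented comment that touches all four example dimensions.
print(dimensions_in("Great food, lovely surroundings, excellent service "
                    "and we got special treatment on our anniversary."))
```

A real implementation would also need the disambiguation rules the researchers applied by hand, which a plain lookup cannot capture.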
The three researchers scored the comments fairly consistently, and in most cases
reached agreement easily when the differences were discussed.
However, this study also showed that for many words it is ambiguous to
which dimension they belong. For example, the words ‘comfortable’ and ‘clean’ can
relate to the room but also to the hotel as a whole, and the word ‘course’ can be
related to the restaurant (the dimension facilities), but can also come from the general
expression ‘of course’. In these cases the word was looked up in multiple
comments to see how it was most commonly used (in these particular cases leading to
‘comfortable’ and ‘clean’ being assigned to room, and ‘course’ to facilities). In
some cases a deeper discussion was necessary. For example, the word ‘noisy’
usually referred to noise outside the hotel; the researchers agreed that, although noise
depends on the hotel’s location (i.e. surroundings), hotel managers have
influence over the noise level, e.g. by installing double glazing or closing a terrace
early, so ‘noisy’ was assigned to the dimension facilities.
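The look-up step described above is essentially a keyword-in-context (concordance) search. A minimal Python sketch, assuming simple lower-case tokenization and a window of three words on either side (both arbitrary choices), applied to two invented comments:

```python
import re


def keyword_in_context(comments: list[str], word: str, window: int = 3) -> list[str]:
    """List each occurrence of `word` with up to `window` words on either side."""
    lines = []
    for comment in comments:
        tokens = re.findall(r"[a-z']+", comment.lower())
        for i, token in enumerate(tokens):
            if token == word:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented comments showing the two senses of 'course'.
sample = [
    "Of course the staff helped us straight away.",
    "The three course dinner in the restaurant was superb.",
]
for line in keyword_in_context(sample, "course"):
    print(line)
```

Reading such lines side by side makes it quick to see which sense of a word dominates before assigning it to a dimension.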
Study 2 gave researchers sufficient confidence that a definitive list for each
dimension could be created, and therefore the next study was conducted.
This study was performed with a large word frequency list generated from
TripAdvisor comments for hotels in Northern Italy, which was available from
another (unpublished) study. Each of the 1,000 most frequent words on this list was
assigned to a specific dimension; words that could not be assigned were labelled either
ambiguous (for further discussion) or unclassified (no discussion necessary).
Although RapidMiner removes stop words, many words in the top 1,000 were
irrelevant. Only 241 words were considered for the dimensions, of which 44
were too ambiguous to allocate (e.g. water, park and entertainment could refer to
different dimensions). The remaining 197 words were assigned to the dimensions
Measuring Hotel Service Quality from Online Consumer Reviews 375
as shown in Table 5; the 10 highest-ranking words for each dimension are also
shown.
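The bookkeeping in this step, in which every top-ranked word ends up assigned to a dimension, flagged as ambiguous or left unclassified, can be sketched as follows. In the study, 241 words were considered, of which 44 were ambiguous and 197 assigned (241 − 44 = 197). The coding sheet and word list in this Python sketch are hypothetical excerpts, not the actual Table 5 lists.

```python
from collections import Counter

AMBIGUOUS = "ambiguous"
UNCLASSIFIED = "unclassified"

# Hypothetical excerpt of a coding sheet (word -> dimension or AMBIGUOUS).
# Words missing from the sheet are treated as unclassified (irrelevant).
CODING = {
    "breakfast": "facilities",
    "room": "room",
    "clean": "room",
    "staff": "employees",
    "lake": "surroundings",
    "water": AMBIGUOUS,
    "park": AMBIGUOUS,
}


def summarize(top_words: list[str], coding: dict[str, str]) -> dict[str, int]:
    """Tally how many top-ranked words were assigned, ambiguous or unclassified."""
    labels = Counter(coding.get(word, UNCLASSIFIED) for word in top_words)
    ambiguous = labels.pop(AMBIGUOUS, 0)
    unclassified = labels.pop(UNCLASSIFIED, 0)
    assigned = sum(labels.values())  # whatever remains carries a dimension label
    return {
        "considered": assigned + ambiguous,
        "ambiguous": ambiguous,
        "assigned": assigned,
        "unclassified": unclassified,
    }


top_words = ["breakfast", "also", "room", "water", "staff", "really", "park", "lake", "clean"]
print(summarize(top_words, CODING))
```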
Although this analysis resulted in five word lists that seem to satisfy our needs,
several issues emerged that need closer scrutiny. First, the number of
words per dimension varies greatly, from 63 for facilities to 13 for
reliability. This is not a problem in itself, but it does need to be considered when
results are presented. In particular, results will need to be presented as indices or
percentages rather than total word counts to make comparison between the
dimensions possible. Second, the two highest-frequency words for the dimension
surroundings are Lake and Garda, because the hotels in the database are located on
Lake Garda. Obviously this word list cannot be used for hotels in another
location, so either a generic list should be used (which means that location-specific
information will be lost) or a list should be created for each hotel or region
individually (which reduces consistency between studies). Finally, the dimension
reliability contains very few words, and they are very similar in
meaning, which indicates that this dimension may not offer very rich data to
researchers. This is a concern, and it should be a focus during further
application and testing of the dimension word lists.
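As a rough illustration of the point about percentages, the Python sketch below converts raw dimension totals into each dimension's share of all dimension-word occurrences. The totals are invented, and the share-of-total definition is an assumption; the paper does not specify how its percentages are computed. Shares on their own still reflect list length (a 63-word list will tend to collect more hits than a 13-word list), which is one reason to compare each dimension against its own average, as an index relative to the average would.

```python
def to_percentages(totals: dict[str, int]) -> dict[str, float]:
    """Express each dimension's word count as a percentage of all dimension-word occurrences."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {dimension: 0.0 for dimension in totals}
    return {dimension: 100 * count / grand_total for dimension, count in totals.items()}


# Invented raw totals for a single hotel.
totals = {"room": 120, "facilities": 300, "surroundings": 90, "employees": 150, "reliability": 40}
for dimension, pct in to_percentages(totals).items():
    print(f"{dimension:<13}{pct:5.1f}%")
```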
5 Discussion
Based on the three studies, the conclusion was drawn that, although there are a
number of concerns that need to be addressed, the method should offer researchers
and hotel managers a useful tool to measure service quality. The five dimensions
that were created are distinctive and meaningful, and when the method is applied
to individual hotels it should help managers to focus quality improvement efforts
on the right issues.
Although the initial studies were performed by hand, the intention is that the method
will ultimately be carried out by a computer algorithm: an Internet crawler can
copy the comments from the website, a program can perform the word frequency
count, and it can use the standard word lists to assign comments to specific
dimensions. Although a researcher will still have to monitor and interpret the results,
reliability (i.e. consistency) should not be a major concern.
Validity is a more challenging criterion. The method seems to have face
validity, since it is based on the HOLSERV instrument and its dimensions are
straightforward and logical, but more research (e.g. comparison with survey
results) is needed to establish other forms of validity.
In this example the method was used for a specific hotel to test how easy it is to
apply and how informative it is for researchers and managers. The Best Western
Plus Academy Plaza Hotel in Dublin was chosen because it is near the ENTER
conference venue.
Only the 500 most recent comments (out of 1,649) were used; they were
written between 19 February 2012 and 25 June 2013. The comments were copied
manually from TripAdvisor to Excel. The data mining program RapidMiner was
used to calculate word frequencies, both overall and for each rating level (1 to 5
stars). The words were then assigned to a dimension based on the lists from study
3, and the word frequencies for each dimension were summed. Table 6 shows, for
each rating level and each dimension, an index relative to the average for all ratings
together. The 1-star and 2-star ratings are grouped together because of the low
number of comments at these levels (9 and 16 respectively). Finally, the
word frequency lists (not shown) were inspected to understand the reasons behind
noticeably high or low indices.
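The paper does not give the exact formula behind the Table 6 indices. One plausible reading, used in the Python sketch below as an assumption, is: index = 100 × (a dimension's share of dimension-word occurrences within a rating group) ÷ (its share across all reviews), so that 100 means average, values above 100 mean the dimension is mentioned more than usual, and values below 100 less. Because each dimension is compared only with itself, differences in list length largely cancel out. The word list and counts are invented; the 1- and 2-star counts are merged before computing indices, mirroring the grouping described above.

```python
from collections import Counter


def dimension_totals(word_counts: Counter, word_list: dict[str, str]) -> Counter:
    """Sum word frequencies per dimension; words not on the list are ignored."""
    totals = Counter()
    for word, count in word_counts.items():
        dimension = word_list.get(word)
        if dimension is not None:
            totals[dimension] += count
    return totals


def rating_indices(groups: dict[str, Counter],
                   word_list: dict[str, str]) -> dict[str, dict[str, float]]:
    """Index = 100 * (dimension share within a group) / (dimension share over all groups)."""
    overall = dimension_totals(sum(groups.values(), Counter()), word_list)
    overall_sum = sum(overall.values())
    indices = {}
    for name, counts in groups.items():
        totals = dimension_totals(counts, word_list)
        group_sum = sum(totals.values())
        indices[name] = {
            dim: 100 * (totals[dim] / group_sum) / (overall[dim] / overall_sum)
            for dim in overall
            if group_sum > 0
        }
    return indices


# Hypothetical word list and per-rating word counts.
WORDS = {"tiny": "room", "small": "room", "friendly": "employees", "location": "surroundings"}
by_stars = {
    1: Counter({"tiny": 2, "small": 1}),
    2: Counter({"small": 1, "friendly": 1}),
    4: Counter({"location": 3, "friendly": 1}),
    5: Counter({"friendly": 4, "location": 1}),
}
# Merge the sparse 1- and 2-star groups before computing indices.
groups = {"1-2": by_stars[1] + by_stars[2], "4": by_stars[4], "5": by_stars[5]}
for name, row in rating_indices(groups, WORDS).items():
    print(name, {dim: round(value) for dim, value in row.items()})
```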
It is apparent from the table that reviewers who gave the hotel a 1-, 2- or 3-star
rating talked a lot about the room, and the frequency list showed that they used the
words ‘tiny’ and ‘small’ much more than average. Those who gave a 1- or 2-star
rating in particular focused heavily on reliability and employees, indicating that they were
unhappy with the service. The words ‘manager’ and ‘person’ were high up in their
word list, which suggests that they had particular issues that were not resolved to
their satisfaction. Although these low-rating reviewers focused less on the hotel’s
facilities, the words ‘parking’ and ‘lobby’ were high on their lists, indicating that
these could be areas for improvement. In contrast, consumers who gave a 4-star
rating seemed particularly pleased with the location of the hotel (surroundings).
Perhaps surprisingly, those who left a 5-star rating focused less on surroundings
but much more on employees, using words such as ‘friendly’, ‘helpful’ and
‘efficient’.
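The inspection of word frequency lists can be partly automated by ranking words by how over-represented they are in a rating group. The Python sketch below uses a simple ratio of relative frequencies (a "lift" measure); this is a swapped-in choice for illustration, not the procedure used in the paper, which relied on inspecting the lists. The counts are invented, and `min_count` is a hypothetical threshold that suppresses words occurring only once.

```python
from collections import Counter


def overrepresented(group: Counter, overall: Counter, min_count: int = 2,
                    top_n: int = 5) -> list[tuple[str, float]]:
    """Rank words by the ratio of their relative frequency in `group` to that in `overall`."""
    group_total = sum(group.values())
    overall_total = sum(overall.values())
    ratios = []
    for word, count in group.items():
        if count < min_count or overall[word] == 0:
            continue  # skip rare words and words absent from the reference counts
        lift = (count / group_total) / (overall[word] / overall_total)
        ratios.append((word, lift))
    ratios.sort(key=lambda item: item[1], reverse=True)
    return ratios[:top_n]


# Invented counts: low-rating reviews versus all reviews (which include them).
low = Counter({"tiny": 4, "small": 3, "room": 5, "friendly": 1})
high = Counter({"friendly": 9, "room": 5, "small": 1})
overall = low + high
for word, lift in overrepresented(low, overall):
    print(f"{word:<10}{lift:.2f}")
```

A lift above 1 means the word appears more often in the group than in the reviews as a whole, which is the pattern described above for ‘tiny’ and ‘small’.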
6 Conclusion
This paper proposes a new method to measure service quality from online consumer
reviews and ratings. The essence of this method is that a word frequency
analysis is performed on comments collected from websites such as
TripAdvisor, and these comments are then assigned to particular service quality
dimensions. The dimensions are based on the HOLSERV scale (Mei et al. 1999),
but were adapted to allow distinction between different tangibles. The five
resulting dimensions are: room, facilities, surroundings, employees and reliability.
In the studies and the example it was shown that, if applied properly, the
method offers hotel managers useful information that can be used to identify
quality improvement points and to guide communication strategy. The index
scores provide insight into what consumers who write reviews focus on, for both low
and high ratings. The word frequency list then offers additional depth to understand
consumers’ reasoning. How often the method can be used to measure service
quality depends on the number of reviews available on TripAdvisor, which
varies per hotel. A cursory look at TripAdvisor suggests that hotels receive
between 50 and 500 reviews per year, so an annual measurement seems manageable for
most.
In addition to its usefulness to managers, the proposed methodology provides
academic researchers with a new way to assess how consumers prioritize different
dimensions of service quality in online reviews, and offers the opportunity to
identify differences based on, for example, geographical location or price category.
A number of limitations need to be acknowledged. Although the results presented
in this paper are promising, the method will need to be applied and tested a
number of times in different settings to show how robust it is and to learn how the
information that it offers can be used to guide strategic planning. Although the
standardized word lists and automated word frequency counts make the method
reliable, future research should address possible validity concerns. Additionally,
the proposed methodology used keywords to categorize comments into dimensions,
but made no distinction between positive and negative words (e.g. ‘clean’
vs. ‘dirty’). We suggest that the reviewer’s rating (from one to five) allows us to
assess whether the experience was positive or negative, but it is possible that
refinement of the methodology will be necessary in the near future.
Future research could address a number of these shortcomings, for example by
evaluating validity in different contexts (e.g. city vs. resort, business vs. tourism),
or comparing results with those of service quality surveys. The methodology offers
a range of potential applications, for example by looking at particular regions or
hotel types. Finally, variants of this methodology could be developed for different
industries, such as restaurants and other service providers.
Nevertheless, the method seems to offer researchers and hotel managers an
efficient new tool. Although it should not replace regular service quality surveys,
it can be used alongside them. The analysis is fast and easy to carry out, and
may offer an alternative approach to hotel managers who lack the budget or
expertise to conduct a survey.
References
Akbaba, A. (2006). Measuring service quality in the hotel industry: A study in a business hotel in
Turkey. International Journal of Hospitality Management, 25(2), 170–192.
Anderson, E. W. (1998). Customer satisfaction and word of mouth. Journal of Service Research,
1(1), 5–17.
Babakus, E., & Boller, G. W. (1992). An empirical assessment of the SERVQUAL scale. Journal
of Business Research, 24(3), 253–268.
Buttle, F. (1996). SERVQUAL: Review, critique, research agenda. European Journal of
Marketing, 30(1), 8–32.
Carman, J. M. (1990). Consumer perceptions of service quality: An assessment of the
SERVQUAL dimensions. Journal of Retailing, 66, 33–55.
Chang, T. Z., & Chen, S. J. (1998). Market orientation, service quality and business profitability:
a conceptual model and empirical evidence. Journal of Services Marketing, 12(4), 246–264.
Cronin Jr, J. J., & Taylor, S. A. (1992). Measuring service quality: A reexamination and
extension. The Journal of Marketing, 56, 55–68.
Dellarocas, C. (2003). The digitization of word of mouth: Promise and challenges of online
feedback mechanisms. Management Science, 49(10), 1407–1424.
Harrison-Walker, L. J. (2001). The measurement of word-of-mouth communication and an
investigation of service quality and customer commitment as potential antecedents. Journal of
Service Research, 4(1), 60–75.
Haywood, K. M. (1989). Managing word of mouth communications. Journal of Services
Marketing, 3(2), 55–67.
Hennig-Thurau, T., Gwinner, K. P., Walsh, G., & Gremler, D. D. (2004). Electronic word-of-
mouth via consumer-opinion platforms: what motivates consumers to articulate themselves on
the internet? Journal of Interactive Marketing, 18(1), 38–52.
Herr, P. M., Kardes, F. R., & Kim, J. (1991). Effects of word-of-mouth and product-attribute
information on persuasion: An accessibility-diagnosticity perspective. Journal of Consumer
Research, 454–462.
Knutson, B., Stevens, P., Wullaert, C., Patton, M., & Yokoyama, F. (1990). LODGSERV: A
service quality index for the lodging industry. Journal of Hospitality and Tourism Research,
14(2), 277–284.
Litvin, S. W., Goldsmith, R. E., & Pan, B. (2008). Electronic word-of-mouth in hospitality and
tourism management. Tourism Management, 29(3), 458–468.
Mei, A. W. O., Dean, A. M., & White, C. J. (1999). Analysing service quality in the hospitality
industry. Managing Service Quality, 9(2), 136–143.
O’Connor, P. (2008). User-generated content and travel: A case study on Tripadvisor.com. In
Information and Communication Technologies in Tourism 2008 (pp. 47–58). Springer,
Vienna.
O’Connor, P. (2010). Managing a hotel’s image on TripAdvisor. Journal of Hospitality
Marketing and Management, 19(7), 754–772.
Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1985). A conceptual model of service quality
and its implications for future research. The Journal of Marketing, 41–50.
Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1988). Servqual. Journal of Retailing, 64(1),
12–37.
Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (2004). Refinement and reassessment of the
SERVQUAL scale. Journal of Retailing, 67(4), 114.
Rapid-I (2013). RapidMiner overview. http://rapid-i.com/content/view/181/190/. Accessed 15
June 2013.
Richins, M. L. (1983). Negative word-of-mouth by dissatisfied consumers: a pilot study. The
Journal of Marketing, 68–78.
Saleh, F., & Ryan, C. (1991). Analysing service quality in the hospitality industry using the
SERVQUAL model. Service Industries Journal, 11(3), 324–345.
Seth, N., Deshmukh, S. G., & Vrat, P. (2005). Service quality models: A review. International
Journal of Quality and Reliability Management, 22(9), 913–949.
Stevens, P., Knutson, B., & Patton, M. (1995). DINESERV: A tool for measuring service quality
in restaurants. Cornell Hotel and Restaurant Administration Quarterly, 36(2), 56–60.
Sureshchandar, G. S., Rajendran, C., & Anantharaman, R. N. (2002). The relationship between
service quality and customer satisfaction–a factor specific approach. Journal of Services
Marketing, 16(4), 363–379.
Trusov, M., Bucklin, R. E., & Pauwels, K. (2009). Effects of word-of-mouth versus traditional
marketing: Findings from an internet social networking site. Journal of Marketing, 73(5),
90–102.
Vermeulen, I. E., & Seegers, D. (2009). Tried and tested: The impact of online hotel reviews on
consumer consideration. Tourism Management, 30(1), 123–127.
Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room sales.
International Journal of Hospitality Management, 28(1), 180–182.
Zhang, Z., Ye, Q., Law, R., & Li, Y. (2010). The impact of on the online popularity of
restaurants: A comparison of consumer reviews and editor reviews. International Journal of
Hospitality Management, 29(4), 694–700.