short-paper

Fatten Features and Drop Wastes: Finding Repeaters' Reviews by Feature Generation and Feature Selection

Authors:

Naoki Muramoto,

Hiromi Shiraga,

Hiroaki Ohshima

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services

Pages 161 - 165

https://doi.org/10.1145/3366030.3366133

Published: 22 February 2020

Abstract

In this paper, we propose a method for determining whether a given restaurant review comment was written by a repeater, that is, a customer who has visited the restaurant more than once. People often use restaurant review sites to decide which restaurant to visit. A reader of a review comment can usually tell whether the reviewer is a repeater of the restaurant, and a restaurant with many repeaters is likely to be a good one. However, restaurant review sites usually do not provide a "revisit rate". We therefore tackle the problem of determining whether a review is a repeater's review. The difficulty is that a review comment contains many sentences that are of no use for this decision, such as what was ordered, what was delicious, or how the price was. To address this difficulty, we take the following approach. First, a wide variety of features is extracted from review comments so as not to miss the features that characterize repeaters' reviews. Next, from this large feature set, a feature selection method selects only the features that actually contribute to the classification. Finally, classification is performed using a classifier. We implemented the proposed method using super-CWC [12], a state-of-the-art feature selection method, and an SVM. The experimental results show that the proposed method outperforms the other methods compared.
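The three steps named in the abstract (generate many features, select the useful ones, classify) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy reviews, the word n-gram features, and every function name are hypothetical. A simple chi-square ranking stands in for super-CWC, and a small Pegasos-style subgradient trainer stands in for a full SVM implementation.

```python
# Hypothetical toy data: (review text, 1 = repeater's review, 0 = not).
REVIEWS = [
    ("i come here again and again for my usual order", 1),
    ("back again for the third visit this year", 1),
    ("as always the ramen was great so i visited again", 1),
    ("first visit and the ramen was great", 0),
    ("tried the pasta and price was fair", 0),
    ("my first time here and the staff was nice", 0),
]


def generate_features(text):
    """Step 1: generate many candidate features (word 1-grams and 2-grams)."""
    words = text.lower().split()
    unigrams = {"w1:" + w for w in words}
    bigrams = {"w2:" + a + " " + b for a, b in zip(words, words[1:])}
    return unigrams | bigrams


def chi2_score(docs, labels, feature):
    """Chi-square statistic of a binary feature against a binary label."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = feature in doc
        if present and label == 1:
            a += 1
        elif present:
            b += 1
        elif label == 1:
            c += 1
        else:
            d += 1
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Step 2 (stand-in for super-CWC): keep the k best-scoring features."""
    vocabulary = set().union(*docs)
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocabulary, key=lambda f: (-chi2_score(docs, labels, f), f))
    return ranked[:k]


def vectorize(doc, selected):
    """Binary vector over the selected features, plus a constant bias term."""
    return [1.0 if f in doc else 0.0 for f in selected] + [1.0]


def train_linear_svm(rows, signs, lam=0.01, epochs=200):
    """Step 3: linear SVM (hinge loss + L2) via Pegasos-style updates."""
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(rows, signs):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss is active for this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, row):
    return 1 if sum(wi * xi for wi, xi in zip(w, row)) > 0 else 0


K = 3  # number of features to keep (arbitrary for this toy example)
docs = [generate_features(text) for text, _ in REVIEWS]
labels = [label for _, label in REVIEWS]
selected = select_features(docs, labels, K)
rows = [vectorize(doc, selected) for doc in docs]
weights = train_linear_svm(rows, [1 if y == 1 else -1 for y in labels])
predictions = [predict(weights, row) for row in rows]
print("selected features:", selected)
print("training predictions:", predictions)
```

On this toy data the word "again" is the only feature that appears in every repeater review and in no other review, so it ranks first. The chi-square ranking is only a placeholder that shows where feature selection sits in the pipeline; it does not reproduce the behavior of super-CWC.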

References

[1]

Li-Chen Cheng, Judy C. R. Tseng, and Tsai-Yu Chung. 2017. Case Study of Fake Web Reviews. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2017). 706--709.

[2]

Kushal Dave, Steve Lawrence, and David M. Pennock. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In Proceedings of the 12th International Conference on World Wide Web (WWW 2003). 519--528.

[3]

Xiaowen Ding and Bing Liu. 2007. The Utility of Linguistic Rules in Opinion Mining. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007). 811--812.

[4]

Amir Fayazi, Kyumin Lee, James Caverlee, and Anna Squicciarini. 2015. Uncovering Crowdsourced Manipulation of Online Reviews. In Proceedings of the 38th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2015). 233--242.

[5]

Eric Gilbert and Karrie Karahalios. 2010. Understanding Deja Reviewers. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW 2010). 225--228.

[6]

Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. 2004. Applying Conditional Random Fields to Japanese Morphological Analysis. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004). 230--237.

[7]

Theodoros Lappas, Mark Crovella, and Evimaria Terzi. 2012. Selecting a Characteristic Set of Reviews. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012). 832--840.

[8]

Ankan Mullick, Surjodoy Ghosh D, Shivam Maheswari, Srotaswini Sahoo, Suman Kalyan Maity, Soumya C, and Pawan Goyal. 2018. Identifying Opinion and Fact Subcategories from the Social Web. In Proceedings of the 2018 ACM Conference on Supporting Groupwork (GROUP 2018). 145--149.

[9]

Michael P. O'Mahony and Barry Smyth. 2009. Learning to Recommend Helpful Hotel Reviews. In Proceedings of the Third ACM Conference on Recommender Systems (RecSys 2009). 305--308.

[10]

Deanna Osman, John Yearwood, and Peter Vamplew. 2007. Using Corpus Analysis to Inform Research into Opinion Detection in Blogs. In Proceedings of the Sixth Australasian Conference on Data Mining and Analytics (AusDM 2007). 65--75.

[11]

Kilho Shin, Danny Fernandes, and Seiya Miyazaki. 2011. Consistency measures for feature selection: a formal definition, relative sensitivity comparison and a fast algorithm. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011). 1491--1497.

[12]

Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto, and Dave Shepard. 2015. Super-CWC and super-LCC: Super Fast Feature Selection Algorithms. In Proceedings of the 2015 IEEE International Conference on Big Data (BigData 2015). 1--7.

[13]

Phong Minh Vu, Hung Viet Pham, Tam The Nguyen, and Tung Thanh Nguyen. 2016. Phrase-based Extraction of User Opinions in Mobile App Reviews. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016). 726--731.

[14]

Derry Tanti Wijaya and Stéphane Bressan. 2008. A Random Walk on the Red Carpet: Rating Movies with User Reviews and Pagerank. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008). 951--960.

[15]

José P. Zagal, Amanda Ladd, and Terris Johnson. 2009. Characterizing and Understanding Game Reviews. In Proceedings of the 4th International Conference on Foundations of Digital Games (FDG 2009). 215--222.

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification

Published In

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services

December 2019

709 pages

ISBN:9781450371797

DOI:10.1145/3366030

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

JKU: Johannes Kepler Universität Linz
@WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

iiWAS2019

iiWAS2019: The 21st International Conference on Information Integration and Web-based Applications & Services

December 2 - 4, 2019

Munich, Germany
