research-article

Handling Imbalance in Fraudulent Reviewer Detection based on Expectation Maximization and KL Divergence

Authors:

Qiang Wang

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Pages 421 - 427

https://doi.org/10.1145/3498851.3498989

Published: 11 April 2022

Abstract

Online review fraud and review manipulation hurt the profits of stakeholders and undermine the value of online reviews. Detecting review fraud and fraudulent reviewers effectively is therefore critical to the development of e-commerce. Existing studies propose various techniques for detecting fraudulent reviewers, but most of them do not address the data imbalance problem that arises in this task. To fill this research gap, this paper proposes a novel approach, called EMKL, that detects fraudulent reviewers while handling data imbalance, based on Expectation Maximization (EM) and Kullback–Leibler (KL) divergence. We first use the EM algorithm to model the latent topic distributions of reviewers over the review features. We then use the KL divergence to measure the similarity between reviewers based on their topic distributions, and use these similarities to detect fraudulent reviewers. An experiment on a Yelp dataset shows that EMKL performs well in detecting fraudulent reviewers and outperforms state-of-the-art techniques.
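The two steps named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify the review features, the exact latent model, or how divergences become a fraud decision. The sketch assumes a PLSA-style mixture fitted with EM over a made-up reviewer-by-feature count matrix, and it assumes that each reviewer is scored by KL divergence from the population-average topic distribution. The names `plsa_em`, `kl_divergence`, `divergence_from_mean`, and the toy `counts` are all hypothetical.

```python
import math
import random


def normalize(values, eps=1e-12):
    """Scale non-negative values to sum to 1, flooring each at eps."""
    floored = [max(v, eps) for v in values]
    total = sum(floored)
    return [v / total for v in floored]


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit P(topic | reviewer) and P(feature | topic) to a count matrix with EM."""
    rng = random.Random(seed)
    n_reviewers, n_features = len(counts), len(counts[0])
    p_topic = [normalize([rng.random() for _ in range(n_topics)])
               for _ in range(n_reviewers)]
    p_feat = [normalize([rng.random() for _ in range(n_features)])
              for _ in range(n_topics)]
    for _ in range(n_iter):
        new_topic = [[0.0] * n_topics for _ in range(n_reviewers)]
        new_feat = [[0.0] * n_features for _ in range(n_topics)]
        for r in range(n_reviewers):
            for f in range(n_features):
                if counts[r][f] == 0:
                    continue
                # E-step: posterior over topics for this (reviewer, feature) pair.
                resp = normalize([p_topic[r][z] * p_feat[z][f]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = counts[r][f] * resp[z]
                    new_topic[r][z] += expected
                    new_feat[z][f] += expected
        # M-step: renormalize expected counts into probabilities.
        p_topic = [normalize(row) for row in new_topic]
        p_feat = [normalize(row) for row in new_feat]
    return p_topic, p_feat


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    p, q = normalize(p), normalize(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def divergence_from_mean(p_topic):
    """Assumed scoring rule: KL divergence of each reviewer from the average."""
    n_topics = len(p_topic[0])
    mean = normalize([sum(row[z] for row in p_topic) for z in range(n_topics)])
    return [kl_divergence(row, mean) for row in p_topic]


# Toy reviewer-by-feature counts (made up); the last row behaves differently.
counts = [
    [8, 7, 1, 0],
    [9, 6, 0, 1],
    [7, 8, 1, 1],
    [0, 1, 9, 8],
]
p_topic, _ = plsa_em(counts, n_topics=2)
scores = divergence_from_mean(p_topic)
```

Under these assumptions, a larger score means a reviewer's topic distribution is further from the typical one. Because the score needs no class labels, a small fraudulent minority does not have to be rebalanced before scoring, which is one plausible reading of how such an approach sidesteps imbalance. KL divergence is asymmetric, so a symmetrized variant may be preferable when comparing pairs of reviewers.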

Cited By

J. Liu, P. Quan, and W. Zhang. 2024. A Study on Fake Review Detection Based on RoBERTa and Behavioral Features. Procedia Computer Science 242 (2024), 1323–1330. https://doi.org/10.1016/j.procs.2024.08.131

Published In

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

December 2021

541 pages

ISBN: 9781450391870

DOI: 10.1145/3498851

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China
Beijing Youth Talent Fund

Conference

WI-IAT '21

December 14 - 17, 2021

Melbourne, VIC, Australia
