DOI: 10.1145/3498851.3498989
research-article

Handling Imbalance in Fraudulent Reviewer Detection based on Expectation Maximization and KL Divergence

Published: 11 April 2022

Abstract

Online review fraud and review manipulation harm stakeholders' profits and undermine the value of online reviews. Detecting online review fraud and fraudulent reviewers effectively is therefore critical to the development of e-commerce. Existing studies propose various techniques for detecting fraudulent reviewers; however, most of them do not handle the data imbalance problem in fraudulent reviewer detection. To fill this research gap, this paper proposes EMKL, a novel approach that detects fraudulent reviewers while handling data imbalance, based on Expectation Maximization (EM) and Kullback–Leibler (KL) divergence. We first use the EM algorithm to model the latent topic distributions of reviewers over the review features. We then use KL divergence to measure how similar reviewers are based on their topic distributions, and use these measurements to detect fraudulent reviewers. Experiments on a Yelp dataset show that EMKL performs well in detecting fraudulent reviewers and outperforms state-of-the-art techniques.
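The two steps described in the abstract can be illustrated with a small, self-contained sketch. It assumes a probabilistic latent semantic analysis (PLSA) style model: each reviewer r has a topic mixture P(z|r), each latent topic z has a distribution P(f|z) over discretized review features f, and EM alternates between computing the posterior P(z|r,f) (E-step) and re-estimating both distributions from expected counts (M-step). The abstract does not specify the feature set, the number of topics, how divergence scores become fraud labels, or how imbalance enters the computation, so the toy `counts` matrix, the function names (`plsa_em`, `log_likelihood`, `kl_divergence`, `symmetric_kl`), and the symmetrized KL score are illustrative assumptions, not the EMKL method itself.

```python
import math
import random


def _normalize(row):
    """Scale non-negative weights to sum to 1 (uniform if they are all zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Fit the PLSA-style model P(f|r) = sum_z P(z|r) P(f|z) with EM.

    counts[r][f]: how often (hypothetical, discretized) review feature f
    occurs for reviewer r. Returns (p_z_r, p_f_z): each reviewer's topic
    distribution P(z|r) and each topic's feature distribution P(f|z).
    """
    rng = random.Random(seed)
    n_rev, n_feat = len(counts), len(counts[0])
    # Random initialization breaks the symmetry between topics.
    p_z_r = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_rev)]
    p_f_z = [_normalize([rng.random() for _ in range(n_feat)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_f_z = [[0.0] * n_feat for _ in range(n_topics)]
        exp_z_r = [[0.0] * n_topics for _ in range(n_rev)]
        for r in range(n_rev):
            for f in range(n_feat):
                n = counts[r][f]
                if n == 0:
                    continue
                # E-step: posterior P(z|r,f) is proportional to P(z|r) * P(f|z).
                post = _normalize([p_z_r[r][z] * p_f_z[z][f] for z in range(n_topics)])
                # Accumulate expected counts n(r,f) * P(z|r,f).
                for z in range(n_topics):
                    exp_f_z[z][f] += n * post[z]
                    exp_z_r[r][z] += n * post[z]
        # M-step: re-estimate both distributions from the expected counts.
        p_f_z = [_normalize(row) for row in exp_f_z]
        p_z_r = [_normalize(row) for row in exp_z_r]
    return p_z_r, p_f_z


def log_likelihood(counts, p_z_r, p_f_z):
    """Observed-data log-likelihood: sum_{r,f} n(r,f) log sum_z P(z|r) P(f|z)."""
    total = 0.0
    for r, row in enumerate(counts):
        for f, n in enumerate(row):
            if n:
                p = sum(p_z_r[r][z] * p_f_z[z][f] for z in range(len(p_f_z)))
                total += n * math.log(p)
    return total


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_z P(z) log(P(z) / Q(z)); eps keeps log() finite."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """KL is asymmetric; averaging both directions gives a symmetric score."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    # Toy data: reviewers 0-1 share one feature profile, reviewers 2-3 another.
    counts = [[10, 8, 0, 0], [9, 10, 0, 0], [0, 0, 10, 9], [0, 1, 8, 10]]
    p_z_r, _ = plsa_em(counts, n_topics=2)
    for r in range(len(counts)):
        row = [symmetric_kl(p_z_r[r], p_z_r[s]) for s in range(len(counts))]
        print(r, ["%.3f" % d for d in row])
```

Because KL divergence is asymmetric, a pairwise comparison of reviewers needs either a fixed direction or a symmetrization such as the averaged form used here; which choice EMKL makes is not stated in the abstract. A lower score means two reviewers have more similar topic distributions, and the `eps` term only keeps the logarithm finite when a topic probability approaches zero.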


Cited By

  • (2024) A Study on Fake Review Detection Based on RoBERTa and Behavioral Features. Procedia Computer Science 242, 1323-1330. DOI: 10.1016/j.procs.2024.08.131

    Information

    Published In

    ACM Conferences
    WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
    December 2021
    541 pages
    ISBN:9781450391870
    DOI:10.1145/3498851
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Expectation maximization
    2. Fraudulent reviewer detection
    3. Imbalanced data
    4. Kullback–Leibler (KL) divergence

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WI-IAT '21
    WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence
    December 14 - 17, 2021
    Melbourne, VIC, Australia

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 12 Sep 2024
