Detection of spam reviews through a hierarchical attention architecture with N-gram CNN and Bi-LSTM

Published: 01 January 2022

Abstract

Spam reviews mislead consumers' decisions and can seriously undermine fair trading in online markets. Existing methods for detecting spam reviews mainly rely on features designed from linguistic and psychological cues, and they rarely capture the underlying semantics. Recent work applies deep learning to capture semantic features, but these models neither extract multi-granularity information from the text structure nor consider the mutual influence among sentences. We propose a hierarchical attention network in which distinct attention mechanisms are used at the two layers to capture important, comprehensive, and multi-granularity semantic information. At the first layer, we use an N-gram CNN to extract the multi-granularity semantics of the sentences. At the second layer, we use a combination of a convolution structure and a Bi-LSTM to extract the important and comprehensive semantics of a document. Extensive experiments on public datasets demonstrate that our model outperforms the state-of-the-art baselines, improving the F1 score to 89.3% in the mixed-domain setting (an absolute improvement of 4.8 points), to 92.8% in the Doctor domain (9.9 points), to 86.1% in the Hotel domain (2.4 points), and to 84.7% in the cross-domain setting (10.4 points).
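The abstract gives each F1 score together with its absolute improvement but does not state the scores it is compared against. As a small worked calculation, the sketch below subtracts each improvement from the reported F1 to recover the implied comparison score. It assumes each improvement is measured against a single reference baseline in that setting; the name `implied_reference_f1` is hypothetical and used only for illustration.

```python
# Reported F1 (%) and absolute improvement (points), copied from the abstract.
reported = {
    "mixed-domain": (89.3, 4.8),
    "Doctor": (92.8, 9.9),
    "Hotel": (86.1, 2.4),
    "cross-domain": (84.7, 10.4),
}


def implied_reference_f1(f1, gain):
    """Score the improvement is implicitly measured against (assumption:
    the gain is a plain difference in percentage points)."""
    return round(f1 - gain, 1)


implied = {
    setting: implied_reference_f1(f1, gain)
    for setting, (f1, gain) in reported.items()
}

for setting, reference in implied.items():
    print(f"{setting}: {reference:.1f}% -> {reported[setting][0]:.1f}%")
```

Under that assumption, the largest gap is in the cross-domain setting, where the implied reference score is also the lowest of the four.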

Highlights

We propose a novel hierarchical attention architecture for spam review detection.
The Word2Sent level captures multi-granularity, informative semantics.
The Sent2Doc level extracts comprehensive and important information.
Extensive experiments demonstrate that our model has superior detection performance.
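The two-level structure named in the highlights can be sketched roughly as follows. This is a minimal pure-Python illustration of the general idea, not the authors' implementation. The n-gram step is a stand-in for a learned convolution (it averages each window and applies a ReLU). The Bi-LSTM and the second-level convolution are omitted entirely. All function names, the context vectors, and the toy input are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Attention pooling: score each vector against a context vector,
    normalise the scores with softmax, and return the weighted sum."""
    scores = [
        sum(math.tanh(x) * c for x, c in zip(vec, context)) for vec in vectors
    ]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)
    ]
    return pooled, weights


def ngram_features(word_vectors, n):
    """Stand-in for a width-n convolution: ReLU of the mean of each
    window of n consecutive word vectors (no learned filters here)."""
    dim = len(word_vectors[0])
    windows = [word_vectors[i:i + n] for i in range(len(word_vectors) - n + 1)]
    return [
        [max(0.0, sum(w[d] for w in window) / n) for d in range(dim)]
        for window in windows
    ]


def encode_sentence(word_vectors, context, widths=(1, 2, 3)):
    """Word-to-sentence level: collect n-gram features of several widths,
    then attention-pool them into one sentence vector.
    Assumes the sentence contains at least one word."""
    features = []
    for n in widths:
        if len(word_vectors) >= n:
            features.extend(ngram_features(word_vectors, n))
    sentence_vector, _ = attention_pool(features, context)
    return sentence_vector


def encode_document(sentences, word_context, sent_context):
    """Sentence-to-document level: attention-pool the sentence vectors.
    A fuller model would first pass them through a Bi-LSTM (omitted here)."""
    sentence_vectors = [encode_sentence(s, word_context) for s in sentences]
    return attention_pool(sentence_vectors, sent_context)


# Toy document: two sentences of 2-dimensional word vectors (made-up values).
doc = [
    [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]],
    [[0.4, 0.4]],
]
doc_vector, sentence_weights = encode_document(doc, [1.0, 0.5], [0.5, 1.0])
print(doc_vector, sentence_weights)
```

In a trained model, the window averages would be replaced by learned filters, the context vectors would be learned parameters, and the resulting document vector would feed a classifier that labels the review as spam or genuine.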

Published In

Information Systems, Volume 103, Issue C
Jan 2022
247 pages

Publisher

Elsevier Science Ltd.
United Kingdom

Author Tags

1. Deceptive review detection
2. Hierarchical attention
3. N-gram CNN
4. Document representation
5. Bi-LSTM
