research-article
Free access
Just Accepted

TransVAE-PAM: A Combined Transformer and DAG-based Approach for Enhanced Fake News Detection in Indian Context

Online AM: 02 March 2024 Publication History
  • Abstract

    In this study, we introduce a novel method, “TransVAE-PAM”, for classifying fake news articles, tailored specifically to the Indian context. The approach uses state-of-the-art contextual and sentence-transformer embedding models to generate article embeddings. To keep the model compact, we employ a Variational Autoencoder (VAE) and a β-VAE to reduce the dimensionality of the embeddings, yielding compact latent representations. To capture the important topics in the news articles, we use the Pachinko Allocation Model (PAM), a Directed Acyclic Graph (DAG) based topic model. These two representations, the reduced-dimension embeddings from the VAE and the topics extracted by PAM, are fused into a single feature set, which is then passed to five different classification methods. We test embedding generation with eight distinct transformer-based architectures. To validate the feasibility of the proposed approach, we conducted extensive experiments on a proprietary dataset sourced from “Times of India” and other online media. Given the size of the dataset, the large-scale experiments were run on an NVIDIA supercomputer. Using the DistilBERT transformer architecture, we achieve an accuracy of 96.2% and an F1 score of 96%. Complementing the method with topic modeling improves both the accuracy and the F1 score to 97%. These results indicate a promising direction: integrating advanced topic models into existing classification schemes to improve fake news detection.
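    The two building blocks named in the abstract, the β-VAE objective and the fusion of latent embeddings with topic features, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the reconstruction loss, the value of β, or how the two representations are fused, so the squared-error term, `beta=4.0`, and plain concatenation are all assumptions, as are the function names.

```python
import math


def kl_diag_gaussian(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to the standard normal N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Reconstruction error plus a beta-weighted KL term.

    Assumptions: squared-error reconstruction and beta=4.0 are illustrative;
    beta=1.0 recovers the standard VAE objective.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_diag_gaussian(mu, logvar)


def fuse(latent, topic_dist):
    """Fuse a compact latent embedding with per-topic proportions.

    Assumption: fusion is plain concatenation; the abstract only says the
    two representations are "fused together".
    """
    return list(latent) + list(topic_dist)


# Hypothetical values: a 2-d latent code and a 2-topic distribution.
features = fuse([0.1, 0.2], [0.7, 0.3])
```

    The fused vector `features` is what a downstream classifier would consume. Larger values of `beta` put more weight on matching the standard normal prior, at the cost of reconstruction quality.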


          Information & Contributors

          Information

          Published In

          ACM Transactions on Asian and Low-Resource Language Information Processing Just Accepted
          ISSN:2375-4699
          EISSN:2375-4702

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Online AM: 02 March 2024
          Accepted: 24 February 2024
          Revised: 18 December 2023
          Received: 09 October 2023


          Author Tags

          1. Fake news
          2. Embeddings
          3. Autoencoders
          4. Topic Modelling
          5. DAG

          Qualifiers

          • Research-article
