Research article · Open access

Bi-Directional Recurrent Attentional Topic Model

Published: 28 September 2020

Abstract

    In a document, the topic distribution of a sentence depends both on the topics of its neighboring sentences and on its own content, and the neighboring sentences typically influence it with different weights. The neighboring sentences of a sentence include both the preceding and the subsequent sentences. Meanwhile, a document can naturally be treated as a sequence of sentences. Most existing work on Bayesian document modeling does not take these points into consideration. To fill this gap, we propose a bi-Directional Recurrent Attentional Topic Model (bi-RATM) for document embedding. The bi-RATM not only takes advantage of the sequential order among sentences but also uses an attention mechanism to model the relations among successive sentences. To support the bi-RATM, we propose a bi-Directional Recurrent Attentional Bayesian Process (bi-RABP) to handle the sequences. Based on the bi-RABP, the bi-RATM fully utilizes the bi-directional sequential information of the sentences in a document. An online bi-RATM is proposed to handle large-scale corpora. Experiments on two corpora show that the proposed model outperforms state-of-the-art methods on document modeling and classification.
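    The core idea in the abstract, that a sentence's topic vector is an attention-weighted mix of its own content and the topic vectors of its preceding and subsequent sentences, can be illustrated with a small self-contained sketch. Everything below (function names, similarity-based attention scores, the deterministic convex-combination update) is an illustrative assumption for exposition; the actual bi-RATM places the attention weights inside a Bayesian generative process (the bi-RABP) rather than computing them as a post-hoc update.

    ```python
    import math

    def softmax(scores):
        """Numerically stable softmax over a list of scores."""
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def attended_topic(sentences, i, window=2):
        """Toy bi-directional attention over sentence topic vectors.

        `sentences` is a list of K-dimensional topic-proportion vectors.
        The topic vector of sentence i is re-estimated as an attention-
        weighted mix of its own vector and the vectors of up to `window`
        preceding and `window` subsequent sentences; each neighbor's
        weight comes from its similarity to sentence i, so different
        neighbors contribute with different weights.
        """
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        neigh = list(range(lo, hi))  # preceding, self, and subsequent
        # Attention scores: similarity of each neighbor to sentence i.
        scores = [dot(sentences[i], sentences[j]) for j in neigh]
        weights = softmax(scores)
        k_dim = len(sentences[i])
        mixed = [sum(w * sentences[j][k] for w, j in zip(weights, neigh))
                 for k in range(k_dim)]
        return mixed, dict(zip(neigh, weights))

    # Four sentences over three topics; the first two lean to topic 0,
    # the last two to topic 1.
    docs = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
    ]
    mixed, w = attended_topic(docs, 1)
    ```

    Because the update is a convex combination of probability vectors, `mixed` remains a valid topic distribution, and sentence 1's most similar neighbor (sentence 0) receives the largest attention weight.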




    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 6
    December 2020, 376 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3427188
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 September 2020
    Accepted: 01 July 2020
    Revised: 01 February 2020
    Received: 01 December 2018
    Published in TKDD Volume 14, Issue 6


    Author Tags

    1. Bi-directional recurrent attentions
    2. recurrent attentional Bayesian process
    3. topic modeling

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Key-Area Research and Development Program
    • Basic and Applied Basic Research Fund
    • Guangdong Science and Technology
    • National Key-Area Research and Development Program

    Article Metrics

    • Downloads (Last 12 months)313
    • Downloads (Last 6 weeks)52
    Reflects downloads up to 11 Aug 2024


    Cited By

    • (2024) Hidden Variable Models in Text Classification and Sentiment Analysis. Electronics 13:10 (1859). DOI: 10.3390/electronics13101859. Online publication date: 10-May-2024.
    • (2024) A transformer-based neural network framework for full names prediction with abbreviations and contexts. Data & Knowledge Engineering 150:C. DOI: 10.1016/j.datak.2023.102275. Online publication date: 1-Mar-2024.
    • (2023) A survey of topic models. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 45:6 (9929–9953). DOI: 10.3233/JIFS-233551. Online publication date: 1-Jan-2023.
    • (2023) Advancing Multinomial Regression and Topic Modeling with Beta-Liouville Distributions. 2023 International Conference on Machine Learning and Applications (ICMLA) (1928–1935). DOI: 10.1109/ICMLA58977.2023.00292. Online publication date: 15-Dec-2023.
    • (2023) Generalized Dirichlet-Multinomial Regression: Leveraging Arbitrary Features for Topic Modelling. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (884–891). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00128. Online publication date: 17-Dec-2023.
    • (2023) On the modeling of cyber-attacks associated with social engineering. Journal of Information Security and Applications 75:C. DOI: 10.1016/j.jisa.2023.103501. Online publication date: 26-Jul-2023.
    • (2022) An Improved Gray Wolf Optimization Algorithm with a Novel Initialization Method for Community Detection. Mathematics 10:20 (3805). DOI: 10.3390/math10203805. Online publication date: 15-Oct-2022.
    • (2022) A user-based topic model with topical word embeddings for semantic modelling in social network. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 43:1 (1467–1480). DOI: 10.3233/JIFS-212614. Online publication date: 1-Jan-2022.
    • (2022) Social Group Query Based on Multi-Fuzzy-Constrained Strong Simulation. ACM Transactions on Knowledge Discovery from Data 16:3 (1–27). DOI: 10.1145/3481640. Online publication date: 30-Jun-2022.
    • (2022) Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis. Artificial Intelligence Review 56:6 (5133–5260). DOI: 10.1007/s10462-022-10254-w. Online publication date: 26-Oct-2022.
