DOI: 10.1145/3132847.3132952
Research article, CIKM '17 Conference Proceedings

Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory

Published: 06 November 2017

Abstract

Deep Generative Models (DGMs) can extract high-level representations from massive unlabeled data and are explainable from a probabilistic perspective. These characteristics make them well suited to sequence modeling. Modeling sequences with DGMs, however, remains a major challenge: unlike real-valued data, which can be fed into models directly, sequence data consist of discrete elements and must first be transformed into suitable representations. This raises two difficulties. First, high-level features are sensitive to small variations of the inputs as well as to the way the data are represented. Second, models are more likely to lose long-term information over multiple transformations. In this paper, we propose a Hierarchical Deep Generative Model with Dual Memory to address these two challenges, together with a method to perform inference and learning on the model efficiently. The proposed model extends basic DGMs with an improved, hierarchically organized multi-layer architecture. In addition, it incorporates memories along dual directions, denoted broad memory and deep memory. The model is trained end-to-end by optimizing a variational lower bound on the data log-likelihood using an improved stochastic variational method. We perform experiments on several tasks with various datasets and obtain excellent results. On language modeling, our method significantly outperforms state-of-the-art results in terms of generative performance. Extended experiments on document modeling and sentiment analysis demonstrate the effectiveness of the dual memory mechanism and the latent representations. Random text generation gives a direct illustration of the model's advantages.
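
The training objective mentioned above, a variational lower bound on the data log-likelihood, takes the standard ELBO form; the sketch below states that generic bound for a sequence x_{1:T} with latent variables z. The hierarchical, dual-memory factorization of the generative model and the inference network used in the paper is not detailed in this abstract, so only the generic bound is shown here.

% Generic variational lower bound (ELBO) on the sequence log-likelihood;
% p_\theta is the generative model, q_\phi the approximate posterior
% (inference network). The paper's specific factorization is not assumed.
\log p_\theta(x_{1:T})
  \;\geq\; \mathbb{E}_{q_\phi(z \mid x_{1:T})}\!\left[ \log p_\theta(x_{1:T} \mid z) \right]
         - \mathrm{KL}\!\left( q_\phi(z \mid x_{1:T}) \,\|\, p_\theta(z) \right)
  \;=\; \mathcal{L}(\theta, \phi; x_{1:T})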


Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017, 2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dual memory mechanism
  2. hierarchical deep generative models
  3. inference and learning
  4. sequence modeling

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
