DOI: 10.1145/3132847.3132952
Research article, CIKM '17 Conference Proceedings

Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory

Published: 06 November 2017

Abstract

Deep Generative Models (DGMs) can extract high-level representations from massive unlabeled data and are explainable from a probabilistic perspective. These characteristics make them well suited to sequence modeling. Modeling sequences with DGMs, however, remains a major challenge: unlike real-valued data, which can be fed into models directly, sequence data consist of discrete elements and must first be transformed into suitable representations. This raises two difficulties. First, high-level features are sensitive to small variations of the inputs as well as to the way the data are represented. Second, models are more likely to lose long-term information over multiple transformations. In this paper, we propose a Hierarchical Deep Generative Model with Dual Memory to address these two challenges, together with a method to perform inference and learning on the model efficiently. The proposed model extends basic DGMs with an improved, hierarchically organized multi-layer architecture. In addition, it incorporates memories along dual directions, denoted broad memory and deep memory. The model is trained end-to-end by optimizing a variational lower bound on the data log-likelihood using an improved stochastic variational method. We perform experiments on several tasks with various datasets and obtain excellent results. On language modeling, our method significantly outperforms state-of-the-art results in terms of generative performance. Extended experiments on document modeling and sentiment analysis demonstrate the effectiveness of the dual memory mechanism and the latent representations. Random text generation gives a direct illustration of the model's advantages.
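
The training objective mentioned above, a variational lower bound on the data log-likelihood, takes the standard ELBO form; the sketch below states that generic bound for a sequence x_{1:T} with latent variables z. The hierarchical, dual-memory factorization of the generative model and the inference network used in the paper is not detailed in this abstract, so only the generic bound is shown here.

% Generic variational lower bound (ELBO) on the sequence log-likelihood;
% p_\theta is the generative model, q_\phi the approximate posterior
% (inference network). The paper's specific factorization is not assumed.
\log p_\theta(x_{1:T})
  \;\geq\; \mathbb{E}_{q_\phi(z \mid x_{1:T})}\!\left[ \log p_\theta(x_{1:T} \mid z) \right]
         - \mathrm{KL}\!\left( q_\phi(z \mid x_{1:T}) \,\|\, p_\theta(z) \right)
  \;=\; \mathcal{L}(\theta, \phi; x_{1:T})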


Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017, 2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dual memory mechanism
  2. hierarchical deep generative models
  3. inference and learning
  4. sequence modeling

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
