Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations

Published: 13 October 2020

  • Abstract

    In this article, condition-transforming variational autoencoders (CTVAEs) are proposed for generating diverse short text conversations. In conditional variational autoencoders (CVAEs), the prior distribution of the latent variable z is a multivariate Gaussian whose mean and variance are modulated by the input conditions. Previous work found that this distribution tends to become condition-independent in practical applications. This article therefore designs CTVAEs to enhance the influence of conditions in CVAEs. In a CTVAE model, the latent variable z is sampled by performing a non-linear transformation on the combination of the input conditions and samples from a condition-independent prior distribution N(0, I). In our experiments on a Chinese Sina Weibo dataset, the CTVAE model derives z samples for decoding with stronger condition-dependency than the CVAE model. The earth mover’s distance (EMD) between the distributions of the latent variable z at the training and testing stages is also reduced by using the CTVAE model. In subjective preference tests, our proposed CTVAE model performs significantly better than CVAE and sequence-to-sequence (Seq2Seq) models at generating diverse, informative, and topic-relevant responses.
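    The sampling mechanism described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's actual architecture: the MLP parameters, dimensions, and the one-dimensional EMD check on a single coordinate of z are all illustrative assumptions (the paper measures EMD between the full z distributions of the learned model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper).
cond_dim, eps_dim, hid_dim, z_dim = 4, 8, 16, 8

# Parameters of a small MLP standing in for the learned transformation.
W1 = rng.standard_normal((hid_dim, cond_dim + eps_dim)) * 0.1
b1 = np.zeros(hid_dim)
W2 = rng.standard_normal((z_dim, hid_dim)) * 0.1
b2 = np.zeros(z_dim)

def sample_z(condition):
    """Draw eps from the condition-independent prior N(0, I), then map
    the combination [condition; eps] through a non-linear transformation
    to obtain z, as in the CTVAE sampling scheme."""
    eps = rng.standard_normal(eps_dim)
    h = np.tanh(W1 @ np.concatenate([condition, eps]) + b1)
    return W2 @ h + b2

# z is condition-dependent by construction: two different conditions
# induce two different distributions over z.
c1 = rng.standard_normal(cond_dim)
c2 = rng.standard_normal(cond_dim)
z1 = np.stack([sample_z(c1) for _ in range(500)])
z2 = np.stack([sample_z(c2) for _ in range(500)])

# 1-D earth mover's distance between the first coordinates of the two
# sample sets: for equal-sized samples with uniform weights, this is the
# mean absolute difference of the sorted values.
emd = np.mean(np.abs(np.sort(z1[:, 0]) - np.sort(z2[:, 0])))
```

    Because eps is drawn from a fixed N(0, I) and the condition enters the transformation directly, the condition cannot be "ignored" by the prior, which is the property the CTVAE is designed to enforce.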


    Cited By

    • (2023) Preference-controlled multi-objective reinforcement learning for conditional text generation. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. DOI: 10.1609/aaai.v37i11.26490, 12662–12672. Online publication date: 7-Feb-2023.
    • (2022) Method for evaluating quality of automatically generated product descriptions. In Proceedings of the 11th International Symposium on Information and Communication Technology. DOI: 10.1145/3568562.3568583, 52–58. Online publication date: 1-Dec-2022.
    • (2022) GOWSeqStream: An integrated sequential embedding and graph-of-words for short text stream clustering. Neural Computing and Applications 34, 6 (2022), 4321–4341. DOI: 10.1007/s00521-021-06563-w. Online publication date: 1-Mar-2022.


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 6
      November 2020
      277 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3426881
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 October 2020
      Accepted: 01 May 2020
      Revised: 01 March 2020
      Received: 01 July 2019
      Published in TALLIP Volume 19, Issue 6


      Author Tags

      1. Neural network
      2. conversation
      3. text generation
      4. variational autoencoder

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
