Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations

Published: 13 October 2020

  • Abstract

    In this article, condition-transforming variational autoencoders (CTVAEs) are proposed for generating diverse short text conversations. In conditional variational autoencoders (CVAEs), the prior distribution of the latent variable z is a multivariate Gaussian whose mean and variance are modulated by the input conditions. Previous work found that this distribution tends to become condition-independent in practical applications. This article therefore designs CTVAEs to enhance the influence of conditions in CVAEs. In a CTVAE model, the latent variable z is sampled by performing a non-linear transformation on the combination of the input conditions and samples from a condition-independent prior distribution N(0, I). In our experiments on a Chinese Sina Weibo dataset, the CTVAE model derives z samples for decoding with stronger condition-dependency than the CVAE model. The earth mover’s distance (EMD) between the distributions of the latent variable z at the training and testing stages is also reduced by using the CTVAE model. In subjective preference tests, our proposed CTVAE model performs significantly better than CVAE and sequence-to-sequence (Seq2Seq) models at generating diverse, informative, and topic-relevant responses.
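    The sampling mechanism described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's actual architecture: the MLP parameters, dimensions, and the one-dimensional EMD check on a single coordinate of z are all illustrative assumptions (the paper measures EMD between the full z distributions of the learned model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper).
cond_dim, eps_dim, hid_dim, z_dim = 4, 8, 16, 8

# Parameters of a small MLP standing in for the learned transformation.
W1 = rng.standard_normal((hid_dim, cond_dim + eps_dim)) * 0.1
b1 = np.zeros(hid_dim)
W2 = rng.standard_normal((z_dim, hid_dim)) * 0.1
b2 = np.zeros(z_dim)

def sample_z(condition):
    """Draw eps from the condition-independent prior N(0, I), then map
    the combination [condition; eps] through a non-linear transformation
    to obtain z, as in the CTVAE sampling scheme."""
    eps = rng.standard_normal(eps_dim)
    h = np.tanh(W1 @ np.concatenate([condition, eps]) + b1)
    return W2 @ h + b2

# z is condition-dependent by construction: two different conditions
# induce two different distributions over z.
c1 = rng.standard_normal(cond_dim)
c2 = rng.standard_normal(cond_dim)
z1 = np.stack([sample_z(c1) for _ in range(500)])
z2 = np.stack([sample_z(c2) for _ in range(500)])

# 1-D earth mover's distance between the first coordinates of the two
# sample sets: for equal-sized samples with uniform weights, this is the
# mean absolute difference of the sorted values.
emd = np.mean(np.abs(np.sort(z1[:, 0]) - np.sort(z2[:, 0])))
```

    Because eps is drawn from a fixed N(0, I) and the condition enters the transformation directly, the condition cannot be "ignored" by the prior, which is the property the CTVAE is designed to enforce.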


    Cited By

    • (2023) Preference-controlled multi-objective reinforcement learning for conditional text generation. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. DOI: 10.1609/aaai.v37i11.26490, 12662–12672. Online publication date: 7-Feb-2023.
    • (2022) Method for evaluating quality of automatically generated product descriptions. In Proceedings of the 11th International Symposium on Information and Communication Technology. DOI: 10.1145/3568562.3568583, 52–58. Online publication date: 1-Dec-2022.
    • (2022) GOWSeqStream: An integrated sequential embedding and graph-of-words for short text stream clustering. Neural Computing and Applications 34, 6 (2022), 4321–4341. DOI: 10.1007/s00521-021-06563-w. Online publication date: 1-Mar-2022.


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 6
      November 2020
      277 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3426881
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 October 2020
      Accepted: 01 May 2020
      Revised: 01 March 2020
      Received: 01 July 2019
      Published in TALLIP Volume 19, Issue 6


      Author Tags

      1. Neural network
      2. conversation
      3. text generation
      4. variational autoencoder

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
