DOI: 10.1145/3397271.3401250
short-paper

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

Published: 25 July 2020
  • Abstract

    A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which requires more than language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training a chatbot end-to-end to maximize its long-term return from offline multi-turn dialogue datasets. Our framework uses hierarchical reinforcement learning (HRL): the high-level policy determines sub-goals that guide the conversation towards the final goal, and the low-level policy fulfills each sub-goal by generating the corresponding response utterance. In experiments on a real-world anti-fraud dialogue dataset from the financial domain, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
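
    The two-level control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the sub-goal labels, the response templates, and the discount factor are all hypothetical, and both policies are stand-ins (a random choice and a template lookup) where GoChat would use learned models. The sketch only shows how a high-level policy picks a sub-goal each turn, how a low-level policy turns that sub-goal into an utterance, and how a long-term (discounted) return is computed from per-turn rewards.

```python
import random
from typing import Dict, List, Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Long-term return: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class HighLevelPolicy:
    """Picks a sub-goal for the next turn (placeholder: uniform random choice)."""

    def __init__(self, sub_goals: Sequence[str], seed: int = 0) -> None:
        self.sub_goals = list(sub_goals)
        self.rng = random.Random(seed)

    def select(self, context: List[str]) -> str:
        # A learned policy would condition on the dialogue context here.
        return self.rng.choice(self.sub_goals)


class LowLevelPolicy:
    """Realizes a sub-goal as an utterance (placeholder: template lookup)."""

    def __init__(self, templates: Dict[str, str]) -> None:
        self.templates = templates

    def generate(self, context: List[str], sub_goal: str) -> str:
        # A learned policy would decode a response token by token here.
        return self.templates[sub_goal]


def run_dialogue(
    high: HighLevelPolicy, low: LowLevelPolicy, user_turns: Sequence[str]
) -> List[Tuple[str, str]]:
    """Alternate user turns with bot turns; return (sub_goal, response) pairs."""
    context: List[str] = []
    trace: List[Tuple[str, str]] = []
    for user_utterance in user_turns:
        context.append(user_utterance)
        sub_goal = high.select(context)             # high level: what to aim for
        response = low.generate(context, sub_goal)  # low level: how to say it
        context.append(response)
        trace.append((sub_goal, response))
    return trace


# Hypothetical sub-goals and templates, invented purely for illustration.
TEMPLATES = {
    "verify_identity": "Could you confirm the name on the account?",
    "ask_transaction": "Can you describe the most recent transaction?",
    "close": "Thank you, that is all I need.",
}

if __name__ == "__main__":
    bot_turns = run_dialogue(
        HighLevelPolicy(list(TEMPLATES)), LowLevelPolicy(TEMPLATES), ["Hello", "Sure"]
    )
    for sub_goal, response in bot_turns:
        print(f"[{sub_goal}] {response}")
    # Toy per-turn rewards; a real system would derive them from dialogue outcomes.
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))
```

    Separating the two policies this way means the sub-goal sequence can be inspected independently of the generated wording, which is one reason a hierarchical decomposition is attractive for goal-oriented dialogue.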

    Supplementary Material

    MP4 File (3397271.3401250.mp4)
    A presentation video for SIGIR 2020 on the paper "GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning".


    Cited By

    • (2024) Error Correction and Adaptation in Conversational AI: A Review of Techniques and Applications in Chatbots. AI 5:2 (803-841). DOI: 10.3390/ai5020041. Online publication date: 4-Jun-2024
    • (2024) A Target-Driven Planning Approach for Goal-Directed Dialog Systems. IEEE Transactions on Neural Networks and Learning Systems 35:8 (10475-10487). DOI: 10.1109/TNNLS.2023.3242071. Online publication date: Aug-2024
    • (2024) Learning from Failure: Towards Developing a Disease Diagnosis Assistant That Also Learns from Unsuccessful Diagnoses. Cognitive Computation 16:5 (2222-2240). DOI: 10.1007/s12559-024-10274-4. Online publication date: 27-Jun-2024

      Published In

      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. dialogue system
      2. goal-oriented chatbot
      3. reinforcement learning

      Qualifiers

      • Short-paper

      Funding Sources

      • National Natural Science Foundation of China

      Conference

      SIGIR '20

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 49
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 09 Aug 2024

      Cited By

      • (2023) Hierarchical diffusion for offline decision making. Proceedings of the 40th International Conference on Machine Learning (20035-20064). DOI: 10.5555/3618408.3619235. Online publication date: 23-Jul-2023
      • (2023) A Knowledge-Enhanced Hierarchical Reinforcement Learning-Based Dialogue System for Automatic Disease Diagnosis. Electronics 12:24 (4896). DOI: 10.3390/electronics12244896. Online publication date: 5-Dec-2023
      • (2023) Goal-oriented conversational bot for employment domain. Technical Sciences 26. DOI: 10.31648/ts.9333. Online publication date: 8-Nov-2023
      • (2023) Confident Action Decision via Hierarchical Policy Learning for Conversational Recommendation. Proceedings of the ACM Web Conference 2023 (1386-1395). DOI: 10.1145/3543507.3583536. Online publication date: 30-Apr-2023
      • (2023) Toward Symptom Assessment Guided Symptom Investigation and Disease Diagnosis. IEEE Transactions on Artificial Intelligence 4:6 (1752-1766). DOI: 10.1109/TAI.2023.3236897. Online publication date: Dec-2023
      • (2022) A knowledge infused context driven dialogue agent for disease diagnosis using hierarchical reinforcement learning. Knowledge-Based Systems 242 (108292). DOI: 10.1016/j.knosys.2022.108292. Online publication date: Apr-2022
      • (2022) Design and Development of Chatbot Based on Reinforcement Learning. Machine Learning Algorithms for Signal and Image Processing (219-229). DOI: 10.1002/9781119861850.ch12. Online publication date: 18-Nov-2022
