Abstract
Dialogue generation is a popular research topic in natural language processing, but improving the quality of model-generated responses using user feedback remains an open challenge. In this paper, we propose a dialogue generation method based on user feedback: we model the likeability expressed in user feedback and optimize the model with Reinforcement Learning from Human Feedback (RLHF) to generate responses that users find more likeable. We also introduce commonsense inference to help the model better understand the knowledge context and user intent. Finally, we apply contrastive search at the decoding stage to make the generated responses more diverse. To verify the effectiveness of our approach, we conducted experiments comparing it against baseline models. The results show that our approach outperforms the baselines on automatic evaluation metrics. In the final evaluation, our model ranked 2nd in the NLPCC 2023 Shared Task 9 Track 2.
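For readers unfamiliar with contrastive search (Su et al., NeurIPS 2022), the decoding strategy named in the abstract, the following is a minimal sketch of one decoding step. The function and variable names are illustrative assumptions, not the authors' implementation: at each step, the next token is chosen from the model's top-k candidates by trading off model confidence against a degeneration penalty (the candidate's maximum cosine similarity to already-generated tokens), which discourages repetitive responses.

import numpy as np

def contrastive_search_step(probs, cand_ids, cand_hidden, prev_hidden, alpha=0.6):
    """Pick the next token from the top-k candidates (illustrative sketch).

    probs:       model confidence p(v | x) for each of the k candidates
    cand_ids:    token ids of the k candidates
    cand_hidden: (k, d) hidden states the model would produce for each candidate
    prev_hidden: (t, d) hidden states of the tokens generated so far
    alpha:       trade-off between confidence and the degeneration penalty
    """
    # Cosine similarity of each candidate to every previously generated token.
    cand = cand_hidden / np.linalg.norm(cand_hidden, axis=1, keepdims=True)
    prev = prev_hidden / np.linalg.norm(prev_hidden, axis=1, keepdims=True)
    sim = cand @ prev.T                      # shape (k, t)
    degeneration_penalty = sim.max(axis=1)   # most similar previous token

    # Contrastive objective: stay confident, avoid repeating representations.
    scores = (1 - alpha) * probs - alpha * degeneration_penalty
    return cand_ids[int(np.argmax(scores))]

Larger alpha pushes the decoder toward more diverse output; alpha = 0 reduces this step to greedy selection over the top-k candidates.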
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62172086, 62272092).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, M., Wang, D., Feng, S., Zhang, Y. (2023). Generating Better Responses from User Feedback via Reinforcement Learning and Commonsense Inference. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14304. Springer, Cham. https://doi.org/10.1007/978-3-031-44699-3_34
DOI: https://doi.org/10.1007/978-3-031-44699-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44698-6
Online ISBN: 978-3-031-44699-3
eBook Packages: Computer Science, Computer Science (R0)