research-article

TopicAns: Topic-informed Architecture for Answer Recommendation on Technical Q&A Site

Authors: Yuanhang Yang, Wei He, Cuiyun Gao, Zenglin Xu, Xin Xia, and Chuanyi LiuAuthors Info & Claims

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1

Article No.: 20, Pages 1 - 25

https://doi.org/10.1145/3607189

Published: 24 November 2023 Publication History

Abstract

Technical Q&A sites, such as Stack Overflow and Ask Ubuntu, have been widely utilized by software engineers to seek support for development challenges. However, not all the raised questions get instant feedback, and the retrieved answers can vary in quality. The users can hardly avoid spending much time before solving their problems. Prior studies propose approaches to automatically recommend answers for the question posts on technical Q&A sites. However, the lengthiness and the lack of background knowledge issues limit the performance of answer recommendation on these sites. The irrelevant sentences in the posts may introduce noise to the semantics learning and prevent neural models from capturing the gist of texts. The lexical gap between question and answer posts further misleads current models to make failure recommendations. From this end, we propose a novel neural network named TopicAns for answer selection on technical Q&A sites. TopicAns aims at learning high-quality representations for the posts in Q&A sites with a neural topic model and a pre-trained model. This involves three main steps: (1) generating topic-aware representations of Q&A posts with the neural topic model, (2) incorporating the corpus-level knowledge from the neural topic model to enhance the deep representations generated by the pre-trained language model, and (3) determining the most suitable answer for a given query based on the topic-aware representation and the deep representation. Moreover, we propose a two-stage training technique to improve the stability of our model. We conduct comprehensive experiments on four benchmark datasets to verify our proposed TopicAns’s effectiveness. Experiment results suggest that TopicAns consistently outperforms state-of-the-art techniques by over 30% in terms of Precision@1.

References

[1]

Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. 2008. Knowledge sharing and Yahoo answers: Everyone knows something. In Proceedings of the 17th International Conference on World Wide Web. ACM, 665–674.

Digital Library

[2]

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003), 993–1022.

[3]

Fabio Calefato, Filippo Lanubile, and Nicole Novielli. 2019. An empirical assessment of best-answer prediction models in technical Q&A sites. Empir. Softw. Eng. 24, 2 (2019), 854–901.

Digital Library

[4]

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794.

Digital Library

[5]

Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Ricardo A. Baeza-Yates, Nivio Ziviani, Gary Marchionini, Alistair Moffat, and John Tait (Eds.). ACM, 400–407.

Digital Library

[6]

Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Viet Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Conference of the Association for Computational Linguistics, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 2978–2988.

[7]

Yang Deng, Wai Lam, Yuexiang Xie, Daoyuan Chen, Yaliang Li, Min Yang, and Ying Shen. 2020. Joint learning of answer selection and answer summary generation in community question answering. In Proceedings of the 34th AAAI Conference on Artificial Intelligence. AAAI Press, 7651–7658.

[8]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 4171–4186.

[9]

Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. 2015. Applying deep learning to answer selection: A study and an open task. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 813–820.

[10]

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020(Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536–1547.

[11]

Yoav Freund and Robert E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (1997), 119–139.

Digital Library

[12]

Zhipeng Gao, Xin Xia, David Lo, and John Grundy. 2020. Technical Q8A site answer recommendation via question boosting. ACM Trans. Softw. Eng. Methodol. 30, 1 (2020), 1–34.

Digital Library

[13]

Zhipeng Gao, Xin Xia, David Lo, John Grundy, Xindong Zhang, and Zhenchang Xing. 2023. I know what you are searching for: Code snippet recommendation from stack overflow posts. ACM Trans. Softw. Eng. Methodol. 32, 3 (2023), 1–42.

Digital Library

[14]

Junda He, Zhou Xin, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, and David Lo. 2023. Representation learning for stack overflow posts: How far are we? CoRR abs/2303.06853 (2023).

[15]

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry P. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. ACM, 2333–2338.

Digital Library

[16]

Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In Proceedings of the 8th International Conference on Learning Representations. OpenReview.net.

[17]

Zongcheng Ji, Fei Xu, Bin Wang, and Ben He. 2012. Question-answer topic model for question retrieval in community question answering. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 2471–2474.

Digital Library

[18]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick S. H. Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 6769–6781.

[19]

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.).

[20]

Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations. OpenReview.net.

[21]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger (Eds.). 1106–1114.

[22]

Yishu Miao, Edward Grefenstette, and Phil Blunsom. 2017. Discovering discrete latent topics with neural variational inference. In Proceedings of the 34th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 70), Doina Precup and Yee Whye Teh (Eds.). PMLR, 2410–2419.

[23]

Yishu Miao, Lei Yu, and Phil Blunsom. 2016. Neural variational inference for text processing. In Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 1727–1736.

Digital Library

[24]

Saikat Mondal, C. M. Khaled Saifullah, Avijit Bhattacharjee, Mohammad Masudur Rahman, and Chanchal K. Roy. 2021. Early detection and guidelines to improve unanswered questions on stack overflow. In Proceedings of the 14th Innovations in Software Engineering Conference, Durga Prasad Mohapatra, Samaresh Mishra, Tony Clark, Alpana Dubey, Richa Sharma, and Lov Kumar (Eds.). ACM, 9:1–9:11.

Digital Library

[25]

Liqiang Nie, Xiaochi Wei, Dongxiang Zhang, Xiang Wang, Zhipeng Gao, and Yi Yang. 2017. Data-driven answer selection in community QA systems. IEEE Trans. Knowl. Data Eng. 29, 6 (2017), 1186–1198.

Digital Library

[26]

Xipeng Qiu and Xuanjing Huang. 2015. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the 24th International Joint Conference on Artificial Intelligence. AAAI Press, 1305–1311.

Digital Library

[27]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, 3980–3990.

[28]

Andreas Rücklé, Nafise Sadat Moosavi, and Iryna Gurevych. 2019. COALA: A neural coverage-based approach for long answer selection with small data. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 31st Innovative Applications of Artificial Intelligence Conference, 9th AAAI Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 6932–6939.

[29]

Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. DiSAN: Directional self-attention network for RNN/CNN-free language understanding. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI Press, 5446–5455.

[30]

Xingwu Sun, Yanling Cui, Hongyin Tang, Qiuyu Zhu, Fuzheng Zhang, and Beihong Jin. 2021. TITA: A two-stage interaction and topic-aware text matching model. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tür, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou (Eds.). Association for Computational Linguistics, 5431–5440.

[31]

Ming Tan, Cicero Dos Santos, Bing Xiang, and Bowen Zhou. 2016. Improved representation learning for question answer matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 464–473.

[32]

Qiongjie Tian, Peng Zhang, and Baoxin Li. 2013. Towards predicting the best answers in community-based question-answering services. In Proceedings of the 7th International Conference on Weblogs and Social Media. The AAAI Press.

[33]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Conference on Neural Information Processing Systems. 5998–6008.

[34]

Shuohang Wang and Jing Jiang. 2017. A compare-aggregate model for matching text sequences. In Proceedings of the 5th International Conference on Learning Representations. OpenReview.net.

[35]

Wei Wu, Xu Sun, and Houfeng Wang. 2018. Question condensing networks for answer selection in community question answering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1746–1755.

[36]

Xiaobao Wu, Chunping Li, Yan Zhu, and Yishu Miao. 2020. Short text topic modeling with topic distribution quantization and negative sampling decoder. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1772–1782.

[37]

Bowen Xu, Zhenchang Xing, Xin Xia, and David Lo. 2017. AnswerBot: Automated generation of answer summary to developers’ technical questions. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, Grigore Rosu, Massimiliano Di Penta, and Tien N. Nguyen (Eds.). IEEE Computer Society, 706–716.

Digital Library

[38]

Jun Xu and Hang Li. 2007. AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Wessel Kraaij, Arjen P. de Vries, Charles L. A. Clarke, Norbert Fuhr, and Noriko Kando (Eds.). ACM, 391–398.

Digital Library

[39]

Yi Xu, Hai Zhao, and Zhuosheng Zhang. 2021. Topic-aware multi-turn dialogue modeling. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, 33rd Conference on Innovative Applications of Artificial Intelligence, 11th Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 14176–14184.

[40]

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question answering using enhanced lexical semantic models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. The Association for Computer Linguistics, 1744–1753. Retrieved from https://aclanthology.org/P13-1171/.

[41]

Seunghyun Yoon, Joongbo Shin, and Kyomin Jung. 2018. Learning to rank question-answer pairs using hierarchical recurrent encoder with latent topic clustering. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1575–1584.

[42]

Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R. Lyu, and Irwin King. 2018. Topic memory networks for short text classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3120–3131.

[43]

Xiaodong Zhang, Sujian Li, Lei Sha, and Houfeng Wang. 2017. Attentive interactive neural networks for answer selection in community question answering. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. AAAI Press, 3525–3531.

[44]

Qihao Zhu, Zeyu Sun, Xiran Liang, Yingfei Xiong, and Lu Zhang. 2020. OCoR: An overlapping-aware code retriever. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 883–894. DOI:

Digital Library

Index Terms

TopicAns: Topic-informed Architecture for Answer Recommendation on Technical Q&A Site
1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information retrieval

Recommendations

Question microblog identification and answer recommendation

Consulting through message-updating in social network is regarded as a popular way of information seeking. However, most questions cannot receive answers or suggestions timely, and some questions even fail to get replies. Thus, identifying microblogs ...
Read More
An answer recommendation framework for an online cancer community forum
Abstract
Health community forums are a kind of online platform to discuss various matters related to management of illness. People are increasingly searching for answers online, particularly when they are diagnosed with cancer like life-threatening ...
Read More
Technical Q8A Site Answer Recommendation via Question Boosting
Continuous Special Section: AI and SE

Software developers have heavily used online question-and-answer platforms to seek help to solve their technical problems. However, a major problem with these technical Q8A sites is “answer hungriness,” i.e., a large number of questions remain ...
Read More

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Software Engineering and Methodology

ACM Transactions on Software Engineering and Methodology Volume 33, Issue 1

January 2024

933 pages

ISSN:1049-331X

EISSN:1557-7392

DOI:10.1145/3613536

Editor:
Mauro Pezzè
USI Universitá della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Issue’s Table of Contents

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2023

Online AM: 11 July 2023

Accepted: 05 June 2023

Revised: 28 April 2023

Received: 09 December 2022

Published in TOSEM Volume 33, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
324
Total Downloads

Downloads (Last 12 months)324
Downloads (Last 6 weeks)23

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Full Text

View this article in Full Text.

Media

Figures

Other

Tables

View full text|Download PDF

View Issue’s Table of Contents