Abstract
When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can prevent these questions from receiving timely help. Therefore, improving the quality of question titles has attracted wide attention from researchers. An initial study aimed to generate titles automatically by analyzing only the code snippets in the question body, but it ignored the helpful information in the corresponding problem descriptions. We therefore propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them jointly. We then fine-tune the pre-trained language model CodeT5 to generate the titles automatically. Unfortunately, the inconsistency in inputs and optimization objectives between the pre-training task and our investigated task may prevent fine-tuning from fully exploiting the knowledge of the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow, comprising 179,119 high-quality question posts across six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our ablation studies further confirm the effectiveness of the component settings of SOTitle+ (such as bi-modal information, prompt learning, hybrid prompts, and multi-task learning). Our work indicates that considering bi-modal information and prompt learning for Stack Overflow title generation is a promising direction.
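The hybrid-prompt idea mentioned above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the function name, the `<soft>` placeholder token, and the instruction wording are all assumptions. It only shows how a fixed hard-prompt instruction and learnable soft-prompt placeholders (whose embeddings would be tuned during training) can be combined around the bi-modal inputs.

```python
# Minimal sketch of assembling a hybrid prompt for title generation.
# Hard prompt: fixed natural-language instruction text.
# Soft prompt: "<soft>" placeholder tokens whose embeddings would be
# learned during prompt-tuning (here they are just literal strings).

def build_hybrid_prompt(description: str, code: str, n_soft: int = 4) -> str:
    soft = " ".join(["<soft>"] * n_soft)  # placeholders for tunable embeddings
    return (
        f"{soft} Generate a Stack Overflow title. "
        f"Description: {description} Code: {code} {soft}"
    )

prompt = build_hybrid_prompt(
    "Pandas merge drops rows unexpectedly",
    "df1.merge(df2, on='id')",
)
print(prompt.count("<soft>"))  # soft block appears twice, 4 tokens each
```

In a real prompt-tuning setup the `<soft>` positions would be mapped to trainable embedding vectors while the model's other parameters stay frozen or are tuned jointly, depending on the configuration.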
Data Availability Statements
The datasets generated and analyzed during the current study are available in the GitHub repository (https://github.com/shaoyuyoung/SOTitlePlus).
Notes
https://archive.org/download/stackexchange, downloaded in March 2023
https://stackoverflow.com/tags?tab=popular, accessed in March 2023
An example used to clarify this process can be found in https://github.com/shaoyuyoung/SOTitlePlus/blob/main/embeddings.md
https://platform.openai.com/docs/api-reference. The version of ChatGPT we use is GPT-3.5-turbo.
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for their insightful comments and suggestions, which substantially improved the quality of this work. Shaoyu Yang and Xiang Chen contributed equally to this work and are co-first authors. Xiang Chen is the corresponding author. This work is supported in part by the National Natural Science Foundation of China (Grant nos. 61202006 and 61702041) and the Innovation Training Program for College Students (Grant nos. 2023214 and 2023356).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Sebastian Baltes.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, S., Chen, X., Liu, K. et al. Automatic bi-modal question title generation for Stack Overflow with prompt learning. Empir Software Eng 29, 63 (2024). https://doi.org/10.1007/s10664-024-10466-4