Abstract
When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can prevent these questions from receiving timely help. Therefore, improving the quality of question titles has attracted wide attention from researchers. An initial study aimed to generate titles automatically by analyzing only the code snippets in the question body, but it ignored the helpful information in the corresponding problem descriptions. We therefore propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them jointly. We then fine-tune the pre-trained language model CodeT5 to generate the titles automatically. Unfortunately, the inconsistency in inputs and optimization objectives between the pre-training task and our investigated task may prevent fine-tuning from fully exploiting the knowledge of the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow, comprising 179,119 high-quality question posts across six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our ablation studies further confirm the effectiveness of the component settings of SOTitle+ (such as bi-modal information, prompt learning, hybrid prompts, and multi-task learning). Our work indicates that considering bi-modal information and prompt learning for Stack Overflow title generation is a promising direction.
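The hybrid-prompt idea mentioned above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the function name, the `<soft>` placeholder token, and the instruction wording are all assumptions. It only shows how a fixed hard-prompt instruction and learnable soft-prompt placeholders (whose embeddings would be tuned during training) can be combined around the bi-modal inputs.

```python
# Minimal sketch of assembling a hybrid prompt for title generation.
# Hard prompt: fixed natural-language instruction text.
# Soft prompt: "<soft>" placeholder tokens whose embeddings would be
# learned during prompt-tuning (here they are just literal strings).

def build_hybrid_prompt(description: str, code: str, n_soft: int = 4) -> str:
    soft = " ".join(["<soft>"] * n_soft)  # placeholders for tunable embeddings
    return (
        f"{soft} Generate a Stack Overflow title. "
        f"Description: {description} Code: {code} {soft}"
    )

prompt = build_hybrid_prompt(
    "Pandas merge drops rows unexpectedly",
    "df1.merge(df2, on='id')",
)
print(prompt.count("<soft>"))  # soft block appears twice, 4 tokens each
```

In a real prompt-tuning setup the `<soft>` positions would be mapped to trainable embedding vectors while the model's other parameters stay frozen or are tuned jointly, depending on the configuration.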
Data Availability Statements
The datasets generated and analyzed during the current study are available in the GitHub repository (https://github.com/shaoyuyoung/SOTitlePlus).
Notes
https://archive.org/download/stackexchange, downloaded in March 2023
https://stackoverflow.com/tags?tab=popular, accessed in March 2023
An example used to clarify this process can be found in https://github.com/shaoyuyoung/SOTitlePlus/blob/main/embeddings.md
https://platform.openai.com/docs/api-reference. The version of ChatGPT we use is GPT-3.5-turbo.
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for their insightful comments and suggestions, which substantially improved the quality of this work. Shaoyu Yang and Xiang Chen contributed equally to this work and are co-first authors. Xiang Chen is the corresponding author. This work is supported in part by the National Natural Science Foundation of China (Grant nos. 61202006 and 61702041) and the Innovation Training Program for College Students (Grant nos. 2023214 and 2023356).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Sebastian Baltes.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, S., Chen, X., Liu, K. et al. Automatic bi-modal question title generation for Stack Overflow with prompt learning. Empir Software Eng 29, 63 (2024). https://doi.org/10.1007/s10664-024-10466-4