
FQN Inference in Partial Code by Prompt-tuned Language Model of Code

Published: 21 December 2023

Abstract

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts; they involve significant compilation overhead and are sensitive to unseen API names and code-context variations. In this article, we propose using a prompt-tuned code masked language model (MLM) as a neural knowledge base for type inference, called POME, which is lightweight and places minimal requirements on code compilation. Unlike existing symbol-name and context matching for type inference, POME infers FQNs from the syntax and usage knowledge encapsulated in the prompt-tuned code MLM through a cloze-style fill-in-the-blank strategy. POME is integrated as a plug-in into web browsers and integrated development environments (IDEs) to assist developers in inferring FQNs in real-world settings. We systematically evaluate POME on a large amount of source code from GitHub and Stack Overflow, and explore its generalization and hybrid capability. The results validate the effectiveness of the POME design and its applicability to partial-code type inference, and show that POME can be easily extended to different programming languages (PLs). POME can also be used to build a PL-hybrid type-inference model that provides a one-for-all solution. As the first of its kind, our neural type-inference method opens the door to many innovative ways of using partial code.
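To make the cloze-style fill-in-the-blank idea concrete, the minimal sketch below queries an off-the-shelf code MLM for a masked segment of a fully qualified name. It is an illustration only, not POME's implementation: the checkpoint name microsoft/codebert-base-mlm, the prompt wording, and the single-mask simplification are our assumptions, whereas POME prompt-tunes the MLM and decodes complete, variable-length FQNs.

    # Hypothetical sketch of cloze-style FQN querying with a code masked LM.
    # Assumptions: the "microsoft/codebert-base-mlm" checkpoint, the prompt
    # wording, and the single-<mask> simplification are illustrative only.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

    # The partial code uses the simple name "StringUtils"; the cloze prompt
    # asks the model to fill in one masked segment of its qualified name.
    prompt = (
        "String s = StringUtils.capitalize(name); "
        "StringUtils is short for org.apache.commons.<mask>.StringUtils"
    )

    for candidate in fill_mask(prompt, top_k=5):
        # Each candidate is one sub-token guess for the masked package segment;
        # a full system would score and assemble entire dotted names.
        print(candidate["token_str"].strip(), round(candidate["score"], 3))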

Cited By

  • (2024) Let’s Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3680469. Online publication date: 23-Jul-2024.

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
    February 2024, 947 pages
    EISSN: 1557-7392
    DOI: 10.1145/3618077
    • Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 December 2023
    Online AM: 24 August 2023
    Accepted: 24 July 2023
    Revised: 15 July 2023
    Received: 13 December 2022
    Published in TOSEM Volume 33, Issue 2

    Author Tags

    1. Type inference
    2. Fully qualified names
    3. Code masked language model
    4. Neural knowledge base

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Graduate Innovative Special Fund Projects of Jiangxi Province

