
FQN Inference in Partial Code by Prompt-tuned Language Model of Code

Published: 21 December 2023

Abstract

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts; they involve significant compilation overhead and are sensitive to unseen API names and code-context variations. In this article, we propose using a prompt-tuned code masked language model (MLM) as a neural knowledge base for type inference, called POME, which is lightweight and places minimal requirements on code compilation. Unlike existing symbol-name and context matching for type inference, POME infers FQNs from the syntax and usage knowledge encapsulated in the prompt-tuned code MLM through a cloze-style fill-in-the-blank strategy. POME is integrated as a plug-in into web browsers and integrated development environments (IDEs) to assist developers in inferring FQNs in real-world settings. We systematically evaluate POME on a large amount of source code from GitHub and Stack Overflow, and explore its generalization and hybrid capability. The results validate the effectiveness of the POME design and its applicability to partial-code type inference, and show that POME can be easily extended to different programming languages (PLs). POME can also be used to build a PL-hybrid type-inference model that provides a one-for-all solution. As the first of its kind, our neural type-inference method opens the door to many innovative ways of using partial code.
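To make the cloze-style fill-in-the-blank idea concrete, the minimal sketch below queries an off-the-shelf code MLM for a masked segment of a fully qualified name. It is an illustration only, not POME's implementation: the checkpoint name microsoft/codebert-base-mlm, the prompt wording, and the single-mask simplification are our assumptions, whereas POME prompt-tunes the MLM and decodes complete, variable-length FQNs.

    # Hypothetical sketch of cloze-style FQN querying with a code masked LM.
    # Assumptions: the "microsoft/codebert-base-mlm" checkpoint, the prompt
    # wording, and the single-<mask> simplification are illustrative only.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

    # The partial code uses the simple name "StringUtils"; the cloze prompt
    # asks the model to fill in one masked segment of its qualified name.
    prompt = (
        "String s = StringUtils.capitalize(name); "
        "StringUtils is short for org.apache.commons.<mask>.StringUtils"
    )

    for candidate in fill_mask(prompt, top_k=5):
        # Each candidate is one sub-token guess for the masked package segment;
        # a full system would score and assemble entire dotted names.
        print(candidate["token_str"].strip(), round(candidate["score"], 3))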

Cited By

  • (2024) Let’s Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3680469. Online publication date: 23-Jul-2024.

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
    February 2024, 947 pages
    EISSN: 1557-7392
    DOI: 10.1145/3618077
    • Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 December 2023
    Online AM: 24 August 2023
    Accepted: 24 July 2023
    Revised: 15 July 2023
    Received: 13 December 2022
    Published in TOSEM Volume 33, Issue 2

    Author Tags

    1. Type inference
    2. Fully qualified names
    3. Code masked language model
    4. Neural knowledge base

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Graduate Innovative Special Fund Projects of Jiangxi Province

