DOI: 10.1145/3385412.3385997

Typilus: neural type hints

Published: 11 June 2020

Abstract

Type inference over partial contexts in dynamically typed languages is challenging. In this work, we present a graph neural network model that predicts types by probabilistically reasoning over a program’s structure, names, and patterns. The network uses deep similarity learning to learn a TypeSpace — a continuous relaxation of the discrete space of types — and how to embed the type properties of a symbol (i.e. identifier) into it. Importantly, our model can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones. We realise our approach in Typilus, a type predictor for Python that combines the TypeSpace with an optional type checker. We show that Typilus accurately predicts types. Typilus confidently predicts types for 70% of all annotatable symbols; when it predicts a type, that type optionally type checks 95% of the time. Typilus can also find incorrect type annotations; two important and popular open source libraries, fairseq and allennlp, accepted our pull requests that fixed the annotation errors Typilus discovered.
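The core prediction step the abstract describes — embedding a symbol into a continuous TypeSpace and retrieving the nearest known types — can be sketched in a few lines. This is an illustrative toy only: the vectors, the `TYPE_SPACE` table, and `predict_type` are made up for exposition (the real system produces embeddings with a graph neural network trained via deep similarity learning, and retrieves neighbours with an approximate nearest-neighbour index).

```python
import math

# Hypothetical TypeSpace: each known type annotation maps to the centroid
# of the learned embeddings of symbols annotated with that type. A real
# model learns these vectors; the numbers below are illustrative.
TYPE_SPACE = {
    "int": (0.9, 0.1, 0.0),
    "str": (0.0, 0.8, 0.2),
    "List[int]": (0.7, 0.0, 0.6),
}

def predict_type(symbol_embedding, k=1):
    """Rank candidate types by Euclidean distance in the TypeSpace.

    Because prediction is nearest-neighbour retrieval rather than a fixed
    softmax over classes, adding a new (rare or user-defined) type only
    requires adding one embedding -- the one-shot, open-vocabulary setting.
    """
    ranked = sorted(
        TYPE_SPACE,
        key=lambda t: math.dist(TYPE_SPACE[t], symbol_embedding),
    )
    return ranked[:k]

# A symbol whose (hypothetical) learned embedding lies closest to `int`:
print(predict_type((0.85, 0.15, 0.05)))  # → ['int']
```

The retrieved candidates can then be handed to an optional type checker, which is how the paper filters predictions down to ones that type check.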





Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2020, 1174 pages
ISBN: 9781450376136
DOI: 10.1145/3385412

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. deep learning
  2. graph neural networks
  3. meta-learning
  4. structured learning
  5. type inference

Qualifiers

  • Research-article

Conference

PLDI '20

Acceptance Rates

Overall Acceptance Rate: 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 87
  • Downloads (last 6 weeks): 14
Reflects downloads up to 09 Nov 2024


Cited By

  • (2024) Towards Effective Static Type-Error Detection for Python. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1808–1820. DOI: 10.1145/3691620.3695545. Online publication date: 27-Oct-2024.
  • (2024) QuAC: Quick Attribute-Centric Type Inference for Python. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 2040–2069. DOI: 10.1145/3689783. Online publication date: 8-Oct-2024.
  • (2024) On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 1–10. DOI: 10.1145/3671016.3671389. Online publication date: 24-Jul-2024.
  • (2024) Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit. ACM Computing Surveys. DOI: 10.1145/3664597. Online publication date: 18-May-2024.
  • (2024) Generating Python Type Annotations from Type Inference: How Far Are We? ACM Transactions on Software Engineering and Methodology 33, 5, 1–38. DOI: 10.1145/3652153. Online publication date: 3-Jun-2024.
  • (2024) Feedback-Directed Partial Execution. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 781–793. DOI: 10.1145/3650212.3680320. Online publication date: 11-Sep-2024.
  • (2024) The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks. Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, 35–39. DOI: 10.1145/3650105.3652288. Online publication date: 14-Apr-2024.
  • (2024) Risky Dynamic Typing-related Practices in Python: An Empirical Study. ACM Transactions on Software Engineering and Methodology 33, 6, 1–35. DOI: 10.1145/3649593. Online publication date: 27-Jun-2024.
  • (2024) TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 49–53. DOI: 10.1145/3639478.3640033. Online publication date: 14-Apr-2024.
  • (2024) Dynamic Inference of Likely Symbolic Tensor Shapes in Python Machine Learning Programs. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, 147–156. DOI: 10.1145/3639477.3639718. Online publication date: 14-Apr-2024.
