DOI: 10.1145/3691620.3695545
research-article
Open access

Towards Effective Static Type-Error Detection for Python

Published: 27 October 2024

Abstract

In this experience paper, we design, implement, and evaluate a new static type-error detection tool for Python. To build a practical tool, we first collected and analyzed 68 real-world type errors from 20 open-source projects. This empirical investigation revealed four key static-analysis features that are crucial for detecting Python type errors effectively in practice. Building on these insights, we present Pyinder, a tool that detects 34 of the 68 bugs, whereas existing type analysis tools collectively detect only 16. We also discuss the remaining 34 bugs that Pyinder failed to detect, offering insights into future directions for Python type analysis tools. Finally, we show that Pyinder can uncover previously unknown bugs in recent Python projects.
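
The abstract does not spell out the four static-analysis features, so the snippet below is only a hypothetical illustration of the kind of defect at issue: a latent Python type error that surfaces as a TypeError at run time but can be reported statically, before the code runs. The names `find_port`, `next_port`, and `next_port_fixed` are invented for this sketch. They are not taken from the paper's 68-bug dataset or from Pyinder.

```python
from typing import Optional


def find_port(config: dict) -> Optional[int]:
    # dict.get returns None when the key is absent, so the result is Optional[int].
    return config.get("port")


def next_port(config: dict) -> int:
    # Latent type error (hypothetical example): find_port may return None,
    # and None + 1 raises TypeError at run time. A static checker can
    # report this without executing the code.
    return find_port(config) + 1


def next_port_fixed(config: dict, default: int = 8000) -> int:
    # Narrow the Optional to int before doing arithmetic on it.
    port = find_port(config)
    if port is None:
        port = default
    return port + 1
```

Calling `next_port({"port": 8080})` returns 8081, while `next_port({})` raises `TypeError` because `find_port` returns `None`. A checker such as mypy would be expected to flag the addition in `next_port`, since its left operand may be `None`. `next_port_fixed` narrows the `Optional` first, which is the usual way to satisfy such checkers.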

Published In

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering
October 2024
2587 pages
ISBN: 9798400712487
DOI: 10.1145/3691620
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 163
  • Downloads (Last 12 months): 163
  • Downloads (Last 6 weeks): 59
Reflects downloads up to 02 Feb 2025
