DOI: 10.1145/3524842.3528467
research-article

Complex Python features in the wild

Published: 17 October 2022

Abstract

While Python is increasingly popular, program analysis tooling for Python is lagging. This is due, in part, to complex features of the Python language: features whose semantics are difficult to understand and model. Besides the "usual suspects," reflection and dynamic execution, complex Python features include context managers, decorators, and generators, among others. This paper explores how often and in what ways developers use certain complex features. We analyze over 3 million Python files mined from GitHub to address three research questions: (i) How often do developers use certain complex Python features? (ii) In what ways do developers use these features? (iii) Does the use of complex features increase or decrease over time? Our findings show that usage of dynamic features that pose a threat to static analysis is infrequent. On the other hand, usage of context managers and decorators is surprisingly widespread. Our actionable result is a list of Python features that any "minimal syntax" ought to handle in order to capture developers' use of the Python language. We hope that understanding the usage of Python features will help tool-builders improve Python tools, which can in turn lead to more correct, secure, and performant Python code.
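
To make the first research question concrete, the following is a minimal sketch of how feature occurrences could be counted in a single file with Python's standard `ast` module. It is an illustration only, not the authors' tooling: the `FeatureCounter` class, the `count_features` helper, the category names, and the `DYNAMIC_CALLS` set are all hypothetical choices made for this example, and the paper's actual feature definitions may differ.

```python
import ast
from collections import Counter

# Assumption: a small, illustrative set of built-ins treated as "dynamic".
DYNAMIC_CALLS = {"eval", "exec", "getattr", "setattr"}


class FeatureCounter(ast.NodeVisitor):
    """Count a few syntactic features in one parsed module (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()

    def visit_With(self, node):
        # Each `with` / `async with` statement counts as one context-manager use.
        self.counts["context_manager"] += 1
        self.generic_visit(node)

    visit_AsyncWith = visit_With

    def _visit_def(self, node):
        # Functions and classes both carry a `decorator_list`.
        self.counts["decorator"] += len(node.decorator_list)
        self.generic_visit(node)

    visit_FunctionDef = _visit_def
    visit_AsyncFunctionDef = _visit_def
    visit_ClassDef = _visit_def

    def visit_Yield(self, node):
        # `yield` and `yield from` mark generator bodies.
        self.counts["yield"] += 1
        self.generic_visit(node)

    visit_YieldFrom = visit_Yield

    def visit_Call(self, node):
        # Only direct calls by bare name are detected; aliased calls are missed.
        if isinstance(node.func, ast.Name) and node.func.id in DYNAMIC_CALLS:
            self.counts["dynamic_call"] += 1
        self.generic_visit(node)


def count_features(source: str) -> Counter:
    """Parse `source` and return per-feature occurrence counts."""
    counter = FeatureCounter()
    counter.visit(ast.parse(source))
    return counter.counts


SAMPLE = '''
import functools

@functools.lru_cache(maxsize=None)
def read_lines(path):
    with open(path) as handle:
        for line in handle:
            yield line

value = eval("1 + 1")
'''

counts = count_features(SAMPLE)
print(dict(counts))
```

A purely syntactic count like this cannot see dynamic features reached through aliases or reflection, which is one reason such features are described as a threat to static analysis.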

Published In

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022
815 pages
ISBN:9781450393034
DOI:10.1145/3524842
Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AST
  2. Python

Qualifiers

  • Research-article

Cited By

  • (2025) Detecting and Explaining Python Name Errors. Information and Software Technology 178 (107592). DOI: 10.1016/j.infsof.2024.107592. Online publication date: Feb-2025
  • (2024) A Case for Feminism in Programming Language Design. Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (205-222). DOI: 10.1145/3689492.3689809. Online publication date: 17-Oct-2024
  • (2024) Revealing the Unseen: AI Chain on LLMs for Predicting Implicit Dataflows to Generate Dataflow Graphs in Dynamically Typed Code. ACM Transactions on Software Engineering and Methodology 33:7 (1-29). DOI: 10.1145/3672458. Online publication date: 12-Jun-2024
  • (2024) Efficient Construction of Practical Python Call Graphs with Entity Knowledge Base. International Journal of Software Engineering and Knowledge Engineering 34:07 (999-1024). DOI: 10.1142/S0218194024500104. Online publication date: 22-May-2024
  • (2024) Static analysis driven enhancements for comprehension in machine learning notebooks. Empirical Software Engineering 29:5. DOI: 10.1007/s10664-024-10525-w. Online publication date: 12-Aug-2024
  • (2023) Enhancing Comprehension and Navigation in Jupyter Notebooks with Static Analysis. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (391-401). DOI: 10.1109/SANER56733.2023.00044. Online publication date: Mar-2023
  • (2023) TypeScript’s Evolution: An Analysis of Feature Adoption Over Time. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (109-114). DOI: 10.1109/MSR59073.2023.00027. Online publication date: May-2023
  • (2022) Characterizing Python Method Evolution with PyMevol: An Essential Step Towards Enabling Reliable Software Systems. 2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) (81-86). DOI: 10.1109/ISSREW55968.2022.00044. Online publication date: Oct-2022
