PyVerDetector: A Chrome Extension Detecting the Python Version of Stack Overflow Code Snippets
Shiyu Yang¹, Tetsuya Kanda¹, Davide Pizzolotto¹, Daniel M. German², Yoshiki Higo¹
¹ Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
² Department of Computer Science, University of Victoria, Victoria, Canada
yangsy@ist.osaka-u.ac.jp, t-kanda@ist.osaka-u.ac.jp, davidepi@ist.osaka-u.ac.jp, dmg@uvic.ca, higo@ist.osaka-u.ac.jp
Abstract—Over the years, Stack Overflow (SO) has accumulated numerous code snippets, with developers going to SO for problem solutions and code references. However, in the case of the Python programming language, Python 3 is not necessarily backward compatible with Python 2. The major implication of this versioning problem is that code written in Python 2 may not be interpreted by Python 3 without modifications. This issue may affect the usability of Python code snippets on SO. We investigate how many Python code snippets on SO suffer from version compatibility issues, and find that about 10% of the snippets exhibit this problem. Moreover, of the code snippets that are interpretable only by Python 2 or Python 3, fewer than 17% are tagged with the Python version.
In this paper, we present a Chrome extension called PyVerDetector. This extension allows the user to select a given version of Python and verifies whether the code snippets on a given SO question are compatible with the user’s selected Python version, providing error messages if not. The tool parses snippets and can determine versioning errors due to differences in syntax and also provides the user with a list of Python versions capable of interpreting each code snippet.
Index Terms—Stack Overflow, Python version detector, Compatibility

... as the previous version. However, Python is known to break backward compatibility at almost every release (its backward compatibility policy, including its rules to break compatibility, is documented in PEP 387 [9]). Python 3.0 significantly broke backward compatibility with Python 2, while most releases make small changes that only affect a small proportion of features in the language (every release since Python 3.5 has removed deprecated features needed by older Python programs to run). The lack of backward compatibility of Python snippets may thus reduce their usability.
To understand how commonly SO Python code snippets have version compatibility issues, we conducted an empirical study. We structure our study by answering the following research questions:
• RQ1: How many Python code snippets have version compatibility issues in the top answers to SO questions?
About 10% of code snippets have version compatibility issues in the top answers to questions.
• RQ2: How many of the code snippets interpretable
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY ROORKEE. Downloaded on December 22,2023 at 06:39:55 UTC from IEEE Xplore. Restrictions apply.
Fig. 1: Overview of PyVerDetector.

... code snippets. PyVerDetector has two main features:
1) For each code snippet inside a code block, the tool determines if the code snippet is compiled without errors using the user-selected Python version.
2) If not, the tool provides the error message and the location of the error.
PyVerDetector consists of two components: a frontend part (running on the user’s browser) responsible for fetching the code snippet from SO and displaying the results, and a backend part (running on a server) responsible for statically analyzing the given code snippet across multiple Python versions. The overview of PyVerDetector is shown in Figure 1. Upon loading an SO page, for each Python snippet, the frontend calls the backend and retrieves the parse result containing all the supported Python versions for that snippet. Finally, the frontend alters the page to present the result to the user. PyVerDetector supports Python versions 2.0 to 2.7 and 3.0 to 3.8.

A. Frontend
The frontend has two main features:
1) Format code snippet: The frontend fetches Python code snippets from the page, formats the code snippets written in REPL mode, and sends them to the backend for parsing. The frontend also copies the formatted code snippet to the clipboard for the user to use.
2) Display result: The result from the backend contains the parsing results of the code snippet for all Python versions. The frontend presents them to the user according to the Python version selected by the user in the drop-down menu inserted below the code snippet. The frontend shows the following two types of messages on the page:
• The parsing result of the user-selected Python version (default: 3.8). If the code snippet passes the parsing of that version, the message “No error for Python X.X” is displayed. Otherwise, the error message and the line number at which the error occurred are displayed.
• If there are some Python versions other than the selected one which pass the parsing of the code snippet, output

B. Backend
The backend part is inspired by CPython and PyComply [10]. Since it is not practical to wrap multiple execution environments of the old Python versions inside a Chrome extension, we decided to use an Abstract Syntax Tree (AST) parser following the grammar of Python. If the Python snippet can be parsed with a grammar for a specific Python version, we assume the snippet is compliant with that particular version. We used a combination of Flex² and Bison³ to generate the code snippet’s AST, reporting compliance only if this AST is generated successfully.
Naively wrapping the output of Flex and Bison, however, is not sufficient. Unlike PyComply, the web-based nature of our tool required us to perform heavy modifications to the generated parsers in order to support asynchronous invocations and to join multiple parsers together in a single executable. In our approach, a code snippet is tested against each grammar sequentially, and the various error messages are collected in a JSON message to be sent to the frontend.
Finally, in order to run the backend inside a web extension, we used the Emscripten toolchain⁴ to compile the original C code into cross-platform WebAssembly to be bundled inside the extension.

C. Python Grammars and Extensibility
Since our tool is based on the Flex and Bison parser generators, it needs an input grammar representing the Python language. While writing a different grammar for each Python version is certainly not impossible, keeping up with the annual Python release schedule by manually writing a new grammar for each release would nonetheless require considerable effort. Fortunately, the Python website provides the full changelog⁵ and grammar of each released version since Python 2.2⁶.
These grammars, however, are written for an LL(1) parser, while Bison generates LALR(1) parsers. Although every LL(1) grammar is LR(1) but not necessarily LALR(1) [12], in practice the two classes overlap substantially. For this reason, we managed to write a tool to convert the provided grammars from the LL(1) EBNF syntax found in the Python archives to the LALR(1) Bison syntax expected by our parser generator.
Unfortunately, CPython has replaced its LL(1) parser with a PEG parser [13]: the PEG parser became the default in Python 3.9, the old LL(1) parser was removed in Python 3.10, and the grammars have been provided only in PEG syntax since Python 3.9. Converting from a PEG grammar to LALR(1) is not as easy as converting from LL(1) to LALR(1). In fact, the equivalence between PEG and Context-Free Grammars such as EBNF has been proven undecidable [14]. For this reason, our extension works with Python versions up to 3.8; to extend it to later versions of Python, an additional parser for PEGs should be wrapped alongside the current Context-Free Grammar parser.

² https://github.com/westes/flex
³ https://www.gnu.org/software/bison
⁴ https://emscripten.org/
⁵ https://docs.python.org/3/whatsnew/changelog.html
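The sequential checking step described for the backend can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is C code generated by Flex and Bison and compiled to WebAssembly. The parser interface, the JSON field names (`ok`, `line`, `message`), and the function names below are all assumptions made for illustration. A real setup would register one parser per Python grammar; here the running interpreter's own `compile` serves as a stand-in for a single version.

```python
import json
from typing import Callable, Dict, Optional, Tuple

# Assumed interface: a parser returns None on success,
# or a (line number, message) pair on failure.
ParseError = Tuple[Optional[int], str]
Parser = Callable[[str], Optional[ParseError]]


def host_parser(snippet: str) -> Optional[ParseError]:
    """Stand-in parser that uses the grammar of the running interpreter."""
    try:
        compile(snippet, "<snippet>", "exec")
    except SyntaxError as exc:
        return (exc.lineno, exc.msg)
    return None


def check_snippet(snippet: str, parsers: Dict[str, Parser]) -> str:
    """Try each parser in turn and collect all outcomes in one JSON string."""
    results = {}
    for version, parse in parsers.items():
        error = parse(snippet)
        if error is None:
            results[version] = {"ok": True}
        else:
            line, message = error
            results[version] = {"ok": False, "line": line, "message": message}
    return json.dumps(results)


# "host" is whatever Python 3 runs this sketch; "accept-all" is a dummy parser
# standing in for a grammar that accepts the snippet.
parsers = {"host": host_parser, "accept-all": lambda snippet: None}
print(check_snippet('print "hello"', parsers))
```

Collecting every outcome in one message, rather than stopping at the first success, matches the description above: the frontend needs the result for the selected version as well as the list of other versions that accept the snippet.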
Fig. 2: Example of how the extension works in the default version. (Left): Parse error, (Right): Pass.
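The two frontend features described in Section A (formatting REPL-mode snippets and choosing which messages to display) can be sketched as follows. The actual frontend runs as a browser extension; this Python version only illustrates the logic. The function names, the shape of the `results` dictionary, and the wording of every message other than “No error for Python X.X” are assumptions, not taken from the tool.

```python
from typing import Dict, List


def strip_repl_prompts(snippet: str) -> str:
    """Turn an interactive-session transcript into plain source code."""
    lines = snippet.splitlines()
    if not any(line.startswith(">>>") for line in lines):
        return snippet  # not a REPL transcript; leave it untouched
    kept = []
    for line in lines:
        if line.startswith((">>> ", "... ")):
            kept.append(line[4:])  # drop the four-character prompt
        elif line.rstrip() in (">>>", "..."):
            kept.append("")  # bare prompt: keep an empty line
        # any other line is interpreter output and is dropped
    return "\n".join(kept)


def render_messages(results: Dict[str, dict], selected: str = "3.8") -> List[str]:
    """Build the messages for the selected version (assumed result shape)."""
    messages = []
    outcome = results[selected]
    if outcome["ok"]:
        messages.append(f"No error for Python {selected}")
    else:
        messages.append(
            f"Python {selected}: {outcome['message']} (line {outcome['line']})"
        )
    # Assumed wording for the second message type.
    others = [v for v, r in results.items() if v != selected and r["ok"]]
    if others:
        messages.append("Also parses with Python " + ", ".join(others))
    return messages


transcript = ">>> for i in range(2):\n...     print(i)\n...\n0\n1"
print(strip_repl_prompts(transcript))

results = {
    "2.7": {"ok": True},
    "3.8": {"ok": False, "line": 1, "message": "syntax error"},
}
for message in render_messages(results, "3.8"):
    print(message)
```

Stripping the prompts before parsing matters because `>>>` and `...` are not valid Python syntax, so an unformatted transcript would fail under every grammar regardless of the version it was written for.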
REFERENCES
[1] S. Baltes and S. Diehl, “Usage and attribution of stack overflow code
snippets in github projects,” Empirical Software Engineering, vol. 24,
no. 3, pp. 1259–1295, 2019.
[2] C. Ragkhitwetsagul, J. Krinke, M. Paixao, G. Bianco, and R. Oliveto,
“Toxic code snippets on stack overflow,” IEEE Transactions on Software
Engineering, vol. 47, no. 3, pp. 560–581, 2019.
[3] H. Zhang, S. Wang, T.-H. Chen, Y. Zou, and A. E. Hassan, “An empirical
study of obsolete answers on stack overflow,” IEEE Transactions on
Software Engineering, vol. 47, no. 4, pp. 850–862, 2019.
[4] J. Zhou and R. J. Walker, “Api deprecation: a retrospective analysis and
detection method for code examples on the web,” in Proceedings of the
2016 24th ACM SIGSOFT International Symposium on Foundations of
Software Engineering, 2016, pp. 266–277.
[5] Y. Wu, S. Wang, C.-P. Bezemer, and K. Inoue, “How do developers utilize
source code from stack overflow?” Empirical Software Engineering,
vol. 24, no. 2, pp. 637–673, 2019.
[6] F. Fischer, K. Böttinger, H. Xiao, C. Stransky, Y. Acar, M. Backes, and
S. Fahl, “Stack overflow considered harmful? the impact of copy&paste
on android application security,” in 2017 IEEE Symposium on Security
and Privacy (SP), 2017, pp. 121–136.
[7] L. An, O. Mlouki, F. Khomh, and G. Antoniol, “Stack overflow: A code
laundering platform?” in 2017 IEEE 24th International Conference on
Software Analysis, Evolution and Reengineering (SANER), 2017, pp.
283–293.
[8] M. Verdi, A. Sami, J. Akhondali, F. Khomh, G. Uddin, and A. K.
Motlagh, “An empirical study of c++ vulnerabilities in crowd-sourced
code examples,” IEEE Transactions on Software Engineering, 2020.
[9] B. Peterson and B. Cannon, “Backwards compatibility policy,” PEP
387, 2009. [Online]. Available: https://peps.python.org/pep-0387/
[10] B. A. Malloy and J. F. Power, “Quantifying the transition from python
2 to 3: An empirical study of python applications,” in 2017 ACM/IEEE
International Symposium on Empirical Software Engineering and Measurement
(ESEM), 2017, pp. 314–323.
[11] S. Baltes, C. Treude, and S. Diehl, “Sotorrent: Studying the origin, evolution,
and usage of stack overflow code snippets,” in 2019 IEEE/ACM
16th International Conference on Mining Software Repositories (MSR),
2019, pp. 191–194.
[12] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: principles,
techniques, & tools, 2nd ed. Pearson, 2007, p. 242.
[13] G. van Rossum, P. Galindo, and L. Nikolaou, “New peg parser for
cpython,” PEP 617, 2020. [Online]. Available: https://peps.python.org/pep-0617/
[14] B. Ford, “Parsing expression grammars: a recognition-based syntactic
foundation,” in Proceedings of the 31st ACM SIGPLAN-SIGACT symposium
on Principles of programming languages, 2004, pp. 111–122.