
PyVerDetector A Chrome Extension Detecting The Python Version of Stack Overflow Code Snippets

This document describes a study of Python code snippets on Stack Overflow to determine how many have compatibility issues between Python versions. The researchers created a Chrome extension called PyVerDetector that detects the Python version required for a code snippet. It analyzes snippets to find syntax differences and versioning errors. The study found that about 10% of snippets in top answers have compatibility issues, and less than 17% of snippets only compatible with Python 2 or 3 are tagged with the required version. The tool helps users select a Python version to check if snippets are compatible.


2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)

PyVerDetector: A Chrome Extension Detecting the Python Version of Stack Overflow Code Snippets
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC) | 979-8-3503-3750-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICPC58990.2023.00013

Shiyu Yang¹, Tetsuya Kanda¹, Davide Pizzolotto¹, Daniel M. German², Yoshiki Higo¹
¹ Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
² Department of Computer Science, University of Victoria, Victoria, Canada
yangsy@ist.osaka-u.ac.jp, t-kanda@ist.osaka-u.ac.jp, davidepi@ist.osaka-u.ac.jp, dmg@uvic.ca, higo@ist.osaka-u.ac.jp

Abstract: Over the years, Stack Overflow (SO) has accumulated numerous code snippets, with developers going to SO for problem solutions and code references. However, in the case of the Python programming language, Python 3 is not necessarily backward compatible with Python 2. The major implication of this versioning problem is that code written in Python 2 may not be interpreted by Python 3 without modifications. This issue may affect the usability of Python code snippets on SO. We investigate how many Python code snippets on SO suffer from version compatibility issues, and find that about 10% of the snippets exhibit this problem. Moreover, of the code snippets that are interpretable only by Python 2 or Python 3, less than 17% are tagged with the Python version.

In this paper, we present a Chrome extension called PyVerDetector. This extension allows the user to select a given version of Python and verifies whether the code snippets on a given SO question are compatible with the user’s selected Python version, providing error messages if not. The tool parses snippets and can determine versioning errors due to differences in syntax and also provides the user with a list of Python versions capable of interpreting each code snippet.

Index Terms: Stack Overflow, Python version detector, Compatibility

I. INTRODUCTION

Stack Overflow (SO) is a Q&A website for developers where users can post questions, answer questions, and search for content. The questions and answers on SO contain numerous code snippets, and this vast amount of ready-to-use code snippets provides developers with an easy way to find solutions to daily programming problems. Nowadays, copying code examples from SO is common [1].

While searching for the required code snippets on SO is convenient, recent studies have shown that code snippets can be toxic [2], obsolete [3], [4], and low-quality [5], leading to software quality issues [3], [6], license violations [7], or migration of security vulnerabilities [8].

There are many reasons why code snippets on SO are problematic. One of the reasons is the use of outdated programming language features in the code snippet. Existing programming languages constantly evolve to meet new needs. Popular programming languages often use versions to indicate their evolution, with more recent versions usually representing more mature forms of the language. Many popular programming languages are backward compatible, which means that programs compiled with an earlier language version can be compiled with a later version and exhibit the same behavior as the previous version. However, Python is known to break backward compatibility at almost every release (its backward compatibility policy, including its rules to break compatibility, is documented in PEP 387 [9]). Python 3.0 significantly broke backward compatibility with Python 2, while most other releases make small changes that only affect a small proportion of features in the language (every release since Python 3.5 has removed deprecated features needed by older Python programs to run). The lack of backward compatibility of Python snippets may thus reduce their usability.

To understand how commonly SO Python code snippets have version compatibility issues, we conducted an empirical study. We structure our study by answering the following research questions:

• RQ1: How many Python code snippets have version compatibility issues in the top answers to SO questions?
About 10% of code snippets have version compatibility issues in the top answers to questions.
• RQ2: How many of the code snippets interpretable only by Python 2 or only by Python 3 are tagged with such Python version?
Only about 17% of code snippets interpretable only by Python 2 or only by Python 3 are tagged with the Python version.

Based on those results, we noticed that the version compatibility issues of Python code snippets on SO could not be ignored. When looking for a desired code snippet, users need to check whether the snippet can be interpreted by their desired Python version, and the tags of questions cannot help users solve this issue.

In this work, we have developed a Chrome extension called PyVerDetector that detects whether a given code snippet on SO can be interpreted by the developer’s selected Python version (2 or 3), and provides an error message if it cannot. This tool is publicly available on GitHub¹. Although some research tools to detect Python versions have been developed, such as PyComply [10], these tools parse code locally and therefore cannot provide real-time, online version checking of Python code snippets for SO users.

¹ https://github.com/ysy-dlg/PyVerDetector

In the rest of the paper, in order to save space, we will refer to a specific version of Python by its number only. For

example, instead of Python 3.0 we represent it as 3.0.

II. EMPIRICAL STUDY

To investigate the extent of code snippets with version compatibility issues on SO, we use Python interpreters to check Python code snippets from SO top answers. The reason for choosing top answers lies in the fact that they are expected to be correct answers and most likely to be used by other users.

A. Data Collection

To conduct the empirical study, we first need to obtain code snippets written in Python from the top answers posted on SO. We determine the top answer for each question based on the number of upvotes for the answer. We used the following two criteria to identify the code snippets required for this study:

• The code snippet comes from a top answer to a question with at least one tag containing the word “Python”.
• The post data is the latest version.

We used SOTorrent [11], version SOTorrent20 03, to extract Python code snippets on SO. Based on our criteria above, we extracted 1,256,503 questions. Based on these questions we extracted 2,427,602 answers, of which 698,506 were top answers. Considering that an answer may contain multiple snippets, we extracted a total of 1,260,442 code snippets.

B. Code Snippet Analysis

We used multiple Python interpreters to parse Python code snippets: one major release of Python 2 (2.7) and four major releases of Python 3 (3.5 to 3.8). Other versions were excluded, namely older versions for which it is difficult to prepare an execution environment and newer versions released after the dataset was published. A code snippet is considered to be interpretable by a Python version if it can be parsed by the Python interpreter for that Python version. This is checked by attempting to compile the script using the py_compile module. Otherwise, the code snippet is deemed uninterpretable by that Python version.

C. Results

1) RQ1: How many Python code snippets have version compatibility issues in the top answers to SO questions?

Analyzing the obtained Python interpreter parsing results of code snippets, we found that the code snippets can be divided into the following five categories:

• Pass all versions (Pass all): The code snippet passes parsing for Python 2 and all Python 3 versions.
• Fail for all versions (Fail for all): The code snippet fails parsing for Python 2 and every Python 3 version.
• Pass Python 2, Fail for some Python 3 (Pass 2&3): The code snippet passes parsing for Python 2 and at least one Python 3 version, but not all Python 3 versions.
• Fail for Python 2, Pass all or some Python 3 (Only pass 3): The code snippet fails Python 2 parsing but passes parsing for all or some Python 3 versions.
• Fail for all Python 3, Pass Python 2 (Only pass 2): The code snippet fails parsing for every Python 3 version but passes Python 2 parsing.

TABLE I: Compatibility of Code Snippets in Top Answers.
Categories     #Snippets   Percentage   Tagged version   Percentage
Pass all         760,394       60.33%                -            -
Fail for all     382,275       30.33%                -            -
Pass 2&3             131        0.01%                -            -
Only pass 3       48,045        3.81%           10,157       21.14%
Only pass 2       69,597        5.52%            9,122       13.11%

The results for these categories are shown in the first three columns of Table I. As shown in the table, “Pass 2&3”, “Only pass 3”, and “Only pass 2” are the three types of code snippets with version compatibility issues. Together they account for 117,773 of the 1,260,442 snippets, or 9.34%.

In addition, the data shows that nearly 30% of the code snippets are not executable by any Python version in this study. This means that these code snippets fail for all Python versions. There are several possible reasons: (1) Code snippets using old Python versions such as 2.0 and 3.0. (2) Answers tagged “Python” containing non-Python code snippets, such as program output and code snippets written in other languages. (3) Programming errors, such as syntax errors.

Answer to RQ1: About 10% of code snippets have version compatibility issues in the top answers to questions.

2) RQ2: How many of the code snippets interpretable only by Python 2 or only by Python 3 are tagged with such Python version?

Questions on SO require tags to be assigned to describe the topic of the question. “Python 2.x” and “Python 3.x” tags now exist on SO for users to tag questions for Python 2 or Python 3 only. For example, if the “Python 3.x” tag correctly identifies a code snippet that cannot be used by Python 2, then users can avoid using that code snippet in Python 2. We would like to know how many of the code snippets that are interpretable only by Python 2 or interpretable only by Python 3 are tagged with such Python version.

For this purpose, we further processed the parsing results of the code snippets. The results are shown in Table I. The fourth row of the table covers code snippets that can only be interpreted by Python 3. Its fourth column corresponds to the number of code snippets that have the tag “Python 3.x” in the question. The last row of the table is for code snippets that can only be interpreted by Python 2. The fourth column corresponds to the number of code snippets with the “Python 2.x” tag in the question. Combining the two rows, 19,279 of 117,642 snippets (16.39%) carry the matching version tag.

Answer to RQ2: Only about 17% of code snippets interpretable only by Python 2 or only by Python 3 are tagged with the Python version.

It is clear that the version compatibility issues of code snippets exist on SO and cannot be ignored. However, since code snippets are not well tagged with the Python version, SO users cannot simply determine the Python version of the snippet by the tag, which may lead to misuse of the snippet. Therefore, we need to provide a tool for users to determine the Python version of Python code snippets on SO.

III. PYVERDETECTOR

We developed a Chrome extension, PyVerDetector, to help SO users address the issue of version compatibility of Python

code snippets. PyVerDetector has two main features:

1) For each code snippet inside a code block, the tool determines whether the code snippet compiles without errors using the user-selected Python version.
2) If not, the tool provides the error message and the location of the error.

PyVerDetector consists of two components: a frontend part (running on the user’s browser) responsible for fetching the code snippet from SO and displaying the results, and a backend part (running on a server) responsible for statically analyzing the given code snippet across multiple Python versions. The overview of PyVerDetector is shown in Figure 1. Upon loading a SO page, for each Python snippet, the frontend calls the backend and retrieves the parse result containing all the supported Python versions for that snippet. Finally, the frontend alters the page to present the result to the user. PyVerDetector supports versions (2.0 to 2.7, 3.0 to 3.8).

Fig. 1: Overview of PyVerDetector.

A. Frontend

The frontend has two main features:

1) Format code snippet: The frontend fetches Python code snippets from the page, reformats snippets written in REPL mode, and sends them to the backend for parsing. The frontend also copies the formatted code snippet to the clipboard for the user to use.

2) Display result: The result from the backend contains the parsing results of the code snippet for all Python versions. The frontend presents them to the user according to the Python version selected by the user in the drop-down menu inserted below the code snippet. The frontend shows the following two types of messages on the page:

• The parsing result of the user-selected Python version (default: 3.8). If the code snippet passes the parsing of that version, the message “No error for Python X.X” is displayed. Otherwise, the error message and the line number of the error that occurred are displayed.
• If Python versions other than the selected one also pass the parsing of the code snippet, they are listed in a message such as “Also works for: Python 3.8”.

B. Backend

The backend part is inspired by CPython and PyComply [10]. Since it is not practical to wrap multiple execution environments of the old Python versions inside a Chrome extension, we decided to use an Abstract Syntax Tree (AST) parser following the grammar of Python. If the Python snippet can be parsed with the grammar for a specific Python version, we assume the snippet is compliant with that version. We used a combination of Flex² and Bison³ to generate the code snippet’s AST, reporting compliance only if this AST is generated successfully.

Naively wrapping the output of Flex and Bison, however, is not sufficient. Unlike PyComply, the web-based nature of our tool required us to heavily modify the generated parsers in order to support asynchronous invocations and to join multiple parsers together in a single executable. In our approach, a code snippet is tested against each grammar sequentially, and the resulting error messages are collected in a JSON message to be sent to the frontend.

Finally, in order to run the backend inside a web extension, we used the Emscripten toolchain⁴ to compile the original C code into cross-platform WebAssembly to be bundled inside the extension.

² https://github.com/westes/flex
³ https://www.gnu.org/software/bison
⁴ https://emscripten.org/

C. Python Grammars and Extensibility

Since our tool is based on the Flex and Bison parser generators, it requires an input grammar representing the Python language. While writing a different grammar for each Python version is certainly not impossible, keeping up with the annual Python release schedule by manually writing a new grammar for each release would nonetheless require considerable effort. Fortunately, the Python website provides the full changelog⁵ and the grammar of each released version since Python 2.2⁶.

These grammars, however, are written for an LL(1) parser, while Bison generates LALR(1) parsers. Although every LL(1) grammar is LR(1) but not necessarily LALR(1) [12], in practice the two classes overlap substantially. For this reason, we managed to write a tool to convert the provided grammars from the LL(1) EBNF syntax found in the Python archives to the LALR(1) Bison syntax expected by our parser generator.

Unfortunately, as of version 3.10, CPython has switched from an LL(1) parser to a PEG parser [13], with the grammars being provided only in PEG syntax since Python 3.9. Converting from a PEG grammar to LALR(1) is not as easy as converting from LL(1) to LALR(1). In fact, the equivalence between PEG and Context-Free Grammars such as EBNF has been proven undecidable [14]. For this reason, our extension works with Python versions up to 3.8, but to extend it to future versions of Python, an additional parser for PEGs should be wrapped alongside the current Context-Free Grammar parser.

⁵ https://docs.python.org/3/whatsnew/changelog.html
⁶ https://docs.python.org/release/2.2/ref/grammar.txt

IV. USAGE SCENARIOS

A. Displaying the latest version as default

The default value displayed by PyVerDetector is the parsing result for the latest available version. As shown in Figure 2, when the user opens a SO page⁷ with some Python code snippets, PyVerDetector will immediately display its parsing results for 3.8 for all Python code snippets on the page. We use green to show pass messages and red to show error messages. This allows the user to quickly get an at-a-glance view of the compatibility of the code snippets for a recent Python version.

Fig. 2: Example of how the extension works in the default version. (Left): Parse error, (Right): Pass.

⁷ https://stackoverflow.com/questions/28243832

B. Accurate display of results for selected versions

When the user wants to know the compatibility of a code snippet for a particular Python version, PyVerDetector can show the user exactly the relevant information. As shown in Figure 3, for the same question as in Figure 2, the user has selected 3.7, and PyVerDetector returns an error message stating that the code snippet cannot be interpreted by 3.7 because “Positional-only parameters” are used in the fourth line of the snippet. This feature has, in fact, only been supported since 3.8.

Fig. 3: User manually selects a version of Python, incompatible with the given snippet.

V. EVALUATION OF ACCURACY

In this section, we evaluate the accuracy of PyVerDetector and compare it with an existing tool: PyComply [10].

We apply PyVerDetector and PyComply to the dataset of top answer code snippets and obtain their respective parsing results using the same method as described in Section II-A. We use the parsing results of Python interpreters as the ground truth. The accuracy of PyVerDetector and PyComply was measured by comparing their respective parsing results with the ground truth in terms of code snippets. We evaluate PyVerDetector using the well-known metrics for binary classification: precision, recall, and accuracy.

TABLE II: Comparison results of PyVerDetector (PyVer) and PyComply (PyC)
Python Version   Precision           Recall              Accuracy
                 PyVer     PyC       PyVer     PyC       PyVer     PyC
Ver2.7           98.28%    98.28%    99.95%    99.95%    98.82%    98.82%
Ver3.5           98.02%    98.02%    99.98%    99.98%    98.72%    98.72%
Ver3.6           97.98%    97.98%    99.98%    99.98%    98.67%    98.67%
Ver3.7           97.98%    -         100.00%   -         98.68%    -
Ver3.8           97.98%    -         100.00%   -         98.67%    -

The results are shown in Table II. We can see that PyVerDetector (PyVer) and PyComply (PyC) have the same accuracy for the three Python versions 2.7, 3.5, and 3.6. However, PyVerDetector can provide code detection for the two newer Python versions, 3.7 and 3.8, and has shown high accuracy in both versions.

VI. CONCLUSION

In this paper, we conducted an empirical study to understand the extent of Python code snippets in SO that have version compatibility issues. We found that version compatibility issues exist in SO code snippets. In response, we developed a Chrome extension, PyVerDetector, which can identify versioning errors due to different syntax. PyVerDetector helps users detect whether the code snippets on a given SO question are compatible with their selected Python version and provides error messages if not. We evaluated PyVerDetector by comparing it with PyComply, showing comparable performance but with newer versions supported by our tool, making code snippet detection on SO more convenient and efficient.

ACKNOWLEDGMENT

This work was supported by JSPS KAKENHI Grant Numbers JP20H04166, JP21K18302, JP21K11820, JP21H04877, JP22H03567, JP22K11985, JP19K20239.
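As a supplementary illustration of the metrics used in Section V, the sketch below computes precision, recall, and accuracy for one Python version from per-snippet pass/fail decisions. It assumes that "the tool reports the snippet as parsable" is the positive class and that the interpreter result is the ground truth; the paper does not spell out this choice, and the function name `binary_metrics` and the toy data are hypothetical.

```python
def binary_metrics(predicted, actual):
    """Return (precision, recall, accuracy) for paired boolean decisions.

    predicted: True where the tool says the snippet parses (assumed positive).
    actual:    True where the real interpreter parses it (ground truth).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # both say "parses"
    fp = sum(1 for p, a in pairs if p and not a)      # tool too permissive
    fn = sum(1 for p, a in pairs if not p and a)      # tool too strict
    tn = sum(1 for p, a in pairs if not p and not a)  # both say "fails"
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy data for four snippets under a single Python version.
tool_says_pass = [True, True, False, True]
interpreter_pass = [True, False, False, True]
precision, recall, accuracy = binary_metrics(tool_says_pass, interpreter_pass)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
```

Under this reading, a recall close to 100% with a slightly lower precision, as in Table II, would mean the tool rarely rejects a snippet the interpreter accepts but occasionally accepts one the interpreter rejects.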

REFERENCES
[1] S. Baltes and S. Diehl, “Usage and attribution of Stack Overflow code snippets in GitHub projects,” Empirical Software Engineering, vol. 24, no. 3, pp. 1259–1295, 2019.
[2] C. Ragkhitwetsagul, J. Krinke, M. Paixao, G. Bianco, and R. Oliveto, “Toxic code snippets on Stack Overflow,” IEEE Transactions on Software Engineering, vol. 47, no. 3, pp. 560–581, 2019.
[3] H. Zhang, S. Wang, T.-H. Chen, Y. Zou, and A. E. Hassan, “An empirical study of obsolete answers on Stack Overflow,” IEEE Transactions on Software Engineering, vol. 47, no. 4, pp. 850–862, 2019.
[4] J. Zhou and R. J. Walker, “API deprecation: a retrospective analysis and detection method for code examples on the web,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016, pp. 266–277.
[5] Y. Wu, S. Wang, C.-P. Bezemer, and K. Inoue, “How do developers utilize source code from Stack Overflow?” Empirical Software Engineering, vol. 24, no. 2, pp. 637–673, 2019.
[6] F. Fischer, K. Böttinger, H. Xiao, C. Stransky, Y. Acar, M. Backes, and S. Fahl, “Stack Overflow considered harmful? The impact of copy&paste on Android application security,” in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 121–136.
[7] L. An, O. Mlouki, F. Khomh, and G. Antoniol, “Stack Overflow: A code laundering platform?” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 283–293.
[8] M. Verdi, A. Sami, J. Akhondali, F. Khomh, G. Uddin, and A. K. Motlagh, “An empirical study of C++ vulnerabilities in crowd-sourced code examples,” IEEE Transactions on Software Engineering, 2020.
[9] B. Peterson and B. Cannon, “Backwards compatibility policy,” PEP 387, 2009. [Online]. Available: https://peps.python.org/pep-0387/
[10] B. A. Malloy and J. F. Power, “Quantifying the transition from Python 2 to 3: An empirical study of Python applications,” in 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2017, pp. 314–323.
[11] S. Baltes, C. Treude, and S. Diehl, “SOTorrent: Studying the origin, evolution, and usage of Stack Overflow code snippets,” in 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019, pp. 191–194.
[12] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: principles, techniques, & tools, 2nd ed. Pearson, 2007, p. 242.
[13] G. van Rossum, P. Galindo, and L. Nikolaou, “New PEG parser for CPython,” PEP 617, 2020. [Online]. Available: https://peps.python.org/pep-0617/
[14] B. Ford, “Parsing expression grammars: a recognition-based syntactic foundation,” in Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2004, pp. 111–122.
