

PCART: Automated Repair of Python API Parameter Compatibility Issues

Shuai Zhang, Guanping Xiao, Jun Wang, Huashan Lei, Yepang Liu, Yulei Sui, and Zheng Zheng

Shuai Zhang, Guanping Xiao, Jun Wang, and Huashan Lei are with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China (email: {shuaizhang, gpxiao, junwang, leihuashan}@nuaa.edu.cn). Yepang Liu is with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China (email: liuyp1@sustech.edu.cn). Yulei Sui is with the School of Computer Science and Engineering, University of New South Wales, Sydney, Australia (email: y.sui@unsw.edu.au). Zheng Zheng is with the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China (email: zhengz@buaa.edu.cn). Guanping Xiao is the corresponding author.
Abstract

In modern software development, Python third-party libraries have become crucial, particularly due to their widespread use in fields such as deep learning and scientific computing. However, the parameters of APIs in third-party libraries often change during evolution, causing compatibility issues for client applications that depend on specific versions. Because Python's parameter-passing mechanism is flexible, the same API change can be compatible or incompatible depending on how arguments are passed at the call site. Currently, no tool is capable of automatically detecting and repairing Python API parameter compatibility issues. To fill this gap, we propose PCART, the first tool to implement a fully automated process, from API extraction, code instrumentation, and API mapping establishment to compatibility assessment and, finally, repair and validation, for solving various types of Python API parameter compatibility issues, i.e., parameter addition, removal, renaming, and reordering, as well as the conversion of positional parameters to keyword parameters. We construct a large-scale benchmark, PCBench, including 47,478 test cases mutated from 844 parameter-changed APIs of 33 popular Python libraries, to evaluate PCART. The evaluation results show that PCART is both effective and efficient, significantly outperforming existing tools (MLCatchUp and Relancer) and the large language model ChatGPT-4, achieving an F-measure of 96.49% in detecting API parameter compatibility issues and a repair accuracy of 91.36%. The evaluation on 14 real-world Python projects from GitHub further demonstrates that PCART has good practicality. We believe PCART can help programmers reduce the time spent on maintaining Python API updates and facilitate automated Python API compatibility issue repair.

Index Terms:
Python Libraries, API Parameter, Compatibility Issues, Automated Detection and Repair

I Introduction

In recent years, Python has consistently ranked at the top of the TIOBE index of programming languages. As of April 2024, it accounts for 16.41% of the popularity among all programming languages [1]. The Python Package Index (PyPI) hosts more than 500,000 third-party libraries in areas ranging from data science and artificial intelligence to web development and cybersecurity [2]. The rich ecosystem and broad community support have made Python a favorite among data scientists, software engineers, and academic researchers.

However, with the fast-paced development of Python third-party libraries, code refactoring, bug fixes, and the introduction of new features often lead to API parameter changes (e.g., parameter addition, removal, renaming, and reordering), resulting in API parameter compatibility issues [3]. As illustrated in Fig. 1, when developers upgrade the NumPy dependency of their project from the current version (1.9.3) to the target version (1.10.0), the invocation numpy.correlate(a, v, 'valid', False) crashes with the exception TypeError: correlate() takes from 2 to 3 positional arguments but 4 were given. This is because the parameter old_behavior of the API correlate() was removed in 1.10.0. API parameter compatibility issues cause Python programs to crash or malfunction, threatening software reliability and stability. To repair them, developers have to manually modify their code to accommodate the new library version, a tedious and time-consuming process, especially for large projects that build upon multiple libraries.

Figure 1: An example of Python API parameter compatibility issue.

Therefore, there is an urgent need for a tool that can automatically detect and repair API parameter compatibility issues in Python projects. However, automatically detecting and repairing these issues is not trivial, owing to Python's dynamic nature and flexible parameter-passing mechanism. Currently, no tool can fully automate the detection and repair of API parameter compatibility issues [4]. Existing efforts primarily focus on detecting deprecated APIs (e.g., PyCompat [3], DLocator [5], and APIScanner [6]) and analyzing API evolution and change patterns (e.g., AexPy [7] and pidff [8]). Only a few tools attempt to repair API compatibility issues. MLCatchUp [9] employs a static method to detect and repair deprecated APIs of machine learning libraries within a single source file, based on manually provided API signatures, while Relancer [10] dynamically executes Jupyter Notebook code snippets to iteratively repair deprecated APIs based on runtime error messages. These tools have three main limitations:

  • Limitation 1. Existing tools assess compatibility mainly from API definitions and do not consider how the parameter-passing method used at the call site (e.g., by position or by keyword) affects API compatibility.

  • Limitation 2. Existing tools have a low level of automation. They require API mapping relationships to be established manually in advance to assist detection and repair. In addition, they validate repairs either by manual review or simply by checking whether the repaired code runs.

  • Limitation 3. The repair patterns supported by existing tools are relatively simple and struggle to cope with Python's highly flexible parameter passing and the diverse formats of API calls.

Figure 2: Overview of our PCART approach.

To address these limitations, we propose PCART, an automated repair tool for Python API parameter compatibility issues. As shown in Fig. 2, PCART combines static and dynamic approaches to fully automate the detection and repair process, including ❶ invoked API extraction, ❷ instrumentation and execution, ❸ API mapping establishment, ❹ compatibility assessment, and ❺ repair and validation. Given a Python project and the target library version to be updated, PCART precisely detects API parameter compatibility issues in all source files (i.e., .py files) and infers the repair actions, without requiring a pre-built database of breaking API changes and their repair solutions. PCART supports the detection and repair of API parameter compatibility issues introduced by various change types, including parameter addition, removal, renaming, reordering, and the conversion of positional parameters to keyword parameters. After automated repair and validation, PCART generates a detailed report covering the detected compatibility issues and their repair solutions.

To evaluate PCART, we also construct a large-scale benchmark, PCBench, comprising 47,478 test cases that cover diverse combinations of parameters and their passing methods. The benchmark covers 844 parameter-changed APIs collected from all versions, released as of July 9, 2023, of 33 popular Python third-party libraries. We conduct a comprehensive comparative analysis of the detection and repair performance of PCART against existing repair tools, i.e., MLCatchUp and Relancer. We also compare PCART with ChatGPT-4, a state-of-the-art large language model (LLM). Moreover, we discuss and evaluate the efficiency and practicality of PCART.

In summary, we make the following key contributions:

  • Approach. We introduce PCART, a precise and fully automated approach for detecting and repairing API parameter compatibility issues in Python projects. PCART is open source and publicly available at https://github.com/pcart-tools.

  • Benchmark. We construct PCBench, a large-scale benchmark of 47,478 test cases with compatibility labels, covering 844 parameter-changed APIs from 33 popular Python libraries. It serves as a benchmark for evaluating tools that detect and repair Python API parameter compatibility issues.

  • Evaluation. We compare PCART with existing repair tools (MLCatchUp and Relancer) on PCBench. The results show that PCART achieves 96.49% F-measure and 91.36% accuracy regarding detection and repair, respectively, both superior to MLCatchUp (86.30%/9.53%) and Relancer (78.52%/0.00%). Besides, the comparison with ChatGPT-4 further demonstrates the effectiveness of PCART.

  • Efficiency and Practicality. We evaluate the efficiency of PCART in detection and repair on PCBench: on average, PCART takes 2735 ms to process the API in each test case. In addition, we conduct case studies on 14 real-world Python projects from GitHub, verifying the practicality of PCART in detecting and repairing Python API parameter compatibility issues.

The rest of the paper is organized as follows. Section II briefly introduces the characteristics of Python API parameters and the challenges of detecting and repairing API parameter compatibility issues in Python projects. Section III describes the design of PCART. Section IV details the construction of PCBench. Section V presents the evaluation of PCART, including the research questions and experimental settings. Section VI analyzes the evaluation results and limitations of PCART, while Section VII discusses the threats to validity. Section VIII summarizes related work on API evolution and compatibility issue repair techniques. Finally, Section IX concludes the paper.

II Background and Challenges

II-A Characteristics of Python API Parameters

Python demonstrates exceptional flexibility in its API parameter passing mechanism, markedly contrasting with traditional programming languages such as C/C++ and Java [11]. This flexibility is reflected in several key aspects:

(1) Supports Positional and Keyword Parameters. Python uses a special marker (i.e., *) in API definitions to distinguish positional and keyword parameters, the two fundamental types of parameters. Parameters located before * are positional parameters: if passed without a parameter name, they must be supplied in the order of their positions; if passed with a parameter name, they can be supplied in any order. Parameters after * are keyword parameters, which must be passed with their parameter names; passing them by position raises a TypeError at call time.

As illustrated in Listing 1, when the API Rule of the Rich library is upgraded from version 2.3.1 to 3.0.0, the positional parameters character and style are converted into keyword parameters. Consequently, a call written for Rich 2.3.1 that passes these arguments without parameter names becomes incompatible after the upgrade to 3.0.0, whereas a call that passes them by name remains compatible.

# API definition in library Rich 2.3.1
def Rule(title: Union[str, Text] = '', character: str = None, style: Union[str, Style] = 'rule.line'): ...

# API definition in library Rich 3.0.0
def Rule(title: Union[str, Text] = '', *, character: str = None, style: Union[str, Style] = 'rule.line'): ...

from rich.rule import Rule
# Incompatible call when upgrading Rich from 2.3.1 to 3.0.0
rule = Rule('', None, 'rule.line')

# Compatible call when upgrading Rich from 2.3.1 to 3.0.0
rule = Rule('', character=None, style='rule.line')
Listing 1: Examples of positional/keyword parameters and different parameter passing methods.
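The effect described above can be checked without installing either Rich version. The sketch below is illustrative only: rule_v231 and rule_v300 are hypothetical stand-ins that copy the two Rule signatures from Listing 1 (type hints dropped), and binds is a hypothetical helper built on the standard library's inspect.signature(...).bind, which applies Python's own argument-binding rules and raises TypeError when a call would not bind.

```python
import inspect

# Hypothetical stand-ins copying the Rule signatures from Listing 1
# (type hints dropped); these are not the real rich.rule.Rule.
def rule_v231(title='', character=None, style='rule.line'):
    pass

def rule_v300(title='', *, character=None, style='rule.line'):
    pass

def binds(func, *args, **kwargs):
    """Return True if the arguments bind to func's signature, else False."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
        return True
    except TypeError:
        return False

# Positional call from Listing 1: accepted by 2.3.1, rejected by 3.0.0.
print(binds(rule_v231, '', None, 'rule.line'))  # True
print(binds(rule_v300, '', None, 'rule.line'))  # False
# Keyword call from Listing 1: accepted by 3.0.0.
print(binds(rule_v300, '', character=None, style='rule.line'))  # True
```

Calling rule_v300('', None, 'rule.line') directly raises the same TypeError that bind reports, which is what the upgraded positional call in Listing 1 would encounter.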

(2) Supports Optional Parameters. In Python API definitions, optional parameters, i.e., parameters with default values, need not be passed during API calls. Listing 2 shows that the API Proxy defined in HTTPX version 0.18.2 includes an optional parameter mode, which is removed in 0.19.0. If a call to Proxy written for HTTPX 0.18.2 passes mode, it becomes incompatible after upgrading to 0.19.0. In contrast, if the call does not pass this optional parameter, upgrading HTTPX from 0.18.2 to 0.19.0 remains compatible.

# API definition in library HTTPX 0.18.2
def Proxy(self, url: URLTypes, *, headers: HeaderTypes = None, mode: str = 'DEFAULT'): ...

# API definition in library HTTPX 0.19.0
def Proxy(self, url: URLTypes, *, headers: HeaderTypes = None): ...

import httpx
proxy_url = 'http://localhost:8080'
proxy_headers = {'Custom-Header': 'Value'}

# Incompatible call when upgrading HTTPX from 0.18.2 to 0.19.0
proxy = httpx.Proxy(proxy_url, headers=proxy_headers, mode='DEFAULT')

# Compatible call when upgrading HTTPX from 0.18.2 to 0.19.0
proxy = httpx.Proxy(proxy_url, headers=proxy_headers)
Listing 2: Examples of optional parameter and different parameter passing methods.
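As a minimal illustration, assuming simplified stand-ins for the two Proxy signatures in Listing 2 (self and type hints dropped; these are not the real httpx.Proxy, and try_call is a hypothetical helper), calling the 0.19.0 signature with mode raises a TypeError, while calls that omit mode succeed against both versions.

```python
# Hypothetical stand-ins for the Proxy signatures in Listing 2
# (self and type hints dropped); these are not the real httpx.Proxy.
def proxy_v0182(url, *, headers=None, mode='DEFAULT'):
    return {'url': url, 'headers': headers, 'mode': mode}

def proxy_v0190(url, *, headers=None):
    return {'url': url, 'headers': headers}

def try_call(func, *args, **kwargs):
    """Call func and return 'ok', or the TypeError message if the call fails."""
    try:
        func(*args, **kwargs)
        return 'ok'
    except TypeError as exc:
        return f'TypeError: {exc}'

proxy_url = 'http://localhost:8080'
proxy_headers = {'Custom-Header': 'Value'}

# Passing the removed optional parameter breaks after the upgrade...
print(try_call(proxy_v0190, proxy_url, headers=proxy_headers, mode='DEFAULT'))
# TypeError: ... unexpected keyword argument 'mode'
# ...while omitting it works with both versions.
print(try_call(proxy_v0182, proxy_url, headers=proxy_headers))  # ok
print(try_call(proxy_v0190, proxy_url, headers=proxy_headers))  # ok
```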

(3) Supports Variadic Parameters. Python provides variadic parameters, i.e., *args and **kwargs, which allow APIs to accept an arbitrary number of positional and keyword arguments, respectively. As illustrated in Listing 3, the API pdist defined in version 0.19.1 of the SciPy library accepts the specific parameters p, w, V, and VI. In version 1.0.0, these parameters are replaced with *args and **kwargs so that the API accepts a broader range of arguments. Although these parameters are removed from the signature in version 1.0.0, calls that still pass them remain compatible, because the extra arguments are absorbed by *args and **kwargs.

# API definition in library SciPy 0.19.1
def pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None): ...

# API definition in library SciPy 1.0.0
def pdist(X, metric='euclidean', *args, **kwargs): ...

from scipy.spatial.distance import pdist
# Compatible call when upgrading SciPy from 0.19.1 to 1.0.0
pdist(X, 'euclidean', None, None, V=None, VI=None)
Listing 3: Examples of variadic parameters.
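Signature binding also makes this absorption visible. In the sketch below, pdist_v100 is a hypothetical stand-in that copies the SciPy 1.0.0 signature from Listing 3 (body omitted), and X is placeholder data; binding the Listing 3 call shows where each argument lands. Binding only confirms that the call is accepted at the signature level; how the real pdist interprets the absorbed values is up to its implementation.

```python
import inspect

# Hypothetical stand-in copying the SciPy 1.0.0 pdist signature from Listing 3.
def pdist_v100(X, metric='euclidean', *args, **kwargs):
    pass

X = [[0.0, 0.0], [1.0, 1.0]]  # placeholder observations

# The call from Listing 3, written against the 0.19.1 signature.
bound = inspect.signature(pdist_v100).bind(X, 'euclidean', None, None, V=None, VI=None)
print(bound.arguments['args'])    # (None, None)
print(bound.arguments['kwargs'])  # {'V': None, 'VI': None}
```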

Python's flexible parameter-passing mechanism provides developers with a convenient and efficient programming experience, but it also affects API compatibility as Python libraries evolve. Through an in-depth analysis of six popular Python frameworks, Zhang et al. [3] found that eight of the 14 common patterns of API changes directly involve parameters, highlighting the prevalence and significance of parameter changes among Python API changes. A recent large-scale empirical study of breaking API changes found that 34 of 61 cases were caused by parameter changes [7], further confirming their critical role in breaking API changes. Automatically addressing API parameter compatibility issues can therefore reduce the time developers spend manually maintaining client code affected by breaking changes to API parameters.

II-B Challenges in Automated Detection and Repair of Python API Parameter Compatibility Issues

Automatically detecting and repairing Python API parameter compatibility issues poses the following challenges:

Challenge 1. Compatibility Assessment. Precisely assessing the compatibility of invoked APIs is challenging. First, compatibility depends not only on the changes in API definitions but also on how parameters are actually passed. Listings 1 to 3 show that a breaking parameter change does not necessarily mean that calling the API causes a compatibility issue; the parameter-passing methods used in user projects greatly affect API compatibility.

Second, a runnable API invocation is not necessarily compatible, as not all API parameter compatibility issues crash the program. In the example shown in Listing 4, the maxcardinality parameter of the API min_weight_matching is removed in NetworkX version 3.0. Because this parameter is passed by position in the call written for version 2.8.8, after the upgrade its value is erroneously assigned to the weight parameter, which now occupies that position. The code snippet still executes without raising any exception, but the semantics of the API call have changed.

# API definition in library NetworkX 2.8.8
def min_weight_matching(G, maxcardinality=None, weight='weight'): ...

# API definition in library NetworkX 3.0
def min_weight_matching(G, weight='weight'): ...

import networkx as nx
G = nx.Graph()
# Runnable but incompatible call when upgrading NetworkX from 2.8.8 to 3.0
matching = nx.min_weight_matching(G, None)
Listing 4: An example of runnable but incompatible code snippet.
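The silent rebinding can be made visible by binding the same call to both signatures. In this sketch, mwm_v288 and mwm_v30 are hypothetical stand-ins copying the two min_weight_matching signatures from Listing 4, and G is a placeholder object rather than a real graph. Neither binding raises an error, yet the second argument lands on a different parameter.

```python
import inspect

# Hypothetical stand-ins copying the signatures in Listing 4;
# these are not the real NetworkX functions.
def mwm_v288(G, maxcardinality=None, weight='weight'):
    pass

def mwm_v30(G, weight='weight'):
    pass

G = object()  # placeholder for nx.Graph()

# Bind the call min_weight_matching(G, None) to each signature; both succeed.
old = inspect.signature(mwm_v288).bind(G, None)
new = inspect.signature(mwm_v30).bind(G, None)

print(list(old.arguments))      # ['G', 'maxcardinality']
print(list(new.arguments))      # ['G', 'weight']
print(new.arguments['weight'])  # None: the default 'weight' is silently overridden
```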

Challenge 2. Automated Establishment of API Mappings. Establishing API mappings automatically is crucial for a fully automated detection and repair tool. Existing tools, such as MLCatchUp [9] and Relancer [10], require users to manually provide the old and new signatures of the updated APIs or to pre-build a database of breaking API changes by hand. To detect and repair API parameter compatibility issues, we need to establish two types of mappings: (1) mappings of API signatures between the old and the new library versions, $API_{old} \rightarrow API_{new}$; and (2) mappings of parameters between the old and the new API signatures, $Parameter_{old} \rightarrow Parameter_{new}$. In the following, we discuss the challenges in automatically establishing these two types of mapping relationships.

(1) Establishing API Signature Mappings. One way to establish $API_{old} \rightarrow API_{new}$ mappings is to extract the definition of the invoked API from the library source code. Existing tools, such as DLocator [5] and PyCompat [3], mainly use manual or semi-automated approaches that extract API definitions by comparing the call paths of APIs invoked in the project against their real paths in the library source code. However, the effectiveness of this approach is significantly impaired by same-name APIs, API aliases, and API overloading in the source code of Python libraries.

On the one hand, same-name APIs and API aliases may produce multiple, ambiguous matching results. Developers of Python libraries often use the __init__.py file and the import mechanism to create API aliases, shortening the call paths of public APIs exposed to users [12]. However, API aliases can cause an API's call path in the project to differ from its real path in the library source code. As shown in Fig. 3, in the NumPy source code (version 1.10.0), the real path of the API max is numpy.core.fromnumeric.amax, but because of aliasing and the import mechanism, the call path provided to users is numpy.max. If the terms “numpy” and “max” are used for matching, the result is ambiguous, because several other APIs share the name max, such as numpy.core.getlimits.iinfo.max and numpy.ma.core.MaskedArray.max. Our preliminary statistics on 33 popular Python libraries (described in Section IV) show that same-name APIs account for 3.90% to 24.38% of all APIs per version across different libraries.

Figure 3: An example of alias and import mechanism of Python APIs.
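Why runtime information helps here can be illustrated with a small sketch. It does not reproduce PCART's mapping procedure (Sections III-C and III-D); it only shows that once the invoked object is available at runtime, its __module__ and __qualname__ attributes name the defining location, which name matching on “max” cannot do. The in-memory package pkg, its submodule pkg.core.fromnumeric, and the helper real_path are all hypothetical, built to mimic the layout in Fig. 3.

```python
import types

# Hypothetical in-memory package mimicking Fig. 3: the function is defined in
# pkg.core.fromnumeric as amax and re-exported from the package as max.
fromnumeric = types.ModuleType('pkg.core.fromnumeric')
exec('def amax(a):\n    return max(a)\n', fromnumeric.__dict__)

pkg = types.ModuleType('pkg')
pkg.max = fromnumeric.amax  # alias, as an import in __init__.py would create

def real_path(obj):
    """Return the defining module plus qualified name of a function or class."""
    return f'{obj.__module__}.{obj.__qualname__}'

print(real_path(pkg.max))  # pkg.core.fromnumeric.amax
print(pkg.max([1, 3, 2]))  # 3
```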

On the other hand, API overloading also makes it difficult to establish precise $API_{old} \rightarrow API_{new}$ mappings. Although Python does not natively support function overloading, many third-party libraries such as PyTorch, TensorFlow, and NumPy implement overloading through C/C++ extensions, which select and invoke the matching C/C++ function based on the types and number of arguments passed at invocation. For example, in PyTorch 1.5.0 [13], the API torch.max has three overloaded forms: torch.max(input), torch.max(input, dim, keepdim=False, out=None), and torch.max(input, other, out=None). With multiple overloaded forms, it is difficult to identify which one the project calls, even if the definition is correctly extracted from the library source code, which may lead to wrong $API_{old} \rightarrow API_{new}$ mappings.

(2) Establishing Parameter Mappings. After establishing API signature mappings, parameter mappings (i.e., $Parameter_{old} \rightarrow Parameter_{new}$) must be established to analyze parameter changes. Intuitively, parameters can be matched by name. Once the mapping is determined, further changes, such as position or type changes, can be analyzed. However, establishing correct mappings becomes more complicated when parameters are renamed or removed. As shown in Listing 5, between versions 2.3.4 and 2.4.0 of the TensorFlow API DispatchServer, the parameter start can be mapped directly by its name. However, the relationship between the port and protocol parameters in 2.3.4 and the config parameter in 2.4.0 is difficult to determine; according to the API documentation, the changes are the removal of port and protocol and the addition of config. Similarly, in the drop API of the Polars library, the change from name in 0.14.17 to columns in 0.14.18 might not look like a renaming, yet it is one.

# API definition in library TensorFlow 2.3.4
def DispatchServer(port, protocol=None, start=True): ...

# API definition in library TensorFlow 2.4.0
def DispatchServer(config=None, start=True): ...

# API definition in library Polars 0.14.17
def drop(name: 'str | list[str]'): ...

# API definition in library Polars 0.14.18
def drop(columns: 'str | list[str]'): ...
Listing 5: Examples of parameter renaming and removal.
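To see why name matching alone is insufficient, the sketch below applies a naive name-only mapping to the signatures in Listing 5. The stand-in functions copy those signatures, and map_by_name is a hypothetical helper, not PCART's rule-based mapping (Section III-E). It maps start correctly but leaves the remaining parameters unexplained, and it cannot tell a renaming (name to columns) from a removal plus an addition (port and protocol to config).

```python
import inspect

# Hypothetical stand-ins copying the signatures from Listing 5 (bodies omitted).
def dispatch_v234(port, protocol=None, start=True):
    pass

def dispatch_v240(config=None, start=True):
    pass

def drop_v01417(name):
    pass

def drop_v01418(columns):
    pass

def map_by_name(old_func, new_func):
    """Naive mapping: pair same-name parameters and list the leftovers."""
    old = list(inspect.signature(old_func).parameters)
    new = list(inspect.signature(new_func).parameters)
    matched = {p: p for p in old if p in new}
    only_old = [p for p in old if p not in new]
    only_new = [p for p in new if p not in old]
    return matched, only_old, only_new

print(map_by_name(dispatch_v234, dispatch_v240))
# ({'start': 'start'}, ['port', 'protocol'], ['config'])
print(map_by_name(drop_v01417, drop_v01418))
# ({}, ['name'], ['columns'])
```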

Challenge 3. Automated Repair and Validation. Automated repair and validation are critical to addressing Python API parameter compatibility issues. MLCatchUp [9] repairs compatibility issues based on manually provided API signatures; as a static method, it requires manual effort to validate the repair results, which is error-prone and time-consuming. Relancer [10], on the other hand, repairs code by dynamically executing each line of a Jupyter notebook and relying on runtime error messages, but it is susceptible to multiple compatibility issues within a single file or across several source files. For example, the code snippet in Listing 6 contains several API invocations with parameter compatibility issues, and because the code executes sequentially, failing to repair any one of them could halt the entire automated repair and validation process. The execution of each API depends on its context within the code; for example, a.b(z) depends on the value of z and on the return value of A(x, y). If the incompatible invocation A(x, y) has not been fixed, its return value never reaches a.b(z). Therefore, to repair and validate each API independently, it is necessary to acquire the contextual dependency information of each invoked API within the code.

foo1(1, x, y)  # x was removed
...
foo2(x, 0.2, y)  # x and y swapped positions
...
a = A(x, y=1)  # y was renamed
...
a.b(2)  # Conversion to keyword parameter
Listing 6: An example of multiple invoked APIs with compatibility issues in parameters within a single source file.
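One way to obtain such context, shown here as a hypothetical sketch rather than PCART's instrumentation (Section III-C), is to serialize the variables visible just before each library call so that the call can later be replayed, and a candidate repair validated, without re-running the statements before it. The helpers snapshot and replay and the function foo1_new (a new version of foo1 from Listing 6 without parameter x) are invented for illustration; real contexts may contain objects that pickle cannot serialize.

```python
import pickle

# Hypothetical new version of foo1 from Listing 6, with parameter x removed.
def foo1_new(a, y):
    return a + y

snapshots = {}

def snapshot(label, namespace):
    """Serialize the variables visible just before a library call."""
    snapshots[label] = pickle.dumps(namespace)

def replay(label, call):
    """Restore a saved context and re-run a single call in isolation."""
    context = pickle.loads(snapshots[label])
    try:
        return 'ok', call(context)
    except TypeError as exc:
        return 'TypeError', str(exc)

# Context recorded while the program ran against the old library version.
snapshot('foo1_call', {'x': 5, 'y': 10})

# The original call foo1(1, x, y) fails against the new signature...
print(replay('foo1_call', lambda c: foo1_new(1, c['x'], c['y']))[0])  # TypeError
# ...and a candidate repair foo1(1, y) is validated from the same context.
print(replay('foo1_call', lambda c: foo1_new(1, c['y'])))  # ('ok', 11)
```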

III Our PCART Approach

III-A Overview

To address the aforementioned challenges in detecting and repairing Python API parameter compatibility issues, we introduce PCART (Fig. 2), which has the following key advantages.

  • Precise Compatibility Assessment. PCART precisely assesses the compatibility of invoked APIs based on the formulation of three information sources, i.e., parameter types (e.g., positional/keyword parameter), change types (e.g., removal/renaming), and parameter passing methods (e.g., positional/keyword passing) (Section III-E).

  • Fully Automated Detection and Repair. PCART establishes API mapping relationships automatically. To establish API signature mappings, PCART introduces a novel code instrumentation and dynamic mapping approach to precisely acquire the API definitions across the current version and the target version to be upgraded (Sections III-C and III-D). Besides, PCART establishes parameter mappings by a rule-based method (Section III-E). Moreover, the validation of repair is also automatically performed in PCART by integrating both dynamic and static validations (Section III-F).

  • Diverse Parameter Changes Support. PCART is capable of extracting several complex forms of API calls (Section III-B) and repairing API parameter compatibility issues raised by various types of changes, i.e., parameter addition, removal, renaming, reordering, and the conversion of positional parameters to keyword parameters (Section III-F).
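PCART's formulation of the three information sources is given in Section III-E and is not reproduced here. As a rough, hypothetical stand-in, the same reasoning can be approximated with signature binding: a call that no longer binds to the new signature will crash, and a positional argument that now lands on a differently named parameter signals silent rebinding, while arguments absorbed by *args are accepted. The helpers positional_targets and assess are assumptions for illustration; they presume the call is valid for the old signature and infer the effect of a change from the two signatures rather than classifying change types explicitly.

```python
import inspect

P = inspect.Parameter

def positional_targets(func, n):
    """Names of the parameters that receive the first n positional arguments."""
    targets = []
    for p in inspect.signature(func).parameters.values():
        if p.kind in (P.POSITIONAL_ONLY, P.POSITIONAL_OR_KEYWORD):
            targets.append(p.name)
        elif p.kind == P.VAR_POSITIONAL:
            # Every remaining positional argument is absorbed by *args.
            targets += ['*' + p.name] * max(0, n - len(targets))
            break
    return targets[:n]

def assess(old_func, new_func, *args, **kwargs):
    """Heuristic check of one call (assumed valid for old_func) against new_func."""
    try:
        inspect.signature(new_func).bind(*args, **kwargs)
    except TypeError:
        return 'incompatible: call fails'
    old_t = positional_targets(old_func, len(args))
    new_t = positional_targets(new_func, len(args))
    for old_name, new_name in zip(old_t, new_t):
        if old_name != new_name and not new_name.startswith('*'):
            return f'incompatible: {old_name} now binds to {new_name}'
    return 'compatible'

# Stand-ins copying the signatures from Listings 1, 3, and 4 (bodies omitted).
def rule_old(title='', character=None, style='rule.line'): ...
def rule_new(title='', *, character=None, style='rule.line'): ...
def mwm_old(G, maxcardinality=None, weight='weight'): ...
def mwm_new(G, weight='weight'): ...
def pdist_old(X, metric='euclidean', p=None, w=None, V=None, VI=None): ...
def pdist_new(X, metric='euclidean', *args, **kwargs): ...

print(assess(rule_old, rule_new, '', None, 'rule.line'))
# incompatible: call fails
print(assess(rule_old, rule_new, '', character=None, style='rule.line'))
# compatible
print(assess(mwm_old, mwm_new, 'G', None))
# incompatible: maxcardinality now binds to weight
print(assess(pdist_old, pdist_new, 'X', 'euclidean', None, None, V=None, VI=None))
# compatible
```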

Fig. 2 shows the overview of PCART. When users plan to upgrade a third-party library dependency of their project to a new version, PCART first extracts the APIs related to the upgraded library from the project's source code. Then, PCART instruments the invoked APIs and executes the project to save their contextual dependency information. Subsequently, it employs both dynamic and static methods to establish accurate API mappings. It then assesses the compatibility of the invoked APIs and, finally, if a compatibility issue is found, repairs the incompatible API invocation and validates the repaired one.

PCART begins by accepting a configuration file as input (Fig. 4), which contains the following information: path to the project, run command and its entry file path, library name, current version number, target version number, path to the virtual environment of the current version, and path to the virtual environment of the target version. PCART considers each source file in the project as a task to be processed and adds it to a task queue. To accelerate the detection and repair, PCART creates a pool of processes to handle these tasks concurrently. Each process executes a full set of detection and repair procedures.

Figure 4: The input configuration of PCART.
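For concreteness, the configuration fields and the per-file task list described above could be modeled as follows. This is a hypothetical sketch: the class RepairConfig, its field names, the function collect_tasks, and the example paths and versions are illustrative assumptions (the versions echo the NumPy upgrade in Fig. 1), and PCART's actual configuration format is the one shown in Fig. 4. The process pool is omitted, and the printed paths assume a POSIX path separator.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

@dataclass
class RepairConfig:
    """Hypothetical container for the configuration fields listed above."""
    project_path: str
    run_command: str
    entry_file: str
    library_name: str
    current_version: str
    target_version: str
    current_env: str  # virtual environment with the current library version
    target_env: str   # virtual environment with the target library version

def collect_tasks(config):
    """Treat every .py file under the project as one task (relative paths)."""
    root = Path(config.project_path)
    return sorted(str(p.relative_to(root)) for p in root.rglob('*.py'))

# Usage with a throwaway project directory.
with tempfile.TemporaryDirectory() as project:
    (Path(project) / 'sub').mkdir()
    for name in ('main.py', 'sub/util.py', 'README.md'):
        (Path(project) / name).touch()
    config = RepairConfig(project, 'python main.py', 'main.py', 'numpy',
                          '1.9.3', '1.10.0', '/envs/numpy-1.9.3', '/envs/numpy-1.10.0')
    tasks = collect_tasks(config)

print(tasks)  # ['main.py', 'sub/util.py']
```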

After processing all source files, PCART outputs the repaired project and the repair report (Fig. 5). The report records each API’s invocation form within the project, invocation location, coverage information of dynamic execution, parameter definitions in both the current and target versions, compatibility status, and the results of the repairs.

Figure 5: The output repair report of PCART.

Below, we elaborate on the design details of PCART, which consists of five main modules: ❶ invoked API extraction, ❷ instrumentation and execution, ❸ API mapping establishment, ❹ compatibility assessment, and ❺ repair and validation.

III-B Invoked API Extraction

A Python project typically contains multiple .py source files, each of which may invoke several APIs of the target third-party library. PCART first converts the source files into abstract syntax trees (ASTs) and then traverses the ASTs to identify all API calls related to the library to be upgraded. The source files of the invoked APIs and their line positions within these files are also extracted. Because programming habits vary among developers, API invocation statements take many forms, making it difficult to precisely extract all library-related API calls from the source files as strings with text processing techniques such as regular expressions. Transforming the source files into a uniform format, i.e., the AST, therefore facilitates the extraction of invoked APIs. The details of invoked API extraction are as follows.

Figure 6: The AST Structure of A(B(x)).C(y).
# 1. Direct Invocation
foo(x, y)

# 2. Class Object Invocation
a = A(x)
a.foo(y, z)

# 3. Return Value Invocation
f(x).foo(y, z)

# 4. Argument Invocation
f(x, foo(y, z))

# 5. Inheritance Invocation
from pkg.module import C
class Custom(C):
    def custom_method(self, x, y):
        self.foo(x, y)
Listing 7: Five typical types of API calls.

(1) Traversing the Abstract Syntax Tree. First, for each .py file, PCART uses Python's built-in ast module to parse the source code into an AST. Then, PCART performs a breadth-first search (BFS), i.e., a level-order traversal, of the AST to identify nodes of type Assign, Import, and ImportFrom, which correspond to assignment, import, and from-import statements, respectively.
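A minimal sketch of such a level-order search, using only the standard ast module, is shown below. The helper bfs_collect and the source string are hypothetical examples, not PCART's code; the standard library's ast.walk also visits nodes breadth-first, and the explicit queue here just makes the order visible.

```python
import ast
from collections import deque

source = '''
import numpy as np
from pkg.module import M as A
a = A(1)

def f():
    b = np.zeros(3)
'''

def bfs_collect(tree, wanted=(ast.Assign, ast.Import, ast.ImportFrom)):
    """Level-order traversal that returns nodes of the wanted types."""
    found = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, wanted):
            found.append(node)
        queue.extend(ast.iter_child_nodes(node))
    return found

for node in bfs_collect(ast.parse(source)):
    print(type(node).__name__, node.lineno)
# Import 2
# ImportFrom 3
# Assign 4
# Assign 7
```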

Second, to extract API call statements, PCART performs a depth-first search (DFS) that explores each branch in depth, because of how API calls are stored in the AST. As shown in Fig. 6, in the complex API call statement A(B(x)).C(y), API C is called on the return value of API A, while API B is called as an argument of API A. All of these calls are located on the same branch of the tree. Therefore, the DFS algorithm not only determines the sequential relationship between APIs during the search but also extracts all potential API calls within the source code. Existing tools, e.g., DLocator [5] and MLCatchUp [9], cannot accurately identify this call format. In contrast, PCART supports five typical types of API calls, i.e., direct invocation, class object invocation, return value invocation, argument invocation, and inheritance invocation (Listing 7).
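The following sketch shows how a pre-order DFS can recover both the calls and their relationships from the statement in Fig. 6. It is an illustrative reconstruction, not PCART's code: call_name and dfs_calls are hypothetical helpers, calls found in a callee expression are labeled as receivers (return value invocation), and calls found in arguments are labeled as arguments (argument invocation).

```python
import ast

def call_name(func):
    """Render the callee expression of a Call node as text."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return f'{call_name(func.value)}.{func.attr}'
    if isinstance(func, ast.Call):
        return f'{call_name(func.func)}()'
    return '<expr>'

def dfs_calls(node, enclosing=None, role='top-level', out=None):
    """Pre-order DFS recording (call, role, enclosing call) for every call."""
    if out is None:
        out = []
    if isinstance(node, ast.Call):
        name = call_name(node.func)
        out.append((name, role, enclosing))
        # A call inside the callee expression supplies the object being called on.
        dfs_calls(node.func, name, 'receiver of', out)
        # Calls inside positional or keyword arguments are arguments of this call.
        for arg in node.args:
            dfs_calls(arg, name, 'argument of', out)
        for kw in node.keywords:
            dfs_calls(kw.value, name, 'argument of', out)
    else:
        for child in ast.iter_child_nodes(node):
            dfs_calls(child, enclosing, role, out)
    return out

for record in dfs_calls(ast.parse('A(B(x)).C(y)')):
    print(record)
# ('A().C', 'top-level', None)
# ('A', 'receiver of', 'A().C')
# ('B', 'argument of', 'A')
```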

(2) Restoring the Conventional API Call Path. PCART combines Assign and Import nodes to reconstruct the call path of an API, which identifies the third-party library it belongs to. As demonstrated in Listing 8, for the invocation a.b(y, z), the assignment statement a = A(x) reveals that variable a is an object instantiated from class A. Consequently, a.b(y, z) can be restored to A(x).b(y, z). Further analysis of the ImportFrom statement shows that A is an alias for M, which is imported from pkg.module. Thus, the API call statement A(x).b(y, z) is finally restored to its fully qualified form pkg.module.M(x).b(y, z).

from pkg.module import M as A
a = A(x)
a.b(y, z)
Listing 8: Calling a regular API.
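The restoration in Listing 8 can be sketched as a name-substitution pass over the AST. This is a simplified illustration under stated assumptions: restore_call_paths and _Substitute are hypothetical names, only top-level statements are processed in order, only absolute from-imports, aliased imports, and simple var = Call(...) assignments are tracked, and only calls that appear as expression statements are reported.

```python
import ast
import copy

class _Substitute(ast.NodeTransformer):
    """Replace known local names with the qualified expressions they stand for."""

    def __init__(self, env):
        self.env = env

    def visit_Name(self, node):
        if node.id in self.env:
            return ast.parse(self.env[node.id], mode="eval").body
        return node

def restore_call_paths(source):
    """Fully qualify calls made as top-level expression statements."""
    env = {}    # local name -> qualified expression text
    calls = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.ImportFrom) and stmt.module and stmt.level == 0:
            for alias in stmt.names:
                if alias.name != "*":                  # from pkg.module import M as A
                    env[alias.asname or alias.name] = f"{stmt.module}.{alias.name}"
        elif isinstance(stmt, ast.Import):
            for alias in stmt.names:
                if alias.asname:                       # import numpy as np
                    env[alias.asname] = alias.name
        elif isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call):
            value = _Substitute(env).visit(copy.deepcopy(stmt.value))
            for target in stmt.targets:
                if isinstance(target, ast.Name):       # a = A(x)
                    env[target.id] = ast.unparse(value)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            qualified = _Substitute(env).visit(copy.deepcopy(stmt.value))
            calls.append(ast.unparse(qualified))
    return calls

src = "from pkg.module import M as A\na = A(x)\na.b(y, z)\n"
print(restore_call_paths(src))  # ['pkg.module.M(x).b(y, z)']
```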

(3) Restoring the Path of Inherited API Calls. Beyond conventional API calls, users also invoke library APIs through custom classes that inherit from library classes. For instance, in Listing 9, self.c_method is in fact an API of a third-party library. However, existing tools, e.g., DLocator [5], PyCompat [3], and MLCatchUp [9], fail to extract such API calls.

To address this issue, PCART first identifies all ClassDef nodes in the AST, which correspond to class definition statements, and then checks whether each custom class inherits from another class. If so, PCART extracts all custom APIs (methods) defined within that class. Subsequently, PCART determines whether each API invoked through self belongs to the class's custom APIs. If not, it is considered an API of the inherited class. For example, self.c_method is resolved to C.c_method and then further resolved to pkg.module.C.c_method.

from pkg.module import C
class Custom(C):
    def custom_method(self):
        self.c_method()
Listing 9: An example of inherited API call.
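The resolution steps above can be sketched as follows. This is an illustrative approximation: inherited_api_calls is a hypothetical helper, it resolves every non-custom self.<method>() call to the first base class only (Python's full method resolution order is ignored), and it qualifies the base through from-imports in the same file.

```python
import ast

def inherited_api_calls(source):
    """Resolve self.<method>() calls that a subclass does not define itself."""
    tree = ast.parse(source)
    imports = {}  # local name -> qualified name, from `from ... import ...`
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                imports[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    resolved = []
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef) or not cls.bases:
            continue  # only classes that inherit from another class
        own = {n.name for n in cls.body
               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
        base = ast.unparse(cls.bases[0])  # sketch: first base class only
        for node in ast.walk(cls):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == "self"
                    and node.func.attr not in own):
                resolved.append(f"{imports.get(base, base)}.{node.func.attr}")
    return resolved

src = ("from pkg.module import C\n"
       "class Custom(C):\n"
       "    def custom_method(self):\n"
       "        self.c_method()\n")
print(inherited_api_calls(src))  # ['pkg.module.C.c_method']
```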

III-C Instrumentation and Execution

To automate the establishment of API mappings and enable independent repair and validation of each invoked API, we propose a code instrumentation approach that preserves the contextual dependency of invoked APIs. As illustrated in Fig. 7, for each API call we define two types of contextual dependency: the preceding dependency and the subsequent dependency. The former records the caller information, while the latter preserves the parameter values required at runtime.

Figure 7: Context dependency of an invoked API.
#1. Direct Invocation
dic['foo(x, y)'] = [x, y]
foo(x, y)

#2. Class Object Invocation
dic['A(x)'] = [x]
a = A(x)
dic['@a.foo(y, z)'] = a
dic['a.foo(y, z)'] = [y, z]
a.foo(y, z)

#3. Return Value Invocation
dic['f(x)'] = [x]
dic['@f(x).foo(y, z)'] = f(x)
dic['f(x).foo(y, z)'] = [y, z]
f(x).foo(y, z)

#4. Argument Invocation
dic['foo(y, z)'] = [y, z]
dic['f(x, foo(y, z))'] = [x, foo(y, z)]
f(x, foo(y, z))

#5. Inheritance Invocation
from pkg.module import C
class Custom(C):
    def custom_method(self, x, y):
        dic['@self.foo(x, y)'] = self
        dic['self.foo(x, y)'] = [x, y]
        self.foo(x, y)
custom = Custom()
custom.custom_method(x, y)
Listing 10: Instrumentation of the foo API for different formats of API calls.

As shown in Listing 10, PCART inserts assignment statements that record values into dictionaries in order to capture the contextual dependency of each invoked API. When the project is run under the current (compatible) library version, the contextual dependency of each invoked API is recorded in the instrumented dictionaries. The recorded values are then serialized in binary format into pickle files (.pkl) using Dill, a library for serializing and de-serializing Python objects [14]; each invoked API has one corresponding pickle file. Pickle files are chosen because they can save the variable values and Python objects generated at runtime, and these values can later be retrieved simply by loading the file, without rerunning the project. However, instrumenting every API call is complicated by differing coding styles and API call formats. Below, we discuss five typical cases that require special handling during code instrumentation.
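Before turning to these cases, the record-and-reload step just described can be sketched as follows. The sketch makes two assumptions: it uses the standard-library pickle module in place of Dill (Dill offers the same dump/load interface but is a third-party package), and save_context and load_context are hypothetical helpers. A collections.Counter stands in for a project object so the example is self-contained.

```python
import os
import pickle
import tempfile
from collections import Counter

def save_context(dic, key, directory):
    """Write the recorded context of one invoked API to its own .pkl file."""
    path = os.path.join(directory, key + ".pkl")
    with open(path, "wb") as fw:
        # '@' + key -> caller object (preceding dependency); key -> argument values
        pickle.dump({"@" + key: dic.get("@" + key), key: dic[key]}, fw)
    return path

def load_context(path):
    """Reload the context later without rerunning the project."""
    with open(path, "rb") as fr:
        return pickle.load(fr)

# What the instrumented statements might record for the call `c.most_common(n)`
c, n = Counter("abracadabra"), 1
dic = {"@c.most_common(n)": c, "c.most_common(n)": [n]}

with tempfile.TemporaryDirectory() as tmp:
    ctx = load_context(save_context(dic, "c.most_common(n)", tmp))
    replayed = ctx["@c.most_common(n)"].most_common(*ctx["c.most_common(n)"])

print(replayed)  # [('a', 5)]
```

Using the call text as the file name mirrors the one-file-per-API layout; a real tool would need to escape characters such as "/" that are not valid in file names.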

(1) Handling Code Indentation. Python defines the scope of statements through indentation. However, owing to personal habits and differences among the integrated development environments (IDEs) used by developers, code may use different indentation styles, such as spaces or tab characters. To calculate the indentation of each instrumented statement accurately, PCART converts all tab characters in the code to spaces (with tab stops every four columns) using the shell command expand -t 4 "$file" > "$temp" && mv "$temp" "$file". This ensures that instrumentation can be applied correctly across different projects.
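For readers who prefer to stay inside Python, str.expandtabs performs the same tab-stop expansion as expand -t 4. The two helpers below are hypothetical names used only for illustration; indent_of shows how the leading whitespace of a normalized line could be reused for an instrumented statement.

```python
def normalize_tabs(source, tabsize=4):
    """Python equivalent of `expand -t 4`: each tab advances to the next tab stop."""
    return source.expandtabs(tabsize)

def indent_of(line):
    """Leading spaces to reuse for an instrumented statement (after normalisation)."""
    return line[: len(line) - len(line.lstrip(" "))]

code = "def f():\n\treturn g(1)\n"
body_line = normalize_tabs(code).splitlines()[1]
print(repr(indent_of(body_line)))  # '    '
```

Like expand, str.expandtabs also rewrites tabs that occur inside string literals, so both approaches can alter such literals.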

(2) Determining the Location for Instrumentation. PCART first traverses every line of the source file and performs string matching to locate the lines containing the invoked APIs obtained in ❶. Then, for each API call, PCART inserts the assignment statement before the line of the API call, at the same indentation level. The insertion must precede the call because the call may occur in a return statement, such as return f(1, 2) in Listing 11: if the instrumentation were inserted after the return statement, it would never execute at runtime, and the contextual dependency of the invoked API would not be stored.

def foo():
    dic["f(1,2)"]=[1, 2] #Correct Location
    return f(1,2)
    dic["f(1,2)"]=[1, 2] #Wrong Location
Listing 11: Instrumentation for API calls in return statements.
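The placement rule can be sketched as a line-level rewrite. instrument_before is a hypothetical helper: it uses a plain substring match (as the text describes) and assumes tabs have already been expanded, so it copies only leading spaces.

```python
def instrument_before(lines, call_text, args_text):
    """Insert `dic['<call>'] = [<args>]` before each line containing `call_text`.

    The inserted statement copies the line's indentation, so it still runs
    when the call sits inside a `return` statement.
    """
    out = []
    for line in lines:
        if call_text in line:  # plain string match
            indent = line[: len(line) - len(line.lstrip(" "))]
            out.append(f"{indent}dic[{call_text!r}] = [{args_text}]")
        out.append(line)
    return out

src = ["def foo():", "    return f(1,2)"]
print("\n".join(instrument_before(src, "f(1,2)", "1, 2")))
# def foo():
#     dic['f(1,2)'] = [1, 2]
#     return f(1,2)
```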

(3) Handling Line Breaks in Parameter Passing. When an API call passes multiple parameters, developers may write each parameter on a separate line to improve readability. This complicates code instrumentation, because a statement inserted before the line of an inner call would land inside the enclosing call, as shown in Listing 12. Therefore, before instrumentation, PCART rewrites every statement that passes parameters across multiple lines onto a single line, which ensures that the instrumentation is correct.

#Before Handling
val = Foo(
    dic["f1(x)"]=[x] #Wrong Instrumentation
    f1(x),
    f2(y),
    f3(z),
    )

#After Handling
#Correct Instrumentation
dic['f1(x)'] = [x]
dic['f2(y)'] = [y]
dic['f3(z)'] = [z]
dic['Foo(f1(x), f2(y), f3(z))'] = [f1(x), f2(y), f3(z)]
val = Foo(f1(x), f2(y), f3(z))
Listing 12: Handling line breaks in parameter passing.
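One way to perform this joining is to regenerate each multi-line simple statement with ast.unparse. The sketch below is an assumption about how this could be done, not PCART's code: join_multiline_calls is a hypothetical helper, it assumes each such statement starts on its own line, and comments inside the rewritten statement are dropped because ast.unparse does not preserve comments.

```python
import ast

# Simple (non-compound) statements that may carry a call spread over several lines
SIMPLE_STMTS = (ast.Assign, ast.AugAssign, ast.AnnAssign, ast.Expr, ast.Return)

def join_multiline_calls(source):
    """Rewrite simple statements that span several lines onto a single line."""
    lines = source.split("\n")
    multi = [n for n in ast.walk(ast.parse(source))
             if isinstance(n, SIMPLE_STMTS) and n.end_lineno > n.lineno]
    # Rewrite from the bottom up so the line numbers of earlier statements stay valid
    for node in sorted(multi, key=lambda n: n.lineno, reverse=True):
        first = lines[node.lineno - 1]
        indent = first[: len(first) - len(first.lstrip())]
        lines[node.lineno - 1 : node.end_lineno] = [indent + ast.unparse(node)]
    return "\n".join(lines)

src = "val = Foo(\n    f1(x),\n    f2(y),\n    f3(z),\n    )\n"
print(join_multiline_calls(src))  # val = Foo(f1(x), f2(y), f3(z))
```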

(4) Handling List Comprehensions. A list comprehension is a concise and efficient way to create a list from an iterable object, written as [expression for item in iterable if condition]. As depicted in Listing 13, when the expression is a call to function f, the variables x and y on which f depends are bound only inside the comprehension, so instrumenting the line before it raises an undefined-variable error (NameError). To solve this issue, PCART parses the list comprehension with Python's built-in ast module to obtain the item, iterable, and condition, and transforms them into the form [item for item in iterable if condition]. The first element of the resulting list is then taken as the parameter values of function f.

#Before Handling
dic['f(x,y)']=[x,y] #NameError: 'x' is not defined
a=[f(x,y) for x,y in lst if x>0 and y>0]

#After Handling
x,y=[(x,y) for x,y in lst if x>0 and y>0][0]
dic['f(x,y)']=[x,y] #Correct Instrumentation
a=[f(x,y) for x,y in lst if x>0 and y>0]
Listing 13: Handling API calls in list comprehensions.
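The binding statement in Listing 13 can be generated from the comprehension's AST. This is a sketch with hypothetical helper names (bind_loop_vars, comprehension_bindings) that uses only the first for clause of each comprehension.

```python
import ast

def bind_loop_vars(comp):
    """Statement binding a list comprehension's loop variables to its first
    qualifying element (sketch: only the first `for` clause is used)."""
    gen = comp.generators[0]
    t = gen.target
    names = (", ".join(ast.unparse(e) for e in t.elts)
             if isinstance(t, ast.Tuple) else ast.unparse(t))
    conds = "".join(f" if {ast.unparse(c)}" for c in gen.ifs)
    return f"{names} = [({names}) for {names} in {ast.unparse(gen.iter)}{conds}][0]"

def comprehension_bindings(source):
    """(call, binding) pairs for list comprehensions whose element is a call."""
    return [(ast.unparse(node.elt), bind_loop_vars(node))
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.ListComp) and isinstance(node.elt, ast.Call)]

src = "a = [f(x, y) for x, y in lst if x > 0 and y > 0]"
print(comprehension_bindings(src))
# [('f(x, y)', 'x, y = [(x, y) for x, y in lst if x > 0 and y > 0][0]')]
```

Two caveats of this form: the generated statement raises IndexError when no element satisfies the condition, and it evaluates the iterable a second time, which would exhaust a one-shot iterator before the original comprehension runs.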

(5) Expanding if-else Expressions. Another special case is the conditional (if-else) expression (Listing 14). Instrumenting directly before the statement evaluates a(x) unconditionally, even when the condition x>0 does not hold, so a(x) may receive a value less than or equal to 0 and potentially cause a runtime error. Therefore, PCART expands the conditional expression into an if-else statement by replacing the ast.IfExp node with an ast.If node in the AST.

#Before Expanding
def foo():
    dic['a(x)'] = [x]
    dic['@a(x).b(y, z)'] = a(x) #Wrong Instrumentation
    dic['a(x).b(y, z)'] = [y, z]
    return a(x).b(y, z) if x>0 else x

#After Expanding
def foo():
    if x>0:
        dic['a(x)'] = [x]
        dic['@a(x).b(y, z)'] = a(x) #Correct Instrumentation
        dic['a(x).b(y, z)'] = [y, z]
        return a(x).b(y, z)
    else:
        return x
Listing 14: Expanding if-else expressions within return statements for instrumentation.
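The IfExp-to-If rewrite can be expressed with ast.NodeTransformer. This sketch (ExpandConditionalReturn and expand_conditional_returns are hypothetical names) handles the case in Listing 14, a conditional expression directly under return; conditionals elsewhere in a statement are left unchanged.

```python
import ast

class ExpandConditionalReturn(ast.NodeTransformer):
    """Rewrite `return A if C else B` as an if/else statement."""

    def visit_Return(self, node):
        if not isinstance(node.value, ast.IfExp):
            return node
        cond = node.value
        expanded = ast.If(test=cond.test,
                          body=[ast.Return(value=cond.body)],
                          orelse=[ast.Return(value=cond.orelse)])
        # Visit the new returns too, so chained conditionals are expanded as well
        return ast.copy_location(self.generic_visit(expanded), node)

def expand_conditional_returns(source):
    tree = ExpandConditionalReturn().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(expand_conditional_returns("def foo():\n    return a(x).b(y, z) if x > 0 else x\n"))
# def foo():
#     if x > 0:
#         return a(x).b(y, z)
#     else:
#         return x
```

After this rewrite, the line-based placement rule can insert instrumentation at the top of the if branch, where x > 0 is known to hold.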

III-D API Mapping Establishment

(1) Dynamic Mapping. To establish API signature mappings (i.e., $API_{old} \rightarrow API_{new}$) automatically, PCART first performs dynamic mapping, leveraging Python's runtime introspection to obtain the signatures (parameter definitions) of the invoked APIs. Specifically, PCART uses the inspect module of the Python standard library, which can examine and collect information about live Python objects (e.g., modules, classes, functions, and methods) at runtime [15].

For each invoked API, PCART first generates a Python script within the project's directory structure. The script imports all modules required to load the pickle files (created in ❷), including user-defined modules and third-party library modules. For example, if the project is run with the command python run.py, the generated script contains the import statement from run import *. This step is crucial because, if the recorded values include instances of functions or classes defined in specific modules, loading the pickle files requires those definitions; otherwise, the Python runtime cannot restore these instances for the invoked APIs. PCART then executes this script in the project's virtual environments (for the current and target versions), loading the previously saved contextual dependency (e.g., parameter values) from the pickle files into memory.
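Generating such a script amounts to string templating. The sketch below is hypothetical: make_signature_script is an invented helper, it takes the first .py token of the launch command as the entry module, and it assumes the script is written next to that entry file so that the star import resolves. Its output mirrors the loader in Listing 15.

```python
import shlex
from pathlib import Path

def make_signature_script(run_command, pkl_path, key, method):
    """Generate a loader script in the style of Listing 15 (hypothetical helper)."""
    # The first *.py token of the launch command is taken as the entry module
    entry_file = next(t for t in shlex.split(run_command) if t.endswith(".py"))
    entry_module = Path(entry_file).stem            # 'run.py' -> 'run'
    caller_key = "@" + key                          # preceding-dependency entry
    return "\n".join([
        "import dill",
        "import inspect",
        f"from {entry_module} import *",            # makes user-defined classes resolvable
        "",
        f"with open({pkl_path!r}, 'rb') as fr:",
        "    dic = dill.load(fr)",
        "",
        f"print(inspect.signature(dic[{caller_key!r}].{method}))",
    ])

print(make_signature_script("python run.py --epochs 3",
                            "a(x).b(y, z).pkl", "a(x).b(y, z)", "b"))
```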

After loading the pickle files, PCART uses the inspect module to obtain the signatures of the invoked APIs dynamically. As shown in Listing 15, by loading the a(x).b(y, z).pkl file into memory, PCART obtains the value of a(x) and then retrieves the signature of API b in the current library version through the inspect.signature function. Similarly, to obtain the API signature in the target version, PCART runs the script in the virtual environment of the new library version. Note that only the preceding contextual dependency (i.e., the value of a(x)) is needed to inspect the signature of b.

import dill
import inspect
from run import *

with open('a(x).b(y, z).pkl', 'rb') as fr:
    dic = dill.load(fr)

para_def = inspect.signature(dic['@a(x).b(y, z)'].b)
print(para_def) # e.g., (x=1, y=2)
Listing 15: Dynamic mapping of API signatures.
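The same round trip can be reproduced with standard-library pieces only. In this sketch, fractions.Fraction stands in for a project object, pickle replaces dill, and the key r.limit_denominator(10) is an invented example. It also shows that each inspect.Parameter reports its kind, which is what later distinguishes positional from keyword parameters.

```python
import inspect
import pickle
from fractions import Fraction

# Stand-in for the recorded context of the call `r.limit_denominator(10)`
blob = pickle.dumps({"@r.limit_denominator(10)": Fraction(1, 3),
                     "r.limit_denominator(10)": [10]})

dic = pickle.loads(blob)                    # what the generated script would load
caller = dic["@r.limit_denominator(10)"]    # preceding dependency only
sig = inspect.signature(caller.limit_denominator)
print(sig)  # (max_denominator=1000000)
print([(p.name, p.kind.name) for p in sig.parameters.values()])
# [('max_denominator', 'POSITIONAL_OR_KEYWORD')]
```

Because the method is looked up on the restored instance, the bound method's signature omits self, matching how the API is actually called.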

Note that the signatures of some API calls cannot be obtained dynamically, for two main reasons. First, built-in APIs, such as those in PyTorch and NumPy that are compiled as C/C++ extensions, may lack the metadata the inspect module needs, so their signatures cannot be acquired dynamically. For example, executing inspect.signature(torch.nn.functional.avg_pool2d) raises ValueError: no signature found for builtin <built-in function avg_pool2d>.

Second, when a module that a pickle file depends on has changed in the target version, pickle files generated under the old library version cannot be loaded in the target version. For example, loading under Matplotlib 3.7.0 a class object that was created by matplotlib.pyplot.colorbar under version 3.6.3 raises ModuleNotFoundError: No module named 'matplotlib.axes._subplots'. In this case, PCART attempts to regenerate the pickle files in the virtual environment of the target version (❷). However, if the invoked APIs also have compatibility issues in the target version, the project cannot run there, and the pickle files cannot be regenerated.
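Both failure modes can be caught explicitly so that a tool can fall back to static mapping. This is a sketch with hypothetical helper names; signature_or_fallback takes a caller-supplied static_lookup callback that stands in for the static mapping described next. The broken pickle in the usage line is a hand-written protocol-0 payload that references a module that does not exist.

```python
import inspect
import pickle

def dynamic_signature(obj):
    """inspect's signature of obj, or None if it cannot be obtained dynamically."""
    try:
        return inspect.signature(obj)
    except (ValueError, TypeError):
        # ValueError: e.g. compiled built-ins without signature metadata
        # TypeError: obj is not callable
        return None

def load_context(data):
    """Unpickle recorded context, or None if a module/class it needs is missing."""
    try:
        return pickle.loads(data)
    except (ImportError, AttributeError):
        # ImportError includes ModuleNotFoundError (module removed or moved);
        # AttributeError covers a class that no longer exists in its module
        return None

def signature_or_fallback(obj, static_lookup):
    """Prefer dynamic mapping; otherwise defer to a static lookup callback."""
    sig = dynamic_signature(obj)
    return sig if sig is not None else static_lookup()

broken = b"cno_such_module_xyz\nThing\n."   # protocol-0 pickle naming a missing module
print(load_context(broken))                 # None
```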

(2) Static Mapping. When dynamic mapping fails, PCART resorts to static mapping, which obtains an API's signature by matching the API's call path in the project against its actual path in the library source code. We use the library source code rather than the API documentation because documented API definitions are largely unstructured and, to some extent, incomplete or outdated. The level of detail and the format of recorded APIs also vary significantly across libraries' documentation, which makes it hard to build a highly generalizable automated approach on top of it. The details of static mapping are as follows.

(2.1) Extracting APIs Defined in Libraries. Similar to the processing of project source files, PCART first parses each library source file into an AST and traverses it to identify all FunctionDef and ClassDef nodes, which correspond to function and class definitions, respectively. One complicated form of API definition is nesting, i.e., classes defined within classes and functions defined within functions, as illustrated in Fig. 8. To construct the correct path of each API within the source code, the hierarchical relationships between classes and the affiliations among APIs must be recovered accurately; PCART therefore traverses the AST with DFS.

Figure 8: AST structure of the nested API definitions in library source code.
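A DFS that threads the enclosing path through nested definitions can be sketched as follows. defined_api_paths and the module name pkg.mod are hypothetical; the sketch also descends into if/try blocks so that conditionally defined APIs are not missed.

```python
import ast

def defined_api_paths(source, module):
    """Qualified paths of all classes/functions in a module, nested ones included."""
    paths = []

    def dfs(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                path = f"{prefix}.{child.name}"
                paths.append(path)
                dfs(child, path)        # definitions nested in this one extend the path
            else:
                dfs(child, prefix)      # e.g. definitions inside if/try blocks

    dfs(ast.parse(source), module)
    return paths

src = '''
class A:
    class B:
        def f(self):
            def g():
                pass
    def h(self):
        pass
def top():
    pass
'''
print(defined_api_paths(src, "pkg.mod"))
# ['pkg.mod.A', 'pkg.mod.A.B', 'pkg.mod.A.B.f', 'pkg.mod.A.B.f.g', 'pkg.mod.A.h', 'pkg.mod.top']
```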

In addition, for built-in APIs (described in Section II-B), developers usually declare their definitions in .pyi stub files. Listing 16 shows the declarations of the PyTorch built-in API max in the torch/__init__.pyi file (version 1.5.0). Hence, PCART also parses .pyi files to acquire the definitions of the overloads of built-in APIs.

class _TensorBase(object):
    @overload
    def max(self, dim: _int, keepdim: _bool=False) -> namedtuple_values_indices: ...

    @overload
    def max(self, dim: Union[str, ellipsis, None], keepdim: _bool=False) -> namedtuple_values_indices: ...

    @overload
    def max(self, other: Tensor) -> Tensor: ...

    @overload
    def max(self) -> Tensor: ...
Listing 16: Examples of PyTorch built-in API definitions in a .pyi file.
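Since .pyi files are valid Python syntax, the same AST machinery can read them. In this sketch, collect_stub_defs is a hypothetical helper that records every definition under its qualified name, so an overloaded API such as _TensorBase.max ends up with one parameter list per overload; *args and **kwargs are not recorded.

```python
import ast

def collect_stub_defs(stub_source):
    """Map qualified names in a .pyi stub to the parameter lists of all their definitions."""
    defs = {}

    def dfs(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                dfs(child, f"{prefix}{child.name}.")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                a = child.args
                params = [p.arg for p in a.posonlyargs + a.args + a.kwonlyargs]
                defs.setdefault(prefix + child.name, []).append(params)
            else:
                dfs(child, prefix)   # e.g. definitions under `if sys.version_info ...`

    dfs(ast.parse(stub_source), "")
    return defs

stub = (
    "class _TensorBase(object):\n"
    "    @overload\n"
    "    def max(self, dim: _int, keepdim: _bool=False) -> namedtuple_values_indices: ...\n"
    "    @overload\n"
    "    def max(self, other: Tensor) -> Tensor: ...\n"
    "    @overload\n"
    "    def max(self) -> Tensor: ...\n"
)
print(collect_stub_defs(stub))
# {'_TensorBase.max': [['self', 'dim', 'keepdim'], ['self', 'other'], ['self']]}
```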

Finally, by considering the class to which an API belongs, the module that contains the class, and the package that encompasses the module, PCART constructs the actual path of each API within the source code.

(2.2) Adjusting Library API Paths. The actual path of a library API in the source code often differs from the path exposed to users. This inconsistency arises mainly because developers use the import mechanism and __init__.py files to shorten the API paths available to users, as mentioned in challenge 2 (Section II-B).

To address this issue, PCART parses the import statements in the __init__.py file at each directory level, which allows it to adjust an API's actual path in the library source code to the calling path given in the API documentation. For instance, as depicted in Fig. 3, the __init__.py file of the core directory imports the API amax from fromnumeric.py into the core namespace under the alias max, so it is accessible as core.max without going through the intermediate fromnumeric module. Similarly, numpy's __init__.py imports all names from the core directory into the numpy namespace, so the amax function in fromnumeric.py can ultimately be accessed directly as numpy.max.
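The path adjustment can be sketched as a small graph search over re-export edges. Everything here is illustrative: import_edges and public_paths are hypothetical helpers, the __init__.py contents are simplified stand-ins modelled on the numpy example above (not numpy's real files), and __all__ is ignored when following star imports.

```python
import ast

def import_edges(init_sources):
    """Collect re-exports declared in __init__.py files.

    init_sources maps a package name (e.g. 'numpy.core') to its __init__.py source.
    Returns (explicit, star): explicit maps 'pkg.alias' -> 'module.name'; star maps
    a module to the packages that do `from module import *`.
    """
    explicit, star = {}, {}
    for pkg, src in init_sources.items():
        for node in ast.walk(ast.parse(src)):
            if not isinstance(node, ast.ImportFrom):
                continue
            if node.level:   # relative import: level 1 is the package itself
                base = pkg.rsplit(".", node.level - 1)[0]
                module = f"{base}.{node.module}" if node.module else base
            else:
                module = node.module
            for alias in node.names:
                if alias.name == "*":
                    star.setdefault(module, []).append(pkg)
                else:
                    explicit[f"{pkg}.{alias.asname or alias.name}"] = f"{module}.{alias.name}"
    return explicit, star

def public_paths(actual, explicit, star):
    """Every path under which `actual` is reachable, shortest first."""
    seen, todo = {actual}, [actual]
    while todo:
        path = todo.pop()
        module, _, name = path.rpartition(".")
        found = [alias for alias, target in explicit.items() if target == path]
        if not name.startswith("_"):    # star imports skip private names
            found += [f"{pkg}.{name}" for pkg in star.get(module, [])]
        for alias in found:
            if alias not in seen:
                seen.add(alias)
                todo.append(alias)
    return sorted(seen, key=lambda p: (len(p), p))

inits = {
    "numpy": "from .core import *\n",
    "numpy.core": "from .fromnumeric import amax as max\n",
}
explicit, star = import_edges(inits)
print(public_paths("numpy.core.fromnumeric.amax", explicit, star))
# ['numpy.max', 'numpy.core.max', 'numpy.core.fromnumeric.amax']
```

Returning all reachable paths, rather than only the shortest, reflects the fact that a single actual definition can legitimately be called through several public names.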

Moreover, although PCART adjusts the actual paths of APIs to their calling paths, static mapping may still generate multiple candidates. To resolve this uncertainty, PCART first saves all candidates in ❸. Subsequently, based on the results of the API compatibility assessment (❹), PCART eliminates the compatible candidates in the repair and validation phase (❺).

Figure 9: An example of parameter mapping establishment and change analysis for the API foo across versions V1 and V2.

III-E Compatibility Assessment

The compatibility of an invoked API depends not only on changes to the API's parameters but also on how users actually pass the parameters. Therefore, PCART first analyzes API changes at the parameter level and then, by integrating how parameters are passed in actual API calls, assesses whether each invoked API is compatible.

(1) Analyzing API Parameter Change Types. PCART begins by distinguishing between positional and keyword parameters in the API definitions using the bare "*" separator: parameters declared after it are keyword parameters. Then, PCART establishes mappings between the parameters of the two library versions (i.e., the current and the target) based on attributes such as parameter name, position, and type. This process consists of three steps, which we present in detail using the example in Fig. 9.

Step 1. PCART first establishes mappings between parameters with identical names. For positional parameters, both type changes and position changes are analyzed; in Fig. 9, the positions of positional parameters y and z change in version V2. For keyword parameters, whose usage does not depend on position, only type changes are analyzed; for instance, the type of keyword parameter u changes from float in V1 to int in V2. After each round of analysis, mapped parameters are removed from the parameter lists to avoid interfering with subsequent mappings.

Step 2. To detect conversions from positional to keyword parameters (Pos2Key) or from keyword to positional parameters (Key2Pos), PCART again relies on name consistency, this time to map positional parameters to keyword parameters. For example, the positional parameter w in version V1 becomes a keyword parameter in version V2.

Step 3. For the remaining unmapped positional parameters, PCART establishes mappings by requiring consistency of both position and type: for each positional parameter in V1, if a parameter with the same position and type exists in V2, the two are considered corresponding, and renaming analysis is then conducted. For example, the positional parameter x in V1 is renamed to a in V2. For the remaining unmapped keyword parameters, PCART establishes mappings based on type consistency: for each keyword parameter in V1, if a parameter with the same type exists in V2, the two are considered corresponding. For example, the keyword parameter v in V1 is renamed to e in V2.

After these steps, any parameter in the V1 parameter list that is still unmapped is considered removed in V2; for instance, the positional parameter b is removed in V2. Conversely, parameters in V2 that remain unmapped are considered newly added, such as the positional parameter c and the keyword parameter d.

PCART stores the final change types of all parameters in a dictionary. Each key is a tuple of the parameter name and its position. Because a parameter passed by keyword can appear at any position in a call, lookups for such parameters must use the parameter name. Each value records the change types the parameter undergoes from V1 to V2, together with the related change details.
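The three mapping steps and the resulting change dictionary can be sketched as follows. This is an approximation under stated assumptions: Param, params_from_signature, and map_parameters are hypothetical; parameter types are compared as annotation text (so unannotated parameters all share the type ''); any type difference is recorded, whereas the paper only counts incompatible type changes; and *args/**kwargs are skipped. The usage example is a hypothetical signature pair that reproduces the kinds of changes walked through above, not the exact signatures of Fig. 9.

```python
import inspect
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:
    name: str
    kind: str   # 'p' = positional, 'k' = keyword (declared after the bare '*')
    type: str   # annotation text; '' when unannotated
    pos: int    # index among positional parameters; -1 for keyword parameters

def params_from_signature(sig):
    """Turn an inspect.Signature into Param records (*args/**kwargs are skipped)."""
    out, pos = [], 0
    for p in sig.parameters.values():
        ann = p.annotation
        text = "" if ann is inspect.Parameter.empty else (
            ann if isinstance(ann, str) else getattr(ann, "__name__", repr(ann)))
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD):
            out.append(Param(p.name, "p", text, pos))
            pos += 1
        elif p.kind is inspect.Parameter.KEYWORD_ONLY:
            out.append(Param(p.name, "k", text, -1))
    return out

def map_parameters(old, new):
    """Three-step parameter mapping; returns {(name, pos): [change types]}."""
    changes, old_left, new_left = {}, list(old), list(new)

    def pair(o, n, kinds):
        old_left.remove(o)
        new_left.remove(n)
        changes[(o.name, o.pos)] = kinds

    def first(pred):
        return next((n for n in new_left if pred(n)), None)

    # Step 1: same name and kind -> check position (positional only) and type
    for o in list(old_left):
        n = first(lambda n: n.name == o.name and n.kind == o.kind)
        if n:
            kinds = ["reorder"] if o.kind == "p" and o.pos != n.pos else []
            if o.type != n.type:
                kinds.append(f"type:{o.type}->{n.type}")
            pair(o, n, kinds)
    # Step 2: same name, different kind -> Pos2Key / Key2Pos
    for o in list(old_left):
        n = first(lambda n: n.name == o.name)
        if n:
            pair(o, n, ["pos2key" if o.kind == "p" else "key2pos"])
    # Step 3: positional by (position, type), keyword by type -> renaming
    for o in list(old_left):
        if o.kind == "p":
            n = first(lambda n: n.kind == "p" and n.pos == o.pos and n.type == o.type)
        else:
            n = first(lambda n: n.kind == "k" and n.type == o.type)
        if n:
            pair(o, n, [f"rename:{n.name}"])
    for o in old_left:
        changes[(o.name, o.pos)] = ["remove"]
    for n in new_left:
        changes[(n.name, n.pos)] = ["add"]
    return changes

old = [Param("x", "p", "int", 0), Param("y", "p", "str", 1), Param("b", "p", "float", 2),
       Param("w", "p", "bool", 3), Param("u", "k", "float", -1), Param("v", "k", "list", -1)]
new = [Param("a", "p", "int", 0), Param("c", "p", "dict", 1), Param("y", "p", "str", 2),
       Param("w", "k", "bool", -1), Param("u", "k", "int", -1), Param("e", "k", "list", -1),
       Param("d", "k", "tuple", -1)]
print(map_parameters(old, new))
```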

(2) Analyzing Parameter Passing Methods. In practice, there are three typical methods of passing a parameter to a Python API: positional passing, keyword passing, and no passing (only possible for parameters with default values). A positional parameter can be passed either by position or by name, whereas a keyword parameter must be passed by name. If a positional or keyword parameter has a default value, passing it is optional when calling the API.
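Given a call expression and a signature, the passing method of each named parameter can be read off the call's AST. passing_methods is a hypothetical helper; it assumes the call does not use starred arguments (*seq or **mapping) and ignores *args/**kwargs parameters.

```python
import ast
import inspect

def passing_methods(call_source, sig):
    """Classify how each named parameter of `sig` is passed in a call expression.

    Returns {name: 'p' | 'k' | 'n'} for positional passing, keyword passing,
    and no passing.
    """
    call = ast.parse(call_source, mode="eval").body
    n_positional = len(call.args)
    by_keyword = {kw.arg for kw in call.keywords if kw.arg is not None}
    methods, index = {}, 0
    for p in sig.parameters.values():
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD):
            if index < n_positional:
                methods[p.name] = "p"          # filled by the index-th positional argument
            else:
                methods[p.name] = "k" if p.name in by_keyword else "n"
            index += 1
        elif p.kind is inspect.Parameter.KEYWORD_ONLY:
            methods[p.name] = "k" if p.name in by_keyword else "n"
    return methods

def foo(x, y, z=0, *, u=1.0):
    pass

print(passing_methods("foo(1, 2, u=0.5)", inspect.signature(foo)))
# {'x': 'p', 'y': 'p', 'z': 'n', 'u': 'k'}
```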

(3) Formulating Compatibility. In PCART, we propose a model for assessing parameter compatibility, denoted as $f(P,E,M) = P \wedge E \wedge M$. Here, $P$ represents the type of parameter, with a domain of $\{p, k\}$, where $p$ refers to a positional parameter and $k$ to a keyword parameter. $E$ denotes the change type the parameter undergoes. A parameter used in an invoked API already exists in the current version, so it cannot be an addition; hence the domain of $E$ is $\{\Delta_d, \Delta_r, \Delta_o, \Delta_p, \Delta_k, \Delta_t\}$, where $\Delta_d$ denotes removal, $\Delta_r$ renaming, $\Delta_o$ a change in parameter order, $\Delta_p$ a conversion from keyword to positional parameter, $\Delta_k$ a conversion from positional to keyword parameter, and $\Delta_t$ an incompatible type change.
$M$ indicates the method of parameter passing, with a domain of $\{\uparrow_p, \uparrow_k, \uparrow_n\}$, where $\uparrow_p$ stands for positional passing, $\uparrow_k$ for keyword passing, and $\uparrow_n$ for no passing. By combining these three types of information, we can formulate the compatibility of each parameter passed in an invoked API, as shown in Table I.

For example, $p \wedge \Delta_d \wedge \uparrow_k$ means that a positional parameter is removed in the target version and is passed by keyword in the invoked API. Such parameter usage is incompatible. By contrast, if the parameter is not passed ($\uparrow_n$), the usage is compatible even though the parameter was removed in the target version. Thus, the overall compatibility of an invoked API is defined as follows:

$$C_{\mathit{InvokedAPI}} = \bigwedge_{i=1}^{n} C_i = \bigwedge_{i=1}^{n} f(P_i, E_i, M_i), \tag{1}$$

where $n$ denotes the total number of parameters passed in an invoked API and $C_i$ represents the compatibility of parameter $i$, formulated by $f(P_i, E_i, M_i)$. The invoked API is compatible if and only if all of its parameters are compatible; otherwise, it is incompatible.

TABLE I: Formulation of API Parameter Compatibility

| Parameter Type | $f(P,E,M)$ | Compatibility |
|---|---|---|
| Positional Parameter | $p \wedge \Delta_d \wedge \uparrow_n$ | Compatible |
| | $p \wedge \Delta_d \wedge \uparrow_p$ | Incompatible |
| | $p \wedge \Delta_d \wedge \uparrow_k$ | Incompatible |
| | $p \wedge \Delta_o \wedge \uparrow_n$ | Compatible |
| | $p \wedge \Delta_o \wedge \uparrow_p$ | Incompatible |
| | $p \wedge \Delta_o \wedge \uparrow_k$ | Compatible |
| | $p \wedge \Delta_r \wedge \uparrow_n$ | Compatible |
| | $p \wedge \Delta_r \wedge \uparrow_p$ | Compatible |
| | $p \wedge \Delta_r \wedge \uparrow_k$ | Incompatible |
| | $p \wedge \Delta_k \wedge \uparrow_n$ | Compatible |
| | $p \wedge \Delta_k \wedge \uparrow_p$ | Incompatible |
| | $p \wedge \Delta_k \wedge \uparrow_k$ | Compatible |
| | $p \wedge \Delta_t \wedge \uparrow_n$ | Compatible |
| | $p \wedge \Delta_t \wedge \uparrow_p$ | Incompatible |
| | $p \wedge \Delta_t \wedge \uparrow_k$ | Incompatible |
| Keyword Parameter | $k \wedge \Delta_d \wedge \uparrow_n$ | Compatible |
| | $k \wedge \Delta_d \wedge \uparrow_k$ | Incompatible |
| | $k \wedge \Delta_r \wedge \uparrow_n$ | Compatible |
| | $k \wedge \Delta_r \wedge \uparrow_k$ | Incompatible |
| | $k \wedge \Delta_p \wedge \uparrow_n$ | Compatible |
| | $k \wedge \Delta_p \wedge \uparrow_k$ | Compatible |
| | $k \wedge \Delta_t \wedge \uparrow_n$ | Compatible |
| | $k \wedge \Delta_t \wedge \uparrow_k$ | Incompatible |
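Table I and Eq. (1) translate directly into a lookup table and a conjunction. In this sketch, single letters stand in for the paper's symbols (listed in the comment), TABLE_I and the two functions are hypothetical names, and two assumptions are added: a parameter with no change (E is None) counts as compatible, and a parameter with several change types contributes one (P, E, M) triple per change type.

```python
# Table I encoded as a lookup. Letters stand in for the paper's symbols:
#   P: 'p' positional / 'k' keyword parameter
#   E: 'd' removal (Δd), 'r' renaming (Δr), 'o' reordering (Δo),
#      'p' keyword->positional (Δp), 'k' positional->keyword (Δk), 't' type change (Δt)
#   M: 'p' positional passing (↑p), 'k' keyword passing (↑k), 'n' no passing (↑n)
TABLE_I = {
    ("p", "d", "n"): True,  ("p", "d", "p"): False, ("p", "d", "k"): False,
    ("p", "o", "n"): True,  ("p", "o", "p"): False, ("p", "o", "k"): True,
    ("p", "r", "n"): True,  ("p", "r", "p"): True,  ("p", "r", "k"): False,
    ("p", "k", "n"): True,  ("p", "k", "p"): False, ("p", "k", "k"): True,
    ("p", "t", "n"): True,  ("p", "t", "p"): False, ("p", "t", "k"): False,
    ("k", "d", "n"): True,  ("k", "d", "k"): False,
    ("k", "r", "n"): True,  ("k", "r", "k"): False,
    ("k", "p", "n"): True,  ("k", "p", "k"): True,
    ("k", "t", "n"): True,  ("k", "t", "k"): False,
}

def param_compatible(P, E, M):
    """f(P, E, M) from Table I; an unchanged parameter (E is None) is compatible."""
    return True if E is None else TABLE_I[(P, E, M)]

def api_compatible(params):
    """Eq. (1): the conjunction of f(P_i, E_i, M_i) over all parameters."""
    return all(param_compatible(P, E, M) for P, E, M in params)

print(api_compatible([("p", "r", "p"), ("k", "p", "k")]))  # True
print(api_compatible([("p", "r", "p"), ("p", "d", "k")]))  # False
```

Keeping the table as data rather than nested conditionals makes it easy to audit against Table I row by row; a combination missing from the table raises KeyError instead of silently passing.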

III-F Repair and Validation

(1) Repair. PCART uses the change dictionary generated in the compatibility assessment phase (❹) to fix the detected incompatible invocations of third-party library APIs in the project. The currently supported repair types include parameter addition, removal, renaming, reordering, and the conversion of positional parameters to keyword parameters.

(1.1) Locating Incompatible API Invocations. PCART first uses Python's built-in ast module to convert the source files of the user's project into ASTs. By traversing the ASTs, PCART identifies all API calls in the code that need to be fixed. As illustrated in Fig. 10, suppose the API call to be repaired is A(y).f(x). Because BFS is insensitive to the order of API calls in a chain, a BFS-based search may mistakenly repair a different call with the same name, i.e., a standalone f(x). Existing repair tools such as MLCatchUp [9] cannot distinguish these cases. PCART instead employs DFS, which preserves the call order and thereby resolves this issue.

Figure 10: Comparison of AST structures between f(x) and A(y).f(x).

(1.2) Updating AST with Compatible Parameters. PCART repairs an incompatible API invocation by inserting, deleting, or modifying nodes in the AST so that the invocation uses compatible parameters. Moreover, since the repair process is recursive, it fixes all breaking API changes in the code in sequence, following the order and form of the API calls (e.g., direct calls, object calls, return value calls, parameter calls, and inheritance calls). MLCatchUp [9] cannot do this, as it only recognizes simple call forms. In addition, when an API repair fails, PCART skips the failed API and proceeds to repair the next one, whereas Relancer [10] stops at the failed repair attempt.
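The sketch below shows what such node-level edits can look like with the standard ast.NodeTransformer. ParamFixer, its renames and pos2key arguments, and the example call are hypothetical; the sketch assumes the converted positions are trailing positional arguments and handles only calls to a plain name.

```python
import ast
from typing import Dict, Optional


class ParamFixer(ast.NodeTransformer):
    """Hypothetical fixer for calls to one function: renames keywords, turns positionals into keywords."""

    def __init__(self, func_name: str, renames: Optional[Dict[str, str]] = None,
                 pos2key: Optional[Dict[int, str]] = None):
        self.func_name = func_name
        self.renames = renames or {}  # {old keyword name: new keyword name}
        self.pos2key = pos2key or {}  # {positional index: keyword name}, trailing indices only

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # repair nested calls first, e.g. the inner call in a(b())
        if isinstance(node.func, ast.Name) and node.func.id == self.func_name:
            for kw in node.keywords:  # modify nodes: rename keyword arguments
                if kw.arg in self.renames:
                    kw.arg = self.renames[kw.arg]
            kept, moved = [], []
            for i, arg in enumerate(node.args):  # delete positional nodes, insert keyword nodes
                if i in self.pos2key:
                    moved.append(ast.keyword(arg=self.pos2key[i], value=arg))
                else:
                    kept.append(arg)
            node.args = kept
            node.keywords = moved + node.keywords
        return node


tree = ast.parse("y = foo(1, 2, 3, old=4)")
tree = ast.fix_missing_locations(ParamFixer("foo", {"old": "new"}, {2: "w"}).visit(tree))
print(ast.unparse(tree))  # y = foo(1, 2, w=3, new=4)
```

Once all edits are applied, ast.unparse turns the tree back into source code.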

(1.3) Repairing Incompatible Candidates. As mentioned at the end of ❸ API mapping establishment, static mapping, i.e., extracting the definitions of invoked APIs by matching API names in the library source code, may yield multiple APIs with the same name or APIs with multiple overloads. This results in multiple API signature mapping candidates.

To address this, PCART first eliminates compatible mapping candidates, since they need no fix. It iterates through all candidates and determines the correspondence between same-name APIs across the two versions based on their path names. Given the two lists of definitions of an invoked API, if a definition from the current-version list is identical to one in the target-version list, the corresponding candidate can clearly be excluded. Likewise, if a current-version definition has a compatible counterpart in the target version, as assessed by ❹, it is also excluded, since there is no parameter compatibility issue. The remaining mapping candidates are incompatible, and PCART attempts to fix each of them following procedures (1.1) and (1.2).
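A minimal sketch of this filtering step, assuming each version's candidates are given as a mapping from a qualified path name to a signature string and that the ❹ assessment is available as a callback; incompatible_candidates and is_compatible are hypothetical names. How PCART treats a candidate without a same-path counterpart is not stated here, so the sketch simply skips it.

```python
from typing import Callable, Dict, List


def incompatible_candidates(current: Dict[str, str], target: Dict[str, str],
                            is_compatible: Callable[[str, str], bool]) -> List[str]:
    """Return the paths of mapping candidates that still need a repair."""
    remaining = []
    for path, cur_sig in current.items():
        tgt_sig = target.get(path)  # correspondence across versions by path name
        if tgt_sig is None:
            continue  # no counterpart: not modeled in this sketch
        if cur_sig == tgt_sig:
            continue  # identical definition: nothing to fix
        if is_compatible(cur_sig, tgt_sig):
            continue  # compatible according to the assessment step: nothing to fix
        remaining.append(path)
    return remaining


current = {"lib.a.f": "(x, y)", "lib.b.f": "(x, y=1)", "lib.c.f": "(x)"}
target = {"lib.a.f": "(x, y)", "lib.b.f": "(x, y=1, z=2)", "lib.c.f": "(x, *, k)"}
print(incompatible_candidates(current, target, lambda c, t: t.endswith("z=2)")))  # ['lib.c.f']
```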

(2) Validation. PCART validates a fix via both static and dynamic approaches, as shown in Fig. 11. A fix is considered successful only if it passes both validations.

Static Validation. PCART first constructs a full parameter mirror for the repaired API invocation, which attaches a parameter name to every argument in the call. PCART then compares this mirror with the parameter definition of the target version. For arguments passed by name, the definition must contain a parameter with the same name. For arguments passed by position, the mirror must match the definition in both position and name, which ensures the formal correctness of the repaired invocation. Static formal validation is necessary because a runnable API invocation is not necessarily compatible, as discussed in Challenge 1 (Section II-B). Therefore, the parameters of a repaired invocation should be formally consistent with the target version's definition.
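The comparison can be sketched with inspect.signature from the standard library, assuming the mirror has already been built as a list of (position, name) pairs, with position None for arguments passed by name. static_validate and target_foo are hypothetical; for simplicity, the sketch lets **kwargs accept any name and does not model *args.

```python
import inspect
from typing import List, Optional, Tuple

P = inspect.Parameter


def static_validate(mirror: List[Tuple[Optional[int], str]], target: inspect.Signature) -> bool:
    """Check a full parameter mirror of a repaired call against the target-version signature."""
    params = list(target.parameters.values())
    positional = [p.name for p in params if p.kind in (P.POSITIONAL_ONLY, P.POSITIONAL_OR_KEYWORD)]
    by_name = {p.name for p in params if p.kind in (P.POSITIONAL_OR_KEYWORD, P.KEYWORD_ONLY)}
    has_var_kw = any(p.kind == P.VAR_KEYWORD for p in params)
    for pos, name in mirror:
        if pos is None:  # passed by name: the name must exist in the definition
            if name not in by_name and not has_var_kw:
                return False
        elif pos >= len(positional) or positional[pos] != name:
            return False  # passed by position: both position and name must match
    return True


def target_foo(u, v, w=3, *, x, y=5, z=6):  # hypothetical target-version definition
    pass


sig = inspect.signature(target_foo)
print(static_validate([(0, "u"), (1, "v"), (None, "x")], sig))  # True
print(static_validate([(0, "v"), (None, "x")], sig))            # False
```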

Dynamic Validation. For runtime validation, PCART first loads into memory the pickle file (generated in ❷) associated with the incompatible API call to obtain its contextual dependencies (e.g., parameter values), and then validates the fix individually at runtime in the virtual environment of the target library version. This runtime check provides further assurance that the fix is correct.
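A minimal sketch of the dynamic check, assuming the recorded context is a pickled dict of variable values and that the repaired call is evaluated in the current interpreter; PCART instead runs it in the target-version virtual environment, which this sketch does not model. dynamic_validate and target_foo are hypothetical, and eval and pickle.loads should only be applied to trusted input.

```python
import pickle
from typing import Any, Dict


def dynamic_validate(repaired_call: str, context_blob: bytes, api: Dict[str, Any]) -> bool:
    """Re-run a repaired call with its recorded context; succeed if no exception is raised."""
    env = dict(api)                          # names of the target-version API under test
    env.update(pickle.loads(context_blob))   # recorded contextual values, e.g. arguments
    try:
        eval(repaired_call, env)
        return True
    except Exception:
        return False


def target_foo(u, v, w=3, *, x, y=5, z=6):  # hypothetical target-version definition
    return u + v + w + x


blob = pickle.dumps({"a": 1, "b": 2})
print(dynamic_validate("foo(a, b, x=4)", blob, {"foo": target_foo}))   # True
print(dynamic_validate("foo(a, b, 3, 4)", blob, {"foo": target_foo}))  # False: x is keyword-only
```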

Once repair and validation are complete, the updated AST is converted back to source code using ast.unparse. After processing all files in the project, PCART generates a repair report (Fig. 5) to help users resolve API parameter compatibility issues.

Figure 11: Static and dynamic validations of repairs in PCART.

IV PCBench: Benchmark for Python API Parameter Compatibility Issues

To evaluate PCART, we construct a large-scale benchmark, PCBench, providing a baseline for Python API parameter compatibility issues. In the following sections, we present the details of building PCBench, including data collection of popular Python third-party libraries and APIs with parameter changes, test case generation, and the assignment of compatibility labels.

IV-A Collecting Popular Python Third-party Libraries

To construct a representative benchmark, we collected popular third-party libraries in the Python ecosystem in three rounds. The first round searched GitHub with the keyword “Python stars:>10000”, resulting in 312 GitHub projects. The second round filtered these projects to identify popular Python third-party libraries based on two criteria: the project (a) provides APIs for users to call, and (b) had an average daily download count on PyPI of over 100,000 in the most recent week (as of July 9, 2023). After the second round, 55 of the 312 GitHub projects remained. In the third round, the selection criteria were that the library (a) has comprehensive API documentation and detailed version change logs, and (b) is not a command-line tool. The first criterion helps us identify and collect APIs with parameter changes, while the second ensures that the APIs are called from Python projects rather than executed in a terminal (e.g., bash or Zsh). After three rounds of selection, 33 libraries were finalized, covering domains such as machine learning, natural language processing, image processing, data science, and web frameworks, as shown in Table II.

IV-B Collecting APIs with Parameter Changes

Step 1. Analyzing Change Logs. We first manually inspected the change logs of each library, starting from its Python 3 versions, to identify APIs with parameter changes. These changes mainly include the addition, removal, renaming, and reordering of parameters, as well as the conversion of positional parameters to keyword parameters. For each changed API, we selected the version in which the change occurred as the target version and its preceding version as the current version. We then extracted the API parameter definitions from the documentation of these two library versions.

Step 2. Generating API Usage. Based on the API definitions of the current version, we used ChatGPT to generate code examples that use all parameters. Since the generated code may have missing parameters or syntax errors, we manually examined and corrected each generated API usage. We then created two separate virtual environments for each API using Anaconda (23.5.2) [16], one with the current and one with the target version of the library, each together with its related third-party dependencies. This ensures that the generated API usage runs normally in the current-version environment. Steps 1 and 2 were performed independently by three authors.

Step 3. Cross Validation. After data collection, two authors with professional experience in Python project development cross-checked the collected data to ensure its reliability and accuracy. Specifically, the API parameter definitions in the current and target versions were validated using Python's inspect module, i.e., inspect.signature. In addition, the parameter change types were confirmed by comparing the API parameter definitions across the two versions. Finally, we reviewed each API usage to confirm that all parameters were involved in the API call.
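As an illustration of how inspect.signature supports this comparison, the sketch below diffs two definitions and reports added, removed, converted (positional to keyword-only), and moved parameters. diff_signatures, foo_old, and foo_new are hypothetical; a renaming shows up as one removal plus one addition, so telling renames from removals still requires manual judgment.

```python
import inspect
from typing import Callable, Dict, List

P = inspect.Parameter
POSITIONAL = (P.POSITIONAL_ONLY, P.POSITIONAL_OR_KEYWORD)


def diff_signatures(old: Callable, new: Callable) -> Dict[str, List[str]]:
    """Classify parameter changes between two versions of the same API."""
    old_p = inspect.signature(old).parameters
    new_p = inspect.signature(new).parameters
    old_pos = [n for n, p in old_p.items() if p.kind in POSITIONAL]
    new_pos = [n for n, p in new_p.items() if p.kind in POSITIONAL]
    return {
        "added": [n for n in new_p if n not in old_p],
        "removed": [n for n in old_p if n not in new_p],
        "pos2key": [n for n in old_pos if n in new_p and new_p[n].kind == P.KEYWORD_ONLY],
        "position": [n for n in old_pos if n in new_pos and old_pos.index(n) != new_pos.index(n)],
    }


def foo_old(u, v, w=3, *, x, y=5, z=6):  # hypothetical current-version definition
    pass


def foo_new(v, u, *, w=3, x, y=5, k=0):  # hypothetical target-version definition
    pass


print(diff_signatures(foo_old, foo_new))
# {'added': ['k'], 'removed': ['z'], 'pos2key': ['w'], 'position': ['u', 'v']}
```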

The collection and validation process lasted six months. Finally, we collected 844 APIs with parameter changes from the 33 libraries. The number of collected APIs for each library is presented in Table II.

TABLE II: Distribution of Changed APIs and Test Cases Across 33 Python Third-party Libraries
Library #APIs #Test Cases Library #APIs #Test Cases Library #APIs #Test Cases
PyTorch 4 91 Redis 2 6 HTTPX 8 191
Scipy 193 5,887 Faker 8 24 NetworkX 49 542
Gensim 13 999 LightGBM 1 10 XGBoost 1 20
TensorFlow 19 585 Loguru 5 102 Plotly 52 20,208
Tornado 20 570 SymPy 15 274 Django 3 69
Transformers 1 44 scikit-learn 117 4,620 Pillow 26 344
Requests 2 20 Flask 2 11 JAX 1 28
Matplotlib 21 2,027 Click 4 241 Polars 30 539
FastAPI 4 156 aiohttp 12 153 pandas 73 5,103
NumPy 85 2,006 spaCy 2 19 Rich 33 1,261
Pydantic 1 16 Keras 20 845 Dask 17 467

IV-C Generating Test Cases via Parameter Mutation

To better simulate the diversity and flexibility of parameter passing in users' projects, we mutated the parameters of the generated usage of the 844 APIs. The mutation varies the number of parameters, the parameter-passing method, and the order of parameters, thereby producing a large number of test cases with different combinations of parameter counts and passing methods. Fig. 12 illustrates the parameter mutation for an API foo whose parameter definition in the current version is (u, v, w=3, *, x, y=5, z=6). Details of the parameter mutation are as follows.

Figure 12: An example of parameter mutation on foo for generating test cases.

Mutant Operator 1. Choosing Positional Parameters. We started by combining the positional and keyword parameters that have no default values, passing the positional ones by position; parameters without default values must be passed when invoking an API. We then added positional parameters with default values to the combination, also passed by position. For the API foo, this operator generates two combinations, i.e., foo(1, 2, x=4) and foo(1, 2, 3, x=4), as shown in Fig. 12.

Mutant Operator 2. Changing Positional Parameters Passing Method. Building on the first operator, we changed the passing method of positional parameters from positional to keyword passing by parameter name. To keep the calls syntactically valid in Python, i.e., arguments passed by name must follow those passed by position, we added names to parameters from back to front. For example, applying this operator to foo(1, 2, x=4) yields two new combinations, i.e., foo(1, v=2, x=4) and foo(u=1, v=2, x=4).

Mutant Operator 3. Choosing Keyword Parameter and Shuffling Order. Building on the second operator, we first added the keyword parameters with default values to the combination one at a time, so that each of them has a chance of being used individually. Then, in an incremental manner, we added several keyword parameters with default values from the same list to the combination. Finally, we randomly shuffled the order of the named arguments in the combination. Different argument orders reflect practical usage in Python project development and further complicate the detection and repair of parameter compatibility issues.

The parameter mutation was performed automatically by a script we implemented. We saved every combination generated by the three mutant operators; the total number of combinations for each API is $\sum_{i=n}^{N}(i+1)\cdot 2m$, where $N$ is the number of positional parameters, $n$ is the number of positional parameters without default values, and $m$ is the number of keyword parameters with default values. The API foo illustrated in Fig. 12 has three positional parameters (two of which have no default values) and two keyword parameters with default values, so the formula gives $(2+1)\cdot 4 + (3+1)\cdot 4 = 28$ combinations in total. Note that some combinations were infeasible (i.e., could not be executed) because certain APIs have unusable default values or allow only one of two parameters to be used; these combinations were excluded. Finally, mutating the parameters of the 844 changed APIs produced a total of 47,478 test cases, whose distribution is shown in Table II.
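The sketch below reproduces the three operators for foo under stated assumptions: the incremental selection in Operator 3 is taken to be growing prefixes of the keyword list, the argument values are illustrative, and mutate and count_combinations are hypothetical names rather than the authors' script. With shuffling disabled, it yields the 28 combinations predicted by the formula.

```python
import random
from typing import Dict, List


def mutate(name: str, pos_required: List[str], pos_default: List[str],
           kw_required: List[str], kw_default: List[str],
           values: Dict[str, object], shuffle: bool = True, seed: int = 0) -> List[str]:
    """Generate call strings with the three mutant operators (hypothetical reimplementation)."""
    rng = random.Random(seed)
    # Operator 3 selections: none, each keyword alone, then growing prefixes (2m sets for m >= 1)
    kw_sets = ([[]] + [[kw] for kw in kw_default]
               + [kw_default[:j] for j in range(2, len(kw_default) + 1)])
    calls = []
    for k in range(len(pos_default) + 1):        # Operator 1: add defaulted positionals
        positional = pos_required + pos_default[:k]
        for j in range(len(positional) + 1):     # Operator 2: name the last j, back to front
            by_pos = positional[:len(positional) - j]
            named_pos = positional[len(positional) - j:]
            for extra in kw_sets:                # Operator 3: add defaulted keywords
                named = named_pos + kw_required + extra
                if shuffle:
                    rng.shuffle(named)           # only named arguments may be reordered
                args = [repr(values[p]) for p in by_pos] + [f"{p}={values[p]!r}" for p in named]
                calls.append(f"{name}({', '.join(args)})")
    return calls


def count_combinations(N: int, n: int, m: int) -> int:
    """Closed-form count: sum over i = n..N of (i + 1) * 2m."""
    return sum((i + 1) * 2 * m for i in range(n, N + 1))


values = {"u": 1, "v": 2, "w": 3, "x": 4, "y": 5, "z": 6}  # illustrative values
calls = mutate("foo", ["u", "v"], ["w"], ["x"], ["y", "z"], values, shuffle=False)
print(len(calls), count_combinations(N=3, n=2, m=2))  # 28 28
print(calls[:3])  # ['foo(1, 2, x=4)', 'foo(1, 2, x=4, y=5)', 'foo(1, 2, x=4, z=6)']
```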

IV-D Assigning Compatibility Labels to Test Cases

We labeled the 47,478 test cases covering the 844 APIs to determine their compatibility status in the target versions (i.e., the versions in which the changes occurred). The detailed steps are as follows:

(1) Executing Test Cases. We executed each test case in the target-version environment of the corresponding changed API. Of these, 8,136 test cases failed to run, indicating that they are incompatible with the target versions; these test cases were therefore labeled incompatible. We then manually analyzed these unrunnable test cases to identify the specific types of parameter changes causing the incompatibilities and categorized them accordingly. Table III shows that most incompatibilities are caused by position changes, followed by parameter removal. Moreover, since a test case may involve multiple types of changes, the ‘Total’ row in Table III counts each test case only once (i.e., the union of the rows), so it is smaller than the sum of the individual rows.

TABLE III: Distribution of API Parameter Changes in Unrunnable Test Cases
Change Type Incompatible Cases Incompatible APIs
Removal 3,138 112
Rename 509 43
Pos2Key 307 13
Position 6,013 103
**kwargs 138 15
Total 8,136 207
TABLE IV: Distribution of API Parameter Changes in Runnable Test Cases
Change Type Compatible Cases Compatible APIs Incompatible Cases Incompatible APIs
No Change 27,107 813 0 0
Removal 0 0 188 46
Rename 282 39 87 13
Pos2Key 638 27 873 20
Position 6,668 86 3,441 91
**kwargs 334 15 0 0
Total 34,891 839 4,451 141

(2) Manual Labeling. We did not immediately classify the remaining 39,342 runnable test cases as compatible, since successful execution does not imply compatibility (discussed in Section II-B). Instead, we manually analyzed the changes in the API definitions before and after the update together with their usage in the test cases, and summarized the following rules to determine the compatibility of test cases in the target versions:

Rule 1. Test cases that do not involve changed parameters are considered compatible, since they are unaffected by the parameter changes.

Rule 2. When the API definitions in the target versions include neither *args nor **kwargs, test cases using removed parameters are deemed incompatible. For renamed parameters, passing by parameter name is considered incompatible, while positional passing is considered compatible. When a parameter is converted from a positional to a keyword parameter, keyword passing is compatible, while positional passing is incompatible. Likewise, for changes in parameter position, passing by parameter name is compatible, while passing by position is incompatible. A test case is considered incompatible if any of its parameters is subject to an incompatible change.

Rule 3. When the API definitions in the target versions include *args or **kwargs, changes such as parameter renaming and removal are considered compatible, because *args accepts a variable number of positional arguments and **kwargs accepts a variable number of keyword arguments, as introduced in Section II-A.
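A compact reading of Rules 1 to 3 as code, simplified to one change type per argument and a single flag for whether the target definition has *args or **kwargs; label_arg and label_case are hypothetical names, and changes in data types are not modeled.

```python
from typing import List, Tuple


def label_arg(change: str, passed_by_name: bool, target_has_variadic: bool = False) -> str:
    """Label one argument; change is 'none', 'removal', 'rename', 'pos2key', or 'position'."""
    if change == "none":
        return "compatible"  # Rule 1
    if target_has_variadic and change in ("removal", "rename"):
        return "compatible"  # Rule 3
    if change == "removal":
        return "incompatible"  # Rule 2: removed parameter
    if change == "rename":
        return "incompatible" if passed_by_name else "compatible"
    if change in ("pos2key", "position"):
        return "compatible" if passed_by_name else "incompatible"
    raise ValueError(f"unknown change type: {change}")


def label_case(args: List[Tuple[str, bool]], target_has_variadic: bool = False) -> str:
    """A test case is incompatible as soon as one of its arguments is."""
    labels = [label_arg(c, by_name, target_has_variadic) for c, by_name in args]
    return "incompatible" if "incompatible" in labels else "compatible"


print(label_case([("none", False), ("rename", False)]))    # compatible
print(label_case([("none", False), ("position", False)]))  # incompatible
```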

Table IV presents the distribution of compatible and incompatible runnable test cases under different types of parameter changes. 27,107 test cases remain compatible because they do not use the changed parameters, whereas 4,451 test cases are runnable but incompatible. As in Table III, the ‘Total’ row counts each test case only once (i.e., the union of the rows).

V Evaluation

V-A Research Questions

Our work mainly focuses on answering the following five research questions (RQs):

RQ1: How does PCART perform in detecting API parameter compatibility issues? Precisely detecting API parameter compatibility issues in Python projects is crucial for maintaining the stability and reliability of Python programs. In this RQ, we evaluate PCART’s performance in detecting API parameter compatibility issues using PCBench and compare it with MLCatchUp [9] and Relancer [10].

RQ2: How does PCART perform in repairing API parameter compatibility issues? Effectively repairing API parameter compatibility issues within user projects can significantly reduce the effort of manually maintaining code updates. In RQ2, we assess PCART’s performance in repairing API parameter compatibility issues using PCBench and compare it with MLCatchUp and Relancer.

RQ3: How does PCART compare to ChatGPT in detecting and repairing API parameter compatibility issues? Recently, advanced large language models (LLMs) like ChatGPT have been applied to many software engineering tasks such as automated program repair [17, 18, 19, 20, 21, 22] and software testing [23, 24, 25]. This RQ compares the performance of PCART in detecting and repairing API parameter compatibility issues against the state-of-the-art LLM, i.e., ChatGPT-4 [26].

RQ4: What is the time cost of PCART in detecting and repairing API parameter compatibility issues? Automated detection and repair of API parameter compatibility issues are non-trivial. Providing clarity on the time cost of PCART facilitates assessing its efficiency in handling these issues.

RQ5: What is the effectiveness of PCART in real-world Python projects? To validate the practicality of PCART in diverse and complex real-world settings, we apply PCART to detect and repair API parameter compatibility issues in real-world Python projects.

V-B Experiment Setup

(1) Settings of Comparison Tools. In this study, we aim to address RQ1, RQ2, and RQ3 by comparing PCART with existing tools and ChatGPT-4.

Settings of MLCatchUp. MLCatchUp [9] is an open-source tool designed to fix deprecated APIs in a single .py file. It requires users to manually input the signatures (parameter definitions) of APIs before and after version updates, and then performs repair through static analysis of the project code. MLCatchUp provides no functionality for detecting API parameter compatibility issues; it only outputs the repair operations and results. We therefore evaluated its detection performance as follows. For compatible test cases, the detection is considered correct if MLCatchUp's repair operations do not affect the original compatibility of the test case, and incorrect otherwise. For incompatible test cases, the detection is considered correct if MLCatchUp provides repair operations, and incorrect if it provides none. For repair, since MLCatchUp does not support automated validation of repair results, we manually reviewed and validated its repair results.

Settings of Relancer. Relancer [10] focuses on repairing deprecated APIs in Jupyter Notebooks by analyzing error messages produced during code execution. For detection, Relancer simply treats code that runs without crashing as compatible. It can therefore only detect and repair test cases that fail to run, and it considers all runnable test cases compatible. For repair, we determined the success of each repair by automatically parsing Relancer's output during the repair process: if the output contains “This case is fully fixed!”, the repair is considered successful; otherwise, it is regarded as a failure.

Settings of ChatGPT-4. An alternative is to ask ChatGPT directly about the compatibility of the APIs in test cases across library versions and to request repairs. To answer RQ3, we used the test cases as input to query ChatGPT (GPT-4 classic: gpt-4-0125-preview) for detection and repair. For each test case, we conducted two sets of experiments using predefined prompt templates, as shown in Fig. 13: one providing the API parameter definitions of the current and target versions in the prompt, and the other not providing them. To simulate the independence and randomness of actual user interactions, these tests were conducted independently in new session windows on the Edge and Chrome browsers, respectively. We evaluated accuracy by manually checking both responses: the result is considered correct only if both responses are correct, and incorrect if the responses are inconsistent.

(a) Prompt template without API definition.
(b) Prompt template with API definition.
Figure 13: Detection and repair prompt templates in ChatGPT-4.

(2) Evaluation Datasets. We constructed three datasets for conducting evaluation experiments.

PCBench. To answer RQs 1, 2, and 4, we used PCBench, introduced in Section IV, to evaluate the performance of PCART, MLCatchUp [9], and Relancer [10] in detecting and repairing API parameter compatibility issues, as well as the efficiency of PCART.

Dataset for ChatGPT-4. Given the limit on the number of ChatGPT queries, it is not feasible to evaluate all test cases in PCBench. Therefore, in RQ3, we randomly selected one compatible and one incompatible test case from each of the 29 Python third-party libraries in PCBench that contain incompatible test cases. The remaining four libraries contain only compatible test cases, so we did not select test cases from them in this RQ.

Real-world Python Projects. For RQ5, we selected real-world Python projects from GitHub as our test dataset. We first selected the APIs in PCBench that have incompatible test cases and then searched GitHub for these APIs, filtering by the Python language. To improve search efficiency, we included the changed parameters of these APIs in our search keywords. In this way, we collected 14 Python projects with parameter compatibility issues, covering seven popular Python libraries, as shown in Table V. For each project, we first configured the required dependencies based on the project's requirements.txt file; if the file was absent, we manually set up the virtual environment so that the project could run normally.

TABLE V: The Collected Real-world GitHub Python Project Dataset
Project Library Current Version Target Version
allnews [27] Gensim 3.8.3 4.0.0
Youtube-Comedy [28] Gensim 0.12.3 0.12.4
recommendation-engine [29] NetworkX 1.11 2.0
political-polarisation [30] NetworkX 2.8.8 3.0.0
CustomSamplers [31] NumPy 1.9.3 1.10.1
machine-learning [32] NumPy 1.23.5 1.24.0
gistable [33] NumPy 1.23.5 1.24.0
fuel_forecast_explorer [34] pandas 1.5.3 2.0.0
sg-restart-regridder [35] pandas 1.5.3 2.0.0
polars-book-cn [36] Polars 0.16.18 0.17.0
EJPLab_Computational [37] Polars 0.16.18 0.17.0
Deep-Graph-Kernels [38] SciPy 0.19.1 1.0.0
AIBO [39] SciPy 1.7.3 1.10.0
django-selenium-testing [40] Tornado 3.1 5.0

(3) Evaluation Metrics. The evaluation metrics for detection and repair are presented as follows.

Metrics for Detection. In our evaluation, incompatible test cases are defined as positive instances. Accordingly, for each tool, we calculated the following key metrics: true positives (TP), which are the number of incompatible cases correctly detected; false positives (FP), which are the number of compatible test cases erroneously detected as incompatible; and false negatives (FN), which are the number of incompatible test cases wrongly detected as compatible. Based on these metrics, we computed precision, recall, and F-measure using formulas (2), (3), and (4), respectively, to evaluate the performance of PCART and the compared tools in detecting API parameter compatibility issues.

$Precision = \frac{TP}{TP + FP}$ (2)
$Recall = \frac{TP}{TP + FN}$ (3)
$F\text{-}measure = 2 \times \frac{Precision \times Recall}{Precision + Recall}$ (4)

Metrics for Repair. We used accuracy as the metric to evaluate the effectiveness of PCART and the compared tools in repairing API compatibility issues. The repair accuracy is defined as follows:

$Accuracy = \frac{SR}{AR}$, (5)

where $SR$ represents the total number of incompatible test cases successfully repaired, and $AR$ denotes the total number of incompatible test cases that required repair.
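Formulas (2) to (5) amount to a few lines of arithmetic. As a check, plugging in the PCART totals from Table VI (TP = 11,737, FP = 6, FN = 848) and the Succeeded* total from Table VII (11,500), with AR = 12,587 incompatible test cases (8,136 + 4,451 from Tables III and IV), reproduces the reported figures. The function names are illustrative.

```python
from typing import Tuple


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure as in formulas (2) to (4)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def repair_accuracy(sr: int, ar: int) -> float:
    """Repair accuracy as in formula (5)."""
    return sr / ar


p, r, f = detection_metrics(tp=11_737, fp=6, fn=848)
print(f"{p:.2%} {r:.2%} {f:.2%}")                      # 99.95% 93.26% 96.49%
print(f"{repair_accuracy(11_500, 8_136 + 4_451):.2%}")  # 91.36%
```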

(4) Experiment Environment. Our experiments were conducted on a server running 64-bit Ubuntu 18.04.1, equipped with two Intel Xeon Gold 6230R CPUs at 2.10 GHz (26 cores with 52 threads), three Nvidia RTX 2080 Ti GPUs, 160 GB of RAM, a 256 GB SSD, and 8 TB of HDD storage. PCART is implemented in Python 3.9.

VI Results and Analysis

VI-A RQ1: How does PCART Perform in Detecting API Parameter Compatibility Issues?

TABLE VI: Comparison of Detecting API Parameter Compatibility Issues
Library MLCatchUp Relancer PCART
TP FP FN TP FP FN TP FP FN
PyTorch 7 0 0 7 0 0 7 0 0
SciPy 1 23 437 409 0 29 438 6 0
Gensim 430 0 24 390 0 64 441 0 13
TensorFlow 57 47 0 49 0 8 56 0 1
Tornado 238 0 52 53 0 237 286 0 4
Transformers 0 0 0 0 0 0 0 0 0
Requests 0 0 6 2 0 4 6 0 0
Matplotlib 444 0 50 166 0 328 457 0 35
FastAPI 51 0 0 47 0 4 51 0 0
NumPy 298 20 62 334 0 26 360 0 0
Pydantic 0 0 8 8 0 0 4 0 4
Redis 0 0 0 0 0 0 0 0 0
Faker 0 0 0 0 0 0 0 0 0
LightGBM 0 0 0 0 0 0 0 0 0
Loguru 3 0 0 3 0 0 3 0 0
SymPy 25 0 39 33 0 31 63 0 1
scikit-learn 701 623 0 114 0 587 695 0 6
Flask 2 0 0 2 0 0 0 0 2
Click 3 0 0 0 0 3 3 0 0
aiohttp 16 16 0 16 0 0 16 0 0
spaCy 6 0 0 6 0 0 6 0 0
Keras 100 26 0 45 0 55 100 0 0
HTTPX 58 54 7 64 0 1 65 0 0
NetworkX 107 0 38 134 0 11 142 0 3
XGBoost 5 0 0 5 0 0 5 0 0
Plotly 6,275 358 45 3,475 0 2,845 6,320 0 0
Django 3 0 4 7 0 0 7 0 0
Pillow 0 0 7 7 0 0 7 0 0
JAX 12 16 0 12 0 0 12 0 0
Polars 134 0 3 62 0 75 137 0 0
pandas 1,431 0 1,170 2,485 0 116 1,861 0 740
Rich 137 184 44 159 0 22 181 0 0
Dask 47 0 0 42 0 5 8 0 39
Total 10,591 1,367 1,996 8,136 0 4,451 11,737 6 848
Precision 88.57% 100.00% 99.95%
Recall 84.14% 64.64% 93.26%
F-measure 86.30% 78.52% 96.49%

Table VI compares MLCatchUp, Relancer, and PCART in detecting API parameter compatibility issues on PCBench, listing TP, FP, and FN per library together with the resulting precision, recall, and F-measure. MLCatchUp performs worst in terms of FP: 1,367 compatible test cases are wrongly detected as incompatible. Relancer has the most FN: 4,451 incompatible test cases are wrongly detected as compatible. PCART achieves the most TP: 11,737 incompatible test cases are correctly detected.

MLCatchUp assesses API compatibility solely based on API parameter definitions, without adequately considering how parameters are actually passed in the invoked APIs, which results in a high number of FP. In contrast, Relancer assesses API compatibility simply by whether test cases run successfully. Although this strategy avoids wrongly detecting compatible test cases as incompatible, i.e., it yields zero FP and a precision of 100%, Relancer overlooks test cases that run but actually have compatibility issues, leading to a large number of FN (i.e., 4,451 test cases).

Our tool, PCART, assesses API compatibility by considering both the API definitions and the actual parameter-passing methods (Section III-E). PCART achieves the highest TP count, significantly outperforming the existing tools in recall (93.26%) and F-measure (96.49%).

The promising detection performance of PCART not only demonstrates the effectiveness of the proposed compatibility assessment (Section III-E) but also validates the automated API mapping establishment approach (Section III-D), which is the key technique for addressing Challenge 2 (Section II-B). Note that the API signature mappings (parameter definitions) were manually provided to MLCatchUp in the evaluation experiment, whereas PCART adopts a dynamic mapping approach that obtains the signatures of APIs across versions automatically and precisely.

As shown in Fig. 14, 95.46% of the successfully detected test cases use dynamic mapping to obtain parameter definitions in both the current and target versions, and this proportion is also as high as 94.32% among the failed test cases. This implies that most API mappings are established by dynamic mapping, and only a small portion are built through static mapping.

Figure 14: Proportion of API mapping methods used in PCART across the current and the target versions.

Although PCART detects API parameter compatibility issues accurately, there is still room to reduce its false negatives, with a total of 848 cases not identified correctly. This is mainly attributed to improper handling of parameter mapping relationships, which causes some actually incompatible test cases to be detected as compatible, so they later miss the necessary fixes. For example, as shown in Listing 5, the parameter port is removed in the new version and the parameter config is added. PCART mistakenly interprets this change as a renaming of port to config; since port is passed by position, which is compatible under a renaming, the incompatibility goes undetected.

Correctly detecting parameter renaming is non-trivial: without clear semantic information, it is particularly difficult to determine whether a parameter has been renamed or removed. In addition, due to errors in PCART's static mapping, 26 test cases in the benchmark are not analyzed correctly, and their compatibility status is labeled as unknown.

VI-B RQ2: How does PCART Perform in Repairing API Parameter Compatibility Issues?

TABLE VII: Comparison of Repairing API Parameter Compatibility Issues
Library MLCatchUp Relancer PCART
Succeeded Failed Succeeded Failed Succeeded Failed Succeeded* Failed*
PyTorch 0 7 0 7 7 0 7 0
SciPy 0 438 0 438 415 23 415 23
Gensim 285 169 0 454 415 39 415 39
TensorFlow 2 55 0 57 53 1 56 1
Tornado 1 289 0 290 261 4 286 4
Transformers 0 0 0 0 0 0 0 0
Requests 0 6 0 6 6 0 6 0
Matplotlib 333 161 0 494 444 44 444 44
FastAPI 4 47 0 51 0 0 51 0
NumPy 14 346 0 360 75 0 358 2
Pydantic 0 8 0 8 0 8 0 8
Redis 0 0 0 0 0 0 0 0
Faker 0 0 0 0 0 0 0 0
LightGBM 0 0 0 0 0 0 0 0
Loguru 0 3 0 3 3 0 3 0
SymPy 5 59 0 64 43 1 63 1
scikit-learn 412 289 0 701 692 9 692 9
Flask 0 2 0 2 0 2 0 2
Click 0 3 0 3 3 0 3 0
aiohttp 0 16 0 16 1 14 1 15
spaCy 0 6 0 6 6 0 6 0
Keras 0 100 0 100 100 0 100 0
HTTPX 0 65 0 65 57 0 65 0
NetworkX 51 94 0 145 123 22 123 22
XGBoost 0 5 0 5 5 0 5 0
Plotly 6 6,314 0 6,320 6,320 0 6,320 0
Django 3 4 0 7 7 0 7 0
Pillow 0 7 0 7 7 0 7 0
JAX 0 12 0 12 12 0 12 0
Polars 83 54 0 137 117 0 117 0
pandas 0 2,601 0 2,601 1,753 757 1,753 816
Rich 0 181 0 181 133 0 177 4
Dask 0 47 0 47 8 39 8 39
Total 1,199 11,388 0 12,587 11,066 963 11,500 1,029
Repair Accuracy 9.53% 0.00% 87.92% 91.36%

Table VII presents the comparison results of MLCatchUp [9], Relancer [10], and PCART in repairing API parameter compatibility issues on PCBench. Details of succeeded and failed repairs across different libraries are given in the table. Relancer fails in all attempted repairs, achieving a repair accuracy of 0%. In contrast, MLCatchUp achieves an overall accuracy of 9.53%. Notably, PCART exhibits exceptional performance with an accuracy of 91.36%, effectively fixing the majority of incompatible test cases.

The limited repair accuracy of MLCatchUp stems mainly from the narrow set of repair operations it supports. MLCatchUp cannot repair the removal or reordering of positional parameters, and it recognizes only simple API calls in user code. It fails on more complex invocations such as `a().b()`, where an API is called on another API's return value, or `a(b())`, where another API call is passed as an argument.

For example, the TensorFlow API `random` in Listing 17 gains a new parameter (i.e., `rerandomize_each_iteration`) in the target version, but MLCatchUp misidentifies the call to `random` and applies the new parameter to the API `take` instead, so the repair fails. Similarly, as shown in Listing 18, the TensorFlow API `Embedding` gains a new parameter `sparse` in the target version, but MLCatchUp mistakenly applies this parameter to the API `add`, again causing a repair failure.

```python
import tensorflow as tf
# Before MLCatchUp repair
ds1 = tf.data.Dataset.random().take(10)
# After MLCatchUp repair (the new parameter is attached to take instead of random)
ds1 = tf.data.Dataset.random().take(10, rerandomize_each_iteration=None)
```
Listing 17: Calling with a function’s return value.
```python
import tensorflow as tf
model = tf.keras.Sequential()
# Before MLCatchUp repair
model.add(tf.keras.layers.Embedding(1000, 64))
# After MLCatchUp repair (sparse is attached to add instead of Embedding)
model.add(tf.keras.layers.Embedding(1000, 64), sparse=False)
```
Listing 18: Calling with a function’s parameter.
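Both MLCatchUp failures come from attaching the keyword to the wrong call. The sketch below, which assumes Python 3.9+ for `ast.unparse`, matches calls by their dotted source path and edits only the matching `ast.Call` node, so chained (`a().b()`) and nested (`a(b())`) calls are handled. The names `dotted_name`, `AddKeyword`, and `add_keyword` are hypothetical. Matching on the textual path is a simplification: it does not resolve import aliases or re-exported names.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None if it contains a call."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class AddKeyword(ast.NodeTransformer):
    """Append `keyword=value` to every call whose callee path equals `target`."""

    def __init__(self, target, keyword, value):
        self.target, self.keyword, self.value = target, keyword, value

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite inner and chained calls first
        if dotted_name(node.func) == self.target and not any(
            kw.arg == self.keyword for kw in node.keywords
        ):
            node.keywords.append(ast.keyword(arg=self.keyword, value=ast.Constant(self.value)))
        return node


def add_keyword(source, target, keyword, value):
    tree = AddKeyword(target, keyword, value).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(add_keyword("ds1 = tf.data.Dataset.random().take(10)",
                  "tf.data.Dataset.random", "rerandomize_each_iteration", None))
# ds1 = tf.data.Dataset.random(rerandomize_each_iteration=None).take(10)
```

For the outer call in `random().take(10)`, `dotted_name` returns `None` because the chain passes through a call, so only the inner `random()` node is modified.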

Relancer supports modifications to parameter names and values, but since its repair knowledge base is built upon a predefined dataset extracted from GitHub and API documentation, its repair strategies and capabilities are limited to the known API deprecation patterns. Thus, when faced with new or unrecorded code snippets such as those in PCBench, Relancer fails to generate effective repair solutions.

PCART derives repair operations on the fly from the API parameter definitions and the actual parameter-passing methods of the invoked APIs, without requiring a pre-built repair knowledge base. It therefore has broader applicability and higher repair accuracy than existing tools. Of the 12,587 incompatible test cases in PCBench, automated validation confirms 11,066 as successfully repaired and 963 as failed, while 558 remain unknown, giving a repair accuracy of 87.92% in the automated validation phase. The repair status of these 558 test cases is unknown mainly because pickle file creation or loading failed, which prevented automated validation. Manual confirmation of these unknown cases raises the repair accuracy to 91.36%, marked with “*” in the last two columns of Table VII.
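The idea of deriving a repair from parameter definitions plus the actual passing method can be sketched with `inspect`. This is a hypothetical illustration, not PCART's code. `repair_call` binds the call against the current signature, drops arguments whose parameters were removed, applies a caller-supplied rename map (the `renames` argument is an assumption, since deciding renames is the hard part), keeps an argument positional only while it still lands on the same parameter, and otherwise passes it by keyword. It assumes neither signature uses `*args` or `**kwargs`.

```python
import inspect

POSITIONAL_KINDS = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)


def repair_call(old_fn, new_fn, args=(), kwargs=None, renames=None):
    """Rewrite a call written for old_fn so that it binds correctly against new_fn.

    Returns (positional_args, keyword_args) for the repaired call.
    """
    kwargs = kwargs or {}
    renames = renames or {}
    old_sig, new_sig = inspect.signature(old_fn), inspect.signature(new_fn)
    bound = old_sig.bind(*args, **kwargs)  # how the call binds in the current version
    passed_by_position = set(list(old_sig.parameters)[: len(args)])
    new_order = list(new_sig.parameters)
    new_args, new_kwargs = [], {}
    for old_name, value in bound.arguments.items():
        name = renames.get(old_name, old_name)
        if name not in new_sig.parameters:
            continue  # parameter removed in the target version: drop the argument
        keep_position = (
            old_name in passed_by_position
            and not new_kwargs  # positional arguments cannot follow keyword arguments
            and len(new_args) < len(new_order)
            and new_order[len(new_args)] == name
            and new_sig.parameters[name].kind in POSITIONAL_KINDS
        )
        if keep_position:
            new_args.append(value)
        else:
            new_kwargs[name] = value  # reordered or now keyword-only: pass by keyword
    new_sig.bind(*new_args, **new_kwargs)  # raises TypeError if still unbindable
    return new_args, new_kwargs


# Hypothetical reordering: b and c swap places in the target version
def f_old(a, b, c=0): ...
def f_new(a, c=0, b=None): ...


print(repair_call(f_old, f_new, (1, 2, 3)))  # ([1], {'b': 2, 'c': 3})
```

Applied to simplified stand-ins for the two `between_time` signatures of Listing 19, the sketch produces the same repaired call as that listing, `('0:15', '0:45', None)`. A structurally valid repair can therefore still fail at runtime when the meaning of a value changes.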

Of the 963 incompatible test cases that PCART fails to repair, 848 stem from incorrect parameter mapping relations, which cause missed detections (i.e., incompatible test cases detected as compatible), as shown in Table VI. These cases therefore never undergo repair. For the remaining 115 test cases, repair fails mainly because of changed parameter default values and improper handling of `**kwargs`. For instance, in Listing 19, the parameters `include_start` and `include_end` of the pandas API `between_time` are removed in version 2.0.0. Although PCART correctly drops them from the call, the repaired call still cannot run because the handling of `inclusive` has also changed: its default moves from `None` to `'both'`, and the `None` still passed positionally is no longer a valid value. PCART technically supports modifying parameter values, but it does not take this step because it cannot be sure that the default values in the new version match the user's intent.

```python
# API definition in pandas 1.2 (signature excerpt)
def between_time(start_time, end_time, include_start: 'bool_t | lib.NoDefault' = lib.no_default, include_end: 'bool_t | lib.NoDefault' = lib.no_default, inclusive: 'IntervalClosedType | None' = None, axis=None): ...

# API definition in pandas 2.0.0 (signature excerpt)
def between_time(start_time, end_time, inclusive: 'IntervalClosedType' = 'both', axis: 'Axis | None' = None): ...

import pandas as pd
i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
# Before repair
ts.between_time('0:15', '0:45', True, True, None)
# After repair: include_start and include_end are dropped, but None still goes to inclusive
ts.between_time('0:15', '0:45', None)
```
Listing 19: Change in the default value of a parameter.
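PCART deliberately leaves values untouched, but a tool could at least flag cases like Listing 19 for the developer. The minimal sketch below compares the defaults of parameters present in both versions. It assumes both versions' callables (or stubs) can be inspected. The two functions are simplified, hypothetical stand-ins for the pandas signatures, with annotations and the `lib.no_default` sentinel omitted. Array-like defaults would need a different comparison than `!=`.

```python
import inspect


def changed_defaults(old_fn, new_fn):
    """Map each parameter present in both versions to (old_default, new_default)
    when the default differs; inspect.Parameter.empty marks 'no default'."""
    old = inspect.signature(old_fn).parameters
    new = inspect.signature(new_fn).parameters
    changes = {}
    for name in old:
        if name in new and old[name].default != new[name].default:
            changes[name] = (old[name].default, new[name].default)
    return changes


# Simplified, hypothetical stand-ins for the two between_time signatures
def between_time_old(start_time, end_time, include_start=None, include_end=None,
                     inclusive=None, axis=None): ...
def between_time_new(start_time, end_time, inclusive="both", axis=None): ...


print(changed_defaults(between_time_old, between_time_new))  # {'inclusive': (None, 'both')}
```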
```python
# API definition in Tornado 5.1.1
def fetch(self, request, callback=None, raise_error=True, **kwargs):
    ...

# API definition in Tornado 6.0
def fetch(self, request: Union[str, 'HTTPRequest'], raise_error: bool = True, **kwargs: Any):
    ...
    if not isinstance(request, HTTPRequest):
        request = HTTPRequest(url=request, **kwargs)  # a leftover callback ends up here
    ...

import tornado.httpclient
import tornado.ioloop

async def fetch_url():
    http_client = tornado.httpclient.AsyncHTTPClient()
    response = await http_client.fetch('http://example.com', callback=None)

tornado.ioloop.IOLoop.current().run_sync(fetch_url)
```
Listing 20: Passing `**kwargs` to another API within the original API.

In addition, when the invoked API in the target version has a `**kwargs` parameter, renaming or removing a keyword argument should in principle remain compatible, because `**kwargs` accepts an arbitrary number of keyword arguments. However, in Listing 20, the `callback` parameter of the Tornado API `fetch` is removed in version 6.0. If `callback` is still passed in the new version, it is silently collected into `**kwargs`. Inside `fetch`, however, `**kwargs` is forwarded to another API, `HTTPRequest` [41], whose definition does not include `callback`, so the call fails at runtime with a `TypeError` for an unexpected keyword argument.
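The pitfall can be reproduced without Tornado. The sketch below uses hypothetical stand-ins (`Request`, `fetch`) shaped like Listing 20, not Tornado's own classes. A signature-level check, binding the call against `fetch` with `inspect.signature(...).bind`, accepts `callback=None`. Actually executing the call raises `TypeError` inside the forwarded constructor, which shows why parameter definitions alone cannot settle compatibility when `**kwargs` is passed on.

```python
import inspect


class Request:
    """Hypothetical stand-in for an HTTP request class that has no `callback`."""

    def __init__(self, url, method="GET", headers=None):
        self.url, self.method, self.headers = url, method, headers


def fetch(request, raise_error=True, **kwargs):
    """Hypothetical stand-in shaped like the Tornado 6.0 `fetch` in Listing 20."""
    if not isinstance(request, Request):
        request = Request(url=request, **kwargs)  # forwards unknown keywords
    return request


def signature_accepts(fn, *args, **kwargs):
    """Signature-level check: does the call bind to fn's declared parameters?"""
    try:
        inspect.signature(fn).bind(*args, **kwargs)
        return True
    except TypeError:
        return False


def call_succeeds(fn, *args, **kwargs):
    """Execution-level check: does the call actually run without a TypeError?"""
    try:
        fn(*args, **kwargs)
        return True
    except TypeError:
        return False


print(signature_accepts(fetch, "http://example.com", callback=None))  # True
print(call_succeeds(fetch, "http://example.com", callback=None))      # False
```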

VI-C RQ3: How does PCART Compare to ChatGPT in Detecting and Repairing API Parameter Compatibility Issues?

Table VIII compares the detection and repair performance of ChatGPT-4 and PCART on 58 test cases randomly selected from PCBench (29 compatible and 29 incompatible). Without the API parameter definitions before and after the update, ChatGPT-4 achieves a precision of 78.94%, a recall of 51.72%, and an F-measure of 62.50% in detecting API parameter compatibility issues, while its repair accuracy is only 24.14%. When the API parameter definitions are provided, recall and F-measure improve to 72.41% and 68.85%, respectively, while precision drops to 65.63%, and the repair accuracy rises to 65.52%. Overall, providing the API definitions when querying ChatGPT-4 about compatibility issues in given code snippets improves its performance.

PCART performs best, achieving a precision of 96.15%, a recall of 86.21%, and an F-measure of 90.91% in detection. It correctly identifies 25 incompatible test cases and repairs 22 of them, a repair accuracy of 75.86%. The remaining cases that are not correctly detected or repaired are attributable to the reasons discussed above, namely parameter mapping errors, improper handling of `**kwargs`, and changes in parameter default values.

Although ChatGPT-4 shows potential to address API parameter compatibility issues, its performance is limited by the model’s inherent hallucinations, inconsistencies, and randomness. For example, in nine of the 58 test cases, the responses (independently tested in two browsers) given by ChatGPT-4 are inconsistent.

TABLE VIII: Comparison of PCART and ChatGPT-4 in Detecting and Repairing API Parameter Compatibility Issues
Method Detection Repair
TP FP FN Precision Recall F-measure Success Accuracy
ChatGPT-4 (w/o definition) 15 4 14 78.94% 51.72% 62.50% 7 24.14%
ChatGPT-4 (w/ definition) 21 11 8 65.63% 72.41% 68.85% 19 65.52%
PCART 25 1 4 96.15% 86.21% 90.91% 22 75.86%
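The metrics in Tables VII and VIII follow the standard definitions with incompatible test cases as positives: precision = TP/(TP+FP), recall = TP/(TP+FN), F-measure as their harmonic mean, and repair accuracy as confirmed successes divided by the number of incompatible test cases (29 in the ChatGPT-4 sample, 12,587 in PCBench). The short check below reproduces the reported figures up to rounding in the last digit. The function names are illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F-measure, with incompatible test cases as positives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def repair_accuracy(successes, incompatible_cases):
    """Share of incompatible test cases whose repair was confirmed successful."""
    return successes / incompatible_cases


p, r, f = detection_metrics(tp=25, fp=1, fn=4)  # PCART column of Table VIII
print(f"{p:.2%} {r:.2%} {f:.2%} {repair_accuracy(22, 29):.2%}")
# 96.15% 86.21% 90.91% 75.86%
print(f"{repair_accuracy(11_500, 12_587):.2%}")  # PCART* total in Table VII
# 91.36%
```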

VI-D RQ4: What is the Time Cost of PCART in Detecting and Repairing API Parameter Compatibility Issues?

To answer RQ4, we measure the runtime of each test case and divide it by the number of target library APIs invoked in that test case to obtain the average runtime for processing one API. Fig. 15 shows that in 90% of the test cases, the average processing time per API is at most 5,500 ms, indicating that PCART detects and repairs API parameter compatibility issues efficiently for most test cases. Moreover, as depicted in Fig. 16, after removing outliers (i.e., data points above the upper quartile plus 1.5 times the interquartile range or below the lower quartile minus 1.5 times the interquartile range), the average processing time for one API per test case is 2,735 ms. A few test cases have average processing times far above the norm, mainly because they invoke APIs related to model training, such as `gensim.models.fasttext.FastText` and `sklearn.manifold.TSNE`, which substantially increase the overall processing time.

Figure 15: Cumulative graph of average time spent by PCART on processing an invoked API in PCBench test cases.
Figure 16: Box plot of average time spent by PCART on processing an invoked API in PCBench test cases.
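The per-API normalization and the outlier rule described for Fig. 16 can be sketched as follows. The runtimes are made-up illustrative values, not PCBench data. The quartiles come from `statistics.quantiles` with its default "exclusive" method, which is an assumption: other quartile conventions (e.g., linear interpolation in NumPy or plotting libraries) can shift the fences slightly.

```python
import statistics


def per_api_times(records):
    """records: (test_case_runtime_ms, number_of_invoked_target_apis) pairs."""
    return [runtime / n_apis for runtime, n_apis in records if n_apis > 0]


def drop_iqr_outliers(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical runtimes: the last test case stands in for a model-training API
times = per_api_times([(300, 3), (220, 2), (480, 4), (130, 1), (280, 2), (50_000, 5)])
kept = drop_iqr_outliers(times)
print(statistics.fmean(kept))  # 120.0
```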

To fully automate the process from detection to repair of API parameter compatibility issues, PCART combines dynamic and static methods. The dynamic steps account for much of PCART's running time, particularly for large Python projects. These steps are instrumenting and executing the project to capture the contextual dependencies of the invoked APIs, and loading those dependencies to establish API mappings and validate repairs. Compared with the manual effort of establishing API mappings, repairing, and then validating the fixes, however, we believe PCART is efficient and effective. When applying PCART to large Python projects, we suggest that developers reduce execution time with strategies such as shrinking the dataset or setting a small number of training epochs (especially for deep learning projects).

VI-E RQ5: What is the Effectiveness of PCART in Real-world Python Projects?

Table IX presents the evaluation of PCART on the 14 collected real-world projects. Manual confirmation shows that PCART correctly identifies all target library APIs invoked in each project and whether each is covered during execution. PCART detects all API parameter compatibility issues in these projects, as listed in the TP (incompatible) and TN (compatible) columns of Table IX. It repairs the detected issues and validates the repairs automatically in 11 projects. For the remaining three projects (marked with “*”), it provides repairs but cannot complete automated validation because pickle files fail to load, and manual validation confirms that these repairs are all correct. Notably, two projects, “polars-book-cn” and “EJPLab_Computational”, run without errors yet contain latent compatibility issues, which PCART both detects and repairs. These results demonstrate that PCART is practical for detecting and repairing API parameter compatibility issues in real-world Python development.

TABLE IX: Evaluation of PCART on Real-world GitHub Python Projects
Project #APIs #Covered APIs Detection Repair
TP TN Success
allnews 4 4 1 3 1
Youtube-Comedy 1 1 1 0 1
recommendation-engine 21 21 3 18 3*
political-polarisation 8 8 1 7 1
CustomSamplers 1 1 1 0 1
machine-learning 17 17 1 16 1
gistable 2 2 1 1 1
fuel_forecast_explorer 12 12 4 8 4*
sg-restart-regridder 3 3 1 2 1
polars-book-cn 10 10 1 9 1
EJPLab_Computational 12 12 1 11 1
Deep-Graph-Kernels 21 21 1 20 1
AIBO 4 1 1 0 1
django-selenium 7 3 1 2 1*

VI-F Limitations

Although PCART overcomes several limitations of existing tools in detecting and repairing API parameter compatibility issues, it has limitations of its own, which we discuss below.

(1) API Context Serialization. PCART's automated API mapping relies mainly on dynamic mapping, which requires the contextual dependencies of the invoked APIs. During instrumentation and execution, PCART runs the instrumented project and serializes the context of each invoked API with the Dill library. The serialized context is saved in binary form to pickle files that are later loaded directly during dynamic mapping and automated validation. Unfortunately, certain variable types, such as file descriptors, sockets, and similar OS resources, are generally not serializable. For example, for the aiohttp library, attempting to serialize an object created with `aiohttp.ClientSession()` raises an exception, i.e., TypeError: Cannot serialize socket object. Moreover, even serializable variables may fail to load if the internal modules they depend on have changed in the new version. Both factors cause failures in the dynamic mapping and automated validation phases.

(2) Mapping Relationship Establishment. On the one hand, when dynamic mapping fails, PCART falls back to static mapping, i.e., extracting the parameter definitions of the invoked APIs from the library source code. However, as mentioned in Section II-B, this is difficult when the fully qualified call path of an API does not match its actual path in the source code. Although PCART converts some API paths in library source code to the standard call paths given in the official documentation, some APIs remain affected by issues such as identically named APIs, API aliases, and API overloading. On the other hand, in the compatibility assessment phase, when the remaining unmapped parameters from the current to the target version stand in a 1:N or N:N ratio (where N > 1 is the number of parameters), it is hard to determine the mapping relationships accurately, especially when renaming or removal is involved.

(3) Parameter Type Analysis. PCART relies mainly on parameter annotations to analyze parameter types, comparing the literal text of type annotations to determine whether a parameter's type has changed between the current and target versions. However, annotation style varies among library developers: some use the standard Python type format, while others use descriptive statements, which complicates the analysis of type changes. As a result, PCART does not support repairing type changes and uses the limited type annotation information only to help establish parameter mapping relationships.

(4) Parameter Value Handling. Changes to parameter default values in new versions may also affect API compatibility. Although PCART technically supports modifying parameter values, it does not do so because we cannot ensure that the modified values meet users' actual development needs. In addition, new parameters without default values can also cause compatibility issues. PCART currently does not repair this type of issue, as it is difficult to generate, on the fly, parameter values that satisfy the type requirements.
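For limitation (1), one mitigation a user could try before running the pipeline is a pre-flight check that reports which context variables fail to serialize. The sketch below is hypothetical and uses the standard-library `pickle` as a stand-in for Dill. The two libraries do not accept exactly the same objects (Dill handles more, e.g., lambdas), so results can differ. `unserializable_names` is an illustrative name.

```python
import pickle


def unserializable_names(context):
    """Return the names of context variables that fail to serialize.

    Uses stdlib pickle as a stand-in for Dill; the two do not accept
    exactly the same objects, so results can differ.
    """
    failed = []
    for name, value in context.items():
        try:
            pickle.dumps(value)
        except Exception:  # TypeError or pickle.PicklingError, depending on the object
            failed.append(name)
    return failed
```

For example, passing a dictionary that holds an open `socket.socket()` alongside ordinary values returns only the socket's name, which mirrors the aiohttp failure described above.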

VII Threats to Validity

Threats to Internal Validity. The main threat to internal validity arises from potential implementation flaws in PCART. To mitigate this threat, we thoroughly examined the implementation logic of our code and used the test results from both the benchmark and real-world projects as feedback to continuously modify and refine PCART. Moreover, the process of manually labeling the compatibility of the test cases in PCBench and manually checking some experimental results may introduce subjective biases. Therefore, we mitigated this type of threat through independent checks and cross-validation of all results by the authors and have made our dataset publicly available for review and reproduction.

Threats to External Validity. The primary threat to external validity comes from the selection of datasets used to evaluate PCART. To mitigate this threat, we constructed a benchmark comprising 844 APIs with parameter changes from 33 popular Python libraries. We then applied three mutation operators, on the number of parameters, the parameter-passing method, and the order in which parameters are passed, to generate PCBench (i.e., 47,478 test cases). We believe PCBench reflects the diversity of how users pass parameters in practice. Additionally, we collected 14 real-world projects from GitHub to assess PCART's effectiveness and practicality in actual environments. Finally, we compared PCART with representative existing approaches, i.e., MLCatchUp [9], Relancer [10], and ChatGPT-4 [26]. The experimental results demonstrate that PCART performs best in detecting and repairing API parameter compatibility issues.

Threats to Construct Validity. The primary threat to construct validity is that the performance metrics used to evaluate PCART might not be comprehensive enough. To address this threat, we treated incompatible test cases as positives and separately counted true positives (TP), false positives (FP), and false negatives (FN) in detection, from which we calculated precision, recall, and F-measure. For repair, we counted the successfully repaired test cases and calculated the repair accuracy. Together, these metrics provide a multidimensional assessment of PCART's performance.

VIII Related Work

In this section, we introduce related work from two aspects: API evolution analysis and compatibility issue repair techniques for the Python programming language.

VIII-A API Evolution Analysis

Many studies have summarized the characteristics of API evolution in Python third-party libraries and conducted various analyses to guide developers and practitioners [3, 42, 5]. Zhang et al. [3] presented the first comprehensive analysis of API evolution patterns within Python frameworks. They analyzed six popular Python frameworks and 5,538 open-source projects built on these frameworks. Their research identified five distinct API evolution patterns that are absent within Java libraries. Zhang et al. [42] delved into TensorFlow 2’s API evolution trends by mining relevant API documentation and mapping API changes to functional categories. They determined that efficiency and compatibility stand out as the primary reasons for API changes within TensorFlow 2, constituting 54% of the observed variations.

Du et al. [7] presented a system-level method based on an API model to detect breaking changes in Python third-party libraries. Building upon this, they designed and implemented a prototype tool, AexPy, for detecting documented and undocumented breaking changes in real-world libraries. Montandon et al. [43] found that 79% of the breaking changes in default parameter values in scikit-learn impact machine learning model training and evaluation, leading to unexpected results in client programs reliant on these APIs.

The study of API deprecation has become increasingly prevalent. Wang et al. [5] investigated how Python library developers maintain deprecated APIs. They found that inadequate documentation and declarations for deprecated APIs pose obstacles for Python API users. Vadlamani et al. [6] implemented an extension (APIScanner) that issues warnings when developers use deprecated APIs from Python libraries.

Compared to existing tools that analyze API definition changes in Python libraries, our work focuses on designing an automated approach (PCART) for detecting and repairing parameter compatibility issues within the invoked APIs in user projects. When evaluating API parameter compatibility, PCART comprehensively analyzes both the changes in the API definitions and the actual usage of parameter-passing methods within the invoked APIs. This significantly improves the accuracy of detecting API parameter compatibility issues.

VIII-B Compatibility Issues Repair Techniques

Zhu et al. proposed Relancer [10], an iterative, runtime-error-driven method with a combined search space derived from API migration examples and API documentation. It uses machine learning models to predict the API repair types required for correct execution, automatically upgrading deprecated APIs to restore the functionality of broken Jupyter Notebook runtime environments. Haryono et al. [44] conducted an empirical study of how deprecated APIs in Python libraries are updated. Subsequently, they introduced MLCatchUp [9], which automatically infers the transformations needed to migrate deprecated APIs to their updated versions based on differences mined from manually provided signatures. Recently, Navarro et al. [45] presented a closed-source tool that automatically updates deprecated APIs in Python projects. The tool uses a web crawler to retrieve deprecated-API information from library change logs to build a knowledge base, and then detects deprecated usages and recommends fixes in the code through an IDE plugin. However, due to its inherent limitations, it cannot detect or repair compatibility issues arising from API parameter changes.

Unlike existing repair tools, PCART focuses on fixing compatibility issues caused by API parameter changes in Python libraries (i.e., addition, removal, renaming, and reordering of parameters, as well as the conversion of positional parameters to keyword parameters). To the best of our knowledge, PCART is the first tool to implement a fully automated process from API extraction, code instrumentation, and API mapping establishment, to compatibility assessment, and finally to repair and validation, achieving outstanding detection and repair performance.

IX Conclusion

In this paper, we introduced an open-source tool named PCART, which combines static and dynamic analysis to automate the entire process from API extraction, code instrumentation, API mapping establishment, and compatibility assessment, to repair and validation, precisely addressing Python API parameter compatibility issues. To comprehensively evaluate PCART's detection and repair performance, we constructed a large-scale benchmark, PCBench, consisting of 844 parameter-changed APIs from 33 popular Python libraries and a total of 47,478 test cases. The experimental results demonstrate that PCART outperforms existing tools (MLCatchUp and Relancer) as well as the LLM ChatGPT-4 in both detecting and repairing API parameter compatibility issues. Furthermore, we evaluated PCART on 14 real-world Python projects, showing that PCART accurately detects and successfully repairs all incompatible APIs in these projects.

PCART is an exploratory step towards fully automated Python API compatibility issues repair. In the future, we plan to address several limitations in PCART and continue to improve the practicality and effectiveness of PCART by testing it on more real-world projects.

References