Fair Enough: Searching for Sufficient Measures of Fairness
Testing machine learning software for ethical bias has become a pressing current concern. In response, recent research has proposed a plethora of new fairness metrics, for example, the dozens of fairness metrics in the IBM AIF360 toolkit. This raises the ...
Toward Understanding Deep Learning Framework Bugs
DL frameworks are the basis of constructing all DL programs and models, and thus their bugs could lead to the unexpected behaviors of any DL program or model relying on them. Such a wide effect demonstrates the necessity and importance of guaranteeing DL ...
UniLoc: Unified Fault Localization of Continuous Integration Failures
Continuous integration (CI) practices encourage developers to frequently integrate code into a shared repository. Each integration is validated by automatic build and testing such that errors are revealed as early as possible. When CI failures or ...
TestSGD: Interpretable Testing of Neural Networks against Subtle Group Discrimination
Discrimination has been shown in many machine learning applications, which calls for sufficient fairness testing before their deployment in ethic-relevant domains. One widely concerning type of discrimination, testing against group discrimination, mostly ...
Automatic Core-Developer Identification on GitHub: A Validation Study
Many open-source software projects are self-organized and do not maintain official lists with information on developer roles. So, knowing which developers take core and maintainer roles is, despite being relevant, often tacit knowledge. We propose a ...
JavaScript SBST Heuristics to Enable Effective Fuzzing of NodeJS Web APIs
JavaScript is one of the most popular programming languages. However, its dynamic nature poses several challenges to automated testing techniques. In this paper, we propose an approach and open-source tool support to enable white-box testing of JavaScript ...
XCoS: Explainable Code Search Based on Query Scoping and Knowledge Graph
When searching code, developers may express additional constraints (e.g., functional constraints and nonfunctional constraints) on the implementations of desired functionalities in the queries. Existing code search tools treat the queries as a whole and ...
Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems
Upon receiving a new issue report, practitioners start by investigating the defect type, the potential fixing effort needed to resolve the defect and the change impact. Moreover, issue reports contain valuable information, such as, the title, description ...
What Quality Aspects Influence the Adoption of Docker Images?
Docker is a containerization technology that allows developers to ship software applications along with their dependencies in Docker images. Developers can extend existing images using them as base images when writing Dockerfiles. However, a lot of ...
CodeEditor: Learning to Edit Source Code with Pre-trained Models
Developers often perform repetitive code editing activities (up to 70%) for various reasons (e.g., code refactoring) during software development. Many deep learning (DL) models have been proposed to automate code editing by learning from the code editing ...
Open Problems in Fuzzing RESTful APIs: A Comparison of Tools
RESTful APIs are a type of web service that are widely used in industry. In the past few years, a lot of effort in the research community has been spent in designing novel techniques to automatically fuzz those APIs to find faults in them. Many real ...
Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection
AI models of code have made significant progress over the past few years. However, many models are actually not learning task-relevant source code features. Instead, they often fit non-relevant but correlated data, leading to a lack of robustness and ...
An Empirical Study on GitHub Pull Requests’ Reactions
The pull request mechanism is commonly used to propose source code modifications and get feedback from the community before merging them into a software repository. On GitHub, practitioners can provide feedback on a pull request by either commenting on ...
Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse
Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit ...
An Accurate Identifier Renaming Prediction and Suggestion Approach
Identifiers play an important role in helping developers analyze and comprehend source code. However, many identifiers exist that are inconsistent with the corresponding code conventions or semantic functions, leading to flawed identifiers. Hence, ...
Dependency Update Strategies and Package Characteristics
Managing project dependencies is a key maintenance issue in software development. Developers need to choose an update strategy that allows them to receive important updates and fixes while protecting them from breaking changes. Semantic Versioning was ...
DeepPatch: Maintaining Deep Learning Model Programs to Retain Standard Accuracy with Substantial Robustness Improvement
Maintaining a deep learning (DL) model by making the model substantially more robust through retraining with plenty of adversarial examples of non-trivial perturbation strength often reduces the model’s standard accuracy. Many existing model repair or ...
Optimization Techniques for Model Checking Leads-to Properties in a Stratified Way
We devised the L+1-layer divide & conquer approach to leads-to model checking (L+1-DCA2L2MC) and its parallel version, and developed sequential and parallel tools for L+1-DCA2L2MC. In a temporal logic called UNITY, designed by Chandy and Misra, the leads-...
Revisiting the Identification of the Co-evolution of Production and Test Code
Many software processes advocate that the test code should co-evolve with the production code. Prior work usually studies such co-evolution based on production-test co-evolution samples mined from software repositories. A production-test co-evolution ...
Exploring the Impact of Code Clones on Deep Learning Software
Deep learning (DL) has been a very active topic in recent years. Code cloning is a common implementation practice that could negatively impact software maintenance. For DL software, developers rely heavily on frameworks to implement DL features. Meanwhile, to ...
PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing
In the past few years, Transformer has been widely adopted in many domains and applications because of its impressive performance. Vision Transformer (ViT), a successful and well-known variant, attracts considerable attention from both industry and ...
Tiny, Always-on, and Fragile: Bias Propagation through Design Choices in On-device Machine Learning Workflows
Billions of distributed, heterogeneous, and resource constrained IoT devices deploy on-device machine learning (ML) for private, fast, and offline inference on personal data. On-device ML is highly context dependent and sensitive to user, usage, hardware, ...
Rise of Distributed Deep Learning Training in the Big Model Era: From a Software Engineering Perspective
Deep learning (DL) has become a key component of modern software. In the “big model” era, the rich features of DL-based software (i.e., DL software) substantially rely on powerful DL models, e.g., BERT, GPT-3, and the recently emerging GPT-4, which are ...
Pre-implementation Method Name Prediction for Object-oriented Programming
Method naming is a challenging development task in object-oriented programming. In recent years, several research efforts have been undertaken to provide automated tool support for assisting developers in this task. In general, literature approaches ...
Towards Practical Binary Code Similarity Detection: Vulnerability Verification via Patch Semantic Analysis
Vulnerability is a major threat to software security. It has been proven that binary code similarity detection approaches are efficient to search for recurring vulnerabilities introduced by code sharing in binary software. However, these approaches suffer ...
A Systematic Review of Automated Query Reformulations in Source Code Search
Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they ...
NSFuzz: Towards Efficient and State-Aware Network Service Fuzzing
As an essential component responsible for communication, network services are security critical, thus, it is vital to find their vulnerabilities. Fuzzing is currently one of the most popular software vulnerability discovery techniques, widely adopted due ...
NSFuzz: Towards Efficient and State-Aware Network Service Fuzzing - RCR Report
We provide artifacts to reproduce the evaluation results of our article: “NSFuzz: Towards Efficient and State-Aware Network Service Fuzzing”. The provided artifacts can be downloaded from . They include 14 Docker containers,...