TOSEM: Vol 34, No 2

research-article

My Fuzzers Won’t Build: An Empirical Study of Fuzzing Build Failures

Article No.: 29, Pages 1–30, https://doi.org/10.1145/3688842

Fuzzing is an automated software testing technique used to find software vulnerabilities that works by sending large amounts of inputs to a software system to trigger bad behaviors. In recent years, the open source software ecosystem has seen a ...

research-article

Open Access

Software Product Line Engineering via Software Transplantation

Article No.: 31, Pages 1–27, https://doi.org/10.1145/3695987

Software Product Lines (SPLs) improve time-to-market, enhance software quality, and reduce maintenance costs. Current SPL reengineering practices are largely manual and require domain knowledge. Thus, adopting and, to a lesser extent, maintaining SPLs are ...

research-article

Open Access

A Large-Scale Study of IoT Security Weaknesses and Vulnerabilities in the Wild

Article No.: 32, Pages 1–40, https://doi.org/10.1145/3691628

Internet of Things (IoT) is defined as the connection between places and physical objects (i.e., things) over the internet/network via smart computing devices. IoT is a rapidly emerging paradigm that now encompasses almost every aspect of our modern life. ...

research-article

Systematic Literature Review of Commercial Participation in Open Source Software

Article No.: 33, Pages 1–31, https://doi.org/10.1145/3690632

Open source software (OSS) has been playing a fundamental role in not only information technology but also our social lives. Attracted by various advantages of OSS, an increasing number of commercial companies are participating extensively in open source development, ...

research-article

T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax

Article No.: 34, Pages 1–31, https://doi.org/10.1145/3690631

Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, and (2) the difference among duplicates (i.e., different programs ...

research-article

Open Access

Improving Fault Localization with External Oracle by Using Counterfactual Execution

Article No.: 35, Pages 1–22, https://doi.org/10.1145/3695997

We present Flex, a new approach to improve fault localization with external oracles. Spectrum-based fault localization techniques estimate suspicious statements based on the execution trace of the test suite. State-of-the-art techniques rely on test ...

research-article

Is It Hard to Generate Holistic Commit Message?

Article No.: 36, Pages 1–28, https://doi.org/10.1145/3695996

Commit messages are important for developers to understand the content and the reason for code changes. However, poor and even empty commit messages widely exist. To improve the quality of commit messages and development efficiency, many commit message ...

research-article

Structured Chain-of-Thought Prompting for Code Generation

Article No.: 37, Pages 1–23, https://doi.org/10.1145/3690635

Large Language Models (LLMs) have shown impressive abilities in code generation. Chain-of-Thought (CoT) prompting is the state-of-the-art approach to utilizing LLMs. CoT prompting asks LLMs first to generate CoTs (i.e., intermediate natural language ...

research-article

Automatic Identification of Game Stuttering via Gameplay Videos Analysis

Article No.: 38, Pages 1–29, https://doi.org/10.1145/3695992

Modern video games are extremely complex software systems and, as such, they might suffer from several types of post-release issues. A particularly insidious issue is constituted by drops in the frame rate (i.e., stuttering events), which might have a ...

research-article

ZigZagFuzz: Interleaved Fuzzing of Program Options and Files

Article No.: 39, Pages 1–31, https://doi.org/10.1145/3697014

Command-line options (e.g., -l, -F, -R for ls) given to a command-line program can significantly alter the behaviors of the program. Thus, fuzzing not only file input but also program options can improve test coverage and bug detection. In this ...

research-article

A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools

Article No.: 40, Pages 1–63, https://doi.org/10.1145/3696002

Software undergoes constant changes to support new requirements, address bugs, enhance performance, and ensure maintainability. Thus, developers spend a great portion of their workday trying to understand and review the code changes of their teammates. ...

research-article

Identifying the Failure-Revealing Test Cases in Metamorphic Testing: A Statistical Approach

Article No.: 41, Pages 1–26, https://doi.org/10.1145/3695990

Metamorphic testing, thanks to its high failure-detection effectiveness especially in the absence of test oracle, has been widely applied in both the traditional context of software testing and other relevant fields such as fault localization and program ...

research-article

Open Access

An Empirical Study of the Non-Determinism of ChatGPT in Code Generation

Article No.: 42, Pages 1–28, https://doi.org/10.1145/3697010

There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable, non-deterministically returning very different code for the ...

research-article

Open Access

Deep API Sequence Generation via Golden Solution Samples and API Seeds

Article No.: 44, Pages 1–21, https://doi.org/10.1145/3695995

Automatic API recommendation can accelerate developers’ programming and has been studied for years. There are two orthogonal lines of approaches for this task, i.e., information retrieval-based (IR-based) approaches and sequence to sequence (seq2seq) ...

research-article

Non-Flaky and Nearly Optimal Time-Based Treatment of Asynchronous Wait Web Tests

Article No.: 45, Pages 1–29, https://doi.org/10.1145/3695989

Asynchronous waits are a common root cause of flaky tests and a major time-influential factor of Web application testing. We build a dataset of 49 reproducible asynchronous wait flaky tests and their fixes from 26 open source projects to study their ...

research-article

Open Access

Demo2Test: Transfer Testing of Agent in Competitive Environment with Failure Demonstrations

Article No.: 46, Pages 1–28, https://doi.org/10.1145/3696001

The competitive game between agents exists in many critical applications, such as military unmanned aerial vehicles. It is urgent to test these agents to reduce the significant losses caused by their failures. Existing studies mainly are to construct a ...

SECTION: Continuous Special Section: AI and SE

research-article

Open Access

Anatomizing Deep Learning Inference in Web Browsers

Article No.: 47, Pages 1–43, https://doi.org/10.1145/3688843

Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impacts on the Quality of Experience (QoE) ...

research-article

QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems

Article No.: 48, Pages 1–32, https://doi.org/10.1145/3688840

Quantum Neural Network (QNN) combines the deep learning (DL) principle with the fundamental theory of quantum mechanics to achieve machine learning tasks with quantum acceleration. Recently, QNN systems have been found to manifest robustness issues ...

research-article

NLPLego: Assembling Test Generation for Natural Language Processing Applications

Article No.: 49, Pages 1–36, https://doi.org/10.1145/3691631

With the development of Deep Learning, Natural Language Processing (NLP) applications have reached or even exceeded human-level capabilities in certain tasks. Although NLP applications have shown good performance, they can still have bugs like traditional ...

research-article

AutoRIC: Automated Neural Network Repairing Based on Constrained Optimization

Article No.: 50, Pages 1–29, https://doi.org/10.1145/3690634

Neural networks are important computational models used in the domains of artificial intelligence and software engineering. Parameters of a neural network are obtained via training it against a specific dataset with a standard process, which guarantees ...

research-article

Interpretable Failure Localization for Microservice Systems Based on Graph Autoencoder

Article No.: 52, Pages 1–28, https://doi.org/10.1145/3695999

Accurate and efficient localization of root cause instances in large-scale microservice systems is of paramount importance. Unfortunately, prevailing methods face several limitations. Notably, some recent methods rely on supervised learning which ...

SECTION: Continuous Special Section: Human-Centric SE

research-article

Non-Linear Software Documentation with Interactive Code Examples

Article No.: 54, Pages 1–32, https://doi.org/10.1145/3702976

Documentation enables sharing knowledge between the developers of a technology and its users. Creating quality documents, however, is challenging: Documents must satisfy the needs of a large audience without being overwhelming for individuals. We address ...

SECTION: Survey

research-article

Patch Correctness Assessment: A Survey

Article No.: 55, Pages 1–50, https://doi.org/10.1145/3702972

Most automated program repair methods rely on test cases to determine the correctness of the generated patches. However, due to the incompleteness of available test suites, some patches that pass all the test cases may still be incorrect. This issue is ...

SECTION: Replicated Computational Results (RCR) Report

research-article

A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms—RCR Report

Article No.: 56, Pages 1–7, https://doi.org/10.1145/3702985

This article represents the Replicated Computational Results (RCR) related to our TOSEM paper “A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms,” where we proposed LAFF, an approach to automatically suggest ...

ACM Transactions on Software Engineering and Methodology
