Showing 1–23 of 23 results for author: Kang, H J

Searching in archive cs.
  1. arXiv:2404.18881  [pdf, other]

    cs.HC cs.LG cs.SE

    Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking

    Authors: Hong Jin Kang, Fabrice Harel-Canada, Muhammad Ali Gulzar, Violet Peng, Miryung Kim

    Abstract: Data augmentation techniques apply transformations to existing texts to generate additional data. The transformations may produce low-quality texts, where the meaning of the text is changed and the text may even be mangled beyond human comprehension. Analyzing the synthetically generated texts and their corresponding labels is slow and demanding. To winnow out texts with incorrect labels, we devel…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 Findings

  2. arXiv:2404.16947  [pdf, other]

    cs.SE

    Fuzzing MLIR Compilers with Custom Mutation Synthesis

    Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

    Abstract: Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs mak…

    Submitted 27 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  3. BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies

    Authors: Ratnadira Widyasari, Sheng Qin Sim, Camellia Lok, Haodi Qi, Jack Phan, Qijin Tay, Constance Tan, Fiona Wee, Jodie Ethelda Tan, Yuheng Yieh, Brian Goh, Ferdian Thung, Hong Jin Kang, Thong Hoang, David Lo, Eng Lieh Ouh

    Abstract: The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the a…

    Submitted 27 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020) 1556-1560

  4. Greening Large Language Models of Code

    Authors: Jieke Shi, Zhou Yang, Hong Jin Kang, Bowen Xu, Junda He, David Lo

    Abstract: Large language models of code have shown remarkable effectiveness across various software engineering tasks. Despite the availability of many cloud services built upon these powerful models, there remain several scenarios where developers cannot take full advantage of them, stemming from factors such as restricted or unreliable internet access, institutional privacy policies that prohibit external…

    Submitted 11 January, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted by Software Engineering in Society Track of the 46th IEEE/ACM International Conference on Software Engineering (ICSE '24)

  5. arXiv:2308.05060  [pdf, other]

    cs.SE

    Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel

    Authors: Yunbo Lyu, Hong Jin Kang, Ratnadira Widyasari, Julia Lawall, David Lo

    Abstract: The SZZ algorithm is used to connect bug-fixing commits to the earlier commits that introduced bugs. This algorithm has many applications and many variants have been devised. However, there are some types of commits that cannot be traced by the SZZ algorithm, referred to as "ghost commits". The evaluation of how these ghost commits impact the SZZ algorithm remains limited. Moreover, these algorith…

    Submitted 7 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: This article has been accepted for publication in IEEE Transactions on Software Engineering

  6. arXiv:2305.13884  [pdf, other]

    cs.CR cs.AI cs.SE

    Multi-Granularity Detector for Vulnerability Fixes

    Authors: Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Chengran Yang, Zhipeng Zhao, Bowen Xu, Jiayuan Zhou, Xin Xia, Ahmed E. Hassan, Xuan-Bach D. Le, David Lo

    Abstract: With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users of such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying s…

    Submitted 23 May, 2023; originally announced May 2023.

    Journal ref: IEEE Transactions on Software Engineering, 2023

  7. arXiv:2301.03944  [pdf, other]

    cs.SE cs.CR

    CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports

    Authors: Yunbo Lyu, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Zhipeng Zhao, Xuan-Bach D. Le, Ming Li, David Lo

    Abstract: Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library and human analysis is required to determine all the relevant libraries. Human analysis may be slow a…

    Submitted 29 July, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: Accepted to the Technical Track of ICSE 2023

  8. arXiv:2301.02496  [pdf, other]

    cs.CR cs.SE

    Stealthy Backdoor Attack for Code Models

    Authors: Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He, David Lo

    Abstract: Code models, such as CodeBERT and CodeT5, offer general-purpose representations of code and play a vital role in supporting downstream automated software engineering tasks. Most recently, code models were revealed to be vulnerable to backdoor attacks. A code model that is backdoor-attacked can behave normally on clean examples but will produce pre-defined malicious outputs on examples injected wit…

    Submitted 28 August, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: 18 pages, Under review of IEEE Transactions on Software Engineering

  9. arXiv:2212.04038  [pdf, other]

    cs.SE

    SkipFuzz: Active Learning-based Input Selection for Fuzzing Deep Learning Libraries

    Authors: Hong Jin Kang, Pattarakrit Rattanukul, Stefanus Agus Haryono, Truong Giang Nguyen, Chaiyong Ragkhitwetsagul, Corina Pasareanu, David Lo

    Abstract: Many modern software systems are enabled by deep learning libraries such as TensorFlow and PyTorch. As deep learning is now prevalent, the security of deep learning libraries is a key concern. Fuzzing deep learning libraries presents two challenges. Firstly, to reach the functionality of the libraries, fuzzers have to use inputs from the valid input domain of each API function, which may be unknow…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 13 pages

  10. arXiv:2209.03260  [pdf, other]

    cs.CR cs.AI cs.SE

    VulCurator: A Vulnerability-Fixing Commit Detector

    Authors: Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Xuan-Bach D. Le, David Lo

    Abstract: Open-source software (OSS) vulnerability management process is important nowadays, as the number of discovered OSS vulnerabilities is increasing over time. Monitoring vulnerability-fixing commits is a part of the standard process to prevent vulnerability exploitation. Manually detecting vulnerability-fixing commits is, however, time consuming due to the possibly large number of commits to review.…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: accepted to ESEC/FSE 2022, Tool Demos Track

  11. AutoPruner: Transformer-Based Call Graph Pruning

    Authors: Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Huynh Quyet Thang

    Abstract: Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture inform…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Accepted to ESEC/FSE 2022, Research Track

  12. arXiv:2209.01320  [pdf, other]

    cs.CV

    Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

    Authors: Siddarth Ravichandran, Ondřej Texler, Dimitar Dinev, Hyun Jae Kang

    Abstract: Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans. This has led to the rapid growth of so-called deepfake and talking-head gen…

    Submitted 23 March, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

  13. arXiv:2208.07120  [pdf, other]

    cs.SE

    Compressing Pre-trained Models of Code into 3 MB

    Authors: Jieke Shi, Zhou Yang, Bowen Xu, Hong Jin Kang, David Lo

    Abstract: Although large pre-trained models of code have delivered significant advancements in various code processing tasks, there is an impediment to the wide and fluent adoption of these powerful models in software developers' daily workflow: these large models consume hundreds of megabytes of memory and run slowly on personal devices, which causes problems in model deployment and greatly degrades the us…

    Submitted 4 September, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: Accepted by the Research Papers Track of 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22)

  14. arXiv:2205.10504  [pdf, other]

    cs.SE cs.LG

    How to Find Actionable Static Analysis Warnings: A Case Study with FindBugs

    Authors: Rahul Yedida, Hong Jin Kang, Huy Tu, Xueqi Yang, David Lo, Tim Menzies

    Abstract: Automatically generated static code warnings suffer from a large number of false alarms. Hence, developers only take action on a small percent of those warnings. To better predict which static code warnings should not be ignored, we suggest that analysts need to look deeper into their algorithms to find choices that better improve the particulars of their specific problem. Specifically, we show he…

    Submitted 23 December, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: Accepted to TSE

  15. Active Learning of Discriminative Subgraph Patterns for API Misuse Detection

    Authors: Hong Jin Kang, David Lo

    Abstract: A common cause of bugs and vulnerabilities are the violations of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while there have been techniques proposed to detect such misuses, studies have shown that they fail to reliably detect misuses while reporting many false positives. One limitation of prior work is the inabilit…

    Submitted 21 April, 2022; originally announced April 2022.

  16. Detecting False Alarms from Automatic Static Analysis Tools: How Far are We?

    Authors: Hong Jin Kang, Khai Loong Aw, David Lo

    Abstract: Automatic static analysis tools (ASATs), such as Findbugs, have a high false alarm rate. The large number of false alarms produced poses a barrier to adoption. Researchers have proposed the use of machine learning to prune false alarms and present only actionable warnings to developers. The state-of-the-art study has identified a set of "Golden Features" based on metrics computed over the characte…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted to the Technical Track of ICSE 2022

  17. Adversarial Specification Mining

    Authors: Hong Jin Kang, David Lo

    Abstract: There have been numerous studies on mining temporal specifications from execution traces. These approaches learn finite-state automata (FSA) from execution traces when running tests. To learn accurate specifications of a software system, many tests are required. Existing approaches generalize from a limited number of traces or use simple test generation strategies. Unfortunately, these strategies…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: Kang, Hong Jin, and David Lo. "Adversarial Specification Mining." ACM Transactions on Software Engineering and Methodology (TOSEM) 30.2 (2021): 1-40. The version that is on arxiv is the authors' version. The definitive version can be found at https://dl.acm.org/doi/10.1145/3424307

    Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM) 30.2 (2021): 1-40

  18. arXiv:2102.01859  [pdf, other]

    cs.SE

    BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems

    Authors: Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, David Lo

    Abstract: Artificial Intelligence (AI) software systems, such as Sentiment Analysis (SA) systems, typically learn from large amounts of data that may reflect human biases. Consequently, the machine learning model in such software systems may exhibit unintended demographic bias based on specific characteristics (e.g., gender, occupation, country-of-origin, etc.). Such biases manifest in an SA system when it…

    Submitted 4 October, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  19. arXiv:2012.07259  [pdf, other]

    cs.SE

    AndroEvolve: Automated Update for Android Deprecated-API Usages

    Authors: Stefanus Agus Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

    Abstract: Android operating system (OS) is often updated, where each new version may involve API deprecation. Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS. In this work, we propose AndroEvolve, an automated tool to update usages of deprecated Android APIs, that addresses the limitations of the state-of-the-art tool…

    Submitted 11 February, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  20. arXiv:2011.05020  [pdf, other]

    cs.SE

    AndroEvolve: Automated Android API Update with Data Flow Analysis and Variable Denormalization

    Authors: Stefanus A. Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

    Abstract: The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps' compatibility with old and new versions of Android. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers.…

    Submitted 10 November, 2020; originally announced November 2020.

  21. arXiv:2005.13220  [pdf, other]

    cs.SE

    Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example

    Authors: Stefanus Agus Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, Julia Lawall, David Lo, Lingxiao Jiang

    Abstract: Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 5 pages, 8 figures. Accepted in The International Conference on Program Comprehension (ICPC) 2020, ERA Track

    ACM Class: I.2.2

  22. CC2Vec: Distributed Representations of Code Changes

    Authors: Thong Hoang, Hong Jin Kang, Julia Lawall, David Lo

    Abstract: Existing work on software patches often use features specific to a single task. These works often rely on manually identified features, and human effort is required to identify these features for each task. In this work, we propose CC2Vec, a neural network model that learns a representation of code changes guided by their accompanying log messages, which represent the semantic intent of the code c…

    Submitted 12 March, 2020; originally announced March 2020.

  23. arXiv:1611.02956  [pdf, ps, other]

    cs.CL

    A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation

    Authors: Hong Jin Kang, Tao Chen, Muthu Kumar Chandrasekaran, Min-Yen Kan

    Abstract: Word embeddings are now ubiquitous forms of word representation in natural language processing. There have been applications of word embeddings for monolingual word sense disambiguation (WSD) in English, but few comparisons have been done. This paper attempts to bridge that gap by examining popular embeddings for the task of monolingual English WSD. Our simplified method leads to comparable state-…

    Submitted 9 April, 2017; v1 submitted 9 November, 2016; originally announced November 2016.

    Comments: 10 pages. Appears in the Proceedings of The 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2016)

    Journal ref: Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications, pages 30 to 39, Osaka, Japan, December 12 2016