
SecDOAR: A Software Reference Architecture for Security Data Orchestration, Analysis and Reporting

Authors: Muhammad Aufeef Chauhan, Muhammad Ali Babar, Fethi Rabhi

Abstract: A Software Reference Architecture (SRA) is a useful tool for standardising existing architectures in a specific domain and for facilitating concrete architecture design, development and evaluation, both by instantiating the SRA and by using it as a benchmark for the development of new systems. In this paper, we present an SRA for Security Data Orchestration, Analysis and Reporting (SecDOAR) to standardise security data platforms and facilitate the integration of security orchestration, analysis and reporting tools for security data. The SecDOAR SRA has been designed by leveraging existing scientific literature and security data standards. We document the SecDOAR SRA in terms of its design methodology, meta-models that relate the different concepts in the security data architecture, and details of the different elements and components of the SRA. We evaluate the SecDOAR SRA for effectiveness and completeness by comparing it with existing commercial solutions, and we demonstrate its feasibility by instantiating it as a prototype platform that supports security orchestration, analysis and reporting for a selected set of tools. The proposed SecDOAR SRA consists of meta-models for security data, security events and security data management processes, as well as security metrics and corresponding measurement schemes, a security data integration model, and a description of the SecDOAR SRA components. The proposed SecDOAR SRA can be used by researchers and practitioners as a structured approach for designing and implementing cybersecurity monitoring, analysis and reporting systems in various domains.

Submitted 25 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

Comments: 21 pages, 17 Figures, 5 Tables

arXiv:2408.00435

A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality

Authors: M. Mehdi Kholoosi, M. Ali Babar, Roland Croft

Abstract: Artificial Intelligence (AI) advancements have enabled the development of Large Language Models (LLMs) that can perform a variety of tasks with remarkable semantic understanding and accuracy. ChatGPT is one such LLM that has gained significant attention due to its impressive capabilities for assisting in various knowledge-intensive tasks. Because engineering secure software is knowledge-intensive, ChatGPT's assistance is expected to be explored for security-related tasks during software development and evolution. To understand the potential of ChatGPT as an emerging technology for supporting software security, we adopted a two-fold approach. First, we performed an empirical study to analyse the perceptions of those who had explored the use of ChatGPT for security tasks and shared their views on Twitter. We found that security practitioners view ChatGPT as beneficial for various software security tasks, including vulnerability detection, information retrieval, and penetration testing. Second, we designed an experiment to investigate the practicality of this technology when deployed as an oracle in real-world settings. In particular, we focused on vulnerability detection and qualitatively examined ChatGPT's outputs for given prompts within this prominent software security task. Based on our analysis, ChatGPT's responses in this task are largely filled with generic security information and may not be appropriate for industry use. To prevent data leakage, we performed this analysis on a vulnerability dataset compiled from real-world projects after the OpenAI data cut-off date, covering 40 distinct vulnerability types and 12 programming languages. We assert that the findings from this study will contribute to future research aimed at developing and evaluating LLMs dedicated to software security.

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted for publication at International Conference on Trust, Privacy and Security - 2024

arXiv:2407.17803

Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?

Authors: Triet H. M. Le, M. Ali Babar

Abstract: Background: Software Vulnerability (SV) prediction needs large, high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and are thus limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quantitatively and qualitatively study the quality and use of the state-of-the-art auto-labeled SV data, D2A, for SV prediction. Method: Using multiple sources and manual validation, we curate clean SV data from human-labeled SV-fixing commits in two well-known projects to investigate their auto-labeled counterparts. Results: We discover that more than 50% of the auto-labeled SVs are noisy (incorrectly labeled), and they hardly overlap with the publicly reported ones. Yet, SV prediction models utilizing the noisy auto-labeled SVs can perform up to 22% and 90% better in Matthews Correlation Coefficient and Recall, respectively, than the original models. We also reveal the promises and difficulties of applying noise-reduction methods to automatically address the noise in auto-labeled SV data and maximize data utilization for SV prediction. Conclusions: Our study sheds light on the benefits and challenges of using auto-labeled SVs, paving the way for large-scale SV prediction.

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

arXiv:2407.17053

doi 10.1145/3674805.3686670

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Authors: Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar

Abstract: Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in source code written in these languages. However, the application of these techniques to function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for function-level SV assessment based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML models match or even outperform the multi-class DL models for function-level SV assessment, with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average increase of 8-22% in Matthews Correlation Coefficient (MCC). Conclusions: We distill practices for using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area.

Submitted 3 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024
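
Several of the studies in this listing report results in Matthews Correlation Coefficient (MCC), including for multi-class CVSS prediction. As a reader aid, the following is a minimal, dependency-free sketch of the multi-class MCC, the same generalisation that scikit-learn's `matthews_corrcoef` implements. It is not taken from any of the papers, whose evaluation code is not described here; the function name `multiclass_mcc` is hypothetical.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient.

    With two classes this reduces to the familiar binary formula
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class actually occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    classes = set(true_counts) | set(pred_counts)
    t_dot_p = sum(true_counts[k] * pred_counts[k] for k in classes)
    t_dot_t = sum(c * c for c in true_counts.values())
    p_dot_p = sum(c * c for c in pred_counts.values())
    numerator = correct * n - t_dot_p
    denominator = math.sqrt((n * n - t_dot_t) * (n * n - p_dot_p))
    # Convention (also used by scikit-learn): report 0 when MCC is undefined,
    # e.g. when the model always predicts the same class.
    return numerator / denominator if denominator else 0.0
```

For example, `multiclass_mcc(["LOW", "HIGH", "HIGH"], ["LOW", "HIGH", "LOW"])` scores a hypothetical three-function severity prediction. Unlike accuracy, MCC stays near 0 for a classifier that only predicts the majority class, which is one reason it suits the imbalanced CVSS class distributions discussed in the next entry.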

arXiv:2407.10722

Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?

Authors: Triet H. M. Le, M. Ali Babar

Abstract: Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from imbalanced distributions of the CVSS classes, but such data imbalance has hardly been understood or addressed in the literature. Aims: We conduct a large-scale study to quantify the impacts of data imbalance and to mitigate the issue for SV assessment through data augmentation. Method: We leverage nine data augmentation techniques to balance the class distributions of the CVSS metrics. We then compare the performance of SV assessment models with and without the augmented data. Results: Through extensive experiments on more than 180k real-world SVs, we show that mitigating data imbalance can significantly improve the predictive performance of models for all the CVSS tasks, by up to 31.8% in Matthews Correlation Coefficient. We also discover that simple text augmentation, such as combining random text insertion, deletion, and replacement, can outperform the baseline across the board. Conclusions: Our study provides the motivation and a first promising step toward tackling data imbalance for effective SV assessment.

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024
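
The abstract above reports that simple text augmentation combining random insertion, deletion and replacement can outperform the baseline. A minimal sketch of that idea follows, assuming word-level operations on vulnerability descriptions and oversampling of minority CVSS classes up to the size of the largest class. The candidate vocabulary, the number of operations per sample and the helper names `augment` and `oversample` are all hypothetical; this is not the authors' implementation.

```python
import random


def augment(text, vocab, n_ops=3, seed=None):
    """Return a perturbed copy of `text` built from random word insertion,
    deletion and replacement. `vocab` (candidate words) is hypothetical."""
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_ops):
        op = rng.choice(("insert", "delete", "replace"))
        if op == "insert" or not words:
            # Insert a random vocabulary word at a random position.
            words.insert(rng.randint(0, len(words)), rng.choice(vocab))
        elif op == "delete" and len(words) > 1:
            # Delete one random word, but never empty the text entirely.
            del words[rng.randrange(len(words))]
        else:
            # Replace a random word (also the fallback when deleting would empty the text).
            words[rng.randrange(len(words))] = rng.choice(vocab)
    return " ".join(words)


def oversample(texts, labels, vocab, seed=0):
    """Add augmented copies of minority-class texts until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(group) for group in by_class.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, group in by_class.items():
        for _ in range(target - len(group)):
            source = rng.choice(group)
            out_texts.append(augment(source, vocab, seed=rng.randrange(2**32)))
            out_labels.append(label)
    return out_texts, out_labels
```

Augmentation is applied only to produce extra training samples; the original texts are kept unchanged at the front of the output, so an evaluation split drawn beforehand is not contaminated.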

arXiv:2406.19765

Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, M. Ali Babar

Abstract: Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: We aim to comprehensively review and analyze existing literature concerning learning-based methods within the CI domain. We endeavour to identify and analyze various techniques documented in the literature, emphasizing the fundamental attributes of training phases within learning-based solutions in the context of CI. Method: We conducted a Systematic Literature Review (SLR) involving 52 primary studies. Through statistical and thematic analyses, we explored the correlations between CI tasks and the training phases of learning-based methodologies across the selected studies, encompassing a spectrum from data engineering techniques to evaluation metrics. Results: This paper presents an analysis of the automation of CI tasks utilizing learning-based methods. We identify and analyze nine types of data sources, four steps in data preparation, four feature types, nine subsets of data features, five approaches for hyperparameter selection and tuning, and fifteen evaluation metrics. Furthermore, we discuss the latest techniques employed, existing gaps in CI task automation, and the characteristics of the utilized learning-based techniques. Conclusion: This study provides a comprehensive overview of learning-based methods in CI, offering valuable insights for researchers and practitioners developing CI task automation. It also highlights the need for further research to advance these methods in CI.

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: This paper has been accepted to be published in IEEE Access

arXiv:2406.18813

Towards Secure Management of Edge-Cloud IoT Microservices using Policy as Code

Authors: Samodha Pallewatta, Muhammad Ali Babar

Abstract: IoT application providers increasingly use MicroService Architecture (MSA) to develop applications that convert IoT data into valuable information. The independently deployable and scalable nature of microservices enables dynamic utilization of edge and cloud resources provided by various service providers, thus improving performance. However, IoT data security must be ensured during multi-domain data processing and transmission among distributed and dynamically composed microservices. Implementing granular security controls at the microservice level has the potential to address this. To this end, edge-cloud environments require intricate and scalable security frameworks that operate across multi-domain environments to enforce various security policies during the management of microservices (i.e., initial placement, scaling, migration, and dynamic composition), considering the sensitivity of the IoT data. To address the lack of such a framework, we propose an architectural framework that uses Policy-as-Code to ensure secure microservice management within multi-domain edge-cloud environments. The proposed framework contains a "control plane" to intelligently and dynamically utilise and configure cloud-native (i.e., container orchestrators and service mesh) technologies to enforce security policies. We implement a prototype of the proposed framework using open-source cloud-native technologies such as Docker, Kubernetes, Istio, and Open Policy Agent to validate the framework. Evaluations verify our proposed framework's ability to enforce security policies for distributed microservices management, thus harnessing the MSA characteristics to meet IoT application security needs.

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 16 pages, 7 figures, Accepted for full paper presentation at ECSA 2024 conference

arXiv:2406.09737

A Multivocal Review of MLOps Practices, Challenges and Open Issues

Authors: Beyza Eken, Samodha Pallewatta, Nguyen Khoi Tran, Ayse Tosun, Muhammad Ali Babar

Abstract: With the increasing trend of Machine Learning (ML) enabled software applications, the paradigm of ML Operations (MLOps) has gained tremendous attention from researchers and practitioners. MLOps encompasses the practices and technologies for streamlining the resources and monitoring needs of operationalizing ML models. Software development practitioners need access to detailed and easily understandable knowledge of MLOps workflows, practices, challenges and solutions to effectively and efficiently support the adoption of MLOps. Whilst the academic and industry literature on MLOps has been growing rapidly, there have been relatively few attempts at systematically synthesizing and analyzing the vast amount of existing MLOps literature to make it easier to access and understand. We conducted a Multivocal Literature Review (MLR) of 150 relevant academic studies and 48 gray literature sources to provide a comprehensive body of knowledge on MLOps. Through this MLR, we identified the emerging MLOps practices, adoption challenges and solutions related to various areas, including development and operation of complex pipelines, managing production at scale, managing artifacts, and ensuring quality, security, governance, and ethical aspects. We also report the socio-technical aspects of MLOps relating to the diverse roles involved and the collaboration practices across them throughout the MLOps lifecycle. We assert that this MLR provides valuable insights to researchers and practitioners seeking to navigate the rapidly evolving landscape of MLOps. We also identify the open issues that need to be addressed in order to advance the current state of the art of MLOps.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 45 pages, 4 figures

arXiv:2406.04902

Beyond Data, Towards Sustainability: A Sydney Case Study on Urban Digital Twins

Authors: Ammar Sohail, Bojie Shen, Muhammad Aamir Cheema, Mohammed Eunus Ali, Anwaar Ulhaq, Muhammad Ali Babar, Asama Qureshi

Abstract: As urban areas grapple with unprecedented challenges stemming from population growth and climate change, the emergence of urban digital twins offers a promising solution. This paper presents a case study focusing on Sydney's urban digital twin, a virtual replica integrating diverse real-time and historical data, including weather, crime, emissions, and traffic. Through advanced visualization and data analysis techniques, the study explores some applications of this digital twin in urban sustainability, such as spatial ranking of suburbs and automatic identification of correlations between variables. Additionally, the research delves into predictive modeling, employing machine learning to forecast traffic crash risks using environmental data, showcasing the potential for proactive interventions. The contributions of this work lie in the comprehensive exploration of a city-scale digital twin for sustainable urban planning, offering a multifaceted approach to data-driven decision-making.

Submitted 7 June, 2024; originally announced June 2024.
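
The abstract above mentions automatic identification of correlations between variables such as weather, crime, emissions and traffic, but does not say how this is done. As an assumption for illustration only, a simple Pearson-correlation screen over aligned series might look like the sketch below. The `series` layout (variable name mapped to values aligned by time step or suburb), the `threshold` value and the function names are hypothetical.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def strongest_correlations(series, threshold=0.7):
    """Return (a, b, r) for every variable pair with |r| >= threshold,
    strongest first. `series` maps a variable name to aligned values."""
    pairs = []
    for a, b in itertools.combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

Pearson's r only captures linear relationships, and a strong correlation between, say, traffic and emissions does not by itself establish causation; a real pipeline would likely also need to handle missing values and spatial alignment.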

arXiv:2405.15293

Transaction Fee Estimation in the Bitcoin System

Authors: Limeng Zhang, Rui Zhou, Qing Liu, Chengfei Liu, M. Ali Babar

Abstract: In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and can hardly be altered thereafter. Hence, it is essential to help a client set a reasonable fee, as a higher fee incurs over-spending and a lower fee could delay the confirmation. In this work, we focus on estimating the transaction fee for a new transaction to help with its confirmation within a given expected time. We identify two major drawbacks in the existing works. First, the current industry products are built on explicit analytical models, ignoring the complex interactions of different factors, which could be better captured by machine learning based methods. Second, all of the existing works utilize limited knowledge for the estimation, which hinders further improvement of the estimation quality. To address these drawbacks, we propose a framework, FENN, which aims to integrate knowledge from a wide range of sources, including the transaction itself, unconfirmed transactions in the mempool and the blockchain confirmation environment, into a neural network model in order to estimate a proper transaction fee. Finally, we conduct experiments on real blockchain datasets to demonstrate the effectiveness and efficiency of our proposed framework over the state-of-the-art works, as measured by MAPE and RMSE. Each model variant in our framework can finish training within one block interval, which shows the potential of our framework to process real-time transaction updates in the Bitcoin blockchain.

Submitted 24 May, 2024; originally announced May 2024.
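
FENN is evaluated with MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Squared Error). For reference, both metrics can be computed over paired actual and estimated fees as sketched below. This is a generic sketch of the standard definitions, not the paper's evaluation code, and the numbers in the usage note are made up.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Undefined when any actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same unit as the inputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

With hypothetical actual fees `[10, 20, 40]` and estimates `[12, 18, 40]`, MAPE is 10% while RMSE is about 1.63 in the fee unit. The two are complementary: MAPE is scale-free but weights errors on low-fee transactions heavily, whereas RMSE penalises large absolute misses.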

arXiv:2404.17110

Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

Abstract: Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance its performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages: Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance in function-level and line-level SV prediction declines significantly in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT, whereas ChatGPT shows promising results, substantially enhancing predictive performance by up to 34.4% at the function level and up to 53.5% at the line level. Conclusion: We have highlighted the challenge and made a first promising step toward low-resource SV prediction, paving the way for future research in this direction.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

arXiv:2404.11294

LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking

Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

Abstract: Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention, as they strike a balance between relaxed labeled-data requirements and optimal detection performance, in contrast to their supervised and unsupervised counterparts. However, existing semi-supervised methods overlook the potential bias that highly frequent log messages introduce into the learned normal patterns, which leads to less-than-satisfactory performance. In this study, we propose LogSD, a novel semi-supervised self-supervised learning approach. LogSD employs a dual-network architecture and incorporates a frequency-based masking scheme, a global-to-local reconstruction paradigm and three self-supervised learning tasks. These features enable LogSD to focus more on relatively infrequent log messages, thereby effectively learning less biased and more discriminative patterns from historical normal data. This emphasis ultimately leads to improved anomaly detection performance. Extensive experiments have been conducted on three commonly used datasets, and the results show that LogSD significantly outperforms eight state-of-the-art benchmark methods.

Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 23 pages with 11 figures

arXiv:2404.06043

Automatic Configuration Tuning on Cloud Database: A Survey

Authors: Limeng Zhang, M. Ali Babar

Abstract: Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial, as they are notorious for having dozens of configurable knobs, covering areas such as hardware setup, software setup, and database physical and logical design, that control runtime behaviors and impact database performance. To find configurations that achieve optimal performance, extensive research has been conducted on automatic parameter tuning in DBMS. This paper provides a comprehensive survey of predominant configuration tuning techniques, including Bayesian optimization-based solutions, neural network-based solutions, reinforcement learning-based solutions, and search-based solutions. Moreover, it investigates the fundamental aspects of the parameter tuning pipeline, including tuning objective, workload characterization, feature pruning, knowledge from experience, configuration recommendation, and experimental settings. For each component, we compare the techniques and their corresponding solutions, and we introduce the experimental settings for performance evaluation. Finally, we conclude this paper and present future research opportunities. This paper aims to help future researchers and practitioners gain a better understanding of automatic parameter tuning in cloud databases by presenting existing state-of-the-art solutions, research directions, and evaluation benchmarks.

Submitted 9 April, 2024; originally announced April 2024.
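
Of the technique families surveyed above, search-based solutions are the simplest to illustrate. The sketch below shows random search, one of the most basic search-based baselines, rather than any specific technique from the survey. The knob names, value ranges and the toy `toy_latency` objective are hypothetical stand-ins for replaying a benchmark workload against a real DBMS and measuring its performance.

```python
import random


def random_search(space, objective, n_trials=50, seed=0):
    """Sample configurations uniformly from `space` and return the one
    with the lowest objective value (e.g. measured latency)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one value per knob independently and uniformly.
        cfg = {knob: rng.choice(values) for knob, values in space.items()}
        score = objective(cfg)  # in practice: run the workload and measure
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical knobs and candidate values.
SPACE = {
    "buffer_pool_mb": [128, 256, 512, 1024],
    "max_connections": [50, 100, 200],
    "io_threads": [2, 4, 8],
}


def toy_latency(cfg):
    """Toy objective with a known optimum, standing in for a real benchmark."""
    return (abs(cfg["buffer_pool_mb"] - 512) / 128
            + abs(cfg["max_connections"] - 100) / 50
            + abs(cfg["io_threads"] - 4))
```

A call such as `random_search(SPACE, toy_latency, n_trials=100)` returns the best sampled configuration and its score. In a real tuner each objective evaluation is expensive, which is what motivates the more sample-efficient Bayesian optimization, neural network and reinforcement learning approaches covered by the survey.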

arXiv:2404.03823

An Investigation into Misuse of Java Security APIs by Large Language Models

Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar

Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the use of security Application Programming Interfaces (APIs). APIs play an integral role in upholding software security, yet effectively integrating security APIs presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks for 5 widely used security APIs. We employ both automated and manual approaches to effectively detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.

Submitted 4 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by ACM ASIACCS 2024
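Automated misuse detection of the kind described above is often built from rules that match known-bad API usage. The following is a minimal Python sketch of that idea, assuming a purely textual, line-by-line scan of Java source; the two rules and the `find_misuses` helper are hypothetical illustrations, not the detector used in the paper. The first rule relies on the fact that `Cipher.getInstance("AES")` without an explicit mode falls back to ECB under the SunJCE provider.

```python
import re

# Hypothetical rules for illustration only; real detectors typically use data-flow analysis.
RULES = {
    # "AES"/"DES"/"DESede" with no mode (provider default is ECB) or with explicit ECB.
    "ECB mode (explicit or default)": re.compile(
        r'Cipher\.getInstance\(\s*"(?:AES|DES|DESede)(?:/ECB/[^"]*)?"\s*\)'
    ),
    # Broken hash algorithms such as MD5 and SHA-1.
    "Weak hash algorithm": re.compile(
        r'MessageDigest\.getInstance\(\s*"(?:MD5|SHA-?1)"\s*\)'
    ),
}


def find_misuses(java_source: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every rule that matches a source line."""
    findings = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


snippet = '''Cipher c = Cipher.getInstance("AES");
Cipher g = Cipher.getInstance("AES/GCM/NoPadding");
MessageDigest md = MessageDigest.getInstance("MD5");'''

print(find_misuses(snippet))
```

A textual scan like this misses cases such as an algorithm string stored in a variable, so it complements rather than replaces manual inspection.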

arXiv:2401.13199

Why People Still Fall for Phishing Emails: An Empirical Investigation into How Users Make Email Response Decisions

Authors: Asangi Jayatilaka, Nalin Asanka Gamagedara Arachchilage, Muhammad Ali Babar

Abstract: Despite technical and non-technical countermeasures, humans continue to be tricked by phishing emails. How users make email response decisions is a missing piece of the puzzle in identifying why people still fall for phishing emails. We conducted an empirical study using a think-aloud method to investigate how people make 'response decisions' while reading emails. The grounded theory analysis of the in-depth qualitative data has enabled us to identify different elements of email users' decision-making that influence their email response decisions. Furthermore, we developed a theoretical model that explains how people could be driven to respond to emails based on the identified elements of users' email decision-making processes and the relationships uncovered from the data. The findings provide deeper insights into phishing email susceptibility due to people's email response decision-making behavior. We also discuss the implications of our findings for designers and researchers working in anti-phishing training, education, and awareness interventions.

Submitted 23 January, 2024; originally announced January 2024.

Journal ref: Symposium on Usable Security and Privacy (USEC) 2024

arXiv:2401.11105

Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

Authors: Triet H. M. Le, Xiaoning Du, M. Ali Babar

Abstract: Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the usefulness of these latent SVs for SV prediction. To bridge these gaps, we conduct a large-scale study on the latent vulnerable functions in two commonly used SV datasets and their utilization for function-level and line-level SV predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more than 100k latent vulnerable functions in the studied datasets. We find that these latent functions can increase the number of SVs by 4x on average and correct up to 5k mislabeled functions, yet they have a noise level of around 6%. Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs. The improvements are up to 24.5% in the performance (F1-Score) of function-level SV predictions and up to 67% in the effectiveness of localizing vulnerable lines. Overall, our study presents the first promising step toward the use of latent SVs to improve the quality of SV datasets and enhance the performance of SV prediction tasks.

Submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted as a full paper in the technical track at the 21st International Conference on Mining Software Repositories (MSR) 2024

arXiv:2312.06056

METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Authors: Sangwon Hyun, Mingyu Guo, M. Ali Babar

Abstract: Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-box and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have limited their coverage of QAs and tasks in LLMs and are difficult to extend. Additionally, these studies have only used one evaluation metric, Attack Success Rate (ASR), to assess the effectiveness of their approaches. We propose a MEtamorphic Testing for Analyzing LLMs (METAL) framework to address these issues by applying Metamorphic Testing (MT) techniques. This approach facilitates the systematic testing of LLM qualities by defining Metamorphic Relations (MRs), which serve as modularized evaluation metrics. The METAL framework can automatically generate hundreds of MRs from templates that cover various QAs and tasks. In addition, we introduce novel metrics that integrate the ASR method into the semantic qualities of text to assess the effectiveness of MRs accurately. Through experiments conducted with three prominent LLMs, we have confirmed that the METAL framework effectively evaluates essential QAs on primary LLM tasks and reveals the quality risks in LLMs. Moreover, the newly proposed metrics can guide the selection of the optimal MRs for testing each task and suggest the most effective method for generating MRs.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted to International Conference on Software Testing, Verification and Validation (ICST) 2024 / Key words: Large-language models, Metamorphic testing, Quality evaluation, Text perturbations
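To make the idea of a Metamorphic Relation concrete, here is a minimal Python sketch of one robustness MR: a typo-style perturbation that should preserve meaning must not change a classifier's output, and the share of inputs that violate this relation gives an ASR-style score. The keyword classifier `toy_sentiment` is a hypothetical stand-in for an LLM and the perturbation is an assumption; METAL's MR templates and its semantics-aware metrics are not reproduced here.

```python
from typing import Callable, Sequence


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM sentiment classifier."""
    positive = {"good", "great", "excellent"}
    words = (w.strip(".,!?").lower() for w in text.split())
    return "positive" if any(w in positive for w in words) else "negative"


def swap_first_two_chars(text: str) -> str:
    """Typo-style perturbation: swap the first two characters of words longer than 3 chars."""
    out = []
    for word in text.split():
        if len(word) > 3:
            word = word[1] + word[0] + word[2:]
        out.append(word)
    return " ".join(out)


def mr_violation_rate(
    model: Callable[[str], str],
    perturb: Callable[[str], str],
    inputs: Sequence[str],
) -> float:
    """MR: model(x) == model(perturb(x)). Returns the fraction of inputs violating it."""
    violations = sum(model(x) != model(perturb(x)) for x in inputs)
    return violations / len(inputs)


inputs = ["The food was good", "A great movie", "It was bad"]
print(mr_violation_rate(toy_sentiment, swap_first_two_chars, inputs))
```

Because the relation is expressed as a pair of functions, another quality attribute could be probed by plugging in a different perturbation, which loosely mirrors the modularity the abstract attributes to MRs.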

arXiv:2310.06300

An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

Authors: Nguyen Khoi Tran, Samodha Pallewatta, M. Ali Babar

Abstract: With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Adopting SSC metadata requires organisations to procure or develop a Software Supply Chain Metadata Management system (SCM2), a suite of software tools for performing life cycle activities of SSC metadata documents such as creation, signing, distribution, and consumption. Selecting or developing an SCM2 is challenging due to the lack of a comprehensive domain model and architectural blueprint to aid practitioners in navigating the vast design space of SSC metadata terminologies, frameworks, and solutions. This paper addresses the above-mentioned challenge by presenting an empirically grounded Reference Architecture (RA) comprising a domain model and an architectural blueprint for SCM2 systems. Our proposed RA is constructed systematically on an empirical foundation built with industry-driven and peer-reviewed SSC security frameworks. Our theoretical evaluation, which consists of an architectural mapping of five prominent SSC security tools on the RA, ensures its validity and applicability, thus affirming the proposed RA as an effective framework for analysing existing SCM2 solutions and guiding the engineering of new SCM2 systems.

Submitted 8 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted for full paper presentation at EASE 2024 conference

arXiv:2310.00635

Reinforcement Learning Based Neighbour Selection for VANET with Adaptive Trust Management

Authors: Orvila Sarker, Hong Shen, M. Ali Babar

Abstract: Successful information propagation from source to destination in a Vehicular Ad hoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood information without compromising the quality of service requirements is an ongoing challenge. This work proposes a Reinforcement Learning (RL) based neighbour selection framework for VANET with an adaptive trust management system to capture the behavioural changes of potential attackers and to dynamically update the neighbourhood information. In contrast to existing works, we consider trust and link-life time in unison as neighbour selection criteria to achieve trustworthy communication. Our adaptive trust model takes into account the social relationship, time and confidence in trust observation to avoid four types of attackers. To update the neighbourhood information, our framework sets the learning rate of the RL agent according to the velocities of the neighbour nodes to improve the model's adaptability to network topology changes. Results demonstrate that, for large network sizes, our method can reach the destination in fewer hops while responding up to 54 percent faster than a baseline method. Also, the proposed model can outperform the other baseline method, reducing the packet dropping rate caused by the attacker by up to 57 percent.

Submitted 1 October, 2023; originally announced October 2023.

Comments: This article is accepted at the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) 2023

arXiv:2308.11862

Empirical Analysis of Software Vulnerabilities Causing Timing Side Channels

Authors: M. Mehdi Kholoosi, M. Ali Babar, Cemal Yilmaz

Abstract: Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be exploited to obtain an encryption key. Whilst several efforts have been devoted to exploring the various aspects of timing attacks, particularly in cryptography, little attention has been paid to empirically studying the timing attack-related vulnerabilities in non-cryptographic software. By inspecting these software vulnerabilities, this study aims to gain an evidence-based understanding of weaknesses in non-cryptographic software that may help timing attacks succeed. We used qualitative and quantitative research approaches to systematically study the timing attack-related vulnerabilities reported in the National Vulnerability Database (NVD) from March 2003 to December 2022. Our analysis focused on the modifications made to the code for patching the identified vulnerabilities. We found that a majority of the timing attack-related vulnerabilities were introduced due to not following known secure coding practices. The findings of this study are expected to help the software security community gain evidence-based information about the nature and causes of the vulnerabilities related to timing attacks.

Submitted 22 August, 2023; originally announced August 2023.
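A textbook example of the secure-coding lapses behind timing leaks, included here as an illustration rather than a case from the study's NVD dataset, is comparing a secret with an early-exit loop. The Python sketch below counts byte comparisons instead of measuring wall-clock time so that the leak is deterministic; `naive_equal` and `safe_equal` are hypothetical helpers, while `hmac.compare_digest` is the standard-library function designed to avoid content-based short-circuiting.

```python
import hmac


def naive_equal(secret: bytes, guess: bytes) -> tuple[bool, int]:
    """Early-exit comparison. Returns (match, byte comparisons performed).

    The comparison count grows with the length of the matching prefix,
    which is what a timing attacker indirectly observes.
    """
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps
    return True, steps


def safe_equal(secret: bytes, guess: bytes) -> bool:
    """Comparison designed to resist timing analysis."""
    return hmac.compare_digest(secret, guess)


token = b"s3cret"
print(naive_equal(token, b"xxxxxx"))  # (False, 1): wrong first byte
print(naive_equal(token, b"s3cxxx"))  # (False, 4): correct 3-byte prefix
print(safe_equal(token, b"s3cxxx"))   # False
```

The rising count lets an attacker confirm a secret one byte at a time. Note that the length check in `naive_equal` also leaks the secret's length; the `hmac` documentation likewise notes that `compare_digest` may reveal lengths, but not values.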

arXiv:2307.04458

Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu

Authors: Victor Prokhorenko, Chadni Islam, Muhammad Ali Babar

Abstract: An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. When a multitude of independent packages are placed together in an OS, an implicit inter-package architecture is formed. For an evolutionary effort, OS designers and developers can greatly benefit from fully understanding the system-wide dependencies focused on individual files, specifically executable files and dynamically loadable libraries. We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files and their associated evolutionary changes. We demonstrate the utility of DepEx by systematically investigating the evolution of a large-scale Open Source OS, Ubuntu. DepEx enabled us to systematically acquire and analyze the dependencies in different versions of Ubuntu released from 2005 (5.04) to 2023 (23.04). Our analysis of the 84 consecutive versions available for download (including beta versions) revealed various evolutionary trends in package management and their implications. This study has enabled us to assert that DepEx can provide researchers and practitioners with a better understanding of the implicit software dependencies in order to improve the stability, performance, and functionality of their software as well as to reduce the risk of issues arising during maintenance, updating, or migration.

Submitted 10 July, 2023; originally announced July 2023.

Comments: This paper has been accepted for publication at the 17th International Conference on Software Architecture

arXiv:2307.01225

Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

Authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba

Abstract: Transformer-based text classifiers like BERT, RoBERTa, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework. It focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT utilizes techniques like attention maps, integrated gradients, and model feedback for interpretability during detection. This helps identify salient features and perturbed words contributing to adversarial classifications. In the transformation phase, IT-DT uses pre-trained embeddings and model feedback to generate optimal replacements for perturbed words. By finding suitable substitutions, we aim to convert adversarial examples into non-adversarial counterparts that align with the model's intended behavior while preserving the text's meaning. Transparency is emphasized through human expert involvement. Experts review and provide feedback on detection and transformation results, enhancing decision-making, especially in complex scenarios. The framework generates insights and threat intelligence empowering analysts to identify vulnerabilities and improve model robustness. Comprehensive experiments demonstrate the effectiveness of IT-DT in detecting and transforming adversarial examples.
The approach enhances interpretability, provides transparency, and enables accurate identification and successful transformation of adversarial inputs. By combining technical analysis and human expertise, IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.08869

Detecting Misuse of Security APIs: A Systematic Review

Authors: Zahra Mousavi, Chadni Islam, M. Ali Babar, Alsharif Abuadbba, Kristen Moore

Abstract: Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and evaluated several approaches to detecting security API misuse to help developers and organizations. This study rigorously reviews the literature on detecting misuse of security APIs to gain a comprehensive understanding of this critical domain. Our goal is to identify and analyze security API misuses, the detection approaches developed, and the evaluation methodologies employed along with the open research avenues to advance the state-of-the-art in this area. Employing the systematic literature review (SLR) methodology, we analyzed 69 research papers. Our review has yielded (a) identification of 6 security API types; (b) classification of 30 distinct misuses; (c) categorization of detection techniques into heuristic-based and ML-based approaches; and (d) identification of 10 performance measures and 9 evaluation benchmarks. The review reveals a lack of coverage of detection approaches in several areas. We recommend that future efforts focus on aligning security API development with developers' needs and advancing standardized evaluation methods for detection technologies.

Submitted 25 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.06600

Enabling Spatial Digital Twins: Technologies, Challenges, and Future Research Directions

Authors: Mohammed Eunus Ali, Muhammad Aamir Cheema, Tanzima Hashem, Anwaar Ulhaq, Muhammad Ali Babar

Abstract: A Digital Twin (DT) is a virtual replica of a physical object or system, created to monitor, analyze, and optimize its behavior and characteristics. A Spatial Digital Twin (SDT) is a specific type of digital twin that emphasizes the geospatial aspects of the physical entity, incorporating precise location and dimensional attributes for a comprehensive understanding within its spatial environment. The current body of research on SDTs primarily concentrates on analyzing their potential impact and opportunities within various application domains. As building an SDT is a complex process and requires a variety of spatial computing technologies, it is not straightforward for practitioners and researchers of this multi-disciplinary domain to grasp the underlying details of the technologies that enable SDTs. In this paper, we are the first to systematically analyze different spatial technologies relevant to building an SDT in a layered approach (from data acquisition to visualization). More specifically, we organize the key components of SDTs into four layers of technologies: (i) data acquisition; (ii) spatial database management & big data analytics systems; (iii) GIS middleware software, maps & APIs; and (iv) key functional components such as visualizing, querying, mining, simulation and prediction. Moreover, we discuss how modern technologies such as AI/ML, blockchains, and cloud computing can be effectively utilized in enabling and enhancing SDTs. Finally, we identify a number of research challenges and opportunities in SDTs.
This work serves as an important resource for SDT researchers and practitioners as it explicitly distinguishes SDTs from traditional DTs, identifies unique applications, outlines the essential technological components of SDTs, and presents a vision for their future development along with the challenges that lie ahead.

Submitted 11 June, 2023; originally announced June 2023.

Comments: 26 pages, 2 figures

arXiv:2305.12736

Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: Background: Machine Learning (ML) methods are increasingly used for automating various activities of Continuous Integration (CI), e.g., Test Case Prioritization (TCP). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consumes a lot of time and effort. Hence, there is an urgent need to identify and evaluate suitable approaches that can help reduce the retraining effort and time for ML models used for TCP in CI environments. Aims: This study aims to investigate the performance of data drift detection techniques in automatically detecting the retraining points for ML models for TCP in CI environments without requiring detailed knowledge of the software projects. Method: We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model. We evaluated the efficacy of this method on multiple datasets and compared the APFDc and NAPFD evaluation metrics against models that were regularly retrained, with careful consideration of the statistical methods. Results: Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs. However, the performance of this method may vary depending on the dataset.
Conclusions: Our findings suggest that data drift detection methods can assist in identifying retraining points for ML models in CI environments, while significantly reducing the required retraining time. These methods can be helpful for practitioners who lack specialized knowledge of software projects, enabling them to maintain ML model accuracy.

Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: This paper got a rejection and we need to address the comments and upload the new version with new results
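The Hellinger distance between two discrete distributions P and Q is H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2), which lies between 0 (identical) and 1 (disjoint support). The Python sketch below applies it to a single discretised CI feature (for example, recent test verdicts) in a reference window and a current window, and flags a retraining point when the distance exceeds a threshold. The feature choice, the single-feature setup, and the 0.2 threshold are assumptions for illustration, not the study's configuration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def distribution(values: Sequence[Hashable], bins: Sequence[Hashable]) -> list[float]:
    """Relative frequency of each bin in `values` (missing bins get 0)."""
    counts = Counter(values)
    return [counts[b] / len(values) for b in bins]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions over the same bins."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) / math.sqrt(2)


def is_retraining_point(reference, current, threshold: float = 0.2) -> bool:
    """Flag drift when the distance between the two windows exceeds a hypothetical threshold."""
    bins = sorted(set(reference) | set(current))
    return hellinger(distribution(reference, bins), distribution(current, bins)) > threshold


reference = ["pass"] * 9 + ["fail"]
stable = ["pass"] * 8 + ["fail"] * 2   # distance is about 0.10
drifted = ["pass"] * 2 + ["fail"] * 8  # distance is about 0.54
print(is_retraining_point(reference, stable))
print(is_retraining_point(reference, drifted))
```

Retraining only when the flag fires, rather than on a fixed schedule, is the cost saving the abstract reports; in practice the threshold would need tuning per dataset.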

arXiv:2305.12695

Systematic Literature Review on Application of Machine Learning in Continuous Integration

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metrics. In this paper, we depict the phases of CI testing, the connections between them, and the techniques employed in the phases of training the ML method. We present nine types of data sources and four steps taken in the selected studies to prepare the data. We also identified four feature types and nine subsets of data features through thematic analysis of the selected studies. In addition, we show five methods for selecting and tuning the hyper-parameters. We summarised the evaluation methods used in the literature and identified fifteen different metrics. The most commonly used evaluation metrics were found to be precision, recall, and F1-score, and we have also identified five methods for evaluating the performance of trained ML models. Finally, we present the relationship between ML model types, performance measurements, and CI phases. The study provides valuable insights for researchers and practitioners interested in ML-based methods in CI and emphasizes the need for further research in this area.

Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

arXiv:2305.11657

Cost Sharing Public Project with Minimum Release Delay

Authors: Mingyu Guo, Diksha Goel, Guanhua Wang, Yong Yang, Muhammad Ali Babar

Abstract: We study the excludable public project model where the decision is binary (build or not build). In a classic excludable and binary public project model, an agent either consumes the project in its entirety or is completely excluded. We study a setting where the mechanism can set different project release times for different agents, in the sense that high-paying agents can consume the project earlier than low-paying agents. The release delay, while hurting social welfare, is implemented to incentivize payments to cover the project cost. The mechanism design objective is to minimize the maximum release delay and the total release delay among all agents. We first consider the setting where we know the prior distribution of the agents' types. Our objectives are minimizing the expected maximum release delay and the expected total release delay. We propose the single deadline mechanisms. We show that the optimal single deadline mechanism is asymptotically optimal for both objectives, regardless of the prior distribution. For a small number of agents, we propose the sequential unanimous mechanisms by extending the largest unanimous mechanisms from [Ohseto 2000]. We propose an automated mechanism design approach via evolutionary computation to optimize within the sequential unanimous mechanisms. We next study prior-free mechanism design. We propose the group-based optimal deadline mechanism and show that it is competitive against an undominated mechanism under minor technical assumptions.

Submitted 19 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.07315

arXiv:2304.02829

SoK: Machine Learning for Continuous Integration

Authors: Ali Kazemi Arani, Mansooreh Zahedi, Triet Huynh Minh Le, Muhammad Ali Babar

Abstract: Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches for CI phases. This paper reports an SoK of different aspects of the use of ML for CI. Our systematic analysis also highlights the deficiencies of the existing ML-based solutions that can be improved for advancing the state-of-the-art.

Submitted 5 April, 2023; originally announced April 2023.

Comments: 6 pages, 2 figures, accepted in the ICSE'23 Workshop on Cloud Intelligence / AIOps

arXiv:2301.05456

Data Quality for Software Vulnerability Datasets

Authors: Roland Croft, M. Ali Babar, Mehdi Kholoosi

Abstract: The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability datasets used for training and benchmarking. However, we observe that the quality of the data powering these solutions is currently ill-considered, hindering the reliability and value of produced outcomes. Whilst awareness of software vulnerability data preparation challenges is growing, there has been little investigation into the potential negative impacts of software vulnerability data quality. For instance, we lack confirmation that vulnerability labels are correct or consistent. Our study seeks to address such shortcomings by inspecting five inherent data quality attributes for four state-of-the-art software vulnerability datasets and the subsequent impacts that issues can have on software vulnerability prediction models. Surprisingly, we found that all the analyzed datasets exhibit some data quality problems. In particular, we found 20-71% of vulnerability labels to be inaccurate in real-world datasets, and 17-99% of data points were duplicated. We observed that these issues could cause significant impacts on downstream models, either preventing effective model training or inflating benchmark performance. We advocate for the need to overcome such challenges. Our findings will enable better consideration and assessment of software vulnerability data quality in the future.

Submitted 13 January, 2023; originally announced January 2023.

Comments: Accepted for publication in the ICSE 23 Technical Track

arXiv:2211.08916 [pdf, other]

doi 10.1109/TSE.2023.3290237

Privacy Engineering in the Wild: Understanding the Practitioners' Mindset, Organisational Aspects, and Current Practices

Authors: Leonardo Horn Iwaya, Muhammad Ali Babar, Awais Rashid

Abstract: Privacy engineering, as an emerging field of research and practice, comprises the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. For that, software practitioners and other stakeholders in software companies need to work cooperatively toward building privacy-preserving businesses and engineering solutions. Significant research has been done to understand the software practitioners' perceptions of information privacy, but more emphasis should be given to the uptake of concrete privacy engineering components. This research delves into the software practitioners' perspectives and mindset, organisational aspects, and current practices on privacy and its engineering processes. A total of 30 practitioners from nine countries and backgrounds were interviewed, sharing their experiences and voicing their opinions on a broad range of privacy topics. The thematic analysis methodology was adopted to code the interview data qualitatively and construct a rich and nuanced thematic framework. As a result, we identified three critical interconnected themes that compose our thematic framework for privacy engineering "in the wild": (1) personal privacy mindset and stance, categorised into practitioners' privacy knowledge, attitudes and behaviours; (2) organisational privacy aspects, such as decision-power and positive and negative examples of privacy climate; and, (3) privacy engineering practices, such as procedures and controls concretely used in the industry.
Among the main findings, this study provides many insights about the state-of-the-practice of privacy engineering, pointing to a positive influence of privacy laws (e.g., EU General Data Protection Regulation) on practitioners' behaviours and organisations' cultures. Aspects such as organisational privacy culture and climate were also confirmed to have [...].

Submitted 30 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 26 pages, 8 figures

arXiv:2211.07585 [pdf]

An Empirical Study on Secure Usage of Mobile Health Apps: The Attack Simulation Approach

Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

Abstract: Mobile applications (mobile apps for short) have proven their usefulness in enhancing service provisioning across a multitude of domains, ranging from smart healthcare to mobile commerce and context-sensitive computing. In recent years, a number of empirically grounded, survey-based studies have been conducted to investigate secure development and usage of mHealth apps. However, such studies rely on self-reported behaviors documented via interviews or survey questions and lack a practical, i.e., action-based, approach to monitor and synthesise users' actions and behaviors in security-critical scenarios. We conducted an empirical study, engaging participants with attack simulation scenarios and analysing their actions, to investigate the security awareness of mHealth app users via action-based research. We simulated some common security attack scenarios in the mHealth context and engaged a total of 105 app users to monitor their actions and analyse their behavior. We analysed users' data with statistical analysis, including reliability and correlation tests, descriptive analysis, and qualitative data analysis. Our results indicate that whilst a minority of our participants perceived access permissions positively, the majority had negative views, indicating that such an app could violate their privacy or cause them to lose it. Users provide their consent, granting permissions, without carefully reviewing privacy policies, which leads to undesired or malicious access to health-critical data.
The results also indicated that 73.3% of our participants had denied at least one access permission, and 36% of our participants preferred no authentication method. The study complements existing research on secure usage of mHealth apps, simulates security threats to monitor users' actions, and provides empirically grounded guidelines for secure development and usage of mobile health systems.

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.06953 [pdf, other]

Collaborative Application Security Testing for DevSecOps: An Empirical Analysis of Challenges, Best Practices and Tool Support

Authors: Roshan Namal Rajapakse, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: DevSecOps is a software development paradigm that places a high emphasis on the culture of collaboration between developers (Dev), security (Sec) and operations (Ops) teams to deliver secure software continuously and rapidly. Adopting this paradigm effectively, therefore, requires an understanding of the challenges, best practices and available solutions for collaboration among these functional teams. However, collaborative aspects related to these teams have received very little empirical attention in the DevSecOps literature. Hence, we present a study focusing on a key security activity, Application Security Testing (AST), in which practitioners face difficulties performing collaborative work in a DevSecOps environment. Our study made novel use of 48 systematically selected webinars, technical talks and panel discussions as a data source to qualitatively analyse software practitioner discussions on the most recent trends and emerging solutions in this highly evolving field. We find that the lack of collaboration-facilitating features built into the AST tools themselves is a key tool-related challenge in DevSecOps. In addition, the lack of clarity related to role definitions, shared goals, and ownership also hinders Collaborative AST (CoAST). We also captured a range of best practices for collaboration (e.g., Shift-left security), emerging communication methods (e.g., ChatOps), and new team structures (e.g., hybrid teams) for CoAST.
Finally, our study identified several requirements for new tool features and specific gap areas for future research to provide better support for CoAST in DevSecOps.

Submitted 25 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Submitted to the Empirical Software Engineering journal_v2

arXiv:2210.06679 [pdf, other]

A Survey on UAV-enabled Edge Computing: Resource Management Perspective

Authors: Xiaoyu Xia, Sheik Mohammad Mostakim Fattah, Muhammad Ali Babar

Abstract: Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where traditional terrestrial networks are limited or unavailable. In such environments, UAVs can be deployed as aerial edge servers or relays to facilitate edge computing services. This form of computing is also known as UAV-enabled Edge Computing (UEC), which offers several unique benefits such as mobility, line-of-sight, flexibility, computational capability, and cost-efficiency. However, the resources on UAVs, edge servers, and IoT devices are typically very limited in the context of UEC. Efficient resource management is, therefore, a critical research challenge in UEC. In this article, we present a survey on the existing research in UEC from the resource management perspective. We identify a conceptual architecture, different types of collaborations, wireless communication models, research directions, key techniques and performance indicators for resource management in UEC. We also present a taxonomy of resource management in UEC. Finally, we identify and discuss some open research challenges that can stimulate future research directions for resource management in UEC.

Submitted 26 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 36 pages, Accepted to ACM CSUR

arXiv:2209.09487 [pdf, other]

Design and Implementation of Fragmented Clouds for Evaluation of Distributed Databases

Authors: Yaser Mansouri, Faheem Ullah, Shagun Dhingra, M. Ali Babar

Abstract: In this paper, we present a Fragmented Hybrid Cloud (FHC) that provides a unified view of multiple geographically distributed private cloud datacenters. FHC leverages a fragmented usage model in which outsourcing is bi-directional across private clouds that can be hosted by static and mobile entities. The mobility of private cloud nodes has an important impact on FHC performance: in terms of latency and network throughput, performance is inversely proportional to the time-varying distances among different nodes. Mobility also results in intermittent interruptions among the computing nodes and network links of the FHC infrastructure. To fully consider mobility and its consequences, we implemented a layered FHC that leverages Linux utilities and bash-shell programming. We also evaluated the impact of the mobility of nodes on the performance of distributed databases as a result of time-varying latency and bandwidth, downsizing and upsizing cluster nodes, and network accessibility. The findings from our extensive experiments provide deep insights into the performance of well-known big data databases, such as Cassandra, MongoDB, Redis, and MySQL, when deployed on an FHC.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.07869 [pdf, other]

LogGD: Detecting Anomalies from System Logs by Graph Neural Networks

Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

Abstract: Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies. These anomalies are often identified as violations of quantitative relational patterns or sequential patterns of log events in log sequences. However, existing methods fail to leverage the spatial structural relationships among log events, resulting in potential false alarms and unstable performance. In this study, we propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue by transforming log sequences into graphs. We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection. We evaluate the proposed method on four widely-used public log datasets. Experimental results show that LogGD can outperform state-of-the-art quantitative-based and sequence-based methods and achieve stable performance under different window size settings. The results confirm that LogGD is effective in log-based anomaly detection.

Submitted 16 September, 2022; originally announced September 2022.

Comments: 12 pages, 12 figures

arXiv:2209.01518 [pdf, other]

doi 10.1145/3551349.3556969

An Empirical Study of Automation in Software Security Patch Management

Authors: Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: Several studies have shown that automated support for different activities of the security patch management process has great potential for reducing delays in installing security patches. However, it is also important to understand how automation is used in practice, its limitations in meeting real-world needs and what practitioners really need, an area that has not been empirically investigated in the existing software engineering literature. This paper reports an empirical study aimed at investigating different aspects of automation for security patch management using semi-structured interviews with 17 practitioners from three different organisations in the healthcare domain. The findings are focused on the role of automation in security patch management for providing insights into the as-is state of automation in practice, the limitations of current automation, how automation support can be enhanced to effectively meet practitioners' needs, and the role of the human in an automated process. Based on the findings, we have derived a set of recommendations for directing future efforts aimed at developing automated support for security patch management.

Submitted 3 September, 2022; originally announced September 2022.

Comments: 13 pages, 2 figures

arXiv:2206.10110 [pdf, other]

ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

Authors: Nguyen Khoi Tran, Bushra Sabir, M. Ali Babar, Nini Cui, Mehran Abolhasan, Justin Lipman

Abstract: Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information about ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about circulated ML assets' provenance without relying on a third party, which is vulnerable to insider threats and presents a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information and introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools into ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assess ProML's security against a threat model of a distributed ML workflow.

Submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted as a full paper at the ECSA 2022 conference. To be presented.

arXiv:2205.07204 [pdf, other]

doi 10.1145/3534526

Mod2Dash: A Framework for Model-Driven Dashboards Generation

Authors: Liuyue Jiang, Nguyen Khoi Tran, M. Ali Babar

Abstract: Constructing an interactive dashboard involves deciding what information to present and how to display it, and then implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, practitioners have to implement this implicit design manually by coding or configuring it on a dashboard platform. This paper proposes Mod2Dash, a software framework that enables practitioners to capture their dashboard designs as models and generate operational dashboards automatically from these models. The framework also provides a GUI-driven customization approach for practitioners to fine-tune the auto-generated dashboards and update their models. With these abilities, Mod2Dash enables practitioners to rapidly prototype and deploy dashboards for both operational and research purposes. We evaluated the framework's effectiveness in a case study on cyber security visualization for decision support. A proof-of-concept of Mod2Dash was employed to model and reconstruct 31 diverse real-world cyber security dashboards. A human-assisted comparison between the Mod2Dash-generated dashboards and the baseline dashboards shows a close match, indicating the framework's effectiveness for real-world scenarios.

Submitted 15 May, 2022; originally announced May 2022.

arXiv:2203.12132 [pdf, other]

Runtime Software Patching: Taxonomy, Survey and Future Directions

Authors: Chadni Islam, Victor Prokhorenko, M. Ali Babar

Abstract: Runtime software patching aims to minimize or eliminate service downtime, user interruptions and potential data losses while deploying a patch. Due to the high variance and heterogeneity of modern software systems, no universal solutions are available or proposed to deploy and execute patches at runtime. Existing runtime software patching solutions focus on specific cases, scenarios, programming languages and operating systems. This paper aims to identify, investigate and synthesize state-of-the-art runtime software patching approaches and to give an overview of currently unsolved challenges. It further provides insights on multiple aspects of runtime patching approaches such as patch scales, general strategies and responsibilities. This study identifies seven levels of granularity and two key strategies, and provides a conceptual model of three responsible entities and four capabilities of runtime patching solutions. Through the analysis of the existing literature, this research also reveals open issues hindering more comprehensive adoption of runtime patching in practice. Finally, it proposes several crucial future directions that require further attention from both researchers and practitioners.

Submitted 22 February, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.10647 [pdf, other]

doi 10.1016/j.jnca.2022.103460

A Framework for Automating Deployment and Evaluation of Blockchain Network

Authors: Nguyen Khoi Tran, M. Ali Babar, Andrew Walters

Abstract: Blockchain network deployment and evaluation have become prevalent due to the demand for private blockchains by enterprises, governments, and edge computing systems. Whilst a blockchain network's deployment and evaluation are driven by its architecture, practitioners still need to learn and carry out many repetitive and error-prone activities to transform an architecture into an operational blockchain network and evaluate it. Greater efficiency could be gained if practitioners focused solely on architecture design, a valuable and hard-to-automate activity, and left the implementation steps to an automation framework. This paper proposes an automation framework called NVAL (Network Deployment and Evaluation Framework), which can deploy and evaluate blockchain networks based on their architecture specifications. The key idea of NVAL is reusing and combining the existing automation scripts and utilities of various blockchain types to deploy and evaluate incoming blockchain network architectures. We propose a novel meta-model to capture blockchain network architectures as computer-readable artefacts and employ a state-space search approach to plan and conduct their deployment and evaluation. An evaluative case study shows that NVAL successfully combines seven deployment and evaluation procedures to deploy 65 networks with 12 different architectures and generate 295 evaluation datasets whilst incurring a negligible processing time overhead.

Submitted 24 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: Published in the Journal of Network and Computer Applications

arXiv:2203.08417 [pdf, other]

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Authors: Triet H. M. Le, M. Ali Babar

Abstract: Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about exploitability, impact, and severity of SVs. This information is important for understanding SVs and prioritizing their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions originate in these statements. We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance (Matthews Correlation Coefficient (MCC)) than non-vulnerable statements. Incorporating the context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted as a full paper in the technical track at the 19th International Conference on Mining Software Repositories (MSR) 2022

arXiv:2203.07603 [pdf, other]

SmartValidator: A Framework for Automatic Identification and Classification of Cyber Threat Data

Authors: Chadni Islam, M. Ali Babar, Roland Croft, Helge Janicke

Abstract: A wide variety of Cyber Threat Information (CTI) is used by Security Operation Centres (SOCs) to perform validation of security incidents and alerts. Security experts manually define different types of rules and scripts based on CTI to perform validation tasks. These rules and scripts need to be updated continuously due to evolving threats, changing SOCs' requirements and the dynamic nature of CTI. The manual process of updating rules and scripts delays the response to attacks. To reduce the burden on human experts and accelerate response, we propose a novel Artificial Intelligence (AI) based framework, SmartValidator. SmartValidator leverages Machine Learning (ML) techniques to enable automated validation of alerts. It consists of three layers that perform the tasks of data collection, model building and alert validation. It frames the validation task as a classification problem. Instead of building and saving models for all possible requirements, we propose to automatically construct the validation models based on the SOC's requirements and CTI. We built a Proof of Concept (PoC) system with eight ML algorithms, two feature engineering techniques and 18 requirements to investigate the effectiveness and efficiency of SmartValidator. The evaluation results showed that when prediction models were built automatically for classifying cyber threat data, 75% of the models achieved an F1-score above 0.8, which indicates adequate performance of the PoC for use in a real-world organization.
The results further showed that dynamic construction of prediction models required 99% fewer models to be built than pre-building models for all possible requirements. The framework can be followed by various industries to accelerate and automate the validation of alerts and incidents based on their CTI and SOC's preferences.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.05181 [pdf, other]

LineVD: Statement-level Vulnerability Detection using Graph Neural Networks

Authors: David Hin, Andrey Kan, Huaming Chen, M. Ali Babar

Abstract: Current machine-learning based software vulnerability detection methods primarily operate at the function level. However, a key limitation of these methods is that they do not indicate the specific lines of code contributing to vulnerabilities. This limits the ability of developers to efficiently inspect and interpret the predictions from a learnt model, which is crucial for integrating machine-learning based tools into the software development workflow. Graph-based models have shown promising performance in function-level vulnerability detection, but their capability for statement-level vulnerability detection has not been extensively explored. While interpreting function-level predictions through explainable AI is one promising direction, we herein consider the statement-level software vulnerability detection task from a fully supervised learning perspective. We propose a novel deep learning framework, LineVD, which formulates statement-level vulnerability detection as a node classification task. LineVD leverages control and data dependencies between statements using graph neural networks, and a transformer-based model to encode the raw source code tokens. In particular, by addressing the conflicting outputs between function-level and statement-level information, LineVD significantly improves prediction performance without requiring the vulnerability status of the function code.
We conducted extensive experiments on a large-scale collection of C/C++ vulnerabilities obtained from multiple real-world projects and demonstrate a 105% increase in F1-score over the current state of the art.

Submitted 25 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted in the 19th International Conference on Mining Software Repositories Technical Papers

arXiv:2203.04468 [pdf, other]

Noisy Label Learning for Security Defects

Authors: Roland Croft, M. Ali Babar, Huaming Chen

Abstract: Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Unlike the vulnerable class, the non-vulnerable modules are difficult to verify and confirm as truly exploit-free given the limited manual effort available. This results in uncertainty, introduces label noise into the datasets and affects conclusion validity. To address this issue, we propose noisy label learning: novel learning methods that are robust to label impurities and can make the most of limited labelled data. We investigate various noisy label learning methods applied to software vulnerability prediction. Specifically, we propose a two-stage learning method based on noise cleaning to identify and remediate the noisy samples, which improves AUC and recall of baselines by up to 8.9% and 23.4%, respectively. Moreover, we discuss several hurdles in terms of achieving a performance upper bound with semi-omniscient knowledge of the label noise. Overall, the experimental results show that learning from noisy labels can be effective for data-driven software and security analytics.

Submitted 1 April, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: Accepted at MSR 22

arXiv:2202.09016 [pdf, other]

doi 10.1145/3555087

Why, How and Where of Delays in Software Security Patch Management: An Empirical Investigation in the Healthcare Sector

Authors: Nesara Dissanayake, Mansooreh Zahedi, Asangi Jayatilaka, M. Ali Babar

Abstract: Numerous security attacks with devastating consequences can be traced back to a delay in applying a security patch. Despite the criticality of timely patch application, little is known about why and how delays occur when applying security patches in practice, and how the delays can be mitigated. Based on longitudinal data collected from 132 delayed patching tasks over a period of four years and observations of patch meetings involving eight teams from two organisations in the healthcare domain, and using quantitative and qualitative data analysis approaches, we identify a set of reasons relating to technology, people and organisation as key explanations for delays in patching. Our findings also reveal that the most prominent cause of delays is coordination delays in the patch management process, and that a majority of delays occur during the patch deployment phase. Towards mitigating the delays, we describe a set of strategies employed by the studied practitioners. This research serves as the first step towards understanding the practical reasons for delays and possible mitigation strategies in vulnerability patch management. Our findings provide useful insights for practitioners to understand what and where improvement is needed in the patch management process and guide them towards taking timely actions against potential attacks. Our findings also help researchers to invest effort in designing and developing computer-supported tools that better support a timely security patch management process.

Submitted 3 September, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 28 pages, 10 figures

arXiv:2201.09006 [pdf, other]

doi 10.1007/s10664-022-10236-0

On the Privacy of Mental Health Apps: An Empirical Investigation and its Implications for Apps Development

Authors: Leonardo Horn Iwaya, M. Ali Babar, Awais Rashid, Chamila Wijayarathna

Abstract: An increasing number of mental health services are offered through mobile systems, a paradigm called mHealth. Although there is unprecedented growth in the adoption of mHealth systems, partly due to the COVID-19 pandemic, concerns about data privacy risks due to security breaches are also increasing. Whilst some studies have analyzed mHealth apps from different angles, including security, there is relatively little evidence on the data privacy issues that may exist in mHealth apps used for mental health services, whose recipients can be particularly vulnerable. This paper reports an empirical study aimed at systematically identifying and understanding how data privacy is incorporated in mental health apps. We analyzed 27 top-ranked mental health apps from the Google Play Store. Our methodology enabled us to perform an in-depth privacy analysis of the apps, covering static and dynamic analysis, data sharing behaviour, server-side tests, privacy impact assessment requests, and privacy policy evaluation. Furthermore, we mapped the findings to the LINDDUN threat taxonomy, describing how threats manifest in the studied apps. The findings reveal important data privacy issues such as unnecessary permissions, insecure cryptography implementations, and leaks of personal data and credentials in logs and web requests. There is also a high risk of user profiling, as the apps do not provide foolproof mechanisms against linkability, detectability and identifiability.
Data sharing with third parties and advertisers in the current apps' ecosystem aggravates this situation. Based on the empirical findings of this study, we provide recommendations to be considered by different stakeholders of mHealth apps in general and app developers in particular. [...]

Submitted 22 January, 2022; originally announced January 2022.

Comments: 40 pages, 13 figures

arXiv:2201.08066 [pdf, other]

NLP Methods in Host-based Intrusion Detection Systems: A Systematic Review and Future Directions

Authors: Zarrin Tasnim Sworna, Zahra Mousavi, Muhammad Ali Babar

Abstract: A Host-based Intrusion Detection System (HIDS) is an effective last line of defense against cyber security attacks after perimeter defenses (e.g., Network-based Intrusion Detection Systems and firewalls) have failed or been bypassed. HIDS is widely adopted in industry, as it is ranked among the top two most used security tools by the Security Operation Centers (SOC) of organizations. Although effective and efficient HIDS is highly desirable for industrial organizations, the evolution of increasingly complex attack patterns causes several challenges that degrade HIDS performance (e.g., a high false alert rate creating alert fatigue for SOC staff). Since Natural Language Processing (NLP) methods are better suited to identifying complex attack patterns, an increasing number of HIDS are leveraging advances in NLP that have shown effective and efficient performance in precisely detecting low-footprint, zero-day attacks and predicting the next steps of attackers. This active research trend of using NLP in HIDS demands a synthesized and comprehensive body of knowledge on NLP-based HIDS. Thus, we conducted a systematic review of the literature on the end-to-end pipeline of using NLP in HIDS development. For the end-to-end NLP-based HIDS development pipeline, we identify, taxonomically categorize and systematically compare the state-of-the-art NLP methods used in HIDS, the attacks detected by these NLP methods, and the datasets and evaluation metrics used to evaluate NLP-based HIDS.
We highlight the relevant prevalent practices, considerations, advantages and limitations to support HIDS developers. We also outline future research directions for NLP-based HIDS development.
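A common starting point for treating host data as "language" is to read a process's system-call trace as a sentence and its n-grams as words. The sketch below shows the classic n-gram normal-profile idea (in the spirit of the well-known sequence-based approach often called STIDE); it is not a method taken from this review, and the traces, system-call names, window size `n = 3` and function names are illustrative assumptions.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: List[str], n: int) -> List[NGram]:
    """Sliding window of length n over a system-call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_profile(normal_traces: List[List[str]], n: int = 3) -> Set[NGram]:
    """Database of every n-gram observed in known-benign traces."""
    profile: Set[NGram] = set()
    for trace in normal_traces:
        profile.update(ngrams(trace, n))
    return profile


def anomaly_score(trace: List[str], profile: Set[NGram], n: int = 3) -> float:
    """Fraction of the trace's n-grams never seen during normal operation."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0
    return sum(g not in profile for g in grams) / len(grams)


# Hypothetical benign traces used to build the profile.
normal = [["open", "read", "write", "close"], ["open", "read", "read", "close"]]
profile = build_profile(normal)
print(anomaly_score(["open", "read", "write", "close"], profile))    # 0.0
print(anomaly_score(["open", "read", "write", "execve"], profile))   # 0.5
print(anomaly_score(["open", "read", "execve", "socket"], profile))  # 1.0
```

An alert would fire when the score exceeds a tuned threshold. The high false-alert rate mentioned in the abstract is visible even here: any benign but previously unseen sequence raises the score, which is one motivation for the richer NLP representations the review surveys.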

Submitted 19 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2201.07959 [pdf, other]

APIRO: A Framework for Automated Security Tools API Recommendation

Authors: Zarrin Tasnim Sworna, Chadni Islam, Muhammad Ali Babar

Abstract: Security Orchestration, Automation, and Response (SOAR) platforms integrate and orchestrate a wide variety of security tools to accelerate the operational activities of a Security Operation Center (SOC). Integration of security tools in a SOAR platform is mostly done manually using APIs, plugins, and scripts. SOC teams need to navigate through the API calls of different security tools to find a suitable API to define or update an incident response action. Analyzing various types of API documentation with diverse API formats and presentation structures involves significant challenges, such as data availability, data heterogeneity, and semantic variation, for the automatic identification of security tool APIs specific to a particular task. Given that these challenges can negatively impact a SOC team's ability to handle security incidents effectively and efficiently, we consider it important to devise suitable automated support solutions to address them. We propose APIRO, a novel learning-based framework for automated security tool API Recommendation for security Orchestration, automation, and response. To mitigate the data availability constraint, APIRO enriches security tool API descriptions by applying a wide variety of data augmentation techniques. To learn the data heterogeneity of the security tools and the semantic variation in API descriptions, APIRO consists of an API-specific word embedding model and a Convolutional Neural Network (CNN) model that are used to predict the top 3 relevant APIs for a task.
We experimentally demonstrate the effectiveness of APIRO in recommending APIs for different tasks using 3 security tools and 36 augmentation techniques. Our experimental results demonstrate the feasibility of APIRO, which achieves 91.9% Top-1 accuracy.
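APIRO ranks candidate APIs for a task and is evaluated with Top-1 accuracy. The sketch below shows how Top-k accuracy is conventionally computed from ranked relevance scores. It does not reproduce APIRO's embedding or CNN models. The scores, the API names (`block_ip`, `isolate_host`, `scan_file`) and the helper names are hypothetical, and whether APIRO computes the metric exactly this way is an assumption.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """API names ranked by predicted relevance, highest first, cut at k."""
    return sorted(scores, key=lambda api: scores[api], reverse=True)[:k]


def top_k_accuracy(predictions: List[Dict[str, float]], truths: List[str],
                   k: int = 1) -> float:
    """Share of tasks whose correct API appears among the top-k predictions."""
    hits = sum(truth in top_k(scores, k)
               for scores, truth in zip(predictions, truths))
    return hits / len(truths)


# Hypothetical model scores for four tasks over three hypothetical APIs.
predictions = [
    {"block_ip": 0.70, "isolate_host": 0.20, "scan_file": 0.10},
    {"block_ip": 0.50, "isolate_host": 0.30, "scan_file": 0.20},
    {"block_ip": 0.10, "isolate_host": 0.10, "scan_file": 0.80},
    {"block_ip": 0.40, "isolate_host": 0.35, "scan_file": 0.25},
]
truths = ["block_ip", "isolate_host", "scan_file", "scan_file"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(predictions, truths, k))  # 0.5, 0.75, 1.0
```

Top-1 accuracy is the strictest setting: the correct API must be ranked first. Top-3 (the number of APIs APIRO recommends per task) credits any hit in the shortlist shown to the analyst.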

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.04736 [pdf, other]

Security for Machine Learning-based Software Systems: a survey of threats, practices and challenges

Authors: Huaming Chen, M. Ali Babar

Abstract: The rapid development of Machine Learning (ML) has demonstrated superior performance in many areas, such as computer vision, video and speech recognition. ML is now increasingly leveraged in software systems to automate core tasks. However, how to securely develop modern machine learning-based software systems (MLBSS) remains a big challenge, and insufficient consideration of it will largely limit their application in safety-critical domains. One concern is that present MLBSS development tends to be rushed, and latent vulnerabilities and privacy issues exposed to external users and attackers are largely neglected and hard to identify. Additionally, machine learning-based software systems exhibit different liabilities towards novel vulnerabilities at different development stages, from requirement analysis to system maintenance, due to their inherent limitations from the model and data and to external adversary capabilities. Building such intelligent systems successfully will thus require dedicated joint efforts from different research areas, i.e., software engineering, system security and machine learning. Most recent works on the security issues of ML focus strongly on data and models, which has brought adversarial attacks into consideration. In this work, we consider that security issues in machine learning-based software systems may arise from inherent system defects or external adversarial attacks, and that secure development practices should be applied throughout the whole lifecycle.
While machine learning has become a new threat domain for existing software engineering practices, no review work covers the topic. Overall, we present a holistic review of the security of MLBSS, which provides a systematic understanding from a structured review of three distinct aspects in terms of security threats...

Submitted 17 December, 2023; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: Accepted at ACM Computing Surveys

arXiv:2201.01972 [pdf, other]

A Framework for Energy-aware Evaluation of Distributed Data Processing Platforms in Edge-Cloud Environment

Authors: Faheem Ullah, Imaduddin Mohammed, M. Ali Babar

Abstract: Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to distribute the storage and processing of data among the computing nodes of a cloud. The centralization of cloud resources has given rise to edge computing, which enables data to be processed closer to its source instead of being sent to the cloud. However, due to resource constraints such as energy limitations, edge computing cannot be used for deploying all kinds of applications. Therefore, tasks are offloaded from an edge device to the more resourceful cloud. Previous research has evaluated the energy consumption of distributed data processing platforms in isolated cloud and edge environments. However, there is a paucity of research on evaluating the energy consumption of these platforms in an integrated edge-cloud environment, where tasks are offloaded from a resource-constrained device to a resource-rich device. Therefore, in this paper, we first present a framework for the energy-aware evaluation of distributed data processing platforms. We then leverage the proposed framework to evaluate the energy consumption of the three most widely used platforms (i.e., Hadoop, Spark, and Flink) in an integrated edge-cloud environment consisting of a Raspberry Pi, edge node, edge server node, private cloud, and public cloud.
Our evaluation reveals that (i) Flink is the most energy-efficient, followed by Spark, with Hadoop the least energy-efficient; (ii) offloading tasks from resource-constrained to resource-rich devices reduces energy consumption by 55.2%; and (iii) bandwidth and the distance between client and server are key factors affecting energy consumption.

Submitted 6 January, 2022; originally announced January 2022.

Showing 1–50 of 103 results for author: Babar, M A