Search | arXiv e-print repository

arXiv:2408.03335 [pdf, other]

Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions

Authors: Naseem Khan, Kashif Ahmad, Aref Al Tamimi, Mohammed M. Alani, Amine Bermak, Issa Khalil

Abstract: Industry 5.0, which focuses on human and Artificial Intelligence (AI) collaboration for performing different tasks in manufacturing, involves a higher number of robots, Internet of Things (IoTs) devices and interconnections, Augmented/Virtual Reality (AR), and other smart devices. The huge involvement of these devices and interconnection in various critical areas, such as economy, health, education and defense systems, poses several types of potential security flaws. AI itself has been proven a very effective and powerful tool in different areas of cybersecurity, such as intrusion detection, malware detection, and phishing detection, among others. Just as in many application areas, cybersecurity professionals were reluctant to accept black-box ML solutions for cybersecurity applications. This reluctance pushed forward the adoption of eXplainable Artificial Intelligence (XAI) as a tool that helps explain how decisions are made in ML-based systems. In this survey, we present a comprehensive study of different XAI-based intrusion detection systems for industry 5.0, and we also examine the impact of explainability and interpretability on Cybersecurity practices through the lens of Adversarial XIDS (Adv-XIDS) approaches. Furthermore, we analyze the possible opportunities and challenges in XAI cybersecurity systems for industry 5.0 that elicit future research toward XAI-based solutions to be adopted by high-stakes industry 5.0 applications. We believe this rigorous analysis will establish a foundational framework for subsequent research endeavors within the specified domain.

Submitted 21 July, 2024; originally announced August 2024.

Comments: 57 pages, 6 figures

arXiv:2406.16986 [pdf, ps, other]

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction mapping to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2310.16625 [pdf, other]

Power Optimization in Satellite Communication Using Multi-Intelligent Reflecting Surfaces

Authors: Muhammad Ihsan Khalil

Abstract: This study introduces two innovative methodologies aimed at augmenting energy efficiency in satellite-to-ground communication systems through the integration of multiple Reflective Intelligent Surfaces (RISs). The primary objective of these methodologies is to optimize overall energy efficiency under two distinct scenarios. In the first scenario, denoted as Ideal Environment (IE), we enhance energy efficiency by decomposing the problem into two sub-optimal tasks. The initial task concentrates on maximizing power reception by precisely adjusting the phase shift of each RIS element, followed by the implementation of Selective Diversity to identify the RIS element delivering maximal power. The second task entails minimizing power consumption, formulated as a binary linear programming problem, and addressed using the Binary Particle Swarm Optimization (BPSO) technique. The IE scenario presupposes an environment where signals propagate without any path loss, serving as a foundational benchmark for theoretical evaluations that elucidate the system's optimal capabilities. Conversely, the second scenario, termed Non-Ideal Environment (NIE), is designed for situations where signal transmission is subject to path loss. Within this framework, the Adam algorithm is utilized to optimize energy efficiency. This non-ideal setting provides a pragmatic assessment of the system's capabilities under conventional operational conditions. Both scenarios emphasize the potential energy savings achievable by the satellite RIS system. Empirical simulations further corroborate the robustness and effectiveness of our approach, highlighting its potential to enhance energy efficiency in satellite-to-ground communication systems.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2308.11754 [pdf, other]

Multi-Instance Adversarial Attack on GNN-Based Malicious Domain Detection

Authors: Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan, Yao Ma

Abstract: Malicious domain detection (MDD) is an open security challenge that aims to detect if an Internet domain is associated with cyber-attacks. Among many approaches to this problem, graph neural networks (GNNs) are deemed highly effective. GNN-based MDD uses DNS logs to represent Internet domains as nodes in a maliciousness graph (DMG) and trains a GNN to infer their maliciousness by leveraging identified malicious domains. Since this method relies on accessible DNS logs to construct DMGs, it exposes a vulnerability for adversaries to manipulate their domain nodes' features and connections within DMGs. Existing research mainly concentrates on threat models that manipulate individual attacker nodes. However, adversaries commonly generate multiple domains to achieve their goals economically and avoid detection. Their objective is to evade discovery across as many domains as feasible. In this work, we call the attack that manipulates several nodes in the DMG concurrently a multi-instance evasion attack. We present theoretical and empirical evidence that the existing single-instance evasion techniques are inadequate to launch multi-instance evasion attacks against GNN-based MDDs. Therefore, we introduce MintA, an inference-time multi-instance adversarial attack on GNN-based MDDs. MintA enhances node and neighborhood evasiveness through optimized perturbations and operates successfully with only black-box access to the target model, eliminating the need for knowledge about the model's specifics or non-adversary nodes. We formulate an optimization challenge for MintA, achieving an approximate solution. Evaluating MintA on a leading GNN-based MDD technique with real-world data showcases an attack success rate exceeding 80%. These findings act as a warning for security experts, underscoring GNN-based MDDs' susceptibility to practical attacks that can undermine their effectiveness and benefits.

Submitted 22 August, 2023; originally announced August 2023.

Comments: To Appear in the 45th IEEE Symposium on Security and Privacy (IEEE S&P 2024), May 20-23, 2024

arXiv:2308.09237 [pdf, other]

doi 10.1145/3591365.3592947

Blockchain-Based and Fuzzy Logic-Enabled False Data Discovery for the Intelligent Autonomous Vehicular System

Authors: Ziaur Rahman, Xun Yi, Ibrahim Khalil, Adnan Anwar, Shantanu Pal

Abstract: Since the beginning of this decade, several incidents report that false data injection attacks targeting intelligent connected vehicles cause huge industrial damage and loss of lives. Data Theft, Flooding, Fuzzing, Hijacking, Malware Spoofing and Advanced Persistent Threats have been immensely growing attacks that lead to end-user conflict by abolishing trust in autonomous vehicles. Looking after those sensitive data that contribute to measuring the localisation factors of the vehicle, conventional centralised techniques can be misused to update the legitimate vehicular status maliciously. As investigated, the existing centralized false data detection approach based on state and likelihood estimation has a reprehensible trade-off in terms of accuracy, trust, cost, and efficiency. Blockchain with Fuzzy-logic Intelligence has shown its potential to solve localisation issues, trust and false data detection challenges encountered by today's autonomous vehicular system. The proposed Blockchain-based fuzzy solution demonstrates a novel false data detection and reputation preservation technique. The illustrated proposed model filters false and anomalous data based on the vehicles' rules and behaviours. Besides improving the detection accuracy and eliminating the single point of failure, the contributions include appropriating fuzzy AI functions within the Road-side Unit node before authorizing status data by a Blockchain network. Finally, thorough experimental evaluation validates the effectiveness of the proposed model.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 11 pages, 11 figures, 4 tables, AsiaCCS conference 2023

MSC Class: 11T71; 68T05

ACM Class: E.3.1; I.2.1

Journal ref: ACM Symposium on Information, Computer and Communications Security (ASIA CCS 2023)

arXiv:2308.05452 [pdf, other]

Optimizing Reconfigurable Intelligent Surfaces for Improved Space-based Communication Amidst Phase Shift Errors

Authors: Muhammad I Khalil

Abstract: Reconfigurable Intelligent Surfaces (RISs) have emerged as a promising technology for enhancing satellite communication systems by manipulating the phase of electromagnetic waves. This study addresses optimising phase shift values (φ_{R}) in RIS networks under both ideal and non-ideal conditions. For ideal scenarios, we introduce a novel approach that simplifies the traditional optimisation methods for determining the optimal value. Leveraging trigonometric identities and the law of cosines, we create a more tractable formulation for the received power that allows for efficient optimisation of φ_{R}. However, practical applications often grapple with non-ideal conditions. These conditions can introduce phase errors, significantly affecting the received signal and overall system performance. To accommodate these complexities, our optimisation framework extends to include phase errors, which are modelled as a uniform distribution. To solve this optimisation problem, we propose a stochastic framework that harnesses the Monte Carlo method to consider all plausible phase error values. Furthermore, we employ the Broyden Fletcher Goldfarb Shanno (BFGS) algorithm, an iterative method known for its efficacy. This algorithm systematically updates φ_{R} values, incorporating the gradient of the objective function and Hessian matrix approximations. The algorithm also monitors convergence to balance computational complexity and accuracy. The results of the theoretical analysis are illustrated with several examples. As herein demonstrated, the proposed solution offers profound insights into the impacts of phase errors on RIS system performance. It also unveils innovative optimisation strategies for real-world satellite communication scenarios under diverse conditions.

Submitted 10 August, 2023; originally announced August 2023.

Comments: Ten pages

arXiv:2305.16474 [pdf, other]

FairDP: Certified Fairness with Differential Privacy

Authors: Khang Tran, Ferdinando Fioretto, Issa Khalil, My T. Thai, NhatHai Phan

Abstract: This paper introduces FairDP, a novel mechanism designed to achieve certified fairness with differential privacy (DP). FairDP independently trains models for distinct individual groups, using group-specific clipping terms to assess and bound the disparate impacts of DP. Throughout the training process, the mechanism progressively integrates knowledge from group models to formulate a comprehensive model that balances privacy, utility, and fairness in downstream tasks. Extensive theoretical and empirical analyses validate the efficacy of FairDP and improved trade-offs between model utility, privacy, and fairness compared with existing methods.

Submitted 21 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.09224 [pdf, other]

doi 10.1109/JIOT.2022.3151982

Privacy-Preserving Ensemble Infused Enhanced Deep Neural Network Framework for Edge Cloud Convergence

Authors: Veronika Stephanie, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman

Abstract: We propose a privacy-preserving ensemble infused enhanced Deep Neural Network (DNN) based learning framework in this paper for Internet-of-Things (IoT), edge, and cloud convergence in the context of healthcare. In the convergence, edge server is used for both storing IoT produced bioimage and hosting DNN algorithm for local model training. The cloud is used for ensembling local models. The DNN-based training process of a model with a local dataset suffers from low accuracy, which can be improved by the aforementioned convergence and Ensemble Learning. The ensemble learning allows multiple participants to outsource their local model for producing a generalized final model with high accuracy. Nevertheless, Ensemble Learning elevates the risk of leaking sensitive private data from the final model. The proposed framework presents a Differential Privacy-based privacy-preserving DNN with Transfer Learning for a local model generation to ensure minimal loss and higher efficiency at edge server. We conduct several experiments to evaluate the performance of our proposed framework.

Submitted 16 May, 2023; originally announced May 2023.

Journal ref: IEEE Internet of Things Journal, vol. 10, no. 5, pp. 3763-3773, 1 March 2023

arXiv:2305.09209 [pdf, other]

doi 10.1109/TII.2022.3214998

Trustworthy Privacy-preserving Hierarchical Ensemble and Federated Learning in Healthcare 4.0 with Blockchain

Authors: Veronika Stephanie, Ibrahim Khalil, Mohammed Atiquzzaman, Xun Yi

Abstract: The advancement of Internet and Communication Technologies (ICTs) has led to the era of Industry 4.0. This shift is followed by healthcare industries creating the term Healthcare 4.0. In Healthcare 4.0, the use of IoT-enabled medical imaging devices for early disease detection has enabled medical practitioners to increase healthcare institutions' quality of service. However, Healthcare 4.0 is still lagging in Artificial Intelligence and big data compared to other Industry 4.0 due to data privacy concerns. In addition, institutions' diverse storage and computing capabilities restrict institutions from incorporating the same training model structure. This paper presents a secure multi-party computation-based ensemble federated learning with blockchain that enables heterogeneous models to collaboratively learn from healthcare institutions' data without violating users' privacy. Blockchain properties also allow the party to enjoy data integrity without trust in a centralized server while also providing each healthcare institution with auditability and version control capability.

Submitted 16 May, 2023; originally announced May 2023.

Journal ref: IEEE Transactions on Industrial Informatics, 2022

arXiv:2305.09134 [pdf, other]

doi 10.1109/TNSM.2023.3276594

Smart Policy Control for Securing Federated Learning Management System

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammed Atiquzzaman

Abstract: The widespread adoption of Internet of Things (IoT) devices in smart cities, intelligent healthcare systems, and various real-world applications has resulted in the generation of vast amounts of data, often analyzed using different Machine Learning (ML) models. Federated learning (FL) has been acknowledged as a privacy-preserving machine learning technology, where multiple parties cooperatively train ML models without exchanging raw data. However, the current FL architecture does not allow for an audit of the training process due to the various data-protection policies implemented by each FL participant. Furthermore, there is no global model verifiability available in the current architecture. This paper proposes a smart contract-based policy control for securing the Federated Learning (FL) management system. First, we develop and deploy a smart contract-based local training policy control on the FL participants' side. This policy control is used to verify the training process, ensuring that the evaluation process follows the same rules for all FL participants. We then enforce a smart contract-based aggregation policy to manage the global model aggregation process. Upon completion, the aggregated model and policy are stored on blockchain-based storage. Subsequently, we distribute the aggregated global model and the smart contract to all FL participants. Our proposed method uses smart policy control to manage access and verify the integrity of machine learning models. We conducted multiple experiments with various machine learning architectures and datasets, such as MNIST and CIFAR-10, to evaluate our proposed framework.

Submitted 18 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Journal ref: IEEE Transactions on Network and Service Management, 2023

arXiv:2304.13379 [pdf, other]

doi 10.1007/978-3-031-23020-2_35

Blockchain-based Access Control for Secure Smart Industry Management Systems

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Abdelaziz Bouras

Abstract: Smart manufacturing systems involve a large number of interconnected devices resulting in massive data generation. Cloud computing technology has recently gained increasing attention in smart manufacturing systems for facilitating cost-effective service provisioning and massive data management. In a cloud-based manufacturing system, ensuring authorized access to the data is crucial. A cloud platform is operated under a single authority. Hence, a cloud platform is prone to a single point of failure and vulnerable to adversaries. An internal or external adversary can easily modify users' access to allow unauthorized users to access the data. This paper proposes a role-based access control to prevent modification attacks by leveraging blockchain and smart contracts in a cloud-based smart manufacturing system. The role-based access control is developed to determine users' roles and rights in smart contracts. The smart contracts are then deployed to the private blockchain network. We evaluate our solution by utilizing Ethereum private blockchain network to deploy the smart contract. The experimental results demonstrate the feasibility and evaluation of the proposed framework's performance.

Submitted 26 April, 2023; originally announced April 2023.

Journal ref: Network and System Security: 16th International Conference, NSS 2022, Denarau Island, Fiji, December, 2022

arXiv:2304.13360 [pdf, other]

doi 10.1109/TETC.2023.3268186

Blockchain-based Federated Learning with SMPC Model Verification Against Poisoning Attack for Healthcare Systems

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Xun Yi

Abstract: Due to the rising awareness of privacy and security in machine learning applications, federated learning (FL) has received widespread attention and been applied to several areas, e.g., intelligent healthcare systems, IoT-based industries, and smart cities. FL enables clients to train a global model collaboratively without accessing their local training data. However, the current FL schemes are vulnerable to adversarial attacks. Its architecture makes detecting and defending against malicious model updates difficult. In addition, methods to protect FL from malicious updates while maintaining the model's privacy have not been sufficiently explored. This paper proposes blockchain-based federated learning with SMPC model verification against poisoning attacks for healthcare systems. First, we check the machine learning model from the FL participants through an encrypted inference process and remove the compromised model. Once the participants' local models have been verified, the models are sent to the blockchain node to be securely aggregated. We conducted several experiments with different medical datasets to evaluate our proposed framework.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.13352 [pdf, other]

doi 10.1109/MNET.007.2100717

SMPC-based Federated Learning for 6G enabled Internet of Medical Things

Authors: Aditya Pribadi Kalapaaking, Veronika Stephanie, Ibrahim Khalil, Mohammed Atiquzzaman, Xun Yi, Mahathir Almashor

Abstract: Rapidly developing intelligent healthcare systems are underpinned by Sixth Generation (6G) connectivity, ubiquitous Internet of Things (IoT), and Deep Learning (DL) techniques. This portends a future where 6G powers the Internet of Medical Things (IoMT) with seamless, large-scale, and real-time connectivity amongst entities. This article proposes a Convolutional Neural Network (CNN) based Federated Learning framework that combines Secure Multi-Party Computation (SMPC) based aggregation and Encrypted Inference methods, all within the context of 6G and IoMT. We consider multiple hospitals with clusters of mixed IoMT and edge devices that encrypt locally trained models. Subsequently, each hospital sends the encrypted local models for SMPC-based encrypted aggregation in the cloud, which generates the encrypted global model. Ultimately, the encrypted global model is returned to each edge server for more localized training, further improving model accuracy. Moreover, hospitals can perform encrypted inference on their edge servers or the cloud while maintaining data and model privacy. Multiple experiments were conducted with varying CNN models and datasets to evaluate the proposed framework's performance.

Submitted 26 April, 2023; originally announced April 2023.

Journal ref: IEEE Network, vol. 36, no. 4, pp. 182-189, July/August 2022

arXiv:2304.12889 [pdf, other]

doi 10.1109/TII.2022.3170348

Blockchain-based Federated Learning with Secure Aggregation in Trusted Execution Environment for Internet-of-Things

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman, Xun Yi, Mahathir Almashor

Abstract: This paper proposes a blockchain-based Federated Learning (FL) framework with Intel Software Guard Extension (SGX)-based Trusted Execution Environment (TEE) to securely aggregate local models in Industrial Internet-of-Things (IIoTs). In FL, local models can be tampered with by attackers. Hence, a global model generated from the tampered local models can be erroneous. Therefore, the proposed framework leverages a blockchain network for secure model aggregation. Each blockchain node hosts an SGX-enabled processor that securely performs the FL-based aggregation tasks to generate a global model. Blockchain nodes can verify the authenticity of the aggregated model, run a blockchain consensus mechanism to ensure the integrity of the model, and add it to the distributed ledger for tamper-proof storage. Each cluster can obtain the aggregated model from the blockchain and verify its integrity before using it. We conducted several experiments with different CNN models and datasets to evaluate the performance of the proposed framework.

Submitted 25 April, 2023; originally announced April 2023.

Journal ref: IEEE Transactions on Industrial Informatics, vol. 19, no. 2, pp. 1703-1714, Feb. 2023

arXiv:2304.08429

Security and Privacy Issues for Urban Smart Traffic Infrastructure

Authors: Anubhab Baksi, Ahmed Ibrahim Samir Khalil, Anupam Chattopadhyay

Abstract: In recent times, the research works relating to smart traffic infrastructure have gained serious attention. As a result, research has been carried out in multiple directions to ensure that such infrastructure can improve upon our existing (mostly) human-controlled traffic infrastructure, without violating the safety margins. For this reason, cyber security issues of such infrastructure are of paramount interest. Keeping this in mind, we conduct a review of existing models, their vulnerabilities and how such vulnerabilities can be handled. Our work covers a vast area from the domain of security, starting from the theoretical notions of cryptography to the real-life adaptation of them. At the same time, we also consider the security issues that may arise due to the usage of artificial intelligence/machine learning in the infrastructure. We believe that our work will help future researchers to gain a comprehensive yet concise look at cyber security for smart traffic infrastructure.

Submitted 27 September, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: The study is partly outdated

arXiv:2212.14102 [pdf, other]

Customizing Knowledge Graph Embedding to Improve Clinical Study Recommendation

Authors: Xiong Liu, Iya Khalil, Murthy Devarakonda

Abstract: Inferring knowledge from clinical trials using knowledge graph embedding is an emerging area. However, customizing graph embeddings for different use cases remains a significant challenge. We propose custom2vec, an algorithmic framework to customize graph embeddings by incorporating user preferences in training the embeddings. It captures user preferences by adding custom nodes and links derived from manually vetted results of a separate information retrieval method. We propose a joint learning objective to preserve the original network structure while incorporating the user's custom annotations. We hypothesize that the custom training improves user-expected predictions, for example, in link prediction tasks. We demonstrate the effectiveness of custom2vec for clinical trials related to non-small cell lung cancer (NSCLC) with two customization scenarios: recommending immuno-oncology trials evaluating PD-1 inhibitors and exploring similar trials that compare new therapies with a standard of care. The results show that custom2vec training achieves better performance than the conventional training methods. Our approach is a novel way to customize knowledge graph embeddings and enable more accurate recommendations and predictions.

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.04951 [pdf, other]

EEG-NeXt: A Modernized ConvNet for The Classification of Cognitive Activity from EEG

Authors: Andac Demir, Iya Khalil, Bulent Kiziltan

Abstract: One of the main challenges in electroencephalogram (EEG) based brain-computer interface (BCI) systems is learning the subject/session invariant features to classify cognitive activities within an end-to-end discriminative setting. We propose a novel end-to-end machine learning pipeline, EEG-NeXt, which facilitates transfer learning by: i) aligning the EEG trials from different subjects in the Euclidean-space, ii) tailoring the techniques of deep learning for the scalograms of EEG signals to capture better frequency localization for low-frequency, longer-duration events, and iii) utilizing pretrained ConvNeXt (a modernized ResNet architecture which supersedes state-of-the-art (SOTA) image classification models) as the backbone network via adaptive finetuning. On publicly available datasets (Physionet Sleep Cassette and BNCI2014001) we benchmark our method against SOTA via cross-subject validation and demonstrate improved accuracy in cognitive activity classification along with better generalizability across cohorts.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2211.05766 [pdf, other]

Heterogeneous Randomized Response for Differential Privacy in Graph Neural Networks

Authors: Khang Tran, Phung Lai, NhatHai Phan, Issa Khalil, Yao Ma, Abdallah Khreishah, My Thai, Xintao Wu

Abstract: Graph neural networks (GNNs) are susceptible to privacy inference attacks (PIAs), given their ability to learn joint representation from features and edges among nodes in graph data. To prevent privacy leakages in GNNs, we propose a novel heterogeneous randomized response (HeteroRR) mechanism to protect nodes' features and edges against PIAs under differential privacy (DP) guarantees without an undue cost of data and model utility in training GNNs. Our idea is to balance the importance and sensitivity of nodes' features and edges in redistributing the privacy budgets since some features and edges are more sensitive or important to the model utility than others. As a result, we derive significantly better randomization probabilities and tighter error bounds at both levels of nodes' features and edges departing from existing approaches, thus enabling us to maintain high data utility for training GNNs. An extensive theoretical and empirical analysis using benchmark datasets shows that HeteroRR significantly outperforms various baselines in terms of model utility under rigorous privacy protection for both nodes' features and edges. That enables us to defend PIAs in DP-preserving GNNs effectively.

Submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted in IEEE BigData 2022 (short paper)

arXiv:2210.01797 [pdf, other]

Ten Years after ImageNet: A 360° Perspective on AI

Authors: Sanjay Chawla, Preslav Nakov, Ahmed Ali, Wendy Hall, Issa Khalil, Xiaosong Ma, Husrev Taha Sencar, Ingmar Weber, Michael Wooldridge, Ting Yu

Abstract: It is ten years since neural networks made their spectacular comeback. Prompted by this anniversary, we take a holistic perspective on Artificial Intelligence (AI). Supervised Learning for cognitive tasks is effectively solved - provided we have enough high-quality labeled data. However, deep neural network models are not easily interpretable, and thus the debate between blackbox and whitebox modeling has come to the fore. The rise of attention networks, self-supervised learning, generative modeling, and graph neural networks has widened the application space of AI. Deep Learning has also propelled the return of reinforcement learning as a core building block of autonomous decision making systems. The possible harms made possible by new AI technologies have raised socio-technical issues such as transparency, fairness, and accountability. The dominance of AI by Big-Tech who control talent, computing resources, and most importantly, data may lead to an extreme AI divide. Failure to meet high expectations in high profile, and much heralded flagship projects like self-driving vehicles could trigger another AI winter.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.13848 [pdf]

Deep Learning based Automatic Quantification of Urethral Plate Quality using the Plate Objective Scoring Tool (POST)

Authors: Tariq O. Abbas, Mohamed AbdelMoniem, Ibrahim Khalil, Md Sakib Abrar Hossain, Muhammad E. H. Chowdhury

Abstract: Objectives: To explore the capacity of deep learning algorithm to further streamline and optimize urethral plate (UP) quality appraisal on 2D images using the plate objective scoring tool (POST), aiming to increase the objectivity and reproducibility of UP appraisal in hypospadias repair. Methods: The five key POST landmarks were marked by specialists in a 691-image dataset of prepubertal boys undergoing primary hypospadias repair. This dataset was then used to develop and validate a deep learning-based landmark detection model. The proposed framework begins with glans localization and detection, where the input image is cropped using the predicted bounding box. Next, a deep convolutional neural network (CNN) architecture is used to predict the coordinates of the five POST landmarks. These predicted landmarks are then used to assess UP quality in distal hypospadias. Results: The proposed model accurately localized the glans area, with a mean average precision (mAP) of 99.5% and an overall sensitivity of 99.1%. A normalized mean error (NME) of 0.07152 was achieved in predicting the coordinates of the landmarks, with a mean squared error (MSE) of 0.001 and a 20.2% failure rate at a threshold of 0.1 NME. Conclusions: This deep learning application shows robustness and high precision in using POST to appraise UP quality. Further assessment using international multi-centre image-based databases is ongoing. External validation could benefit deep learning algorithms and lead to better assessments, decision-making and predictions for surgical outcomes.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 20 pages, 5 figures, 1 table

arXiv:2209.01721 [pdf, other]

An Adaptive Black-box Defense against Trojan Attacks (TrojDef)

Authors: Guanxiong Liu, Abdallah Khreishah, Fatima Sharadgah, Issa Khalil

Abstract: Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries try to exploit the (highly desirable) model reuse property to implant Trojans into model parameters for backdoor breaches through a poisoned training process. Most of the proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of NN or is able to run back-propagation through it. In this work, we propose a more practical black-box defense, dubbed TrojDef, which can only run forward-pass of the NN. TrojDef tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in the prediction confidence when the input is repeatedly perturbed by random noise. We derive a function based on the prediction outputs which is called the prediction confidence bound to decide whether the input example is Trojan or not. The intuition is that Trojan inputs are more stable as the misclassification only depends on the trigger, while benign inputs will suffer when augmented with noise due to the perturbation of the classification features. Through mathematical analysis, we show that if the attacker is perfect in injecting the backdoor, the Trojan infected model will be trained to learn the appropriate prediction confidence bound, which is used to distinguish Trojan and benign inputs under arbitrary perturbations. However, because the attacker might not be perfect in injecting the backdoor, we introduce a nonlinear transform to the prediction confidence bound to improve the detection accuracy in practical settings. Extensive empirical evaluations show that TrojDef significantly outperforms the-state-of-the-art defenses and is highly stable under different settings, even when the classifier architecture, the training process, or the hyper-parameters change.

Submitted 4 September, 2022; originally announced September 2022.

arXiv:2206.05679 [pdf, other]

Exploration of Enterprise Server Data to Assess Ease of Modeling System Behavior

Authors: Enes Altinisik, Husrev Taha Sencar, Mohamed Nabeel, Issa Khalil, Ting Yu

Abstract: Enterprise networks are one of the major targets for cyber attacks due to the vast amount of sensitive and valuable data they contain. A common approach to detecting attacks in the enterprise environment relies on modeling the behavior of users and systems to identify unexpected deviations. The feasibility of this approach crucially depends on how well attack-related events can be isolated from benign and mundane system activities. Despite the significant focus on end-user systems, the background behavior of servers running critical services for the enterprise is less studied. To guide the design of detection methods tailored for servers, in this work, we examine system event records from 46 servers in a large enterprise obtained over a duration of ten weeks. We analyze the rareness characteristics and the similarity of the provenance relations in the event log data. Our findings show that server activity, in general, is highly variant over time and dissimilar across different types of servers. However, careful consideration of profiling window of historical events and service level grouping of servers improve rareness measurements by 24.5%. Further, utilizing better contextual representations, the similarity in provenance relationships could be improved. An important implication of our findings is that detection techniques developed considering experimental setups with non-representative characteristics may perform poorly in practice.

Submitted 12 June, 2022; originally announced June 2022.

arXiv:2205.13155 [pdf, other]

A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs

Authors: Euijin Choo, Mohamed Nabeel, Ravindu De Silva, Ting Yu, Issa Khalil

Abstract: VirusTotal (VT) provides aggregated threat intelligence on various entities including URLs, IP addresses, and binaries. It is widely used by researchers and practitioners to collect ground truth and evaluate the maliciousness of entities. In this work, we provide a comprehensive analysis of VT URL scanning reports containing the results of 95 scanners for 1.577 Billion URLs over two years. Individual VT scanners are known to be noisy in terms of their detection and attack type classification. To obtain high quality ground truth of URLs and actively take proper actions to mitigate different types of attacks, there are two challenges: (1) how to decide whether a given URL is malicious given noisy reports and (2) how to determine attack types (e.g., phishing or malware hosting) that the URL is involved in, given conflicting attack labels from different scanners. In this work, we provide a systematic comparative study on the behavior of VT scanners for different attack types of URLs. A common practice to decide the maliciousness is to use a cut-off threshold of scanners that report the URL as malicious. However, in this work, we show that using a fixed threshold is suboptimal, due to several reasons: (1) correlations between scanners; (2) lead/lag behavior; (3) the specialty of scanners; (4) the quality and reliability of scanners. A common practice to determine an attack type is to use majority voting. However, we show that majority voting could not accurately classify the attack type of a URL due to the bias from correlated scanners. Instead, we propose a machine learning-based approach to assign an attack type to URLs given the VT reports.

Submitted 26 May, 2022; originally announced May 2022.

arXiv:2204.02654 [pdf, other]

Adversarial Analysis of the Differentially-Private Federated Learning in Cyber-Physical Critical Infrastructures

Authors: Md Tamjid Hossain, Shahriar Badsha, Hung La, Haoting Shen, Shafkat Islam, Ibrahim Khalil, Xun Yi

Abstract: Federated Learning (FL) has become increasingly popular to perform data-driven analysis in cyber-physical critical infrastructures. Since the FL process may involve the client's confidential information, Differential Privacy (DP) has been proposed lately to secure it from adversarial inference. However, we find that while DP greatly alleviates the privacy concerns, the additional DP-noise opens a new threat for model poisoning in FL. Nonetheless, very little effort has been made in the literature to investigate this adversarial exploitation of the DP-noise. To overcome this gap, in this paper, we present a novel adaptive model poisoning technique α-MPELM through which an attacker can exploit the additional DP-noise to evade the state-of-the-art anomaly detection techniques and prevent optimal convergence of the FL model. We evaluate our proposed attack on the state-of-the-art anomaly detection approaches in terms of detection accuracy and validation loss. The main significance of our proposed α-MPELM attack is that it reduces the state-of-the-art anomaly detection accuracy by 6.8% for norm detection, 12.6% for accuracy detection, and 13.8% for mix detection. Furthermore, we propose a Reinforcement Learning-based DP level selection process to defend α-MPELM attack. The experimental results confirm that our defense mechanism converges to an optimal privacy policy without human maneuver.

Submitted 1 December, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: 16 pages, 9 figures, 5 tables. This work has been submitted to IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2202.06053 [pdf, other]

Local Differential Privacy for Federated Learning

Authors: M. A. P. Chamikara, Dongxi Liu, Seyit Camtepe, Surya Nepal, Marthie Grobler, Peter Bertok, Ibrahim Khalil

Abstract: Advanced adversarial attacks such as membership inference and model memorization can make federated learning (FL) vulnerable and potentially leak sensitive private data. Local differentially private (LDP) approaches are gaining more popularity due to stronger privacy notions and native support for data distribution compared to other differentially private (DP) solutions. However, DP approaches assume that the FL server (that aggregates the models) is honest (run the FL protocol honestly) or semi-honest (run the FL protocol honestly while also trying to learn as much information as possible). These assumptions make such approaches unrealistic and unreliable for real-world settings. Besides, in real-world industrial environments (e.g., healthcare), the distributed entities (e.g., hospitals) are already composed of locally running machine learning models (this setting is also referred to as the cross-silo setting). Existing approaches do not provide a scalable mechanism for privacy-preserving FL to be utilized under such settings, potentially with untrusted parties. This paper proposes a new local differentially private FL (named LDPFL) protocol for industrial settings. LDPFL can run in industrial settings with untrusted entities while enforcing stronger privacy guarantees than existing approaches. LDPFL shows high FL model performance (up to 98%) under small privacy budgets (e.g., epsilon = 0.5) in comparison to existing methods.

Submitted 3 August, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

Comments: 17 pages

arXiv:2201.12727 [pdf, other]

doi 10.1109/JIOT.2022.3147186

Blockchain based AI-enabled Industry 4.0 CPS Protection against Advanced Persistent Threat

Authors: Ziaur Rahman, Xun Yi, Ibrahim Khalil

Abstract: Industry 4.0 is all about doing things in a concurrent, secure, and fine-grained manner. IoT edge-sensors and their associated data play a predominant role in today's industry ecosystem. Breaching data or forging source devices after injecting advanced persistent threats (APT) damages the industry owners' money and loss of operators' lives. The existing challenges include APT injection attacks targeting vulnerable edge devices, insecure data transportation, trust inconsistencies among stakeholders, incompliant data storing mechanisms, etc. Edge-servers often suffer because of their lightweight computation capacity to stamp out unauthorized data or instructions, which in essence, makes them exposed to attackers. When attackers target edge servers while transporting data using traditional PKI-rendered trusts, consortium blockchain (CBC) offers proven techniques to transfer and maintain those sensitive data securely. With the recent improvement of edge machine learning, edge devices can filter malicious data at their end which largely motivates us to institute a Blockchain and AI aligned APT detection system. The unique contributions of the paper include efficient APT detection at the edge and transparent recording of the detection history in an immutable blockchain ledger. In line with that, the certificateless data transfer mechanism boost trust among collaborators and ensure an economical and sustainable mechanism after eliminating existing certificate authority. Finally, the edge-compliant storage technique facilitates efficient predictive maintenance. The respective experimental outcomes reveal that the proposed technique outperforms the other competing systems and models.

Submitted 30 January, 2022; originally announced January 2022.

Comments: 10 Pages, 9 Figures, 3 Tables Published in the IEEE Internet of Things Journal

ACM Class: I.2; J.6

Journal ref: IEEE Internet of Things Journal Jan 2022

arXiv:2201.07063 [pdf, other]

How to Backdoor HyperNetwork in Personalized Federated Learning?

Authors: Phung Lai, NhatHai Phan, Issa Khalil, Abdallah Khreishah, Xintao Wu

Abstract: This paper explores previously unknown backdoor risks in HyperNet-based personalized federated learning (HyperNetFL) through poisoning attacks. Based upon that, we propose a novel model transferring attack (called HNTroj), i.e., the first of its kind, to transfer a local backdoor infected model to all legitimate and personalized local models, which are generated by the HyperNetFL model, through consistent and effective malicious local gradients computed across all compromised clients in the whole training process. As a result, HNTroj reduces the number of compromised clients needed to successfully launch the attack without any observable signs of sudden shifts or degradation regarding model utility on legitimate data samples making our attack stealthy. To defend against HNTroj, we adapted several backdoor-resistant FL training algorithms into HyperNetFL. An extensive experiment that is carried out using several benchmark datasets shows that HNTroj significantly outperforms data poisoning and model replacement attacks and bypasses robust training algorithms even with modest numbers of compromised clients.

Submitted 11 December, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

arXiv:2112.11547 [pdf, other]

Decompose the Sounds and Pixels, Recompose the Events

Authors: Varshanth R. Rao, Md Ibrahim Khalil, Haoda Li, Peng Dai, Juwei Lu

Abstract: In this paper, we propose a framework centering around a novel architecture called the Event Decomposition Recomposition Network (EDRNet) to tackle the Audio-Visual Event (AVE) localization problem in the supervised and weakly supervised settings. AVEs in the real world exhibit common unravelling patterns (termed as Event Progress Checkpoints (EPC)), which humans can perceive through the cooperation of their auditory and visual senses. Unlike earlier methods which attempt to recognize entire event sequences, the EDRNet models EPCs and inter-EPC relationships using stacked temporal convolutions. Based on the postulation that EPC representations are theoretically consistent for an event category, we introduce the State Machine Based Video Fusion, a novel augmentation technique that blends source videos using different EPC template sequences. Additionally, we design a new loss function called the Land-Shore-Sea loss to compactify continuous foreground and background representations. Lastly, to alleviate the issue of confusing events during weak supervision, we propose a prediction stabilization method called Bag to Instance Label Correction. Experiments on the AVE dataset show that our collective framework outperforms the state-of-the-art by a sizable margin.

Submitted 21 December, 2021; originally announced December 2021.

Comments: Accepted at AAAI 2022

arXiv:2111.11161 [pdf, other]

doi 10.1007/978-3-030-91424-0_11

Chaos and Logistic Map based Key Generation Technique for AES-driven IoT Security

Authors: Ziaur Rahman, Xun Yi, Ibrahim Khalil, Mousumi Sumi

Abstract: Several efforts have been seen claiming the lightweight block ciphers as a necessarily suitable substitute in securing the Internet of Things. Currently, it has been able to envisage as a pervasive frame of reference almost all across the privacy preserving of smart and sensor-oriented appliances. Different approaches are likely to be inefficient, bringing desired degree of security considering the easiness and surely the process of simplicity but security. Strengthening the well-known symmetric key and block dependent algorithm using either chaos motivated logistic map or elliptic curve has shown a far-reaching potential to be a discretion in secure real-time communication. The popular feature of logistic maps, such as the un-foreseeability and randomness often expected to be used in dynamic key-propagation in sync with chaos and scheduling technique towards data integrity. As a bit alternation in keys, able to come up with oversize deviation, also would have consequence to leverage data confidentiality. Henceforth it may have proximity to time consumption, which may lead to a challenge to make sure instant data exchange between participating node entities. In consideration of delay latency required to both secure encryption and decryption, the proposed approach suggests a modification on the key-origination matrix along with S-box. It has plausibly been taken us to this point that the time required proportionate to the plain-text sent while the plain-text disproportionate to the probability happening a letter on the message made. In line with that the effort so far sought how apparent chaos escalates the desired key-initiation before message transmission.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 17 Pages, 3 Tables, 7 Figures, Conference

ACM Class: H.4.1

arXiv:2111.11158 [pdf, other]

doi 10.1007/978-3-030-91424-0_4

Blockchain for IoT: A Critical Analysis Concerning Performance and Scalability

Authors: Ziaur Rahman, Xun Yi, Ibrahim Khalil, Andrei Kelarev

Abstract: The world has been experiencing a mind-blowing expansion of blockchain technology since it was first introduced as an emerging means of cryptocurrency called bitcoin. Currently, it has been regarded as a pervasive frame of reference across almost all research domains, ranging from virtual cash to agriculture or even supply-chain to the Internet of Things. The ability to have a self-administering register with legitimate immutability makes blockchain appealing for the Internet of Things (IoT). As billions of IoT devices are now online in distributed fashion, the huge challenges and questions require to addressed in pursuit of urgently needed solutions. The present paper has been motivated by the aim of facilitating such efforts. The contribution of this work is to figure out those trade-offs the IoT ecosystem usually encounters because of the wrong choice of blockchain technology. Unlike a survey or review, the critical findings of this paper target sorting out specific security challenges of blockchain-IoT Infrastructure. The contribution includes how to direct developers and researchers in this domain to pick out the unblemished combinations of Blockchain enabled IoT applications. In addition, the paper promises to bring a deep insight on Ethereum, Hyperledger blockchain and IOTA technology to show their limitations and prospects in terms of performance and scalability.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 18 Pages, 9 Figures, 1 Table Conference

ACM Class: H.1.1

arXiv:2110.10027 [pdf]

Clinical Trial Information Extraction with BERT

Authors: Xiong Liu, Greg L. Hersch, Iya Khalil, Murthy Devarakonda

Abstract: Natural language processing (NLP) of clinical trial documents can be useful in new trial design. Here we identify entity types relevant to clinical trial design and propose a framework called CT-BERT for information extraction from clinical trial text. We trained named entity recognition (NER) models to extract eligibility criteria entities by fine-tuning a set of pre-trained BERT models. We then compared the performance of CT-BERT with recent baseline methods including attention-based BiLSTM and Criteria2Query. The results demonstrate the superiority of CT-BERT in clinical trial NLP.

Submitted 11 September, 2021; originally announced October 2021.

Comments: HealthNLP 2021, IEEE International Conference on Healthcare Informatics (ICHI 2021)

arXiv:2109.02808 [pdf, other]

A Scalable AI Approach for Clinical Trial Cohort Optimization

Authors: Xiong Liu, Cheng Shi, Uday Deore, Yingbo Wang, Myah Tran, Iya Khalil, Murthy Devarakonda

Abstract: FDA has been promoting enrollment practices that could enhance the diversity of clinical trial populations, through broadening eligibility criteria. However, how to broaden eligibility remains a significant challenge. We propose an AI approach to Cohort Optimization (AICO) through transformer-based natural language processing of the eligibility criteria and evaluation of the criteria using real-world data. The method can extract common eligibility criteria variables from a large set of relevant trials and measure the generalizability of trial designs to real-world patients. It overcomes the scalability limits of existing manual methods and enables rapid simulation of eligibility criteria design for a disease of interest. A case study on breast cancer trial design demonstrates the utility of the method in improving trial generalizability.

Submitted 6 September, 2021; originally announced September 2021.

Comments: PharML 2021 (Machine Learning for Pharma and Healthcare Applications) at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

arXiv:2109.01275 [pdf, other]

A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples

Authors: Guanxiong Liu, Issa Khalil, Abdallah Khreishah, NhatHai Phan

Abstract: In this work, we show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model. We leverage adversarial noise in the input space to move Trojan-infected examples across the model decision boundary, making it difficult to detect. The stealthy behavior of AdvTrojan fools users into accidentally trusting the infected model as a robust classifier against adversarial examples. AdvTrojan can be implemented by only poisoning the training data, similar to conventional Trojan backdoor attacks. Our thorough analysis and extensive experiments on several benchmark datasets show that AdvTrojan can bypass existing defenses with a success rate close to 100% in most of our experimental scenarios and can be extended to attack federated learning tasks as well.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2106.13339 [pdf, other]

doi 10.1109/MCOM.001.2000679

Blockchain-based Security Framework for Critical Industry 4.0 Cyber-physical System

Authors: Ziaur Rahman, Ibrahim Khalil, Xun Yi, Mohammed Atiquzzaman

Abstract: There has been an intense concern for security alternatives because of the recent rise of cyber attacks, mainly targeting critical systems such as industry, medical, or energy ecosystems. Though the latest industry infrastructures largely depend on AI-driven maintenance, prediction based on corrupted data undoubtedly results in loss of life and capital. Admittedly, an inadequate data-protection mechanism can readily challenge the security and reliability of the network. The shortcomings of the conventional cloud or trusted certificate-driven techniques have motivated us to exhibit a unique Blockchain-based framework for a secure and efficient industry 4.0 system. The demonstrated framework obviates the long-established certificate authority after enhancing the consortium Blockchain that reduces the data processing delay and increases cost-effective throughput. Nonetheless, the distributed industry 4.0 security model entails cooperative trust rather than depending on a single party, which in essence indulges the costs and threat of the single point of failure. Therefore, the multi-signature technique of the proposed framework accomplishes the multi-party authentication, which confirms its applicability for the real-time and collaborative cyber-physical system.

Submitted 24 June, 2021; originally announced June 2021.

Comments: 07 Pages, 4 Figures, IEEE Communication Magazine

ACM Class: E.3

Journal ref: in IEEE Communications Magazine, vol. 59, no. 5, pp. 128-134, May 2021

arXiv:2106.07466 [pdf, other]

doi 10.1063/5.0098008

An open-source automated magnetic optical density meter for analysis of suspensions of magnetic cells and particles

Authors: Marcel K. Welleweerd, Tijmen Hageman, Marc Pichel, Dave van As, Hans Keizer, Jordi Hendrix, Mina M. Micheal, Islam S. M. Khalil, Alveena Mir, Nuriye Korkmaz, Robbert Kräwinkel, Daniel Chevrier, Damien Faivre, Alfred Fernandez-Castane, Daniel Pfeiffer, Leon Abelmann

Abstract: We present a spectrophotometer (optical density meter) combined with electromagnets dedicated to the analysis of suspensions of magnetotactic bacteria. The instrument can also be applied to suspensions of other magnetic cells and magnetic particles. We have ensured that our system, called MagOD, can be easily reproduced by providing the source of the 3D prints for the housing, electronic designs, circuit board layouts, and microcontroller software. We compare the performance of our system to existing adapted commercial spectrophotometers. In addition, we demonstrate its use by analyzing the absorbance of magnetotactic bacteria as a function of their orientation with respect to the light path and their speed of reorientation after the field has been rotated by 90 degrees. We continuously monitored the development of a culture of magnetotactic bacteria over a period of five days, and measured the development of their velocity distribution over a period of one hour. Even though this dedicated spectrophotometer is relatively simple to construct and cost-effective, a range of magnetic field-dependent parameters can be extracted from suspensions of magnetotactic bacteria. Therefore, this instrument will help the magnetotactic research community to understand and apply this intriguing micro-organism.

Submitted 11 August, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2103.04673 [pdf, other]

doi 10.1145/3446372

Social Media Identity Deception Detection: A Survey

Authors: Ahmed Alharbi, Hai Dong, Xun Yi, Zahir Tari, Ibrahim Khalil

Abstract: Social media have been growing rapidly and become essential elements of many people's lives. Meanwhile, social media have also come to be a popular source for identity deception. Many social media identity deception cases have arisen over the past few years. Recent studies have been conducted to prevent and detect identity deception. This survey analyses various identity deception attacks, which can be categorized into fake profile, identity theft and identity cloning. This survey provides a detailed review of social media identity deception detection techniques. It also identifies primary research challenges and issues in the existing detection techniques. This article is expected to benefit both researchers and social media providers.

Submitted 22 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted for publication in ACM Computing Surveys

Journal ref: ACM Computing Surveys (CSUR), 54(3), 1-35 (2021)

arXiv:2012.13971 [pdf, other]

Time-Window Group-Correlation Support vs. Individual Features: A Detection of Abnormal Users

Authors: Lun-Pin Yuan, Euijin Choo, Ting Yu, Issa Khalil, Sencun Zhu

Abstract: Autoencoder-based anomaly detection methods have been used in identifying anomalous users from large-scale enterprise logs with the assumption that adversarial activities do not follow past habitual patterns. Most existing approaches typically build models by reconstructing single-day and individual-user behaviors. However, without capturing long-term signals and group-correlation signals, the models cannot identify low-signal yet long-lasting threats, and will wrongly report many normal users as anomalies on busy days, which, in turn, leads to a high false positive rate. In this paper, we propose ACOBE, an Anomaly detection method based on COmpound BEhavior, which takes into consideration long-term patterns and group behaviors. ACOBE leverages a novel behavior representation and an ensemble of deep autoencoders and produces an ordered investigation list. Our evaluation shows that ACOBE outperforms prior work by a large margin in terms of precision and recall, and our case study demonstrates that ACOBE is applicable in practice for cyberattack detection.

Submitted 27 December, 2020; originally announced December 2020.

arXiv:2012.10063 [pdf]

Attention-Based LSTM Network for COVID-19 Clinical Trial Parsing

Authors: Xiong Liu, Luca A. Finelli, Greg L. Hersch, Iya Khalil

Abstract: COVID-19 clinical trial design is a critical task in developing therapeutics for the prevention and treatment of COVID-19. In this study, we apply a deep learning approach to extract eligibility criteria variables from COVID-19 trials to enable quantitative analysis of trial design and optimization. Specifically, we train attention-based bidirectional Long Short-Term Memory (Att-BiLSTM) models and use the optimal model to extract entities (i.e., variables) from the eligibility criteria of COVID-19 trials. We compare the performance of Att-BiLSTM with a traditional ontology-based method. The result on a benchmark dataset shows that Att-BiLSTM outperforms the ontology model. Att-BiLSTM achieves a precision of 0.942, recall of 0.810, and F1 of 0.871, while the ontology model only achieves a precision of 0.715, recall of 0.659, and F1 of 0.686. Our analyses demonstrate that Att-BiLSTM is an effective approach for characterizing patient populations in COVID-19 clinical trials.

Submitted 18 December, 2020; originally announced December 2020.

Journal ref: 2020 IEEE International Conference on Big Data (IEEE BigData 2020)
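
Note: the F1 values quoted in the Att-BiLSTM abstract above are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The short check below is only an illustrative sketch; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as quoted in the abstract.
att_bilstm_f1 = f1_score(0.942, 0.810)
ontology_f1 = f1_score(0.715, 0.659)

print(f"Att-BiLSTM F1: {att_bilstm_f1:.3f}")  # rounds to 0.871
print(f"Ontology F1:   {ontology_f1:.3f}")    # rounds to 0.686
```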

arXiv:2011.06933 [pdf, other]

Morshed: Guiding Behavioral Decision-Makers towards Better Security Investment in Interdependent Systems

Authors: Mustafa Abdallah, Daniel Woods, Parinaz Naghizadeh, Issa Khalil, Timothy Cason, Shreyas Sundaram, Saurabh Bagchi

Abstract: We model the behavioral biases of human decision-making in securing interdependent systems and show that such behavioral decision-making leads to a suboptimal pattern of resource allocation compared to non-behavioral (rational) decision-making. We provide empirical evidence for the existence of such a behavioral bias model through a controlled subject study with 145 participants. We then propose three learning techniques for enhancing decision-making in multi-round setups. We illustrate the benefits of our decision-making model through multiple interdependent real-world systems and quantify the level of gain compared to the case in which the defenders are behavioral. We also show the benefit of our learning techniques against different attack models. We identify the effects of different system parameters on the degree of suboptimality of security outcomes due to behavioral decision-making.

Submitted 22 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: Accepted to appear at the 16th ACM Asia Conference on Computer and Communications Security (ASIACCS), 2021. arXiv admin note: text overlap with arXiv:2004.01958

arXiv:2007.05817 [pdf, other]

ManiGen: A Manifold Aided Black-box Generator of Adversarial Examples

Authors: Guanxiong Liu, Issa Khalil, Abdallah Khreishah, Abdulelah Algosaibi, Adel Aldalbahi, Mohammed Alaneem, Abdulaziz Alhumam, Mohammed Anan

Abstract: Machine learning models, especially neural network (NN) classifiers, have acceptable performance and accuracy that leads to their wide adoption in different aspects of our daily lives. The underlying assumption is that these models are generated and used in attack-free scenarios. However, it has been shown that neural network based classifiers are vulnerable to adversarial examples. Adversarial examples are inputs with special perturbations that are ignored by human eyes but can mislead NN classifiers. Most of the existing methods for generating such perturbations require a certain level of knowledge about the target classifier, which makes them not very practical. For example, some generators require knowledge of pre-softmax logits while others utilize prediction scores. In this paper, we design a practical black-box adversarial example generator, dubbed ManiGen. ManiGen does not require any knowledge of the inner state of the target classifier. It generates adversarial examples by searching along the manifold, which is a concise representation of input data. Through an extensive set of experiments on different datasets, we show that (1) adversarial examples generated by ManiGen can mislead standalone classifiers by being as successful as the state-of-the-art white-box generator, Carlini, and (2) adversarial examples generated by ManiGen can more effectively attack classifiers with state-of-the-art defenses.

Submitted 11 July, 2020; originally announced July 2020.

arXiv:2007.02013 [pdf, other]

doi 10.1016/j.comcom.2021.04.006

PPaaS: Privacy Preservation as a Service

Authors: Pathum Chamikara Mahawaga Arachchige, Peter Bertok, Ibrahim Khalil, Dongxi Liu, Seyit Camtepe

Abstract: Personally identifiable information (PII) can find its way into cyberspace through various channels, and many potential sources can leak such information. Data sharing (e.g. cross-agency data sharing) for machine learning and analytics is one of the important components in data science. However, due to privacy concerns, data should be enforced with strong privacy guarantees before sharing. Different privacy-preserving approaches were developed for privacy preserving data sharing; however, identifying the best privacy-preservation approach for the privacy-preservation of a certain dataset is still a challenge. Different parameters can influence the efficacy of the process, such as the characteristics of the input dataset, the strength of the privacy-preservation approach, and the expected level of utility of the resulting dataset (on the corresponding data mining application such as classification). This paper presents a framework named Privacy Preservation as a Service (PPaaS) to reduce this complexity. The proposed method employs selective privacy preservation via data perturbation and looks at different dynamics that can influence the quality of the privacy preservation of a dataset. PPaaS includes pools of data perturbation methods, and for each application and the input dataset, PPaaS selects the most suitable data perturbation approach after rigorous evaluation. It enhances the usability of privacy-preserving methods within its pool; it is a generic platform that can be used to sanitize big data in a granular, application-specific manner by employing a suitable combination of diverse privacy-preserving algorithms to provide a proper balance between privacy and utility.

Submitted 21 April, 2021; v1 submitted 4 July, 2020; originally announced July 2020.

arXiv:2006.03208 [pdf, other]

Can the Multi-Incoming Smart Meter Compressed Streams be Re-Compressed?

Authors: Sharif Abuadbba, Ayman Ibaida, Ibrahim Khalil, Naveen Chilamkurti, Surya Nepal, Xinghuo Yu

Abstract: Smart meters have currently attracted attention because of their high efficiency and throughput performance. They transmit a massive volume of continuously collected waveform readings (e.g. monitoring). Although many compression models are proposed, the unexpected size of these compressed streams requires endless storage and management space, which poses a unique challenge. Therefore, this paper explores the question of whether compressed smart meter readings can be re-compressed. We first investigate the applicability of re-applying general compression algorithms directly on compressed streams. The results were poor due to the lack of redundancy. We further propose a novel technique to enhance the theoretical entropy and exploit that to re-compress. This is successfully achieved by using unsupervised learning as a similarity measurement to cluster the compressed streams into subgroups. The streams in every subgroup have been interleaved, followed by the first derivative to minimize the values and increase the redundancy. After that, two rotation steps have been applied to rearrange the readings in a more consecutive format before applying a developed dynamic run length. Finally, entropy coding is performed. Both mathematical and empirical experiments proved the significant improvement of the compressed streams entropy (i.e. almost reduced by half) and the resultant compression ratio (i.e. up to 50%).

Submitted 4 June, 2020; originally announced June 2020.

Comments: 8 pages. Submitted to IEEE Transaction on Smart Grid
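
Note: the smart-meter abstract above mentions taking the first derivative of the readings to shrink values and increase redundancy before a run-length step. The sketch below illustrates only that general idea with plain first differences and ordinary run-length encoding. It is not the authors' pipeline: the clustering, interleaving, rotation, "dynamic run length", and entropy-coding stages are omitted, and the sample readings and function names are hypothetical.

```python
from itertools import accumulate, groupby


def first_difference(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def undo_first_difference(deltas):
    """Invert first_difference with a running sum."""
    return list(accumulate(deltas))


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical slowly varying readings: no repeats for RLE to exploit directly.
readings = [100, 101, 102, 103, 104, 104, 104, 105, 106, 107]

deltas = first_difference(readings)   # small, highly repetitive values
encoded = run_length_encode(deltas)   # far fewer symbols than the raw stream

print(deltas)   # [100, 1, 1, 1, 1, 0, 0, 1, 1, 1]
print(encoded)  # [(100, 1), (1, 4), (0, 2), (1, 3)]

# The transformation is lossless: decoding restores the readings exactly.
restored = undo_first_difference(run_length_decode(encoded))
```

Run-length encoding applied to `readings` directly would gain almost nothing, because few adjacent values are equal; after differencing, long runs of identical small deltas appear, which is the redundancy the abstract refers to.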

arXiv:2005.10486 [pdf, other]

doi 10.1016/j.cose.2020.101951

Privacy Preserving Face Recognition Utilizing Differential Privacy

Authors: M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe

Abstract: Facial recognition technologies are implemented in many areas, including but not limited to, citizen surveillance, crime control, activity monitoring, and facial expression evaluation. However, processing biometric information is a resource-intensive task that often involves third-party servers, which can be accessed by adversaries with malicious intent. Biometric information delivered to untrusted third-party servers in an uncontrolled manner can be considered a significant privacy leak (i.e. uncontrolled information release) as biometrics can be correlated with sensitive data such as healthcare or financial records. In this paper, we propose a privacy-preserving technique for "controlled information release", where we disguise an original face image and prevent leakage of the biometric features while identifying a person. We introduce a new privacy-preserving face recognition protocol named PEEP (Privacy using EigEnface Perturbation) that utilizes local differential privacy. PEEP applies perturbation to Eigenfaces utilizing differential privacy and stores only the perturbed data in the third-party servers to run a standard Eigenface recognition algorithm. As a result, the trained model will not be vulnerable to privacy attacks such as membership inference and model memorization attacks. Our experiments show that PEEP exhibits a classification accuracy of around 70% - 90% under standard privacy settings.

Submitted 4 July, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

arXiv:2004.12108 [pdf, other]

doi 10.1016/j.comcom.2021.02.014

Privacy Preserving Distributed Machine Learning with Federated Learning

Authors: M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe

Abstract: Edge computing and distributed machine learning have advanced to a level that can revolutionize a particular organization. Distributed devices such as the Internet of Things (IoT) often produce a large amount of data, eventually resulting in big data that can be vital in uncovering hidden patterns and other insights in numerous fields such as healthcare, banking, and policing. Data related to areas such as healthcare and banking can contain potentially sensitive data that can become public if they are not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the learning of an ML model to the data owners. However, literature shows different attack methods such as membership inference that exploit the vulnerabilities of ML models as well as the coordinating servers to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than available in a standard computer. This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB, for privacy preservation of horizontally partitioned data. DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation utilizing the asymmetry of resources of a distributed environment, which can have resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, high efficiency, high scalability, and high attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an excellent solution to stop privacy leaks in DML while preserving high data utility.

Submitted 25 February, 2021; v1 submitted 25 April, 2020; originally announced April 2020.

arXiv:2004.01958 [pdf, other]

BASCPS: How does behavioral decision making impact the security of cyber-physical systems?

Authors: Mustafa Abdallah, Daniel Woods, Parinaz Naghizadeh, Issa Khalil, Timothy Cason, Shreyas Sundaram, Saurabh Bagchi

Abstract: We study the security of large-scale cyber-physical systems (CPS) consisting of multiple interdependent subsystems, each managed by a different defender. Defenders invest their security budgets with the goal of thwarting the spread of cyber attacks to their critical assets. We model the security investment decisions made by the defenders as a security game. While prior work has used security games to analyze such scenarios, we propose behavioral security games, in which defenders exhibit characteristics of human decision making that have been identified in behavioral economics as representing typical human cognitive biases. This is important as many of the critical security decisions in our target class of systems are made by humans. We provide empirical evidence for our behavioral model through a controlled subject experiment. We then show that behavioral decision making leads to a suboptimal pattern of resource allocation compared to non-behavioral decision making. We illustrate the effects of behavioral decision making using two representative real-world interdependent CPS. In particular, we identify the effects of the defenders' security budget availability and distribution, the degree of interdependency among defenders, and collaborative defense strategies, on the degree of suboptimality of security outcomes due to behavioral decision making. In this context, the adverse effects of behavioral decision making are most severe with moderate defense budgets. Moreover, the impact of behavioral suboptimal decision making is magnified as the degree of the interdependency between subnetworks belonging to different defenders increases. We also observe that selfish defense decisions together with behavioral decisions significantly increase security risk.

Submitted 7 April, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

Comments: 32 pages

arXiv:2003.13721 [pdf, other]

Amharic Abstractive Text Summarization

Authors: Amr M. Zaki, Mahmoud I. Khalil, Hazem M. Abbas

Abstract: Text Summarization is the task of condensing long text into just a handful of sentences. Many approaches have been proposed for this task. Some of the very first built statistical models (Extractive Methods) capable of selecting important words and copying them to the output. However, these models lacked the ability to paraphrase sentences, as they simply select important words without actually understanding their contexts or their meaning. This motivates the use of Deep Learning based architectures (Abstractive Methods), which effectively try to understand the meaning of sentences to build meaningful summaries. In this work we discuss one of these novel approaches, which combines curriculum learning with Deep Learning; this model is called Scheduled Sampling. We apply this work to one of the most widely spoken African languages, the Amharic Language, as we try to enrich the African NLP community with top-notch Deep Learning architectures.

Submitted 30 March, 2020; originally announced March 2020.

Comments: content 3 pages, reference 2 pages, 2 figures, presented to AfricaNLP workshop ICLR 2020

arXiv:2002.09632 [pdf, other]

Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples

Authors: Guanxiong Liu, Issa Khalil, Abdallah Khreishah

Abstract: Adversarial examples have become one of the largest challenges that machine learning models, especially neural network classifiers, face. These adversarial examples break the assumption of an attack-free scenario and fool state-of-the-art (SOTA) classifiers with perturbations that are insignificant to humans. So far, researchers have achieved great progress in utilizing adversarial training as a defense. However, the overwhelming computational cost degrades its applicability and little has been done to overcome this issue. Single-step adversarial training methods have been proposed as computationally viable solutions; however, they still fail to defend against iterative adversarial examples. In this work, we first experimentally analyze several different SOTA defense methods against adversarial examples. Then, based on observations from experiments, we propose a novel single-step adversarial training method which can defend against both single-step and iterative adversarial examples. Lastly, through extensive evaluations, we demonstrate that our proposed method outperforms the SOTA single-step and iterative adversarial training defense. Compared with ATDA (single-step method) on the CIFAR10 dataset, our proposed method achieves 35.67% enhancement in test accuracy and 19.14% reduction in training time. When compared with methods that use BIM or Madry examples (iterative methods) on the CIFAR10 dataset, it saves up to 76.03% in training time with less than 3.78% degeneration in test accuracy.

Submitted 27 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

arXiv:1911.12080 [pdf, other]

DeviceWatch: Identifying Compromised Mobile Devices through Network Traffic Analysis and Graph Inference

Authors: Euijin Choo, Mohamed Nabeel, Mashael Alsabah, Issa Khalil, Ting Yu, Wei Wang

Abstract: In this paper, we propose to identify compromised mobile devices from a network administrator's point of view. Intuitively, inadvertent users (and thus their devices) who download apps through untrustworthy markets are often allured to install malicious apps through in-app advertisement or phishing. We thus hypothesize that devices sharing a similar set of apps will have a similar probability of being compromised, resulting in the association between a device being compromised and apps in the device. Our goal is to leverage such associations to identify unknown compromised devices (i.e., devices possibly having yet currently not having known malicious apps) using the guilt-by-association principle. Admittedly, such associations could be quite weak as it is often hard, if not impossible, for an app to automatically download and install other apps without explicit initiation from a user. We describe how we can magnify such weak associations between devices and apps by carefully choosing parameters when applying graph-based inferences. We empirically show the effectiveness of our approach with a comprehensive study on the mobile network traffic provided by a major mobile service provider. Concretely, we achieve nearly 98% accuracy in terms of AUC (area under the ROC curve). Given the relatively weak nature of association, we further conduct in-depth analysis of the different behavior of a graph-inference approach, by comparing it to active DNS data. Moreover, we validate our results by showing that detected compromised devices indeed present undesirable behavior in terms of their privacy leakage and network infrastructure accessed.

Submitted 27 November, 2019; originally announced November 2019.

arXiv:1911.00604 [pdf, ps, other]

IoTSign: Protecting Privacy and Authenticity of IoT using Discrete Cosine Based Steganography

Authors: Sharif Abuadbba, Ayman Ibaida, Ibrahim Khalil

Abstract: Remotely generated data by Internet of Things (IoT) has recently had a lot of attention for their huge benefits such as efficient monitoring and risk reduction. The transmitted streams usually consist of periodical streams (e.g. activities) and highly private information (e.g. IDs). Despite the obvious benefits, the concerns are the secrecy and the originality of the transferred data. Surprisingly, although these concerns have been well studied for static data, they have received only limited attention for streaming data. Therefore, this paper introduces a new steganographic mechanism that provides (1) robust privacy protection of secret information by concealing them arbitrarily in the transported readings employing a random key, and (2) permanent proof of originality for the normal streams. This model surpasses our previous works by employing the Discrete Cosine Transform to expand the hiding capacity and reduce complexity. The resultant distortion has been accurately measured at all stages - the original, the stego, and the recovered forms - using a well-known measurement matrix called Percentage Residual Difference (PRD). After thorough experiments on three types of streams (i.e. chemical, environmental and smart homes), it has been proven that the original streams have not been affected (< 1 %). Also, the mathematical analysis shows that the model has much lighter (i.e. linear) computational complexity O(n) compared to existing work.

Submitted 1 April, 2022; v1 submitted 1 November, 2019; originally announced November 2019.

Comments: 12 pages

arXiv:1908.02997 [pdf, other]

doi 10.1109/JIOT.2019.2952146

Local Differential Privacy for Deep Learning

Authors: M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe, M. Atiquzzaman

Abstract: The internet of things (IoT) is transforming major industries including but not limited to healthcare, agriculture, finance, energy, and transportation. IoT platforms are continually improving with innovations such as the amalgamation of software-defined networks (SDN) and network function virtualization (NFV) in the edge-cloud interplay. Deep learning (DL) is becoming popular due to its remarkable accuracy when trained with a massive amount of data, such as generated by IoT. However, DL algorithms tend to leak privacy when trained on highly sensitive crowd-sourced data such as medical data. Existing privacy-preserving DL algorithms rely on the traditional server-centric approaches requiring high processing powers. We propose a new local differentially private (LDP) algorithm named LATENT that redesigns the training process. LATENT enables a data owner to add a randomization layer before data leave the data owners' devices and reach a potentially untrusted machine learning service. This feature is achieved by splitting the architecture of a convolutional neural network (CNN) into three layers: (1) convolutional module, (2) randomization module, and (3) fully connected module. Hence, the randomization module can operate as an NFV privacy preservation service in an SDN-controlled NFV, making LATENT more practical for IoT-driven cloud-based environments compared to existing approaches. The randomization module employs a newly proposed LDP protocol named utility enhancing randomization, which allows LATENT to maintain high utility compared to existing LDP protocols. Our experimental evaluation of LATENT on convolutional deep neural networks demonstrates excellent accuracy (e.g. 91%-96%) with high model quality even under low privacy budgets (e.g. $\varepsilon=0.5$).

Submitted 9 November, 2019; v1 submitted 8 August, 2019; originally announced August 2019.

Showing 1–50 of 69 results for author: Khalil, I