Cryptography and Security
See recent articles
- [1] arXiv:2409.12314 [pdf, html, other]
-
Title: Understanding Implosion in Text-to-Image Generative ModelsComments: ACM CCS 2024Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Recent works show that text-to-image generative models are surprisingly vulnerable to a variety of poisoning attacks. Empirical results find that these models can be corrupted by altering associations between individual text prompts and associated visual features. Furthermore, a number of concurrent poisoning attacks can induce "model implosion," where the model becomes unable to produce meaningful images for unpoisoned prompts. These intriguing findings highlight the absence of an intuitive framework to understand poisoning attacks on these models. In this work, we establish the first analytical framework on robustness of image generative models to poisoning attacks, by modeling and analyzing the behavior of the cross-attention mechanism in latent diffusion models. We model cross-attention training as an abstract problem of "supervised graph alignment" and formally quantify the impact of training data by the hardness of alignment, measured by an Alignment Difficulty (AD) metric. The higher the AD, the harder the alignment. We prove that AD increases with the number of individual prompts (or concepts) poisoned. As AD grows, the alignment task becomes increasingly difficult, yielding highly distorted outcomes that frequently map meaningful text prompts to undefined or meaningless visual representations. As a result, the generative model implodes and outputs random, incoherent images at large. We validate our analytical framework through extensive experiments, and we confirm and explain the unexpected (and unexplained) effect of model implosion while producing new, unforeseen insights. Our work provides a useful tool for studying poisoning attacks against diffusion models and their defenses.
- [2] arXiv:2409.12331 [pdf, html, other]
-
Title: FuzzEval: Assessing Fuzzers on Generating Context-Sensitive InputsSubjects: Cryptography and Security (cs.CR)
Cryptographic protocols form the backbone of modern security systems, yet vulnerabilities persist within their implementations. Traditional testing techniques, including fuzzing, have struggled to effectively identify vulnerabilities in cryptographic libraries due to their reliance on context-sensitive inputs. This paper presents a comprehensive evaluation of eleven state-of-the-art fuzzers' ability to generate context-sensitive inputs for testing a cryptographic standard, PKCS#1-v1.5, across thirteen implementations. Our study reveals nuanced performance differences among the fuzzers in terms of the validity and diversity of the produced inputs. This investigation underscores the limitations of existing fuzzers in handling context-sensitive inputs. These findings are expected to drive further research and development in this area.
- [3] arXiv:2409.12341 [pdf, html, other]
-
Title: Provable Privacy Guarantee for Individual Identities and Locations in Large-Scale Contact TracingSubjects: Cryptography and Security (cs.CR)
The task of infectious disease contact tracing is crucial yet challenging, especially when meeting strict privacy requirements. Previous attempts in this area have had limitations in terms of applicable scenarios and efficiency. Our paper proposes a highly scalable, practical contact tracing system called PREVENT that can work with a variety of location collection methods to gain a comprehensive overview of a person's trajectory while ensuring the privacy of individuals being tracked, without revealing their plain text locations to any party, including servers. Our system is very efficient and can provide real-time query services for large-scale datasets with millions of locations. This is made possible by a newly designed secret-sharing based architecture that is tightly integrated into unique private space partitioning trees. Notably, our experimental results on both real and synthetic datasets demonstrate that our system introduces negligible performance overhead compared to traditional contact tracing methods. PREVENT could be a game-changer in the fight against infectious diseases and set a new standard for privacy-preserving location tracking.
- [4] arXiv:2409.12472 [pdf, html, other]
-
Title: TEAM: Temporal Adversarial Examples Attack Model against Network Intrusion Detection System Applied to RNNSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
With the development of artificial intelligence, neural networks play a key role in network intrusion detection systems (NIDS). Despite the tremendous advantages, neural networks are susceptible to adversarial attacks. To improve the reliability of NIDS, many research has been conducted and plenty of solutions have been proposed. However, the existing solutions rarely consider the adversarial attacks against recurrent neural networks (RNN) with time steps, which would greatly affect the application of NIDS in real world. Therefore, we first propose a novel RNN adversarial attack model based on feature reconstruction called \textbf{T}emporal adversarial \textbf{E}xamples \textbf{A}ttack \textbf{M}odel \textbf{(TEAM)}, which applied to time series data and reveals the potential connection between adversarial and time steps in RNN. That is, the past adversarial examples within the same time steps can trigger further attacks on current or future original examples. Moreover, TEAM leverages Time Dilation (TD) to effectively mitigates the effect of temporal among adversarial examples within the same time steps. Experimental results show that in most attack categories, TEAM improves the misjudgment rate of NIDS on both black and white boxes, making the misjudgment rate reach more than 96.68%. Meanwhile, the maximum increase in the misjudgment rate of the NIDS for subsequent original samples exceeds 95.57%.
- [5] arXiv:2409.12553 [pdf, html, other]
-
Title: Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and MitigationsComments: 13 pages, 12 figures, 6 tablesSubjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Thanks to the popularisation of transformer-based models, speech recognition (SR) is gaining traction in various application fields, such as industrial and robotics environments populated with mission-critical devices. While transformer-based SR can provide various benefits for simplifying human-machine interfacing, the research on the cybersecurity aspects of these models is lacklustre. In particular, concerning backdoor poisoning attacks. In this paper, we propose a new poisoning approach that maps different environmental trigger sounds to target phrases of different lengths, during the fine-tuning phase. We test our approach on Whisper, one of the most popular transformer-based SR model, showing that it is highly vulnerable to our attack, under several testing conditions. To mitigate the attack proposed in this paper, we investigate the use of Silero VAD, a state-of-the-art voice activity detection (VAD) model, as a defence mechanism. Our experiments show that it is possible to use VAD models to filter out malicious triggers and mitigate our attacks, with a varying degree of success, depending on the type of trigger sound and testing conditions.
- [6] arXiv:2409.12701 [pdf, html, other]
-
Title: An Empirical Study on the Distance Metric in Guiding Directed Grey-box FuzzingSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Directed grey-box fuzzing (DGF) aims to discover vulnerabilities in specific code areas efficiently. Distance metric, which is used to measure the quality of seed in DGF, is a crucial factor in affecting the fuzzing performance. Despite distance metrics being widely applied in existing DGF frameworks, it remains opaque about how different distance metrics guide the fuzzing process and affect the fuzzing result in practice. In this paper, we conduct the first empirical study to explore how different distance metrics perform in guiding DGFs. Specifically, we systematically discuss different distance metrics in the aspect of calculation method and granularity. Then, we implement different distance metrics based on AFLGo. On this basis, we conduct comprehensive experiments to evaluate the performance of these distance metrics on the benchmarks widely used in existing DGF-related work. The experimental results demonstrate the following insights. First, the difference among different distance metrics with varying methods of calculation and granularities is not significant. Second, the distance metrics may not be effective in describing the difficulty of triggering the target vulnerability. In addition, by scrutinizing the quality of testcases, our research highlights the inherent limitation of existing mutation strategies in generating high-quality testcases, calling for designing effective mutation strategies for directed fuzzing. We open-source the implementation code and experiment dataset to facilitate future research in DGF.
- [7] arXiv:2409.12765 [pdf, html, other]
-
Title: Towards AI-enabled Cyber Threat Assessment in the Health SectorComments: 10 pages (including references), 5 figures, 2 tables, cs.CRSubjects: Cryptography and Security (cs.CR)
Cyber attacks on the healthcare industry can have tremendous consequences and the attack surface expands continuously. In order to handle the steadily rising workload, an expanding amount of analog processes in healthcare institutions is digitized. Despite regulations becoming stricter, not all existing infrastructure is sufficiently protected against cyber attacks. With an increasing number of devices and digital processes, the system and network landscape becomes more complex and harder to manage and therefore also more difficult to protect. The aim of this project is to introduce an AI-enabled platform that collects security relevant information from the outside of a health organization, analyzes it, delivers a risk score and supports decision makers in healthcare institutions to optimize investment choices for security measures. Therefore, an architecture of such a platform is designed, relevant information sources are identified, and AI methods for relevant data collection, selection, and risk scoring are explored.
- [8] arXiv:2409.12850 [pdf, other]
-
Title: USBIPS Framework: Protecting Hosts from Malicious USB PeripheralsComments: Under review by Computer Standards & InterfacesSubjects: Cryptography and Security (cs.CR)
USB-based attacks have increased in complexity in recent years. Modern attacks incorporate a wide range of attack vectors, from social engineering to signal injection. The security community is addressing these challenges using a growing set of fragmented defenses. Regardless of the vector of a USB-based attack, the most important risks concerning most people and enterprises are service crashes and data loss. The host OS manages USB peripherals, and malicious USB peripherals, such as those infected with BadUSB, can crash a service or steal data from the OS. Although USB firewalls have been proposed to thwart malicious USB peripherals, such as USBFilter and USBGuard, they cannot prevent real-world intrusions. This paper focuses on building a security framework called USBIPS within OSs to defend against malicious USB peripherals. This includes major efforts to explore the nature of malicious behavior and build persistent protection from USB-based intrusions. We first present a behavior-based detection mechanism focusing on attacks integrated into USB peripherals. We then introduce the novel idea of an allowlisting-based method for USB access control. We finally develop endpoint detection and response system to build the first generic security framework that thwarts USB-based intrusion. Within a centralized threat analysis framework, it provides persistent protection and may detect unknown malicious behavior. By addressing key security and performance challenges, these efforts help modern OSs against attacks from untrusted USB peripherals.
- [9] arXiv:2409.12882 [pdf, html, other]
-
Title: On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine AttacksComments: To appear in Proceedings of the 22nd International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt 2024)Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
In this paper, we study a fully-decentralized multi-agent policy evaluation problem, which is an important sub-problem in cooperative multi-agent reinforcement learning, in the presence of up to $f$ faulty agents. In particular, we focus on the so-called Byzantine faulty model with model poisoning setting. In general, policy evaluation is to evaluate the value function of any given policy. In cooperative multi-agent system, the system-wide rewards are usually modeled as the uniform average of rewards from all agents. We investigate the multi-agent policy evaluation problem in the presence of Byzantine agents, particularly in the setting of heterogeneous local rewards. Ideally, the goal of the agents is to evaluate the accumulated system-wide rewards, which are uniform average of rewards of the normal agents for a given policy. It means that all agents agree upon common values (the consensus part) and furthermore, the consensus values are the value functions (the convergence part). However, we prove that this goal is not achievable. Instead, we consider a relaxed version of the problem, where the goal of the agents is to evaluate accumulated system-wide reward, which is an appropriately weighted average reward of the normal agents. We further prove that there is no correct algorithm that can guarantee that the total number of positive weights exceeds $|\mathcal{N}|-f $, where $|\mathcal{N}|$ is the number of normal agents. Towards the end, we propose a Byzantine-tolerant decentralized temporal difference algorithm that can guarantee asymptotic consensus under scalar function approximation. We then empirically test the effective of the proposed algorithm.
- [10] arXiv:2409.12884 [pdf, html, other]
-
Title: Hypersphere Secure Sketch Revisited: Probabilistic Linear Regression Attack on IronMask in Multiple UsageSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Protection of biometric templates is a critical and urgent area of focus. IronMask demonstrates outstanding recognition performance while protecting facial templates against existing known attacks. In high-level, IronMask can be conceptualized as a fuzzy commitment scheme building on the hypersphere directly. We devise an attack on IronMask targeting on the security notion of renewability. Our attack, termed as Probabilistic Linear Regression Attack, utilizes the linearity of underlying used error correcting code. This attack is the first algorithm to successfully recover the original template when getting multiple protected templates in acceptable time and requirement of storage. We implement experiments on IronMask applied to protect ArcFace that well verify the validity of our attacks. Furthermore, we carry out experiments in noisy environments and confirm that our attacks are still applicable. Finally, we put forward two strategies to mitigate this type of attacks.
New submissions (showing 10 of 10 entries)
- [11] arXiv:2409.12367 (cross-list from cs.LG) [pdf, html, other]
-
Title: Extracting Memorized Training Data via DecompositionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.
- [12] arXiv:2409.12384 (cross-list from cs.LG) [pdf, html, other]
-
Title: Privacy-Preserving Student Learning with Differentially Private Data-Free DistillationComments: Published by IEEE MMSP 2022Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Deep learning models can achieve high inference accuracy by extracting rich knowledge from massive well-annotated data, but may pose the risk of data privacy leakage in practical deployment. In this paper, we present an effective teacher-student learning approach to train privacy-preserving deep learning models via differentially private data-free distillation. The main idea is generating synthetic data to learn a student that can mimic the ability of a teacher well-trained on private data. In the approach, a generator is first pretrained in a data-free manner by incorporating the teacher as a fixed discriminator. With the generator, massive synthetic data can be generated for model training without exposing data privacy. Then, the synthetic data is fed into the teacher to generate private labels. Towards this end, we propose a label differential privacy algorithm termed selective randomized response to protect the label information. Finally, a student is trained on the synthetic data with the supervision of private labels. In this way, both data privacy and label privacy are well protected in a unified framework, leading to privacy-preserving models. Extensive experiments and analysis clearly demonstrate the effectiveness of our approach.
- [13] arXiv:2409.12572 (cross-list from cs.NI) [pdf, html, other]
-
Title: Scalable and Robust Mobile Activity Fingerprinting via Over-the-Air Control Channel in 5G NetworksComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
5G has undergone significant changes in its over-the-air control channel architecture compared to legacy networks, aimed at enhancing performance. These changes have unintentionally strengthened the security of control channels, reducing vulnerabilities in radio channels for attackers. However, based on our experimental results, less than 10% of Physical Downlink Control Channel (PDCCH) messages could be decoded using sniffers. We demonstrate that even with this limited data, cell scanning and targeted user mobile activity tracking are feasible. This privacy attack exposes the number of active communication channels and reveals the mobile applications and their usage time. We propose an efficient deep learning-based mobile traffic classification method that eliminates the need for manual feature extraction, enabling scalability across various applications while maintaining high performance even in scenarios with data loss. We evaluated the effectiveness of our approach using both an open-source testbed and a commercial 5G testbed, demonstrating the feasibility of mobile activity fingerprinting and targeted attacks. To the best of our knowledge, this is the first study to track mobile activity over-the-air using PDCCH messages.
- [14] arXiv:2409.12575 (cross-list from cs.LG) [pdf, html, other]
-
Title: Deep Transfer Hashing for Adaptive Learning on Federated Streaming DataComments: Presented at ECML2024: 8th Intl. Worksh. and Tutorial on Interactive Adaptive Learning, Sep. 9th, 2024, Vilnius, LithuaniaSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
This extended abstract explores the integration of federated learning with deep transfer hashing for distributed prediction tasks, emphasizing resource-efficient client training from evolving data streams. Federated learning allows multiple clients to collaboratively train a shared model while maintaining data privacy - by incorporating deep transfer hashing, high-dimensional data can be converted into compact hash codes, reducing data transmission size and network loads. The proposed framework utilizes transfer learning, pre-training deep neural networks on a central server, and fine-tuning on clients to enhance model accuracy and adaptability. A selective hash code sharing mechanism using a privacy-preserving global memory bank further supports client fine-tuning. This approach addresses challenges in previous research by improving computational efficiency and scalability. Practical applications include Car2X event predictions, where a shared model is collectively trained to recognize traffic patterns, aiding in tasks such as traffic density assessment and accident detection. The research aims to develop a robust framework that combines federated learning, deep transfer hashing and transfer learning for efficient and secure downstream task execution.
- [15] arXiv:2409.12651 (cross-list from cs.IR) [pdf, other]
-
Title: A Deep Dive into Fairness, Bias, Threats, and Privacy in Recommender Systems: Insights and Future ResearchComments: 38 pages, 6 figuresSubjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
Recommender systems are essential for personalizing digital experiences on e-commerce sites, streaming services, and social media platforms. While these systems are necessary for modern digital interactions, they face fairness, bias, threats, and privacy challenges. Bias in recommender systems can result in unfair treatment of specific users and item groups, and fairness concerns demand that recommendations be equitable for all users and items. These systems are also vulnerable to various threats that compromise reliability and security. Furthermore, privacy issues arise from the extensive use of personal data, making it crucial to have robust protection mechanisms to safeguard user information. This study explores fairness, bias, threats, and privacy in recommender systems. It examines how algorithmic decisions can unintentionally reinforce biases or marginalize specific user and item groups, emphasizing the need for fair recommendation strategies. The study also looks at the range of threats in the form of attacks that can undermine system integrity and discusses advanced privacy-preserving techniques. By addressing these critical areas, the study highlights current limitations and suggests future research directions to improve recommender systems' robustness, fairness, and privacy. Ultimately, this research aims to help develop more trustworthy and ethical recommender systems that better serve diverse user populations.
- [16] arXiv:2409.12726 (cross-list from cs.NI) [pdf, html, other]
-
Title: Cloudy with a Chance of Anomalies: Dynamic Graph Neural Network for Early Detection of Cloud Services' User AnomaliesSubjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Ensuring the security of cloud environments is imperative for sustaining organizational growth and operational efficiency. As the ubiquity of cloud services continues to rise, the inevitability of cyber threats underscores the importance of preemptive detection. This paper introduces a pioneering time-based embedding approach for Cloud Services Graph-based Anomaly Detection (CS-GAD), utilizing a Graph Neural Network (GNN) to discern anomalous user behavior during interactions with cloud services. Our method employs a dynamic tripartite graph representation to encapsulate the evolving interactions among cloud services, users, and their activities over time. Leveraging GNN models in each time frame, our approach generates a graph embedding wherein each user is assigned a score based on their historical activity, facilitating the identification of unusual behavior. Results demonstrate a notable reduction in false positive rates (2-9%) compared to prevailing methods, coupled with a commendable true positive rate (100%). The contributions of this work encompass early detection capabilities, a low false positive rate, an innovative tripartite graph representation incorporating action types, the introduction of a new cloud services dataset featuring various user attacks, and an open-source implementation for community collaboration in advancing cloud service security.
Cross submissions (showing 6 of 6 entries)
- [17] arXiv:2304.09603 (replaced) [pdf, html, other]
-
Title: Visualising Personal Data Flows: Insights from a Case Study of Booking.comComments: This is the full edition of a paper published in Intelligent Information Systems: CAiSE Forum 2023, Zaragoza, Spain, June 12-16, 2023, Proceedings, Lecture Notes in Business Information Processing (LNBIP), Volume 477, pp. 52-60, 2023, Springer Nature, this https URLJournal-ref: Lecture Notes in Business Information Processing (LNBIP), 2023Subjects: Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding the collection, storage, processing and sharing of this data. This paper reports our work of taking this http URL as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows. This case study can inform us about future research on more data flow-oriented privacy policy analysis and on the construction of a more comprehensive ontology on personal data flows in complicated business ecosystems.
- [18] arXiv:2311.00207 (replaced) [pdf, html, other]
-
Title: Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication SystemsComments: Accepted at NDSS 2025Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Machine Learning (ML) has been instrumental in enabling joint transceiver optimization by merging all physical layer blocks of the end-to-end wireless communication systems. Although there have been a number of adversarial attacks on ML-based wireless systems, the existing methods do not provide a comprehensive view including multi-modality of the source data, common physical layer protocols, and wireless domain constraints. This paper proposes Magmaw, a novel wireless attack methodology capable of generating universal adversarial perturbations for any multimodal signal transmitted over a wireless channel. We further introduce new objectives for adversarial attacks on downstream applications. We adopt the widely-used defenses to verify the resilience of Magmaw. For proof-of-concept evaluation, we build a real-time wireless attack platform using a software-defined radio system. Experimental results demonstrate that Magmaw causes significant performance degradation even in the presence of strong defense mechanisms. Furthermore, we validate the performance of Magmaw in two case studies: encrypted communication channel and channel modality-based ML model.
- [19] arXiv:2312.04356 (replaced) [pdf, other]
-
Title: NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and BootstrappingComments: 15 pages, 6 figuresSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Fully homomorphic encryption (FHE) is a promising cryptographic primitive for realizing private neural network inference (PI) services by allowing a client to fully offload the inference task to a cloud server while keeping the client data oblivious to the server. This work proposes NeuJeans, an FHE-based solution for the PI of deep convolutional neural networks (CNNs). NeuJeans tackles the critical problem of the enormous computational cost for the FHE evaluation of CNNs. We introduce a novel encoding method called Coefficients-in-Slot (CinS) encoding, which enables multiple convolutions in one HE multiplication without costly slot permutations. We further observe that CinS encoding is obtained by conducting the first several steps of the Discrete Fourier Transform (DFT) on a ciphertext in conventional Slot encoding. This property enables us to save the conversion between CinS and Slot encodings as bootstrapping a ciphertext starts with DFT. Exploiting this, we devise optimized execution flows for various two-dimensional convolution (conv2d) operations and apply them to end-to-end CNN implementations. NeuJeans accelerates the performance of conv2d-activation sequences by up to 5.68 times compared to state-of-the-art FHE-based PI work and performs the PI of a CNN at the scale of ImageNet within a mere few seconds.
- [20] arXiv:2401.01568 (replaced) [pdf, html, other]
-
Title: A Survey of Protocol FuzzingXiaohan Zhang, Cen Zhang, Xinghua Li, Zhengjie Du, Bing Mao, Yuekang Li, Yaowen Zheng, Yeting Li, Li Pan, Yang Liu, Robert H. DengSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Communication protocols form the bedrock of our interconnected world, yet vulnerabilities within their implementations pose significant security threats. Recent developments have seen a surge in fuzzing-based research dedicated to uncovering these vulnerabilities within protocol implementations. However, there still lacks a systematic overview of protocol fuzzing for answering the essential questions such as what the unique challenges are, how existing works solve them, etc. To bridge this gap, we conducted a comprehensive investigation of related works from both academia and industry. Our study includes a detailed summary of the specific challenges in protocol fuzzing, and provides a systematic categorization and overview of existing research efforts. Furthermore, we explore and discuss potential future research directions in protocol fuzzing. This survey serves as a foundational guideline for researchers and practitioners in the field.
- [21] arXiv:2401.07261 (replaced) [pdf, html, other]
-
Title: LookAhead: Preventing DeFi Attacks via Unveiling Adversarial ContractsComments: 21 pages, 7 figuresSubjects: Cryptography and Security (cs.CR)
Decentralized Finance (DeFi) incidents stemming from the exploitation of smart contract vulnerabilities have culminated in financial damages exceeding 3 billion US dollars. Existing defense mechanisms typically focus on detecting and reacting to malicious transactions executed by attackers that target victim contracts. However, with the emergence of private transaction pools where transactions are sent directly to miners without first appearing in public mempools, current detection tools face significant challenges in identifying attack activities effectively.
Based on the fact that most attack logic rely on deploying one or more intermediate smart contracts as supporting components to the exploitation of victim contracts, in this paper, we propose a new direction for detecting DeFi attacks that focuses on identifying adversarial contracts instead of adversarial transactions. Our approach allows us to leverage common attack patterns, code semantics and intrinsic characteristics found in malicious smart contracts to build the LookAhead system based on Machine Learning (ML) classifiers and a transformer model that is able to effectively distinguish adversarial contracts from benign ones, and make just-in-time predictions of potential zero-day attacks. Our contributions are three-fold: First, we construct a comprehensive dataset consisting of features extracted and constructed from recent contracts deployed on the Ethereum and BSC blockchains. Secondly, we design a condensed representation of smart contract programs called Pruned Semantic-Control Flow Tokenization (PSCFT) and use it to train a combination of ML models that understand the behaviour of malicious codes based on function calls, control flows and other pattern-conforming features. Lastly, we provide the complete implementation of LookAhead and the evaluation of its performance metrics for detecting adversarial contracts. - [22] arXiv:2403.01215 (replaced) [pdf, html, other]
-
Title: Efficient Algorithm Level Error Detection for Number-Theoretic Transform used for Kyber Assessed on FPGAs and ARMSubjects: Cryptography and Security (cs.CR)
Polynomial multiplication stands out as a highly demanding arithmetic process in the development of post-quantum cryptosystems. The importance of the number-theoretic transform (NTT) extends beyond post-quantum cryptosystems, proving valuable in enhancing existing security protocols such as digital signature schemes and hash functions. CRYSTALS-KYBER stands out as the sole public key encryption (PKE) algorithm chosen by the National Institute of Standards and Technology (NIST) in its third round selection, making it highly regarded as a leading post-quantum cryptography (PQC) solution. Due to the potential for errors to significantly disrupt the operation of secure, cryptographically-protected systems, compromising data integrity, and safeguarding against side-channel attacks initiated through faults it is essential to incorporate mitigating error detection schemes. This paper introduces algorithm level fault detection schemes in the NTT multiplication using Negative Wrapped Convolution and the NTT tailored for Kyber Round 3, representing a significant enhancement compared to previous research. We evaluate this through the simulation of a fault model, ensuring that the conducted assessments accurately mirror the obtained results. Consequently, we attain a notably comprehensive coverage of errors. Furthermore, we assess the performance of our efficient error detection scheme for Negative Wrapped Convolution on FPGAs to showcase its implementation and resource requirements. Through implementation of our error detection approach on Xilinx/AMD Zynq Ultrascale+ and Artix-7, we achieve a comparable throughput with just a 9% increase in area and 13% increase in latency compared to the original hardware implementations. Finally, we attained an error detection ratio of nearly 100% for the NTT operation in Kyber Round 3, with a clock cycle overhead of 16% on the Cortex-A72 processor.
- [23] arXiv:2404.07527 (replaced) [pdf, html, other]
-
Title: Security Modelling for Cyber-Physical Systems: A Systematic Literature ReviewComments: Preprint under submissionSubjects: Cryptography and Security (cs.CR)
Cyber-physical systems (CPS) are at the intersection of digital technology and engineering domains, rendering them high-value targets of sophisticated and well-funded cybersecurity threat actors. Prominent cybersecurity attacks on CPS have brought attention to the vulnerability of these systems, and the inherent weaknesses of critical infrastructure reliant on CPS. Security modelling for CPS is an important mechanism to systematically identify and assess vulnerabilities, threats, and risks throughout system lifecycles, and to ultimately ensure system resilience, safety, and reliability. This survey delves into state-of-the-art research in CPS security modelling, encompassing both threat and attack modelling. While these terms are sometimes used interchangeably, they are different concepts. This article elaborates on the differences between threat and attack modelling, examining their implications for CPS security. A systematic search yielded 428 articles, from which 15 were selected and categorised into three clusters: those focused on threat modelling methods, attack modelling methods, and literature reviews. Specifically, we sought to examine what security modelling methods exist today, and how they address real-world cybersecurity threats and CPS-specific attacker capabilities throughout the lifecycle of CPS, which typically span longer durations compared to traditional IT systems. This article also highlights several limitations in existing research, wherein security models adopt simplistic approaches that do not adequately consider the dynamic, multi-layer, multi-path, and multi-agent characteristics of real-world cyber-physical attacks.
- [24] arXiv:2405.05175 (replaced) [pdf, html, other]
-
Title: AirGapAgent: Protecting Privacy-Conscious Conversational AgentsEugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel RamageComments: at CCS'24Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand.
Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective. - [25] arXiv:2405.06830 (replaced) [pdf, html, other]
-
Title: Towards Browser Controls to Protect Cookies from Malicious ExtensionsSubjects: Cryptography and Security (cs.CR)
Cookies maintain state across related web traffic. As such, cookies are commonly used for authentication by storing a user's session ID and replacing the need to re-enter credentials in subsequent traffic. These so-called ``session cookies'' are prime targets for attacks that aim to steal them to gain unauthorized access to user accounts. To mitigate these attacks, the Secure and HttpOnly cookie attributes limit a cookie's accessibility from malicious networks and websites. However, these controls overlook browser extensions: third-party HTML/JavaScript add-ons with access to privileged browser APIs and the ability to operate across multiple websites. Thus malicious or compromised extensions can provide unrestricted access to a user's session cookies.
In this work, we first analyze the prevalence of extensions with access to ``risky'' APIs (those that enable cookie modification and theft) and find that they have hundreds of millions of users. Motivated by this, we propose a mechanism to protect cookies from malicious extensions by introducing two new cookie attributes: BrowserOnly and Monitored. The BrowserOnly attribute prevents extension access to cookies altogether. While effective, not all cookies can be made inaccessible. Thus cookies with the Monitored attribute remain accessible but are tied to a single browser and any changes made to the cookie are logged. As a result, stolen Monitored cookies are unusable outside their original browser and servers can validate the modifications performed. To demonstrate the proposed functionalities, we design and implement CREAM (Cookie Restrictions for Extension Abuse Mitigation) a modified version of the open-source Chromium browser realizing these controls. Our evaluation indicates that CREAM effectively protects cookies from malicious extensions while incurring little run-time overheads. - [26] arXiv:2406.15585 (replaced) [pdf, html, other]
-
Title: Ten Years of ZMapComments: 10 pages, 7 figures, in submission at Internet Measurement Conference 2024Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Since ZMap's debut in 2013, networking and security researchers have used the open-source scanner to write hundreds of research papers that study Internet behavior. In addition, ZMap has been adopted by the security industry to build new classes of enterprise security and compliance products. Over the past decade, much of ZMap's behavior -- ranging from its pseudorandom IP generation to its packet construction -- has evolved as we have learned more about how to scan the Internet. In this work, we quantify ZMap's adoption over the ten years since its release, describe its modern behavior (and the measurements that motivated changes), and offer lessons from releasing and maintaining ZMap for future tools.
- [27] arXiv:2408.01508 (replaced) [pdf, html, other]
-
Title: Blockchain Amplification AttackSubjects: Cryptography and Security (cs.CR)
Strategies related to the blockchain concept of Extractable Value (MEV/BEV), such as arbitrage, front- or backrunning create an economic incentive for network nodes to reduce latency. A modified node, that minimizes transaction validation time and neglects to filter invalid transactions in the Ethereum P2P network, introduces a novel attack vector -- Blockchain Amplification Attack. An attacker exploits those modified nodes to amplify an invalid transaction thousands of times, posing a threat to the entire network. To illustrate attack feasibility and practicality in the current mainnet, we 1) identify thousands of similar attacks in the wild, 2) mathematically model propagation mechanism, 3) empirically measure model parameters from our two monitoring nodes, and 4) compare performance with existing Denial-of-Service attacks through local simulation. We show that an attacker can amplify network traffic at modified nodes by a factor of 3,600, and cause economic damages 13,800 times greater than the amount needed to carry out the attack. Despite these risks, aggressive latency reduction may still be profitable enough to justify the existence of modified nodes. To assess this tradeoff, we 1) simulate the transaction validation process in the local network and 2) empirically measure the latency reduction by deploying our modified node in the Ethereum testnet. We conclude with a cost-benefit analysis of skipping validation and provide mitigation strategies against this attack.
- [28] arXiv:2408.10695 (replaced) [pdf, html, other]
-
Title: On NVD Users' Attitudes, Experiences, Hopes and HurdlesComments: To appear in ACM DTRAP Special Issue on IMF 2024Subjects: Cryptography and Security (cs.CR)
The National Vulnerability Database (NVD) is a major vulnerability database that is free to use for everyone. It provides information about vulnerabilities and further useful resources such as linked advisories and patches. The NVD is often considered as the central source for vulnerability information and as a help to improve the resource-intensive process of vulnerability management. Although the NVD receives much public attention, little is known about its usage in vulnerability management, users' attitudes towards it and whether they encounter any problems during usage. We explored these questions using a preliminary interview study with seven people, and a follow-up survey with 71 participants. The results show that the NVD is consulted regularly and often aids decision making. Generally, users are positive about the NVD and perceive it as a helpful, clearly structured tool. But users also faced issues: missing or incorrect entries, incomplete descriptions or incomprehensible CVSS ratings. In order to identify the problems origins, we discussed the results with two senior NVD members. Many of the problems can be attributed to higher-level problems such as the CVE List or limited resources. Nevertheless, the NVD is working on improving existing problems.
- [29] arXiv:2409.05543 (replaced) [pdf, html, other]
-
Title: On-line Anomaly Detection and Qualification of Random Bit StreamsComments: 9 pages, 12 figures, 3 tables, submitted at IEEE CSR 2024Subjects: Cryptography and Security (cs.CR)
Generating random bit streams is required in various applications, most notably cyber-security. Ensuring high-quality and robust randomness is crucial to mitigate risks associated with predictability and system compromise. True random numbers provide the highest unpredictability levels. However, potential biases in the processes exploited for the random number generation must be carefully monitored. This paper reports the implementation and characterization of an on-line procedure for the detection of anomalies in a true random bit stream. It is based on the NIST Adaptive Proportion and Repetition Count tests, complemented by statistical analysis relying on the Monobit and RUNS. The procedure is firmware implemented and performed simultaneously with the bit stream generation, and providing as well an estimate of the entropy of the source. The experimental validation of the approach is performed upon the bit streams generated by a quantum, silicon-based entropy source.
- [30] arXiv:2409.09288 (replaced) [pdf, html, other]
-
Title: Generating API Parameter Security Rules with LLM for API Misuse DetectionComments: Accepted by NDSS Symposium 2025. Please cite this paper as "Jinghua Liu, Yi Yang, Kai Chen, and Miaoqian Lin. Generating API Parameter Security Rules with LLM for API Misuse Detection. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
In this paper, we present a new framework, named GPTAid, for automatic APSRs generation by analyzing API source code with LLM and detecting API misuse caused by incorrect parameter use. To validate the correctness of the LLM-generated APSRs, we propose an execution feedback-checking approach based on the observation that security-critical API misuse is often caused by APSRs violations, and most of them result in runtime errors. Specifically, GPTAid first uses LLM to generate raw APSRs and the Right calling code, and then generates Violation code for each raw APSR by modifying the Right calling code using LLM. Subsequently, GPTAid performs dynamic execution on each piece of Violation code and further filters out the incorrect APSRs based on runtime errors. To further generate concrete APSRs, GPTAid employs a code differential analysis to refine the filtered ones. Particularly, as the programming language is more precise than natural language, GPTAid identifies the key operations within Violation code by differential analysis, and then generates the corresponding concrete APSR based on the aforementioned operations. These concrete APSRs could be precisely interpreted into applicable detection code, which proven to be effective in API misuse detection. Implementing on the dataset containing 200 randomly selected APIs from eight popular libraries, GPTAid achieves a precision of 92.3%. Moreover, it generates 6 times more APSRs than state-of-the-art detectors on a comparison dataset of previously reported bugs and APSRs. We further evaluated GPTAid on 47 applications, 210 unknown security bugs were found potentially resulting in severe security issues (e.g., system crashes), 150 of which have been confirmed by developers after our reports.
- [31] arXiv:2409.09380 (replaced) [pdf, html, other]
-
Title: The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse DetectionComments: Accepted by NDSS Symposium 2025. Please cite this paper as "Yi Yang, Jinghua Liu, Kai Chen, Miaoqian Lin. The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."Subjects: Cryptography and Security (cs.CR)
In this paper, we propose an LLM-empowered RM-API misuse detection solution, ChatDetector, which fully automates LLMs for documentation understanding which helps RM-API constraints retrieval and RM-API misuse detection. To correctly retrieve the RM-API constraints, ChatDetector is inspired by the ReAct framework which is optimized based on Chain-of-Thought (CoT) to decompose the complex task into allocation APIs identification, RM-object (allocated/released by RM APIs) extraction and RM-APIs pairing (RM APIs usually exist in pairs). It first verifies the semantics of allocation APIs based on the retrieved RM sentences from API documentation through LLMs. Inspired by the LLMs' performance on various prompting methods,ChatDetector adopts a two-dimensional prompting approach for cross-validation. At the same time, an inconsistency-checking approach between the LLMs' output and the reasoning process is adopted for the allocation APIs confirmation with an off-the-shelf Natural Language Processing (NLP) tool. To accurately pair the RM-APIs, ChatDetector decomposes the task again and identifies the RM-object type first, with which it can then accurately pair the releasing APIs and further construct the RM-API constraints for misuse detection. With the diminished hallucinations, ChatDetector identifies 165 pairs of RM-APIs with a precision of 98.21% compared with the state-of-the-art API detectors. By employing a static detector CodeQL, we ethically report 115 security bugs on the applications integrating on six popular libraries to the developers, which may result in severe issues, such as Denial-of-Services (DoS) and memory corruption. Compared with the end-to-end benchmark method, the result shows that ChatDetector can retrieve at least 47% more RM sentences and 80.85% more RM-API constraints.
- [32] arXiv:2409.11693 (replaced) [pdf, html, other]
-
Title: On the second-order zero differential properties of several classes of power functions over finite fieldsSubjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Feistel Boomerang Connectivity Table (FBCT) is an important cryptanalytic technique on analysing the resistance of the Feistel network-based ciphers to power attacks such as differential and boomerang attacks. Moreover, the coefficients of FBCT are closely related to the second-order zero differential spectra of the function $F(x)$ over the finite fields with even characteristic and the Feistel boomerang uniformity is the second-order zero differential uniformity of $F(x)$. In this paper, by computing the number of solutions of specific equations over finite fields, we determine explicitly the second-order zero differential spectra of power functions $x^{2^m+3}$ and $x^{2^m+5}$ with $m>2$ being a positive integer over finite field with even characteristic, and $x^{p^k+1}$ with integer $k\geq1$ over finite field with odd characteristic $p$. It is worth noting that $x^{2^m+3}$ is a permutation over $\mathbb{F}_{2^n}$ and only when $m$ is odd, $x^{2^m+5}$ is a permutation over $\mathbb{F}_{2^n}$, where integer $n=2m$. As a byproduct, we find $F(x)=x^4$ is a PN and second-order zero differentially $0$-uniform function over $\mathbb{F}_{3^n}$ with odd $n$. The computation of these entries and the cardinalities in each table aimed to facilitate the analysis of differential and boomerang cryptanalysis of S-boxes when studying distinguishers and trails.
- [33] arXiv:2409.11868 (replaced) [pdf, other]
-
Title: Practical Investigation on the Distinguishability of Longa's Atomic PatternsSubjects: Cryptography and Security (cs.CR)
This paper investigates the distinguishability of the atomic patterns for elliptic curve point doubling and addition operations proposed by Longa. We implemented a binary elliptic curve scalar multiplication kP algorithm with Longa's atomic patterns for the NIST elliptic curve P-256 using the open-source cryptographic library FLECC in C. We measured and analysed an electromagnetic trace of a single kP execution on a microcontroller (TI Launchpad F28379 board). Due to various technical limitations, significant differences in the execution time and the shapes of the atomic blocks could not be determined. Further investigations of the side channel analysis-resistance can be performed based on this work. Last but not least, we examined and corrected Longa's atomic patterns corresponding to formulae proposed by Longa.
- [34] arXiv:2103.09320 (replaced) [pdf, html, other]
-
Title: Quantum Pseudorandomness and Classical ComplexityComments: 31 pages. V2: added a new result about Haar random oracles (Corollary 5); various writing improvements. V3: added a new section about t-designs and corrected some proofs involving the use of t-designs. V4: corrected Lemma 25. V5: major changes to Section 5; the proof uses a new oracle construction after a bug was discovered in the previous versionJournal-ref: 16th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2021), Leibniz International Proceedings in Informatics (LIPIcs) 197, pp. 2:1-2:20 (2021)Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)
We construct a quantum oracle relative to which $\mathsf{BQP} = \mathsf{QMA}$ but cryptographic pseudorandom quantum states and pseudorandom unitary transformations exist, a counterintuitive result in light of the fact that pseudorandom states can be "broken" by quantum Merlin-Arthur adversaries. We explain how this nuance arises as the result of a distinction between algorithms that operate on quantum and classical inputs. On the other hand, we show that some computational complexity assumption is needed to construct pseudorandom states, by proving that pseudorandom states do not exist if $\mathsf{BQP} = \mathsf{PP}$. We discuss implications of these results for cryptography, complexity theory, and shadow tomography.
- [35] arXiv:2209.02275 (replaced) [pdf, html, other]
-
Title: Multi-class Classifier based Failure Prediction with Artificial and Anonymous Training for Data PrivacySubjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This paper proposes a novel non-intrusive system failure prediction technique using available information from developers and minimal information from raw logs (rather than mining entire logs) but keeping the data entirely private with the data owners. A neural network based multi-class classifier is developed for failure prediction, using artificially generated anonymous data set, applying a combination of techniques, viz., genetic algorithm (steps), pattern repetition, etc., to train and test the network. The proposed mechanism completely decouples the data set used for training process from the actual data which is kept private. Moreover, multi-criteria decision making (MCDM) schemes are used to prioritize failures meeting business requirements. Results show high accuracy in failure prediction under different parameter configurations. On a broader context, any classification problem, beyond failure prediction, can be performed using the proposed mechanism with artificially generated data set without looking into the actual data as long as the input features can be translated to binary values (e.g. output from private binary classifiers) and can provide classification-as-a-service.
- [36] arXiv:2308.11406 (replaced) [pdf, html, other]
-
Title: Designing an attack-defense game: how to increase robustness of financial transaction models via a competitionAlexey Zaytsev, Maria Kovaleva, Alex Natekin, Evgeni Vorsin, Valerii Smirnov, Georgii Smirnov, Oleg Sidorshin, Alexander Senin, Alexander Dudin, Dmitry BerestnevSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistical Finance (q-fin.ST)
Banks routinely use neural networks to make decisions. While these models offer higher accuracy, they are susceptible to adversarial attacks, a risk often overlooked in the context of event sequences, particularly sequences of financial transactions, as most works consider computer vision and NLP modalities.
We propose a thorough approach to studying these risks: a novel type of competition that allows a realistic and detailed investigation of problems in financial transaction data. The participants directly oppose each other, proposing attacks and defenses -- so they are examined in close-to-real-life conditions.
The paper outlines our unique competition structure with direct opposition of participants, presents results for several different top submissions, and analyzes the competition results. We also introduce a new open dataset featuring financial transactions with credit default labels, enhancing the scope for practical research and development. - [37] arXiv:2402.03990 (replaced) [pdf, html, other]
-
Title: Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic OptimisationComments: After the publication of this work (ICML 2024), the Conjecture 6.3 has been proven by Kalinin (2024). Kalinin, N. P. Notes on Sampled Gaussian Mechanism. arXiv:2409.04636, 2024Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP deep learning, its properties have been widely studied, and recent works have empirically found large batch sizes to be beneficial. However, theoretical explanations of this benefit are currently heuristic at best. We first observe that the total gradient variance in DP-SGD can be decomposed into subsampling-induced and noise-induced variances. We then prove that in the limit of an infinite number of iterations, the effective noise-induced variance is invariant to the batch size. The remaining subsampling-induced variance decreases with larger batch sizes, so large batches reduce the effective total gradient variance. We confirm numerically that the asymptotic regime is relevant in practical settings when the batch size is not small, and find that outside the asymptotic regime, the total gradient variance decreases even more with large batch sizes. We also find a sufficient condition that implies that large batch sizes similarly reduce effective DP noise variance for one iteration of DP-SGD.
- [38] arXiv:2402.18607 (replaced) [pdf, html, other]
-
Title: Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial PerspectiveSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Diffusion models have recently gained significant attention in both academia and industry due to their impressive generative performance in terms of both sampling quality and distribution coverage. Accordingly, proposals are made for sharing pre-trained diffusion models across different organizations, as a way of improving data utilization while enhancing privacy protection by avoiding sharing private data directly. However, the potential risks associated with such an approach have not been comprehensively examined.
In this paper, we take an adversarial perspective to investigate the potential privacy and fairness risks associated with the sharing of diffusion models. Specifically, we investigate the circumstances in which one party (the sharer) trains a diffusion model using private data and provides another party (the receiver) black-box access to the pre-trained model for downstream tasks. We demonstrate that the sharer can execute fairness poisoning attacks to undermine the receiver's downstream models by manipulating the training data distribution of the diffusion model. Meanwhile, the receiver can perform property inference attacks to reveal the distribution of sensitive features in the sharer's dataset. Our experiments conducted on real-world datasets demonstrate remarkable attack performance on different types of diffusion models, which highlights the critical importance of robust data auditing and privacy protection protocols in pertinent applications. - [39] arXiv:2404.08064 (replaced) [pdf, other]
-
Title: The Impact of Speech Anonymization on Pathology and Its LimitsSoroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee YangComments: Published in Communications MedicineJournal-ref: Commun Med 4, (2024)Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods. We document substantial privacy improvements across disorders-evidenced by equal error rate increases up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experience minimal utility changes, while Dysglossia shows slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis reveals consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.
- [40] arXiv:2408.12240 (replaced) [pdf, other]
-
Title: The Bright Side of Timed OpacityComments: This is the author (and extended) version of the manuscript of the same name published in the proceedings of the 25th International Conference on Formal Engineering Methods (ICFEM 2024)Subjects: Logic in Computer Science (cs.LO); Cryptography and Security (cs.CR); Formal Languages and Automata Theory (cs.FL)
In 2009, Franck Cassez showed that the timed opacity problem, where an attacker can observe some actions with their timestamps and attempts to deduce information, is undecidable for timed automata (TAs). Moreover, he showed that the undecidability holds even for subclasses such as event-recording automata. In this article, we consider the same definition of opacity for several other subclasses of TAs: with restrictions on the number of clocks, of actions, on the nature of time, or on a new subclass called observable event-recording automata. We show that opacity can mostly be retrieved, except for one-action TAs and for one-clock TAs with $\epsilon$-transitions, for which undecidability remains. We then exhibit a new decidable subclass in which the number of observations made by the attacker is limited.
- [41] arXiv:2409.02802 (replaced) [pdf, html, other]
-
Title: Boosting Certified Robustness for Time Series Classification with Efficient Self-EnsembleComments: 6 figures, 4 tables, 10 pagesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radius under $\ell_p$-ball attacks. Recognizing its success, research in the time series domain has started focusing on these aspects. However, existing research predominantly focuses on time series forecasting, or under the non-$\ell_p$ robustness in statistic feature augmentation for time series classification~(TSC). Our review found that Randomized Smoothing performs modestly in TSC, struggling to provide effective assurances on datasets with poor robustness. Therefore, we propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins, thereby certifying a larger radius. This approach also addresses the computational overhead issue of Deep Ensemble~(DE) while remaining competitive and, in some cases, outperforming it in terms of robustness. Both theoretical analysis and experimental results validate the effectiveness of our method, demonstrating superior performance in robustness testing compared to baseline approaches.
- [42] arXiv:2409.11430 (replaced) [pdf, other]
-
Title: Federated Learning with Quantum Computing and Fully Homomorphic Encryption: A Novel Computing Paradigm Shift in Privacy-Preserving MLSiddhant Dutta, Pavana P Karanth, Pedro Maciel Xavier, Iago Leal de Freitas, Nouhaila Innan, Sadok Ben Yahia, Muhammad Shafique, David E. Bernal NeiraComments: 10 pages, 2 figuresSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The widespread deployment of products powered by machine learning models is raising concerns around data privacy and information security worldwide. To address this issue, Federated Learning was first proposed as a privacy-preserving alternative to conventional methods that allow multiple learning clients to share model knowledge without disclosing private data. A complementary approach known as Fully Homomorphic Encryption (FHE) is a quantum-safe cryptographic system that enables operations to be performed on encrypted weights. However, implementing mechanisms such as these in practice often comes with significant computational overhead and can expose potential security threats. Novel computing paradigms, such as analog, quantum, and specialized digital hardware, present opportunities for implementing privacy-preserving machine learning systems while enhancing security and mitigating performance loss. This work instantiates these ideas by applying the FHE scheme to a Federated Learning Neural Network architecture that integrates both classical and quantum layers.