-
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Authors:
Edoardo Debenedetti,
Javier Rando,
Daniel Paleka,
Silaghi Fineas Florin,
Dragos Albastroiu,
Niv Cohen,
Yuval Lemberg,
Reshmi Ghosh,
Rui Wen,
Ahmed Salem,
Giovanni Cherubin,
Santiago Zanella-Beguelin,
Robin Schmid,
Victor Klemm,
Takahiro Miki,
Chenhao Li,
Stefan Kraft,
Mario Fritz,
Florian Tramèr,
Sahar Abdelnabi,
Lea Schönherr
Abstract:
Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt. The competition was organized in two phases. In the first phase, teams developed defenses to prevent the model from leaking the secret. During the second phase, teams were challenged to extract the secrets hidden for defenses proposed by the other teams. This report summarizes the main insights from the competition. Notably, we found that all defenses were bypassed at least once, highlighting the difficulty of designing a successful defense and the necessity for additional research to protect LLM systems. To foster future research in this direction, we compiled a dataset with over 137k multi-turn attack chats and open-sourced the platform.
Submitted 12 June, 2024;
originally announced June 2024.
-
Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain
Authors:
Marc Zeller,
Thomas Waschulzik,
Reiner Schmid,
Claus Bahlmann
Abstract:
Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.
Submitted 6 July, 2023;
originally announced July 2023.
-
Capturing Dependencies within Machine Learning via a Formal Process Model
Authors:
Fabian Ritz,
Thomy Phan,
Andreas Sedlmeier,
Philipp Altmann,
Jan Wieghardt,
Reiner Schmid,
Horst Sauer,
Cornel Klein,
Claudia Linnhoff-Popien,
Thomas Gabor
Abstract:
The development of Machine Learning (ML) models is more than just a special case of software development (SD): ML models acquire properties and fulfill requirements even without direct human interaction in a seemingly uncontrollable manner. Nonetheless, the underlying processes can be described in a formal way. We define a comprehensive SD process model for ML that encompasses most tasks and artifacts described in the literature in a consistent way. In addition to the production of the necessary artifacts, we also focus on generating and validating fitting descriptions in the form of specifications. We stress the importance of further evolving the ML model throughout its life-cycle even after initial training and testing. Thus, we provide various interaction points with standard SD processes in which ML often is an encapsulated task. Further, our SD process model allows ML to be formulated as a (meta-) optimization problem. If automated rigorously, it can be used to realize self-adaptive autonomous systems. Finally, our SD process model features a description of time that allows reasoning about the progress within ML development processes. This might lead to further applications of formal methods within the field of ML.
Submitted 10 August, 2022;
originally announced August 2022.
-
Self-Supervised Traversability Prediction by Learning to Reconstruct Safe Terrain
Authors:
Robin Schmid,
Deegan Atha,
Frederik Schöller,
Sharmita Dey,
Seyed Fakoorian,
Kyohei Otsu,
Barry Ridge,
Marko Bjelonic,
Lorenz Wellhausen,
Marco Hutter,
Ali-akbar Agha-mohammadi
Abstract:
Navigating off-road with a fast autonomous vehicle depends on a robust perception system that differentiates traversable from non-traversable terrain. Typically, this depends on a semantic understanding which is based on supervised learning from images annotated by a human expert. This requires a significant investment in human time, assumes correct expert classification, and small details can lead to misclassification. To address these challenges, we propose a method for predicting high- and low-risk terrains from only past vehicle experience in a self-supervised fashion. First, we develop a tool that projects the vehicle trajectory into the front camera image. Second, occlusions in the 3D representation of the terrain are filtered out. Third, an autoencoder trained on masked vehicle trajectory regions identifies low- and high-risk terrains based on the reconstruction error. We evaluated our approach with two models and different bottleneck sizes at two different training and testing sites with a four-wheeled off-road vehicle. Comparison with two independent test sets of semantic labels from terrain similar to the training sites demonstrates the ability to separate the ground as low-risk and the vegetation as high-risk with 81.1% and 85.1% accuracy.
Submitted 2 August, 2022;
originally announced August 2022.
-
PrePARE: Predictive Proprioception for Agile Failure Event Detection in Robotic Exploration of Extreme Terrains
Authors:
Sharmita Dey,
David Fan,
Robin Schmid,
Anushri Dixit,
Kyohei Otsu,
Thomas Touma,
Arndt F. Schilling,
Ali-akbar Agha-mohammadi
Abstract:
Legged robots can traverse a wide variety of terrains, some of which may be challenging for wheeled robots, such as stairs or highly uneven surfaces. However, quadruped robots face stability challenges on slippery surfaces. This can be resolved by adjusting the robot's locomotion by switching to more conservative and stable locomotion modes, such as crawl mode (where three feet are in contact with the ground always) or amble mode (where one foot touches down at a time) to prevent potential falls. To tackle these challenges, we propose an approach to learn a model from past robot experience for predictive detection of potential failures. Accordingly, we trigger gait switching merely based on proprioceptive sensory information. To learn this predictive model, we propose a semi-supervised process for detecting and annotating ground truth slip events in two stages: We first detect abnormal occurrences in the time series sequences of the gait data using an unsupervised anomaly detector, and then, the anomalies are verified with expert human knowledge in a replay simulation to assert the event of a slip. These annotated slip events are then used as ground truth examples to train an ensemble decision learner for predicting slip probabilities across terrains for traversability. We analyze our model on data recorded by a legged robot on multiple sites with slippery terrain. We demonstrate that a potential slip event can be predicted up to 720 ms ahead of a potential fall with an average precision greater than 0.95 and an average F-score of 0.82. Finally, we validate our approach in real-time by deploying it on a legged robot and switching its gait mode based on slip event detection.
Submitted 30 July, 2022;
originally announced August 2022.
-
Implicit Model Specialization through DAG-based Decentralized Federated Learning
Authors:
Jossekin Beilharz,
Bjarne Pfitzner,
Robert Schmid,
Paul Geppert,
Bert Arnrich,
Andreas Polze
Abstract:
Federated learning allows a group of distributed clients to train a common machine learning model on private data. The exchange of model updates is managed either by a central entity or in a decentralized way, e.g. by a blockchain. However, the strong generalization across all clients makes these approaches unsuited for non-independent and identically distributed (non-IID) data.
We propose a unified approach to decentralization and personalization in federated learning that is based on a directed acyclic graph (DAG) of model updates. Instead of training a single global model, clients specialize on their local data while using the model updates from other clients dependent on the similarity of their respective data. This specialization implicitly emerges from the DAG-based communication and selection of model updates. Thus, we enable the evolution of specialized models, which focus on a subset of the data and therefore cover non-IID data better than federated learning in a centralized or blockchain-based setup.
To the best of our knowledge, the proposed solution is the first to unite personalization and poisoning robustness in fully decentralized federated learning. Our evaluation shows that the specialization of models emerges directly from the DAG-based communication of model updates on three different datasets. Furthermore, we show stable model accuracy and less variance across clients when compared to federated averaging.
Submitted 3 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
From CCS-Planning to Testautomation: The Digital Testfield of Deutsche Bahn in Scheibenberg -- A Case Study
Authors:
Arne Boockmeyer,
Dirk Friedenberger,
Lukas Pirl,
Robert Schmid,
Andreas Polze,
Heiko Herholz,
Gisela Freiin von Arnim,
Pedro Lehmann Ibáñez,
Torsten Friedrich,
Christoph Klaus,
Christian Wilhelmi
Abstract:
The digitalization of railway systems should increase the efficiency of the train operation to meet future mobility challenges and climate goals. But this digitalization also comes with several new challenges in providing a secure and reliable train operation. The work resulting in this paper tackles two major challenges. First, there is no single university curriculum combining computer science, railway operation, and certification processes. Second, many railway processes are still manual, make no use of digital tools, and result in static implementations and configurations of the railway infrastructure devices. This case study occurred as part of the Digital Rail Summer School 2021, a university course combining the three mentioned aspects as a cooperation of several German universities with partners from the railway industry. It passes through all steps from a digital Control-Command and Signalling (CCS) planning in ProSig 7.3, the transfer and validation of the planning in the PlanPro data format and toolbox, to the generation of code of an interlocking for the digital CCS planning to contribute to the vision of test automation. This paper contributes the experiences of the case study and a proof-of-concept of the whole lifecycle for the Digital Testfield of Deutsche Bahn in Scheibenberg. This proof-of-concept will be continued in ongoing and following projects to fulfill the vision of test automation and automated launching of new devices.
Submitted 29 September, 2021;
originally announced September 2021.
-
A Limitlessly Scalable Transaction System
Authors:
Max Mathys,
Roland Schmid,
Jakub Sliwinski,
Roger Wattenhofer
Abstract:
We present Accept, a simple, asynchronous transaction system that achieves perfect horizontal scaling.
Usual blockchain-based transaction systems come with a fundamental throughput limitation as they require that all (potentially unrelated) transactions must be totally ordered. Such solutions thus require serious compromises or are outright unsuitable for large-scale applications, such as global retail payments.
Accept provides efficient horizontal scaling without any limitation. To that end, Accept satisfies a relaxed form of consensus and does not establish an ordering of unrelated transactions. Furthermore, Accept achieves instant finality and does not depend on a source of randomness.
Submitted 11 August, 2021;
originally announced August 2021.
-
Two-Class (r,k)-Coloring: Coloring with Service Guarantees
Authors:
Pál András Papp,
Roland Schmid,
Valentin Stoppiello,
Roger Wattenhofer
Abstract:
This paper introduces the Two-Class ($r$,$k$)-Coloring problem: Given a fixed number of $k$ colors, such that only $r$ of these $k$ colors allow conflicts, what is the minimal number of conflicts incurred by an optimal coloring of the graph?
We establish that the family of Two-Class ($r$,$k$)-Coloring problems is NP-complete for any $k \geq 2$ when $(r, k) \neq (0,2)$. Furthermore, we show that Two-Class ($r$,$k$)-Coloring for $k \geq 2$ colors with one ($r = 1$) relaxed color cannot be approximated to any constant factor ($\notin$ APX). Finally, we show that Two-Class ($r$,$k$)-Coloring with $k \geq r \geq 2$ colors is APX-complete.
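The problem statement in this abstract can be illustrated with a minimal brute-force sketch. It rests on assumptions not spelled out in the abstract: a "conflict" is taken to be an edge whose two endpoints share the same color, colors 0..r-1 are treated as the relaxed colors that allow conflicts, and the remaining k-r colors must be used properly. The function name `min_conflicts` and the edge-list graph representation are hypothetical, and the exhaustive search is only practical for very small graphs.

```python
from itertools import product


def min_conflicts(n_vertices, edges, r, k):
    """Return the minimal number of conflicts over all valid colorings.

    Assumptions (not taken from the paper): colors 0..r-1 are relaxed,
    colors r..k-1 are strict, and a conflict is a monochromatic edge.
    Returns None if no valid coloring exists.
    """
    best = None
    # Enumerate every assignment of one of k colors to each vertex.
    for coloring in product(range(k), repeat=n_vertices):
        conflicts = 0
        valid = True
        for u, v in edges:
            if coloring[u] == coloring[v]:
                if coloring[u] < r:
                    conflicts += 1  # monochromatic edge in a relaxed color
                else:
                    valid = False  # strict colors may not conflict
                    break
        if valid and (best is None or conflicts < best):
            best = conflicts
    return best


# A triangle with one relaxed and one strict color (r=1, k=2).
triangle = [(0, 1), (1, 2), (0, 2)]
print(min_conflicts(3, triangle, r=1, k=2))
```

In the triangle example, at most one vertex can take the strict color, so the other two share the relaxed color and at least one conflict is unavoidable under these assumptions.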
Submitted 9 August, 2021;
originally announced August 2021.
-
Predicting Medical Interventions from Vital Parameters: Towards a Decision Support System for Remote Patient Monitoring
Authors:
Kordian Gontarska,
Weronika Wrazen,
Jossekin Beilharz,
Robert Schmid,
Lauritz Thamsen,
Andreas Polze
Abstract:
Cardiovascular diseases and heart failures in particular are the main cause of non-communicable disease mortality in the world. Constant patient monitoring enables better medical treatment as it allows practitioners to react on time and provide the appropriate treatment. Telemedicine can provide constant remote monitoring so patients can stay in their homes, only requiring medical sensing equipment and network connections. A limiting factor for telemedical centers is the number of patients that can be monitored simultaneously. We aim to increase this number by implementing a decision support system. This paper investigates a machine learning model to estimate a risk score based on patient vital parameters that allows sorting all cases every day to help practitioners focus their limited capacities on the most severe cases. The model we propose reaches an AUCROC of 0.84, whereas the baseline rule-based model reaches an AUCROC of 0.73. Our results indicate that the usage of deep learning to improve the efficiency of telemedical centers is feasible. This way more patients could benefit from better health-care through remote monitoring.
Submitted 20 April, 2021;
originally announced April 2021.
-
Towards a Staging Environment for the Internet of Things
Authors:
Jossekin Beilharz,
Philipp Wiesner,
Arne Boockmeyer,
Florian Brokhausen,
Ilja Behnke,
Robert Schmid,
Lukas Pirl,
Lauritz Thamsen
Abstract:
Internet of Things (IoT) applications promise to make many aspects of our lives more efficient and adaptive through the use of distributed sensing and computing nodes. A central aspect of such applications is their complex communication behavior that is heavily influenced by the physical environment of the system. To continuously improve IoT applications, a staging environment is needed that can provide operating conditions representative of deployments in the actual production environments -- similar to what is common practice in cloud application development today. Towards such a staging environment, we present Marvis, a framework that orchestrates hybrid testbeds, co-simulated domain environments, and a central network simulation for testing distributed IoT applications. Our preliminary results include an open source prototype and a demonstration of a Vehicle-to-everything (V2X) communication scenario.
Submitted 26 January, 2021;
originally announced January 2021.
-
SAT-MARL: Specification Aware Training in Multi-Agent Reinforcement Learning
Authors:
Fabian Ritz,
Thomy Phan,
Robert Müller,
Thomas Gabor,
Andreas Sedlmeier,
Marc Zeller,
Jan Wieghardt,
Reiner Schmid,
Horst Sauer,
Cornel Klein,
Claudia Linnhoff-Popien
Abstract:
A characteristic of reinforcement learning is the ability to develop unforeseen strategies when solving problems. While such strategies sometimes yield superior performance, they may also result in undesired or even dangerous behavior. In industrial scenarios, a system's behavior also needs to be predictable and lie within defined ranges. To enable the agents to learn (how) to align with a given specification, this paper proposes to explicitly transfer functional and non-functional requirements into shaped rewards. Experiments are carried out on the smart factory, a multi-agent environment modeling an industrial lot-size-one production facility, with up to eight agents and different multi-agent reinforcement learning algorithms. Results indicate that compliance with functional and non-functional constraints can be achieved by the proposed approach.
Submitted 14 December, 2020;
originally announced December 2020.
-
FnF-BFT: Exploring Performance Limits of BFT Protocols
Authors:
Zeta Avarikioti,
Lioba Heimbach,
Roland Schmid,
Laurent Vanbever,
Roger Wattenhofer,
Patrick Wintermeyer
Abstract:
We introduce FnF-BFT, a parallel-leader byzantine fault-tolerant state-machine replication protocol for the partially synchronous model with theoretical performance bounds during synchrony. By allowing all replicas to act as leaders and propose requests independently, FnF-BFT parallelizes the execution of requests. Leader parallelization distributes the load over the entire network -- increasing throughput by overcoming the single-leader bottleneck. We further use historical data to ensure that well-performing replicas are in command. FnF-BFT's communication complexity is linear in the number of replicas during synchrony and thus competitive with state-of-the-art protocols. Finally, with FnF-BFT, we introduce a BFT protocol with performance guarantees in stable network conditions under truly byzantine attacks.
A prototype implementation of FnF-BFT outperforms (state-of-the-art) HotStuff's throughput, especially as replicas increase, showcasing FnF-BFT's significantly improved scaling capabilities.
Submitted 10 March, 2021; v1 submitted 4 September, 2020;
originally announced September 2020.
-
PermitBFT: Exploring the Byzantine Fast-Path
Authors:
Roland Schmid,
Roger Wattenhofer
Abstract:
PermitBFT establishes a permissioned byzantine ledger in the partially synchronous networking model. For n replicas, PermitBFT tolerates up to f < n/3 byzantine replicas. It is the first BFT protocol to achieve a latency of just 2 message delays despite tolerating byzantine replicas throughout the "fast track", as long as they are not the leader. The design of PermitBFT relies on two fundamental concepts. First, in PermitBFT the participating nodes do not wait for a distinguished leader to act and subsequently confirm its actions, but send permits to the next leader proactively. Second, PermitBFT achieves a separation of the decision powers that are usually concentrated on a single leader node. A leader in PermitBFT controls which transactions to include in a new block, but not where to append the block in the block graph.
Submitted 30 October, 2020; v1 submitted 25 June, 2019;
originally announced June 2019.
-
Adapting Quality Assurance to Adaptive Systems: The Scenario Coevolution Paradigm
Authors:
Thomas Gabor,
Marie Kiermeier,
Andreas Sedlmeier,
Bernhard Kempter,
Cornel Klein,
Horst Sauer,
Reiner Schmid,
Jan Wieghardt
Abstract:
From formal and practical analysis, we identify new challenges that self-adaptive systems pose to the process of quality assurance. When tackling these, the effort spent on various tasks in the process of software engineering is naturally re-distributed. We claim that all steps related to testing need to become self-adaptive to match the capabilities of the self-adaptive system-under-test. Otherwise, the adaptive system's behavior might elude traditional variants of quality assurance. We thus propose the paradigm of scenario coevolution, which describes a pool of test cases and other constraints on system behavior that evolves in parallel to the (in part autonomous) development of behavior in the system-under-test. Scenario coevolution offers a simple structure for the organization of adaptive testing that allows for both human-controlled and autonomous intervention, supporting software engineering for adaptive systems on a procedural as well as technical level.
Submitted 12 February, 2019;
originally announced February 2019.
-
Differential Diagnosis for Pancreatic Cysts in CT Scans Using Densely-Connected Convolutional Networks
Authors:
Hongwei Li,
Kanru Lin,
Maximilian Reichert,
Lina Xu,
Rickmer Braren,
Deliang Fu,
Roland Schmid,
Ji Li,
Bjoern Menze,
Kuangyu Shi
Abstract:
The lethal nature of pancreatic ductal adenocarcinoma (PDAC) calls for early differential diagnosis of pancreatic cysts, which are identified in up to 16% of normal subjects, and some of which may develop into PDAC. Previous computer-aided developments have achieved certain accuracy for classification on segmented cystic lesions in CT. However, pancreatic cysts have a large variation in size and shape, and their precise segmentation remains rather challenging, which restricts the computer-aided interpretation of CT images acquired for differential diagnosis. We propose a computer-aided framework for early differential diagnosis of pancreatic cysts without pre-segmenting the lesions using densely-connected convolutional networks (Dense-Net). The Dense-Net learns high-level features from the whole abnormal pancreas and builds mappings from medical imaging appearance to different pathological types of pancreatic cysts. To enhance the clinical applicability, we integrate saliency maps in the framework to help physicians understand the decision of the deep learning method. The test on a cohort of 206 patients with 4 pathologically confirmed subtypes of pancreatic cysts has achieved an overall accuracy of 72.8%, which is significantly higher than the baseline accuracy of 48.1% and strongly supports the clinical potential of our developed method.
Submitted 19 June, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.