2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019
Position estimation of multiple objects in a 3D environment poses a challenging task, even more so in the presence of occlusions due to infrastructure. In this paper we present a method to accurately localize up to 10 moving pedestrians by fusing the output of a Sparsity Driven Detector with volumes generated by a Shape-from-Silhouette approach. We also show how occlusion information from a 3D map of the environment can be integrated into our algorithm to further improve performance. We investigate the influence of different camera heights and image sizes on the optimization problem and demonstrate real-time capability for certain configurations. Additionally, our code is made publicly available under an open-source license.
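The Shape-from-Silhouette idea above can be illustrated with a minimal voxel-carving sketch — not the paper's implementation. A voxel survives only if every camera's foreground silhouette contains its projection; the grid size, the `project` callables, and all names below are illustrative assumptions.

```python
import numpy as np

def carve_voxels(grid_shape, silhouettes, projections):
    """Visual hull by voxel carving: a voxel stays occupied only if its
    projection falls inside the foreground silhouette of every camera."""
    occupied = np.ones(grid_shape, dtype=bool)
    voxels = np.argwhere(occupied)              # indices of all voxels
    for sil, project in zip(silhouettes, projections):
        for v in voxels:
            if not occupied[tuple(v)]:
                continue                        # already carved away
            u, w = project(v)                   # voxel -> pixel coordinates
            inside = 0 <= u < sil.shape[0] and 0 <= w < sil.shape[1]
            if not (inside and sil[u, w]):
                occupied[tuple(v)] = False      # outside some silhouette
    return occupied
```

With two toy orthographic cameras (one looking along z, one along x), the intersection of the silhouette cones is exactly the set of voxels consistent with both views.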
In order to render software viable for highly safety-critical applications, we describe how to incorporate fault tolerance mechanisms into the real-time programming language PEARL. To this end, we present, classify, evaluate, and illustrate known fault tolerance methods for software and link them to the requirements of the international standard IEC 61508-3 for functional safety. We contribute PEARL-2020 programming language constructs for fault tolerance methods that need to be implemented by operating systems, as well as code snippets and libraries for those that are independent of the runtime system.
This survey presents an overview of integrating prior knowledge into machine learning systems in order to improve explainability. The complexity of machine learning models has elicited research to make them more explainable. However, most explainability methods cannot provide insight beyond the given data and require additional information about the context. We propose to harness prior knowledge to improve upon the explanation capabilities of machine learning models. We present a categorization of current research into three main categories: approaches that integrate knowledge into the machine learning pipeline, approaches that integrate knowledge into the explainability method, and approaches that derive knowledge from explanations. To classify the papers, we build upon the existing taxonomy of informed machine learning and extend it from the perspective of explainability. We conclude with open challenges and research directions.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021
An important pillar for safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks to afford their deployment in critical applications. A ubiquitous class of safety risks are learned shortcuts, i.e., spurious correlations a network exploits for its decisions that have no semantic connection to the actual task. Networks relying on such shortcuts bear the risk of not generalizing well to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable if access to the network is constrained, in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy to the black-box model of interest. Leveraging the proxy's guarantees on introspection, we automatically extract candidates for learned shortcuts. Their transferabil...
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018
Finding the parameters of a vignetting function for a camera currently involves the acquisition of several images of a given scene under very controlled lighting conditions, a cumbersome and error-prone task whose end result can only be confirmed visually. Many computer vision algorithms assume photoconsistency, i.e., constant intensity of scene points across different images, and tend to perform poorly if this assumption is violated. We present a real-time online vignetting and response calibration with additional exposure estimation for global-shutter color cameras. Our method does not require uniformly illuminated surfaces, known texture, or specific geometry. The only assumptions are that the camera is moving, the illumination is static, and reflections are Lambertian. Our method estimates the camera view poses by sparse visual SLAM and models the vignetting function by a small number of thin plate splines (TPS) together with a sixth-order polynomial to provide a dense estimation o...
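The radial polynomial component of such a vignetting model can be sketched as follows. The even-power form up to the sixth order, the coefficient values, and the normalization are illustrative assumptions — not the paper's calibrated model, which additionally uses thin plate splines:

```python
import numpy as np

def vignette_gain(h, w, coeffs=(-0.3, 0.1, -0.05)):
    """Radial attenuation model V(r) = 1 + a2*r^2 + a4*r^4 + a6*r^6,
    with r the distance to the image center, normalized to [0, 1]."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(ys - cy, xs - cx) / np.hypot(cy, cx)
    a2, a4, a6 = coeffs
    return 1.0 + a2 * r**2 + a4 * r**4 + a6 * r**6

def correct_vignetting(img, coeffs=(-0.3, 0.1, -0.05)):
    """Undo attenuation by dividing the observed image by the model."""
    gain = vignette_gain(*img.shape[:2], coeffs)
    return img / gain[..., None] if img.ndim == 3 else img / gain
```

Dividing an observed image by the modeled gain restores photoconsistency, provided the polynomial coefficients have been calibrated.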
The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2017 has defined ambitious new benchmarks to advance the state of the art in autonomous operation of ground-based and flying robots. This article covers our approaches to the two challenges that involved micro aerial vehicles (MAVs). Challenge 1 required reliable target perception, fast trajectory planning, and stable control of an MAV in order to land on a moving vehicle. Challenge 3 demanded a team of MAVs to perform a search-and-transportation task, coined "Treasure Hunt", which required mission planning and multi-robot coordination as well as adaptive control to account for the additional object weight. We describe our base MAV setup and the challenge-specific extensions, cover the camera-based perception, explain control and trajectory planning in detail, and elaborate on mission planning and team coordination. We evaluated our systems in simulation as well as with real-robot experiments during the competition ...
Accurately self-localizing a vehicle is of high importance, as it allows robustifying nearly all modern driver assistance functionality, e.g., lane keeping and coordinated autonomous driving maneuvers. We examine vehicle self-localization relying only on video sensors, in particular a system of four fisheye cameras providing a surround view of the car, a setup currently growing popular in upper-class cars. The presented work aims at an autonomous parking scenario. The method is based on parking markings as orientation marks, since they can be found in nearly every parking deck and require only little additional preparation. Our contribution is twofold: 1) We present a new real-time capable image processing pipeline for top-view systems that extracts parking markings, and we show how to obtain a reliable and accurate ego-pose and ego-motion estimation given a coarse pose as a starting point. 2) The aptitude of this often neglected sensor array for vehicle self-localization is demonstrated. Experiment...
2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018
The long-term goal of autonomous driving will require a detailed understanding of complex traffic scenes, in particular the state and possibly the intentions of other traffic participants, the most prominent ones being non-ego vehicles. An intermediate step and cornerstone to this understanding is a precise estimate of a vehicle's pose and category, as well as other potential cues that facilitate predicting its future behavior. The output representations of current state-of-the-art computer vision algorithms, e.g., detection or semantic segmentation, hold little pose information and generally do not straightforwardly allow for any vehicle state analysis. In this work we focus on new vehicle representations that can be learned by semantic segmentation algorithms. We present three vehicle fragmentation levels that divide a vehicle into part areas based on a mixture of material and function, jointly aiming at vehicle state analysis and embedding pose information. To avoid expensive manual...
2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019
Over the last years, pixel-wise analysis via semantic segmentation has been established as a powerful method in scene understanding for autonomous driving, providing classification and 2D shape estimation even with monocular camera systems. Despite this positive resonance, a way to take advantage of this representation for the extraction of 3D information solely from a single-shot RGB image has never been presented. In this paper we present a full-fledged six-degree-of-freedom vehicle pose estimation algorithm, demonstrating that a segmentation representation can be utilized to extract precise 3D information for non-ego vehicles. We train a neural network to predict a multi-class mask from segmentation, defining classes based on mechanical parts and relative part positions and treating different entities of a part as separate classes. The multi-class mask is transformed into a variable set of key points, serving as a set of 2D-3D correspondences for a Perspective-n-Point solver. Our paper shows...
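The final step above — recovering a 6-DoF pose from 2D-3D correspondences — is a standard Perspective-n-Point problem; in practice one would typically call a library solver such as OpenCV's `solvePnP`. As a self-contained illustration of the principle (not the paper's solver; all names and the synthetic data are assumptions), the Direct Linear Transform estimates the full 3x4 projection matrix from six or more non-coplanar correspondences:

```python
import numpy as np

def pnp_dlt(pts3d, pts2d):
    """Direct Linear Transform: estimate the 3x4 projection matrix P
    (up to scale) from >= 6 non-coplanar 2D-3D correspondences x ~ P X."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The smallest right singular vector minimizes ||A p|| with ||p|| = 1.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def reproject(P, pts3d):
    """Project homogeneous 3D points and dehomogenize to pixel coordinates."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```

On noise-free synthetic correspondences the recovered matrix reproduces the input projections exactly up to numerical precision.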
The invention relates to a method and a device for displaying the surroundings of a vehicle (20, 220), and to a driver assistance system with such a device, comprising at least one sensor (1, 101, 111, 121, 131) for generating sensor data of the surroundings of the vehicle (20, 220). The sensor data are processed into raw image data, optionally using a grid model (21, 221) around the vehicle (20, 220); optionally, the raw image data are processed into object information with the aid of the grid model (21, 221), and the obtained object information is used in processing the raw image data into image-object data; finally, the image-object data are displayed.
The utilization of automatically generated image training data is a feasible way to enhance existing datasets, e.g., by strengthening underrepresented classes or by adding new lighting or weather conditions for more variety. Synthetic images can also be used to introduce entirely new classes to a given dataset. In order to maximize the positive effects of generated image data on classifier training and reduce the possible downsides of potentially problematic image samples, an automatic quality assessment of each generated image seems sensible for overall quality enhancement of the training set and, thus, of the resulting classifier. In this paper we extend our previous work on synthetic traffic sign images by assessing the quality of a fully generated dataset consisting of 215,000 traffic sign images using four different measures. According to each sample's quality, we successively reduce the size of our training set and evaluate the performance with SVM and CNN classifiers to verif...
The invention relates to an apparatus for the examination of cells with an elastomer, and to its use. The elastomer has a bottom and a thicker edge portion, and periodic microstructures are arranged in the bottom. Such elastomers are particularly suitable for stretch testing, especially of the uniaxial type. Advantageous uses of such devices are also disclosed.
Most modern computer vision techniques rely on large amounts of meticulously annotated data for training and evaluation. In close-to-market development, this demand is even higher, since numerous common and, more importantly, less common situations have to be tested and must hence be covered data-wise. However, gathering the necessary amount of data, ready-labeled for the task at hand, is a challenge of its own. Depending on the complexity of the objective and the chosen approach, the required amount of data can be vast. At the same time, the effort to capture all possible cases of a given problem grows with their variability. This makes recording new video data unfeasible, even impossible at times. In this work, we regard parking space classification as an exemplary application to target the imbalance of cost and benefit w.r.t. image data creation for machine learning approaches. We rely on a fully-fledged park deck simulation created with Unreal Engine 4 for data creation and replace all ...
Video-based traffic sign recognition poses a highly challenging problem due to the significant number of possible classes and large variances in recording conditions in natural environments. Gathering an appropriate amount of data to solve this task with machine learning techniques remains an overall issue. In this study, we assess the suitability of automatically generated traffic sign images for training corresponding image classifiers. To this end, we adapt the recently proposed cycle-consistent generative adversarial networks in order to transfer automatically rendered prototypical traffic sign images, for which we control type, pose, and, to a degree, background, into their true-to-life counterparts. We test the proposed system by extensive experiments on the German Traffic Sign Recognition Benchmark dataset [1] and learn that both a HOG-feature-based SVM classifier and a state-of-the-art CNN exhibit reasonable performance when solely trained on artificial data. Consequently, it is...
The use of neural networks in perception pipelines of autonomous systems such as autonomous driving is indispensable due to their outstanding performance, but at the same time their complexity poses a challenge with respect to safety. An important question in this regard is how to substantiate test sufficiency for such a function. One approach from the software testing literature is that of coverage metrics. Similar notions of coverage, called neuron coverage, have been proposed for deep neural networks and try to assess to what extent test inputs activate neurons in a network. Still, the correspondence between high neuron coverage and safety-related network qualities remains elusive. Potentially, a high coverage could imply sufficiency of test data. In this paper, we argue that the coverage metrics as discussed in the current literature do not satisfy these high expectations and present a line of experiments from the field of computer vision to prove this claim.
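To make the kind of metric under discussion concrete, a basic neuron coverage computation can be sketched as follows. The thresholding scheme and all names are illustrative assumptions in the spirit of the coverage notions from the literature, not the specific metrics examined in the paper:

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` for at
    least one input in the test set.

    `activations`: one array of shape (num_inputs, num_neurons) per layer.
    """
    covered = total = 0
    for layer in activations:
        fired = (layer > threshold).any(axis=0)   # per neuron: ever active?
        covered += int(fired.sum())
        total += fired.size
    return covered / total
```

A high value only says that most neurons fired at least once somewhere in the test set — which, as argued above, need not correspond to any safety-related quality of the network.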
Papers by Sebastian Houben