Predication is a well-known alternative to conditional branching. However, the implementation of predication is costly in terms of extending the instruction set of the processor architecture. In this paper, a predication encoding technique for VLIW processors is proposed. Instead of using additional bits in the instruction encoding, the assigned issue-slot of a conditionally executed instruction encodes the associated predicate register. The number of addressable predicate registers scales with the number of issue-slots. All predicate registers have only one read and write port and can be accessed in parallel. Compared to the related work, no additional instruction encoding bits for selecting a predicate register are required and the processor core area increases only by about 1% per predicate register set. With the proposed predication technique, the processing performance increases by up to 4.5% when using two predicate registers instead of one for a digital filter case study with floating-point emulation operations. A second case study shows that conditional execution with two predicate registers in combination with loop unrolling and operation merging almost doubles the achieved parallel instructions per cycle for a bit-reversal permutation algorithm.
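To make the encoding idea above concrete, the following sketch models, under illustrative assumptions, how an issue slot can double as the predicate-register selector: the slot an operation is bundled into determines which predicate register gates its execution, so no extra encoding bits are needed. The bundle format, slot count, and names are hypothetical and not taken from the paper.
```python
# Hypothetical sketch of issue-slot-based predicate selection in a VLIW bundle;
# names and structure are illustrative, not the paper's actual ISA.

NUM_SLOTS = 4                      # issue slots == addressable predicate registers
pred_regs = [True] * NUM_SLOTS     # one single-ported predicate register per slot

def execute_bundle(bundle):
    """bundle: list of (slot, operation, is_conditional) tuples."""
    for slot, operation, is_conditional in bundle:
        # The slot index itself selects the predicate register, so no extra
        # encoding bits are spent on naming a predicate register.
        if not is_conditional or pred_regs[slot]:
            operation()

# Example: slot 0 executes (its predicate is set), slot 1 is squashed (cleared).
pred_regs[1] = False
execute_bundle([(0, lambda: print("slot 0: executed"), True),
                (1, lambda: print("slot 1: executed"), True)])
```
Since every slot owns exactly one predicate register with a single read port, all predicates can be read in parallel, matching the access pattern described above.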
Optical wireless communication (OWC) or visible light communication (VLC) has recently experienced a significant growth in interest. Due to the vast bandwidth available in the unlicensed light spectrum, the rise of LED-based lighting, and inherent security and EM benefits, OWC can be regarded as an attractive alternative to commonly available RF-based communication, specifically Wi-Fi. As its importance grows, so does the need for properly matched system models and tools for exploration of the OWC-specific design space in order to predict system performance under real-life conditions. Therefore, a simulation model for the discrete event-based simulation framework OMNeT++ was created, which precisely models transmissions on the physical level in LOS scenarios and enables quantitative impact analysis of changes in either the optical front-ends or the simulated scenario, e.g. field of view (FOV), position and orientation of transmitter and receiver. The physical characteristics of the simulation model have been verified against measurements performed with an actual OWC system. In order to illustrate the design space exploration capabilities of our simulation model, four case studies are presented involving changes to transmission aspects, e.g. modulation scheme or type of FEC, or the utilized protocols. Each demonstrates the possibility to quantify the impact of static and dynamic parameter changes on the resulting system performance parameters, such as throughput and covered area.
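For readers who want to relate the scenario parameters mentioned above (FOV, position and orientation of transmitter and receiver) to the physical layer, the sketch below implements the generic Lambertian line-of-sight channel-gain formula commonly used in such OWC link models. It is a minimal stand-in, not the exact model of the OMNeT++ module; all parameter names are generic.
```python
import math

def los_channel_gain(d, phi, psi, half_power_angle, area, fov, ts=1.0, gain=1.0):
    """DC gain of a line-of-sight optical wireless link (generic Lambertian model).

    d                -- transmitter-receiver distance [m]
    phi              -- irradiance angle at the LED [rad]
    psi              -- incidence angle at the photodiode [rad]
    half_power_angle -- LED half-power semi-angle [rad]
    area             -- photodiode active area [m^2]
    fov              -- receiver field of view [rad]
    ts, gain         -- optical filter and concentrator gains
    """
    if abs(psi) > fov:
        return 0.0  # outside the field of view: no LOS contribution
    m = -math.log(2) / math.log(math.cos(half_power_angle))  # Lambertian order
    return ((m + 1) * area / (2 * math.pi * d * d)
            * math.cos(phi) ** m * ts * gain * math.cos(psi))
```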
Background Since social distancing during the COVID-19 pandemic had a profound impact on professional life, this study investigated the effect of PCR testing on on-site work. Methods PCR screening, antibody testing, and questionnaires offered to 4,890 working adults in Lower Saxony were accompanied by data collection on demographics, family status, comorbidities, social situation, health-related behavior, and the number of work-related contacts. Relative risks (RR) with 95 % confidence intervals were estimated for the associations between regular PCR testing and other work- and health-related variables, respectively, and working on-site. Analyses were stratified by the suitability of work tasks for mobile office. Results Between April 2020 and February 2021, 1,643 employees underwent PCR testing. Whether mobile working was possible strongly influenced the work behavior. Persons whose work was suitable for mobile office (mobile workers) had a lower probability of working on-site than ...
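As a side note on the statistics mentioned above, a relative risk and its 95 % confidence interval can be obtained from a 2×2 table with the usual log-normal approximation. The sketch below uses made-up counts purely for illustration; it is not the study's data or its stratified analysis.
```python
import math

def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Relative risk with a 95% CI via the usual log-normal approximation."""
    risk_exp = exposed_events / exposed_total
    risk_unexp = unexposed_events / unexposed_total
    rr = risk_exp / risk_unexp
    se_log = math.sqrt(1/exposed_events - 1/exposed_total
                       + 1/unexposed_events - 1/unexposed_total)
    lower = math.exp(math.log(rr) - 1.96 * se_log)
    upper = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, (lower, upper)

# Made-up example: 120 of 800 regularly tested employees worked on-site,
# versus 300 of 843 employees without regular testing.
print(relative_risk(120, 800, 300, 843))
```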
Melt electrowriting (MEW) is a high-resolution 3D printing technique that combines elements of electro-hydrodynamic fiber attraction and melt extrusion. The ability to precisely deposit micro- to nanometer strands of biocompatible polymers in a layer-by-layer fashion makes MEW a promising scaffold fabrication method for all kinds of tissue engineering applications. This review describes possibilities to optimize multi-parametric MEW processes for precise fiber deposition over multiple layers and to prevent printing defects. Printing protocols for nonlinear scaffold structures, concrete MEW scaffold pore geometries and printable biocompatible materials for MEW are introduced. The review discusses approaches to combining MEW with other fabrication techniques with the purpose of generating advanced scaffold structures. The outlined MEW printer modifications enable customizable collector shapes or sacrificial materials for non-planar fiber deposition and nozzle adjustments allow redesign...
Microgravity eases several constraints limiting experiments with ultracold and condensed atoms on ground. It enables extended times of flight without suspension and eliminates the gravitational sag for trapped atoms. These advantages motivated numerous initiatives to adapt and operate experimental setups on microgravity platforms. We describe the design of the payload, motivations for design choices, and capabilities of the Bose-Einstein Condensate and Cold Atom Laboratory (BECCAL), a NASA-DLR collaboration. BECCAL builds on the heritage of previous devices operated in microgravity, features rubidium and potassium, multiple options for magnetic and optical trapping, different methods for coherent manipulation, and will offer new perspectives for experiments on quantum optics, atom optics, and atom interferometry in the unique microgravity environment on board the International Space Station.
Direction of arrival (DOA) estimation is an important array signal processing technique, used by various applications such as radar, sonar or wireless communication. Most of the known DOA algorithms suffer from a significant performance reduction and even fail completely under difficult conditions, like small antenna aperture size, correlated signals or a small number of snapshots. Maximum-Likelihood (ML) methods have been investigated thoroughly and are known to still work even in such difficult scenarios. However, the major drawback of ML methods is their computational cost, especially in the case of large MIMO (multiple-input multiple-output) configurations. This work presents a novel hardware accelerator architecture, which is able to compute the exact ML estimation in the case of one or two targets. It is shown that the computationally demanding vector product can be implemented with CORDIC units, which save a considerable amount of hardware resources. Furthermore, the result of the single-target estimator can be reused to efficiently compute the estimates in the two-target case. Finally, the performance of the architecture is evaluated by an FPGA implementation which is able to process more than 20 000 detections from 16 channels with 256 steering vectors in real-time (25 Hz).
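For orientation, the single-target maximum-likelihood estimate over a grid of steering vectors reduces to picking the direction whose steering vector best matches the snapshot. The NumPy sketch below illustrates that criterion; it does not reproduce the CORDIC-based hardware implementation or the two-target reuse scheme described above, and variable names are illustrative.
```python
import numpy as np

def ml_doa_single_target(x, steering_vectors):
    """Single-target ML DOA estimate over a precomputed steering-vector grid.

    x                -- complex snapshot vector, shape (num_channels,)
    steering_vectors -- shape (num_angles, num_channels), row k holds a(theta_k)
    Returns the index of the grid angle maximizing |a(theta)^H x|^2 / ||a(theta)||^2.
    """
    correlations = steering_vectors.conj() @ x            # a(theta)^H x for all angles
    norms = np.sum(np.abs(steering_vectors) ** 2, axis=1)  # ||a(theta)||^2
    metric = np.abs(correlations) ** 2 / norms
    return int(np.argmax(metric))
```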
With the rise of pulse-width modulated LED light sources in car headlights and peripheral illumination, in-vehicle cameras for traffic sign recognition and electronic mirror replacement face new issues with flicker artifacts due to misaligned irradiation and imager integration times. In this paper, we propose a novel post-processing algorithm based on three consecutive frames, which is able to detect and filter these artifacts without the need for high-dynamic-range imaging. The local reconstruction of the original light intensity is based on a local flicker detection mask and multiple candidate pixels, generated from bidirectional motion estimation to adjacent frames.
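A strongly simplified illustration of the idea: a pixel that is much darker in the current frame than in both temporal neighbors is flagged as a flicker artifact and replaced from those neighbors. The sketch below omits the bidirectional motion estimation and candidate selection that the actual algorithm uses, and the threshold is an arbitrary illustrative value.
```python
import numpy as np

def deflicker_simplified(prev_frame, cur_frame, next_frame, threshold=40):
    """Toy three-frame flicker correction without motion compensation.

    A pixel is flagged as a flicker artifact if it is darker by more than
    `threshold` in the current frame than in both temporal neighbors; flagged
    pixels are replaced by the average of the neighboring frames.
    """
    prev_f = prev_frame.astype(np.float32)
    cur_f = cur_frame.astype(np.float32)
    next_f = next_frame.astype(np.float32)
    mask = (prev_f - cur_f > threshold) & (next_f - cur_f > threshold)
    restored = cur_f.copy()
    restored[mask] = 0.5 * (prev_f[mask] + next_f[mask])
    return restored.astype(cur_frame.dtype), mask
```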
In general, the signal chain in modern mobile Brain-Computer Interfaces (BCIs) is subdivided into at least two blocks. These are usually connected wirelessly, with the digital signal processing part implemented separately and often stationary. This limits mobility and results in additional, although avoidable, latency due to the wireless transmission channel. Therefore, a novel, entirely mobile FPGA-based platform for BCIs has been designed and implemented. While featuring highly efficient adaptability to targeted algorithms due to the ultra-low-power Flash-based FPGA, the stackable system design and the configurable hardware ensure flexibility for use in different application scenarios. Powered by a single Li-ion battery, the miniaturized system, half the size of a credit card, provides high mobility and thus allows for real-world applicability. A Bluetooth Low Energy extension can be connected without any significant area cost if a wireless data or control signal transmission channel is required. The resulting system is capable of acquiring and fully processing up to 32 EEG channels with 24-bit precision each at sampling rates of 250 to 16k samples per second, at a total weight of less than 60 g.
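A quick back-of-the-envelope on the acquisition data rate implied by the configuration above (32 channels, 24-bit samples, 250 to 16k samples per second) shows why fully on-device processing is attractive:
```python
channels = 32
bits_per_sample = 24
for fs in (250, 1_000, 16_000):                  # samples per second per channel
    rate_bps = channels * bits_per_sample * fs   # raw acquisition data rate
    print(f"{fs:>6} S/s per channel -> {rate_bps / 1e6:.3f} Mbit/s")
```
At the maximum rate this is roughly 12.3 Mbit/s of raw EEG data, well beyond typical Bluetooth Low Energy throughput, which underlines the benefit of processing the channels on the FPGA itself.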
The integration of application specific instruction set processors (ASIPs) in hearing aids requires various architectural customizations and software-side optimizations in order to meet the stringent power consumption constraints and processing performance demands. This paper presents the KAVUAKA application specific hearing aid processor and its ASIC integration as a system on chip (SoC). The final system contains four KAVUAKA processor cores and ten co-processors. Each of these processors and co-processors was individually customized, and they differ in their data path widths. The processors are organized in two clusters, which share memories, an audio interface, co-processors and a serial interface. With this system, different hearing aid systems are evaluated in terms of performance, power and area by activating different processor and co-processor combinations. A 40 nm low-power technology was used to build this research hearing aid system. The die size is 3.6 mm² with less than 1 mm² per core. The measured average power consumption is less than 1 mW per core.
Conventional synthetic vascular grafts require ongoing anticoagulation, and autologous venous grafts are often not available in elderly patients. This review highlights the development of bioartificial vessels replacing brain‐dead donor‐ or animal‐derived vessels with ongoing immune reactivity. The vision for such bio‐hybrids is a combination of biodegradable scaffolds and seeding with immune‐neutral cells; here, different cell sources such as autologous progenitor cells or stem cells are relevant. This kind of in situ tissue engineering depends on a suitable bioreactor system with elaborate monitoring systems, three‐dimensional (3D) visualization and the potential of cell conditioning towards the targeted vascular cell phenotype. Necessary bioreactor tools for dynamic and pulsatile cultivation are described. In addition, a concept for the design of vasa vasorum is outlined, which is needed for sustainable nutrition of the wall structure in large-caliber vessels. For scaffold design and cell adhesion additives, different materials and technologies are discussed. 3D printing is introduced as a relatively new field with promising prospects, for example, to create complex geometries or micro‐structured surfaces for optimal cell adhesion and ingrowth in a standardized and custom-designed procedure. Summarizing, a bio‐hybrid vascular prosthesis from a controlled biotechnological process is thus coming more and more into view. It has the potential to withstand the strict approval requirements applied to advanced therapy medicinal products.
Neural networks (NN) are a powerful tool to tackle complex problems in hearing aid research, but their use on hearing aid hardware is currently limited by memory and processing power. To enable training under these constraints, a fixed-point analysis and a memory-friendly power-of-two quantization scheme (replacing multiplications with shift operations) have been implemented by extending TensorFlow, a standard framework for training neural networks, and the QKeras package [1, 2]. The implemented fixed-point analysis detects quantization issues like overflows, underflows, precision problems and zero gradients. The analysis is done for each layer in every epoch for weights, biases and activations, respectively. With this information the quantization can be optimized, e.g. by modifying the bit width, the number of integer bits, or by switching the quantization scheme to power-of-two quantization. To demonstrate the applicability of this method, a case study has been conducted. For this purpose, a CNN has been tra...
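The core of power-of-two quantization can be illustrated without the TensorFlow/QKeras tooling: each weight is replaced by a signed power of two, so a multiplication by that weight degenerates into an arithmetic shift. The NumPy sketch below is a minimal stand-in under an assumed exponent range; it is not the quantizer configuration used in the study.
```python
import numpy as np

def quantize_po2(weights, min_exp=-7, max_exp=0):
    """Quantize weights to signed powers of two: w -> sign(w) * 2**e.

    The exponent e = round(log2(|w|)) is clipped to [min_exp, max_exp], so each
    multiplication by a quantized weight reduces to an arithmetic shift by |e|.
    The exponent range is an illustrative choice, not the study's configuration.
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.where(mag > 0, mag, 2.0 ** min_exp))),
                  min_exp, max_exp)
    return sign * (2.0 ** exp)

print(quantize_po2([0.3, -0.07, 0.9, 0.0]))   # -> [ 0.25  -0.0625  1.  0. ]
```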
In neurological and neuropsychiatric disorders, neuronal oscillatory activity between basal ganglia and cortical circuits is altered, which may be useful as a biomarker for adaptive deep brain stimulation. We investigated whether changes in the spectral power of oscillatory activity in the motor cortex (MCtx) and the sensorimotor cortex (SMCtx) of rats after injection of the dopamine (DA) receptor antagonist haloperidol (HALO) would be similar to those observed in Parkinson disease. Thereafter, we tested whether a convolutional neural network (CNN) model would identify brain signal alterations in this acute model of parkinsonism. A sixteen-channel surface micro-electrocorticogram (ECoG) recording array was placed under the dura above the MCtx and SMCtx areas of one hemisphere under general anaesthesia in rats. Seven days after surgery, micro-ECoG was recorded in individual freely moving rats in three conditions: (1) basal activity, (2) after injection of HALO (0.5 mg/kg), and (3) with additional injection of apomorphine (APO) (1 mg/kg). Furthermore, a CNN-based classification consisting of 23,530 parameters was applied to the raw data. HALO injection decreased oscillatory theta band activity (4-8 Hz) and enhanced beta (12-30 Hz) and gamma (30-100 Hz) activity in MCtx and SMCtx, which was compensated after APO injection (P < 0.001). Evaluation of the classification performance of the CNN model provided an accuracy of 92%, a sensitivity of 90% and a specificity of 93% on one-dimensional signals. The proposed CNN model requires a minimum of sensor hardware and may be integrated into future research on therapeutic devices for Parkinson disease, such as adaptive closed-loop stimulation, thus contributing to a more efficient treatment.
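To make the reported band effects tangible, the sketch below shows one conventional way to quantify theta, beta and gamma power from multichannel ECoG using Welch's method. The segment length and averaging are illustrative choices and not necessarily those used in the study.
```python
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "beta": (12, 30), "gamma": (30, 100)}

def band_powers(ecog, fs):
    """Average spectral power per band across channels.

    ecog -- array of shape (num_channels, num_samples)
    fs   -- sampling rate in Hz
    """
    freqs, psd = welch(ecog, fs=fs, nperseg=int(2 * fs), axis=-1)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        idx = (freqs >= lo) & (freqs < hi)
        # Integrate the PSD over the band, then average over channels.
        powers[name] = np.trapz(psd[:, idx], freqs[idx], axis=-1).mean()
    return powers
```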
The efficient hardware implementation of signal processing algorithms requires a rigid characterization of the interdependencies between system parameters and hardware costs. Pure software simulation of bit-true implementations of algorithms with high computational complexity is prohibitive because of the excessive runtime. Therefore, we present a field-programmable gate array (FPGA) based hybrid hardware-in-the-loop design space exploration (DSE) framework combining high-level tools (e.g. MATLAB, C++) with a System-on-Chip (SoC) template mapped on FPGA-based emulation systems. This combination significantly accelerates the design process and characterization of highly optimized hardware modules. Furthermore, the approach helps to quantify the interdependencies between system parameters and hardware costs. The achievable emulation speedup using bit-true hardware modules is a key enabler for the optimization of complex signal processing systems using Monte Carlo approaches, which are infeasible in pure software simulation due to the large required stimuli sets. The framework supports a divide-and-conquer approach through a flexible partitioning of complex algorithms across the system resources on different layers of abstraction. This makes it possible to efficiently split the design process among different teams. The presented framework comprises a generic state-of-the-art SoC infrastructure template, a transparent communication layer including MATLAB and hardware interfaces, module wrappers and DSE facilities. The hardware template is synthesizable for a variety of FPGA-based platforms. Implementation and DSE results for two case studies from the different application fields of synthetic aperture radar image processing and interference alignment in communication systems are presented.
Localization algorithms have become of considerable interest for robot audition, acoustic navigation, teleconferencing, speaker localization, and many other applications over the last decade. In this paper, we present a real-time implementation of a Gaussian mixture model (GMM) based probabilistic sound source localization algorithm for a low-power VLIW-SIMD processor for hearing devices. The algorithm has been proven to allow for robust localization of multiple sound sources simultaneously in reverberant and noisy environments. Real-time computation for audio frames of 512 samples at 16 kHz was achieved by introducing algorithmic optimizations and hardware customizations. To the best of our knowledge, this is the first real-time capable implementation of a computationally complex GMM-based sound source localization algorithm on a low-power processor. The resulting estimated core area, without consideration of memory, is 188,511 µm² in 40 nm low-power TSMC technology.
Public health is promoted when physical activity and social interaction increase. One's physical activity can be measured by wearable pedometers based on step counting, which contributes to physical health. However, thus far, no wearable device supports monitoring social interactions. In this paper, a single head-worn inertial measurement unit (H-IMU) is proposed to evaluate interpersonal activities; it can measure three objective components of rapport: positivity, mutual attentiveness, and coordination. Using angular kinematics, the number of head nods is counted for positivity (recognition rate: 94.34%). In terms of mutual attentiveness, we computed angle differences between two interactants' head orientations. The H-IMU can also measure the coordination of two interactants' gait patterns based on individual gait events. These results can serve as a reference for estimating the level of rapport during walking and talking. The H-IMU is applicable to head-worn devices, such as smart earphones and head-mounted displays (HMDs).
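As an illustration of the angular-kinematics idea for the positivity component, head nods can be counted by detecting downward excursions in the pitch-angle signal of the head-worn IMU. The sketch below is a generic peak-detection approach with assumed thresholds; it is not the detector that achieved the 94.34% recognition rate.
```python
import numpy as np
from scipy.signal import find_peaks

def count_head_nods(pitch_deg, fs, min_amplitude=5.0, min_interval_s=0.4):
    """Count head nods from a pitch-angle time series (degrees).

    A nod is detected as a downward pitch excursion of at least `min_amplitude`
    degrees relative to the median baseline, with at least `min_interval_s`
    between successive nods. Both thresholds are illustrative assumptions.
    """
    pitch = np.asarray(pitch_deg, dtype=float)
    baseline = np.median(pitch)
    # Downward (nodding) excursions become positive peaks after negation.
    peaks, _ = find_peaks(baseline - pitch,
                          height=min_amplitude,
                          distance=int(min_interval_s * fs))
    return len(peaks)
```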
This paper introduces a Field Programmable Gate Array (FPGA) architecture for Synthetic Aperture Radar (SAR) processing. The architecture implements a Fast Factorized Backprojection (FFBP) algorithm for stripmap imagery. The designed architecture is platform-independent as it only requires regular external memory, and by encapsulating the design in a superordinate system that controls board-specific periphery, a switch between platforms is possible without design adjustments. The design supports scalable parallelism of dedicated processing modules in order to provide high performance and allow for exhaustive resource utilization of different FPGAs.
Multicore processors serve as target platforms in a broad variety of applications ranging from high-performance computing to embedded mobile computing and automotive applications. But the required parallel programming opens up a huge design space of parallelization strategies, each with potential bottlenecks. Therefore, an early estimation of an application's performance is a desirable development tool. However, out-of-order execution, superscalar instruction pipelines, as well as communication costs and (shared-) cache effects essentially influence the performance of parallel programs. While offering low modeling effort and good simulation speed, current approximate analytic models provide only moderate prediction results so far. Virtual prototyping requires a time-consuming simulation, but produces better accuracy. Furthermore, even existing statistical methods often require detailed knowledge of the hardware for characterization. In this work, we present a concept called Multicore Per...
Introduction: In this work, we use simulated data to quantify the different failure mechanisms of a previously presented low-cost jump height measurement system, based on widely available consumer smartphone technology. Methods: In order to assess the importance of the different preconditions of the jump height measurement algorithm, we generate a synthetic dataset of 2000 random jump parabolas for 2000 randomly generated persons without real-world artifacts. We then selectively add different perturbations to the parabolas and reconstruct the jump height using the evaluated algorithm. The degree to which the manipulations influence the reconstructed jump height gives us insights into how critical each precondition is for the method’s accuracy. Results: For a subject-to-camera distance of 2.5 meters, we found the most important influences to be tracking inaccuracies and distance changes (non-vertical jumps). These are also the most difficult factors to control. Camera angle and lens ...
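For context, camera-based jump-height measurements of this kind ultimately rest on simple projectile kinematics: a jump of height h has a flight time t with h = g·t²/8. The sketch below generates an ideal, artifact-free jump parabola and reconstructs the height from its flight time; it is only a toy model of the synthetic-data idea, not the evaluated algorithm or its perturbation pipeline.
```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def jump_parabola(height_m, fps=30.0):
    """Ideal vertical center-of-mass trajectory for a jump of a given height."""
    t_flight = np.sqrt(8.0 * height_m / G)     # total airborne time
    t = np.arange(0.0, t_flight, 1.0 / fps)
    v0 = G * t_flight / 2.0                    # take-off velocity
    return t, v0 * t - 0.5 * G * t ** 2

def height_from_flight_time(t_flight):
    """Reconstruct jump height from the airborne time: h = g * t^2 / 8."""
    return G * t_flight ** 2 / 8.0

t, y = jump_parabola(0.40)                     # a 40 cm jump, sampled at 30 fps
print(height_from_flight_time(t[-1] - t[0]))   # close to 0.40 m (discretization error)
```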
In an aging society, diseases associated with irreversible damage of organs are frequent. An increasing percentage of patients requires bioartificial tissue or organ substitutes. Tissue engineering products depend on a well-defined process to ensure successful cultivation while meeting high regulatory demands. The goal of the presented work is the development of a bioreactor system for the cultivation of tissue-engineered vascular grafts (TEVGs) for autologous implantation and the transition from a lab-scale setup to standardized production. Key characteristics include (i) the automated, reliable monitoring and control of a wide range of parameters regarding implant conditioning, (ii) easy and sterile setup and operation, (iii) reasonable costs of disposables, and (iv) parallelization of automated cultivation processes. The presented prototype bioreactor system provides comprehensive physiological conditioning, sensing, and imaging functionality to meet all requirements for the successful cultivation of vascular grafts on a production scale.
Complex signal processing algorithms targeted at architectures with increasingly high numbers of parallel processing units require high-performance core interconnections (i.e., low latencies, high throughput, no pinch-offs or bottlenecks). Therefore, assisting techniques, exploring characteristics of diverse topologies of common as well as innovative Networks-on-Chip (NoCs), are necessary for the development of chips with massively parallel processing cores. In contrast to analytic NoC models, event-driven NoC simulations can handle even complex task graphs, but feature long simulation times. To keep simulation times manageable for such complex task graphs, we propose FPGA-accelerated simulation in this work. We extend such a simulator to imitate cache-coherence communication behavior, and we present a translation of real measured profiles into task graphs for in-depth simulation of the communication behavior of an existing NoC-based manycore. Thus, this approach is able not only to deal with synthetic scenarios, but also to analyse the communication behavior of real-world applications. Additionally, a simulation of the Histogram of Oriented Gradients algorithm running on the Intel Xeon Phi manycore, which features a 70-stop ring bus, exemplifies this approach.
The reliable detection of vulnerable road users and the assessment of their actual vulnerability is an important task for the collision warning algorithms of driver assistance systems. Current systems make assumptions about the road geometry which can lead to misclassification. We propose a deep learning-based approach to reliably detect pedestrians and classify their vulnerability based on the traffic area they are walking in. Since there are no pre-labeled datasets available for this task, we developed a method to train a network first on custom synthetic data and then use the network to augment a customer-provided training dataset for a neural network working on real-world images. The evaluation shows that our network is able to accurately classify the vulnerability of pedestrians in complex real-world scenarios without making assumptions on road geometry.
The performance of automotive radar systems is expected to significantly increase in the near future. With enhanced resolution capabilities, more accurate and denser point clouds of traffic participants and roadside infrastructure can be acquired, and so the amount of gathered information is growing drastically. One main driver for this development is the global trend towards self-driving cars, which all rely on precise and fine-grained sensor information. New radar signal processing concepts have to be developed in order to provide this additional information. This paper presents a prototype high-resolution radar sensor which helps to facilitate algorithm development and verification. The system is operational under real-time conditions and achieves excellent performance in terms of range, velocity and angular resolution. Complex traffic scenarios can be acquired out of a moving test vehicle, which is very close to the target application. First measurement runs on public roads are extremely promising and show an outstanding single-snapshot performance. Complex objects can be precisely located and recognized by their contour shape. In order to increase the possible recording time, the raw data rate is reduced by several orders of magnitude in real-time by means of constant false alarm rate (CFAR) processing. The number of target cells can still exceed 10 000 points in a single measurement cycle for typical road scenarios.
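The data-rate reduction step mentioned above can be pictured with a minimal one-dimensional cell-averaging CFAR: each cell is kept only if it exceeds an adaptive threshold derived from surrounding training cells. The sketch below uses illustrative window sizes and scaling and is not the sensor's actual CFAR configuration.
```python
import numpy as np

def ca_cfar_1d(power, num_train=16, num_guard=2, scale=8.0):
    """Cell-averaging CFAR on a 1D power profile (e.g. one range line).

    For each cell under test, the noise level is estimated from `num_train`
    training cells on each side (excluding `num_guard` guard cells), and the
    cell is declared a detection if it exceeds `scale` times that estimate.
    All parameters are illustrative assumptions.
    """
    power = np.asarray(power, dtype=float)
    n = len(power)
    half = num_train + num_guard
    detections = []
    for i in range(half, n - half):
        left = power[i - half:i - num_guard]
        right = power[i + num_guard + 1:i + half + 1]
        noise = (left.sum() + right.sum()) / (2 * num_train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections
```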
Widespread vaccination in pursuit of herd immunity has been recognized as the most promising approach to ending the global pandemic of coronavirus disease 19 (COVID-19). The vaccination of children and adolescents has been extensively debated and the first COVID-19 vaccine is now approved in European countries for children over 12 years of age. Our study investigates vaccination hesitancy in a cohort of German secondary school students. We assessed 903 students between 9 and 20 years of age in the period between 17 May 2021 and 30 June 2021. 68.3% (n = 617) reported the intention to undergo COVID-19 vaccination, while 7% (n = 62) did not want to receive the vaccine and 15% (n = 135) were not yet certain. Age and parental level of education influenced COVID-19 vaccine hesitancy. Children under the age of 16 as well as students whose parents had lower education levels showed significantly higher vaccine hesitancy. Conclusion: Identifying subsets with higher vaccination hesitancy is important...
Due to its computational complexity, the Scale-Invariant Feature Transform (SIFT) algorithm poses a challenge for use in embedded applications. To meet real-time requirements at low power, hardware acceleration is necessary. This paper presents an FPGA-based balanced processor system for real-time SIFT feature detection, containing a dedicated hardware coprocessor coupled to a custom VLIW soft-core processor using a FIFO memory. The coprocessor calculates the scale-space and performs the extrema detection for the extraction of feature candidates, whereas the VLIW soft-core processor performs sub-pixel localization and stability checks to obtain stable SIFT features. The system achieves a peak frame rate of up to 338 fps on 1,024×376 px images at less than 3 W on a Xilinx Virtex-6 FPGA. The filters within the Gaussian pyramid operate in a time-multiplexed scheme at clock frequencies of up to 400 MHz. Furthermore, this paper presents a comprehensive design space exploration, evaluating architectural performance, hardware resources and power consumption trade-offs as well as exposing performance-balanced and Pareto-optimal design variants.
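The extrema detection performed by the coprocessor can be sketched in software as the classic check of each difference-of-Gaussians pixel against its 26 neighbors across the adjacent scales. The implementation below is a plain NumPy reference with an assumed contrast threshold, not the time-multiplexed hardware pipeline.
```python
import numpy as np

def dog_extrema(dog_below, dog_center, dog_above, contrast_thr=0.03):
    """Boolean map of scale-space extrema in a difference-of-Gaussians octave.

    dog_below/center/above -- three adjacent DoG scales with the same 2D shape.
    A pixel is a feature candidate if it is the maximum or minimum of its
    3x3x3 neighborhood and passes a simple contrast threshold. The threshold
    value is an illustrative assumption.
    """
    stack = np.stack([dog_below, dog_center, dog_above])   # shape (3, H, W)
    h, w = dog_center.shape
    result = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            val = dog_center[y, x]
            if abs(val) < contrast_thr:
                continue                                   # low-contrast candidate rejected
            neighborhood = stack[:, y - 1:y + 2, x - 1:x + 2]
            result[y, x] = (val >= neighborhood.max()) or (val <= neighborhood.min())
    return result
```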
Multiple modalities for stereo matching are beneficial for robust path estimation and actioning of autonomous robots in harsh environments, e.g. in the presence of smoke and dust. In order to combine the information resulting from the different modalities, a dense stereo matching approach based on semi-global matching and a combined cost function using cross-based support regions and phase congruency shows a good performance. However, these computationally complex algorithmic steps set high requirements for the mobile processing platform and prohibit a real-time execution at a limited power budget on mobile platforms. Therefore, this paper explores the usage of graphics processors for the parallelization and acceleration of the aforementioned algorithm. The resulting implementation performs the computation of phase congruency and cross-based support regions at 68 and 5 frames per second for 960×560 pixel images on an Nvidia Quadro P5000 and a Tegra X2 GPU, respectively.
