
High-Resolution Hyperspectral Video Imaging Using A Hexagonal Camera Array

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Retrieving the reflectance spectrum from objects is an essential task for many classification and detection problems, since many materials and processes have a unique spectral behaviour. In many cases, it is highly desirable to capture hyperspectral images due to the high spectral flexibility. Often, it is even necessary to capture hyperspectral videos or at least to be able to record a hyperspectral image at once, also called snapshot hyperspectral imaging, to avoid spectral smearing. For this task, a high-resolution snapshot hyperspectral camera array using a hexagonal shape is introduced. The hexagonal array for hyperspectral imaging uses off-the-shelf hardware, which enables high flexibility regarding employed cameras, lenses and filters. Hence, the spectral range can be easily varied by mounting a different set of filters. Moreover, the concept of using off-the-shelf hardware enables low prices in comparison to other approaches with highly specialized hardware. Since classical industrial cameras are used in this hyperspectral camera array, the spatial and temporal resolution is very high, while recording 37 hyperspectral channels in the range from 400 nm to 760 nm in 10 nm steps. A registration process is required for near-field imaging, which maps the peripheral camera views to the center view. It is shown that this combination using a hyperspectral camera array and the corresponding image registration pipeline is superior in comparison to other popular snapshot approaches. For this evaluation, a synthetic hyperspectral database is rendered. On the synthetic data, the novel approach outperforms its best competitor by more than 3 dB in reconstruction quality. This synthetic data is also used to show the superiority of the hexagonal shape in comparison to an orthogonal-spaced one. Moreover, a real-world high resolution hyperspectral video database is provided.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2406.13709 [pdf, other]

A Study on the Effect of Color Spaces in Learned Image Compression

Authors: Srivatsa Prativadibhayankaram, Mahadev Prasad Panda, Jürgen Seiler, Thomas Richter, Heiko Sparenberg, Siegfried Fößel, André Kaup

Abstract: In this work, we present a comparison between color spaces, namely YUV, LAB, and RGB, and their effect on learned image compression. For this we use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches: one for the luminance component (Y or L) and another for chrominance components (UV or AB). However, for the RGB variant we input all 3 channels in a single branch, similar to most learned image codecs operating in RGB. The models are trained for multiple bitrate configurations in each color space. We report the findings from our experiments by evaluating them on various datasets and compare the results to state-of-the-art image codecs. The YUV model performs better than the LAB variant in terms of MS-SSIM with a Bjøntegaard delta bitrate (BD-BR) gain of 7.5% using VTM intra-coding mode as the baseline. The LAB variant, in contrast, performs better than the YUV model in terms of CIEDE2000, with a BD-BR gain of 8%. Overall, the RGB variant of SLIC achieves the best performance with a BD-BR gain of 13.14% in terms of MS-SSIM and a gain of 17.96% in CIEDE2000 at the cost of a higher model complexity.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted pre-print version for ICIP 2024

arXiv:2406.11284 [pdf, other]

Multispectral Snapshot Image Registration Using Learned Cross Spectral Disparity Estimation and a Deep Guided Occlusion Reconstruction Network

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Multispectral imaging aims at recording images in different spectral bands. This is extremely beneficial in diverse discrimination applications, for example in agriculture, recycling or healthcare. One approach for snapshot multispectral imaging, which is capable of recording multispectral videos, is by using camera arrays, where each camera records a different spectral band. Since the cameras are at different spatial positions, a registration procedure is necessary to map every camera to the same view. In this paper, we present a multispectral snapshot image registration with three novel components. First, a cross spectral disparity estimation network is introduced, which is trained on a popular stereo database using pseudo spectral data augmentation. Subsequently, this disparity estimation is used to accurately detect occlusions by warping the disparity map in a layer-wise manner. Finally, these detected occlusions are reconstructed by a learned deep guided neural network, which leverages the structure from other spectral components. It is shown that each element of this registration process as well as the final result is superior to the current state of the art. In terms of PSNR, our registration achieves an improvement of over 3 dB. At the same time, the runtime is decreased by a factor of over 3 on a CPU. Additionally, the registration is executable on a GPU, where the runtime can be decreased by a factor of 111. The source code and the data are available at https://github.com/FAU-LMS/MSIR.

Submitted 17 June, 2024; originally announced June 2024.
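
Several entries in this listing report gains in PSNR. As a reading aid, the following is a minimal sketch of the standard PSNR definition, assuming 8-bit data (peak value 255) and images flattened to plain lists. The function name `psnr` and the toy values are illustrative assumptions and are not taken from any of the listed papers.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error between reference and estimate.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: halving the mean squared error raises PSNR by about 3 dB.
reference = [0, 0, 0, 0]
coarse = [2, 2, 2, 2]   # MSE = 4
better = [2, 2, 0, 0]   # MSE = 2
gain_db = psnr(reference, better) - psnr(reference, coarse)
```

Because PSNR is logarithmic in the mean squared error, a gain of roughly 3 dB corresponds to the error energy being cut in half.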

arXiv:2401.04787 [pdf, other]

A Convex Optimization Approach to Compute Trapping Regions for Lossless Quadratic Systems

Authors: Shih-Chi Liao, A. Leonid Heide, Maziar S. Hemati, Peter J. Seiler

Abstract: Quadratic systems with lossless quadratic terms arise in many applications, including models of atmosphere and incompressible fluid flows. Such systems have a trapping region if all trajectories eventually converge to and stay within a bounded set. Conditions for the existence and characterization of trapping regions have been established in prior works for boundedness analysis. However, prior solutions have used non-convex optimization methods, resulting in conservative estimates. In this paper, we build on this prior work and provide a convex semidefinite programming condition for the existence of a trapping region. The condition allows precise verification or falsification of the existence of a trapping region. If a trapping region exists, then we provide a second semidefinite program to compute the least conservative trapping region in the form of a ball. Two low-dimensional systems are provided as examples to illustrate the results. A third high-dimensional example is also included to demonstrate that the computation required for the analysis can be scaled to systems of up to $\sim O(100)$ states. The proposed method provides a precise and computationally efficient numerical approach for computing trapping regions. We anticipate this work will benefit future studies on modeling and control of lossless quadratic dynamical systems.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2312.08949 [pdf, other]

A Guided Upsampling Network for Short Wave Infrared Images Using Graph Regularization

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Exploiting the infrared area of the spectrum for classification problems is getting increasingly popular, because many materials have characteristic absorption bands in this area. However, sensors in the short wave infrared (SWIR) area and even higher wavelengths have a very low spatial resolution in comparison to classical cameras that operate in the visible wavelength area. Thus, in this paper an upsampling method for SWIR images guided by a visible image is presented. For that, the proposed guided upsampling network (GUNet) uses a graph-regularized optimization problem based on learned affinities. The evaluation is based on a novel synthetic near-field visible-SWIR stereo database. Different guided upsampling methods are evaluated, which shows an improvement of nearly 1 dB on this database for the proposed upsampling method in comparison to the second best guided upsampling network. Furthermore, a visual example of an upsampled SWIR image of a real-world scene is depicted for showing real-world applicability.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2312.08946 [pdf, other]

Color Agnostic Cross-Spectral Disparity Estimation

Authors: Frank Sippel, Nils Genser, Hannah Och, Jürgen Seiler, André Kaup

Abstract: Since camera modules become more and more affordable, multispectral camera arrays have found their way from special applications to the mass market, e.g., in automotive systems, smartphones, or drones. Due to multiple modalities, the registration of different viewpoints and the required cross-spectral disparity estimation is up to the present extremely challenging. To overcome this problem, we introduce a novel spectral image synthesis in combination with a color agnostic transform. Thus, any recently published stereo matching network can be turned to a cross-spectral disparity estimator. Our novel algorithm requires only RGB stereo data to train a cross-spectral disparity estimator and a generalization from artificial training data to camera-captured images is obtained. The theoretical examination of the novel color agnostic method is completed by an extensive evaluation compared to state of the art including self-recorded multispectral data and a reference implementation. The novel color agnostic disparity estimation improves cross-spectral as well as conventional color stereo matching by reducing the average end-point error by 41% for cross-spectral and by 22% for mono-modal content, respectively.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2309.05996 [pdf, other]

doi 10.1109/IWSSIP58668.2023.10180299

RGB-Guided Resolution Enhancement of IR Images

Authors: Marcel Trammer, Nils Genser, Jürgen Seiler

Abstract: This paper introduces a novel method for RGB-Guided Resolution Enhancement of infrared (IR) images called Guided IR Resolution Enhancement (GIRRE). In the area of single image super resolution (SISR) there exists a wide variety of algorithms like interpolation methods or neural networks to improve the spatial resolution of images. In contrast to SISR, even more information can be gathered on the recorded scene when using multiple cameras. In our setup, we are dealing with multi image super resolution, especially with stereo super resolution. We consider a color camera and an IR camera. Current IR sensors have a very low resolution compared to color sensors so that recent color sensors take up 100 times more pixels than IR sensors. To this end, GIRRE increases the spatial resolution of the low-resolution IR image. After that, the upscaled image is filtered with the aid of the high-resolution color image. We show that our method achieves an average PSNR gain of 1.2 dB and at best up to 1.8 dB compared to state-of-the-art methods, which is visually noticeable.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2307.12864 [pdf, other]

Conditional Residual Coding: A Remedy for Bottleneck Problems in Conditional Inter Frame Coding

Authors: Fabian Brand, Jürgen Seiler, André Kaup

Abstract: Conditional coding is a new video coding paradigm enabled by neural-network-based compression. It can be shown that conditional coding is in theory better than the traditional residual coding, which is widely used in video compression standards like HEVC or VVC. However, on closer inspection, it becomes clear that conditional coders can suffer from information bottlenecks in the prediction path, i.e., that due to the data processing inequality not all information from the prediction signal can be passed to the reconstructed signal, thereby impairing the coder performance. In this paper we propose the conditional residual coding concept, which we derive from information theoretical properties of the conditional coder. This coder significantly reduces the influence of bottlenecks, while maintaining the theoretical performance of the conditional coder. We provide a theoretical analysis of the coding paradigm and demonstrate the performance of the conditional residual coder in a practical example. We show that conditional residual coders alleviate the disadvantages of conditional coders while being able to maintain their advantages over residual coders. In the spectrum of residual and conditional coding, we can therefore consider them as "the best from both worlds".

Submitted 26 January, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 12 pages, 8 figures. Accepted for publication in TCSVT

arXiv:2306.15237 [pdf, other]

doi 10.1109/ICIP49359.2023.10222159

Cross Spectral Image Reconstruction Using a Deep Guided Neural Network

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Cross spectral camera arrays, where each camera records different spectral content, are becoming increasingly popular for RGB, multispectral and hyperspectral imaging, since they are capable of a high resolution in every dimension using off-the-shelf hardware. For these, it is necessary to build an image processing pipeline to calculate a consistent image data cube, i.e., it should look as if every camera records the scene from the center camera. Since the cameras record the scene from a different angle, this pipeline needs a reconstruction component for pixels that are not visible to peripheral cameras. For that, a novel deep guided neural network (DGNet) is presented. Since only little cross spectral data is available for training, this neural network is highly regularized. Furthermore, a new data augmentation process is introduced to generate the cross spectral content. On synthetic and real multispectral camera array data, the proposed network outperforms the state of the art by up to 2 dB in terms of PSNR on average. Besides, DGNet also tops its best competitor in terms of SSIM as well as in runtime by a factor of nearly 12. Moreover, a qualitative evaluation reveals visually more appealing results for real camera array data.

Submitted 14 September, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Journal ref: 2023 IEEE International Conference on Image Processing (ICIP)

arXiv:2302.01594 [pdf, ps, other]

Analysis of mesh-based motion compensation in wavelet lifting of dynamical 3-D+t CT data

Authors: Wolfgang Schnurrer, Thomas Richter, Jürgen Seiler, André Kaup

Abstract: Factorized in the lifting structure, the wavelet transform can easily be extended by arbitrary compensation methods. Thereby, the transform can be adapted to displacements in the signal without losing the ability of perfect reconstruction. This leads to an improvement of scalability. In temporal direction of dynamic medical 3-D+t volumes from Computed Tomography, displacement is mainly given by expansion and compression of tissue. We show that these smooth movements can be well compensated with a mesh-based method. We compare the properties of triangle and quadrilateral meshes. We also show that with a mesh-based compensation approach coding results are comparable to the common slice wise coding with JPEG 2000 while a scalable representation in temporal direction can be achieved.

Submitted 3 February, 2023; originally announced February 2023.

Journal ref: IEEE 14th International Workshop on Multimedia Signal Processing (MMSP), Banff, AB, Canada, 2012, pp. 152-157
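
The claim in the entry above, that a lifting-based wavelet transform stays perfectly invertible under arbitrary compensation, can be illustrated with a small sketch. It assumes a Haar-style lifting step on two one-dimensional integer "frames" and uses a cyclic shift as a stand-in for the compensation operator. The helper names (`lift_forward`, `lift_inverse`, `shift`) and the shift itself are hypothetical illustrations, not the mesh-based method of the paper.

```python
def lift_forward(even, odd, predict, update):
    """One lifting step: returns (lowpass, highpass) for two frames."""
    # Predict step: highpass = odd frame minus prediction from the even frame.
    high = [o - p for o, p in zip(odd, predict(even))]
    # Update step: lowpass = even frame plus an update computed from highpass.
    low = [e + u for e, u in zip(even, update(high))]
    return low, high


def lift_inverse(low, high, predict, update):
    """Undo lift_forward by running the same steps in reverse order."""
    even = [l - u for l, u in zip(low, update(high))]
    odd = [h + p for h, p in zip(high, predict(even))]
    return even, odd


def shift(signal, d):
    """Hypothetical compensation operator: cyclic shift by d samples."""
    d %= len(signal)
    return signal[-d:] + signal[:-d] if d else list(signal)


# Compensated Haar lifting with integer rounding in the update step.
predict = lambda even: shift(even, 1)
update = lambda high: [h // 2 for h in shift(high, -1)]

even_frame = [10, 12, 14, 200]
odd_frame = [11, 13, 15, 190]
low, high = lift_forward(even_frame, odd_frame, predict, update)
rec_even, rec_odd = lift_inverse(low, high, predict, update)
```

The inverse only subtracts what the forward step added, using values it has already recovered, so reconstruction is exact regardless of what `predict` and `update` compute. This holds even with the integer rounding used here, which is what makes lifting attractive for lossless coding.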

arXiv:2302.01014 [pdf, ps, other]

Compression of Dynamic Medical CT Data Using Motion Compensated Wavelet Lifting with Denoised Update

Authors: Daniela Lanz, Jürgen Seiler, Karina Jaskolka, André Kaup

Abstract: For the lossless compression of dynamic 3-D+t volumes as produced by medical devices like Computed Tomography, various coding schemes can be applied. This paper shows that 3-D subband coding outperforms lossless HEVC coding and additionally provides a scalable representation, which is often required in telemedicine applications. However, the resulting lowpass subband, which shall be used as a downscaled representative of the whole original sequence, contains a lot of ghosting artifacts. This can be alleviated by incorporating motion compensation methods into the subband coder. This results in a high quality lowpass subband but also leads to a lower compression ratio. In order to cope with this, we introduce a new approach for improving the compression efficiency of compensated 3-D wavelet lifting by performing denoising in the update step. We are able to reduce the file size of the lowpass subband by up to 1.64%, while the lowpass subband is still applicable for being used as a downscaled representative of the whole original sequence.

Submitted 2 February, 2023; originally announced February 2023.

Comments: Picture Coding Symposium (PCS), San Francisco, CA, USA, 2018, pp. 56-60

arXiv:2301.07551 [pdf, ps, other]

doi 10.1364/JOSAA.479552

Synthetic Hyperspectral Array Video Database with Applications to Cross-Spectral Reconstruction and Hyperspectral Video Coding

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: In this paper, a synthetic hyperspectral video database is introduced. Since it is impossible to record ground truth hyperspectral videos, this database offers the possibility to leverage the evaluation of algorithms in diverse applications. For all scenes, depth maps are provided as well to yield the position of a pixel in all spatial dimensions as well as the reflectance in spectral dimension. Two novel algorithms for two different applications are proposed to prove the diversity of applications that can be addressed by this novel database. First, a cross-spectral image reconstruction algorithm is extended to exploit the temporal correlation between two consecutive frames. The evaluation using this hyperspectral database shows an increase in PSNR of up to 5.6 dB dependent on the scene. Second, a hyperspectral video coder is introduced which extends an existing hyperspectral image coder by exploiting temporal correlation. The evaluation shows rate savings of up to 10% depending on the scene. The novel hyperspectral video database and source code are available at https://github.com/FAU-LMS/HyViD for use by the research community.

Submitted 20 February, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

Journal ref: J. Opt. Soc. Am. A 40, 479-491 (2023)

arXiv:2301.04840 [pdf, ps, other]

Centroid adapted frequency selective extrapolation for reconstruction of lost image areas

Authors: Wolfgang Schnurrer, Markus Jonscher, Jürgen Seiler, Thomas Richter, Michel Bätz, André Kaup

Abstract: Lost image areas with different size and arbitrary shape can occur in many scenarios such as error-prone communication, depth-based image rendering or motion compensated wavelet lifting. The goal of image reconstruction is to restore these lost image areas as close to the original as possible. Frequency selective extrapolation is a block-based method for efficiently reconstructing lost areas in images. So far, the actual shape of the lost area is not considered directly. We propose a centroid adaption to enhance the existing frequency selective extrapolation algorithm that takes the shape of lost areas into account. To enlarge the test set for evaluation we further propose a method to generate arbitrarily shaped lost areas. On our large test set, we obtain an average reconstruction gain of 1.29 dB.

Submitted 12 January, 2023; originally announced January 2023.

Journal ref: Visual Communications and Image Processing (VCIP), Singapore, 2015, pp. 1-4

arXiv:2301.04351 [pdf, ps, other]

Analysis of displacement compensation methods for wavelet lifting of medical 3-D thorax CT volume data

Authors: Wolfgang Schnurrer, Jürgen Seiler, Eugen Wige, André Kaup

Abstract: A huge advantage of the wavelet transform in image and video compression is its scalability. Wavelet-based coding of medical computed tomography (CT) data becomes more and more popular. While much effort has been spent on encoding of the wavelet coefficients, the extension of the transform by a compensation method as in video coding has not gained much attention so far. We will analyze two compensation methods for medical CT data and compare the characteristics of the displacement compensated wavelet transform with video data. We will show that for thorax CT data the transform coding gain can be improved by a factor of 2 and the quality of the lowpass band can be improved by 8 dB in terms of PSNR compared to the original transform without compensation.

Submitted 11 January, 2023; originally announced January 2023.

Journal ref: Visual Communications and Image Processing, San Diego, CA, USA, 2012, pp. 1-6

arXiv:2301.04349 [pdf, ps, other]

Efficient lossless coding of highpass bands from block-based motion compensated wavelet lifting using JPEG 2000

Authors: Wolfgang Schnurrer, Tobias Tröger, Thomas Richter, Jürgen Seiler, André Kaup

Abstract: Lossless image coding is a crucial task especially in the medical area, e.g., for volumes from Computed Tomography or Magnetic Resonance Tomography. Besides lossless coding, compensated wavelet lifting offers a scalable representation of such huge volumes. While compensation methods increase the details in the lowpass band, they also vary the characteristics of the wavelet coefficients, so an adaption of the coefficient coder should be considered. We propose a simple invertible extension for JPEG 2000 that can reduce the filesize for lossless coding of the highpass band by 0.8% on average with peak rate saving of 1.1%.

Submitted 11 January, 2023; originally announced January 2023.

Journal ref: IEEE Visual Communications and Image Processing Conference, Valletta, Malta, 2014, pp. 398-401

arXiv:2301.04348 [pdf, ps, other]

On the influence of clipping in lossless predictive and wavelet coding of noisy images

Authors: Wolfgang Schnurrer, Jürgen Seiler, Michael Schöberl, André Kaup

Abstract: Especially in lossless image coding the obtainable compression ratio strongly depends on the amount of noise included in the data as all noise has to be coded, too. Different approaches exist for lossless image coding. We analyze the compression performance of three kinds of approaches, namely direct entropy, predictive and wavelet-based coding. The results from our theoretical model are compared to simulated results from standard algorithms that are based on the three approaches. As long as no clipping occurs, more bits are needed for lossless compression with increasing noise. We will show that for very noisy signals it is more advantageous to directly use an entropy coder without advanced preprocessing steps.

Submitted 11 January, 2023; originally announced January 2023.

Journal ref: Picture Coding Symposium, Krakow, Poland, 2012, pp. 185-188
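
The relation between noise, clipping, and lossless bit cost described in the entry above can be made concrete with a small simulation. This sketch assumes a constant 8-bit signal with additive Gaussian noise and uses the zeroth-order empirical entropy as a rough proxy for the cost of direct entropy coding. The function names, the noise levels, and the sample count are illustrative assumptions and do not reproduce the paper's theoretical model.

```python
import math
import random
from collections import Counter


def empirical_entropy(samples):
    """Zeroth-order entropy in bits per sample of a list of symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def noisy_clipped(sigma, n=20000, mean=128, lo=0, hi=255, seed=0):
    """Constant signal plus Gaussian noise, rounded and clipped to [lo, hi]."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [min(hi, max(lo, round(rng.gauss(mean, sigma)))) for _ in range(n)]


# Entropy for weak noise, moderate noise, and noise strong enough to clip heavily.
entropies = {sigma: empirical_entropy(noisy_clipped(sigma)) for sigma in (2, 30, 1000)}
```

In this toy setting the entropy grows with the noise level while almost no samples are clipped, and falls again once most samples saturate at the range limits.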

arXiv:2212.04330 [pdf, ps, other]

Improving block-based compensated wavelet lifting by reconstructing unconnected pixels

Authors: Wolfgang Schnurrer, Jürgen Seiler, André Kaup

Abstract: This paper presents a new approach for improving the visual quality of the lowpass band of a compensated wavelet transform. A high quality of the lowpass band is very important as it can then be used as a downscaled version of the original signal. To adapt the transform to the signal, compensation methods can be implemented directly into the transform. We propose an improved inversion of the block-based motion compensation by processing unconnected pixels by a reconstruction method. We obtain a better subjective visual quality while furthermore saving up to 2.6% of bits for lossless coding.

Submitted 8 December, 2022; originally announced December 2022.

Journal ref: International Symposium on Signals, Circuits and Systems, 2013, pp. 1-4

arXiv:2212.04324 [pdf, ps, other]

3-D mesh compensated wavelet lifting for 3-D+t medical CT data

Authors: Wolfgang Schnurrer, Thomas Richter, Jürgen Seiler, Christian Herglotz, André Kaup

Abstract: For scalable coding, a high quality of the lowpass band of a wavelet transform is crucial when it is used as a downscaled version of the original signal. However, blur and motion can lead to disturbing artifacts. By incorporating feasible compensation methods directly into the wavelet transform, the quality of the lowpass band can be improved. The displacement in dynamic medical 3-D+t volumes from Computed Tomography is mainly given by expansion and compression of tissue over time and can be modeled well by mesh-based methods. We extend a 2-D mesh-based compensation method to three dimensions to obtain a volume compensation method that can additionally compensate deforming displacements in the third dimension. We show that a 3-D mesh can obtain a higher quality of the lowpass band by 0.28 dB with less than 40% of the model parameters of a comparable 2-D mesh. Results from lossless coding with JPEG 2000 3D and SPECK3D show that the compensated subbands using a 3-D mesh need about 6% less data compared to using a 2-D mesh.

Submitted 8 December, 2022; originally announced December 2022.

Journal ref: IEEE International Conference on Image Processing (ICIP), 2014, pp. 3631-3635

arXiv:2211.16995 [pdf, ps, other]

A hybrid motion estimation technique for fisheye video sequences based on equisolid re-projection

Authors: Andrea Eichenseer, Michel Bätz, Jürgen Seiler, André Kaup

Abstract: Capturing large fields of view with only one camera is an important aspect in surveillance and automotive applications, but the wide-angle fisheye imagery thus obtained exhibits very special characteristics that may not be very well suited for typical image and video processing methods such as motion estimation. This paper introduces a motion estimation method that adapts to the typical radial characteristics of fisheye video sequences by making use of an equisolid re-projection after moving part of the motion vector search into the perspective domain via a corresponding back-projection. By combining this approach with conventional translational motion estimation and compensation, average gains in luminance PSNR of up to 1.14 dB are achieved for synthetic fish-eye sequences and up to 0.96 dB for real-world data. Maximum gains for selected frame pairs amount to 2.40 dB and 1.39 dB for synthetic and real-world data, respectively.

Submitted 30 November, 2022; originally announced November 2022.

Journal ref: IEEE International Conference on Image Processing (ICIP), 2015, pp. 3565-3569

arXiv:2211.11472 [pdf, ps, other]

Temporal error concealment for fisheye video sequences based on equisolid re-projection

Authors: Andrea Eichenseer, Jürgen Seiler, Michel Bätz, André Kaup

Abstract: Wide-angle video sequences obtained by fisheye cameras exhibit characteristics that may not very well comply with standard image and video processing techniques such as error concealment. This paper introduces a temporal error concealment technique designed for the inherent characteristics of equisolid fisheye video sequences by applying a re-projection into the equisolid domain after conducting part of the error concealment in the perspective domain. Combining this technique with conventional decoder motion vector estimation achieves average gains of 0.71 dB compared against pure decoder motion vector estimation for the test sequences used. Maximum gains amount to up to 2.04 dB for selected frames.

Submitted 21 November, 2022; originally announced November 2022.

Journal ref: European Signal Processing Conference (EUSIPCO), 2015, pp. 1611-1615

arXiv:2210.07737 [pdf, other]

On Benefits and Challenges of Conditional Interframe Video Coding in Light of Information Theory

Authors: Fabian Brand, Jürgen Seiler, André Kaup

Abstract: The rise of variational autoencoders for image and video compression has opened the door to many elaborate coding techniques. One example here is the possibility of conditional interframe coding. Here, instead of transmitting the residual between the original frame and the predicted frame (often obtained by motion compensation), the current frame is transmitted under the condition of knowing the prediction signal. In practice, conditional coding can be straightforwardly implemented using a conditional autoencoder, which has also shown good results in recent works. In this paper, we provide an information theoretical analysis of conditional coding for inter frames and show in which cases gains compared to traditional residual coding can be expected. We also show the effect of information bottlenecks which can occur in practical video coders in the prediction signal path due to the network structure, as a consequence of the data-processing theorem or due to quantization. We demonstrate that conditional coding has theoretical benefits over residual coding but that there are cases in which the benefits are quickly canceled by small information bottlenecks of the prediction signal.

Submitted 13 December, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures, accepted to be presented at PCS 2022. arXiv admin note: text overlap with arXiv:2112.08011 Update Note: Fixed notation in Eq. 10, no changes otherwise

arXiv:2210.02386 [pdf, other]

doi 10.1109/ICIP46576.2022.9897339

Domain Adaptation for Unknown Image Distortions in Instance Segmentation

Authors: Maximiliane Gruber, Fabian Brand, Alina Mosebach, Jürgen Seiler, André Kaup

Abstract: Data-driven techniques for machine vision heavily depend on the training data to sufficiently resemble the data occurring during test and application. However, in practice unknown distortion can lead to a domain gap between training and test data, impeding the performance of a machine vision system. With our proposed approach this domain gap can be closed by unpaired learning of the pristine-to-distortion mapping function of the unknown distortion. This learned mapping function may then be used to emulate the unknown distortion in the training data. Employing a fixed setup, our approach is independent from prior knowledge of the distortion. Within this work, we show that we can effectively learn unknown distortions at arbitrary strengths. When applying our approach to instance segmentation in an autonomous driving scenario, we achieve results comparable to an oracle with knowledge of the distortion. An average gain in mean Average Precision (mAP) of up to 0.19 can be achieved.

Submitted 24 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 5 pages, 5 figures, accepted for International Conference on Image Processing (ICIP) 2022

arXiv:2209.14970 [pdf, other]

doi 10.1109/ISSCS52333.2021.9497438

3D Rendering Framework for Data Augmentation in Optical Character Recognition

Authors: Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess, Jürgen Seiler, André Kaup

Abstract: In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the available dataset. Furthermore, the proposed method is not restricted to single frame OCR but can also be applied to video OCR. We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. Our proposed framework is capable of leveraging the performance of OCR applications especially for small datasets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of Character Error Rate (CER), and up to 7.88 percentage points in terms of Word Error Rate (WER) are achieved on the subset. Especially the recognition of challenging text lines can be improved. The CER may be decreased by up to 14.92 percentage points and the WER by up to 18.19 percentage points for this class. Moreover, we are able to achieve smaller error rates when training on the 15% subset augmented with the proposed method than on the original non-augmented full dataset.

Submitted 27 September, 2022; originally announced September 2022.

Comments: IEEE International Symposium on Signals, Circuits and Systems (ISSCS), 1-4, July 2021

arXiv:2209.14448 [pdf, other]

Synthesizing Annotated Image and Video Data Using a Rendering-Based Pipeline for Improved License Plate Recognition

Authors: Andreas Spruck, Maximilane Gruber, Anatol Maier, Denise Moussa, Jürgen Seiler, Christian Riess, André Kaup

Abstract: An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating and annotating synthetic and partly-real image and video data in a fully automatic procedure. Moreover, the pipeline can aid the acquisition of real data. The proposed pipeline is based on a rendering process. This process generates synthetic data. Partly-real data bring the synthetic sequences closer to reality by incorporating real cameras during the acquisition process. The benefits of the proposed data generation pipeline, especially for machine learning scenarios with limited available training data, are demonstrated by an extensive experimental validation in the context of automatic license plate recognition. The experiments demonstrate a significant reduction of the character error rate and miss rate from 73.74% and 100% to 14.11% and 41.27% respectively, compared to an OCR algorithm trained on a real data set solely. These improvements are achieved by training the algorithm on synthesized data solely. When additionally incorporating real data, the error rates can be decreased further. Thereby, the character error rate and miss rate can be reduced to 11.90% and 39.88% respectively. All data used during the experiments as well as the proposed rendering-based pipeline for the automated data generation is made publicly available under (URL will be revealed upon publication).

Submitted 28 September, 2022; originally announced September 2022.

Comments: submitted to IEEE Transactions on Intelligent Transportation Systems

arXiv:2209.13648 [pdf, other]

doi 10.1109/MetroInd4.0IoT48571.2020.9138205

Quality Assurance of Weld Seams Using Laser Triangulation Imaging and Deep Neural Networks

Authors: Andreas Spruck, Jürgen Seiler, Michael Roll, Thomas Dudziak, Jürgen Eckstein, André Kaup

Abstract: In this paper, a novel optical inspection system is presented that is directly suitable for Industry 4.0 and the implementation on IoT-devices controlling the manufacturing process. The proposed system is capable of distinguishing between erroneous and faultless weld seams, without explicitly defining measurement criteria. The developed system uses a deep neural network based classifier for the class prediction. A weld seam dataset was acquired and labelled by an expert committee. Thereby, the visual impression and assessment of the experts is learnt accurately. In the scope of this paper laser triangulation images are used. Due to their special characteristics, the images must be pre-processed to enable the use of a deep neural network. Furthermore, two different approaches are investigated to enable an inspection of differently sized weld seams. Both approaches yield very high classification accuracies of up to 96.88%, which is competitive to current state of the art optical inspection systems. Moreover, the proposed system enables a higher flexibility and an increased robustness towards systematic errors and environmental conditions due to its ability to generalize. A further benefit of the proposed system is the fast decision process enabling the usage directly within the manufacturing line. Furthermore, standard hardware is used throughout the whole presented work, keeping the roll-out costs for the proposed system as low as possible.

Submitted 27 September, 2022; originally announced September 2022.

Journal ref: IEEE International Workshop on Metrology for Industry 4.0 and IoT (MetroInd4.0 & IoT), 407-412, June 2020

arXiv:2209.07894 [pdf, other]

doi 10.1109/MMSP55362.2022.9949059

Optimal Filter Selection for Multispectral Object Classification Using Fast Binary Search

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: When designing multispectral imaging systems for classifying different spectra it is necessary to choose a small number of filters from a set with several hundred different ones. Tackling this problem by full search leads to a tremendous number of possibilities to check and is NP-hard. In this paper we introduce a novel fast binary search for optimal filter selection that guarantees a minimum distance metric between the different spectra to classify. In our experiments, this procedure reaches the same optimal solution as with full search at much lower complexity. The desired number of filters influences the full search in factorial order while the fast binary search stays constant. Thus, fast binary search allows to find the optimal solution of all combinations in an adequate amount of time and avoids prevailing heuristics. Moreover, our fast binary search algorithm outperforms other filter selection techniques in terms of misclassified spectra in a real-world classification problem.

Submitted 18 January, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Journal ref: IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 2022

arXiv:2209.07891 [pdf, other]

doi 10.1109/MMSP53017.2021.9733655

Hyperspectral Image Reconstruction from Multispectral Images Using Non-Local Filtering

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Using light spectra is an essential element in many applications, for example, in material classification. Often this information is acquired by using a hyperspectral camera. Unfortunately, these cameras have some major disadvantages like not being able to record videos. Therefore, multispectral cameras with wide-band filters are used, which are much cheaper and are often able to capture videos. However, using multispectral cameras requires an additional reconstruction step to yield spectral information. Usually, this reconstruction step has to be done in the presence of imaging noise, which degrades the reconstructed spectra severely. Typically, same or similar pixels are found across the image with the advantage of having independent noise. In contrast to state-of-the-art spectral reconstruction methods which only exploit neighboring pixels by block-based processing, this paper introduces non-local filtering in spectral reconstruction. First, a block-matching procedure finds similar non-local multispectral blocks. Thereafter, the hyperspectral pixels are reconstructed by filtering the matched multispectral pixels collaboratively using a reconstruction Wiener filter. The proposed novel procedure even works under very strong noise. The method is able to lower the spectral angle up to 18% and increase the peak signal-to-noise-ratio up to 1.1dB in noisy scenarios compared to state-of-the-art methods. Moreover, the visual results are much more appealing.

Submitted 16 September, 2022; originally announced September 2022.

Journal ref: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021, pp. 1-6

arXiv:2209.07890 [pdf, other]

doi 10.1109/VCIP53242.2021.9675421

Spatio-spectral Image Reconstruction Using Non-local Filtering

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: In many image processing tasks it occurs that pixels or blocks of pixels are missing or lost in only some channels. For example during defective transmissions of RGB images, it may happen that one or more blocks in one color channel are lost. Nearly all modern applications in image processing and transmission use at least three color channels, some of the applications employ even more bands, for example in the infrared and ultraviolet area of the light spectrum. Typically, only some pixels and blocks in a subset of color channels are distorted. Thus, other channels can be used to reconstruct the missing pixels, which is called spatio-spectral reconstruction. Current state-of-the-art methods purely rely on the local neighborhood, which works well for homogeneous regions. However, in high-frequency regions like edges or textures, these methods fail to properly model the relationship between color bands. Hence, this paper introduces non-local filtering for building a linear regression model that describes the inter-band relationship and is used to reconstruct the missing pixels. Our novel method is able to increase the PSNR on average by 2 dB and yields visually much more appealing images in high-frequency regions.

Submitted 16 September, 2022; originally announced September 2022.

Journal ref: 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021, pp. 1-5

arXiv:2209.07889 [pdf, other]

doi 10.1364/JOSAA.400485

Structure-Preserving Spectral Reflectance Estimation using Guided Filtering

Authors: Frank Sippel, Jürgen Seiler, Nils Genser, André Kaup

Abstract: Light spectra are a very important source of information for diverse classification problems, e.g., for discrimination of materials. To lower the cost for acquiring this information, multispectral cameras are used. Several techniques exist for estimating light spectra out of multispectral images by exploiting properties about the spectrum. Unfortunately, especially when capturing multispectral videos, the images are heavily affected by noise due to the nature of limited exposure times in videos. Therefore, models that explicitly try to lower the influence of noise on the reconstructed spectrum are highly desirable. Hence, a novel reconstruction algorithm is presented. This novel estimation method is based on the guided filtering technique which preserves basic structures, while using spatial information to reduce the influence of noise. The evaluation based on spectra of natural images reveals that this new technique yields better quantitative and subjective results in noisy scenarios than other state-of-the-art spatial reconstruction methods. Specifically, the proposed algorithm lowers the mean squared error and the spectral angle up to 46% and 35% in noisy scenarios, respectively. Furthermore, it is shown that the proposed reconstruction technique works out-of-the-box and does not need any calibration or training by reconstructing spectra from a real-world multispectral camera with nine channels.

Submitted 16 September, 2022; originally announced September 2022.

Journal ref: J. Opt. Soc. Am. A 37, 1695-1710 (2020)

arXiv:2209.07231 [pdf, other]

doi 10.1109/ICASSP.2019.868321

Motion-Adapted Three-Dimensional Frequency Selective Extrapolation

Authors: Andreas Spruck, Markus Jonscher, Jürgen Seiler, André Kaup

Abstract: It has been shown, that high resolution images can be acquired using a low resolution sensor with non-regular sampling. Therefore, post-processing is necessary. In terms of video data, not only the spatial neighborhood can be used to assist the reconstruction, but also the temporal neighborhood. A popular and well performing algorithm for this kind of problem is the three-dimensional frequency selective extrapolation (3D-FSE) for which a motion adapted version is introduced in this paper. This proposed extension solves the problem of changing content within the area considered by the 3D-FSE, which is caused by motion within the sequence. Because of this motion, it may happen that regions are emphasized during the reconstruction that are not present in the original signal within the considered area. By that, false content is introduced into the extrapolated sequence, which affects the resulting image quality negatively. The novel extension, presented in the following, incorporates motion data of the sequence in order to adapt the algorithm accordingly, and compensates changing content, resulting in gains of up to 1.75 dB compared to the existing 3D-FSE.

Submitted 15 September, 2022; originally announced September 2022.

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2392-2396, May 2019

arXiv:2207.09737 [pdf, ps, other]

Optimized processing order for 3D hole filling in video sequences using frequency selective extrapolation

Authors: Jürgen Seiler, Susanne Schöll, Wolfgang Schnurrer, André Kaup

Abstract: A problem often arising in video communication is the reconstruction of missing or distorted areas in a video sequence. Such holes of unavailable pixels may be caused for example by transmission errors of coded video data or undesired objects like logos. In order to close the holes given neighboring available content, a signal extrapolation has to be performed. The best quality can be achieved, if spatial as well as temporal information is used for the reconstruction. However, the question always is in which order to process the extrapolation to obtain the best result. In this paper, an optimized processing order is introduced for improving the extrapolation quality of Three-dimensional Frequency Selective Extrapolation. Using the proposed optimized order, holes in video sequences can be closed from the outer margin to the center, leading to a higher reconstruction quality, and visually noticeable gains of more than 0.5 dB PSNR are possible.

Submitted 20 July, 2022; originally announced July 2022.

Journal ref: Picture Coding Symposium, 2016, pp. 1-5

arXiv:2207.09729 [pdf, ps, other]

Spatio-temporal prediction in video coding by non-local means refined motion compensation

Authors: Jürgen Seiler, Thomas Richter, André Kaup

Abstract: The prediction step is a very important part of hybrid video codecs. In this contribution, a novel spatio-temporal prediction algorithm is introduced. For this, the prediction is carried out in two steps. Firstly, a preliminary temporal prediction is conducted by motion compensation. Afterwards, spatial refinement is carried out for incorporating spatial redundancies from already decoded neighboring blocks. Thereby, the spatial refinement is achieved by applying Non-Local Means de-noising to the union of the motion compensated block and the already decoded blocks. Including the spatial refinement into H.264/AVC, a rate reduction of up to 14 % or respectively a gain of up to 0.7 dB PSNR compared to unrefined motion compensated prediction can be achieved.

Submitted 20 July, 2022; originally announced July 2022.

Journal ref: Picture Coding Symposium, 2010, pp. 318-321

arXiv:2207.09727 [pdf, ps, other]

Spatio-temporal prediction in video coding by best approximation

Authors: Jürgen Seiler, Haricharan Lakshman, André Kaup

Abstract: Within the scope of this contribution we propose a novel efficient spatio-temporal prediction algorithm for video coding. The algorithm operates in two stages. First, motion compensation is performed on the block to be predicted in order to exploit temporal correlations. Afterwards, in order to exploit spatial correlations, this preliminary estimate is spatially refined by forming a joint model of the motion compensated block and spatially adjacent already decoded blocks. Compared to an earlier refinement algorithm, the novel one only needs very little iteration, leading to a speedup of factor 17. The implementation of this new algorithm into the H.264/AVC leads to a maximum reduction in data rate of up to nearly 13% for the considered sequences.

Submitted 20 July, 2022; originally announced July 2022.

Journal ref: Picture Coding Symposium, 2009, pp. 1-4

arXiv:2207.09724 [pdf, ps, other]

Orthogonality Deficiency Compensation for Improved Frequency Selective Image Extrapolation

Authors: Jürgen Seiler, Katrin Meisinger, André Kaup

Abstract: This paper describes a very efficient algorithm for image signal extrapolation. It can be used for various applications in image and video communication, e.g. the concealment of data corrupted by transmission errors or prediction in video coding. The extrapolation is performed on a limited number of known samples and extends the signal beyond these samples. Therefore the signal from the known samples is iteratively projected onto different basis functions in order to generate a model of the signal. As the basis functions are not orthogonal with respect to the area of the known samples we propose a new extension, the orthogonality deficiency compensation, to cope with the non-orthogonality. Using this extension, very good extrapolation results for structured as well as for smooth areas are achievable. This algorithm improves PSNR up to 2 dB and gives a better visual quality for concealment of block losses compared to extrapolation algorithms existent so far.

Submitted 20 July, 2022; originally announced July 2022.

Journal ref: Picture Coding Symposium, 2007

arXiv:2207.06797 [pdf, ps, other]

Adaptive frequency prior for frequency selective reconstruction of images from non-regular subsampling

Authors: Jürgen Seiler, André Kaup

Abstract: Image signals typically are defined on a rectangular two-dimensional grid. However, there exist scenarios where this is not fulfilled and where the image information only is available for a non-regular subset of pixel position. For processing, transmitting or displaying such an image signal, a re-sampling to a regular grid is required. Recently, Frequency Selective Reconstruction (FSR) has been proposed as a very effective sparsity-based algorithm for solving this under-determined problem. For this, FSR iteratively generates a model of the signal in the Fourier-domain. In this context, a fixed frequency prior inspired by the optical transfer function is used for favoring low-frequency content. However, this fixed prior is often too strict and may lead to a reduced reconstruction quality. To resolve this weakness, this paper proposes an adaptive frequency prior which takes the local density of the available samples into account. The proposed adaptive prior allows for a very high reconstruction quality, yielding gains of up to 0.6 dB PSNR over the fixed prior, independently of the density of the available samples. Compared to other state-of-the-art algorithms, visually noticeable gains of several dB are possible.

Submitted 14 July, 2022; originally announced July 2022.

Journal ref: IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), 2016, pp. 1-6

arXiv:2207.06795 [pdf, ps, other]

Multiple Selection Extrapolation for Improved Spatial Error Concealment

Authors: Jürgen Seiler, André Kaup

Abstract: This contribution introduces a novel signal extrapolation algorithm and its application to image error concealment. The signal extrapolation is carried out by iteratively generating a model of the signal suffering from distortion. Thereby, the model results from a weighted superposition of two-dimensional basis functions whereas in every iteration step a set of these is selected and the approximation residual is projected onto the subspace they span. The algorithm is an improvement to the Frequency Selective Extrapolation that has proven to be an effective method for concealing lost or distorted image regions. Compared to this algorithm, the novel algorithm is able to reduce the processing time by a factor larger than three, by still preserving the very high extrapolation quality.

Submitted 14 July, 2022; originally announced July 2022.

Journal ref: 2009 IEEE International Workshop on Multimedia Signal Processing, 2009, pp. 1-6

arXiv:2207.06794 [pdf, ps, other]

Adaptive joint spatio-temporal error concealment for video communication

Authors: Jürgen Seiler, André Kaup

Abstract: In the past years, video communication has found its application in an increasing number of environments. Unfortunately, some of them are error-prone and the risk of block losses caused by transmission errors is ubiquitous. To reduce the effects of these block losses, a new spatio-temporal error concealment algorithm is presented. The algorithm uses spatial as well as temporal information for extrapolating the signal into the lost areas. The extrapolation is carried out in two steps, first a preliminary temporal extrapolation is performed which then is used to generate a model of the original signal, using the spatial neighborhood of the lost block. By applying the spatial refinement a significantly higher concealment quality can be achieved resulting in a gain of up to 5.2 dB in PSNR compared to the unrefined underlying pure temporal extrapolation.

Submitted 14 July, 2022; originally announced July 2022.

Journal ref: IEEE 10th Workshop on Multimedia Signal Processing, 2008, pp. 229-234

arXiv:2207.03774 [pdf, ps, other]

Spatio-temporal error concealment in video by denoised temporal extrapolation refinement

Authors: Jürgen Seiler, Michael Schöberl, André Kaup

Abstract: In video communication, the concealment of distortions caused by transmission errors is important for allowing for a pleasant visual quality and for reducing error propagation. In this article, Denoised Temporal Extrapolation Refinement is introduced as a novel spatiotemporal error concealment algorithm. The algorithm operates in two steps. First, temporal error concealment is used for obtaining an initial estimate. Afterwards, a spatial denoising algorithm is used for reducing the imperfectness of the temporal extrapolation. For this, Non-Local Means denoising is used which is extended by a spiral scan processing order and is improved by an adaptation step for taking the preliminary temporal extrapolation into account. In doing so, a spatio-temporal error concealment results. By making use of the refinement, a visually noticeable average gain of 1 dB over pure temporal error concealment is possible. With this, the algorithm also is able to clearly outperform other spatio-temporal error concealment algorithms.

Submitted 8 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Image Processing, 2013, pp. 1613-1616

arXiv:2207.03770 [pdf, ps, other]

Content-Adaptive Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Communication

Authors: Jürgen Seiler, André Kaup

Abstract: If digital video data is transmitted over unreliable channels such as the internet or wireless terminals, the risk of severe image distortion due to transmission errors is ubiquitous. To cope with this, error concealment can be applied on the distorted data at the receiver. In this contribution we propose a novel spatio-temporal error concealment algorithm, the Content-Adaptive Motion Compensated Frequency Selective Extrapolation. The algorithm operates in two stages, whereas at first the motion in a distorted sequence is estimated. After that, a model of the signal is generated for concealing the distortion. The novel algorithm is based on an already existent error concealment algorithm. But by adapting the model generation to the content of a sequence, the novel algorithm is able to exploit the remaining information, which is still available in the distorted sequence, more effectively compared to the original algorithm. In doing so, a visually noticeable gain of up to 0.51 dB PSNR compared to the underlying algorithm and more than 3 dB compared to other error concealment algorithms can be achieved.

Submitted 8 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Image Processing, 2010, pp. 469-472

arXiv:2207.03766 [pdf, ps, other]

Spatio-temporal prediction in video coding by spatially refined motion compensation

Authors: Jürgen Seiler, André Kaup

Abstract: The purpose of this contribution is to introduce a new method of signal prediction in video coding. Unlike most existent prediction methods that either use temporal or use spatial correlations to generate the prediction signal, the proposed method uses spatial and temporal correlations at the same time. The spatio-temporal prediction is obtained by first performing motion compensation for a macroblock, followed by a refinement step that pays attention to the correlations between the macroblock and its surroundings. At the decoder, the refinement step can be performed in the same manner, thus no additional side information has to be transmitted. Implementation of the spatial refinement step into the H.264/AVC video codec leads to reduction in data rate of up to nearly 15% and increase in PSNR of up to 0.75 dB, compared to pure motion compensated prediction.

Submitted 8 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Image Processing, 2008, pp. 2788-2791

arXiv:2207.01210 [pdf, ps, other]

Reusing the H.264/AVC deblocking filter for efficient spatio-temporal prediction in video coding

Authors: Jürgen Seiler, André Kaup

Abstract: The prediction step is a very important part of hybrid video codecs for effectively compressing video sequences. While existing video codecs predict either in temporal or in spatial direction only, the compression efficiency can be increased by a combined spatio-temporal prediction. In this paper we propose an algorithm for reusing the H.264/AVC deblocking filter for spatio-temporal prediction. Reusing this highly optimized filter allows for a very low computational complexity of this prediction mode and an average rate reduction of up to 7.2% can be achieved.

Submitted 4 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 1049-1052

arXiv:2207.01207 [pdf, ps, other]

Multiple Selection Approximation for Improved Spatio-Temporal Prediction in Video Coding

Authors: Jürgen Seiler, André Kaup

Abstract: In this contribution, a novel spatio-temporal prediction algorithm for video coding is introduced. This algorithm exploits temporal as well as spatial redundancies for effectively predicting the signal to be encoded. To achieve this, the algorithm operates in two stages. Initially, motion compensated prediction is applied on the block being encoded. Afterwards this preliminary temporal prediction is refined by forming a joint model of the initial predictor and the spatially adjacent already transmitted blocks. The novel algorithm is able to outperform earlier refinement algorithms in speed and prediction quality. Compared to pure motion compensated prediction, the mean data rate can be reduced by up to 15% and up to 1.16 dB gain in PSNR can be achieved for the considered sequences.

Submitted 4 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 886-889

arXiv:2207.01205 [pdf, ps, other]

Fast orthogonality deficiency compensation for improved frequency selective image extrapolation

Authors: Jürgen Seiler, André Kaup

Abstract: The purpose of this paper is to introduce a very efficient algorithm for signal extrapolation. It can widely be used in many applications in image and video communication, e.g. for concealment of block errors caused by transmission errors or for prediction in video coding. The signal extrapolation is performed by extending a signal from a limited number of known samples into areas beyond these samples. Therefore a finite set of orthogonal basis functions is used and the known part of the signal is projected onto them. Since the basis functions are not orthogonal regarding the area of the known samples, the projection does not lead to the real portion a basis function has of the signal. The proposed algorithm efficiently copes with this non-orthogonality resulting in very good objective and visual extrapolation results for edges, smooth areas, as well as structured areas. Compared to an existent implementation, this algorithm has a significantly lower computational complexity without any degradation in quality. The processing time can be reduced by a factor larger than 100.

Submitted 4 July, 2022; originally announced July 2022.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 781-784

arXiv:2207.00238 [pdf, ps, other]

Distributed Parallel Image Signal Extrapolation Framework using Message Passing Interface

Authors: Jürgen Seiler, André Kaup

Abstract: This paper introduces a framework for distributed parallel image signal extrapolation. Since high-quality image signal processing often comes along with a high computational complexity, a parallel execution is desirable. The proposed framework allows for the application of existing image signal extrapolation algorithms without the need to modify them for a parallel processing. The unaltered application of existing algorithms is achieved by dividing input images into overlapping tiles which are distributed to compute nodes via Message Passing Interface. In order to keep the computational overhead low, a novel image tiling algorithm is proposed. Using this algorithm, a nearly optimum tiling is possible at a very small processing time. For showing the efficacy of the framework, it is used for parallelizing a high-complexity extrapolation algorithm. Simulation results show that the proposed framework has no negative impact on extrapolation quality while at the same time offering good scaling behavior on compute clusters.

Submitted 1 July, 2022; originally announced July 2022.

Journal ref: 24th European Signal Processing Conference, 2016, pp. 81-85

arXiv:2207.00233 [pdf, ps, other]

Optimized and Parallelized Processing Order for Improved Frequency Selective Signal Extrapolation

Authors: Jürgen Seiler, André Kaup

Abstract: In recent years, multi-core processor designs have found their way into many computing devices. To exploit the capabilities of such devices in the best possible way, signal processing algorithms have to be adapted to an operation in parallel tasks. In this contribution an optimized processing order is proposed for Frequency Selective Extrapolation, a powerful signal extrapolation algorithm. Using this optimized order, the extrapolation can be carried out in parallel. The algorithm scales very well, resulting in an acceleration of a factor of up to 7.7 for an eight core computer. Additionally, the optimized processing order aims at reducing the propagation of extrapolation errors over consecutive losses. Thus, in addition to the acceleration, a visually noticeable improvement in quality of up to 0.5 dB PSNR can be achieved.

Submitted 1 July, 2022; originally announced July 2022.

Journal ref: 19th European Signal Processing Conference, 2011, pp. 269-273

arXiv:2207.00231 [pdf, ps, other]

Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Coding

Authors: Jürgen Seiler, André Kaup

Abstract: Although wireless and IP-based access to video content gives a new degree of freedom to the viewers, the risk of severe block losses caused by transmission errors is always present. The purpose of this paper is to present a new method for concealing block losses in erroneously received video sequences. For this, a motion compensated data set is generated around the lost block. Based on this aligned data set, a model of the signal is created that continues the signal into the lost areas. Since spatial as well as temporal information is used for the model generation, the proposed method is superior to methods that use either spatial or temporal information for concealment. Furthermore it outperforms current state of the art spatio-temporal concealment algorithms by up to 1.4 dB in PSNR.

Submitted 1 July, 2022; originally announced July 2022.

Journal ref: 16th European Signal Processing Conference, 2008

arXiv:2205.11202 [pdf, other]

doi 10.1109/ICIP.2015.7351671

Denoising-based image reconstruction from pixels located at non-integer positions

Authors: Ján Koloda, Jürgen Seiler, André Kaup

Abstract: Digital images are commonly represented as regular 2D arrays, so pixels are organized in form of a matrix addressed by integers. However, there are many image processing operations, such as rotation or motion compensation, that produce pixels at non-integer positions. Typically, image reconstruction techniques cannot handle samples at non-integer positions. In this paper, we propose to use triangulation-based reconstruction as initial estimate that is later refined by a novel adaptive denoising framework. Simulations reveal that improvements of up to more than 1.8 dB (in terms of PSNR) are achieved with respect to the initial estimate.

Submitted 23 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2205.10138

ACM Class: I.4.3; I.4.5

Journal ref: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 4565-4569

arXiv:2205.10138 [pdf, other]

doi 10.1109/MMSP.2016.7813344

Reliability-based Mesh-to-Grid Image Reconstruction

Authors: Ján Koloda, Jürgen Seiler, André Kaup

Abstract: This paper presents a novel method for the reconstruction of images from samples located at non-integer positions, called mesh. This is a common scenario for many image processing applications, such as super-resolution, warping or virtual view generation in multi-camera systems. The proposed method relies on a set of initial estimates that are later refined by a new reliability-based content-adaptive framework that employs denoising in order to reduce the reconstruction error. The reliability of the initial estimate is computed so stronger denoising is applied to less reliable estimates. The proposed technique can improve the reconstruction quality by more than 2 dB (in terms of PSNR) with respect to the initial estimate and it outperforms the state-of-the-art denoising-based refinement by up to 0.7 dB.

Submitted 20 May, 2022; originally announced May 2022.

ACM Class: I.4.3; I.4.5

Journal ref: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), 2016, pp. 1-5

arXiv:2205.02646 [pdf, other]

doi 10.1109/TCSII.2022.3160012

Fast Reconstruction of Three-Quarter Sampling Measurements Using Recurrent Local Joint Sparse Deconvolution and Extrapolation

Authors: Simon Grosche, Andy Regensky, Alexander Sinn, Jürgen Seiler, André Kaup

Abstract: Recently, non-regular three-quarter sampling has been shown to deliver an increased image quality of image sensors by using differently oriented L-shaped pixels compared to the same number of square pixels. A three-quarter sampling sensor can be understood as a conventional low-resolution sensor where one quadrant of each square pixel is opaque. Subsequent to the measurement, the data can be reconstructed on a regular grid with twice the resolution in both spatial dimensions using an appropriate reconstruction algorithm. For this reconstruction, local joint sparse deconvolution and extrapolation (L-JSDE) has been shown to perform very well. As a disadvantage, L-JSDE requires long computation times of several dozen minutes per megapixel. In this paper, we propose a faster version of L-JSDE called recurrent L-JSDE (RL-JSDE) which is a reformulation of L-JSDE. For reasonable recurrent measurement patterns, RL-JSDE provides significant speedups on both CPU and GPU without sacrificing image quality. Compared to L-JSDE, 20-fold and 733-fold speedups are achieved on CPU and GPU, respectively.

Submitted 5 May, 2022; originally announced May 2022.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2204.14194 [pdf, ps, other]

doi 10.1155/2011/495394

A Fast Algorithm for Selective Signal Extrapolation with Arbitrary Basis Functions

Authors: Jürgen Seiler, André Kaup

Abstract: Signal extrapolation is an important task in digital signal processing for extending known signals into unknown areas. The Selective Extrapolation is a very effective algorithm to achieve this. Thereby, the extrapolation is obtained by generating a model of the signal to be extrapolated as weighted superposition of basis functions. Unfortunately, this algorithm is computationally very expensive and, up to now, efficient implementations exist only for basis function sets that emanate from discrete transforms. Within the scope of this contribution, a novel efficient solution for Selective Extrapolation is presented for utilization with arbitrary basis functions. The proposed algorithm mathematically behaves identically to the original Selective Extrapolation, but is several decades faster. Furthermore, it is able to outperform existent fast transform domain algorithms which are limited to basis function sets that belong to the corresponding transform. With that, the novel algorithm allows for an efficient use of arbitrary basis functions, even if they are only numerically defined.

Submitted 27 April, 2022; originally announced April 2022.

Journal ref: EURASIP Journal on Advances in Signal Processing, 2011, 495394 (2011)

Showing 1–50 of 72 results for author: Seiler, J