Search | arXiv e-print repository

doi 10.1145/3519021

3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

Authors: Thiago Lopes Trugillo da Silveira, Paulo Gamarra Lessa Pinto, Jeffri Erwin Murrugarra Llerena, Claudio Rosito Jung

Abstract: This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360… ▽ More This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360$^\circ$, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Published in ACM Computing Surveys

Journal ref: ACM Comput. Surv. 55, 4, Article 68, 2023

arXiv:2312.16626 [pdf, other]

Sorting of Smartphone Components for Recycling Through Convolutional Neural Networks

Authors: Álvaro G. Becker, Marcelo P. Cenci, Thiago L. T. da Silveira, Hugo M. Veit

Abstract: The recycling of waste electrical and electronic equipment is an essential tool in allowing for a circular economy, presenting the potential for significant environmental and economic gain. However, traditional material separation techniques, based on physical and chemical processes, require substantial investment and do not apply to all cases. In this work, we investigate using an image classific… ▽ More The recycling of waste electrical and electronic equipment is an essential tool in allowing for a circular economy, presenting the potential for significant environmental and economic gain. However, traditional material separation techniques, based on physical and chemical processes, require substantial investment and do not apply to all cases. In this work, we investigate using an image classification neural network as a potential means to control an automated material separation process in treating smartphone waste, acting as a more efficient, less costly, and more widely applicable alternative to existing tools. We produced a dataset with 1,127 images of pyrolyzed smartphone components, which was then used to train and assess a VGG-16 image classification model. The model achieved 83.33% accuracy, lending credence to the viability of using such a neural network in material separation. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2310.18324 [pdf, ps, other]

"A Nova Eletricidade: Aplicações, Riscos e Tendências da IA Moderna -- "The New Electricity": Applications, Risks, and Trends in Current AI

Authors: Ana L. C. Bazzan, Anderson R. Tavares, André G. Pereira, Cláudio R. Jung, Jacob Scharcanski, Joel Luis Carbonera, Luís C. Lamb, Mariana Recamonde-Mendoza, Thiago L. T. da Silveira, Viviane Moreira

Abstract: The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that… ▽ More The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that are redefining sectors of the economy, impacting society and humanity. We analyze the risks that may come along with rapid technological progress and future trends in AI, an area that is on the path to becoming a general-purpose technology, just like electricity, which revolutionized society in the 19th and 20th centuries. A provocativa comparação entre IA e eletricidade, feita pelo cientista da computação e empreendedor Andrew Ng, resume a profunda transformação que os recentes avanços em Inteligência Artificial (IA) têm desencadeado no mundo. Este capítulo apresenta uma visão geral pela paisagem em constante evolução da IA. Sem pretensões de exaurir o assunto, exploramos as aplicações que estão redefinindo setores da economia, impactando a sociedade e a humanidade. Analisamos os riscos que acompanham o rápido progresso tecnológico e as tendências futuras da IA, área que trilha o caminho para se tornar uma tecnologia de propósito geral, assim como a eletricidade, que revolucionou a sociedade dos séculos XIX e XX. △ Less

Submitted 8 October, 2023; originally announced October 2023.

Comments: In Portuguese

MSC Class: 68 ACM Class: I.2

arXiv:2207.11638 [pdf, other]

doi 10.1016/j.image.2017.06.014

DCT Approximations Based on Chen's Factorization

Authors: C. J. Tablada, T. L. T. da Silveira, R. J. Cintra, F. M. Bayer

Abstract: In this paper, two 8-point multiplication-free DCT approximations based on the Chen's factorization are proposed and their fast algorithms are also derived. Both transformations are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared with competing methods. The proposed low-complexity t… ▽ More In this paper, two 8-point multiplication-free DCT approximations based on the Chen's factorization are proposed and their fast algorithms are also derived. Both transformations are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared with competing methods. The proposed low-complexity transforms are scaled according to Jridi-Alfalou-Meher algorithm to effect 16- and 32-point approximations. The new sets of transformations are embedded into an HEVC reference software to provide a fully HEVC-compliant video coding scheme. We show that approximate transforms can outperform traditional transforms and state-of-the-art methods at a very low complexity cost. △ Less

Submitted 23 July, 2022; originally announced July 2022.

Comments: 19 pages, 8 figures, 5 tables

Journal ref: Signal Processing: Image Communication, Volume 58, October 2017, Pages 14-23

arXiv:2206.00122 [pdf, other]

doi 10.1109/TCSVT.2021.3134054

A Class of Low-complexity DCT-like Transforms for Image and Video Coding

Authors: T. L. T. da Silveira, D. R. Canterle, D. F. G. Coelho, V. A. Coutinho, F. M. Bayer, R. J. Cintra

Abstract: The discrete cosine transform (DCT) is a relevant tool in signal processing applications, mainly known for its good decorrelation properties. Current image and video coding standards -- such as JPEG and HEVC -- adopt the DCT as a fundamental building block for compression. Recent works have introduced low-complexity approximations for the DCT, which become paramount in applications demanding real-… ▽ More The discrete cosine transform (DCT) is a relevant tool in signal processing applications, mainly known for its good decorrelation properties. Current image and video coding standards -- such as JPEG and HEVC -- adopt the DCT as a fundamental building block for compression. Recent works have introduced low-complexity approximations for the DCT, which become paramount in applications demanding real-time computation and low-power consumption. The design of DCT approximations involves a trade-off between computational complexity and performance. This paper introduces a new multiparametric transform class encompassing the round-off DCT (RDCT) and the modified RDCT (MRDCT), two relevant multiplierless 8-point approximate DCTs. The associated fast algorithm is provided. Four novel orthogonal low-complexity 8-point DCT approximations are obtained by solving a multicriteria optimization problem. The optimal 8-point transforms are scaled to lengths 16 and 32 while keeping the arithmetic complexity low. The proposed methods are assessed by proximity and coding measures with respect to the exact DCT. Image and video coding experiments hardware realization are performed. The novel transforms perform close to or outperform the current state-of-the-art DCT approximations. △ Less

Submitted 8 December, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

Comments: Corrected a typo in the general expression for the diagonal matrix S(a) (Equation 11, Section 3.1). Manuscript has 20 pages, 8 figures, 9 tables

MSC Class: 94A08; 65D15

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, v. 32, n. 7, July 2022

arXiv:2111.14237 [pdf, other]

Data-independent Low-complexity KLT Approximations for Image and Video Coding

Authors: A. P. Radünz, T. L. T. da Silveira, F. M. Bayer, R. J. Cintra

Abstract: The Karhunen-Loève transform (KLT) is often used for data decorrelation and dimensionality reduction. The KLT is able to optimally retain the signal energy in only few transform components, being mathematically suitable for image and video compression. However, in practice, because of its high computational cost and dependence on the input signal, its application in real-time scenarios is preclude… ▽ More The Karhunen-Loève transform (KLT) is often used for data decorrelation and dimensionality reduction. The KLT is able to optimally retain the signal energy in only few transform components, being mathematically suitable for image and video compression. However, in practice, because of its high computational cost and dependence on the input signal, its application in real-time scenarios is precluded. This work proposes low-computational cost approximations for the KLT. We focus on the blocklengths $N \in \{4, 8, 16, 32 \}$ because they are widely employed in image and video coding standards such as JPEG and high efficiency video coding (HEVC). Extensive computational experiments demonstrate the suitability of the proposed low-complexity transforms for image and video compression. △ Less

Submitted 28 November, 2021; originally announced November 2021.

Comments: 18 pages, 15 figures, 4 tables

arXiv:2006.11418 [pdf, ps, other]

doi 10.1016/j.sigpro.2020.107685

A Multiparametric Class of Low-complexity Transforms for Image and Video Coding

Authors: D. R. Canterle, T. L. T. da Silveira, F. M. Bayer, R. J. Cintra

Abstract: Discrete transforms play an important role in many signal processing applications, and low-complexity alternatives for classical transforms became popular in recent years. Particularly, the discrete cosine transform (DCT) has proven to be convenient for data compression, being employed in well-known image and video coding standards such as JPEG, H.264, and the recent high efficiency video coding (… ▽ More Discrete transforms play an important role in many signal processing applications, and low-complexity alternatives for classical transforms became popular in recent years. Particularly, the discrete cosine transform (DCT) has proven to be convenient for data compression, being employed in well-known image and video coding standards such as JPEG, H.264, and the recent high efficiency video coding (HEVC). In this paper, we introduce a new class of low-complexity 8-point DCT approximations based on a series of works published by Bouguezel, Ahmed and Swamy. Also, a multiparametric fast algorithm that encompasses both known and novel transforms is derived. We select the best-performing DCT approximations after solving a multicriteria optimization problem, and submit them to a scaling method for obtaining larger size transforms. We assess these DCT approximations in both JPEG-like image compression and video coding experiments. We show that the optimal DCT approximations present compelling results in terms of coding efficiency and image quality metrics, and require only few addition or bit-shifting operations, being suitable for low-complexity and low-power systems. △ Less

Submitted 19 June, 2020; originally announced June 2020.

Comments: Fixed Figure 1 and typos in the reference list

MSC Class: 94A08; MSC 33F05

Journal ref: Signal Processing, Volume 176, November 2020

arXiv:2002.05544 [pdf, other]

Superpixel Image Classification with Graph Attention Networks

Authors: Pedro H. C. Avelar, Anderson R. Tavares, Thiago L. T. da Silveira, Cláudio R. Jung, Luís C. Lamb

Abstract: This paper presents a methodology for image classification using Graph Neural Network (GNN) models. We transform the input images into region adjacency graphs (RAGs), in which regions are superpixels and edges connect neighboring superpixels. Our experiments suggest that Graph Attention Networks (GATs), which combine graph convolutions with self-attention mechanisms, outperforms other GNN models.… ▽ More This paper presents a methodology for image classification using Graph Neural Network (GNN) models. We transform the input images into region adjacency graphs (RAGs), in which regions are superpixels and edges connect neighboring superpixels. Our experiments suggest that Graph Attention Networks (GATs), which combine graph convolutions with self-attention mechanisms, outperforms other GNN models. Although raw image classifiers perform better than GATs due to information loss during the RAG generation, our methodology opens an interesting avenue of research on deep learning beyond rectangular-gridded images, such as 360-degree field of view panoramas. Traditional convolutional kernels of current state-of-the-art methods cannot handle panoramas, whereas the adapted superpixel algorithms and the resulting region adjacency graphs can naturally feed a GNN, without topology issues. △ Less

Submitted 15 November, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

arXiv:1808.02950 [pdf, other]

doi 10.1007/s11045-018-0601-5

Low-complexity 8-point DCT Approximation Based on Angle Similarity for Image and Video Coding

Authors: R. S. Oliveira, R. J. Cintra, F. M. Bayer, T. L. T. da Silveira, A. Madanayake, A. Leite

Abstract: The principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, the use of PCA may be impractical in real-time applications, or in situations were energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative to data decorrelation. This paper presents a method to derive computatio… ▽ More The principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, the use of PCA may be impractical in real-time applications, or in situations were energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative to data decorrelation. This paper presents a method to derive computationally efficient approximations to the DCT. The proposed method aims at the minimization of the angle between the rows of the exact DCT matrix and the rows of the approximated transformation matrix. The resulting transformations matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capabilities. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding under certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology. △ Less

Submitted 30 January, 2024; v1 submitted 8 August, 2018; originally announced August 2018.

Comments: Corrected typo in formula for the coding gain. 16 pages, 12 figures, 10 tables

Journal ref: Multidimensional Systems and Signal Processing, 1-32, 2018

arXiv:1606.05562 [pdf, ps, other]

doi 10.1007/s11045-014-0291-6

An Orthogonal 16-point Approximate DCT for Image and Video Compression

Authors: T. L. T. da Silveira, F. M. Bayer, R. J. Cintra, S. Kulasekera, A. Madanayake, A. J. Kozakevicius

Abstract: A low-complexity orthogonal multiplierless approximation for the 16-point discrete cosine transform (DCT) was introduced. The proposed method was designed to possess a very low computational cost. A fast algorithm based on matrix factorization was proposed requiring only 60~additions. The proposed architecture outperforms classical and state-of-the-art algorithms when assessed as a tool for image… ▽ More A low-complexity orthogonal multiplierless approximation for the 16-point discrete cosine transform (DCT) was introduced. The proposed method was designed to possess a very low computational cost. A fast algorithm based on matrix factorization was proposed requiring only 60~additions. The proposed architecture outperforms classical and state-of-the-art algorithms when assessed as a tool for image and video compression. Digital VLSI hardware implementations were also proposed being physically realized in FPGA technology and implemented in 45 nm up to synthesis and place-route levels. Additionally, the proposed method was embedded into a high efficiency video coding (HEVC) reference software for actual proof-of-concept. Obtained results show negligible video degradation when compared to Chen DCT algorithm in HEVC. △ Less

Submitted 26 May, 2016; originally announced June 2016.

Comments: 18 pages, 7 figures, 6 tables

Journal ref: Multidimensional Systems and Signal Processing, vol. 27, no. 1, pp. 87-104, 2016

Showing 1–10 of 10 results for author: da Silveira, T L T